diff --git a/third_party/law-2.5.1/COPYING b/third_party/law-2.5.1/COPYING
new file mode 100644
index 0000000..94a9ed0
--- /dev/null
+++ b/third_party/law-2.5.1/COPYING
@@ -0,0 +1,674 @@
+ GNU GENERAL PUBLIC LICENSE
+ Version 3, 29 June 2007
+
+ Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+ Preamble
+
+ The GNU General Public License is a free, copyleft license for
+software and other kinds of works.
+
+ The licenses for most software and other practical works are designed
+to take away your freedom to share and change the works. By contrast,
+the GNU General Public License is intended to guarantee your freedom to
+share and change all versions of a program--to make sure it remains free
+software for all its users. We, the Free Software Foundation, use the
+GNU General Public License for most of our software; it applies also to
+any other work released this way by its authors. You can apply it to
+your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+price. Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+them if you wish), that you receive source code or can get it if you
+want it, that you can change the software or use pieces of it in new
+free programs, and that you know you can do these things.
+
+ To protect your rights, we need to prevent others from denying you
+these rights or asking you to surrender the rights. Therefore, you have
+certain responsibilities if you distribute copies of the software, or if
+you modify it: responsibilities to respect the freedom of others.
+
+ For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must pass on to the recipients the same
+freedoms that you received. You must make sure that they, too, receive
+or can get the source code. And you must show them these terms so they
+know their rights.
+
+ Developers that use the GNU GPL protect your rights with two steps:
+(1) assert copyright on the software, and (2) offer you this License
+giving you legal permission to copy, distribute and/or modify it.
+
+ For the developers' and authors' protection, the GPL clearly explains
+that there is no warranty for this free software. For both users' and
+authors' sake, the GPL requires that modified versions be marked as
+changed, so that their problems will not be attributed erroneously to
+authors of previous versions.
+
+ Some devices are designed to deny users access to install or run
+modified versions of the software inside them, although the manufacturer
+can do so. This is fundamentally incompatible with the aim of
+protecting users' freedom to change the software. The systematic
+pattern of such abuse occurs in the area of products for individuals to
+use, which is precisely where it is most unacceptable. Therefore, we
+have designed this version of the GPL to prohibit the practice for those
+products. If such problems arise substantially in other domains, we
+stand ready to extend this provision to those domains in future versions
+of the GPL, as needed to protect the freedom of users.
+
+ Finally, every program is threatened constantly by software patents.
+States should not allow patents to restrict development and use of
+software on general-purpose computers, but in those that do, we wish to
+avoid the special danger that patents applied to a free program could
+make it effectively proprietary. To prevent this, the GPL assures that
+patents cannot be used to render the program non-free.
+
+ The precise terms and conditions for copying, distribution and
+modification follow.
+
+ TERMS AND CONDITIONS
+
+ 0. Definitions.
+
+ "This License" refers to version 3 of the GNU General Public License.
+
+ "Copyright" also means copyright-like laws that apply to other kinds of
+works, such as semiconductor masks.
+
+ "The Program" refers to any copyrightable work licensed under this
+License. Each licensee is addressed as "you". "Licensees" and
+"recipients" may be individuals or organizations.
+
+ To "modify" a work means to copy from or adapt all or part of the work
+in a fashion requiring copyright permission, other than the making of an
+exact copy. The resulting work is called a "modified version" of the
+earlier work or a work "based on" the earlier work.
+
+ A "covered work" means either the unmodified Program or a work based
+on the Program.
+
+ To "propagate" a work means to do anything with it that, without
+permission, would make you directly or secondarily liable for
+infringement under applicable copyright law, except executing it on a
+computer or modifying a private copy. Propagation includes copying,
+distribution (with or without modification), making available to the
+public, and in some countries other activities as well.
+
+ To "convey" a work means any kind of propagation that enables other
+parties to make or receive copies. Mere interaction with a user through
+a computer network, with no transfer of a copy, is not conveying.
+
+ An interactive user interface displays "Appropriate Legal Notices"
+to the extent that it includes a convenient and prominently visible
+feature that (1) displays an appropriate copyright notice, and (2)
+tells the user that there is no warranty for the work (except to the
+extent that warranties are provided), that licensees may convey the
+work under this License, and how to view a copy of this License. If
+the interface presents a list of user commands or options, such as a
+menu, a prominent item in the list meets this criterion.
+
+ 1. Source Code.
+
+ The "source code" for a work means the preferred form of the work
+for making modifications to it. "Object code" means any non-source
+form of a work.
+
+ A "Standard Interface" means an interface that either is an official
+standard defined by a recognized standards body, or, in the case of
+interfaces specified for a particular programming language, one that
+is widely used among developers working in that language.
+
+ The "System Libraries" of an executable work include anything, other
+than the work as a whole, that (a) is included in the normal form of
+packaging a Major Component, but which is not part of that Major
+Component, and (b) serves only to enable use of the work with that
+Major Component, or to implement a Standard Interface for which an
+implementation is available to the public in source code form. A
+"Major Component", in this context, means a major essential component
+(kernel, window system, and so on) of the specific operating system
+(if any) on which the executable work runs, or a compiler used to
+produce the work, or an object code interpreter used to run it.
+
+ The "Corresponding Source" for a work in object code form means all
+the source code needed to generate, install, and (for an executable
+work) run the object code and to modify the work, including scripts to
+control those activities. However, it does not include the work's
+System Libraries, or general-purpose tools or generally available free
+programs which are used unmodified in performing those activities but
+which are not part of the work. For example, Corresponding Source
+includes interface definition files associated with source files for
+the work, and the source code for shared libraries and dynamically
+linked subprograms that the work is specifically designed to require,
+such as by intimate data communication or control flow between those
+subprograms and other parts of the work.
+
+ The Corresponding Source need not include anything that users
+can regenerate automatically from other parts of the Corresponding
+Source.
+
+ The Corresponding Source for a work in source code form is that
+same work.
+
+ 2. Basic Permissions.
+
+ All rights granted under this License are granted for the term of
+copyright on the Program, and are irrevocable provided the stated
+conditions are met. This License explicitly affirms your unlimited
+permission to run the unmodified Program. The output from running a
+covered work is covered by this License only if the output, given its
+content, constitutes a covered work. This License acknowledges your
+rights of fair use or other equivalent, as provided by copyright law.
+
+ You may make, run and propagate covered works that you do not
+convey, without conditions so long as your license otherwise remains
+in force. You may convey covered works to others for the sole purpose
+of having them make modifications exclusively for you, or provide you
+with facilities for running those works, provided that you comply with
+the terms of this License in conveying all material for which you do
+not control copyright. Those thus making or running the covered works
+for you must do so exclusively on your behalf, under your direction
+and control, on terms that prohibit them from making any copies of
+your copyrighted material outside their relationship with you.
+
+ Conveying under any other circumstances is permitted solely under
+the conditions stated below. Sublicensing is not allowed; section 10
+makes it unnecessary.
+
+ 3. Protecting Users' Legal Rights From Anti-Circumvention Law.
+
+ No covered work shall be deemed part of an effective technological
+measure under any applicable law fulfilling obligations under article
+11 of the WIPO copyright treaty adopted on 20 December 1996, or
+similar laws prohibiting or restricting circumvention of such
+measures.
+
+ When you convey a covered work, you waive any legal power to forbid
+circumvention of technological measures to the extent such circumvention
+is effected by exercising rights under this License with respect to
+the covered work, and you disclaim any intention to limit operation or
+modification of the work as a means of enforcing, against the work's
+users, your or third parties' legal rights to forbid circumvention of
+technological measures.
+
+ 4. Conveying Verbatim Copies.
+
+ You may convey verbatim copies of the Program's source code as you
+receive it, in any medium, provided that you conspicuously and
+appropriately publish on each copy an appropriate copyright notice;
+keep intact all notices stating that this License and any
+non-permissive terms added in accord with section 7 apply to the code;
+keep intact all notices of the absence of any warranty; and give all
+recipients a copy of this License along with the Program.
+
+ You may charge any price or no price for each copy that you convey,
+and you may offer support or warranty protection for a fee.
+
+ 5. Conveying Modified Source Versions.
+
+ You may convey a work based on the Program, or the modifications to
+produce it from the Program, in the form of source code under the
+terms of section 4, provided that you also meet all of these conditions:
+
+ a) The work must carry prominent notices stating that you modified
+ it, and giving a relevant date.
+
+ b) The work must carry prominent notices stating that it is
+ released under this License and any conditions added under section
+ 7. This requirement modifies the requirement in section 4 to
+ "keep intact all notices".
+
+ c) You must license the entire work, as a whole, under this
+ License to anyone who comes into possession of a copy. This
+ License will therefore apply, along with any applicable section 7
+ additional terms, to the whole of the work, and all its parts,
+ regardless of how they are packaged. This License gives no
+ permission to license the work in any other way, but it does not
+ invalidate such permission if you have separately received it.
+
+ d) If the work has interactive user interfaces, each must display
+ Appropriate Legal Notices; however, if the Program has interactive
+ interfaces that do not display Appropriate Legal Notices, your
+ work need not make them do so.
+
+ A compilation of a covered work with other separate and independent
+works, which are not by their nature extensions of the covered work,
+and which are not combined with it such as to form a larger program,
+in or on a volume of a storage or distribution medium, is called an
+"aggregate" if the compilation and its resulting copyright are not
+used to limit the access or legal rights of the compilation's users
+beyond what the individual works permit. Inclusion of a covered work
+in an aggregate does not cause this License to apply to the other
+parts of the aggregate.
+
+ 6. Conveying Non-Source Forms.
+
+ You may convey a covered work in object code form under the terms
+of sections 4 and 5, provided that you also convey the
+machine-readable Corresponding Source under the terms of this License,
+in one of these ways:
+
+ a) Convey the object code in, or embodied in, a physical product
+ (including a physical distribution medium), accompanied by the
+ Corresponding Source fixed on a durable physical medium
+ customarily used for software interchange.
+
+ b) Convey the object code in, or embodied in, a physical product
+ (including a physical distribution medium), accompanied by a
+ written offer, valid for at least three years and valid for as
+ long as you offer spare parts or customer support for that product
+ model, to give anyone who possesses the object code either (1) a
+ copy of the Corresponding Source for all the software in the
+ product that is covered by this License, on a durable physical
+ medium customarily used for software interchange, for a price no
+ more than your reasonable cost of physically performing this
+ conveying of source, or (2) access to copy the
+ Corresponding Source from a network server at no charge.
+
+ c) Convey individual copies of the object code with a copy of the
+ written offer to provide the Corresponding Source. This
+ alternative is allowed only occasionally and noncommercially, and
+ only if you received the object code with such an offer, in accord
+ with subsection 6b.
+
+ d) Convey the object code by offering access from a designated
+ place (gratis or for a charge), and offer equivalent access to the
+ Corresponding Source in the same way through the same place at no
+ further charge. You need not require recipients to copy the
+ Corresponding Source along with the object code. If the place to
+ copy the object code is a network server, the Corresponding Source
+ may be on a different server (operated by you or a third party)
+ that supports equivalent copying facilities, provided you maintain
+ clear directions next to the object code saying where to find the
+ Corresponding Source. Regardless of what server hosts the
+ Corresponding Source, you remain obligated to ensure that it is
+ available for as long as needed to satisfy these requirements.
+
+ e) Convey the object code using peer-to-peer transmission, provided
+ you inform other peers where the object code and Corresponding
+ Source of the work are being offered to the general public at no
+ charge under subsection 6d.
+
+ A separable portion of the object code, whose source code is excluded
+from the Corresponding Source as a System Library, need not be
+included in conveying the object code work.
+
+ A "User Product" is either (1) a "consumer product", which means any
+tangible personal property which is normally used for personal, family,
+or household purposes, or (2) anything designed or sold for incorporation
+into a dwelling. In determining whether a product is a consumer product,
+doubtful cases shall be resolved in favor of coverage. For a particular
+product received by a particular user, "normally used" refers to a
+typical or common use of that class of product, regardless of the status
+of the particular user or of the way in which the particular user
+actually uses, or expects or is expected to use, the product. A product
+is a consumer product regardless of whether the product has substantial
+commercial, industrial or non-consumer uses, unless such uses represent
+the only significant mode of use of the product.
+
+ "Installation Information" for a User Product means any methods,
+procedures, authorization keys, or other information required to install
+and execute modified versions of a covered work in that User Product from
+a modified version of its Corresponding Source. The information must
+suffice to ensure that the continued functioning of the modified object
+code is in no case prevented or interfered with solely because
+modification has been made.
+
+ If you convey an object code work under this section in, or with, or
+specifically for use in, a User Product, and the conveying occurs as
+part of a transaction in which the right of possession and use of the
+User Product is transferred to the recipient in perpetuity or for a
+fixed term (regardless of how the transaction is characterized), the
+Corresponding Source conveyed under this section must be accompanied
+by the Installation Information. But this requirement does not apply
+if neither you nor any third party retains the ability to install
+modified object code on the User Product (for example, the work has
+been installed in ROM).
+
+ The requirement to provide Installation Information does not include a
+requirement to continue to provide support service, warranty, or updates
+for a work that has been modified or installed by the recipient, or for
+the User Product in which it has been modified or installed. Access to a
+network may be denied when the modification itself materially and
+adversely affects the operation of the network or violates the rules and
+protocols for communication across the network.
+
+ Corresponding Source conveyed, and Installation Information provided,
+in accord with this section must be in a format that is publicly
+documented (and with an implementation available to the public in
+source code form), and must require no special password or key for
+unpacking, reading or copying.
+
+ 7. Additional Terms.
+
+ "Additional permissions" are terms that supplement the terms of this
+License by making exceptions from one or more of its conditions.
+Additional permissions that are applicable to the entire Program shall
+be treated as though they were included in this License, to the extent
+that they are valid under applicable law. If additional permissions
+apply only to part of the Program, that part may be used separately
+under those permissions, but the entire Program remains governed by
+this License without regard to the additional permissions.
+
+ When you convey a copy of a covered work, you may at your option
+remove any additional permissions from that copy, or from any part of
+it. (Additional permissions may be written to require their own
+removal in certain cases when you modify the work.) You may place
+additional permissions on material, added by you to a covered work,
+for which you have or can give appropriate copyright permission.
+
+ Notwithstanding any other provision of this License, for material you
+add to a covered work, you may (if authorized by the copyright holders of
+that material) supplement the terms of this License with terms:
+
+ a) Disclaiming warranty or limiting liability differently from the
+ terms of sections 15 and 16 of this License; or
+
+ b) Requiring preservation of specified reasonable legal notices or
+ author attributions in that material or in the Appropriate Legal
+ Notices displayed by works containing it; or
+
+ c) Prohibiting misrepresentation of the origin of that material, or
+ requiring that modified versions of such material be marked in
+ reasonable ways as different from the original version; or
+
+ d) Limiting the use for publicity purposes of names of licensors or
+ authors of the material; or
+
+ e) Declining to grant rights under trademark law for use of some
+ trade names, trademarks, or service marks; or
+
+ f) Requiring indemnification of licensors and authors of that
+ material by anyone who conveys the material (or modified versions of
+ it) with contractual assumptions of liability to the recipient, for
+ any liability that these contractual assumptions directly impose on
+ those licensors and authors.
+
+ All other non-permissive additional terms are considered "further
+restrictions" within the meaning of section 10. If the Program as you
+received it, or any part of it, contains a notice stating that it is
+governed by this License along with a term that is a further
+restriction, you may remove that term. If a license document contains
+a further restriction but permits relicensing or conveying under this
+License, you may add to a covered work material governed by the terms
+of that license document, provided that the further restriction does
+not survive such relicensing or conveying.
+
+ If you add terms to a covered work in accord with this section, you
+must place, in the relevant source files, a statement of the
+additional terms that apply to those files, or a notice indicating
+where to find the applicable terms.
+
+ Additional terms, permissive or non-permissive, may be stated in the
+form of a separately written license, or stated as exceptions;
+the above requirements apply either way.
+
+ 8. Termination.
+
+ You may not propagate or modify a covered work except as expressly
+provided under this License. Any attempt otherwise to propagate or
+modify it is void, and will automatically terminate your rights under
+this License (including any patent licenses granted under the third
+paragraph of section 11).
+
+ However, if you cease all violation of this License, then your
+license from a particular copyright holder is reinstated (a)
+provisionally, unless and until the copyright holder explicitly and
+finally terminates your license, and (b) permanently, if the copyright
+holder fails to notify you of the violation by some reasonable means
+prior to 60 days after the cessation.
+
+ Moreover, your license from a particular copyright holder is
+reinstated permanently if the copyright holder notifies you of the
+violation by some reasonable means, this is the first time you have
+received notice of violation of this License (for any work) from that
+copyright holder, and you cure the violation prior to 30 days after
+your receipt of the notice.
+
+ Termination of your rights under this section does not terminate the
+licenses of parties who have received copies or rights from you under
+this License. If your rights have been terminated and not permanently
+reinstated, you do not qualify to receive new licenses for the same
+material under section 10.
+
+ 9. Acceptance Not Required for Having Copies.
+
+ You are not required to accept this License in order to receive or
+run a copy of the Program. Ancillary propagation of a covered work
+occurring solely as a consequence of using peer-to-peer transmission
+to receive a copy likewise does not require acceptance. However,
+nothing other than this License grants you permission to propagate or
+modify any covered work. These actions infringe copyright if you do
+not accept this License. Therefore, by modifying or propagating a
+covered work, you indicate your acceptance of this License to do so.
+
+ 10. Automatic Licensing of Downstream Recipients.
+
+ Each time you convey a covered work, the recipient automatically
+receives a license from the original licensors, to run, modify and
+propagate that work, subject to this License. You are not responsible
+for enforcing compliance by third parties with this License.
+
+ An "entity transaction" is a transaction transferring control of an
+organization, or substantially all assets of one, or subdividing an
+organization, or merging organizations. If propagation of a covered
+work results from an entity transaction, each party to that
+transaction who receives a copy of the work also receives whatever
+licenses to the work the party's predecessor in interest had or could
+give under the previous paragraph, plus a right to possession of the
+Corresponding Source of the work from the predecessor in interest, if
+the predecessor has it or can get it with reasonable efforts.
+
+ You may not impose any further restrictions on the exercise of the
+rights granted or affirmed under this License. For example, you may
+not impose a license fee, royalty, or other charge for exercise of
+rights granted under this License, and you may not initiate litigation
+(including a cross-claim or counterclaim in a lawsuit) alleging that
+any patent claim is infringed by making, using, selling, offering for
+sale, or importing the Program or any portion of it.
+
+ 11. Patents.
+
+ A "contributor" is a copyright holder who authorizes use under this
+License of the Program or a work on which the Program is based. The
+work thus licensed is called the contributor's "contributor version".
+
+ A contributor's "essential patent claims" are all patent claims
+owned or controlled by the contributor, whether already acquired or
+hereafter acquired, that would be infringed by some manner, permitted
+by this License, of making, using, or selling its contributor version,
+but do not include claims that would be infringed only as a
+consequence of further modification of the contributor version. For
+purposes of this definition, "control" includes the right to grant
+patent sublicenses in a manner consistent with the requirements of
+this License.
+
+ Each contributor grants you a non-exclusive, worldwide, royalty-free
+patent license under the contributor's essential patent claims, to
+make, use, sell, offer for sale, import and otherwise run, modify and
+propagate the contents of its contributor version.
+
+ In the following three paragraphs, a "patent license" is any express
+agreement or commitment, however denominated, not to enforce a patent
+(such as an express permission to practice a patent or covenant not to
+sue for patent infringement). To "grant" such a patent license to a
+party means to make such an agreement or commitment not to enforce a
+patent against the party.
+
+ If you convey a covered work, knowingly relying on a patent license,
+and the Corresponding Source of the work is not available for anyone
+to copy, free of charge and under the terms of this License, through a
+publicly available network server or other readily accessible means,
+then you must either (1) cause the Corresponding Source to be so
+available, or (2) arrange to deprive yourself of the benefit of the
+patent license for this particular work, or (3) arrange, in a manner
+consistent with the requirements of this License, to extend the patent
+license to downstream recipients. "Knowingly relying" means you have
+actual knowledge that, but for the patent license, your conveying the
+covered work in a country, or your recipient's use of the covered work
+in a country, would infringe one or more identifiable patents in that
+country that you have reason to believe are valid.
+
+ If, pursuant to or in connection with a single transaction or
+arrangement, you convey, or propagate by procuring conveyance of, a
+covered work, and grant a patent license to some of the parties
+receiving the covered work authorizing them to use, propagate, modify
+or convey a specific copy of the covered work, then the patent license
+you grant is automatically extended to all recipients of the covered
+work and works based on it.
+
+ A patent license is "discriminatory" if it does not include within
+the scope of its coverage, prohibits the exercise of, or is
+conditioned on the non-exercise of one or more of the rights that are
+specifically granted under this License. You may not convey a covered
+work if you are a party to an arrangement with a third party that is
+in the business of distributing software, under which you make payment
+to the third party based on the extent of your activity of conveying
+the work, and under which the third party grants, to any of the
+parties who would receive the covered work from you, a discriminatory
+patent license (a) in connection with copies of the covered work
+conveyed by you (or copies made from those copies), or (b) primarily
+for and in connection with specific products or compilations that
+contain the covered work, unless you entered into that arrangement,
+or that patent license was granted, prior to 28 March 2007.
+
+ Nothing in this License shall be construed as excluding or limiting
+any implied license or other defenses to infringement that may
+otherwise be available to you under applicable patent law.
+
+ 12. No Surrender of Others' Freedom.
+
+ If conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License. If you cannot convey a
+covered work so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you may
+not convey it at all. For example, if you agree to terms that obligate you
+to collect a royalty for further conveying from those to whom you convey
+the Program, the only way you could satisfy both those terms and this
+License would be to refrain entirely from conveying the Program.
+
+ 13. Use with the GNU Affero General Public License.
+
+ Notwithstanding any other provision of this License, you have
+permission to link or combine any covered work with a work licensed
+under version 3 of the GNU Affero General Public License into a single
+combined work, and to convey the resulting work. The terms of this
+License will continue to apply to the part which is the covered work,
+but the special requirements of the GNU Affero General Public License,
+section 13, concerning interaction through a network will apply to the
+combination as such.
+
+ 14. Revised Versions of this License.
+
+ The Free Software Foundation may publish revised and/or new versions of
+the GNU General Public License from time to time. Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+ Each version is given a distinguishing version number. If the
+Program specifies that a certain numbered version of the GNU General
+Public License "or any later version" applies to it, you have the
+option of following the terms and conditions either of that numbered
+version or of any later version published by the Free Software
+Foundation. If the Program does not specify a version number of the
+GNU General Public License, you may choose any version ever published
+by the Free Software Foundation.
+
+ If the Program specifies that a proxy can decide which future
+versions of the GNU General Public License can be used, that proxy's
+public statement of acceptance of a version permanently authorizes you
+to choose that version for the Program.
+
+ Later license versions may give you additional or different
+permissions. However, no additional obligations are imposed on any
+author or copyright holder as a result of your choosing to follow a
+later version.
+
+ 15. Disclaimer of Warranty.
+
+ THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
+APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
+HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
+OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
+THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
+IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
+ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
+
+ 16. Limitation of Liability.
+
+ IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
+THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
+GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
+USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
+DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
+PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
+EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
+SUCH DAMAGES.
+
+ 17. Interpretation of Sections 15 and 16.
+
+ If the disclaimer of warranty and limitation of liability provided
+above cannot be given local legal effect according to their terms,
+reviewing courts shall apply local law that most closely approximates
+an absolute waiver of all civil liability in connection with the
+Program, unless a warranty or assumption of liability accompanies a
+copy of the Program in return for a fee.
+
+ END OF TERMS AND CONDITIONS
+
+ How to Apply These Terms to Your New Programs
+
+ If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+ To do so, attach the following notices to the program. It is safest
+to attach them to the start of each source file to most effectively
+state the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+ <one line to give the program's name and a brief idea of what it does.>
+ Copyright (C) <year> <name of author>
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+Also add information on how to contact you by electronic and paper mail.
+
+ If the program does terminal interaction, make it output a short
+notice like this when it starts in an interactive mode:
+
+ <program> Copyright (C) <year> <name of author>
+ This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+ This is free software, and you are welcome to redistribute it
+ under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License. Of course, your program's commands
+might be different; for a GUI interface, you would use an "about box".
+
+ You should also get your employer (if you work as a programmer) or school,
+if any, to sign a "copyright disclaimer" for the program, if necessary.
+For more information on this, and how to apply and follow the GNU GPL, see
+<http://www.gnu.org/licenses/>.
+
+ The GNU General Public License does not permit incorporating your program
+into proprietary programs. If your program is a subroutine library, you
+may consider it more useful to permit linking proprietary applications with
+the library. If this is what you want to do, use the GNU Lesser General
+Public License instead of this License. But first, please read
+<http://www.gnu.org/philosophy/why-not-lgpl.html>.
diff --git a/third_party/law-2.5.1/COPYING.LESSER b/third_party/law-2.5.1/COPYING.LESSER
new file mode 100644
index 0000000..65c5ca8
--- /dev/null
+++ b/third_party/law-2.5.1/COPYING.LESSER
@@ -0,0 +1,165 @@
+ GNU LESSER GENERAL PUBLIC LICENSE
+ Version 3, 29 June 2007
+
+ Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+
+ This version of the GNU Lesser General Public License incorporates
+the terms and conditions of version 3 of the GNU General Public
+License, supplemented by the additional permissions listed below.
+
+ 0. Additional Definitions.
+
+ As used herein, "this License" refers to version 3 of the GNU Lesser
+General Public License, and the "GNU GPL" refers to version 3 of the GNU
+General Public License.
+
+ "The Library" refers to a covered work governed by this License,
+other than an Application or a Combined Work as defined below.
+
+ An "Application" is any work that makes use of an interface provided
+by the Library, but which is not otherwise based on the Library.
+Defining a subclass of a class defined by the Library is deemed a mode
+of using an interface provided by the Library.
+
+ A "Combined Work" is a work produced by combining or linking an
+Application with the Library. The particular version of the Library
+with which the Combined Work was made is also called the "Linked
+Version".
+
+ The "Minimal Corresponding Source" for a Combined Work means the
+Corresponding Source for the Combined Work, excluding any source code
+for portions of the Combined Work that, considered in isolation, are
+based on the Application, and not on the Linked Version.
+
+ The "Corresponding Application Code" for a Combined Work means the
+object code and/or source code for the Application, including any data
+and utility programs needed for reproducing the Combined Work from the
+Application, but excluding the System Libraries of the Combined Work.
+
+ 1. Exception to Section 3 of the GNU GPL.
+
+ You may convey a covered work under sections 3 and 4 of this License
+without being bound by section 3 of the GNU GPL.
+
+ 2. Conveying Modified Versions.
+
+ If you modify a copy of the Library, and, in your modifications, a
+facility refers to a function or data to be supplied by an Application
+that uses the facility (other than as an argument passed when the
+facility is invoked), then you may convey a copy of the modified
+version:
+
+ a) under this License, provided that you make a good faith effort to
+ ensure that, in the event an Application does not supply the
+ function or data, the facility still operates, and performs
+ whatever part of its purpose remains meaningful, or
+
+ b) under the GNU GPL, with none of the additional permissions of
+ this License applicable to that copy.
+
+ 3. Object Code Incorporating Material from Library Header Files.
+
+ The object code form of an Application may incorporate material from
+a header file that is part of the Library. You may convey such object
+code under terms of your choice, provided that, if the incorporated
+material is not limited to numerical parameters, data structure
+layouts and accessors, or small macros, inline functions and templates
+(ten or fewer lines in length), you do both of the following:
+
+ a) Give prominent notice with each copy of the object code that the
+ Library is used in it and that the Library and its use are
+ covered by this License.
+
+ b) Accompany the object code with a copy of the GNU GPL and this license
+ document.
+
+ 4. Combined Works.
+
+ You may convey a Combined Work under terms of your choice that,
+taken together, effectively do not restrict modification of the
+portions of the Library contained in the Combined Work and reverse
+engineering for debugging such modifications, if you also do each of
+the following:
+
+ a) Give prominent notice with each copy of the Combined Work that
+ the Library is used in it and that the Library and its use are
+ covered by this License.
+
+ b) Accompany the Combined Work with a copy of the GNU GPL and this license
+ document.
+
+ c) For a Combined Work that displays copyright notices during
+ execution, include the copyright notice for the Library among
+ these notices, as well as a reference directing the user to the
+ copies of the GNU GPL and this license document.
+
+ d) Do one of the following:
+
+ 0) Convey the Minimal Corresponding Source under the terms of this
+ License, and the Corresponding Application Code in a form
+ suitable for, and under terms that permit, the user to
+ recombine or relink the Application with a modified version of
+ the Linked Version to produce a modified Combined Work, in the
+ manner specified by section 6 of the GNU GPL for conveying
+ Corresponding Source.
+
+ 1) Use a suitable shared library mechanism for linking with the
+ Library. A suitable mechanism is one that (a) uses at run time
+ a copy of the Library already present on the user's computer
+ system, and (b) will operate properly with a modified version
+ of the Library that is interface-compatible with the Linked
+ Version.
+
+ e) Provide Installation Information, but only if you would otherwise
+ be required to provide such information under section 6 of the
+ GNU GPL, and only to the extent that such information is
+ necessary to install and execute a modified version of the
+ Combined Work produced by recombining or relinking the
+ Application with a modified version of the Linked Version. (If
+ you use option 4d0, the Installation Information must accompany
+ the Minimal Corresponding Source and Corresponding Application
+ Code. If you use option 4d1, you must provide the Installation
+ Information in the manner specified by section 6 of the GNU GPL
+ for conveying Corresponding Source.)
+
+ 5. Combined Libraries.
+
+ You may place library facilities that are a work based on the
+Library side by side in a single library together with other library
+facilities that are not Applications and are not covered by this
+License, and convey such a combined library under terms of your
+choice, if you do both of the following:
+
+ a) Accompany the combined library with a copy of the same work based
+ on the Library, uncombined with any other library facilities,
+ conveyed under the terms of this License.
+
+ b) Give prominent notice with the combined library that part of it
+ is a work based on the Library, and explaining where to find the
+ accompanying uncombined form of the same work.
+
+ 6. Revised Versions of the GNU Lesser General Public License.
+
+ The Free Software Foundation may publish revised and/or new versions
+of the GNU Lesser General Public License from time to time. Such new
+versions will be similar in spirit to the present version, but may
+differ in detail to address new problems or concerns.
+
+ Each version is given a distinguishing version number. If the
+Library as you received it specifies that a certain numbered version
+of the GNU Lesser General Public License "or any later version"
+applies to it, you have the option of following the terms and
+conditions either of that published version or of any later version
+published by the Free Software Foundation. If the Library as you
+received it does not specify a version number of the GNU Lesser
+General Public License, you may choose any version of the GNU Lesser
+General Public License ever published by the Free Software Foundation.
+
+ If the Library as you received it specifies that a proxy can decide
+whether future versions of the GNU Lesser General Public License shall
+apply, that proxy's public statement of acceptance of any version is
+permanent authorization for you to choose that version for the
+Library.
diff --git a/third_party/law-2.5.1/build.properties b/third_party/law-2.5.1/build.properties
new file mode 100644
index 0000000..58d34bc
--- /dev/null
+++ b/third_party/law-2.5.1/build.properties
@@ -0,0 +1,38 @@
+# Miscellany
+
+version=2.5.1
+
+build=build
+dist=dist
+docs=docs
+outcomes=outcomes
+reports=reports
+src=src
+test=test
+
+# Whenever it is necessary to add a new jar to the project, the following data must be updated:
+
+# 1) the list of local javadocs
+# 2) the list of remote javadocs
+# 3) the list of javadocs referenced by the javadoc target
+# 4) the list of jars in the findbugs target
+
+commons-configuration.apiurl=http://commons.apache.org/configuration/apidocs/
+commons-io.apiurl=http://commons.apache.org/proper/commons-io/javadocs/api-release/
+commons-lang.apiurl=http://commons.apache.org/proper/commons-lang/javadocs/api-release/
+commons-collections.apiurl=http://commons.apache.org/proper/commons-collections/javadocs/api-release/
+dsiutils.apiurl=http://dsiutils.di.unimi.it/docs/
+fastutil.apiurl=http://fastutil.di.unimi.it/docs/
+j2se.apiurl=http://download.oracle.com/javase/6/docs/api/
+jsap.apiurl=http://www.martiansoftware.com/jsap/doc/javadoc/
+junit.apiurl=http://junit.sourceforge.net/javadoc_40/
+log4j.apiurl=http://logging.apache.org/log4j/docs/api/
+servletapi5.apiurl=http://jakarta.apache.org/tomcat/tomcat-5.5-doc/servletapi/
+sux4j.apiurl=http://sux.di.unimi.it/docs/
+slf4j.apiurl=http://www.slf4j.org/apidocs/
+mg4j.apiurl=http://mg4j.di.unimi.it/docs/
+mg4j-big.apiurl=http://mg4j.di.unimi.it/docs-big/
+webgraph.apiurl=http://webgraph.di.unimi.it/docs/
+webgraph-big.apiurl=http://webgraph.di.unimi.it/docs-big/
+velocity-tools.apiurl=http://velocity.apache.org/tools/releases/1.4/javadoc/
+velocity.apiurl=http://jakarta.apache.org/velocity/docs/api/
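The comment in build.properties asks maintainers to keep the `*.apiurl` entries and the `<link>` elements of the `javadoc` target in build.xml in sync by hand. A minimal Python sketch of an automated consistency check follows. It is not part of the LAW distribution, and its simplified properties reader is an assumption: it ignores Java-properties escapes, `:` separators and continuation lines.

```python
import re


def parse_properties(text: str) -> dict:
    """Minimal key=value reader; skips blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so URLs containing '=' survive.
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def unlinked_apiurls(properties_text: str, build_xml_text: str) -> list:
    """Return the *.apiurl keys that no <link href="${...}"/> element references."""
    props = parse_properties(properties_text)
    linked = set(re.findall(r'<link\s+href="\$\{([^}]+)\}"', build_xml_text))
    return sorted(k for k in props if k.endswith(".apiurl") and k not in linked)
```

Checked by hand against the two files in this diff, such a check would flag `log4j.apiurl`, `servletapi5.apiurl`, `velocity-tools.apiurl` and `velocity.apiurl`, which are defined but not linked. The build files do not say whether that is intentional.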
diff --git a/third_party/law-2.5.1/build.xml b/third_party/law-2.5.1/build.xml
new file mode 100644
index 0000000..5b350ae
--- /dev/null
+++ b/third_party/law-2.5.1/build.xml
@@ -0,0 +1,234 @@
+<?xml version="1.0"?>
+
+<!--
+
+ If local is defined (e.g., -Dlocal=) Javadoc documentation
+ will be linked to the local versions in ${javadoc.base} that
+ is taken from the environment variable $JAVADOC_HOME, if set,
+ or defaults to /usr/share/javadoc
+
+ If testonly is defined, junit will run only on that package.
+
+-->
+
+<project name="law" default="jar" basedir="." xmlns:ivy="antlib:org.apache.ivy.ant" xmlns:jacoco="antlib:org.jacoco.ant">
+
+ <!-- === using ivy to setup the classpath === [ -->
+
+ <property name="build.sysclasspath" value="ignore"/>
+ <property name="jars.dir" value="${basedir}/jars"/>
+
+ <property environment="env"/>
+ <condition property="ivy.settings.file" value="${env.LOCAL_IVY_SETTINGS}"><isset property="env.LOCAL_IVY_SETTINGS"/></condition>
+
+ <target name="ivy-setupjars" description="Downloads dependencies with ivy and generate report">
+ <ivy:retrieve symlink="true" sync="true" pattern="${jars.dir}/[conf]/[artifact](-[classifier]).[ext]"/>
+ </target>
+
+ <target name="ivy-clean" description="Clean ivy cache, jars dir and ivy installation">
+ <!-- this is very aggressive <ivy:cleancache /> -->
+ <delete dir="${jars.dir}"/>
+ </target>
+
+ <target name="ivy-report" depends='ivy-setupjars' description="Compute the resolution report (saving it in ${reports}/ivy)">
+ <ivy:report todir="${reports}/ivy"/>
+ </target>
+
+ <path id="compile.classpath">
+ <fileset dir="${jars.dir}/compile" erroronmissingdir="false"/>
+ </path>
+ <path id="test.classpath">
+ <fileset dir="${jars.dir}/test" erroronmissingdir="false"/>
+ </path>
+ <path id="project.classpath">
+ <fileset dir="${jars.dir}/runtime" erroronmissingdir="false"/>
+ </path>
+
+ <!-- ] === using ivy to setup the classpath === -->
+
+ <!-- === defining ivy and jacoco tasks === [ -->
+
+ <taskdef resource="org/apache/ivy/ant/antlib.xml" uri="antlib:org.apache.ivy.ant" onerror="report"/>
+ <taskdef uri="antlib:org.jacoco.ant" resource="org/jacoco/ant/antlib.xml" classpathref="test.classpath" onerror="report"/>
+
+ <!-- ] === defining ivy and jacoco tasks === -->
+
+ <!-- === getting additional properties from file, fixing references for local/remote javadoc === [ -->
+
+ <property file="build.properties"/>
+
+ <property name="jarfile" value="law-${version}.jar"/>
+
+ <!-- ] === getting additional properties from file, fixing references for local/remote javadoc === -->
+
+ <!-- === manual dependencies === [ -->
+
+ <condition property="requires-javacc">
+ <available file="${src}/it/unimi/dsi/law/warc/filters/parser/FilterParser.jj" type="file"/>
+ </condition>
+
+ <!-- ] === manual dependencies === -->
+
+ <!-- ============= Generic targets. ============ -->
+
+ <target name="all" depends="jar,javadoc"/>
+
+ <target name="init">
+ <available property="ivy.set.up" file="${jars.dir}"/>
+ <fail message="It appears that Ivy has not been set up properly. Please run &quot;ant ivy-setupjars&quot; and try again." unless="ivy.set.up"/>
+ <mkdir dir="${build}"/>
+ <mkdir dir="${dist}"/>
+ <mkdir dir="${docs}"/>
+ <mkdir dir="${docs}/it/unimi/dsi/law/warc/filters/parser/"/>
+ <mkdir dir="${outcomes}"/>
+ <mkdir dir="${reports}"/>
+ </target>
+
+ <target name="clean">
+ <delete dir="${build}"/>
+ <delete dir="${dist}"/>
+ <delete dir="${docs}"/>
+ <delete dir="${outcomes}"/>
+ <delete dir="${reports}"/>
+ <delete>
+ <fileset dir="." includes="law-*.jar"/>
+ <fileset dir="${src}">
+ <containsregexp expression="Generated By:JavaCC: Do not edit this line"/>
+ </fileset>
+ </delete>
+ </target>
+
+ <target name="compile" depends="init,javacc" description="Compile sources (without tests)">
+ <javac srcdir="${src}" debug="on" optimize="on" destdir="${build}" encoding="UTF-8" source="1.9" classpathref="compile.classpath"/>
+ </target>
+
+ <target name="compile-tests" depends="init,javacc,compile" description="Compile sources (tests)">
+ <javac srcdir="${src}:${test}" debug="on" optimize="on" destdir="${build}" encoding="UTF-8" source="1.9" classpathref="test.classpath"/>
+ </target>
+
+ <target name="jar" depends="compile" description="Creates jar (without tests)">
+ <jar basedir="${build}" jarfile="${jarfile}"/>
+ </target>
+
+ <target name="javadoc" description="Generates documentation" depends="init,javacc-docs">
+ <javadoc destdir="${docs}"
+ additionalparam="-Xdoclint:none"
+ encoding="UTF-8"
+ docencoding="UTF-8"
+ classpathref="project.classpath"
+ sourcepath="${src}"
+ packagenames="it.unimi.dsi.law.*"
+ protected="on"
+ overview="${src}/overview.html"
+ source="1.8"
+ windowtitle="LAW ${version}">
+ <link href="${j2se.apiurl}"/>
+ <link href="${junit.apiurl}"/>
+ <link href="${fastutil.apiurl}"/>
+ <link href="${mg4j.apiurl}"/>
+ <link href="${mg4j-big.apiurl}"/>
+ <link href="${dsiutils.apiurl}"/>
+ <link href="${sux4j.apiurl}"/>
+ <link href="${slf4j.apiurl}"/>
+ <link href="${webgraph.apiurl}"/>
+ <link href="${webgraph-big.apiurl}"/>
+ <link href="${commons-lang.apiurl}"/>
+ <link href="${commons-io.apiurl}"/>
+ <link href="${commons-collections.apiurl}"/>
+ <link href="${commons-configuration.apiurl}"/>
+ <link href="${jsap.apiurl}"/>
+ </javadoc>
+ </target>
+
+ <!-- javacc stuff -->
+
+ <target name="javacc" depends="init" if="requires-javacc">
+ <javacc target="${src}/it/unimi/dsi/law/warc/filters/parser/FilterParser.jj" javacchome="${jars.dir}/compile"/>
+ </target>
+
+ <target name="javacc-docs" depends="init,javacc" if="requires-javacc">
+ <jjdoc target="${src}/it/unimi/dsi/law/warc/filters/parser/FilterParser.jj" javacchome="${jars.dir}/compile"
+ outputfile="${docs}/it/unimi/dsi/law/warc/filters/parser/FilterParser.doc.html"/>
+ </target>
+
+ <!-- junit / jacoco stuff -->
+
+ <target name="-testony-present" if="testonly">
+ <echo>Testing only package: ${testonly}</echo>
+ <property name="testonlydir" value="it/unimi/dsi/law/${testonly}/**"/>
+ <property name="coverage.output" value="jacoco-${testonly}.exec"/>
+ </target>
+ <target name="-testony-absent" unless="testonly">
+ <echo>Testing all packages (specify -Dtestonly=X to restrict tests to the specific package it.unimi.dsi.law.X)</echo>
+ <property name="testonlydir" value="**"/>
+ <property name="coverage.output" value="jacoco-ALL.exec"/>
+ </target>
+ <target name="generate-outcomes" depends="init,-testony-present,-testony-absent,compile-tests" description="Runs JUnit tests">
+ <jacoco:coverage destfile="${outcomes}/${coverage.output}">
+ <junit fork="yes" forkmode="once" printsummary="true" outputtoformatters="false">
+ <jvmarg value="-Xmx1G"/>
+ <jvmarg value="-Dit.unimi.dsi.law.data=data"/>
+ <assertions>
+ <enable/>
+ </assertions>
+ <classpath>
+ <path refid="test.classpath" />
+ <pathelement location="${build}"/>
+ </classpath>
+ <formatter type="xml"/>
+ <formatter type="plain"/>
+ <batchtest todir="${outcomes}">
+ <fileset dir="${test}">
+ <containsregexp expression="@Test"/>
+ <include name="${testonlydir}/*.java"/>
+ </fileset>
+ </batchtest>
+ </junit>
+ </jacoco:coverage>
+ </target>
+ <target name="merge-outcomes" depends="init" description="Merges junit and jacoco outcomes (used by Jenkins)">
+ <delete file="${outcomes}/junit.xml"/>
+ <junitreport tofile="${outcomes}/junit.xml">
+ <fileset file="${outcomes}/TEST-*.xml"/>
+ </junitreport>
+ <jacoco:merge destfile="${outcomes}/jacoco.exec">
+ <fileset dir="${outcomes}" includes="*-*.exec"/>
+ </jacoco:merge>
+ </target>
+ <target name="reports" depends="init" description="Generate junit and jacoco html reports">
+ <junitreport tofile="${outcomes}/junit.xml">
+ <fileset file="${outcomes}/TEST-*.xml"/>
+ <report todir="${reports}/junit"/>
+ </junitreport>
+ <jacoco:report>
+ <executiondata>
+ <file file="${outcomes}/jacoco.exec"/>
+ </executiondata>
+ <structure name="LAW Project">
+ <classfiles>
+ <fileset dir="${build}"/>
+ </classfiles>
+ <sourcefiles encoding="UTF-8">
+ <fileset dir="${src}"/>
+ </sourcefiles>
+ </structure>
+ <html destdir="${reports}/jacoco"/>
+ </jacoco:report>
+ </target>
+ <target name="test" depends="generate-outcomes,merge-outcomes,reports" description="Runs JUnit tests"/>
+
+ <!-- snapshot stuff -->
+
+ <target name="snapshot" description="Publishes a snapshot version on jars.law.di.unimi.it" depends="jar">
+ <move file="law-${version}.jar" tofile="${build}/law-library-${version}-SNAPSHOT.jar"/>
+ <ivy:resolve/>
+ <ivy:deliver deliverpattern="${build}/[artifact]-[revision].[ext]" pubrevision="${version}-SNAPSHOT" status="integration"/>
+ <ivy:makepom ivyfile="${build}/ivy-${version}-SNAPSHOT.xml" pomfile="${build}/law-library-${version}-SNAPSHOT.pom">
+ <dependency group="ch.qos.logback" artifact="logback-classic.jar" optional="true"/>
+ </ivy:makepom>
+ <ivy:publish resolver="law-snapshots" pubrevision="${version}-SNAPSHOT" overwrite="true" publishivy="false">
+ <artifacts pattern="${build}/[artifact]-[revision].[ext]"/>
+ </ivy:publish>
+ </target>
+
+</project>
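The `-testony-present` and `-testony-absent` targets, together with the `<batchtest>` fileset, decide which test classes run. The Python sketch below mirrors that selection outside Ant, for illustration only. The glob translation is an assumed, simplified approximation of Ant's pattern semantics (`**/` matches zero or more directories, `*` stays within one path segment), and the file names in the usage check are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple


def ant_glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a simplified Ant-style pattern into an anchored regex."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")  # anything, including '/'
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # within a single path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def test_selection(testonly: Optional[str] = None) -> Tuple[str, str]:
    """Mirror the testonlydir / coverage.output properties set by the two targets."""
    if testonly is not None:
        return f"it/unimi/dsi/law/{testonly}/**", f"jacoco-{testonly}.exec"
    return "**", "jacoco-ALL.exec"


def select_tests(files: Dict[str, str], testonly: Optional[str] = None) -> List[str]:
    """Keep sources matching ${testonlydir}/*.java that contain '@Test'."""
    testonlydir, _ = test_selection(testonly)
    rx = ant_glob_to_regex(f"{testonlydir}/*.java")
    return sorted(p for p, src in files.items() if rx.match(p) and "@Test" in src)
```

As the echo message in the build file states, passing `-Dtestonly=X` restricts JUnit to `it.unimi.dsi.law.X`, and the JaCoCo output is then named after X instead of `ALL`.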
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/bubing/tool/test.store b/third_party/law-2.5.1/data/it/unimi/dsi/law/bubing/tool/test.store
new file mode 100644
index 0000000..5160862
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/bubing/tool/test.store
@@ -0,0 +1,67 @@
+
+⇒⇒PAGE000000d9
+
+→→PURL0000000b
+http://c/ x
+→→DATE0000000e
+7/5/06 9:56 AM
+→→DGST00000020
+091d88f6f7101856f61bab18b48ce752
+→→CSET0000000a
+iso-8859-1
+→→STAT0000000f
+HTTP/1.1 200 OK
+→→CONT0000000f
+HTTP/1.1 200 OK
+⇒⇒PAGE000000db
+
+→→PURL0000000d
+http://c/%20x
+→→DATE0000000e
+7/5/06 9:56 AM
+→→DGST00000020
+191d88f6f7101856f61bab18b48ce752
+→→CSET0000000a
+iso-8859-1
+→→STAT0000000f
+HTTP/1.1 200 OK
+→→CONT0000000f
+HTTP/1.1 200 OK
+⇒⇒PAGE000000b9
+
+→→PURL0000000b
+http://b/ x
+→→DUPL0000000d
+http://c/%20x
+→→DATE0000000e
+7/5/06 9:56 AM
+→→DGST00000020
+191d88f6f7101856f61bab18b48ce752
+→→STAT0000000f
+HTTP/1.1 200 OK
+⇒⇒PAGE000000db
+
+→→PURL0000000d
+http://b/%20x
+→→DATE0000000e
+7/5/06 9:56 AM
+→→DGST00000020
+291d88f6f7101856f61bab18b48ce752
+→→CSET0000000a
+iso-8859-1
+→→STAT0000000f
+HTTP/1.1 200 OK
+→→CONT0000000f
+HTTP/1.1 200 OK
+⇒⇒PAGE000000b7
+
+→→PURL00000009
+http://a/
+→→DUPL0000000d
+http://b/%20x
+→→DATE0000000e
+7/5/06 9:56 AM
+→→DGST00000020
+291d88f6f7101856f61bab18b48ce752
+→→STAT0000000f
+HTTP/1.1 200 OK
\ No newline at end of file
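Reading the test.store listing, each field appears to be a four-letter tag (PURL, DATE, DGST, CSET, STAT, CONT, DUPL) followed by an eight-digit hexadecimal length, with the payload on the following line. For example, `PURL0000000b` precedes the 11-character URL `http://c/ x`, and `STAT0000000f` precedes the 15-character `HTTP/1.1 200 OK`. The sketch below checks that reading of the format. It is an inference from the rendered text, not the store's documented layout, and it assumes the non-letter prefix characters (shown as arrows here) are display artifacts that can be skipped.

```python
import re
from typing import List, Optional, Tuple

# Assumed header shape: optional non-letter prefix, 4-letter tag, 8 hex digits.
HEADER = re.compile(r"^[^A-Z]*([A-Z]{4})([0-9a-f]{8})$")


def parse_fields(lines: List[str]) -> List[Tuple[str, int, Optional[str]]]:
    """Return (tag, declared_length, payload) triples; PAGE headers carry no payload line."""
    fields = []
    i = 0
    while i < len(lines):
        m = HEADER.match(lines[i])
        if not m:
            i += 1  # blank separator or unrecognised line
            continue
        tag, length = m.group(1), int(m.group(2), 16)
        if tag == "PAGE":
            fields.append((tag, length, None))
            i += 1
            continue
        payload = lines[i + 1] if i + 1 < len(lines) else ""
        fields.append((tag, length, payload))
        i += 2
    return fields
```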
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/bubing/util/bubing-test.properties b/third_party/law-2.5.1/data/it/unimi/dsi/law/bubing/util/bubing-test.properties
new file mode 100644
index 0000000..57b6922
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/bubing/util/bubing-test.properties
@@ -0,0 +1,48 @@
+name=agent
+group=group
+weight=1
+storeDir=store
+frontierDir=frontier
+responseCacheDir=cache
+sieveDir=sieve
+maxUrlsPerAuthority=100000
+parsingThreads=64
+dnsThreads=10
+fetchingThreads=1000
+fetchFilter=true
+scheduleFilter=true
+followFilter=true
+parseFilter=true
+storeFilter=true
+urlDelay=4000
+ipDelay=500
+maxUrls=100M
+bloomFilterPrecision=1E-8
+seed=http://0.0.0.0/,http://1.1.1.1/
+socketTimeout=60000
+connectionTimeout=60000
+fetchedResponseBufferSize=200000
+proxyHost=localhost
+proxyPort=8080
+cookiePolicy=compatibility
+maxCookieSize=2000
+userAgent=BUbiNG (+http://law.di.unimi.it/BUbiNG.html)
+userAgentFrom=law@di.unimi.it
+robotsExpiration=3600000
+storeDir=/tmp
+maxResponseBodyLength=2000000
+digestAlgorithm=MD5
+startPaused=false
+# One hour
+minFreezingTime=3600000
+# One day
+maxFreezingTime=86400000
+workbenchSize=2G
+urlCacheSize=2G
+sieveSize=256Mi
+sieveStoreIOBufferSize=4Mi
+sieveAuxFileIOBufferSize=4Mi
+gzipdConnections=false
+parserSpec=HTMLParser(MD5)
+queueFetchInterval=10s
+storeClass=it.unimi.dsi.law.bubing.WarcStore
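Several values in bubing-test.properties carry unit suffixes (`maxUrls=100M`, `workbenchSize=2G`, `sieveSize=256Mi`, `sieveStoreIOBufferSize=4Mi`). A minimal sketch of one common convention follows, assuming that SI suffixes (K, M, G, T) denote powers of 10 and IEC suffixes (Ki, Mi, Gi, Ti) powers of 2. How BUbiNG itself parses these values is not shown in this diff, so treat the multipliers as an assumption; durations such as `queueFetchInterval=10s` are not handled.

```python
# Assumed multipliers: SI prefixes are decimal, IEC prefixes are binary.
SI = {"K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}
IEC = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_size(value: str) -> int:
    """Convert strings such as '100M', '256Mi' or '200000' to an integer."""
    value = value.strip()
    # Check the two-letter IEC suffixes first; anything else is a plain
    # number or a number with a one-letter SI suffix.
    for suffix, multiplier in IEC.items():
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * multiplier
    if value and value[-1] in SI:
        return int(value[:-1]) * SI[value[-1]]
    return int(value)
```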
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/MG4J2MatrixTest.frequencies b/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/MG4J2MatrixTest.frequencies
new file mode 100644
index 0000000..2e30e01
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/MG4J2MatrixTest.frequencies
@@ -0,0 +1 @@
+)
\ No newline at end of file
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/MG4J2MatrixTest.globcounts b/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/MG4J2MatrixTest.globcounts
new file mode 100644
index 0000000..2ae520c
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/MG4J2MatrixTest.globcounts
@@ -0,0 +1 @@
+)›
\ No newline at end of file
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/MG4J2MatrixTest.index b/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/MG4J2MatrixTest.index
new file mode 100644
index 0000000..44bc2c8
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/MG4J2MatrixTest.index
@@ -0,0 +1 @@
+&þ0ut¼J 
\ No newline at end of file
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/MG4J2MatrixTest.offsets b/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/MG4J2MatrixTest.offsets
new file mode 100644
index 0000000..c73459e
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/MG4J2MatrixTest.offsets
@@ -0,0 +1 @@
+Œ
\ No newline at end of file
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/MG4J2MatrixTest.properties b/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/MG4J2MatrixTest.properties
new file mode 100644
index 0000000..b43a4a5
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/MG4J2MatrixTest.properties
@@ -0,0 +1,12 @@
+#SecondPass properties
+#Mon Mar 14 17:22:57 CET 2005
+iscasesensitive=true
+occurrences=11
+batches=1
+terms=3
+compressionflags=
+maxdocpos=2
+documents=6
+indexclass=it.unimi.dsi.mg4j.index.Index
+maxdocsize=3
+occsperbatch=2097152
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/MG4J2MatrixTest.sizes b/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/MG4J2MatrixTest.sizes
new file mode 100644
index 0000000..67acafd
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/MG4J2MatrixTest.sizes
@@ -0,0 +1 @@
+!¨
\ No newline at end of file
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/MG4J2MatrixTest.terms b/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/MG4J2MatrixTest.terms
new file mode 100644
index 0000000..de98044
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/MG4J2MatrixTest.terms
@@ -0,0 +1,3 @@
+a
+b
+c
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/first.ranks b/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/first.ranks
new file mode 100644
index 0000000..9be2cfd
Binary files /dev/null and b/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/first.ranks differ
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/iol-id.txt b/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/iol-id.txt
new file mode 100644
index 0000000..bc856da
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/iol-id.txt
@@ -0,0 +1,4 @@
+0
+1
+2
+3
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/iol-sim.txt b/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/iol-sim.txt
new file mode 100644
index 0000000..403c00b
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/iol-sim.txt
@@ -0,0 +1,4 @@
+0 ### 0,1.0000000 | 1,0.8889233 | 2,0.5322735 | 3,0.8059810
+1 ### 0,0.8889233 | 1,1.0000000 | 2,0.4571895 | 3,0.7620517
+2 ### 0,0.5322735 | 1,0.4571895 | 2,1.0000000 | 3,0.6878737
+3 ### 0,0.8059810 | 1,0.7620517 | 2,0.6878737 | 3,1.0000000
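iol-sim.txt appears to list, for each node, its similarity to every node as `j,score` pairs separated by `|` after a `###` marker; the four rows form a symmetric matrix with ones on the diagonal. A small Python reader that checks those two properties is sketched below. The format description is read off this file, not taken from LAW documentation.

```python
from typing import Dict, Iterable

Matrix = Dict[int, Dict[int, float]]


def parse_similarity(lines: Iterable[str]) -> Matrix:
    """Parse lines like '0 ### 0,1.0 | 1,0.88' into {row: {col: score}}."""
    rows: Matrix = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        node, _, rest = line.partition("###")
        row = {}
        for entry in rest.split("|"):
            col, score = entry.strip().split(",")
            row[int(col)] = float(score)
        rows[int(node)] = row
    return rows


def is_symmetric(rows: Matrix, tol: float = 1e-9) -> bool:
    """True if every (i, j) score equals the (j, i) score within tol."""
    return all(abs(rows[i][j] - rows[j][i]) <= tol for i in rows for j in rows[i])
```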
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/second.ranks b/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/second.ranks
new file mode 100644
index 0000000..9bf2831
Binary files /dev/null and b/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/second.ranks differ
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/text b/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/text
new file mode 100644
index 0000000..376f502
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/cluster/tool/text
@@ -0,0 +1,6 @@
+a b b
+b b a
+c a b
+c
+
+a
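The counts in MG4J2MatrixTest.properties are consistent with treating each of the six lines of `text` as one document of whitespace-separated terms: 6 documents, 3 distinct terms (matching MG4J2MatrixTest.terms), 11 occurrences and a largest document of 3 terms. The sketch below recomputes those figures. That the test index was built from this file is an inference, not something the diff states.

```python
from collections import Counter


def collection_stats(text: str) -> dict:
    """Treat each line as a document and each whitespace-separated token as a term."""
    docs = [line.split() for line in text.splitlines()]
    # Document frequency: in how many documents each term appears at least once.
    frequencies = Counter(term for doc in docs for term in set(doc))
    return {
        "documents": len(docs),
        "terms": len(frequencies),
        "occurrences": sum(len(doc) for doc in docs),
        "maxdocsize": max((len(doc) for doc in docs), default=0),
        "frequencies": dict(frequencies),
    }
```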
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/WorksheetForPRTest.mw b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/WorksheetForPRTest.mw
new file mode 100644
index 0000000..d68bba8
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/WorksheetForPRTest.mw
@@ -0,0 +1,183 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<Worksheet><Version major="6" minor="1"/><View-Properties><Zoom percentage="100"/></View-Properties><Styles><Layout alignment="left" bullet="none" name="Warning"/><Layout alignment="left" bullet="none" firstindent="0.0" leftmargin="0.0" linebreak="space" linespacing="0.0" name="Normal" rightmargin="0.0" spaceabove="0.0" spacebelow="0.0"/><Layout alignment="centred" bullet="none" linespacing="0.5" name="Maple Output"/><Font background="[0,0,0]" bold="true" executable="true" family="Monospaced" foreground="[255,0,0]" name="Maple Input" opaque="false" size="12"/><Font background="[0,0,0]" family="Monospaced" foreground="[0,0,255]" name="Warning" opaque="false" readonly="true" size="12"/><Font background="[0,0,0]" family="Times New Roman" foreground="[0,0,255]" name="2D Output" opaque="false" readonly="true" size="12"/></Styles><Group><Input><Text-field layout="Normal" prompt="&gt; " style="Maple Input">with(networks): with(ListTools): with(LinearAlgebra):</Text-field></Input><Output><Text-field layout="Warning" style="Warning">Warning, the assigned name Group now has a global binding</Text-field></Output><Output><Text-field layout="Warning" style="Warning">Warning, the names DotProduct and Transpose have been rebound</Text-field></Output></Group><Group><Input><Text-field layout="Normal" prompt="&gt; " style="Maple Input">Digits:=100;
+
+# This function substitutes edges (networks package)
+# to work around a bug.
+#
+# edgez([x,y],G) returns the set of edges from x to y in G
+#
+edgez :=
+ proc(x,G)
+ if (x[1]&lt;&gt;x[2]) then
+ edges(x,G);
+ else
+ edges({x[1],x[2]},G,'all');
+ fi;
+ end:
+
+# This function substitutes outdegree (networks package)
+# to work around a bug.
+#
+# outdegreez(x,G) returns the outdegree of x in G
+#
+outdegreez := (x,G)-&gt;outdegree(x,G)+nops(edgez([x,x],G)):
+
+#
+# A graph is specified as a list of pairs or triples of values: the first two values
+# represent (the two endpoints of) an arc and the third, if present, represents its colour.
+# This function returns the graph (as a network).
+# The second argument, nv, is the number of nodes: it is present because we want to allow
+# graphs with isolated nodes; if nv is smaller than the largest node number present in the
+# edge list, it will be ignored.
+#
+buildGraph :=
+ proc(E,nv)
+ local G, fe, mn, n, i, j;
+ new(G);
+ fe:=sort(Flatten(E));
+ if (nops(fe)&gt;0) then mn:=fe[nops(fe)];
+ else mn:=-1; fi;
+ n:=max(nv,mn);
+ addvertex({seq(k,k=1..n)},G);
+ for i from 1 to nops(E) do
+ if (nops(E[i])&gt;2) then
+ addedge([E[i][1],E[i][2]],weights=E[i][3],G);
+ else
+ addedge(E[i],G);
+ end;
+ end;
+ G;
+ end:
+</Text-field></Input><Output><Text-field layout="Maple Output" style="2D Output"><Equation>NiM+SSdEaWdpdHNHNiIiJCsi</Equation></Text-field></Output></Group><Group><Input><Text-field layout="Normal" prompt="&gt; " style="Maple Input">#
+# Re-colours a given graph with its PageRank colouring: every arc leaving a node i that has outgoing
+# links gets colour 1/outdegreez(i,G). For nodes that do not have any outgoing link, the patching
+# vector u is used: if i has no outgoing link, then for every j there is an arc from i to j with colour u[j] iff u[j]&gt;0.
+#
+pageRankColour :=
+ proc(G,u)
+ local s, i, j, k, e, w, n;
+ n:=nops(vertices(G));
+ for i from 1 to n do
+ if (outdegreez(i,G)&gt;0) then
+ w:=1/outdegreez(i,G);
+ for j from 1 to n do
+ e := edgez([i,j],G);
+ for k from 1 to nops(e) do
+ delete({e[k]},G);
+ addedge([i,j],weights=w,G);
+ end;
+ end;
+ else
+ for j from 1 to n do
+ if (u[j]&gt;0) then
+ addedge([i,j],weights=u[j],G);
+ end;
+ end;
+ end;
+
+ end;
+ end:</Text-field></Input></Group><Group><Input><Text-field layout="Normal" prompt="&gt; " style="Maple Input">#
+# Given a graph, returns the corresponding matrix of weights: the entry (i,j) is the sum of weights of all
+# arcs from i to j.
+#
+graphMatrix :=
+ proc(G)
+ local A, n, i, j, k, e, s;
+ n:=nops(vertices(G));
+ A:=Matrix(n,n);
+ for i from 1 to n do
+ for j from 1 to n do
+ e := edgez([i,j],G);
+ s := 0;
+ for k from 1 to nops(e) do
+ s := s + eweight(e[k],G);
+ end;
+ A[i,j]:=s;
+ end;
+ end;
+ A;
+ end:</Text-field></Input></Group><Group><Input><Text-field layout="Normal" prompt="&gt; " style="Maple Input">#
+# Returns the identity matrix of size n.
+#
+idMatrix :=
+ proc(n)
+ Matrix(n,n,shape=identity);
+ end:
+#
+# Returns the vector of ones of size n.
+#
+oneVector :=
+ proc(n)
+ local f;
+ f:=(i)-&gt;1;
+ Vector(n,f);
+ end:
+
+
+#
+# Returns the matrix A perturbed using the vector v, i.e., alpha*A+(1-alpha)*oneVector(n).Transpose(v).
+#
+perturbed :=
+ proc(A,v,alpha)
+ local n;
+ n := RowDimension(A);
+ simplify(alpha*A+(1-alpha)*oneVector(n).Transpose(v));
+ end:
+
+#
+# Computes the stationary distribution of the perturbed matrix associated with A, using the vector v.
+#
+stationary :=
+ proc(A,v,alpha)
+ local n;
+ n:=RowDimension(A);
+ simplify((1-alpha)*Transpose(v).(idMatrix(n)-alpha*A)^(-1));
+ end:</Text-field></Input></Group><Group><Input><Text-field layout="Normal" prompt="&gt; " style="Maple Input"/></Input></Group><Group><Input><Text-field layout="Normal" prompt="&gt; " style="Maple Input"/></Input></Group><Group><Input><Text-field layout="Normal" prompt="&gt; " style="Maple Input">
+</Text-field></Input></Group><Group><Input><Text-field layout="Normal" prompt="&gt; " style="Maple Input">dir:="/home/vigna/law/java/data/it/unimi/dsi/law/rank/":
+pref:="test50-.6-7-3-2-10-graph":
+suffixes:=["1stHalf","2ndHalf","alternate","uniform"]:
+read(cat(dir,pref,".mw")):
+</Text-field></Input></Group><Group><Input><Text-field layout="Normal" prompt="&gt; " style="Maple Input">for k from 1 to nops(suffixes) do
+ fn:=cat(dir,pref,"-preferenceVector-",suffixes[k]):
+ read(cat(fn,".mw")):
+ G:=buildGraph(eG,50): N:=nops(vertices(G)): pageRankColour(G,oneVector(N)/N): wpr:=stationary(graphMatrix(G),v,0.85):
+ f:=fopen(cat(fn,"-w.out"),WRITE,TEXT):
+ for i from 1 to 50 do fprintf(f,"%.30g\n",wpr[i]): end: fclose(f):
+ G:=buildGraph(eG,50): N:=nops(vertices(G)): pageRankColour(G,v): spr:=stationary(graphMatrix(G),v,0.85):
+ f:=fopen(cat(fn,"-s.out"),WRITE,TEXT):
+ for i from 1 to 50 do fprintf(f,"%.30g\n",spr[i]): end: fclose(f):
+end:</Text-field></Input></Group><Group><Input><Text-field layout="Normal" prompt="&gt; " style="Maple Input">dir:="/home/vigna/law/java/data/it/unimi/dsi/law/rank/":
+pref:="test10-.7-2-2-2-5-graph":
+suffixes:=["1stHalf","2ndHalf","alternate","uniform"]:
+read(cat(dir,pref,".mw")):
+</Text-field></Input></Group><Group><Input><Text-field layout="Normal" prompt="&gt; " style="Maple Input">for k from 1 to nops(suffixes) do
+ G:=buildGraph(eG,10): N:=nops(vertices(G)):
+ fn:=cat(dir,pref,"-preferenceVector-",suffixes[k]):
+ read(cat(fn,".mw")):
+ pageRankColour(G,oneVector(N)/N): wpra:=stationary(graphMatrix(G),v,a):
+ wpr:=eval(wpra,a=0.85):
+ wprad:=[seq(diff(wpra[i],a),i=1..10)]:
+ wprd:=eval(wprad,a=0.85):
+ wpradd:=[seq(diff(wprad[i],a),i=1..10)]:
+ wprdd:=eval(wpradd,a=0.85):
+ f:=fopen(cat(fn,"-w.out"),WRITE,TEXT):
+ for i from 1 to 10 do fprintf(f,"%.30g\n",wpr[i]): end: fclose(f):
+ f:=fopen(cat(fn,"-wd1.out"),WRITE,TEXT):
+ for i from 1 to 10 do fprintf(f,"%.30g\n",wprd[i]): end: fclose(f):
+ f:=fopen(cat(fn,"-wd2.out"),WRITE,TEXT):
+ for i from 1 to 10 do fprintf(f,"%.30g\n",wprdd[i]): end: fclose(f):
+
+ G:=buildGraph(eG,10): N:=nops(vertices(G)):
+ pageRankColour(G,v): spra:=stationary(graphMatrix(G),v,a):
+ spr:=eval(spra,a=0.85):
+ sprad:=[seq(diff(spra[i],a),i=1..10)]:
+ sprd:=eval(sprad,a=0.85):
+ spradd:=[seq(diff(sprad[i],a),i=1..10)]:
+ sprdd:=eval(spradd,a=0.85):
+ f:=fopen(cat(fn,"-s.out"),WRITE,TEXT):
+ for i from 1 to 10 do fprintf(f,"%.30g\n",spr[i]): end: fclose(f):
+ f:=fopen(cat(fn,"-sd1.out"),WRITE,TEXT):
+ for i from 1 to 10 do fprintf(f,"%.30g\n",sprd[i]): end: fclose(f):
+ f:=fopen(cat(fn,"-sd2.out"),WRITE,TEXT):
+ for i from 1 to 10 do fprintf(f,"%.30g\n",sprdd[i]): end: fclose(f):
+end:</Text-field></Input><Output><Text-field layout="Maple Output" style="2D Output"><Equation>NiMtSSdSVEFCTEVHNiI2JSIpa0RhUS1JJ01BVFJJWEdGJTYjNyw3IyQiJysrPyEiJ0YsRixGLEYsNyMkIiIhRjJGMEYwRjBGMCZJJ1ZlY3Rvckc2JEkqcHJvdGVjdGVkR0Y2SShfc3lzbGliR0YlNiNJJ2NvbHVtbkdGJQ==</Equation></Text-field><Text-field layout="Maple Output" style="2D Output"><Equation>NiMtSSdSVEFCTEVHNiI2JSIpIT0vZSQtSSdNQVRSSVhHRiU2IzcsNywkIicrKz8hIidGLUYtRi1GLSIiIUYwRjBGMEYwNywjIiIiIiIjRjBGMEYwRjJGMEYwRjBGMEYwNyxGMkYyRjBGMEYwRjBGMEYwRjBGMEYsNyxGMkYwRjJGMEYwRjBGMEYwRjBGMEYsRiw3LEYwRjBGMEYwRjBGMEYzRjBGMEYwRiw3LEYwRjBGMEYwRjBGMEYyRjJGMEYwSSdNYXRyaXhHNiRJKnByb3RlY3RlZEdGO0koX3N5c2xpYkdGJQ==</Equation></Text-field><Text-field layout="Maple Output" style="2D Output"><Equation>NiMtSSdSVEFCTEVHNiI2JSIpITMnR08tSSdWRUNUT1JHRiU2IzcsLCQqJiwmSSJhR0YlIiIiJEYwIiIhRjBGMCwmRi9GMCQiIzVGMkYwISIiJCIiI0YyLCQqJEYzRjZGN0Y5LCQqJiwmRi9GMCQhIiNGMkYwRjBGM0Y2JEY2RjJGOSRGMkYyRkFGQUZBRkEmSSdWZWN0b3JHNiRJKnByb3RlY3RlZEdGRUkoX3N5c2xpYkdGJTYjSSRyb3dHRiU=</Equation></Text-field></Output><Output><Text-field layout="Maple Output" style="2D Output"><Equation>NiMtSSdSVEFCTEVHNiI2JSIpX3JaTS1JJ01BVFJJWEdGJTYjNyw3IyQiIiFGLkYsRixGLEYsNyMkIicrKz8hIidGL0YvRi9GLyZJJ1ZlY3Rvckc2JEkqcHJvdGVjdGVkR0Y2SShfc3lzbGliR0YlNiNJJ2NvbHVtbkdGJQ==</Equation></Text-field><Text-field layout="Maple Output" style="2D Output"><Equation>NiMtSSdSVEFCTEVHNiI2JSIpWycpPU8tSSdNQVRSSVhHRiU2IzcsNywiIiFGLUYtRi1GLSQiJysrPyEiJ0YuRi5GLkYuNywjIiIiIiIjRi1GLUYtRjJGLUYtRi1GLUYtNyxGMkYyRi1GLUYtRi1GLUYtRi1GLUYsNyxGMkYtRjJGLUYtRi1GLUYtRi1GLUYsRiw3LEYtRi1GLUYtRi1GLUYzRi1GLUYtRiw3LEYtRi1GLUYtRi1GLUYyRjJGLUYtSSdNYXRyaXhHNiRJKnByb3RlY3RlZEdGO0koX3N5c2xpYkdGJQ==</Equation></Text-field><Text-field layout="Maple Output" style="2D 
Output"><Equation>NiMtSSdSVEFCTEVHNiI2JSIpOzMiZiQtSSdWRUNUT1JHRiU2IzcsJCIiIUYtRixGLEYsRiwqJCwoKiRJImFHRiUiIiMkIl9xKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrK10hJCsiRjEkRjJGLSQiIiZGLSIiIiEiIiomLChGMEY5RjEkIiIkRi1GNkY5RjksKEYwRjlGMSQiIiVGLSQiIzVGLUY5RjoqJiwmRjFGOUY2RjlGOUY/RjpGLiwkKiRGP0Y6RjYmSSdWZWN0b3JHNiRJKnByb3RlY3RlZEdGS0koX3N5c2xpYkdGJTYjSSRyb3dHRiU=</Equation></Text-field></Output><Output><Text-field layout="Maple Output" style="2D Output"><Equation>NiMtSSdSVEFCTEVHNiI2JSIpU0UpXCQtSSdNQVRSSVhHRiU2IzcsNyMkIicrKz8hIic3IyQiIiFGMkYsRjBGLEYwRixGMEYsRjAmSSdWZWN0b3JHNiRJKnByb3RlY3RlZEdGNkkoX3N5c2xpYkdGJTYjSSdjb2x1bW5HRiU=</Equation></Text-field><Text-field layout="Maple Output" style="2D Output"><Equation>NiMtSSdSVEFCTEVHNiI2JSIpKTM2dSQtSSdNQVRSSVhHRiU2IzcsNywkIicrKz8hIiciIiFGLUYwRi1GMEYtRjBGLUYwNywjIiIiIiIjRjBGMEYwRjJGMEYwRjBGMEYwNyxGMkYyRjBGMEYwRjBGMEYwRjBGMEYsNyxGMkYwRjJGMEYwRjBGMEYwRjBGMEYsRiw3LEYwRjBGMEYwRjBGMEYzRjBGMEYwRiw3LEYwRjBGMEYwRjBGMEYyRjJGMEYwSSdNYXRyaXhHNiRJKnByb3RlY3RlZEdGO0koX3N5c2xpYkdGJQ==</Equation></Text-field><Text-field layout="Maple Output" style="2D Output"><Equation>NiMtSSdSVEFCTEVHNiI2JSIpS3peTy1JJ1ZFQ1RPUkdGJTYjNywsJComLCZJImFHRiUiIiIkIiIjIiIhRjBGMCwmRi9GMCQhIzVGM0YwISIiJEY3RjMsJCoqRi5GMEYvRjAsKCokRi9GMkYwRi9GMSQiIiVGM0YwRjdGNEY3JCEiI0YzLCQqKEYuRjBGO0Y3RjRGNyQhIiVGMyRGM0YzLCQqKCwmRjxGMEY9RjBGMEY7RjdGNEY3Rj9GRSomLCZGL0YwRj9GMEYwRjRGN0ZFRklGRSZJJ1ZlY3Rvckc2JEkqcHJvdGVjdGVkR0ZOSShfc3lzbGliR0YlNiNJJHJvd0dGJQ==</Equation></Text-field></Output><Output><Text-field layout="Maple Output" style="2D Output"><Equation>NiMtSSdSVEFCTEVHNiI2JSIpV0JbTy1JJ01BVFJJWEdGJTYjNyw3IyQiJysrNSEiJ0YsRixGLEYsRixGLEYsRixGLCZJJ1ZlY3Rvckc2JEkqcHJvdGVjdGVkR0YzSShfc3lzbGliR0YlNiNJJ2NvbHVtbkdGJQ==</Equation></Text-field><Text-field layout="Maple Output" style="2D 
Output"><Equation>NiMtSSdSVEFCTEVHNiI2JSIpdzdTTS1JJ01BVFJJWEdGJTYjNyw3LCQiJysrNSEiJ0YtRi1GLUYtRi1GLUYtRi1GLTcsIyIiIiIiIyIiIUY0RjRGMUY0RjRGNEY0RjQ3LEYxRjFGNEY0RjRGNEY0RjRGNEY0Riw3LEYxRjRGMUY0RjRGNEY0RjRGNEY0RixGLDcsRjRGNEY0RjRGNEY0RjJGNEY0RjRGLDcsRjRGNEY0RjRGNEY0RjFGMUY0RjRJJ01hdHJpeEc2JEkqcHJvdGVjdGVkR0Y7SShfc3lzbGliR0Yl</Equation></Text-field><Text-field layout="Maple Output" style="2D Output"><Equation></Equation></Text-field></Output></Group><Group><Input><Text-field layout="Normal" prompt="&gt; " style="Maple Input"/></Input></Group><Group><Input><Text-field layout="Normal" prompt="&gt; " style="Maple Input"/></Input></Group><Text-field/><Text-field/><Text-field/><Text-field/><RTable handle="38542564" >TTdSMApJNVJUQUJMRV9TQVZFLzM4NTQyNTY0WColKWFueXRoaW5nRzYiNiJbZ2whIyUhISEiKyIrJCInKys/ISInRidGJ0YnRickIiIhRitGKkYqCkYqRipGJgo=</RTable><RTable handle="35804180" >TTdSMApJNVJUQUJMRV9TQVZFLzM1ODA0MTgwWCwlKWFueXRoaW5nRzYiNiJbZ2whIiUhISEjX3EiKyIrJCInKys/ISInIyIiIiIiI0YqRidGKkYnCkYnIiIhRidGLUYnRi1GKkYnRi1GJ0YnRi1GJ0YtRidGLUYtRidGKkYnRidGLUYnRi1GJ0YtRi1GJ0YtRidGJ0YtRidGLUYnRipGLUYnRi0KRidGJ0YtRidGLUYtRi1GLUYtRi1GLUYtRi1GLUYtRi1GLUYtRi1GLUYtRi1GK0YtRipGLUYtRi1GLUYtRi1GLUYtRi1GKkYtRi1GLUYtRgotRi1GLUYtRi1GLUYtRi1GLUYtRi1GLUYtRi1GLUYtRiYK</RTable><RTable handle="36286080" >TTdSMApJNVJUQUJMRV9TQVZFLzM2Mjg2MDgwWColKWFueXRoaW5nRzYiNiJbZ2whJCUhISEiKyIrLCQqJiwmJSJhRyIiIiRGKyIiIUYrRissJkYqCkYrJCIjNUYtRishIiIkIiIjRi0sJCokRi5GMUYyRjQsJComLCZGKkYrJCEiI0YtRitGK0YuRjEkRjFGLUY0JEYtRi1GPEY8RjxGPEYmCg==</RTable><RTable handle="34477152" >TTdSMApJNVJUQUJMRV9TQVZFLzM0NDc3MTUyWColKWFueXRoaW5nRzYiNiJbZ2whIyUhISEiKyIrJCIiIUYoRidGJ0YnRickIicrKz8hIidGKUYpCkYpRilGJgo=</RTable><RTable handle="36188648" 
>TTdSMApJNVJUQUJMRV9TQVZFLzM2MTg4NjQ4WCwlKWFueXRoaW5nRzYiNiJbZ2whIiUhISEjX3EiKyIrIiIhIyIiIiIiI0YoRidGKEYnRidGJ0YnCkYnRidGJ0YoRidGJ0YnRidGJ0YnRidGJ0YnRidGJ0YoRidGJ0YnRidGJ0YnRidGJ0YnRidGJ0YnRidGJ0YnRidGKEYnRidGJ0YnRidGJ0YKJ0YnJCInKys/ISInRidGJ0YrRidGK0YrRidGK0YnRitGJ0YnRitGJ0YrRitGKUYrRihGK0YnRidGK0YnRitGK0YnRitGKEYrRidGJ0YrRgonRitGK0YnRitGJ0YrRidGJ0YrRidGK0YrRidGK0YnRiYK</RTable><RTable handle="35910816" >TTdSMApJNVJUQUJMRV9TQVZFLzM1OTEwODE2WColKWFueXRoaW5nRzYiNiJbZ2whJCUhISEiKyIrJCIiIUYoRidGJ0YnRicqJCwoKiQlImFHIiIjCiQiX3ErKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrXSEkKyJGLCRGLUYoJCIiJkYoIiIiISIiKiYKLChGK0Y0RiwkIiIkRihGMUY0RjQsKEYrRjRGLCQiIiVGKCQiIzVGKEY0RjUqJiwmRixGNEYxRjRGNEY6RjVGKSwkKiRGOkY1RjFGJgo=</RTable><RTable handle="34982640" >TTdSMApJNVJUQUJMRV9TQVZFLzM0OTgyNjQwWColKWFueXRoaW5nRzYiNiJbZ2whIyUhISEiKyIrJCInKys/ISInJCIiIUYrRidGKkYnRipGJ0YqCkYnRipGJgo=</RTable><RTable handle="37411088" >TTdSMApJNVJUQUJMRV9TQVZFLzM3NDExMDg4WCwlKWFueXRoaW5nRzYiNiJbZ2whIiUhISEjX3EiKyIrJCInKys/ISInIyIiIiIiI0YqRidGKkYnCkYnIiIhRidGLUYtRi1GKkYtRi1GLUYtRi1GLUYtRidGLUYtRidGKkYnRidGLUYnRi1GLUYtRi1GLUYtRi1GLUYtRi1GLUYnRipGLUYnRi0KRidGJ0YtRidGLUYtRi1GLUYtRi1GLUYtRi1GLUYtRidGLUYtRidGLUYnRidGK0YnRipGLUYtRi1GLUYtRi1GLUYtRi1GKkYnRi1GLUYnRgotRidGJ0YtRidGLUYtRi1GLUYtRi1GLUYtRi1GLUYtRiYK</RTable><RTable handle="36517932" >TTdSMApJNVJUQUJMRV9TQVZFLzM2NTE3OTMyWColKWFueXRoaW5nRzYiNiJbZ2whJCUhISEiKyIrLCQqJiwmJSJhRyIiIiQiIiMiIiFGK0YrLCZGCipGKyQhIzVGLkYrISIiJEYyRi4sJCoqRilGK0YqRissKCokRipGLUYrRipGLCQiIiVGLkYrRjJGL0YyJCEiI0YuLCQqKEYpRitGNkYyRi8KRjIkISIlRi4kRi5GLiwkKigsJkY3RitGOEYrRitGNkYyRi9GMkY6RkAqJiwmRipGK0Y6RitGK0YvRjJGQEZERkBGJgo=</RTable><RTable handle="36482344" >TTdSMApJNVJUQUJMRV9TQVZFLzM2NDgyMzQ0WColKWFueXRoaW5nRzYiNiJbZ2whIyUhISEiKyIrJCInKys1ISInRidGJ0YnRidGJ0YnRidGJ0YnCkYmCg==</RTable><RTable handle="34401276" 
>TTdSMApJNVJUQUJMRV9TQVZFLzM0NDAxMjc2WCwlKWFueXRoaW5nRzYiNiJbZ2whIiUhISEjX3EiKyIrJCInKys1ISInIyIiIiIiI0YqRidGKkYnCkYnIiIhRidGLUYnRi1GKkYnRi1GJ0YnRi1GJ0YtRidGLUYtRidGKkYnRidGLUYnRi1GJ0YtRi1GJ0YtRidGJ0YtRidGLUYnRipGLUYnRi0KRidGJ0YtRidGLUYnRi1GLUYnRi1GJ0YnRi1GJ0YtRidGLUYtRidGLUYnRidGK0YnRipGJ0YtRi1GJ0YtRidGJ0YtRidGKkYnRi1GLUYnRgotRidGJ0YtRidGLUYnRi1GLUYnRi1GJ0YnRi1GJ0YtRiYK</RTable><RTable handle="67252448" ></RTable></Worksheet>
\ No newline at end of file
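The `perturbed` and `stationary` procedures in the worksheet above can be cross-checked outside Maple. What follows is a minimal Python sketch, not part of the LAW distribution. It computes (1-alpha)*v^T*(I-alpha*A)^(-1) with exact `fractions.Fraction` arithmetic instead of `Digits:=100` floats, and it solves a linear system rather than inverting the matrix. It then checks that the result is a fixed point of alpha*A+(1-alpha)*1*v^T. The helper names and the 3-node graph are hypothetical. A is assumed to be already row-stochastic, as `pageRankColour` makes it.

```python
from fractions import Fraction


def solve_left(M, b):
    """Solve x^T M = b^T (i.e. M^T x = b) by Gauss-Jordan elimination over Fractions."""
    n = len(M)
    # Row i of the augmented matrix is row i of M^T followed by b[i].
    aug = [[Fraction(M[j][i]) for j in range(n)] + [Fraction(b[i])] for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [t / p for t in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * c for a, c in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def perturbed(A, v, alpha):
    """alpha*A + (1-alpha) * 1 v^T: every row is mixed with the preference vector v."""
    n = len(A)
    return [[alpha * A[i][j] + (1 - alpha) * v[j] for j in range(n)] for i in range(n)]


def stationary(A, v, alpha):
    """(1-alpha) v^T (I - alpha*A)^(-1): solve y^T (I - alpha*A) = v^T, then scale y."""
    n = len(A)
    M = [[(1 if i == j else 0) - alpha * A[i][j] for j in range(n)] for i in range(n)]
    return [(1 - alpha) * y for y in solve_left(M, v)]


# Hypothetical 3-node graph, already row-normalised (no dangling nodes).
half = Fraction(1, 2)
A = [[0, half, half], [0, 0, 1], [1, 0, 0]]
v = [1, 0, 0]
x = stationary(A, v, half)
print(x)  # [Fraction(8, 13), Fraction(2, 13), Fraction(3, 13)]

# x should be a fixed point of the perturbed matrix: x^T P = x^T.
P = perturbed(A, v, half)
xP = [sum(x[i] * P[i][j] for i in range(3)) for j in range(3)]
assert xP == x and sum(x) == 1
```

Solving the system instead of forming the inverse is a choice made for this sketch. The worksheet relies on Maple's `^(-1)` followed by `simplify`.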
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/WorksheetForPRTest_MAS.bak b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/WorksheetForPRTest_MAS.bak
new file mode 100644
index 0000000..c5c1c92
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/WorksheetForPRTest_MAS.bak
@@ -0,0 +1,185 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<Worksheet><Version major="6" minor="1"/><View-Properties><Zoom percentage="100"/></View-Properties><Styles><Layout alignment="left" bullet="none" name="Warning"/><Layout alignment="left" bullet="none" firstindent="0.0" leftmargin="0.0" linebreak="space" linespacing="0.0" name="Normal" rightmargin="0.0" spaceabove="0.0" spacebelow="0.0"/><Layout alignment="centred" bullet="none" linespacing="0.5" name="Maple Output"/><Font background="[0,0,0]" bold="true" executable="true" family="Monospaced" foreground="[255,0,0]" name="Maple Input" opaque="false" size="12"/><Font background="[0,0,0]" family="Monospaced" foreground="[0,0,255]" name="Warning" opaque="false" readonly="true" size="12"/><Font background="[0,0,0]" family="Times New Roman" foreground="[0,0,255]" name="2D Output" opaque="false" readonly="true" size="12"/></Styles><Group><Input><Text-field layout="Normal" prompt="&gt; " style="Maple Input">with(networks): with(ListTools): with(LinearAlgebra):</Text-field></Input><Output><Text-field layout="Warning" style="Warning">Warning, the assigned name Group now has a global binding</Text-field></Output><Output><Text-field layout="Warning" style="Warning">Warning, the names DotProduct and Transpose have been rebound</Text-field></Output></Group><Group><Input><Text-field layout="Normal" prompt="&gt; " style="Maple Input">Digits:=100;
+
+# This function replaces edges (networks package)
+# to work around a bug.
+#
+# edgez([x,y],G) returns the set of edges from x to y in G
+#
+edgez :=
+ proc(x,G)
+ if (x[1]&lt;&gt;x[2]) then
+ edges(x,G);
+ else
+ edges({x[1],x[2]},G,'all');
+ fi;
+ end:
+
+# This function replaces outdegree (networks package)
+# to work around a bug.
+#
+# outdegreez(x,G) returns the outdegree of x in G, loops at x included.
+#
+outdegreez := (x,G)-&gt;outdegree(x,G)+nops(edgez([x,x],G)):
+
+#
+# A graph is specified as a list of pairs or triples of values: the first two values
+# represent (the two endpoints of) an arc and the third, if present, represents its colour.
+# This function returns the graph (as a network).
+# The second argument, nv, is the number of nodes: it is present because we want to allow
+# graphs with isolated nodes; if nv is smaller than the largest node number present in the
+# edge list, it will be ignored.
+#
+buildGraph :=
+ proc(E,nv)
+ local G, fe, mn, n, i, j;
+ new(G);
+ fe:=sort(Flatten(E));
+ if (nops(fe)&gt;0) then mn:=fe[nops(fe)];
+ else mn:=-1; fi;
+ n:=max(nv,mn);
+ addvertex({seq(k,k=1..n)},G);
+ for i from 1 to nops(E) do
+ if (nops(E[i])&gt;2) then
+ addedge([E[i][1],E[i][2]],weights=E[i][3],G);
+ else
+ addedge(E[i],G);
+ end;
+ end;
+ G;
+ end:
+</Text-field></Input><Output><Text-field layout="Maple Output" style="2D Output"><Equation>NiM+SSdEaWdpdHNHNiIiJCsi</Equation></Text-field></Output></Group><Group><Input><Text-field layout="Normal" prompt="&gt; " style="Maple Input">#
+# Re-colours a given graph with its PageRank colouring: every arc leaving a node i that has outgoing
+# links gets colour 1/outdegreez(i,G). For nodes that do not have any outgoing link, the patching
+# vector u is used: if i has no outgoing link, then for every j there is an arc from i to j with colour u[j] iff u[j]&gt;0.
+#
+pageRankColour :=
+ proc(G,u)
+ local s, i, j, k, e, w, n;
+ n:=nops(vertices(G));
+ for i from 1 to n do
+ if (outdegreez(i,G)&gt;0) then
+ w:=1/outdegreez(i,G);
+ for j from 1 to n do
+ e := edgez([i,j],G);
+ for k from 1 to nops(e) do
+ delete({e[k]},G);
+ addedge([i,j],weights=w,G);
+ end;
+ end;
+ else
+ for j from 1 to n do
+ if (u[j]&gt;0) then
+ addedge([i,j],weights=u[j],G);
+ end;
+ end;
+ end;
+
+ end;
+ end:</Text-field></Input></Group><Group><Input><Text-field layout="Normal" prompt="&gt; " style="Maple Input">#
+# Given a graph, returns the corresponding matrix of weights: the entry (i,j) is the sum of weights of all
+# arcs from i to j.
+#
+graphMatrix :=
+ proc(G)
+ local A, n, i, j, k, e, s;
+ n:=nops(vertices(G));
+ A:=Matrix(n,n);
+ for i from 1 to n do
+ for j from 1 to n do
+ e := edgez([i,j],G);
+ s := 0;
+ for k from 1 to nops(e) do
+ s := s + eweight(e[k],G);
+ end;
+ A[i,j]:=s;
+ end;
+ end;
+ A;
+ end:</Text-field></Input></Group><Group><Input><Text-field layout="Normal" prompt="&gt; " style="Maple Input">#
+# Returns the identity matrix of size n.
+#
+idMatrix :=
+ proc(n)
+ Matrix(n,n,shape=identity);
+ end:
+#
+# Returns the vector of ones of size n.
+#
+oneVector :=
+ proc(n)
+ local f;
+ f:=(i)-&gt;1;
+ Vector(n,f);
+ end:
+
+
+#
+# Returns the matrix A perturbed using the vector v, i.e., alpha*A+(1-alpha)*oneVector(n).Transpose(v).
+#
+perturbed :=
+ proc(A,v,alpha)
+ local n;
+ n := RowDimension(A);
+ simplify(alpha*A+(1-alpha)*oneVector(n).Transpose(v));
+ end:
+
+#
+# Computes the stationary distribution of the perturbed matrix associated with A, using the vector v.
+#
+stationary :=
+ proc(A,v,alpha)
+ local n;
+ n:=RowDimension(A);
+ simplify((1-alpha)*Transpose(v).(idMatrix(n)-alpha*A)^(-1));
+ end:</Text-field></Input></Group><Group><Input><Text-field layout="Normal" prompt="&gt; " style="Maple Input"/></Input></Group><Group><Input><Text-field layout="Normal" prompt="&gt; " style="Maple Input"/></Input></Group><Group><Input><Text-field layout="Normal" prompt="&gt; " style="Maple Input">
+</Text-field></Input></Group><Group><Input><Text-field layout="Normal" prompt="&gt; " style="Maple Input">dir:="/home/vigna/law/java/data/it/unimi/dsi/law/rank/":
+pref:="test50-.6-7-3-2-10-graph":
+suffixes:=["1stHalf","2ndHalf","alternate","uniform"]:
+read(cat(dir,pref,".mw")):
+</Text-field></Input></Group><Group><Input><Text-field layout="Normal" prompt="&gt; " style="Maple Input">for k from 1 to nops(suffixes) do
+ fn:=cat(dir,pref,"-preferenceVector-",suffixes[k]):
+ read(cat(fn,".mw")):
+ G:=buildGraph(eG,50): N:=nops(vertices(G)): pageRankColour(G,oneVector(N)/N): wpr:=stationary(graphMatrix(G),v,0.85):
+ f:=fopen(cat(fn,"-w.out"),WRITE,TEXT):
+ for i from 1 to 50 do fprintf(f,"%.30g\n",wpr[i]): end: fclose(f):
+ G:=buildGraph(eG,50): N:=nops(vertices(G)): pageRankColour(G,v): spr:=stationary(graphMatrix(G),v,0.85):
+ f:=fopen(cat(fn,"-s.out"),WRITE,TEXT):
+ for i from 1 to 50 do fprintf(f,"%.30g\n",spr[i]): end: fclose(f):
+end:</Text-field></Input></Group><Group><Input><Text-field layout="Normal" prompt="&gt; " style="Maple Input">dir:="/home/vigna/law/java/data/it/unimi/dsi/law/rank/":
+pref:="test10-.7-2-2-2-5-graph":
+suffixes:=["1stHalf","2ndHalf","alternate","uniform"]:
+read(cat(dir,pref,".mw")):
+</Text-field></Input></Group><Group><Input><Text-field layout="Normal" prompt="&gt; " style="Maple Input">for k from 1 to nops(suffixes) do
+ G:=buildGraph(eG,10): N:=nops(vertices(G)):
+ fn:=cat(dir,pref,"-preferenceVector-",suffixes[k]):
+ read(cat(fn,".mw")):
+ pageRankColour(G,oneVector(N)/N): wpra:=stationary(graphMatrix(G),v,a):
+ wpr:=eval(wpra,a=0.85):
+ wprad:=[seq(diff(wpra[i],a),i=1..10)]:
+ wprd:=eval(wprad,a=0.85):
+ wpradd:=[seq(diff(wprad[i],a),i=1..10)]:
+ wprdd:=eval(wpradd,a=0.85):
+ f:=fopen(cat(fn,"-w.out"),WRITE,TEXT):
+ for i from 1 to 10 do fprintf(f,"%.30g\n",wpr[i]): end: fclose(f):
+ f:=fopen(cat(fn,"-wd1.out"),WRITE,TEXT):
+ for i from 1 to 10 do fprintf(f,"%.30g\n",wprd[i]): end: fclose(f):
+ f:=fopen(cat(fn,"-wd2.out"),WRITE,TEXT):
+ for i from 1 to 10 do fprintf(f,"%.30g\n",wprdd[i]): end: fclose(f):
+
+ pageRankColour(G,v): spra:=stationary(graphMatrix(G),v,a):
+ print(v);
+ print(graphMatrix(G));
+ print(spra);
+ spr:=eval(spra,a=0.85):
+ sprad:=[seq(diff(spra[i],a),i=1..10)]:
+ sprd:=eval(sprad,a=0.85):
+ spradd:=[seq(diff(sprad[i],a),i=1..10)]:
+ sprdd:=eval(spradd,a=0.85):
+ f:=fopen(cat(fn,"-s.out"),WRITE,TEXT):
+ for i from 1 to 10 do fprintf(f,"%.30g\n",spr[i]): end: fclose(f):
+ f:=fopen(cat(fn,"-sd1.out"),WRITE,TEXT):
+ for i from 1 to 10 do fprintf(f,"%.30g\n",sprd[i]): end: fclose(f):
+ f:=fopen(cat(fn,"-sd2.out"),WRITE,TEXT):
+ for i from 1 to 10 do fprintf(f,"%.30g\n",sprdd[i]): end: fclose(f):
+end:</Text-field></Input><Output><Text-field layout="Maple Output" style="2D Output"><Equation>NiMtSSdSVEFCTEVHNiI2JSIpTyNRZSQtSSdNQVRSSVhHRiU2IzcsNyMkIicrKz8hIidGLEYsRixGLDcjJCIiIUYyRjBGMEYwRjAmSSdWZWN0b3JHNiRJKnByb3RlY3RlZEdGNkkoX3N5c2xpYkdGJTYjSSdjb2x1bW5HRiU=</Equation></Text-field><Text-field layout="Maple Output" style="2D Output"><Equation>NiMtSSdSVEFCTEVHNiI2JSIpY0tQUC1JJ01BVFJJWEdGJTYjNyw3LCMiIiIiIzVGLUYtRi1GLUYtRi1GLUYtRi03LCNGLiIiIyIiIUYzRjNGMUYzRjNGM0YzRjM3LEYxRjFGM0YzRjNGM0YzRjNGM0YzRiw3LEYxRjNGMUYzRjNGM0YzRjNGM0YzRixGLDcsRjNGM0YzRjNGM0YzRi5GM0YzRjNGLDcsRjNGM0YzRjNGM0YzRjFGMUYzRjNJJ01hdHJpeEc2JEkqcHJvdGVjdGVkR0Y6SShfc3lzbGliR0Yl</Equation></Text-field><Text-field layout="Maple Output" style="2D Output"><Equation>NiMtSSdSVEFCTEVHNiI2JSIpK1NlUS1JJ1ZFQ1RPUkdGJTYjNywsJComLCwqJEkiYUdGJSIiJSIiIiokRjAiIiQkRjEiIiEqJEYwIiIjJCIiKkY2RjAkISM5RjYkISM/RjZGMkYyLChGM0YyRjckRjhGNiQhI1NGNkYyISIiJCJfcSsrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKytTISQrIiwkKiYsKkYzRjJGNyRGNEY2RjAkIiInRjZGPUYyRjJGP0ZDRkRGRywkKiYsKkYwJCEjS0Y2Ri9GMkYzRjIkIiNTRjZGMkYyRj9GQyQhX3ErKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrP0ZGRkcsJCooLCZGMEYyRjVGMkYyRjBGMkY/RkMkIV9xKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrK1NGRiwkKigsKkYzRjJGNyQiIihGNkYwJCIjOUY2JCIiKUY2RjJGMkYwRjJGP0ZDRlQsJCooLChGN0YyRlxvRjJGMEZLRjJGMEYyRj9GQ0ZURlZGViZJJ1ZlY3Rvckc2JEkqcHJvdGVjdGVkR0Zkb0koX3N5c2xpYkdGJTYjSSRyb3dHRiU=</Equation></Text-field></Output><Output><Text-field layout="Maple Output" style="2D Output"><Equation>NiMtSSdSVEFCTEVHNiI2JSIpKSl5cU8tSSdNQVRSSVhHRiU2IzcsNyMkIiIhRi5GLEYsRixGLDcjJCInKys/ISInRi9GL0YvRi8mSSdWZWN0b3JHNiRJKnByb3RlY3RlZEdGNkkoX3N5c2xpYkdGJTYjSSdjb2x1bW5HRiU=</Equation></Text-field><Text-field layout="Maple Output" style="2D 
Output"><Equation>NiMtSSdSVEFCTEVHNiI2JSIpWypwZyQtSSdNQVRSSVhHRiU2IzcsNywjIiIiIiM1Ri1GLUYtRi1GLUYtRi1GLUYtNywjRi4iIiMiIiFGM0YzRjFGM0YzRjNGM0YzNyxGMUYxRjNGM0YzRjNGM0YzRjNGM0YsNyxGMUYzRjFGM0YzRjNGM0YzRjNGM0YsRiw3LEYzRjNGM0YzRjNGM0YuRjNGM0YzRiw3LEYzRjNGM0YzRjNGM0YxRjFGM0YzSSdNYXRyaXhHNiRJKnByb3RlY3RlZEdGOkkoX3N5c2xpYkdGJQ==</Equation></Text-field><Text-field layout="Maple Output" style="2D Output"><Equation>NiMtSSdSVEFCTEVHNiI2JSIpTyh5eCQtSSdWRUNUT1JHRiU2IzcsLCQqKiwoJCIiJyIiISIiIkkiYUdGJSQiIiRGMSokRjMiIiNGMkYyRjNGMiwmRjNGMiRGMkYxRjJGMiwoKiRGM0Y1RjJGNiRGN0YxJCEjU0YxRjIhIiIkIV9xKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrK1MhJCsiLCQqKEYuRjJGM0YyRjpGP0ZARkMsJCoqRi5GMiwmRjNGMiQhIiNGMUYyRjJGM0YyRjpGPyQiX3ErKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrP0ZCRkMsJComLChGNkYyRjMkIiM5RjEkISM/RjFGMkYyRjpGPyQiX3ErKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrU0ZCLCQqJiwsKiRGMyIiJUYyRjskIiM8RjFGNiQiI0NGMUYzJCEjS0YxRj1GMkYyRjpGP0ZKLCQqJiwqRjtGMkYzJCIiKUYxRjYkIiM7RjFGPUYyRjJGOkY/RkpGTEZMJkknVmVjdG9yRzYkSSpwcm90ZWN0ZWRHRmRvSShfc3lzbGliR0YlNiNJJHJvd0dGJQ==</Equation></Text-field></Output><Output><Text-field layout="Maple Output" style="2D Output"><Equation>NiMtSSdSVEFCTEVHNiI2JSIpV3ReTi1JJ01BVFJJWEdGJTYjNyw3IyQiJysrPyEiJzcjJCIiIUYyRixGMEYsRjBGLEYwRixGMCZJJ1ZlY3Rvckc2JEkqcHJvdGVjdGVkR0Y2SShfc3lzbGliR0YlNiNJJ2NvbHVtbkdGJQ==</Equation></Text-field><Text-field layout="Maple Output" style="2D Output"><Equation>NiMtSSdSVEFCTEVHNiI2JSIpKUdAcyQtSSdNQVRSSVhHRiU2IzcsNywjIiIiIiM1Ri1GLUYtRi1GLUYtRi1GLUYtNywjRi4iIiMiIiFGM0YzRjFGM0YzRjNGM0YzNyxGMUYxRjNGM0YzRjNGM0YzRjNGM0YsNyxGMUYzRjFGM0YzRjNGM0YzRjNGM0YsRiw3LEYzRjNGM0YzRjNGM0YuRjNGM0YzRiw3LEYzRjNGM0YzRjNGM0YxRjFGM0YzSSdNYXRyaXhHNiRJKnByb3RlY3RlZEdGOkkoX3N5c2xpYkdGJQ==</Equation></Text-field><Text-field layout="Maple Output" style="2D 
Output"><Equation>NiMtSSdSVEFCTEVHNiI2JSIpR01iUC1JJ1ZFQ1RPUkdGJTYjNywsJComLCwqJEkiYUdGJSIiJSIiIiokRjAiIiQkIiImIiIhKiRGMCIiIyQiIidGN0YwJCEjN0Y3JCEjU0Y3RjJGMiwoRjNGMkY4JEY5RjdGPkYyISIiJCJfcSsrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKys/ISQrIiwkKiosLEY4JCIiKUY3RjAkISM/RjckISNrRjdGMkYzRjVGL0YyRjJGMEYyRkBGQiwoRjhGMkYwRkEkRjFGN0YyRkIkIl9xKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrK1NGRSwkKigsLEYzRlBGOCQiIzdGN0YwRjxGPkYyRi9GMkYyRkBGQkZPRkIkIl9xKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKyEpRkUsJCooLCZGMEYyJCEiJ0Y3RjJGMkYwRjJGQEZCRlEsJCooLC5GM0Y6RjgkISM7RjdGMCQiIztGNyokRjBGNkYyRi8kRjRGNyQhIyEpRjdGMkYyRkBGQkZPRkJGUUZaLCQqJiwqRjNGQUY4JCIiKEY3RjAkISM5RjckIiM/RjdGMkYyRkBGQiQhX3ErKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrU0ZFLCQqKCwoRjhGMkY8RjJGMCQhIiVGN0YyRjBGMkZARkJGQywkKiYsLEY4RmJwRjAkISNHRjdGL0YyRjNGMiQiI1NGN0YyRjJGQEZCJCFfcSsrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKys/RkVGWiZJJ1ZlY3Rvckc2JEkqcHJvdGVjdGVkR0ZgcUkoX3N5c2xpYkdGJTYjSSRyb3dHRiU=</Equation></Text-field></Output><Output><Text-field layout="Maple Output" style="2D Output"><Equation>NiMtSSdSVEFCTEVHNiI2JSIpY0cmeiQtSSdNQVRSSVhHRiU2IzcsNyMkIicrKzUhIidGLEYsRixGLEYsRixGLEYsRiwmSSdWZWN0b3JHNiRJKnByb3RlY3RlZEdGM0koX3N5c2xpYkdGJTYjSSdjb2x1bW5HRiU=</Equation></Text-field><Text-field layout="Maple Output" style="2D Output"><Equation>NiMtSSdSVEFCTEVHNiI2JSIpIz4jNE0tSSdNQVRSSVhHRiU2IzcsNywjIiIiIiM1Ri1GLUYtRi1GLUYtRi1GLUYtNywjRi4iIiMiIiFGM0YzRjFGM0YzRjNGM0YzNyxGMUYxRjNGM0YzRjNGM0YzRjNGM0YsNyxGMUYzRjFGM0YzRjNGM0YzRjNGM0YsRiw3LEYzRjNGM0YzRjNGM0YuRjNGM0YzRiw3LEYzRjNGM0YzRjNGM0YxRjFGM0YzSSdNYXRyaXhHNiRJKnByb3RlY3RlZEdGOkkoX3N5c2xpYkdGJQ==</Equation></Text-field><Text-field layout="Maple Output" style="2D 
Output"><Equation>NiMtSSdSVEFCTEVHNiI2JSIpbzpJTi1JJ1ZFQ1RPUkdGJTYjNywsJComLCZJImFHRiUiIiIkRjAiIiFGMEYwLCgqJEYvIiIkRjAqJEYvIiIjJEY3RjIkISNTRjJGMCEiIiQhIiVGMiwkKiRGM0Y7RjxGPiwkKiYsJkYvRjAkISIjRjJGMEYwRjNGO0Y4Rj5GQComLCpGNEYwRjZGMEYvRjxGPEYwRjBGM0Y7KiYsJkY2RjBGPEYwRjBGM0Y7RkBGQCZJJ1ZlY3Rvckc2JEkqcHJvdGVjdGVkR0ZMSShfc3lzbGliR0YlNiNJJHJvd0dGJQ==</Equation></Text-field></Output></Group><Group><Input><Text-field layout="Normal" prompt="&gt; " style="Maple Input"/></Input></Group><Group><Input><Text-field layout="Normal" prompt="&gt; " style="Maple Input"/></Input></Group><Text-field/><Text-field/><Text-field/><Text-field/></Worksheet>
\ No newline at end of file
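The second loop of this worksheet produces the -wd1/-wd2 and -sd1/-sd2 values by differentiating the symbolic stationary vector with `diff` and evaluating at a=0.85. The same derivatives can be obtained without symbolic algebra, which is a different technique from the worksheet's. Differentiating x^T*(I-a*A) = (1-a)*v^T once gives x'^T*(I-a*A) = x^T*A - v^T. Differentiating it twice gives x''^T*(I-a*A) = 2*x'^T*A. Each derivative therefore costs one more solve with the same matrix. The Python sketch below assumes a hypothetical row-stochastic 3-node graph and hypothetical helper names. It is not the LAW implementation.

```python
from fractions import Fraction


def solve_left(M, b):
    """Solve x^T M = b^T (i.e. M^T x = b) by Gauss-Jordan elimination over Fractions."""
    n = len(M)
    aug = [[Fraction(M[j][i]) for j in range(n)] + [Fraction(b[i])] for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [t / p for t in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * c for a, c in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def vec_mat(x, A):
    """Row vector times matrix: (x^T A)_j = sum_i x_i A[i][j]."""
    return [sum(x[i] * A[i][j] for i in range(len(x))) for j in range(len(A[0]))]


def stationary_with_derivatives(A, v, alpha):
    """Return x(alpha), x'(alpha), x''(alpha) for x^T (I - alpha*A) = (1-alpha) v^T."""
    n = len(A)
    M = [[(1 if i == j else 0) - alpha * A[i][j] for j in range(n)] for i in range(n)]
    x = [(1 - alpha) * y for y in solve_left(M, v)]
    xA = vec_mat(x, A)
    # First derivative: x'^T M = x^T A - v^T.
    d1 = solve_left(M, [xA[j] - v[j] for j in range(n)])
    # Second derivative: x''^T M = 2 x'^T A.
    d2 = solve_left(M, [2 * t for t in vec_mat(d1, A)])
    return x, d1, d2


half = Fraction(1, 2)
A = [[0, half, half], [0, 0, 1], [1, 0, 0]]
v = [1, 0, 0]
x, d1, d2 = stationary_with_derivatives(A, v, half)
print(d1)  # [Fraction(-96, 169), Fraction(28, 169), Fraction(68, 169)]
assert sum(d1) == 0 and sum(d2) == 0
```

Because I-a*A is the same matrix in all three solves, a larger implementation would factor it once and reuse the factorization.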
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-1stHalf-s.out b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-1stHalf-s.out
new file mode 100644
index 0000000..6ab1c97
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-1stHalf-s.out
@@ -0,0 +1,10 @@
+0.341013824884792626728110599078
+0.18433179723502304147465437788
+0.18433179723502304147465437788
+0.105990783410138248847926267281
+0.18433179723502304147465437788
+0
+0
+0
+0
+0
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-1stHalf-sd1.out b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-1stHalf-sd1.out
new file mode 100644
index 0000000..cce4375
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-1stHalf-sd1.out
@@ -0,0 +1,10 @@
+0.152901951623521416891418377965
+-0.0169891057359468240990464864406
+-0.0169891057359468240990464864406
+-0.101934634415680944594278918643
+-0.0169891057359468240990464864406
+0
+0
+0
+0
+0
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-1stHalf-sd2.out b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-1stHalf-sd2.out
new file mode 100644
index 0000000..1c31998
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-1stHalf-sd2.out
@@ -0,0 +1,10 @@
+-0.0281846915435062519615517747401
+0.00313163239372291688461686386001
+0.00313163239372291688461686386001
+0.0187897943623375013077011831601
+0.00313163239372291688461686386001
+0
+0
+0
+0
+0
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-1stHalf-w.out b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-1stHalf-w.out
new file mode 100644
index 0000000..2cc2230
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-1stHalf-w.out
@@ -0,0 +1,10 @@
+0.236357160977441875022650373772
+0.127760627555373986498729931769
+0.127760627555373986498729931769
+0.0734623608443400422367697107671
+0.127760627555373986498729931769
+0.0434623608443400422367697107671
+0.11457764877589143634668415001
+0.0619338642031845601873968378431
+0.0434623608443400422367697107671
+0.0434623608443400422367697107671
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-1stHalf-wd1.out b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-1stHalf-wd1.out
new file mode 100644
index 0000000..f308429
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-1stHalf-wd1.out
@@ -0,0 +1,10 @@
+-0.0963257043032901154971094905962
+-0.121127746950629244322075363441
+-0.121127746950629244322075363441
+-0.133528768274298808734558299863
+-0.121127746950629244322075363441
+0.0664712317257011912654417001373
+0.27737133262107886472992950229
+0.116452685631294218671639278079
+0.0664712317257011912654417001373
+0.0664712317257011912654417001373
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-1stHalf-wd2.out b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-1stHalf-wd2.out
new file mode 100644
index 0000000..e79d94c
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-1stHalf-wd2.out
@@ -0,0 +1,10 @@
+-0.467826231011341050340936938762
+-0.121930128167612195511776330747
+-0.121930128167612195511776330747
+0.0510179232542522319028039732613
+-0.121930128167612195511776330747
+0.0510179232542522319028039732613
+0.490373150134158087538112675922
+0.139171772363010621726937362035
+0.0510179232542522319028039732613
+0.0510179232542522319028039732613
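The "1stHalf" fixtures above can be sanity-checked without Maple.

- Each stationary vector should sum to 1. This covers -s, whose dangling nodes the worksheet patches with v, and -w, whose dangling nodes it patches uniformly.
- Because that sum holds for every alpha, the first and second derivatives with respect to alpha (-sd1, -sd2, -wd1, -wd2) should sum to 0.

This Python sketch is not part of the LAW test suite. It hard-codes the values listed above and sums them with `decimal` at 50 digits. The 1e-25 tolerance is an assumption, chosen well above the roughly 30-digit rounding of the files.

```python
from decimal import Decimal, getcontext

getcontext().prec = 50

# Values copied from the 1stHalf fixture files above (10 nodes each).
FIXTURES = {
    "s": ["0.341013824884792626728110599078", "0.18433179723502304147465437788",
          "0.18433179723502304147465437788", "0.105990783410138248847926267281",
          "0.18433179723502304147465437788", "0", "0", "0", "0", "0"],
    "sd1": ["0.152901951623521416891418377965", "-0.0169891057359468240990464864406",
            "-0.0169891057359468240990464864406", "-0.101934634415680944594278918643",
            "-0.0169891057359468240990464864406", "0", "0", "0", "0", "0"],
    "sd2": ["-0.0281846915435062519615517747401", "0.00313163239372291688461686386001",
            "0.00313163239372291688461686386001", "0.0187897943623375013077011831601",
            "0.00313163239372291688461686386001", "0", "0", "0", "0", "0"],
    "w": ["0.236357160977441875022650373772", "0.127760627555373986498729931769",
          "0.127760627555373986498729931769", "0.0734623608443400422367697107671",
          "0.127760627555373986498729931769", "0.0434623608443400422367697107671",
          "0.11457764877589143634668415001", "0.0619338642031845601873968378431",
          "0.0434623608443400422367697107671", "0.0434623608443400422367697107671"],
    "wd1": ["-0.0963257043032901154971094905962", "-0.121127746950629244322075363441",
            "-0.121127746950629244322075363441", "-0.133528768274298808734558299863",
            "-0.121127746950629244322075363441", "0.0664712317257011912654417001373",
            "0.27737133262107886472992950229", "0.116452685631294218671639278079",
            "0.0664712317257011912654417001373", "0.0664712317257011912654417001373"],
    "wd2": ["-0.467826231011341050340936938762", "-0.121930128167612195511776330747",
            "-0.121930128167612195511776330747", "0.0510179232542522319028039732613",
            "-0.121930128167612195511776330747", "0.0510179232542522319028039732613",
            "0.490373150134158087538112675922", "0.139171772363010621726937362035",
            "0.0510179232542522319028039732613", "0.0510179232542522319028039732613"],
}

TOL = Decimal("1e-25")


def fixture_sum(name):
    """Exact decimal sum of one fixture file."""
    return sum(Decimal(s) for s in FIXTURES[name])


# Probability vectors sum to 1; their alpha-derivatives sum to 0.
for name, expected in [("s", 1), ("w", 1), ("sd1", 0), ("sd2", 0), ("wd1", 0), ("wd2", 0)]:
    residual = abs(fixture_sum(name) - expected)
    assert residual < TOL, (name, residual)
    print(name, "ok")
```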
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-1stHalf.bin b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-1stHalf.bin
new file mode 100644
index 0000000..9b8d546
Binary files /dev/null and b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-1stHalf.bin differ
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-1stHalf.mw b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-1stHalf.mw
new file mode 100644
index 0000000..19d0d18
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-1stHalf.mw
@@ -0,0 +1 @@
+v := Vector( [ 0.200000, 0.200000, 0.200000, 0.200000, 0.200000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000 ] );
\ No newline at end of file
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-2ndHalf-s.out b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-2ndHalf-s.out
new file mode 100644
index 0000000..1915bf5
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-2ndHalf-s.out
@@ -0,0 +1,10 @@
+0
+0
+0
+0
+0
+0.141617985484156487873959992919
+0.373340414232607541157727031333
+0.20180562931492299522039298991
+0.141617985484156487873959992919
+0.141617985484156487873959992919
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-2ndHalf-sd1.out b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-2ndHalf-sd1.out
new file mode 100644
index 0000000..3c7d01b
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-2ndHalf-sd1.out
@@ -0,0 +1,10 @@
+0
+0
+0
+0
+0
+-0.0571586133658836594711681330953
+0.182117871401956949222938992487
+-0.0106420313043059708094345932013
+-0.0571586133658836594711681330953
+-0.0571586133658836594711681330953
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-2ndHalf-sd2.out b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-2ndHalf-sd2.out
new file mode 100644
index 0000000..a75f4bf
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-2ndHalf-sd2.out
@@ -0,0 +1,10 @@
+0
+0
+0
+0
+0
+0.0260840659516913253477356510142
+-0.0582633784703504551925621226427
+-0.0199888193847235208506448304
+0.0260840659516913253477356510142
+0.0260840659516913253477356510142
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-2ndHalf-w.out b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-2ndHalf-w.out
new file mode 100644
index 0000000..f945be9
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-2ndHalf-w.out
@@ -0,0 +1,10 @@
+0.153723457880188582893778807157
+0.0830937610163181529155561119769
+0.0830937610163181529155561119769
+0.0477789125843829379264447643867
+0.0830937610163181529155561119769
+0.0777789125843829379264447643867
+0.205044658300579520108590010114
+0.110834950432745686545183789251
+0.0777789125843829379264447643867
+0.0777789125843829379264447643867
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-2ndHalf-wd1.out b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-2ndHalf-wd1.out
new file mode 100644
index 0000000..2efed17
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-2ndHalf-wd1.out
@@ -0,0 +1,10 @@
+0.364421098663326819301376510395
+0.152068831160545225073416431578
+0.152068831160545225073416431578
+0.0458926974091544279594363921687
+0.152068831160545225073416431578
+-0.154107302590845572040563607831
+-0.223484931881816735164790614837
+-0.180713449899763471194580758966
+-0.154107302590845572040563607831
+-0.154107302590845572040563607831
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-2ndHalf-wd2.out b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-2ndHalf-wd2.out
new file mode 100644
index 0000000..eeb0a0e
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-2ndHalf-wd2.out
@@ -0,0 +1,10 @@
+0.640067273101262431012838122167
+0.181583573394687557224867707574
+0.181583573394687557224867707574
+-0.0476582764585998796691174997225
+0.181583573394687557224867707574
+-0.0476582764585998796691174997225
+-0.772164540906575183441915201064
+-0.222020346544350400569056044936
+-0.0476582764585998796691174997225
+-0.0476582764585998796691174997225
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-2ndHalf.bin b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-2ndHalf.bin
new file mode 100644
index 0000000..95d79ca
Binary files /dev/null and b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-2ndHalf.bin differ
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-2ndHalf.mw b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-2ndHalf.mw
new file mode 100644
index 0000000..8a559d3
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-2ndHalf.mw
@@ -0,0 +1 @@
+v := Vector( [ 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.200000, 0.200000, 0.200000, 0.200000, 0.200000 ] );
\ No newline at end of file
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-alternate-s.out b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-alternate-s.out
new file mode 100644
index 0000000..5d81242
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-alternate-s.out
@@ -0,0 +1,10 @@
+0.311475409836065573770491803279
+0.0824458071967787427652527933941
+0.193990134580655865330006572692
+0
+0.160722528167920583161571235007
+0
+0.125683060109289617486338797814
+0
+0.125683060109289617486338797814
+0
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-alternate-sd1.out b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-alternate-sd1.out
new file mode 100644
index 0000000..8dd4b87
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-alternate-sd1.out
@@ -0,0 +1,10 @@
+0.143330645883723013526829705276
+0.0874368907818417940529636076246
+-0.0224898270787909143812698322857
+0
+-0.0171701817418098751627505402467
+0
+-0.0955537639224820090178864701842
+0
+-0.0955537639224820090178864701842
+0
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-alternate-sd2.out b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-alternate-sd2.out
new file mode 100644
index 0000000..985658b
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-alternate-sd2.out
@@ -0,0 +1,10 @@
+0.0313291029254039373829135967817
+-0.0288583180492188421742568077191
+-0.0149846846363010065717340598432
+0
+0.0542860369939878278736287331562
+0
+-0.0208860686169359582552757311878
+0
+-0.0208860686169359582552757311878
+0
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-alternate-w.out b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-alternate-w.out
new file mode 100644
index 0000000..e8ea91f
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-alternate-w.out
@@ -0,0 +1,10 @@
+0.22283286165645889821992771648
+0.0999416371067599990588396586403
+0.126566783077955016575383060742
+0.0461507542986291170143018578248
+0.118625950069002116614308712747
+0.0461507542986291170143018578248
+0.151664926019761009728953272691
+0.0657648248755464917453801474004
+0.0761507542986291170143018578248
+0.0461507542986291170143018578248
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-alternate-wd1.out b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-alternate-wd1.out
new file mode 100644
index 0000000..cdbddf9
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-alternate-wd1.out
@@ -0,0 +1,10 @@
+-0.0278901855249164676397303024376
+0.0650484425557264787000439344416
+-0.118449111370847529055691462643
+0.0521059233493591702610212756938
+-0.0702776700110770767620402228483
+0.0521059233493591702610212756938
+0.0458185130315265375842267039362
+0.0973263179221513761291062467761
+-0.147894076650640829738978724306
+0.0521059233493591702610212756938
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-alternate-wd2.out b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-alternate-wd2.out
new file mode 100644
index 0000000..4b77f0c
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-alternate-wd2.out
@@ -0,0 +1,10 @@
+-0.198118208258142936697349618259
+-0.138747724929092205560406319814
+-0.060175267612742244897213616938
+0.00527587517717077757660093002749
+0.011356534638033068913472178548
+0.00527587517717077757660093002749
+0.304957119976433679627416055371
+0.059624045476827528307677600983
+0.00527587517717077757660093002749
+0.00527587517717077757660093002749
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-alternate.bin b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-alternate.bin
new file mode 100644
index 0000000..359fc7e
Binary files /dev/null and b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-alternate.bin differ
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-alternate.mw b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-alternate.mw
new file mode 100644
index 0000000..21fce52
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-alternate.mw
@@ -0,0 +1 @@
+v := Vector( [ 0.200000, 0.000000, 0.200000, 0.000000, 0.200000, 0.000000, 0.200000, 0.000000, 0.200000, 0.000000 ] );
\ No newline at end of file
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-uniform-s.out b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-uniform-s.out
new file mode 100644
index 0000000..ef6ecfc
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-uniform-s.out
@@ -0,0 +1,10 @@
+0.195040309428815228958214590465
+0.105427194285846069707143021873
+0.105427194285846069707143021873
+0.0606206367143614900816072375769
+0.105427194285846069707143021873
+0.0606206367143614900816072375769
+0.159811153538235478227637080062
+0.0863844073179651233662903135471
+0.0606206367143614900816072375769
+0.0606206367143614900816072375769
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-uniform-sd1.out b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-uniform-sd1.out
new file mode 100644
index 0000000..6e0b7cd
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-uniform-sd1.out
@@ -0,0 +1,10 @@
+0.1340476971800183519021335099
+0.0154705421049579903756705340685
+0.0154705421049579903756705340685
+-0.043818035432572190387560953847
+0.0154705421049579903756705340685
+-0.043818035432572190387560953847
+0.0269432003696310647825694437265
+-0.0321303821342346262614707404436
+-0.043818035432572190387560953847
+-0.043818035432572190387560953847
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-uniform-sd2.out b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-uniform-sd2.out
new file mode 100644
index 0000000..0387e87
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-uniform-sd2.out
@@ -0,0 +1,10 @@
+0.0861205210449606903359505917025
+0.0298267226135376808565456884138
+0.0298267226135376808565456884138
+0.0016798233978261761168432367694
+0.0298267226135376808565456884138
+0.0016798233978261761168432367694
+-0.140895695386208547951901262571
+-0.0414242870906698894210593414506
+0.0016798233978261761168432367694
+0.0016798233978261761168432367694
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-uniform-w.out b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-uniform-w.out
new file mode 100644
index 0000000..ef6ecfc
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-uniform-w.out
@@ -0,0 +1,10 @@
+0.195040309428815228958214590465
+0.105427194285846069707143021873
+0.105427194285846069707143021873
+0.0606206367143614900816072375769
+0.105427194285846069707143021873
+0.0606206367143614900816072375769
+0.159811153538235478227637080062
+0.0863844073179651233662903135471
+0.0606206367143614900816072375769
+0.0606206367143614900816072375769
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-uniform-wd1.out b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-uniform-wd1.out
new file mode 100644
index 0000000..6e0b7cd
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-uniform-wd1.out
@@ -0,0 +1,10 @@
+0.1340476971800183519021335099
+0.0154705421049579903756705340685
+0.0154705421049579903756705340685
+-0.043818035432572190387560953847
+0.0154705421049579903756705340685
+-0.043818035432572190387560953847
+0.0269432003696310647825694437265
+-0.0321303821342346262614707404436
+-0.043818035432572190387560953847
+-0.043818035432572190387560953847
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-uniform-wd2.out b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-uniform-wd2.out
new file mode 100644
index 0000000..0387e87
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-uniform-wd2.out
@@ -0,0 +1,10 @@
+0.0861205210449606903359505917025
+0.0298267226135376808565456884138
+0.0298267226135376808565456884138
+0.0016798233978261761168432367694
+0.0298267226135376808565456884138
+0.0016798233978261761168432367694
+-0.140895695386208547951901262571
+-0.0414242870906698894210593414506
+0.0016798233978261761168432367694
+0.0016798233978261761168432367694
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-uniform.bin b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-uniform.bin
new file mode 100644
index 0000000..e8d5157
Binary files /dev/null and b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-uniform.bin differ
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-uniform.mw b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-uniform.mw
new file mode 100644
index 0000000..f356814
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph-preferenceVector-uniform.mw
@@ -0,0 +1 @@
+v := Vector( [ 0.100000, 0.100000, 0.100000, 0.100000, 0.100000, 0.100000, 0.100000, 0.100000, 0.100000, 0.100000 ] );
\ No newline at end of file
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph.graph b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph.graph
new file mode 100644
index 0000000..89f0d1f
Binary files /dev/null and b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph.graph differ
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph.graph-txt b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph.graph-txt
new file mode 100644
index 0000000..605335f
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph.graph-txt
@@ -0,0 +1,11 @@
+10
+
+0 4
+0 1
+
+0 2
+
+
+6
+
+6 7
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph.mw b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph.mw
new file mode 100644
index 0000000..c7c0df3
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph.mw
@@ -0,0 +1,5 @@
+eG := [ [2,1],[2,5],
+[3,1],[3,2],
+[5,1],[5,3],
+[8,7],
+[10,7],[10,8] ];
\ No newline at end of file
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph.offsets b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph.offsets
new file mode 100644
index 0000000..7feb998
Binary files /dev/null and b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph.offsets differ
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph.properties b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph.properties
new file mode 100644
index 0000000..9ab2ef2
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graph.properties
@@ -0,0 +1,15 @@
+#BVGraph properties
+#Wed Apr 19 15:25:01 CEST 2006
+bitspernode=7.200
+arcs=9
+nodes=10
+graphclass=it.unimi.dsi.webgraph.BVGraph
+maxrefcount=3
+windowsize=7
+minintervallength=3
+bitsperlink=8.000
+avgdist=0.000
+compressionflags=
+version=0
+zetak=3
+avgref=0.000
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graphT.graph b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graphT.graph
new file mode 100644
index 0000000..89f0d1f
Binary files /dev/null and b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graphT.graph differ
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graphT.graph-txt b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graphT.graph-txt
new file mode 100644
index 0000000..605335f
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graphT.graph-txt
@@ -0,0 +1,11 @@
+10
+
+0 4
+0 1
+
+0 2
+
+
+6
+
+6 7
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graphT.offsets b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graphT.offsets
new file mode 100644
index 0000000..7feb998
Binary files /dev/null and b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graphT.offsets differ
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graphT.properties b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graphT.properties
new file mode 100644
index 0000000..9ab2ef2
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test10-.7-2-2-2-5-graphT.properties
@@ -0,0 +1,15 @@
+#BVGraph properties
+#Wed Apr 19 15:25:01 CEST 2006
+bitspernode=7.200
+arcs=9
+nodes=10
+graphclass=it.unimi.dsi.webgraph.BVGraph
+maxrefcount=3
+windowsize=7
+minintervallength=3
+bitsperlink=8.000
+avgdist=0.000
+compressionflags=
+version=0
+zetak=3
+avgref=0.000
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-1stHalf-s.out b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-1stHalf-s.out
new file mode 100644
index 0000000..be14c01
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-1stHalf-s.out
@@ -0,0 +1,50 @@
+0.0115878506989996808239301388505
+0.0204810498645048675429689092548
+0.0231293561178089982015411934276
+0.0180776596219730939460348228867
+0.0340080865307967795549584469812
+0.0372039692484836782967728825343
+0.0287674369904163031676681676372
+0.0155782720782595921092487304282
+0.0243956255776320057916224493048
+0.0115878506989996808239301388505
+0.0175482066970895378493168540328
+0.0139612659024092540047351070488
+0.0203960843367891358795345104172
+0.0287082675733375347426451323526
+0.0296169184207411770026450738875
+0.0234730669368230075606975975159
+0.0155782720782595921092487304282
+0.0231758393114443262853858630124
+0.0190539692108333552527882030123
+0.0218781554790541794170204973275
+0.0189542521352214129254996990051
+0.0189542521352214129254996990051
+0.0338961096524414599538124374367
+0.0337221493695600783029699782665
+0.020119168173559620708071140525
+0.00884396157464312810594608964072
+0.0330816368601849997770049498558
+0.0165730557720167821247128180257
+0.00563673859879957198999426258196
+0.0146413319651451475962671516815
+0.0136572369609719680258372802445
+0.0330246567317904335413781501861
+0.0179377786925584404724320870036
+0.0247767053643979908024332335372
+0.0271061499651860911723476337656
+0.0208093213521014778963437403311
+0.0106749349711568926769230650761
+0.0190047950992735582027033629289
+0.0203917622961041151840851490647
+0.0141398315062015341013467491207
+0.0248029800463271393808208228773
+0.018144562191385615293667653551
+0.0184019700725037525893437549395
+0
+0.0242834702011414640699275898304
+0.0209589329036361155816569781321
+0.014957058655260407240851585628
+0.00265663316531372178635248896119
+0.00707677410018115106116463619008
+0.0245645861130587361479123634172
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-1stHalf-w.out b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-1stHalf-w.out
new file mode 100644
index 0000000..4d11573
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-1stHalf-w.out
@@ -0,0 +1,50 @@
+0.00936728878527018253762845760888
+0.0165562979442682515453213101518
+0.0186971133647394677591854795334
+0.0146134656580843158749112723689
+0.0274911694880510123736356466923
+0.0300746301416311939564596973003
+0.0232547775166412394453727241989
+0.0125930318851248012580245257744
+0.0197207295657278127241666769941
+0.00936728878527018253762845760888
+0.0141854709786207322894213635679
+0.0112858901027351596838897079625
+0.0164876142292653613442895757225
+0.0232069466434767476373947929232
+0.0239414741338374189252961395023
+0.0189749594109095218846827539147
+0.0125930318851248012580245257744
+0.0187346890558450960679083576005
+0.0154026865498819588377576769386
+0.0176856783699355857287361559357
+0.0153220781032991667103202655779
+0.0153220781032991667103202655779
+0.027400650565769946346800284594
+0.0272600260229409759769339653674
+0.0162637630822615595647877911694
+0.0138126042642194959714941544703
+0.0385570535253510274752056597269
+0.0202623192373839933885263456583
+0.00943848553275340011899518755047
+0.0180903392558543314732102308003
+0.0182020514133866572969771032252
+0.0315293238135186321962650451154
+0.0220355865602386732449641871629
+0.0281463798267966339473300108327
+0.0331784951381305721452836661392
+0.024577212891645443373801639674
+0.0154528989485157291580648261999
+0.0239789642704780590875694684176
+0.0258257851090512031631003569802
+0.0185741281987090491972857323004
+0.0310262747339360610584812631984
+0.0234065566239947889053300757058
+0.0234982302176788279448644816159
+0.00336728878527018253762845760888
+0.0280082719339055827389227414683
+0.0249795871962769769144376099265
+0.0195601822476071915244491661349
+0.00551483544447664651897499082965
+0.0120092075292782105963676969162
+0.0311651069295309710435720319811
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-1stHalf.bin b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-1stHalf.bin
new file mode 100644
index 0000000..5c675ca
Binary files /dev/null and b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-1stHalf.bin differ
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-1stHalf.mw b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-1stHalf.mw
new file mode 100644
index 0000000..aa76011
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-1stHalf.mw
@@ -0,0 +1 @@
+v := Vector
\ No newline at end of file
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-2ndHalf-s.out b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-2ndHalf-s.out
new file mode 100644
index 0000000..ed94d87
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-2ndHalf-s.out
@@ -0,0 +1,50 @@
+0
+0
+0
+0
+0
+0
+0
+0
+0
+0
+0
+0
+0
+0
+0
+0
+0
+0
+0
+0
+0
+0
+0
+0
+0
+0.0347724841290027082358275091314
+0.0616547247972807832924172501163
+0.0358252252168887293600002972506
+0.025475895342709792270067712819
+0.0326397409702669080667077461841
+0.0373740408040071885224715063402
+0.0252213640178616608865476015284
+0.0393219092003688256539595051548
+0.0423611214228209874881179522823
+0.0587942685543639347248827871052
+0.0404718060323544688876836495094
+0.0356084136877415261290114116541
+0.044962157220592121694817470942
+0.0487488392422863792049469839885
+0.037279905966698928430028335939
+0.0572788183690566596775747175965
+0.0456039207830283092442890740839
+0.0449964556035858936940331966967
+0.0175719665652520589585619580899
+0.0437210936674258746882162225996
+0.041940442749116608058287530983
+0.0389781447629524042770651455034
+0.0175719665652520589585619580899
+0.0328163411015515757827939233499
+0.059008953227533613813128553062
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-2ndHalf-w.out b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-2ndHalf-w.out
new file mode 100644
index 0000000..1dd69b7
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-2ndHalf-w.out
@@ -0,0 +1,50 @@
+0.00459859606240910212059388766595
+0.00812782954383827038322258612391
+0.00917880017030216532365280197167
+0.00717405293825142483939568700154
+0.0134959844472357919267387510994
+0.014764258785870906706180094027
+0.0114162518922642193388755909568
+0.00618218015566997023782479910449
+0.0096813145625866342825818197196
+0.00459859606240910212059388766595
+0.0069639414862795175983257600077
+0.00554047718362542424167938273006
+0.00809411128569504076077945551024
+0.0113927707260444168160515565574
+0.0117533654832184795661210102862
+0.00931520054859334186606418493258
+0.00618218015566997023782479910449
+0.00919724685526317126207378164074
+0.00756149781889806758397875991946
+0.00868226578440794975972500827009
+0.00752192546305958463908546828179
+0.00752192546305958463908546828179
+0.0134515468336298375861289851896
+0.0133825113332032612644116747565
+0.00798421812164567888053170662246
+0.0244828450216524139321546147811
+0.0503155996142353692694165880079
+0.0281850723618244543771149512253
+0.0176027987302432429666039171095
+0.0254971383487362099545417998546
+0.0279621141578582390406729103046
+0.0283180721201843967648519778939
+0.0308356954144293750310255675412
+0.0353828103842598922250949733208
+0.0462189548784701616798181307327
+0.0326688210805724983801428873297
+0.0257136540790295182776830879445
+0.0346610732368360058126508889476
+0.0374954372289688724991476580428
+0.0280968522614002633057084419919
+0.0443909010051204275398545128505
+0.0347067750071763417844044459886
+0.0344425315307129608608286328211
+0.0105985960624091021205938876659
+0.0360073440639413851039327126011
+0.0336140074967920769678838713524
+0.0294454648157419931784993239055
+0.0116528711933729245914176386702
+0.0226016883125891491066135351163
+0.0453398324303118112478161365728
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-2ndHalf.bin b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-2ndHalf.bin
new file mode 100644
index 0000000..abd40d0
Binary files /dev/null and b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-2ndHalf.bin differ
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-2ndHalf.mw b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-2ndHalf.mw
new file mode 100644
index 0000000..31345ce
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-2ndHalf.mw
@@ -0,0 +1 @@
+v := Vector
\ No newline at end of file
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-alternate-s.out b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-alternate-s.out
new file mode 100644
index 0000000..da63252
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-alternate-s.out
@@ -0,0 +1,50 @@
+0.0146953256029343402202542904793
+0.00728845883439376244836673978426
+0.0221418886793613702034519423563
+0.000200767291470815917245602657377
+0.0287760226894784848896865380813
+0.00793600883687232933383232437839
+0.0226719220238892500399610368664
+0.000930024953136867851946541721674
+0.0210282066093785298293792982454
+0
+0.015625350556071208072200832201
+0
+0.0197019537704139934315730917625
+0.00707598440246788384054054328909
+0.0251048098237213151887515578069
+0.00547073501845216383497965718632
+0.015625350556071208072200832201
+0.00538958252463251385710117435868
+0.0189300330593103931144574621335
+0.00325451866948852547140917078743
+0.0181118342801306349500413007336
+0.00341650867719629472978701025429
+0.0336740352224247724280193535748
+0.0123544760528577216466315336523
+0.0201207725517620717861438885972
+0.00914256224510773520059025628225
+0.0494153320487165676342901163425
+0.0161686880509500361414011783997
+0.0208329676067646001155076232529
+0.0196838398799254293082900869107
+0.0266968172968816465734211803004
+0.0183249706636254908867084952043
+0.033839256921373425682777278246
+0.0230038979601006689701786115804
+0.0531273801647176199406458105484
+0.0215119111649593769425653088994
+0.0285914990739058454595731946556
+0.0217645760401581665142166538262
+0.0457903325819121189445038749964
+0.017163037525478609679224364849
+0.0452235353465301129816187564863
+0.0208527020614804146614239750313
+0.0365035305406631650490489517279
+0
+0.0414019349461977742651936242856
+0.0257761025654752771752741010889
+0.0321885153897915683820757477351
+0.000395191552723606664385399309902
+0.0254839305464902366791135448333
+0.0375929171401540549900101420983
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-alternate-w.out b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-alternate-w.out
new file mode 100644
index 0000000..7e2f303
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-alternate-w.out
@@ -0,0 +1,50 @@
+0.0101318552363143932003092894922
+0.0102787117451086159857091124809
+0.0172875655343530689853911583351
+0.00652788447690012215871915599535
+0.0238752423811057305794960853916
+0.0165059576782714455875944236582
+0.0195143329750887831997967827641
+0.00593443405437913529224732855403
+0.0172843678366215928912198780913
+0.00413185523631439320030928949215
+0.0126368494445525821362999077677
+0.00497813883893300385579432468934
+0.0153167600583691137910092739845
+0.0131255213919885687569428617364
+0.0208105620530567352014142795053
+0.0106034048121455417172825827169
+0.011934434054379135292247328554
+0.0104642889879744972250640819062
+0.0145230350295488246987532807768
+0.0091298447950785241011600228966
+0.0141534142109307323582987295878
+0.00815341421093073235829872958777
+0.0258351397297589548375738199311
+0.0170684822449670501772971915548
+0.0153890214692703668003088998645
+0.015062685902795183587703484186
+0.0464692201361375204209939803272
+0.02093489174055876411964778201
+0.0165062141451484979904809239655
+0.0209322816371888888569188837833
+0.0245579539336033055891018286349
+0.0251880176764154832832226639947
+0.0294584862651117654348944456121
+0.0281876629372427486250914870644
+0.0451815522089726022928226149472
+0.0257196015681900950291628110445
+0.0238529785033090047555321733272
+0.0262351836499053170136335501252
+0.0374296788826566193845640741132
+0.0208153203999924190636791261133
+0.0407768890477619387291545535032
+0.0257070440477764890883508169904
+0.0320461138807363127253476363982
+0.00413185523631439320030928949215
+0.0358433651826575355406228322034
+0.0278593219712262413740871125626
+0.0276408384620476800196819367381
+0.0052404792471453568411644351296
+0.0206446659198557955538676603741
+0.0379831789309083970914260780449
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-alternate.bin b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-alternate.bin
new file mode 100644
index 0000000..ac4ead9
Binary files /dev/null and b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-alternate.bin differ
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-alternate.mw b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-alternate.mw
new file mode 100644
index 0000000..28be12f
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-alternate.mw
@@ -0,0 +1 @@
+v := Vector
\ No newline at end of file
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-alternate_MAS.bak b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-alternate_MAS.bak
new file mode 100644
index 0000000..5420b31
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-alternate_MAS.bak
@@ -0,0 +1,2 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<Worksheet><Version major="6" minor="1"/><View-Properties><Zoom percentage="100"/></View-Properties><Styles/><Group><Text-field><Font background="[0,0,0]" family="Times New Roman">v := Vector( [ 0.040000, 0.000000, 0.040000, 0.000000, 0.040000, 0.000000, 0.040000, 0.000000, 0.040000, 0.000000, 0.040000, 0.000000, 0.040000, 0.000000, 0.040000, 0.000000, 0.040000, 0.000000, 0.040000, 0.000000</Font></Text-field></Group><Text-field/></Worksheet>
\ No newline at end of file
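The Maple worksheet backup above shows the "alternate" preference vector as 0.04, 0, 0.04, 0, ... (the listing is cut off after 20 entries). Assuming the pattern continues for all 50 nodes of the test graph, the 25 even-indexed entries of 0.04 sum to 1, so the vector is stochastic, as `PageRank.init()` requires. The class and method names below are hypothetical, and the tolerance-based check is only in the spirit of LAW's `isStochastic`, not a copy of it.

```java
/** Hypothetical helper: builds the "alternate" preference vector, assuming the
 *  0.04, 0, 0.04, 0, ... pattern seen in the worksheet continues for all n nodes. */
public final class AlternatePreference {
    /** Even-indexed nodes share all the preference mass; odd-indexed nodes get 0. */
    public static double[] alternate(final int n) {
        final double[] v = new double[n];
        final int even = (n + 1) / 2; // number of even indices in [0, n)
        for (int i = 0; i < n; i += 2) v[i] = 1.0 / even;
        return v;
    }

    /** Returns true if all entries are nonnegative and they sum to 1 within eps. */
    public static boolean isStochastic(final double[] v, final double eps) {
        double sum = 0;
        for (final double x : v) {
            if (x < 0) return false;
            sum += x;
        }
        return Math.abs(sum - 1) <= eps;
    }
}
```

With `n = 50` this yields `1.0 / 25`, i.e. 0.04, on even positions, matching the visible worksheet entries.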
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-uniform-s.out b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-uniform-s.out
new file mode 100644
index 0000000..3c96aa9
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-uniform-s.out
@@ -0,0 +1,50 @@
+0.00698294242383964232911117263742
+0.0123420637440532609642719481379
+0.0139379567675208165414191407526
+0.0108937592981678703571534796852
+0.0204935769676434021501871988959
+0.0224194444637510503313198956637
+0.0173355147044527293921241575778
+0.00938760602039738574792466243944
+0.0147010220641572235033742483569
+0.00698294242383964232911117263742
+0.0105747062324501249438735617878
+0.00841318364318029196278454534628
+0.0122908627574802010525345156164
+0.0172998586847605822267231747403
+0.0178474198085279492457085748943
+0.0141450799797514318753734694236
+0.00938760602039738574792466243944
+0.0139659679555541336649910696206
+0.011482092184390013210868218429
+0.0131839720771717677442305821029
+0.0114220017831793756747028669299
+0.0114220017831793756747028669299
+0.0204260986996998919664646348918
+0.0203212686780721186206728200619
+0.0121239906019536192226597488959
+0.0191477246429359549518243846257
+0.0444363265697931983723111238674
+0.0242236957996042238828206484418
+0.01352064213149832154279955233
+0.0217937388022952707138760153275
+0.0230820827856224481688250067649
+0.0299236979668515144805585115046
+0.0264356409873340241379948773521
+0.0317645951055282630862124920767
+0.0396987250083003669125508984359
+0.0286230169861089708769722635018
+0.0205832765137726237178739570722
+0.0293200187536570324501101786826
+0.0316606111690100378311240075115
+0.0233354902300546562514970871461
+0.0377085878695282442991678880245
+0.0290566658155855653448672608472
+0.0289703808741958944028465572185
+0.00698294242383964232911117263742
+0.0320078079989234839214277270347
+0.0292967973465345269411607406394
+0.0245028235316745923514742450202
+0.00858385331892478555519631474991
+0.0173054479209336798514906160162
+0.0382524696799213911456940842769
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-uniform-w.out b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-uniform-w.out
new file mode 100644
index 0000000..3c96aa9
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-uniform-w.out
@@ -0,0 +1,50 @@
+0.00698294242383964232911117263742
+0.0123420637440532609642719481379
+0.0139379567675208165414191407526
+0.0108937592981678703571534796852
+0.0204935769676434021501871988959
+0.0224194444637510503313198956637
+0.0173355147044527293921241575778
+0.00938760602039738574792466243944
+0.0147010220641572235033742483569
+0.00698294242383964232911117263742
+0.0105747062324501249438735617878
+0.00841318364318029196278454534628
+0.0122908627574802010525345156164
+0.0172998586847605822267231747403
+0.0178474198085279492457085748943
+0.0141450799797514318753734694236
+0.00938760602039738574792466243944
+0.0139659679555541336649910696206
+0.011482092184390013210868218429
+0.0131839720771717677442305821029
+0.0114220017831793756747028669299
+0.0114220017831793756747028669299
+0.0204260986996998919664646348918
+0.0203212686780721186206728200619
+0.0121239906019536192226597488959
+0.0191477246429359549518243846257
+0.0444363265697931983723111238674
+0.0242236957996042238828206484418
+0.01352064213149832154279955233
+0.0217937388022952707138760153275
+0.0230820827856224481688250067649
+0.0299236979668515144805585115046
+0.0264356409873340241379948773521
+0.0317645951055282630862124920767
+0.0396987250083003669125508984359
+0.0286230169861089708769722635018
+0.0205832765137726237178739570722
+0.0293200187536570324501101786826
+0.0316606111690100378311240075115
+0.0233354902300546562514970871461
+0.0377085878695282442991678880245
+0.0290566658155855653448672608472
+0.0289703808741958944028465572185
+0.00698294242383964232911117263742
+0.0320078079989234839214277270347
+0.0292967973465345269411607406394
+0.0245028235316745923514742450202
+0.00858385331892478555519631474991
+0.0173054479209336798514906160162
+0.0382524696799213911456940842769
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-uniform.bin b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-uniform.bin
new file mode 100644
index 0000000..da382d3
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-uniform.bin
@@ -0,0 +1 @@
+?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{?”záG®{
\ No newline at end of file
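The garbled line in `...-uniform.bin` is raw binary shown as single-byte characters: each repeated group `?”záG®{` (plus an invisible 0x14 byte) is the byte sequence 3F 94 7A E1 47 AE 14 7B, the big-endian IEEE 754 encoding of 0.02 = 1/50, i.e. the uniform preference vector over the 50 test nodes. The sketch below assumes the file is simply a sequence of doubles as written by `java.io.DataOutput.writeDouble`; that matches these bytes, but how LAW itself loads such files is not shown here. The class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

/** Sketch: a vector stored as consecutive 8-byte big-endian doubles (assumed format). */
public final class BinaryVector {
    /** Serializes v with DataOutputStream.writeDouble (big-endian, 8 bytes per entry). */
    public static byte[] write(final double[] v) {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (final double x : v) out.writeDouble(x);
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads doubles until end of stream; a trailing partial group of fewer than 8 bytes is ignored. */
    public static double[] read(final InputStream in) {
        final DataInputStream data = new DataInputStream(in);
        double[] v = new double[16];
        int n = 0;
        try {
            for (;;) {
                final double x = data.readDouble();
                if (n == v.length) v = Arrays.copyOf(v, 2 * n);
                v[n++] = x;
            }
        } catch (final EOFException end) {
            // Normal termination: no complete double left.
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
        return Arrays.copyOf(v, n);
    }
}
```

Writing 50 copies of `1.0 / 50` produces 400 bytes whose first eight are 3F 94 7A E1 47 AE 14 7B, the same group repeated in the file above.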
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-uniform.mw b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-uniform.mw
new file mode 100644
index 0000000..290729c
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph-preferenceVector-uniform.mw
@@ -0,0 +1 @@
+v := Vector
\ No newline at end of file
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph.graph b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph.graph
new file mode 100644
index 0000000..a43a87f
Binary files /dev/null and b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph.graph differ
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph.graph-txt b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph.graph-txt
new file mode 100644
index 0000000..0f9935b
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph.graph-txt
@@ -0,0 +1,51 @@
+50
+
+5 8 14
+22
+3 4 5 19
+1 2 6 12 14 18
+5 6 13 23
+1 2 4 5 19 24
+2 3 4 18 22
+
+6 8 10 15 24
+4 6 20 21 23 24
+11 14 17 19 23
+4 13 17 23
+13 15 17 20 21
+4 8 14 15 23
+5 7 10 12 16
+29 42 46
+31
+31 34 35 38 42 44
+26 30 36 40 41 45 47
+26 35 37 38 40
+31 35 37 41 44
+29 34 45 49
+27 32 33
+26 31 39 44
+
+33 34 40 45
+26 30 32 46 49
+28 35 44 45 49
+35 39 40 45
+
+26 28 30 33 35 44
+32 34 35 38 39 41
+37 39 42 44 46
+
+25 26
+26 29 32
+
+
+
+27 33 36 42 46 48
+
+27 30 31 34 37
+29 34 42
+34 36 38 41 49
+26 38 40 42 48
+38 44 49
+27 30 34 35 36 41
+
+37 40 41 49
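The `.graph-txt` files appear to use WebGraph's plain-text (`ASCIIGraph`) layout: the first line is the number of nodes n, then line i + 1 lists the successors of node i, space-separated, with an empty line for a node without successors. The hypothetical reader below parses that layout into successor arrays; for real use, WebGraph's own `ASCIIGraph` class is the natural choice.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

/** Hypothetical reader for the plain-text graph layout: n, then one successor line per node. */
public final class AsciiGraphSketch {
    public static int[][] parse(final Reader reader) {
        try (BufferedReader in = new BufferedReader(reader)) {
            final int n = Integer.parseInt(in.readLine().trim());
            final int[][] succ = new int[n][];
            for (int i = 0; i < n; i++) {
                final String line = in.readLine();
                // Leniently treat a missing line at end of file like an empty successor list.
                final String trimmed = line == null ? "" : line.trim();
                if (trimmed.isEmpty()) {
                    succ[i] = new int[0];
                    continue;
                }
                final String[] tokens = trimmed.split("\\s+");
                succ[i] = new int[tokens.length];
                for (int j = 0; j < tokens.length; j++) succ[i][j] = Integer.parseInt(tokens[j]);
            }
            return succ;
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Number of arcs, i.e. the total length of all successor lists. */
    public static long arcs(final int[][] succ) {
        long m = 0;
        for (final int[] s : succ) m += s.length;
        return m;
    }
}
```

Applied to `urlalign/t.graph-txt` (`4`, `1`, empty, `0 1`, `1 2`), it yields 4 nodes and 5 arcs, the values recorded in `t.properties`.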
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph.mw b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph.mw
new file mode 100644
index 0000000..85be4bc
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph.mw
@@ -0,0 +1,40 @@
+eG:=[ [2,6],[2,9],[2,15],
+[3,23],
+[4,4],[4,5],[4,6],[4,20],
+[5,2],[5,3],[5,7],[5,13],[5,15],[5,19],
+[6,6],[6,7],[6,14],[6,24],
+[7,2],[7,3],[7,5],[7,6],[7,20],[7,25],
+[8,3],[8,4],[8,5],[8,19],[8,23],
+[10,7],[10,9],[10,11],[10,16],[10,25],
+[11,5],[11,7],[11,21],[11,22],[11,24],[11,25],
+[12,12],[12,15],[12,18],[12,20],[12,24],
+[13,5],[13,14],[13,18],[13,24],
+[14,14],[14,16],[14,18],[14,21],[14,22],
+[15,5],[15,9],[15,15],[15,16],[15,24],
+[16,6],[16,8],[16,11],[16,13],[16,17],
+[17,30],[17,43],[17,47],
+[18,32],
+[19,32],[19,35],[19,36],[19,39],[19,43],[19,45],
+[20,27],[20,31],[20,37],[20,41],[20,42],[20,46],[20,48],
+[21,27],[21,36],[21,38],[21,39],[21,41],
+[22,32],[22,36],[22,38],[22,42],[22,45],
+[23,30],[23,35],[23,46],[23,50],
+[24,28],[24,33],[24,34],
+[25,27],[25,32],[25,40],[25,45],
+[27,34],[27,35],[27,41],[27,46],
+[28,27],[28,31],[28,33],[28,47],[28,50],
+[29,29],[29,36],[29,45],[29,46],[29,50],
+[30,36],[30,40],[30,41],[30,46],
+[32,27],[32,29],[32,31],[32,34],[32,36],[32,45],
+[33,33],[33,35],[33,36],[33,39],[33,40],[33,42],
+[34,38],[34,40],[34,43],[34,45],[34,47],
+[36,26],[36,27],
+[37,27],[37,30],[37,33],
+[41,28],[41,34],[41,37],[41,43],[41,47],[41,49],
+[43,28],[43,31],[43,32],[43,35],[43,38],
+[44,30],[44,35],[44,43],
+[45,35],[45,37],[45,39],[45,42],[45,50],
+[46,27],[46,39],[46,41],[46,43],[46,49],
+[47,39],[47,45],[47,50],
+[48,28],[48,31],[48,35],[48,36],[48,37],[48,42],
+[50,38],[50,41],[50,42],[50,50] ];
\ No newline at end of file
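Comparing `eG` with the `.graph-txt` file above shows that each Maple pair `[i,j]` is the arc (i − 1) → (j − 1) shifted to 1-based indices: node 1's successors `5 8 14` appear as `[2,6],[2,9],[2,15]`, and node 2's single successor `22` as `[3,23]`. The hypothetical helper below regenerates such a list from 0-based successor arrays, on one line rather than one line per source node.

```java
import java.util.StringJoiner;

/** Hypothetical helper: formats 0-based successor lists as a Maple list of 1-based [source,target] pairs. */
public final class MapleEdges {
    public static String toMaple(final String name, final int[][] succ) {
        final StringJoiner pairs = new StringJoiner(",", name + ":=[ ", " ];");
        for (int i = 0; i < succ.length; i++)
            for (final int j : succ[i]) pairs.add("[" + (i + 1) + "," + (j + 1) + "]");
        return pairs.toString();
    }
}
```

For the small `urlalign/t` graph (successors `{1}`, `{}`, `{0, 1}`, `{1, 2}`) this gives `eG:=[ [1,2],[3,1],[3,2],[4,2],[4,3] ];`.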
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph.offsets b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph.offsets
new file mode 100644
index 0000000..b8da41b
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph.offsets
@@ -0,0 +1 @@
+ ¸à |2ƒÈ> aàÐ!±À„@ Ðh> D ˆa$ |6Á` ð
\ No newline at end of file
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph.properties b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph.properties
new file mode 100644
index 0000000..5558097
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graph.properties
@@ -0,0 +1,16 @@
+#BVGraph properties
+#Tue Apr 04 18:40:21 CEST 2006
+bitspernode=22,56
+arcs=179
+nodes=50
+graphclass=it.unimi.dsi.webgraph.BVGraph
+maxrefcount=2147483647
+windowsize=7
+minintervallength=3
+bitsperlink=6,30
+avgdist=,26
+compressionflags=
+version=0
+basename=test50-.6-7-3-2-10-graph
+avgref=,14
+zetak=3
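The decimal statistics in this file use a comma as decimal separator (`bitspernode=22,56`, `avgdist=,26`), while the 2006 `urlalign` files use a dot (`10.000`). The figures are mutually consistent: `bitsperlink` equals `bitspernode` × `nodes` / `arcs`, e.g. 22.56 × 50 / 179 ≈ 6.30 here, 24.00 × 50 / 179 ≈ 6.70 for the transpose, and 5.333 × 3 / 1 ≈ 16 for `t2`. The sketch below (hypothetical names; it assumes commas are never thousands separators) reads such a file with `java.util.Properties` and recomputes that ratio.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical check of BVGraph .properties statistics written with either ',' or '.' decimals. */
public final class GraphPropertiesCheck {
    /** Parses "22,56", ",26" or "10.000"; assumes ',' is only ever a decimal separator. */
    public static double parseDecimal(final String s) {
        return Double.parseDouble(s.trim().replace(',', '.'));
    }

    /** Recomputes bits per link as bitspernode * nodes / arcs. */
    public static double derivedBitsPerLink(final String propertiesText) {
        final Properties p = new Properties();
        try {
            p.load(new StringReader(propertiesText)); // '#' lines are comments
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
        final double bitsPerNode = parseDecimal(p.getProperty("bitspernode"));
        final long nodes = Long.parseLong(p.getProperty("nodes").trim());
        final long arcs = Long.parseLong(p.getProperty("arcs").trim());
        return bitsPerNode * nodes / arcs;
    }
}
```

Since the stored values are rounded (to two or three decimals), a comparison against `bitsperlink` needs a matching tolerance.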
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graphT.graph b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graphT.graph
new file mode 100644
index 0000000..2dd82c4
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graphT.graph
@@ -0,0 +1 @@
+¿èŽóû•FÐUD%òdZ‰-íõˌ@š‡ÂëWöM&EÑ(eë–M‚õê–ÛÚ4Ô²!&Éqj¢]4ËÍB]kå—ZäÞauVA†×%ڑu,Ü.©z„MlœœNÅSåDåؕrͱDˆ|âFJ.½·µNµ >ÙHÒ]t¤ÿuº]åÒSn—• ¾#ïE'¨i÷mŠ`
\ No newline at end of file
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graphT.offsets b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graphT.offsets
new file mode 100644
index 0000000..694ab3c
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graphT.offsets
@@ -0,0 +1 @@
+¡Á¡ BC„…áÀ° H08H…ÇÐøÐø%aÀø1À€‚0Ø'@Ðø|hH€
\ No newline at end of file
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graphT.properties b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graphT.properties
new file mode 100644
index 0000000..4119cea
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/rank/test50-.6-7-3-2-10-graphT.properties
@@ -0,0 +1,16 @@
+#BVGraph properties
+#Wed Apr 05 08:28:23 CEST 2006
+bitspernode=24,00
+arcs=179
+nodes=50
+graphclass=it.unimi.dsi.webgraph.BVGraph
+maxrefcount=2147483647
+windowsize=7
+minintervallength=3
+bitsperlink=6,70
+avgdist=,40
+compressionflags=
+version=0
+basename=test50-.6-7-3-2-10-graphT
+avgref=,18
+zetak=3
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t-arc.presence b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t-arc.presence
new file mode 100644
index 0000000..8429e35
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t-arc.presence
@@ -0,0 +1,5 @@
+0,1,100
+2,0,110
+2,1,111
+3,1,011
+3,2,010
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t.graph b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t.graph
new file mode 100644
index 0000000..142f006
Binary files /dev/null and b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t.graph differ
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t.graph-txt b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t.graph-txt
new file mode 100644
index 0000000..9b02570
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t.graph-txt
@@ -0,0 +1,5 @@
+4
+1
+
+0 1
+1 2
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t.offsets b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t.offsets
new file mode 100644
index 0000000..348785b
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t.offsets
@@ -0,0 +1 @@
+ŠCF€
\ No newline at end of file
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t.properties b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t.properties
new file mode 100644
index 0000000..f6608a4
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t.properties
@@ -0,0 +1,15 @@
+#BVGraph properties
+#Thu Dec 21 17:36:56 CET 2006
+bitspernode=10.000
+arcs=5
+nodes=4
+graphclass=it.unimi.dsi.webgraph.BVGraph
+maxrefcount=3
+windowsize=7
+minintervallength=3
+bitsperlink=8.000
+avgdist=0.000
+compressionflags=
+version=0
+zetak=3
+avgref=0.000
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t0.graph b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t0.graph
new file mode 100644
index 0000000..0349049
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t0.graph
@@ -0,0 +1 @@
+]ߐ
\ No newline at end of file
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t0.graph-txt b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t0.graph-txt
new file mode 100644
index 0000000..16d9c37
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t0.graph-txt
@@ -0,0 +1,4 @@
+3
+1
+
+0 1
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t0.offsets b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t0.offsets
new file mode 100644
index 0000000..b654493
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t0.offsets
@@ -0,0 +1 @@
+ŠC@
\ No newline at end of file
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t0.properties b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t0.properties
new file mode 100644
index 0000000..9b7022d
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t0.properties
@@ -0,0 +1,15 @@
+#BVGraph properties
+#Thu Dec 21 17:36:41 CET 2006
+bitspernode=8.000
+arcs=3
+nodes=3
+graphclass=it.unimi.dsi.webgraph.BVGraph
+maxrefcount=3
+windowsize=7
+minintervallength=3
+bitsperlink=8.000
+avgdist=0.000
+compressionflags=
+version=0
+zetak=3
+avgref=0.000
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t1.graph b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t1.graph
new file mode 100644
index 0000000..8359455
Binary files /dev/null and b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t1.graph differ
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t1.graph-txt b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t1.graph-txt
new file mode 100644
index 0000000..377f5a0
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t1.graph-txt
@@ -0,0 +1,5 @@
+4
+
+
+0 1
+1 2
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t1.offsets b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t1.offsets
new file mode 100644
index 0000000..ff4393e
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t1.offsets
@@ -0,0 +1 @@
+¤4h
\ No newline at end of file
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t1.properties b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t1.properties
new file mode 100644
index 0000000..e2e8057
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t1.properties
@@ -0,0 +1,15 @@
+#BVGraph properties
+#Thu Dec 21 17:36:45 CET 2006
+bitspernode=8.000
+arcs=4
+nodes=4
+graphclass=it.unimi.dsi.webgraph.BVGraph
+maxrefcount=3
+windowsize=7
+minintervallength=3
+bitsperlink=8.000
+avgdist=0.000
+compressionflags=
+version=0
+zetak=3
+avgref=0.000
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t2.graph b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t2.graph
new file mode 100644
index 0000000..22a0c7f
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t2.graph
@@ -0,0 +1 @@
+×@
\ No newline at end of file
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t2.graph-txt b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t2.graph-txt
new file mode 100644
index 0000000..b6b91c0
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t2.graph-txt
@@ -0,0 +1,4 @@
+3
+
+
+1
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t2.offsets b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t2.offsets
new file mode 100644
index 0000000..5246a0f
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t2.offsets
@@ -0,0 +1 @@
+¤(
\ No newline at end of file
diff --git a/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t2.properties b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t2.properties
new file mode 100644
index 0000000..5244fbf
--- /dev/null
+++ b/third_party/law-2.5.1/data/it/unimi/dsi/law/urlalign/t2.properties
@@ -0,0 +1,15 @@
+#BVGraph properties
+#Thu Dec 21 17:36:48 CET 2006
+bitspernode=5.333
+arcs=1
+nodes=3
+graphclass=it.unimi.dsi.webgraph.BVGraph
+maxrefcount=3
+windowsize=7
+minintervallength=3
+bitsperlink=16.000
+avgdist=0.000
+compressionflags=
+version=0
+zetak=3
+avgref=0.000
diff --git a/third_party/law-2.5.1/ivy.xml b/third_party/law-2.5.1/ivy.xml
new file mode 100644
index 0000000..ca192b9
--- /dev/null
+++ b/third_party/law-2.5.1/ivy.xml
@@ -0,0 +1,72 @@
+<?xml version="1.0" encoding="ISO-8859-1"?>
+<ivy-module version="2.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://ant.apache.org/ivy/schemas/ivy.xsd">
+ <info organisation="it.unimi.dsi.law" module="law-library"/>
+
+ <configurations defaultconf="runtime" defaultconfmapping="*->default,javadoc">
+ <conf name="runtime"/>
+ <conf name="compile" extends="runtime"/>
+ <conf name="test" extends="compile"/>
+ </configurations>
+
+ <publications>
+ <artifact name="law-library" type="jar"/>
+ <artifact name="law-library" type="pom"/>
+ </publications>
+
+ <dependencies>
+ <dependency org="org.slf4j" name="slf4j-api" rev="latest.release"/>
+ <dependency org="org.slf4j" name="log4j-over-slf4j" rev="latest.release"/>
+ <dependency org="org.slf4j" name="jcl-over-slf4j" rev="latest.release"/>
+ <dependency org="ch.qos.logback" name="logback-classic" rev="latest.release" conf="runtime"/>
+
+ <dependency org="it.unimi.di.law" name="jericho-html-dev" rev="20131217" conf="runtime->default"/>
+
+ <dependency org="it.unimi.dsi" name="fastutil" rev="latest.release" conf="runtime->default"/>
+ <dependency org="it.unimi.dsi" name="dsiutils" rev="latest.release" conf="runtime->default"/>
+
+ <dependency org="it.unimi.di" name="mg4j" rev="latest.release" />
+ <dependency org="it.unimi.di" name="mg4j-big" rev="latest.release" />
+
+ <dependency org="it.unimi.dsi" name="sux4j" rev="latest.release" />
+ <dependency org="it.unimi.dsi" name="webgraph" rev="latest.release" />
+ <dependency org="it.unimi.dsi" name="webgraph-big" rev="latest.release" />
+
+ <dependency org="dnsjava" name="dnsjava" rev="latest.release" conf="runtime->default"/>
+
+ <dependency org="org.eclipse.jetty.aggregate" name="jetty-all" rev="9.4.10.RC0"/>
+
+ <dependency org="org.apache.commons" name="commons-math3" rev="latest.release"/>
+ <dependency org="org.apache.httpcomponents" name="httpclient" rev="4.2.3"/>
+ <dependency org="org.apache.httpcomponents" name="httpasyncclient" rev="latest.release"/>
+ <dependency org="com.martiansoftware" name="jsap" rev="latest.release"/>
+ <dependency org="net.sf.jung" name="jung-api" rev="latest.release"/>
+ <dependency org="net.sf.jung" name="jung-algorithms" rev="latest.release"/>
+ <dependency org="mx4j" name="mx4j" rev="latest.release"/>
+ <dependency org="mx4j" name="mx4j-tools" rev="latest.release"/>
+ <dependency org="org.softee" name="pojo-mbean" rev="latest.release"/>
+ <dependency org="gnu.getopt" name="java-getopt" rev="latest.release"/>
+
+ <dependency org="com.google.guava" name="guava" rev="latest.release"/>
+ <dependency org="info.bliki.wiki" name="bliki-core" rev="3.1.0"/>
+
+ <dependency org="org.wikidata.wdtk" name="wdtk-dumpfiles" rev="latest.release"/>
+
+ <dependency org="net.java.dev.javacc" name="javacc" rev="5.0" conf="compile"/>
+
+ <dependency org="org.apache.commons" name="commons-math" rev="latest.release" conf="test"/>
+ <dependency org="junit" name="junit" rev="latest.release" conf="test"/>
+ <dependency org="org.jacoco" name="org.jacoco.ant" rev="latest.release" conf="test"/>
+
+ <exclude org="org.slf4j" module="slf4j-log4j12"/>
+ <exclude org="log4j" module="log4j"/>
+ <exclude org="com.sun.jdmk"/>
+ <exclude org="com.sun.jmx"/>
+ <exclude org="javax.jms"/>
+
+ <!-- These dependencies are missing in 9.1.1.v20140108 -->
+ <exclude org="org.eclipse.jetty" module="test-jetty-webapp"/>
+ <exclude org="org.eclipse.jetty" module="test-proxy-webapp"/>
+ <exclude org="org.eclipse.jetty.orbit"/>
+ <exclude org="org.eclipse.jetty.tests"/>
+ </dependencies>
+</ivy-module>
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/Util.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/Util.java
new file mode 100644
index 0000000..c95fec7
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/Util.java
@@ -0,0 +1,40 @@
+package it.unimi.dsi.law;
+
+/*
+ * Copyright (C) 2008-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+// RELEASE-STATUS: DIST
+
+/** A static container of utility methods for all LAW software. */
+
+public final class Util {
+ /** Computes falling powers.
+ *
+ * @param n the base of the power.
+ * @param k the falling power.
+ * @return <code>n</code>(<code>n</code> &minus; 1)(<code>n</code> &minus; 2)&#x22ef;(<code>n</code> &minus; <code>k</code> + 1).
+ */
+
+ public static double falling(final int n, final int k) {
+ if (k > n) return 0;
+ if (k == 0) return 1;
+ double result = n;
+ for(int i = 1; i < k; i++) result *= n - i;
+ return result;
+ }
+}
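`Util.falling(n, k)` computes the falling power n(n − 1)⋯(n − k + 1), i.e. n!/(n − k)!: the number of ordered selections of k distinct items out of n, so `falling(5, 2)` = 5 · 4 = 20. The standalone sketch below copies the same loop (so it runs without LAW on the classpath) and shows one typical use, the binomial coefficient C(n, k) = falling(n, k) / k!. The class name and the `binomial` helper are illustrative only.

```java
/** Standalone copy of the Util.falling loop plus an illustrative binomial helper (not part of LAW). */
public final class FallingDemo {
    /** n(n-1)...(n-k+1); 0 if k > n, 1 if k == 0. Assumes k >= 0, as the original does implicitly. */
    public static double falling(final int n, final int k) {
        if (k > n) return 0;
        if (k == 0) return 1;
        double result = n;
        for (int i = 1; i < k; i++) result *= n - i;
        return result;
    }

    /** C(n, k) = falling(n, k) / k!, using falling(k, k) = k!. */
    public static double binomial(final int n, final int k) {
        return falling(n, k) / falling(k, k);
    }
}
```

Because the result is a `double`, it is exact only while the product stays below 2^53; beyond that it is a rounded approximation.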
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/big/rank/PageRank.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/big/rank/PageRank.java
new file mode 100644
index 0000000..1f2fc49
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/big/rank/PageRank.java
@@ -0,0 +1,120 @@
+package it.unimi.dsi.law.big.rank;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import java.io.IOException;
+
+import org.slf4j.Logger;
+
+import it.unimi.dsi.big.webgraph.ImmutableGraph;
+import it.unimi.dsi.bits.LongArrayBitVector;
+import it.unimi.dsi.fastutil.doubles.DoubleBigArrays;
+import it.unimi.dsi.fastutil.doubles.DoubleBigList;
+import it.unimi.dsi.fastutil.doubles.DoubleIterator;
+import it.unimi.dsi.util.Properties;
+
+
+// RELEASE-STATUS: DIST
+
+/** A big version of {@link it.unimi.dsi.law.rank.PageRank}.
+ *
+ * @see it.unimi.dsi.law.rank.PageRank
+ * @see SpectralRanking
+ */
+
+public abstract class PageRank extends SpectralRanking {
+ /** The default damping factor. */
+ public final static double DEFAULT_ALPHA = 0.85;
+
+ /** The damping factor. In the random surfer interpretation, this is the probability that the
+ * surfer will follow a link in the current page. */
+ public double alpha = DEFAULT_ALPHA;
+ /** The preference vector to be used (or {@code null} if the uniform preference vector should be used). */
+ public DoubleBigList preference;
+ /** The vector used to patch null rows of the adjacency matrix (<b><var>u</var></b> in the general formula).
+ * It coincides with the preference vector if {@link #stronglyPreferential} is true. If {@code null},
+ * the uniform distribution will be used. */
+ public DoubleBigList danglingNodeDistribution;
+ /** If not {@code null}, the set of buckets of {@link #graph}. */
+ public LongArrayBitVector buckets;
+ /** Decides whether we use the strongly or weakly (the default) preferential algorithm. */
+ public boolean stronglyPreferential;
+
+ /** Creates a new instance.
+ *
+ * @param g the graph.
+ * @param logger a logger.
+ */
+ public PageRank(final ImmutableGraph g, final Logger logger) {
+ super(g, logger);
+ }
+
+ /** Returns a {@link Properties} object that contains all the parameters used by the computation.
+ *
+ * @param graphBasename the basename of the graph.
+ * @param preferenceFilename the filename of the preference vector. It can be {@code null}.
+ * @param danglingFilename the filename of the dangling-node distribution. It can be {@code null}.
+ * @return a properties object that represents all the parameters used to calculate the rank.
+ */
+ public Properties buildProperties(final String graphBasename, final String preferenceFilename, final String danglingFilename) {
+ final Properties prop = super.buildProperties(graphBasename);
+ prop.setProperty("alpha", Double.toString(alpha));
+ prop.setProperty("norm", normDelta());
+ prop.setProperty("stronglypreferential", stronglyPreferential);
+ if (preferenceFilename != null) prop.setProperty("preferencefilename", preferenceFilename);
+ if (danglingFilename != null) prop.setProperty("danglingfilename", danglingFilename);
+ return prop;
+ }
+
+ /** Basic initialization: we log the damping factor, check that the preference vector is correctly sized and stochastic,
+ * fill {@link #rank} with the preference vector and set the dangling-node distribution
+ * depending on the value of {@link #stronglyPreferential}.
+ */
+ @Override
+ public void init() throws IOException {
+ super.init();
+ logger.info("Damping factor: " + alpha);
+
+ // Check the preference vector
+ if (preference != null) {
+ if (preference.size64() != n) throw new IllegalArgumentException("The preference vector size (" + preference.size64() + ") is different from graph dimension (" + n + ").");
+ if (! isStochastic(preference)) throw new IllegalArgumentException("The preference vector is not a stochastic vector.");
+ logger.info("Using a specified preference vector");
+ }
+ else logger.info("Using the uniform preference vector");
+
+ if (preference != null) {
+ final DoubleIterator iterator = preference.iterator();
+ for(int s = 0; s < rank.length; s++) {
+ final double[] t = rank[s];
+ final int l = t.length;
+ for(int d = 0; d < l; d++) t[d] = iterator.nextDouble();
+ }
+ }
+ else DoubleBigArrays.fill(rank, 1.0/n);
+
+ // Initialize the dangling-node distribution
+ if (stronglyPreferential) {
+ if (preference == null) throw new IllegalArgumentException("The strongly preferential flag is true but the preference vector has not been set.");
+ danglingNodeDistribution = preference;
+ }
+ else danglingNodeDistribution = null;
+ logger.info("Computing " + (stronglyPreferential ? "strongly" : "weakly") + " preferential PageRank");
+ }
+}
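The initialization above fixes two vectors. The preference vector (uniform when none is given) seeds the rank. The dangling-node distribution is the preference vector when `stronglyPreferential` is set, and uniform otherwise. The following minimal sketch shows where each vector enters a plain power-method iteration. It assumes a tiny in-memory graph given as successor arrays, not an `ImmutableGraph`. The class `PreferentialPageRankSketch` and its methods are hypothetical and are not part of the LAW API.

```java
import java.util.Arrays;

/** Hypothetical sketch (not LAW code): power-method PageRank on a small in-memory graph. */
final class PreferentialPageRankSketch {
    /** Plays the same role as SpectralRanking.STOCHASTIC_TOLERANCE. */
    static final double STOCHASTIC_TOLERANCE = 1E-6;

    /** Returns true if v is nonnegative and sums to 1 within the tolerance. */
    static boolean isStochastic(final double[] v) {
        double sum = 0;
        for (final double x : v) {
            if (x < 0) return false;
            sum += x;
        }
        return Math.abs(sum - 1) <= STOCHASTIC_TOLERANCE;
    }

    static double[] uniform(final int n) {
        final double[] u = new double[n];
        Arrays.fill(u, 1.0 / n);
        return u;
    }

    /**
     * @param succ succ[i] lists the successors of node i.
     * @param preference the preference vector, or null for the uniform one.
     * @param stronglyPreferential if true, dangling rank is redistributed following the preference vector.
     */
    static double[] pageRank(final int[][] succ, final double alpha, final double[] preference,
            final boolean stronglyPreferential, final int iterations) {
        final int n = succ.length;
        if (preference != null) {
            if (preference.length != n) throw new IllegalArgumentException("Preference vector size differs from graph size");
            if (!isStochastic(preference)) throw new IllegalArgumentException("Preference vector is not stochastic");
        }
        if (stronglyPreferential && preference == null) throw new IllegalArgumentException("Strongly preferential requires a preference vector");

        final double[] v = preference != null ? preference : uniform(n);
        // Dangling-node distribution: the preference vector if strongly preferential, uniform otherwise.
        final double[] u = stronglyPreferential ? v : uniform(n);
        double[] rank = v.clone(); // the rank is seeded with the preference vector, as in init()

        for (int it = 0; it < iterations; it++) {
            final double[] next = new double[n];
            double danglingRank = 0;
            for (int i = 0; i < n; i++) {
                if (succ[i].length == 0) danglingRank += rank[i];
                else for (final int j : succ[i]) next[j] += rank[i] / succ[i].length;
            }
            for (int j = 0; j < n; j++) next[j] = alpha * (next[j] + danglingRank * u[j]) + (1 - alpha) * v[j];
            rank = next;
        }
        return rank;
    }
}
```

Take the two-node graph 0 → 1 with node 1 dangling and alpha = 0.85. With the uniform preference, node 0 converges to 0.5 / 1.425 ≈ 0.351. With preference {1, 0} and the strongly preferential flag, all dangling rank returns to node 0, which converges to 1 / 1.85 ≈ 0.541.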
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/big/rank/PageRankParallelGaussSeidel.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/big/rank/PageRankParallelGaussSeidel.java
new file mode 100644
index 0000000..7a6bbcb
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/big/rank/PageRankParallelGaussSeidel.java
@@ -0,0 +1,463 @@
+package it.unimi.dsi.law.big.rank;
+
+/*
+ * Copyright (C) 2011-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import java.io.DataInput;
+import java.io.IOException;
+import java.util.concurrent.CyclicBarrier;
+import java.util.concurrent.atomic.AtomicLong;
+
+import org.apache.commons.configuration.ConfigurationException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+import it.unimi.dsi.big.webgraph.ImmutableGraph;
+import it.unimi.dsi.big.webgraph.NodeIterator;
+import it.unimi.dsi.bits.LongArrayBitVector;
+import it.unimi.dsi.fastutil.bytes.ByteBigArrays;
+import it.unimi.dsi.fastutil.doubles.DoubleBigArrayBigList;
+import it.unimi.dsi.fastutil.doubles.DoubleBigArrays;
+import it.unimi.dsi.fastutil.doubles.DoubleBigList;
+import it.unimi.dsi.fastutil.doubles.DoubleIterator;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.longs.LongBigArrays;
+import it.unimi.dsi.law.util.KahanSummation;
+import it.unimi.dsi.logging.ProgressLogger;
+
+// RELEASE-STATUS: DIST
+
+/** A big version of {@link it.unimi.dsi.law.rank.PageRankParallelGaussSeidel}.
+ * @see it.unimi.dsi.law.rank.PageRankParallelGaussSeidel
+ * @see PageRank
+ * @see SpectralRanking
+ *
+ * @author Sebastiano Vigna
+ */
+
+public class PageRankParallelGaussSeidel extends PageRank {
+ private final static Logger LOGGER = LoggerFactory.getLogger(PageRankParallelGaussSeidel.class);
+
+ /** A progress logger monitoring each iteration. */
+ private final ProgressLogger progressLogger;
+ /** A progress logger monitoring the iterations. */
+ private final ProgressLogger iterationLogger;
+ /** The number of threads. */
+ private final int numberOfThreads;
+ /** The next node to be picked. */
+ private final AtomicLong nextNode;
+ /** The rank lost through dangling nodes, accumulated incrementally. */
+ private double danglingRankAccumulator;
+ /** The amount of ranking in dangling nodes computed at the previous iteration. */
+ private double danglingRank;
+ /** The &#x2113;<sub>1</sub> norm of the difference between the new approximation and the previous one,
+ * if {@link #normVector} is {@code null}; the {@link #normVector}-weighted supremum norm of the same vector, otherwise. */
+ private double normDelta;
+ /** The outdegree of each node (initialized after the first computation). */
+ public long[][] outdegree;
+ /** If true, the computation is over. */
+ private volatile boolean completed;
+ /** The barrier used to synchronize threads. */
+ private volatile CyclicBarrier barrier;
+ /** Keeps track of problems in threads. */
+ private volatile Throwable threadThrowable;
+ /** A big array of bytes containing the opposite of a lower bound on the binary logarithm of the elements of a norm vector, or {@code null} to stop the computation using residue estimation. */
+ private byte[][] normVector;
+ /** The value for which {@link #normVector} is suitable. */
+ private double sigma;
+ /** If true, an everywhere zero dangling-node distribution will be simulated, resulting in the computation of a pseudorank. */
+ public boolean pseudoRank;
+
+
+ /** Creates a new instance.
+ *
+ * @param transpose the transpose of the graph on which to compute PageRank.
+ * @param requestedThreads the requested number of threads (0 for {@link Runtime#availableProcessors()}).
+ * @param logger a logger that will be passed to <code>super()</code>.
+ */
+ public PageRankParallelGaussSeidel(final ImmutableGraph transpose, final int requestedThreads, final Logger logger) {
+ super(transpose, logger);
+ progressLogger = new ProgressLogger(logger, "nodes");
+ iterationLogger = new ProgressLogger(logger, "iterations");
+ numberOfThreads = requestedThreads != 0 ? requestedThreads : Runtime.getRuntime().availableProcessors();
+ nextNode = new AtomicLong();
+ }
+
+ /** Creates a new instance.
+ *
+ * @param transpose the transpose of the graph on which to compute PageRank.
+ */
+ public PageRankParallelGaussSeidel(final ImmutableGraph transpose) {
+ this(transpose, 0, LOGGER);
+ }
+
+ /** Creates a new instance.
+ *
+ * @param transpose the transpose of the graph on which to compute PageRank.
+ * @param logger a logger that will be passed to <code>super()</code>.
+ */
+ public PageRankParallelGaussSeidel(final ImmutableGraph transpose, final Logger logger) {
+ this(transpose, 0, logger);
+ }
+
+ /** Sets the norm vector.
+ *
+ * @param normVectorFilename a file containing a norm vector as a list of doubles in {@link DataInput} format, or {@code null} for no norm vector.
+ * @param sigma the value for which the provided norm vector is suitable.
+ */
+ public void normVector(final String normVectorFilename, final double sigma) throws IOException {
+ normVector = normVectorFilename == null ? null : approximateNormVector(BinIO.asDoubleIterator(normVectorFilename));
+ this.sigma = sigma;
+ }
+
+ /** Sets the norm vector.
+ *
+ * @param normVector the new norm vector.
+ * @param sigma the value for which the provided norm vector is suitable.
+ */
+ public void normVector(final double[][] normVector, final double sigma) {
+ final DoubleIterator it = (new DoubleBigArrayBigList(normVector)).listIterator();
+ this.normVector = approximateNormVector(it);
+ this.sigma = sigma;
+ }
+
+ @Override
+ public void init() throws IOException {
+ super.init();
+
+ if (normVector != null) {
+ if (! pseudoRank) throw new IllegalStateException("Norm vectors can be used only when computing pseudoranks");
+ if (alpha >= 1 / sigma) throw new IllegalStateException("The specified norm vector can be used only with values of alpha smaller than " + 1 / sigma);
+ }
+
+ if (outdegree == null) {
+ // We allocate and compute the outdegree vector.
+ outdegree = LongBigArrays.newBigArray(n);
+ // TODO: refactor using .outdegrees()
+ progressLogger.expectedUpdates = n;
+ progressLogger.start("Computing outdegrees...");
+
+ final NodeIterator nodeIterator = graph.nodeIterator();
+ for (long i = n; i-- != 0;) {
+ nodeIterator.nextLong();
+ final long[][] pred = nodeIterator.successorBigArray();
+ for (long d = nodeIterator.outdegree(); d-- != 0;) {
+ LongBigArrays.incr(outdegree, LongBigArrays.get(pred, d));
+ }
+ progressLogger.lightUpdate();
+ }
+
+ progressLogger.done();
+ }
+
+ progressLogger.expectedUpdates = n;
+ progressLogger.start("Computing initial dangling rank...");
+
+ danglingRank = 0;
+ /* The number of dangling nodes. */
+ long danglingNodes = 0;
+ for (long i = n; i-- != 0;) {
+ final long o = LongBigArrays.get(outdegree, i);
+ if (o == 0 || buckets != null && buckets.getBoolean(i)) {
+ danglingRank += DoubleBigArrays.get(rank, i);
+ if (LongBigArrays.get(outdegree, i) == 0) danglingNodes++;
+ }
+ }
+
+ progressLogger.done();
+ logger.info(danglingNodes + " dangling nodes");
+ if (buckets != null) logger.info(buckets.count() + " buckets");
+ logger.info("Initial dangling rank: " + danglingRank);
+
+ normDelta = danglingRankAccumulator = 0;
+ completed = false;
+ logger.info("Completed.");
+ iterationLogger.start();
+ }
+
+ private final class IterationThread extends Thread {
+ private static final long GRANULARITY = 10000;
+
+ @Override
+ public void run() {
+ try {
+ // We cache frequently used fields.
+ final ImmutableGraph graph = PageRankParallelGaussSeidel.this.graph.copy();
+ final long n = PageRankParallelGaussSeidel.this.n;
+ final double oneMinusAlpha = 1 - alpha;
+ final double oneMinusAlphaOverN = oneMinusAlpha / n;
+ final double[][] rank = PageRankParallelGaussSeidel.this.rank;
+ final long[][] outdegree = PageRankParallelGaussSeidel.this.outdegree;
+ final LongArrayBitVector buckets = PageRankParallelGaussSeidel.this.buckets;
+ final boolean pseudoRank = PageRankParallelGaussSeidel.this.pseudoRank;
+ final double alpha = PageRankParallelGaussSeidel.this.alpha;
+ final DoubleBigList danglingNodeDistribution = PageRankParallelGaussSeidel.this.danglingNodeDistribution;
+ final DoubleBigList preference = PageRankParallelGaussSeidel.this.preference;
+ final KahanSummation s = new KahanSummation();
+
+ for(;;) {
+ barrier.await();
+ if (completed) return;
+ final double danglingRank = PageRankParallelGaussSeidel.this.danglingRank;
+
+ for(;;) {
+ // Try to get another piece of work.
+ final long start = nextNode.getAndAdd(GRANULARITY);
+ if (start >= n) {
+ nextNode.getAndAdd(-GRANULARITY);
+ break;
+ }
+
+ final long end = (Math.min(n, start + GRANULARITY));
+
+ // for each node, enumerate predecessors and compute an updated value
+ double danglingRankAccumulator = 0, norm = 0;
+ final NodeIterator nodeIterator = graph.nodeIterator(start);
+
+ for (long i = start; i < end; i++) {
+ nodeIterator.nextLong();
+ s.reset();
+ boolean hasLoop = false;
+
+ //Determine the rank from all incoming real links except possibly self link.
+ final long[][] pred = nodeIterator.successorBigArray();
+ for(long indegree = nodeIterator.outdegree(); indegree-- != 0;) {
+ final long currPred = LongBigArrays.get(pred, indegree);
+ // Skip buckets
+ if (buckets != null && buckets.getBoolean(currPred)) continue;
+ if (i == currPred) hasLoop = true;
+ else s.add(DoubleBigArrays.get(rank, currPred) / LongBigArrays.get(outdegree, currPred));
+ }
+
+ double selfDanglingRank, selfLoopFactor;
+ //Determine the diagonal rank contribution
+ final long o = LongBigArrays.get(outdegree, i);
+ if (o == 0 || buckets != null && buckets.getBoolean(i)) { //i is a dangling node
+ selfDanglingRank = DoubleBigArrays.get(rank, i);
+ selfLoopFactor = pseudoRank ? 1 :
+ (danglingNodeDistribution != null) ? 1 - alpha * danglingNodeDistribution.getDouble(i)
+ : 1.0 - alpha / n;
+ } else {
+ selfDanglingRank = 0;
+ selfLoopFactor = hasLoop ? 1 - alpha / o : 1; // i is not dangling: compensate for its self-loop, if any
+ }
+
+ if (! pseudoRank) s.add(danglingNodeDistribution != null ? (danglingRank - selfDanglingRank) * danglingNodeDistribution.getDouble(i) : (danglingRank - selfDanglingRank) / n);
+
+ final double newRank = ((preference != null ? oneMinusAlpha * preference.getDouble(i) : oneMinusAlphaOverN) + alpha * s.value()) / selfLoopFactor;
+
+ if (LongBigArrays.get(outdegree, i) == 0 || buckets != null && buckets.getBoolean(i))
+ danglingRankAccumulator += newRank;
+
+ final double r = DoubleBigArrays.get(rank, i);
+ if (normVector != null) norm = Math.max(norm, Math.abs(newRank - r) * (1L << (0xFF & ByteBigArrays.get(normVector, i))));
+ else norm += Math.abs(newRank - r);
+
+ //update the rank
+ DoubleBigArrays.set(rank, i, newRank);
+ }
+
+ synchronized (progressLogger) {
+ progressLogger.update(end - start);
+ }
+
+ synchronized (PageRankParallelGaussSeidel.this) {
+ PageRankParallelGaussSeidel.this.danglingRankAccumulator += danglingRankAccumulator;
+ if (normVector != null) PageRankParallelGaussSeidel.this.normDelta = Math.max(PageRankParallelGaussSeidel.this.normDelta, norm);
+ else PageRankParallelGaussSeidel.this.normDelta += norm;
+ }
+ }
+ }
+ }
+ catch(final Throwable t) {
+ threadThrowable = t;
+ }
+ }
+ }
+
+ @Override
+ public void step() throws IOException {
+ throw new UnsupportedOperationException();
+ }
+
+ @Override
+ public void stepUntil(final StoppingCriterion stoppingCriterion) throws IOException {
+ init();
+ final IterationThread[] thread = new IterationThread[numberOfThreads];
+ for(int i = thread.length; i-- != 0;) thread[i] = new IterationThread();
+
+ barrier = new CyclicBarrier(numberOfThreads, () -> {
+ if (iteration > 0) {
+ progressLogger.done();
+ iterationLogger.setAndDisplay(iteration);
+
+ /*
+ // Compute the supremum norm of the residual
+ double res = 0;
+ double res1 = 0;
+ double err = 0;
+ final NodeIterator nodeIterator = graph.nodeIterator();
+ for(int i = 0; i < n ; i++) {
+ nodeIterator.nextInt();
+ double prod = 0;
+ final LazyIntIterator successors = nodeIterator.successors();
+ for(int s; (s = successors.nextInt()) != -1;) prod += rank[s] / outdegree[s];
+ final double pref = preference == null ? 1. / n : preference.getDouble(i);
+ final double delta = Math.abs(rank[i]
+ - alpha * prod
+ - alpha * danglingRankAccumulator * pref
+ - (1 - alpha) * pref);
+ if (res < delta) res = delta;
+ res1 += delta;
+ }
+
+ LOGGER.info("Supremum norm of the residual: " + res);
+ LOGGER.info("l_1 norm of the residual: " + res1);
+ LOGGER.info("Bound on the l_1 norm of the error: " + normDelta() / (1 - alpha));
+ LOGGER.info("Bound on the supremum norm of the error: " + (1 + alpha) * res / (1 - alpha));
+ LOGGER.info("Supremum norm of the error: " + err);
+ if (err > (1 + alpha) * res / (1 - alpha)) LOGGER.warn("Wrong bound on error");
+ if (res1 > normDelta()) LOGGER.warn("Wrong bound on residual: " + res1 + " > " + normDelta());
+ */
+
+ if (stoppingCriterion.shouldStop(PageRankParallelGaussSeidel.this)) {
+ completed = true;
+ return;
+ }
+
+ danglingRank = danglingRankAccumulator;
+ danglingRankAccumulator = 0;
+ }
+
+ normDelta = danglingRankAccumulator = 0;
+ nextNode.set(0);
+ progressLogger.expectedUpdates = n;
+ progressLogger.start("Iteration " + iteration++ + "...");
+ }
+ );
+
+ for(int i = thread.length; i-- != 0;) thread[i].start();
+ for(int i = thread.length; i-- != 0;)
+ try {
+ thread[i].join();
+ }
+ catch (final InterruptedException e) {
+ throw new RuntimeException(e);
+ }
+
+ if (threadThrowable != null) throw new RuntimeException(threadThrowable);
+ if (progressLogger != null) progressLogger.done();
+
+ iterationLogger.done();
+ }
+
+ /** Returns the following value: if a {@linkplain #normVector(double[][], double) suitable norm vector has been set}, an upper bound on the error (the weighted &#x2113;<sub>&#x221E;</sub> distance from the rank to be computed);
+ * otherwise, an upper bound on the &#x2113;<sub>1</sub> norm of the error, obtained by multiplying by
+ * &alpha; / (1 &minus; &alpha;) the &#x2113;<sub>1</sub> norm of the difference between the last two approximations (this idea arose in discussions with David Gleich).
+ *
+ * @return an upper bound on the error.
+ */
+ @Override
+ public double normDelta() {
+ return normVector == null ? normDelta * alpha / (1 - alpha) : (alpha * sigma) * normDelta / (1 - alpha * sigma);
+ }
+
+ @Override
+ public void clear() {
+ super.clear();
+ outdegree = null;
+ }
+
+ public static void main(final String[] arg) throws IOException, JSAPException, ConfigurationException, ClassNotFoundException {
+
+ final SimpleJSAP jsap = new SimpleJSAP(PageRankParallelGaussSeidel.class.getName(), "Computes PageRank of a graph, given its transpose, using a parallel implementation of Gauss-Seidel's method."
+ + " The file <rankBasename>.properties stores metadata about the computation, whereas the file <rankBasename>.ranks stores the result as a sequence of doubles in DataInput format.",
+ new Parameter[] {
+ new FlaggedOption("alpha", JSAP.DOUBLE_PARSER, Double.toString(PageRank.DEFAULT_ALPHA), JSAP.NOT_REQUIRED, 'a', "alpha", "Damping factor."),
+ new FlaggedOption("maxIter", JSAP.INTEGER_PARSER, Integer.toString(DEFAULT_MAX_ITER), JSAP.NOT_REQUIRED, 'i', "max-iter", "Maximum number of iterations."),
+ new FlaggedOption("threshold", JSAP.DOUBLE_PARSER, Double.toString(DEFAULT_THRESHOLD), JSAP.NOT_REQUIRED, 't', "threshold", "Threshold (in l_1 norm, if no norm vector has been specified; in the weighted supremum norm otherwise) to determine whether to stop."),
+ new FlaggedOption("preferenceVector", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'p', "preference-vector", "A preference vector stored as a vector of binary doubles."),
+ new FlaggedOption("preferenceObject", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'P', "preference-object", "A preference vector stored as a serialised DoubleBigList."),
+ new Switch("pseudoRank", JSAP.NO_SHORTFLAG, "pseudorank", "Compute pseudoranks (the dangling preference is set to 0)."),
+ new FlaggedOption("normVector", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'n', "norm-vector", "A vector inducing the correct weighted supremum norm."),
+ new FlaggedOption("sigma", JSAP.DOUBLE_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 's', "sigma", "The value for which the norm vector is suitable (i.e., the maximum ratio from its properties)."),
+ new FlaggedOption("buckets", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'b', "buckets", "The buckets of the graph; if supplied, buckets will be treated as dangling nodes."),
+ new Switch("mapped", 'm', "mapped", "Use loadMapped() to load the graph."),
+ new Switch("strongly", 'S', "strongly", "Use the preference vector to redistribute the dangling rank."),
+ new FlaggedOption("threads", JSAP.INTSIZE_PARSER, "0", JSAP.NOT_REQUIRED, 'T', "threads", "The number of threads to be used. If 0, the number will be estimated automatically."),
+ new UnflaggedOption("transposeBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the transpose of the graph."),
+ new UnflaggedOption("rankBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the files where the resulting rank (doubles in binary form) and its properties are stored.")
+ }
+ );
+
+ final JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final boolean mapped = jsapResult.getBoolean("mapped", false);
+ final boolean strongly = jsapResult.getBoolean("strongly", false);
+ final String graphBasename = jsapResult.getString("transposeBasename");
+ final String rankBasename = jsapResult.getString("rankBasename");
+ final String normVectorFilename = jsapResult.getString("normVector");
+ if (normVectorFilename != null && ! jsapResult.userSpecified("sigma")) throw new IllegalArgumentException("You must specify the sigma for which the norm vector is suitable");
+ final String buckets = jsapResult.getString("buckets");
+ final int threads = jsapResult.getInt("threads");
+ final ProgressLogger progressLogger = new ProgressLogger(LOGGER, "nodes");
+
+ final ImmutableGraph graph = mapped? ImmutableGraph.loadMapped(graphBasename, progressLogger) : ImmutableGraph.load(graphBasename, progressLogger);
+
+ DoubleBigList preference = null;
+ String preferenceFilename = null;
+ if (jsapResult.userSpecified("preferenceVector")) {
+ preferenceFilename = jsapResult.getString("preferenceVector");
+ final double[][] pref = DoubleBigArrays.newBigArray(graph.numNodes());
+ BinIO.loadDoubles(preferenceFilename, pref);
+ preference = new DoubleBigArrayBigList(pref);
+ }
+
+ if (jsapResult.userSpecified("preferenceObject")) {
+ if (jsapResult.userSpecified("preferenceVector")) throw new IllegalArgumentException("You cannot specify the preference vector twice");
+ preference = (DoubleBigList)BinIO.loadObject(preferenceFilename = jsapResult.getString("preferenceObject"));
+ }
+
+ if (strongly && preference == null) throw new IllegalArgumentException("The 'strongly' option requires a preference vector");
+
+ final PageRankParallelGaussSeidel pr = new PageRankParallelGaussSeidel(graph, threads, LOGGER);
+ pr.alpha = jsapResult.getDouble("alpha");
+ pr.preference = preference;
+ pr.buckets = (LongArrayBitVector)(buckets == null ? null : BinIO.loadObject(buckets));
+ pr.stronglyPreferential = strongly;
+ pr.pseudoRank = jsapResult.userSpecified("pseudoRank");
+ if (normVectorFilename != null) pr.normVector(normVectorFilename, jsapResult.getDouble("sigma"));
+
+ // cycle until we reach maxIter iterations or the norm is less than the given threshold (whichever comes first)
+ pr.stepUntil(or(new SpectralRanking.NormStoppingCriterion(jsapResult.getDouble("threshold")), new SpectralRanking.IterationNumberStoppingCriterion(jsapResult.getInt("maxIter"))));
+
+ BinIO.storeDoubles(pr.rank, rankBasename + ".ranks");
+ pr.buildProperties(graphBasename, preferenceFilename, null).save(rankBasename + ".properties");
+ }
+}
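The core of `IterationThread` is a Gauss-Seidel sweep with three features:

- Each node's new rank is computed from predecessor ranks that may already have been updated in the current sweep.
- The rank a node would send to itself, either as a dangling node or through a self-loop, is moved to the left-hand side via `selfLoopFactor`.
- The dangling rank is taken from the previous sweep.

The following single-threaded sketch reproduces that update and the alpha / (1 - alpha) stopping bound of `normDelta()`. It assumes no buckets, no preference vector, a uniform dangling-node distribution and no norm vector. The class `GaussSeidelPageRankSketch` is hypothetical, and the transposed graph is given as plain predecessor arrays rather than an `ImmutableGraph`.

```java
import java.util.Arrays;

/** Hypothetical single-threaded sketch of the Gauss-Seidel sweep in PageRankParallelGaussSeidel. */
final class GaussSeidelPageRankSketch {
    /**
     * @param pred pred[i] lists the predecessors of node i (its successors in the transpose).
     * @return PageRank with uniform preference and uniform dangling-node distribution.
     */
    static double[] pageRank(final int[][] pred, final double alpha, final double threshold, final int maxIter) {
        final int n = pred.length;
        // Outdegrees in the original graph, computed by scanning the transpose (as init() does).
        final int[] outdegree = new int[n];
        for (final int[] p : pred) for (final int j : p) outdegree[j]++;

        final double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        double danglingRank = 0;
        for (int i = 0; i < n; i++) if (outdegree[i] == 0) danglingRank += rank[i];

        for (int iteration = 0; iteration < maxIter; iteration++) {
            double danglingRankAccumulator = 0, normDelta = 0;
            for (int i = 0; i < n; i++) {
                double s = 0;
                boolean hasLoop = false;
                for (final int j : pred[i]) {
                    if (j == i) hasLoop = true; // the self-loop is moved to the left-hand side
                    else s += rank[j] / outdegree[j]; // rank[j] may already hold this sweep's value
                }
                final boolean dangling = outdegree[i] == 0;
                final double selfDanglingRank = dangling ? rank[i] : 0;
                final double selfLoopFactor = dangling ? 1 - alpha / n : hasLoop ? 1 - alpha / outdegree[i] : 1;
                // Dangling rank of the previous sweep, minus i's own share (handled by selfLoopFactor).
                s += (danglingRank - selfDanglingRank) / n;
                final double newRank = ((1 - alpha) / n + alpha * s) / selfLoopFactor;
                if (dangling) danglingRankAccumulator += newRank;
                normDelta += Math.abs(newRank - rank[i]);
                rank[i] = newRank; // in-place update: later nodes in this sweep see the new value
            }
            danglingRank = danglingRankAccumulator;
            // Stop when the l1 error bound alpha / (1 - alpha) * delta falls below the threshold.
            if (normDelta * alpha / (1 - alpha) < threshold) break;
        }
        return rank;
    }
}
```

The parallel version splits each sweep into chunks of `GRANULARITY` nodes handed out through an `AtomicLong`. Threads meet at a `CyclicBarrier` whose barrier action checks the stopping criterion. In this sketch, that check is the `break` at the end of each sweep.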
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/big/rank/SpectralRanking.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/big/rank/SpectralRanking.java
new file mode 100644
index 0000000..2147674
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/big/rank/SpectralRanking.java
@@ -0,0 +1,253 @@
+package it.unimi.dsi.law.big.rank;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import java.io.IOException;
+
+import org.slf4j.Logger;
+
+import it.unimi.dsi.big.webgraph.ImmutableGraph;
+import it.unimi.dsi.fastutil.bytes.ByteBigArrays;
+import it.unimi.dsi.fastutil.doubles.DoubleBigArrays;
+import it.unimi.dsi.fastutil.doubles.DoubleBigList;
+import it.unimi.dsi.fastutil.doubles.DoubleIterator;
+import it.unimi.dsi.fastutil.doubles.DoubleList;
+import it.unimi.dsi.law.util.Norm;
+import it.unimi.dsi.util.Properties;
+
+
+// RELEASE-STATUS: DIST
+
+/** A big version of {@link it.unimi.dsi.law.rank.SpectralRanking}.
+ * @see it.unimi.dsi.law.rank.SpectralRanking
+ * @author Sebastiano Vigna
+ */
+
+public abstract class SpectralRanking {
+ /** Default threshold (note that this value is used as a default by main methods). */
+ public final static double DEFAULT_THRESHOLD = 1E-6;
+ /** Default maximum number of iterations (note that this value is used as a default by main methods). */
+ public final static int DEFAULT_MAX_ITER = Integer.MAX_VALUE;
+ /** The default norm ({@link Norm#L_INFINITY}). */
+ public final static Norm DEFAULT_NORM = Norm.L_INFINITY;
+ /** The admitted tolerance in the {@linkplain #isStochastic(DoubleBigList) verification that a vector is a stochastic one}.
+ * A stochastic vector is nonnegative and has &#x2113;<sub>1</sub> norm equal to 1 &plusmn; {@link #STOCHASTIC_TOLERANCE}. */
+ protected final static double STOCHASTIC_TOLERANCE = 1E-6;
+
+ /** The graph. */
+ public final ImmutableGraph graph;
+ /** The number of nodes of {@link #graph}, cached. */
+ public final long n;
+ /** A logger defined by the implementing subclasses. */
+ public final Logger logger;
+ /** The current rank vector. */
+ public double[][] rank;
+ /** The current step (0 after {@linkplain #init() initialization}). */
+ public int iteration;
+
+ /** Creates a new instance.
+ *
+ * @param graph the graph.
+ * @param logger a logger.
+ */
+ public SpectralRanking(final ImmutableGraph graph, final Logger logger) {
+ this.graph = graph;
+ this.logger = logger;
+ this.n = graph.numNodes();
+ logger.info("Nodes: " + n);
+ }
+
+
+ /** A strategy that decides when a computation should be stopped. */
+ public interface StoppingCriterion {
+ /** Determines if the computation should be stopped.
+ *
+ * @param spectralRanking the instance incapsulating the computation.
+ * @return true if the computation should be stopped.
+ */
+ public boolean shouldStop(SpectralRanking spectralRanking);
+ };
+
+ /** A stopping criterion that stops whenever the number of iterations exceeds a given bound. */
+ public static class IterationNumberStoppingCriterion implements StoppingCriterion {
+ private final int maxIter;
+ /** Creates an instance with a given number of iterations.
+ *
+ * @param maxIter the maximum number of iterations.
+ */
+ public IterationNumberStoppingCriterion(final int maxIter) {
+ this.maxIter = maxIter;
+ }
+
+ @Override
+ public boolean shouldStop(final SpectralRanking spectralRanking) {
+ // If maxIter is Integer.MAX_VALUE, the number of iterations is unbounded: never stop.
+ if (maxIter == Integer.MAX_VALUE) return false;
+ spectralRanking.logger.info("Iterations performed: " + spectralRanking.iteration + " (will stop after " + maxIter + ")");
+ return spectralRanking.iteration >= maxIter;
+ }
+ }
+
+ /** A stopping criterion that evaluates {@link SpectralRanking#normDelta()}, and stops
+ * if this value is smaller than a given threshold.
+ *
+ * <p>Note that this criterion assumes {@link SpectralRanking#normDelta()} has been properly implemented.
+ */
+ public static class NormStoppingCriterion implements StoppingCriterion {
+ private final double threshold;
+
+ /** Creates an instance with given threshold.
+ *
+ * @param threshold the threshold.
+ */
+ public NormStoppingCriterion(final double threshold) {
+ this.threshold = threshold;
+ }
+
+ @Override
+ public boolean shouldStop(final SpectralRanking spectralRanking) {
+ spectralRanking.logger.info("Current norm delta: " + spectralRanking.normDelta() + " (will stop below " + threshold + ")");
+ return spectralRanking.normDelta() < threshold;
+ }
+ }
+
+ /** Composes two stopping criteria, producing a single stopping criterion (the computation stops iff both
+ * conditions become true; lazy boolean evaluation is applied).
+ *
+ * @param stop1 a stopping criterion.
+ * @param stop2 a stopping criterion.
+ * @return a criterion that decides to stop as soon as both criteria are satisfied.
+ */
+ public static StoppingCriterion and(final StoppingCriterion stop1, final StoppingCriterion stop2) {
+ return p -> stop1.shouldStop(p) && stop2.shouldStop(p);
+ }
+
+ /** Composes two stopping criteria, producing a single stopping criterion (the computation stops iff either
+ * condition becomes true; lazy boolean evaluation is applied).
+ *
+ * @param stop1 a stopping criterion.
+ * @param stop2 a stopping criterion.
+ * @return a criterion that decides to stop as soon as one of the two criteria is satisfied.
+ */
+ public static StoppingCriterion or(final StoppingCriterion stop1, final StoppingCriterion stop2) {
+ return p -> stop1.shouldStop(p) || stop2.shouldStop(p);
+ }
+
+ /** Convenience method checking whether a vector is stochastic (nonnegative entries summing up to one within {@link #STOCHASTIC_TOLERANCE}).
+ *
+ * <p>This method uses <a href="http://en.wikipedia.org/wiki/Kahan_summation_algorithm">Kahan's summation algorithm</a>.
+ *
+ * @param v the vector to check.
+ * @return true if the vector is stochastic.
+ */
+ protected static boolean isStochastic(final DoubleBigList v) {
+ double normL1 = 0.0, c = 0.0, t, y;
+ long i;
+ //Kahan method to minimize the round errors in doubles sum.
+ for (i = v.size64(); i-- != 0 && v.getDouble(i) >= 0;) {
+ y = v.getDouble(i) - c;
+ t = (normL1 + y);
+ c = (t - normL1) - y;
+ normL1 = t;
+ }
+ return (i == -1 && Math.abs(normL1 - 1.0) <= STOCHASTIC_TOLERANCE);
+ }
+
+ /** Returns a {@link Properties} object that contains all parameters used by the computation.
+ *
+ * <p>Implementing subclasses should extend this method by calling <code>super.buildProperties()</code>
+ * and setting additional properties on the resulting {@link Properties}.
+ *
+ * @param graphBasename basename of the graph
+ * @return a properties object that represents all the parameters used to compute the ranking.
+ */
+ public Properties buildProperties(final String graphBasename) {
+ final Properties prop = new Properties();
+ prop.setProperty("iterations", iteration);
+ prop.setProperty("normdelta", Double.toString(normDelta()));
+ prop.setProperty("nodes", n);
+ prop.setProperty("graph", graphBasename);
+ return prop;
+ }
+
+ /** Initializes the rank vector, zeroes {@link #iteration} and logs basic data. Please extend this method to handle additional attributes. */
+ @SuppressWarnings("unused")
+ public void init() throws IOException {
+ logger.info("Initializing...");
+ iteration = 0;
+ // Creates the array, if necessary
+ if (rank == null) rank = DoubleBigArrays.newBigArray(n);
+ }
+
+
+ /** Performs one computation step. */
+ public abstract void step() throws IOException;
+
+ /** Returns the norm of an estimation of the distance to the limit of the iterative process: depending
+ * on the implementation, this can be an actual bound or, for example, just the difference between the
+ * last two approximations.
+ *
+ * <p>This method must be implemented by concrete subclasses if you want to use {@link NormStoppingCriterion}.
+ *
+ * @return the norm of an estimation of the distance to the limit.
+ * @throws IllegalStateException if called before the first iteration.
+ * @throws UnsupportedOperationException if it is not possible to compute a norm.
+ */
+ public double normDelta() {
+ throw new UnsupportedOperationException();
+ }
+
+ /** Calls {@link #init()} and steps until a given stopping criterion is met.
+ * The criterion is checked <i>a posteriori</i> (i.e., after each step); this means that
+ * at least one step is performed.
+ *
+ * @param stoppingCriterion the stopping criterion to be used.
+ */
+ public void stepUntil(final StoppingCriterion stoppingCriterion) throws IOException {
+ init();
+ do step(); while (!stoppingCriterion.shouldStop(this));
+ }
+
+ /** Clears all data and releases resources by nulling {@link #rank} (i.e., results will no longer be available).
+ * Please extend this method to handle additional attributes. */
+ public void clear() {
+ rank = null;
+ }
+
+ /** Returns a compact logarithmic approximation of a norm vector.
+ *
+ * @param doubleIterator an iterator enumerating a norm vector.
+ * @return an array of bytes containing the opposite of a lower bound on the binary logarithm of the doubles returned by the iterator.
+ */
+ protected byte[][] approximateNormVector(final DoubleIterator doubleIterator) {
+ final byte[][] normVector = ByteBigArrays.newBigArray(n);
+
+ for (long i = 0; i < n; i++) {
+ final double e = doubleIterator.nextDouble();
+ if (e == 0) throw new IllegalArgumentException("A norm vector cannot contain zeroes");
+ if (e > 1) throw new IllegalArgumentException("The norm vector contains an entry larger than one: " + e);
+ final int approx = (int)Math.ceil(- Math.log(e) / Math.log(2));
+ if (approx > 62) throw new IllegalArgumentException("The norm vector has an entry smaller than 1/2^62 (" + e + ")");
+ ByteBigArrays.set(normVector, i, (byte) approx);
+ }
+
+ return normVector;
+ }
+
+}
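`approximateNormVector()` stores one byte for each entry e of the norm vector: ceil(-log2 e). That byte is the opposite of a lower bound on log2 e, so 2^(-byte) ≤ e. The parallel Gauss-Seidel thread then multiplies each rank difference by `1L << byte`, which is at least 1/e, to obtain the weighted supremum norm. `normDelta()` turns that norm into the error bound alpha·sigma / (1 - alpha·sigma) times the norm. The following sketch mirrors these steps on plain `double[]` arrays instead of big arrays. `NormVectorSketch` and its methods are hypothetical names.

```java
/** Hypothetical sketch of the norm-vector approximation and its use as a weighted supremum norm. */
final class NormVectorSketch {
    /** For each entry e in (0..1], stores ceil(-log2 e), the opposite of a lower bound on log2 e. */
    static byte[] approximate(final double[] normVector) {
        final byte[] approx = new byte[normVector.length];
        for (int i = 0; i < normVector.length; i++) {
            final double e = normVector[i];
            if (e == 0) throw new IllegalArgumentException("A norm vector cannot contain zeroes");
            if (e > 1) throw new IllegalArgumentException("Entry larger than one: " + e);
            final int a = (int)Math.ceil(-Math.log(e) / Math.log(2));
            if (a > 62) throw new IllegalArgumentException("Entry smaller than 1/2^62: " + e);
            approx[i] = (byte)a;
        }
        return approx;
    }

    /** Weighted supremum norm of x - y: each difference is multiplied by 2^approx[i] >= 1 / e_i. */
    static double weightedSupNorm(final double[] x, final double[] y, final byte[] approx) {
        double norm = 0;
        for (int i = 0; i < x.length; i++) norm = Math.max(norm, Math.abs(x[i] - y[i]) * (1L << (0xFF & approx[i])));
        return norm;
    }

    /** The bound returned by normDelta() when a norm vector suitable for sigma has been set. */
    static double errorBound(final double normDelta, final double alpha, final double sigma) {
        return alpha * sigma * normDelta / (1 - alpha * sigma);
    }
}
```

Storing a single byte per node keeps the norm vector compact on large graphs. Rounding the weight up to a power of two keeps the resulting bound conservative.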
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/bubing/util/BURL.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/bubing/util/BURL.java
new file mode 100644
index 0000000..120afe7
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/bubing/util/BURL.java
@@ -0,0 +1,399 @@
+package it.unimi.dsi.law.bubing.util;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.nio.ByteBuffer;
+import java.nio.CharBuffer;
+import java.util.Arrays;
+
+import org.apache.commons.lang.ArrayUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Charsets;
+import com.google.common.primitives.Bytes;
+
+import it.unimi.dsi.lang.MutableString;
+
+// RELEASE-STATUS: DIST
+
+/** Static methods to manipulate normalized, canonical URLs.
+ *
+ * @deprecated Use <a href="http://law.di.unimi.it/software/bubing-docs/it/unimi/di/law/bubing/util/BURL.html">BUbiNG's BURL</a>.
+ */
+@Deprecated
+public final class BURL {
+ private static final Logger LOGGER = LoggerFactory.getLogger(BURL.class);
+
+ private static final boolean DEBUG = false;
+
+ /** Characters that will cause a URI spec to be rejected. */
+ public static final char[] FORBIDDEN_CHARS = { '\n', '\r' };
+
+ /** A list of bad characters. It includes the backslash, replaced by the slash, and illegal characters
+ * such as spaces and braces, which are replaced by the equivalent percent escape. Square brackets
+ * are percent-escaped, too, albeit legal in some circumstances, as they appear frequently in paths. */
+ public static final char[] BAD_CHAR = new char[] { '\\', ' ', '\t', '[', ']', '"', '|', '{', '}', '^', '<', '>', '`' };
+ /** Substitutes for {@linkplain #BAD_CHAR bad characters}. */
+ public static final String[] BAD_CHAR_SUBSTITUTE = new String[BAD_CHAR.length];
+
+ static {
+ BAD_CHAR_SUBSTITUTE[0] = "/";
+ for(int i = BAD_CHAR.length; i-- != 1;) BAD_CHAR_SUBSTITUTE[i] = (BAD_CHAR[i] < 16 ? "%0" : "%") + Integer.toHexString(BAD_CHAR[i]);
+ }
+
+ private BURL() {}
+
+ /** Creates a new BUbiNG URL from a string specification if possible, or returns {@code null} otherwise.
+ *
+ * @param spec the string specification for a URL.
+ * @return a BUbiNG URL corresponding to <code>spec</code> with the fragment (if any) removed, or {@code null} if <code>spec</code> is malformed.
+ * @see #parse(MutableString)
+ */
+
+ public static URI parse(final String spec) {
+ return parse(new MutableString(spec));
+ }
+
+ /** Creates a new BUbiNG URL from a {@linkplain MutableString mutable string}
+ * specification if possible, or returns {@code null} otherwise.
+ *
+ * <p>The conditions for this method not returning {@code null} are as follows:
+ * <ul>
+ * <li><code>spec</code>, once trimmed, must not contain characters in {@link #FORBIDDEN_CHARS};
+ * <li>once characters in {@link #BAD_CHAR} have been substituted with the corresponding
+ * strings in {@link #BAD_CHAR_SUBSTITUTE}, and percent signs not followed by two hexadecimal
+ * digits have been substituted by <code>%25</code>, turning <code>spec</code>
+ * {@linkplain URI#URI(java.lang.String) into a URI} must not throw an exception.
+ * <li>the {@link URI} instance so obtained must not be {@linkplain URI#isOpaque() opaque}.
+ * <li>the {@link URI} instance so obtained, if {@linkplain URI#isAbsolute() absolute},
+ * must have a non-{@code null} {@linkplain URI#getAuthority() authority}.
+ * </ul>
+ *
+ * <p>For efficiency, this method modifies the provided specification,
+ * and in particular it makes it {@linkplain MutableString#loose() loose}. <i>Caveat emptor</i>.
+ *
+ * <p>Fragments are removed (for a web crawler fragments are just noise). {@linkplain URI#normalize() Normalization}
+ * is applied for you. Scheme and host name are downcased. If the URL has a host name, it is guaranteed
+ * that the path is non-{@code null} and non-empty (by adding a slash, if necessary). If
+ * the host name ends with a dot, the dot is removed.
+ *
+ * @param spec the string specification for a URL; <strong>it can be modified by this method</strong>, and
+ * in particular it will always be made {@linkplain MutableString#loose() loose}.
+ * @return a BUbiNG URL corresponding to <code>spec</code> with the fragment (if any) removed, or {@code null} if
+ * <code>spec</code> is malformed.
+ * @see #parse(String)
+ */
+
+ public static URI parse(final MutableString spec) {
+
+ if (DEBUG) LOGGER.debug("parse(" + spec + ")");
+ spec.loose().trim();
+ if (spec.indexOfAnyOf(FORBIDDEN_CHARS) != -1) return null;
+
+ // By the book, but flexible.
+ spec.replace(BAD_CHAR, BAD_CHAR_SUBSTITUTE);
+
+ // Find percents not followed by two hexadecimal digits and fix them.
+ final char[] a = spec.array();
+ final int l = spec.length();
+
+ for(int i = l; i-- != 0;) {
+ if (a[i] == '%' && (i >= l - 2 || ! isHexDigit(a[i + 1]) || ! isHexDigit(a[i + 2]))) spec.insert(i + 1, "25");
+ }
+
+ try {
+ final URI uri = new URI(spec.toString()).normalize();
+
+ if (uri.isOpaque()) return null;
+
+ // Let us force parsing host, user info and port, or get an exception otherwise.
+ if (uri.isAbsolute()) uri.parseServerAuthority();
+
+ // Downcase
+ String scheme = uri.getScheme();
+ if (scheme != null) {
+ if (scheme.indexOf('\0') != -1) return null; // Workaround for URI bug
+ scheme = scheme.toLowerCase();
+ }
+
+ // No absolute URL without authority (e.g., file://).
+ if (uri.isAbsolute() && uri.getAuthority() == null) return null;
+
+ // Workaround for URI bug
+ if (uri.getPath() != null && uri.getPath().indexOf('\0') != -1) return null;
+ if (uri.getUserInfo() != null && uri.getUserInfo().indexOf('\0') != -1) return null;
+ if (uri.getQuery() != null && uri.getQuery().indexOf('\0') != -1) return null;
+
+ // Remove trailing dot in host name if present and downcase
+ String host = uri.getHost();
+ if (host != null) {
+ if (host.indexOf('\0') != -1) return null; // Workaround for URI bug
+ if (host.endsWith(".")) host = host.substring(0, host.length() - 1);
+ host = host.toLowerCase();
+ }
+
+ // Substitute empty path with slash in absolute URIs.
+ String rawPath = uri.getRawPath();
+
+ if (host != null && (rawPath == null || rawPath.length() == 0)) rawPath = "/";
+
+ // Rebuild, discarding fragment, parsing again a purely ASCII string and renormalizing (convoluted, but it does work).
+ return new URI(sanitizeAndRepack(scheme, uri.getRawUserInfo(), host, uri.getPort(), rawPath, uri.getRawQuery())).normalize();
+ }
+ catch (final URISyntaxException e) {
+ return null;
+ }
+ catch(final Exception e) {
+ LOGGER.warn("Unexpected exception while parsing " + spec, e);
+ return null;
+ }
+ }
+
+ /** If the argument string does not contain non-ASCII characters, returns the string itself;
+ * otherwise, encodes non-ASCII characters by %XX-encoded UTF-8 sequences.
+ *
+ * @param s a string.
+ * @return <code>s</code> with non-ASCII characters replaced by %XX-encoded UTF-8 sequences.
+ */
+ private static String sanitize(final String s) {
+ int i = s.length();
+ for(i = s.length(); i-- != 0;) if (s.charAt(i) >= (char)128) break;
+ if (i == -1) return s;
+
+ final ByteBuffer byteBuffer = Charsets.UTF_8.encode(CharBuffer.wrap(s));
+ final StringBuilder stringBuilder = new StringBuilder();
+
+ while (byteBuffer.hasRemaining()) {
+ final int b = byteBuffer.get() & 0xff;
+ if (b >= 0x80) stringBuilder.append('%').append(Integer.toHexString(b >> 4 & 0xf)).append(Integer.toHexString(b & 0xf));
+ else stringBuilder.append((char)b);
+ }
+
+ return stringBuilder.toString();
+ }
+
+ /** {@linkplain #sanitize(String) Sanitizes} all arguments (any of which may be {@code null})
+ * and repacks them as the string representation of a URI. The behaviour of this method is
+ * similar to that of {@link URI#URI(String, String, String, int, String, String, String)}, but
+ * we do not escape reserved characters, and the result is guaranteed to be an ASCII string.
+ *
+ * @return the string representation of a URI formed by the given components with non-ASCII
+ * characters replaced by %XX-encoded UTF-8 sequences.
+ * @see #sanitize(String)
+ */
+ private static String sanitizeAndRepack(final String scheme, final String userInfo, final String host, final int port, final String path, final String query) {
+ final StringBuffer sb = new StringBuffer();
+ if (scheme != null) sb.append(sanitize(scheme)).append(':');
+ if (host != null) {
+ sb.append("//");
+ if (userInfo != null) sb.append(sanitize(userInfo)).append('@');
+ final boolean needBrackets = host.indexOf(':') >= 0 && ! host.startsWith("[") && ! host.endsWith("]");
+ if (needBrackets) sb.append('[');
+ sb.append(sanitize(host));
+ if (needBrackets) sb.append(']');
+ if (port != -1) sb.append(':').append(port);
+ }
+ if (path != null) sb.append(sanitize(path));
+ if (query != null) sb.append('?').append(sanitize(query));
+ return sb.toString();
+ }
+
+ /** Creates a new BUbiNG URL from a normalized ASCII string represented by a byte array.
+ *
+ * <p>The string represented by the argument will <em>not</em> go through {@link #parse(MutableString)}.
+ * {@link URI#create(String)} will be used instead.
+ *
+ * @param normalized a normalized URI string represented by a byte array.
+ * @return the corresponding BUbiNG URL.
+ * @throws IllegalArgumentException if <code>normalized</code> does not parse correctly.
+ */
+ public static URI fromNormalizedByteArray(final byte[] normalized) {
+ return URI.create(toString(normalized));
+ }
+
+ /** Creates a new BUbiNG URL from a normalized ASCII string representing scheme and
+ * authority and a byte-array representation of a normalized ASCII path and query.
+ *
+ * <p>This method is intended to combine the results of {@link #schemeAndAuthority(URI)}/
+ * {@link #schemeAndAuthority(byte[])} and {@link #pathAndQueryAsByteArray(byte[])}/
+ * {@link #pathAndQueryAsByteArray(URI)}.
+ *
+ * @param schemeAuthority an ASCII string representing scheme and authority.
+ * @param normalizedPathQuery the byte-array representation of a normalized ASCII path and query.
+ * @return the corresponding BUbiNG URL.
+ * @throws IllegalArgumentException if the two parts, concatenated, do not parse correctly.
+ */
+ public static URI fromNormalizedSchemeAuthorityAndPathQuery(final String schemeAuthority, final byte[] normalizedPathQuery) {
+ final char[] array = new char[schemeAuthority.length() + normalizedPathQuery.length];
+ schemeAuthority.getChars(0, schemeAuthority.length(), array, 0);
+ for(int i = array.length, j = normalizedPathQuery.length; j-- != 0;) array[--i] = (char)normalizedPathQuery[j];
+ return URI.create(new String(array));
+ }
+
+ private static boolean isHexDigit(final char c) {
+ return c >= '0' && c <= '9' || c >= 'A' && c <= 'F' || c >= 'a' && c <= 'f';
+ }
+
+ /** Returns an ASCII byte-array representation of a BUbiNG URL.
+ *
+ * @param url a BUbiNG URL.
+ * @return an ASCII byte-array representation of <code>url</code>.
+ */
+ public static byte[] toByteArray(final URI url) {
+ final String s = url.toString();
+ final byte[] result = new byte[s.length()];
+ for (int i = result.length; i-- != 0;) {
+ assert s.charAt(i) < (char)0x80 : s.charAt(i);
+ result[i] = (byte)(s.charAt(i) & 0x7F);
+ }
+ return result;
+ }
+
+ /** Returns the string represented by the ASCII byte-array representation of a BUbiNG URL.
+ *
+ * @param url an ASCII byte-array representation of a BUbiNG URL.
+ * @return the string represented by <code>url</code>.
+ */
+ public static String toString(final byte[] url) {
+ final char[] array = new char[url.length];
+ // This needs to be fast.
+ for(int i = array.length; i-- != 0;) {
+ assert url[i] >= 0 : url[i]; // A byte is ASCII exactly when it is non-negative.
+ array[i] = (char)url[i];
+ }
+ return new String(array);
+ }
+
+ /** Returns an ASCII byte-array representation of
+ * the {@linkplain URI#getRawPath() raw path} and {@linkplain URI#getRawQuery() raw query} of a BUbiNG URL.
+ *
+ * @param url a BUbiNG URL.
+ * @return an ASCII byte-array representation of
+ * the {@linkplain URI#getRawPath() raw path} and {@linkplain URI#getRawQuery() raw query} of a BUbiNG URL.
+ */
+ public static byte[] pathAndQueryAsByteArray(final URI url) {
+ final String query = url.getRawQuery();
+ final String path = url.getRawPath();
+ final byte[] result = new byte[path.length() + (query != null ? 1 + query.length() : 0)];
+
+ for (int i = path.length(); i-- != 0;) {
+ assert path.charAt(i) < (char)0x80 : path.charAt(i);
+ result[i] = (byte)(path.charAt(i) & 0x7F);
+ }
+
+ if (query != null) {
+ result[path.length()] = '?';
+ for (int j = query.length(), i = result.length; j-- != 0;) {
+ assert query.charAt(j) < (char)0x80 : query.charAt(j);
+ result[--i] = (byte)(query.charAt(j) & 0x7F);
+ }
+ }
+
+ return result;
+ }
+
+ /** Returns the concatenated {@linkplain URI#getRawPath() raw path} and {@linkplain URI#getRawQuery() raw query} of a BUbiNG URL.
+ *
+ * @param url a BUbiNG URL.
+ * @return the concatenated {@linkplain URI#getRawPath() raw path} and {@linkplain URI#getRawQuery() raw query} of <code>url</code>.
+ */
+ public static String pathAndQuery(final URI url) {
+ final String query = url.getRawQuery();
+ return query != null ? url.getRawPath() + '?' + query : url.getRawPath();
+ }
+
+ /** Returns the {@linkplain URI#getScheme() scheme} and {@linkplain URI#getRawAuthority() raw authority} of a BUbiNG URL, joined by <code>://</code>.
+ *
+ * @param url a BUbiNG URL.
+ * @return the {@linkplain URI#getScheme() scheme} and {@linkplain URI#getRawAuthority() raw authority} of <code>url</code>, joined by <code>://</code>.
+ */
+ public static String schemeAndAuthority(final URI url) {
+ return url.getScheme() + "://" + url.getRawAuthority();
+ }
+
+ private final static byte[] DOUBLE_BAR = new byte[] { '/', '/' };
+
+ /** Extracts the host of an absolute BUbiNG URL in its byte-array representation.
+ *
+ * @param url a byte-array representation of a BUbiNG URL.
+ * @return the host of <code>url</code>.
+ */
+ public static String host(final byte[] url){
+ int startHost = Bytes.indexOf(url, DOUBLE_BAR) + DOUBLE_BAR.length;
+ final int endAuthority = ArrayUtils.indexOf(url, (byte)'/', startHost);
+ final int atPosition = ArrayUtils.indexOf(url, (byte)'@', startHost);
+ if (atPosition != -1 && atPosition < endAuthority) startHost = atPosition + 1;
+ final int colonPosition = ArrayUtils.indexOf(url, (byte)':', startHost);
+ final int endHost = colonPosition != -1 && colonPosition < endAuthority ? colonPosition : endAuthority;
+ final char[] array = new char[endHost - startHost];
+ for(int i = endHost - startHost; i-- != 0;) array[i] = (char)url[i + startHost];
+ return new String(array);
+ }
+
+ /** Extracts the path and query of an absolute BUbiNG URL in its byte-array representation.
+ *
+ * @param url a byte-array representation of a BUbiNG URL.
+ * @return the path and query in byte-array representation.
+ */
+ public static byte[] pathAndQueryAsByteArray(final byte[] url){
+ final int startAuthority = Bytes.indexOf(url, DOUBLE_BAR) + DOUBLE_BAR.length;
+ return Arrays.copyOfRange(url, ArrayUtils.indexOf(url, (byte)'/', startAuthority), url.length);
+ }
+
+ /** Extracts the host part from a scheme and authority by removing the scheme, the user info and the port number.
+ *
+ * @param schemeAuthority a scheme and authority.
+ * @return the host part.
+ */
+ public static String hostFromSchemeAndAuthority(final String schemeAuthority){
+ final int startOfAuthority = schemeAuthority.indexOf(":") + 3;
+ final int atPosition = schemeAuthority.indexOf('@', startOfAuthority);
+ final int startOfHost = atPosition != -1 ? atPosition + 1 : startOfAuthority;
+ final int colonPosition = schemeAuthority.indexOf(':', startOfHost);
+ return colonPosition == -1 ? schemeAuthority.substring(startOfHost) : schemeAuthority.substring(startOfHost, colonPosition);
+ }
+
+ /** Extracts the scheme and authority of an absolute BUbiNG URL in its byte-array representation.
+ *
+ * @param url an absolute BUbiNG URL.
+ * @return the scheme and authority of <code>url</code>.
+ */
+ public static String schemeAndAuthority(final byte[] url){
+ int i, j;
+ for(i = 0, j = 2; ; i++) if (url[i] == '/' && j-- == 0) break;
+ final char[] array = new char[i];
+ for(i = array.length; i-- != 0;) array[i] = (char)url[i];
+ return new String(array);
+ }
+
+ /** Returns the memory usage associated to a byte array.
+ *
+ * <p>This method is useful in establishing the memory footprint of URLs in byte-array representation.
+ *
+ * @param array a byte array.
+ * @return its memory usage in bytes.
+ */
+ public static int memoryUsageOf(final byte[] array) {
+ return ((16 + array.length + 7) & -1 << 3) + // Obtained by Classmexer on a 64-bit Sun JVM for Intel.
+ 8; // This accounts for the space used by the FIFO queue.
+ }
+}
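Two of the normalization steps performed by `BURL.parse()` and `BURL.sanitize()` can be sketched in isolation. The class `UrlFixups` below is hypothetical: it rewrites the backward, in-place scan over a `MutableString` as a forward copy into a `StringBuilder`, and uses the JDK's `StandardCharsets.UTF_8` in place of Guava's `Charsets`. Like the original, it emits lowercase hexadecimal digits, and it leaves out the earlier `BAD_CHAR` substitution.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-alone versions of two steps of BURL.parse(). */
class UrlFixups {
	private static boolean isHexDigit(final char c) {
		return c >= '0' && c <= '9' || c >= 'A' && c <= 'F' || c >= 'a' && c <= 'f';
	}

	/** Escapes every '%' that is not followed by two hexadecimal digits as "%25". */
	static String fixStrayPercents(final String s) {
		final StringBuilder sb = new StringBuilder(s.length());
		for (int i = 0; i < s.length(); i++) {
			final char c = s.charAt(i);
			sb.append(c);
			final boolean validEscape = i + 2 < s.length() && isHexDigit(s.charAt(i + 1)) && isHexDigit(s.charAt(i + 2));
			if (c == '%' && !validEscape) sb.append("25"); // '%' becomes "%25"
		}
		return sb.toString();
	}

	/** Replaces non-ASCII characters with %xx-encoded UTF-8 bytes; ASCII characters are copied as they are. */
	static String sanitize(final String s) {
		final StringBuilder sb = new StringBuilder();
		for (final byte b : s.getBytes(StandardCharsets.UTF_8)) {
			final int u = b & 0xff;
			// Character.forDigit() yields lowercase letters, matching Integer.toHexString() in the original.
			if (u >= 0x80) sb.append('%').append(Character.forDigit(u >> 4, 16)).append(Character.forDigit(u & 0xf, 16));
			else sb.append((char)u);
		}
		return sb.toString();
	}
}
```

For example, `fixStrayPercents("100%")` yields `100%25`, while a valid escape such as `%20` is left alone; `sanitize("caf\u00e9")` yields `caf%c3%a9`.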
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/graph/BFS.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/graph/BFS.java
new file mode 100644
index 0000000..dc5b2b8
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/graph/BFS.java
@@ -0,0 +1,142 @@
+package it.unimi.dsi.law.graph;
+
+/*
+ * Copyright (C) 2010-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Random;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+import it.unimi.dsi.Util;
+import it.unimi.dsi.bits.LongArrayBitVector;
+import it.unimi.dsi.fastutil.ints.IntArrayFIFOQueue;
+import it.unimi.dsi.fastutil.ints.IntArrayList;
+import it.unimi.dsi.fastutil.ints.IntArrays;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.LazyIntIterator;
+
+/** Computes the visit order with respect to a breadth-first visit.
+ *
+ * @author Marco Rosa
+ */
+
+
+//RELEASE-STATUS: DIST
+
+public class BFS {
+ private final static Logger LOGGER = LoggerFactory.getLogger(BFS.class);
+
+ /** Returns the permutation induced by the visit order of a breadth-first visit.
+ *
+ * @param graph a graph.
+ * @param startingNode the only starting node of the visit, or -1 for a complete visit.
+ * @param startPerm a permutation that will be used to shuffle successors.
+ * @return the permutation induced by the visit order of a breadth-first visit.
+ */
+ public static int[] bfsperm(final ImmutableGraph graph, final int startingNode, final int[] startPerm) {
+ final int n = graph.numNodes();
+
+ final int[] visitOrder = new int[n];
+ final int[] invStartPerm = Util.invertPermutation(startPerm, new int[n]);
+ Arrays.fill(visitOrder, -1);
+ final IntArrayFIFOQueue queue = new IntArrayFIFOQueue();
+ final LongArrayBitVector visited = LongArrayBitVector.ofLength(n);
+ final ProgressLogger pl = new ProgressLogger(LOGGER);
+ pl.expectedUpdates = n;
+ pl.itemsName = "nodes";
+ pl.start("Starting breadth-first visit...");
+ Arrays.fill(visitOrder, -1);
+
+ int pos = 0;
+
+ for(int i = 0; i < n; i++) {
+ final int start = i == 0 && startingNode != -1 ? startingNode : invStartPerm[i];
+ if (visited.getBoolean(start)) continue;
+ queue.enqueue(start);
+ visited.set(start);
+
+ int currentNode;
+ final IntArrayList successors = new IntArrayList();
+
+ while(! queue.isEmpty()) {
+ currentNode = queue.dequeueInt();
+ visitOrder[pos++] = currentNode;
+ int degree = graph.outdegree(currentNode);
+ final LazyIntIterator iterator = graph.successors(currentNode);
+
+ successors.clear();
+ while(degree-- != 0) {
+ final int succ = iterator.nextInt();
+ if (! visited.getBoolean(succ)) {
+ successors.add(succ);
+ visited.set(succ);
+ }
+ }
+
+ final int[] randomSuccessors = successors.elements();
+ IntArrays.quickSort(randomSuccessors, 0, successors.size(), (x, y) -> startPerm[x] - startPerm[y]);
+
+ for(int j = successors.size(); j-- != 0;) queue.enqueue(randomSuccessors[j]);
+ pl.update();
+ }
+
+ if (startingNode != -1) break;
+ }
+
+ pl.done();
+ return visitOrder;
+ }
+
+ public static void main(final String[] args) throws JSAPException, IOException {
+ final SimpleJSAP jsap = new SimpleJSAP(BFS.class.getName(), "Computes the permutation induced by a breadth-first visit.", new Parameter[] {
+ new FlaggedOption("randomSeed", JSAP.LONG_PARSER, "0", JSAP.NOT_REQUIRED, 'r', "random-seed", "The random seed."),
+ new FlaggedOption("initialNode", JSAP.INTEGER_PARSER, "-1", JSAP.NOT_REQUIRED, 'i', "initial-node", "The initial node of the visit. If specified, the visit will be performed only starting from the given node. The default performs a complete visit, iterating on all possible starting nodes."),
+ new Switch("random", 'p', "Start from a random permutation."),
+ new UnflaggedOption("graph", JSAP.STRING_PARSER, JSAP.REQUIRED, "The basename of the input graph"),
+ new UnflaggedOption("perm", JSAP.STRING_PARSER, JSAP.REQUIRED, "The name of the output permutation"), });
+
+
+ final JSAPResult jsapResult = jsap.parse(args);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final ImmutableGraph graph = ImmutableGraph.load(jsapResult.getString("graph"));
+
+ final int n = graph.numNodes();
+ final int[] startPerm = Util.identity(new int[n]);
+ final long seed = jsapResult.getLong("randomSeed");
+ final int initialnode = jsapResult.getInt("initialNode");
+ if (jsapResult.getBoolean("random")) Collections.shuffle(IntArrayList.wrap(startPerm), new Random(seed));
+
+ BinIO.storeInts(Util.invertPermutationInPlace(bfsperm(graph, initialnode, startPerm)), jsapResult.getString("perm"));
+ }
+}
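The core of `BFS.bfsperm()`, together with the inversion performed in `main()`, can be sketched on plain adjacency lists. This is a simplified, hypothetical version (`BfsOrder` is not part of LAW): it enqueues successors in adjacency order instead of sorting them by `startPerm`, and it always performs a complete visit (the `startingNode == -1` case), restarting from the smallest unvisited node. As in the original, nodes are marked when enqueued, so each node enters the queue once. The visit order lists nodes by visit time; inverting it, as `Util.invertPermutationInPlace()` does before the permutation is stored, gives each node's position in the visit.

```java
import java.util.ArrayDeque;

/** Hypothetical sketch of a complete breadth-first visit and of the inversion of its visit order. */
class BfsOrder {
	/** Returns the nodes in visit order; succ[v] lists the successors of node v. */
	static int[] visitOrder(final int[][] succ) {
		final int n = succ.length;
		final int[] order = new int[n];
		final boolean[] visited = new boolean[n];
		final ArrayDeque<Integer> queue = new ArrayDeque<>();
		int pos = 0;
		for (int s = 0; s < n; s++) {
			if (visited[s]) continue;
			visited[s] = true; // Mark on enqueue.
			queue.add(s);
			while (!queue.isEmpty()) {
				final int v = queue.poll();
				order[pos++] = v;
				for (final int w : succ[v]) {
					if (!visited[w]) {
						visited[w] = true;
						queue.add(w);
					}
				}
			}
		}
		return order;
	}

	/** Inverts a permutation: inv[order[k]] = k, i.e., the position of each node in the visit. */
	static int[] invert(final int[] order) {
		final int[] inv = new int[order.length];
		for (int k = 0; k < order.length; k++) inv[order[k]] = k;
		return inv;
	}
}
```

On the graph `{ {2}, {}, {3, 1}, {}, {0} }` the visit order is `[0, 2, 3, 1, 4]`, and the inverted (stored) permutation is `[0, 3, 1, 2, 4]`.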
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/graph/DFS.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/graph/DFS.java
new file mode 100644
index 0000000..be66428
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/graph/DFS.java
@@ -0,0 +1,136 @@
+package it.unimi.dsi.law.graph;
+
+/*
+ * Copyright (C) 2010-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import java.io.IOException;
+import java.util.Collections;
+import java.util.Random;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+import it.unimi.dsi.Util;
+import it.unimi.dsi.bits.LongArrayBitVector;
+import it.unimi.dsi.fastutil.ints.IntArrayList;
+import it.unimi.dsi.fastutil.ints.IntArrays;
+import it.unimi.dsi.fastutil.ints.IntStack;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.LazyIntIterator;
+
+/** Computes the visit order with respect to a depth-first visit.
+ *
+ * @author Marco Rosa
+ * @deprecated This class performs a stack-based visit, but technically not a DFS.
+ */
+
+
+//RELEASE-STATUS: DIST
+
+@Deprecated
+public class DFS {
+ private final static Logger LOGGER = LoggerFactory.getLogger(DFS.class);
+
+ /** Returns the permutation induced by the visit order of a depth-first visit.
+ *
+ * @param graph a graph.
+ * @param startPerm a permutation that will be used to shuffle successors.
+ * @return the permutation induced by the visit order of a depth-first visit.
+ */
+ public static int[] dfsperm(final ImmutableGraph graph, final int[] startPerm) {
+ final int n = graph.numNodes();
+
+ final int[] invStartPerm = Util.invertPermutation(startPerm, new int[n]);
+ final int[] perm = Util.identity(n);
+ final IntStack stack = new IntArrayList();
+ final LongArrayBitVector visited = LongArrayBitVector.ofLength(n);
+ final ProgressLogger pl = new ProgressLogger(LOGGER);
+ pl.expectedUpdates = n;
+ pl.itemsName = "nodes";
+ pl.start("Starting depth-first visit...");
+
+ int pos = 0;
+ for(int j = 0; j < n; j++){
+ final int start = invStartPerm[j];
+ if (visited.getBoolean(start)) continue;
+ stack.push(start);
+ visited.set(start);
+
+ int currentNode;
+ final IntArrayList successors = new IntArrayList();
+
+ while(! stack.isEmpty()) {
+ currentNode = stack.popInt();
+ perm[pos++] = currentNode;
+ final int degree = graph.outdegree(currentNode);
+ final LazyIntIterator iterator = graph.successors(currentNode);
+
+ successors.clear();
+ for(int i = 0; i < degree; i++) {
+ final int succ = iterator.nextInt();
+ if (! visited.getBoolean(succ)) {
+ successors.add(succ);
+ visited.set(succ);
+ }
+ }
+
+ final int[] randomSuccessors = successors.elements();
+ IntArrays.quickSort(randomSuccessors, 0, successors.size(), (x, y) -> startPerm[y] - startPerm[x]);
+
+ for(int i = successors.size(); i-- != 0;) stack.push(randomSuccessors[i]);
+ pl.update();
+ }
+ }
+
+
+ pl.done();
+ return perm;
+ }
+
+ public static void main(final String[] args) throws JSAPException, IOException {
+ final SimpleJSAP jsap = new SimpleJSAP(DFS.class.getName(), "Computes the permutation induced by a depth-first visit.", new Parameter[] {
+ new FlaggedOption("randomSeed", JSAP.LONG_PARSER, "0", JSAP.NOT_REQUIRED, 'r', "random-seed", "The random seed."),
+ new Switch("random", 'p', "Start from a random permutation."),
+ new UnflaggedOption("graph", JSAP.STRING_PARSER, JSAP.REQUIRED, "The basename of the input graph"),
+ new UnflaggedOption("perm", JSAP.STRING_PARSER, JSAP.REQUIRED, "The name of the output permutation"), });
+
+
+ final JSAPResult jsapResult = jsap.parse(args);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final ImmutableGraph graph = ImmutableGraph.load(jsapResult.getString("graph"));
+
+ final int n = graph.numNodes();
+ final int[] startPerm = Util.identity(new int[n]);
+ final long seed = jsapResult.getLong("randomSeed");
+ if (jsapResult.getBoolean("random")) Collections.shuffle(IntArrayList.wrap(startPerm), new Random(seed));
+
+ BinIO.storeInts(Util.invertPermutationInPlace(dfsperm(graph, startPerm)), jsapResult.getString("perm"));
+ }
+}
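The deprecation note says that `DFS.dfsperm()` performs a stack-based visit that is technically not a DFS. The reason is that nodes are marked when pushed: a node first seen as a successor of an early node stays low on the stack, and is not visited when a deeper path reaches it. The hypothetical class `VisitOrders` contrasts that scheme with an iterative depth-first preorder that marks nodes when popped and skips stale stack entries. Both visit a single component from a given start and push successors in reverse so that the smallest is explored first, a simplification of the `startPerm`-based ordering in the original.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical comparison of two stack-based visit orders; succ[v] lists the successors of node v. */
class VisitOrders {
	/** Marks nodes when they are pushed, as DFS.dfsperm() does. */
	static List<Integer> markOnPush(final int[][] succ, final int start) {
		final boolean[] seen = new boolean[succ.length];
		final ArrayDeque<Integer> stack = new ArrayDeque<>();
		final List<Integer> order = new ArrayList<>();
		stack.push(start);
		seen[start] = true;
		while (!stack.isEmpty()) {
			final int v = stack.pop();
			order.add(v);
			// Push unseen successors in reverse, so that the smallest one ends up on top.
			for (int i = succ[v].length; i-- != 0;) {
				final int w = succ[v][i];
				if (!seen[w]) {
					seen[w] = true;
					stack.push(w);
				}
			}
		}
		return order;
	}

	/** Genuine depth-first preorder: marks nodes when popped and skips stale entries. */
	static List<Integer> preorder(final int[][] succ, final int start) {
		final boolean[] seen = new boolean[succ.length];
		final ArrayDeque<Integer> stack = new ArrayDeque<>();
		final List<Integer> order = new ArrayList<>();
		stack.push(start);
		while (!stack.isEmpty()) {
			final int v = stack.pop();
			if (seen[v]) continue; // Already visited along another path.
			seen[v] = true;
			order.add(v);
			for (int i = succ[v].length; i-- != 0;) {
				final int w = succ[v][i];
				if (!seen[w]) stack.push(w);
			}
		}
		return order;
	}
}
```

On the graph `{ {1, 2}, {3}, {}, {2, 4}, {} }` starting from node 0, the mark-on-push visit yields `[0, 1, 3, 4, 2]`, whereas a genuine depth-first preorder yields `[0, 1, 3, 2, 4]`.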
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/graph/LayeredLabelPropagation.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/graph/LayeredLabelPropagation.java
new file mode 100644
index 0000000..a383851
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/graph/LayeredLabelPropagation.java
@@ -0,0 +1,939 @@
+package it.unimi.dsi.law.graph;
+
+/*
+ * Copyright (C) 2010-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.lang.Thread.UncaughtExceptionHandler;
+import java.text.DecimalFormat;
+import java.util.Collections;
+import java.util.Iterator;
+import java.util.NoSuchElementException;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.atomic.AtomicIntegerArray;
+
+import org.apache.commons.lang.mutable.MutableDouble;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+import it.unimi.dsi.Util;
+import it.unimi.dsi.bits.Fast;
+import it.unimi.dsi.fastutil.doubles.DoubleArrayList;
+import it.unimi.dsi.fastutil.ints.AbstractInt2IntMap;
+import it.unimi.dsi.fastutil.ints.Int2IntMap;
+import it.unimi.dsi.fastutil.ints.IntArrayList;
+import it.unimi.dsi.fastutil.ints.IntArrays;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.io.FastBufferedOutputStream;
+import it.unimi.dsi.fastutil.objects.ObjectIterator;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.util.XoRoShiRo128PlusRandom;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.LazyIntIterator;
+import it.unimi.dsi.webgraph.NodeIterator;
+import it.unimi.dsi.webgraph.Transform;
+import it.unimi.dsi.webgraph.algo.EliasFanoCumulativeOutdegreeList;
+
+
+// RELEASE-STATUS: DIST
+
+/** An implementation of the <em>layered label propagation</em> algorithm described by
+ * Paolo Boldi, Marco Rosa, Massimo Santini, and Sebastiano Vigna in &ldquo;Layered label propagation:
+ * A multiresolution coordinate-free ordering for compressing social networks&rdquo;,
+ * <i>Proceedings of the 20th international conference on World Wide Web</i>, pages 587&ndash;596, ACM, 2011.
+ *
+ * <p>The method {@link #computePermutation(double[], String, int)} returns a permutation of the original
+ * <em>symmetric</em> graph provided with the {@linkplain #LayeredLabelPropagation(ImmutableGraph, int[], long, boolean) constructor}
+ * which will (hopefully) increase locality (see the paper). Usually, the permutation is fed to
+ * {@link Transform#mapOffline(ImmutableGraph, int[], int, File, ProgressLogger)} to permute the original graph.
+ *
+ * <p>Note that the graph provided must be <em>symmetric</em> and <em>loopless</em>. If this is not the case,
+ * please use {@link Transform#symmetrizeOffline(ImmutableGraph, int, File, ProgressLogger)} and possibly
+ * {@link Transform#filterArcs(ImmutableGraph, it.unimi.dsi.webgraph.Transform.ArcFilter, ProgressLogger)} with
+ * filter {@link Transform#NO_LOOPS} to generate a suitable graph.
+ *
+ * <p>This class can also be used to run just label propagation over a given graph to
+ * get the {@linkplain #computeLabels(double, int) labels assigned to the nodes} for a fixed &gamma;.
+ *
+ * <h2>Memory requirements</h2>
+ *
+ * <p>This class requires 13 bytes per node (three integers and a boolean), plus the memory
+ * that is necessary to load the graph, which however can be just
+ * {@link ImmutableGraph#loadMapped(CharSequence, ProgressLogger) memory-mapped}.
+ *
+ * <p>Note that the main method will warm up the algorithm by performing a {@linkplain DFS depth-first visit}
+ * if the graph is not mapped. The visit will require storing an additional array of integers.
+ *
+ * @author Paolo Boldi
+ * @author Marco Rosa
+ * @author Massimo Santini
+ * @author Sebastiano Vigna
+ */
+
+public class LayeredLabelPropagation {
+
+ private final static Logger LOGGER = LoggerFactory.getLogger(LayeredLabelPropagation.class);
+
+ /** The list of default &gamma; values. It must be kept in sync with the {@link #main(String[])} default parameters. */
+ public static final double[] DEFAULT_GAMMAS = { 1., 1./2, 1./4, 1./8, 1./16, 1./32, 1./64, 1./128, 1./256, 1./512, 1./1024, 0 };
+
+ /** The format used to print &gamma;'s. */
+ private static final DecimalFormat GAMMA_FORMAT = new java.text.DecimalFormat("0.############");
+
+ /** The default maximum number of updates. */
+ public static final int MAX_UPDATES = 100;
+
+ /** The minimum gain in the Hamiltonian. Under this threshold we stop. */
+ private final static double GAIN_TRESHOLD = 0.001;
+
+ /** The update list will be shuffled by blocks of this size, to ensure some locality. */
+ private static final int SHUFFLE_GRANULARITY = 100000;
+
+ /** A symmetric, loopless graph. */
+ private final ImmutableGraph symGraph;
+
+ /** The number of nodes of {@link #symGraph}. */
+ private final int n;
+
+ /** The label of each node. After a call to {@link #computeLabels(double, int)}
+ * this field contains the final list of labels; it is set to {@code null} at the end of {@link #computePermutation(double[], String, int)}. */
+ private AtomicIntegerArray label;
+
+ /** Volume of each current cluster, indexed by label (many will be zeroes). */
+ private final AtomicIntegerArray volume;
+
+ /** The chosen update order. */
+ private final int[] updateList;
+
+ /** The objective function (Hamiltonian of the Potts model), stored as one partial sum per thread. */
+ private final double[] objectiveFunction;
+
+ /** The gap cost of the permutation induced by the current labelling, accumulated by the gap-cost threads. */
+ private final MutableDouble gapCost;
+
+ /** The random-number generator. */
+ private final XoRoShiRo128PlusRandom r;
+
+ /** The basename of temporary files containing labellings for various &gamma;'s. */
+ private File labelling;
+
+ /** If true, the user has set a basename for label files, and such files must not be deleted. */
+ private boolean labelBasenameSet;
+
+ /** A virtual permutation applied to the graph, or {@code null} for no permutation. */
+ private final int[] startPerm;
+
+ /** Whether to perform an exactly reproducible run in case {@link #startPerm} is not {@code null} (slower). */
+ private final boolean exact;
+
+ /** The number of threads used in the computation. */
+ private final int numberOfThreads;
+
+ /** The random seed. */
+ private final long seed;
+
+ /** For each node, true iff at least one of its successors changed its label. */
+ private final boolean[] canChange;
+
+ /** The number of nodes that changed their label in the current iteration. */
+ private final AtomicInteger modified;
+
+ /** A simple exception handler that stores the thrown exception in {@link #threadException}. */
+ private final SimpleUncaughtExceptionHandler simpleUncaughtExceptionHandler;
+
+ /** One of the throwables thrown by some of the threads, if at least one thread has thrown a throwable. */
+ private volatile Throwable threadException;
+
+ /** The current update. */
+ private int update;
+
+ /** The starting node of the next chunk of nodes to be processed. */
+ protected int nextNode;
+ /** The number of arcs before {@link #nextNode}. */
+ protected long nextArcs;
+ /** The cumulative outdegree function, used to split the nodes into chunks with approximately the same number of arcs. */
+ protected final EliasFanoCumulativeOutdegreeList cumulativeOutdegrees;
+
+
+ /** Creates a new instance.
+ *
+ * @param symGraph a symmetric, loopless graph.
+ * @param seed a random seed.
+ */
+ public LayeredLabelPropagation(final ImmutableGraph symGraph, final long seed) throws IOException {
+ this(symGraph, null, seed, false);
+ }
+
+
+ /** Creates a new instance using a specific initial permutation.
+ *
+ * @param symGraph a symmetric, loopless graph.
+ * @param startPerm an initial permutation of the graph, or {@code null} for no permutation.
+ * @param seed a random seed.
+ */
+ public LayeredLabelPropagation(final ImmutableGraph symGraph, final int[] startPerm, final long seed) throws IOException {
+ this(symGraph, startPerm, seed, false);
+ }
+
+ /** Creates a new instance using a specific initial permutation.
+ *
+ * <p>If <code>exact</code> is true, the final permutation is
+ * <em>exactly</em> the same as if you first permute the graph with <code>startPerm</code> and
+ * then apply LLP with a {@code null} starting permutation.
+ *
+ * @param symGraph a symmetric, loopless graph.
+ * @param startPerm an initial permutation of the graph, or {@code null} for no permutation.
+ * @param seed a random seed.
+ * @param exact whether to perform an exactly reproducible run (slower).
+ */
+ public LayeredLabelPropagation(final ImmutableGraph symGraph, final int[] startPerm, final long seed, final boolean exact) throws IOException {
+ this(symGraph, startPerm, 0, seed, exact);
+ }
+
+ /** Creates a new instance using a specific initial permutation and a specified number of threads.
+ *
+ * <p>If <code>exact</code> is true, the final permutation is
+ * <em>exactly</em> the same as if you first permute the graph with <code>startPerm</code> and
+ * then apply LLP with a {@code null} starting permutation.
+ *
+ * @param symGraph a symmetric, loopless graph.
+ * @param startPerm an initial permutation of the graph, or {@code null} for no permutation.
+ * @param numberOfThreads the number of threads to be used (0 to use the number of available processors).
+ * @param seed a random seed.
+ * @param exact whether to perform an exactly reproducible run (slower).
+ */
+ public LayeredLabelPropagation(final ImmutableGraph symGraph, final int[] startPerm, final int numberOfThreads, final long seed, final boolean exact) throws IOException {
+ this.symGraph = symGraph;
+ this.n = symGraph.numNodes();
+ this.startPerm = startPerm;
+ this.seed = seed;
+ this.r = new XoRoShiRo128PlusRandom(seed);
+ this.exact = exact;
+ this.label = new AtomicIntegerArray(n);
+ this.volume = new AtomicIntegerArray(n);
+ cumulativeOutdegrees = new EliasFanoCumulativeOutdegreeList(symGraph, symGraph.numArcs(), 1);
+
+ this.gapCost = new MutableDouble();
+ this.updateList = Util.identity(n);
+ simpleUncaughtExceptionHandler = new SimpleUncaughtExceptionHandler();
+ labelling = File.createTempFile(this.getClass().getName(), "labelling");
+ labelling.deleteOnExit();
+
+ this.numberOfThreads = numberOfThreads != 0 ? numberOfThreads : Runtime.getRuntime().availableProcessors();
+ this.canChange = new boolean[n];
+ this.modified = new AtomicInteger(0);
+ this.objectiveFunction = new double[this.numberOfThreads];
+ }
+
+
+ /**
+ * Sets the basename for label files. Label files written under a user-specified basename are not deleted on exit.
+ *
+ * @param labelBasename basename for label files.
+ */
+ public void labelBasename(final String labelBasename) {
+ labelBasenameSet = true;
+ labelling = new File(labelBasename);
+ }
+
+
+ /**
+ * Combines two labellings devilishly into a new one.
+ *
+ * @param label the minor label; the result will be stored here.
+ * @param major the major label.
+ * @param perm a virtual permutation applied to the graph, or {@code null} for no permutation.
+ * @param support a support array.
+ * @return the resulting number of labels.
+ */
+ private static int combine(final int[] label, final int[] major, final int[] perm, final int[] support) {
+ final int n = label.length;
+ if (n == 0) return 0;
+ if (n != major.length) throw new IllegalArgumentException();
+
+ Util.identity(support);
+
+ if (perm == null) IntArrays.mergeSort(support, 0, n, (a, b) -> {
+ int t = label[major[a]] - label[major[b]];
+ if (t != 0) return t;
+ t = major[a] - major[b];
+ return t != 0 ? t : label[a] - label[b];
+ });
+ else IntArrays.mergeSort(support, 0, n, (a, b) -> {
+ int t = label[major[a]] - label[major[b]];
+ if (t != 0) return t;
+ t = perm[major[a]] - perm[major[b]];
+ return t != 0 ? t : label[a] - label[b];
+ });
+
+
+ int currMinor = label[support[0]];
+ int currMajor = major[support[0]];
+ int curr = 0;
+ label[support[0]] = curr;
+
+ for (int i = 1; i < n; i++) {
+ final int t = support[i];
+ final int u = label[t];
+ if (major[t] != currMajor || u != currMinor) {
+ currMinor = u;
+ currMajor = major[t];
+ curr++;
+ }
+
+ label[t] = curr;
+ }
+
+ return ++curr;
+
+ }
+
+ /** A minimal implementation of a set of counters using a hash table without rehashing. */
+ private final static class OpenHashTableCounter {
+ /** The keys. Always sized as a power of two. */
+ private int[] key;
+ /** The counters associated with {@link #key}. */
+ private int[] count;
+ /** Keeps track of the location of each key. Useful for linear-time iteration over the key/value pairs. */
+ private int[] location;
+ /** The mask used to compute the key locations. */
+ private int mask;
+ /** The number of keys in the table. */
+ private int n;
+
+ public OpenHashTableCounter() {
+ mask = -1;
+ count = IntArrays.EMPTY_ARRAY;
+ key = IntArrays.EMPTY_ARRAY;
+ location = IntArrays.EMPTY_ARRAY;
+ }
+
+ public void incr(final int k) {
+ int pos = (k * 2056437379) & mask;
+ while (count[pos] != 0 && key[pos] != k)
+ pos = (pos + 1) & mask;
+ if (count[pos]++ == 0) {
+ key[pos] = k;
+ location[n++] = pos;
+ }
+ }
+
+ public boolean containsKey(final int k) {
+ int pos = (k * 2056437379) & mask;
+ while (count[pos] != 0 && key[pos] != k)
+ pos = (pos + 1) & mask;
+ return count[pos] != 0;
+ }
+
+ // After a call to this method, incr() cannot be called anymore.
+ public void addZeroCount(final int k) {
+ int pos = (k * 2056437379) & mask;
+ while (count[pos] != 0 && key[pos] != k)
+ pos = (pos + 1) & mask;
+ if (count[pos] == 0) {
+ key[pos] = k;
+ location[n++] = pos;
+ }
+ }
+
+ private final static class Entry extends AbstractInt2IntMap.BasicEntry {
+ public Entry() {
+ super(0, 0);
+ }
+
+ public void setKey(final int key) {
+ this.key = key;
+ }
+
+ @Override
+ public int setValue(final int value) {
+ this.value = value;
+ return -1; // Violates the interface, but it's all internal.
+ }
+ }
+
+ public Iterator<Int2IntMap.Entry> entries() {
+ return new ObjectIterator<>() {
+ private int i;
+
+ private final Entry entry = new Entry();
+
+ @Override
+ public boolean hasNext() {
+ return i < n;
+ }
+
+ @Override
+ public Entry next() {
+ if (!hasNext()) throw new NoSuchElementException();
+ final int l = location[i++];
+ entry.setKey(key[l]);
+ entry.setValue(count[l]);
+ return entry;
+ }
+ };
+ }
+
+ public void clear(final int size) {
+ if (mask + 1 < (1 << (Fast.ceilLog2(size) + 1))) {
+ mask = (1 << (Fast.ceilLog2(size) + 1)) - 1;
+ count = new int[mask + 1];
+ key = new int[mask + 1];
+ location = new int[mask + 1];
+ }
+ else while (n-- != 0) count[location[n]] = 0;
+ n = 0;
+ }
+ }
+
+ private final class GapCostThread extends Thread {
+ @SuppressWarnings("hiding")
+ private final ImmutableGraph symGraph;
+
+ /** The permutation whose cost is to be evaluated. */
+ private final int[] perm;
+
+ private GapCostThread(final ImmutableGraph symGraph, final int[] perm) {
+ this.symGraph = symGraph;
+ this.perm = perm;
+ }
+
+ @Override
+ public void run() {
+ final ImmutableGraph symGraph = this.symGraph;
+ final int numNodes = LayeredLabelPropagation.this.n;
+ final long numArcs = LayeredLabelPropagation.this.symGraph.numArcs();
+ final int[] perm = this.perm;
+ int[] permutedSuccessors = new int[32];
+ int[] successors;
+ final long granularity = Math.max(1024, numArcs >>> 9);
+ int start, end;
+
+ double gapCost = 0;
+ for (;;) {
+
+ // Try to get another piece of work.
+ synchronized(LayeredLabelPropagation.this.cumulativeOutdegrees) {
+ if (nextNode == numNodes) {
+ LayeredLabelPropagation.this.gapCost.add(gapCost);
+ break;
+ }
+ start = nextNode;
+ final long target = nextArcs + granularity;
+ if (target >= numArcs) nextNode = numNodes;
+ else {
+ nextArcs = cumulativeOutdegrees.skipTo(target);
+ nextNode = cumulativeOutdegrees.currentIndex();
+ }
+ end = nextNode;
+ }
+
+ final NodeIterator nodeIterator = symGraph.nodeIterator(start);
+ for (int i = start; i < end; i++) {
+ nodeIterator.nextInt();
+ final int node = perm[i];
+ final int outdegree = nodeIterator.outdegree();
+ if (outdegree > 0) {
+ successors = nodeIterator.successorArray();
+ permutedSuccessors = IntArrays.grow(permutedSuccessors, outdegree);
+ for (int j = outdegree; j-- != 0;)
+ permutedSuccessors[j] = perm[successors[j]];
+ IntArrays.quickSort(permutedSuccessors, 0, outdegree);
+ int prev = node;
+ for (int j = 0; j < outdegree; j++) {
+ gapCost += Fast.ceilLog2(Math.abs(prev - permutedSuccessors[j]));
+ prev = permutedSuccessors[j];
+ }
+ }
+ }
+ }
+ }
+ }
+
+ private final class IterationThread extends Thread {
+ @SuppressWarnings("hiding")
+ private final ImmutableGraph symGraph;
+
+ /** The current value of &gamma;. */
+ private final double gamma;
+
+ /** A progress logger. */
+ private final ProgressLogger pl;
+
+ private final int index;
+
+ private IterationThread(final ImmutableGraph symGraph, final double gamma, final int index, final ProgressLogger pl) {
+ this.symGraph = symGraph;
+ this.gamma = gamma;
+ this.index = index;
+ this.pl = pl;
+ }
+
+ @Override
+ public void run() {
+ final XoRoShiRo128PlusRandom r = new XoRoShiRo128PlusRandom(LayeredLabelPropagation.this.seed);
+ final AtomicIntegerArray label = LayeredLabelPropagation.this.label;
+ final AtomicIntegerArray volume = LayeredLabelPropagation.this.volume;
+ final ImmutableGraph symGraph = this.symGraph;
+ final int numNodes = LayeredLabelPropagation.this.n;
+ final long numArcs = LayeredLabelPropagation.this.symGraph.numArcs();
+ final int[] updateList = LayeredLabelPropagation.this.updateList;
+ final int[] startPerm = LayeredLabelPropagation.this.startPerm;
+ final boolean[] canChange = LayeredLabelPropagation.this.canChange;
+ final boolean exact = LayeredLabelPropagation.this.exact;
+ final double gamma = this.gamma;
+ final long granularity = Math.max(1024, numArcs >>> 9);
+
+ int start, end;
+ double delta = LayeredLabelPropagation.this.objectiveFunction[index];
+
+ for (;;) {
+
+ // Try to get another piece of work.
+ synchronized(LayeredLabelPropagation.this.cumulativeOutdegrees) {
+ if (nextNode == numNodes) {
+ LayeredLabelPropagation.this.objectiveFunction[index] = delta;
+ break;
+ }
+ start = nextNode;
+ final long target = nextArcs + granularity;
+ if (target >= numArcs) nextNode = numNodes;
+ else {
+ nextArcs = cumulativeOutdegrees.skipTo(target);
+ nextNode = cumulativeOutdegrees.currentIndex();
+ }
+ end = nextNode;
+ }
+
+ final OpenHashTableCounter map = new OpenHashTableCounter();
+
+ for (int i = start; i < end; i++) {
+ final int node = updateList[i];
+
+ /* Note that here we are using a heuristic optimisation: if no neighbour has changed its label,
+ * the label of a node cannot change. If gamma != 0, this is not necessarily true,
+ * as a node might need to change its label just because the volume of the
+ * adjacent labels changed. */
+
+ if (canChange[node]) {
+ canChange[node] = false;
+ final int outdegree = symGraph.outdegree(node);
+ if (outdegree > 0) {
+ final int currentLabel = label.get(node);
+ volume.decrementAndGet(currentLabel);
+
+ map.clear(outdegree);
+ LazyIntIterator successors = symGraph.successors(node);
+ for (int j = outdegree; j-- != 0;) map.incr(label.get(successors.nextInt()));
+
+ if (!map.containsKey(currentLabel)) map.addZeroCount(currentLabel);
+
+ double max = Double.NEGATIVE_INFINITY;
+ double old = 0;
+ final IntArrayList majorities = new IntArrayList();
+
+ for (final Iterator<Int2IntMap.Entry> entries = map.entries(); entries.hasNext();) {
+ final Int2IntMap.Entry entry = entries.next();
+ final int l = entry.getIntKey();
+ final int freq = entry.getIntValue(); // Frequency of label l in the neighbourhood
+ final double val = freq - gamma * (volume.get(l) + 1 - freq);
+
+ if (max == val) majorities.add(l);
+
+ if (max < val) {
+ majorities.clear();
+ max = val;
+ majorities.add(l);
+ }
+
+ if (l == currentLabel) old = val;
+ }
+
+ if (exact) {
+ if (startPerm != null) IntArrays.quickSort(majorities.elements(), 0, majorities.size(), (a, b) -> startPerm[a] - startPerm[b]);
+ else IntArrays.quickSort(majorities.elements(), 0, majorities.size());
+ }
+
+
+ // Pick a label uniformly at random among the majorities
+ final int nextLabel = majorities.getInt(r.nextInt(majorities.size()));
+ if (nextLabel != currentLabel) {
+ modified.addAndGet(1);
+ successors = symGraph.successors(node);
+ for (int j = outdegree; j-- != 0;) canChange[successors.nextInt()] = true;
+ }
+ label.set(node, nextLabel);
+ volume.incrementAndGet(nextLabel);
+
+ delta += max - old;
+ }
+ }
+ }
+ synchronized (pl) {
+ pl.update(end - start);
+ }
+ }
+ }
+ }
+
+
+
+ private final class SimpleUncaughtExceptionHandler implements UncaughtExceptionHandler {
+ @Override
+ public void uncaughtException(final Thread t, final Throwable e) {
+ threadException = e;
+ }
+ }
+
+ private void update(final double gamma) {
+ final int n = this.n;
+ final int[] updateList = this.updateList;
+ modified.set(0);
+ nextArcs = nextNode = 0;
+
+ if (exact) {
+ if (startPerm == null) Util.identity(updateList);
+ else Util.invertPermutation(startPerm, updateList);
+ }
+
+ // Shuffle the update list locally, in blocks of SHUFFLE_GRANULARITY nodes, to preserve some locality
+ for(int i = 0; i < n;) IntArrays.shuffle(updateList, i, Math.min(i += SHUFFLE_GRANULARITY, n), r);
+
+ final ProgressLogger pl = new ProgressLogger(LOGGER);
+ pl.expectedUpdates = n;
+ pl.logInterval = ProgressLogger.TEN_SECONDS;
+ pl.itemsName = "nodes";
+ pl.start("Starting update " + update + "...");
+
+ final Thread[] thread = new Thread[numberOfThreads];
+
+ nextArcs = nextNode = 0;
+ for (int i = 0; i < numberOfThreads; i++) {
+ thread[i] = new IterationThread(symGraph.copy(), gamma, i, pl);
+ thread[i].setUncaughtExceptionHandler(simpleUncaughtExceptionHandler);
+ thread[i].start();
+ }
+
+ for (int i = 0; i < numberOfThreads; i++)
+ try {
+ thread[i].join();
+ }
+ catch (final InterruptedException e) {
+ throw new RuntimeException(e);
+ }
+
+ if (threadException != null) throw new RuntimeException(threadException);
+ pl.done();
+ }
+
+
+
+ private void computeGapCost(final int[] newPerm) {
+ final int[] startPerm = this.startPerm;
+ final AtomicIntegerArray label = this.label;
+
+ Util.identity(newPerm);
+ if (startPerm != null) IntArrays.quickSort(newPerm, (x, y) -> {
+ final int t = startPerm[label.get(x)] - startPerm[label.get(y)];
+ return t != 0 ? t : startPerm[x] - startPerm[y];
+ });
+ else IntArrays.quickSort(newPerm, (x, y) -> {
+ final int t = label.get(x) - label.get(y);
+ return t != 0 ? t : x - y;
+ });
+
+ Util.invertPermutationInPlace(newPerm);
+
+ final Thread[] thread = new Thread[numberOfThreads];
+
+ nextArcs = nextNode = 0;
+ for (int i = 0; i < numberOfThreads; i++) (thread[i] = new GapCostThread(symGraph.copy(), newPerm)).start();
+
+ for (int i = 0; i < numberOfThreads; i++)
+ try {
+ thread[i].join();
+ }
+ catch (final InterruptedException e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+
+ private double objectiveFunction() {
+ double res = 0;
+ for (final double d : objectiveFunction) res += d;
+ return res;
+ }
+
+ private void init() {
+ for (int i = 0; i < n; i++) {
+ label.set(i, i);
+ volume.set(i, 1);
+ canChange[i] = true;
+ updateList[i] = i;
+ }
+ for (int i = 0; i < numberOfThreads; i++) objectiveFunction[i] = 0;
+ }
+
+ /**
+ * Computes the labels of a graph for a given value of &gamma; using the {@linkplain #MAX_UPDATES default maximum number of updates}.
+ *
+ * @param gamma the gamma parameter.
+ * @return the labels.
+ */
+ public AtomicIntegerArray computeLabels(final double gamma) {
+ return computeLabels(gamma, MAX_UPDATES);
+ }
+
+ /**
+ * Computes the labels of a graph for a given value of &gamma;.
+ *
+ * @param gamma the gamma parameter.
+ * @param maxUpdates the maximum number of updates performed.
+ * @return the labels.
+ */
+ public AtomicIntegerArray computeLabels(final double gamma, final int maxUpdates) {
+ init();
+ final String gammaFormatted = GAMMA_FORMAT.format(gamma);
+ double prevObjFun = 0;
+ double gain = 0;
+ final ProgressLogger pl = new ProgressLogger(LOGGER, "updates");
+ pl.logger().info("Running " + this.numberOfThreads + " threads");
+ pl.start("Starting iterations with gamma=" + gammaFormatted + "...");
+
+ update = 0;
+
+ do {
+ prevObjFun = objectiveFunction();
+ update(gamma);
+ pl.updateAndDisplay();
+ gain = 1 - (prevObjFun / objectiveFunction());
+ LOGGER.info("Gain: " + gain);
+ LOGGER.info("Modified: " + modified.get());
+ update++;
+ } while (modified.get() > 0 && gain > GAIN_TRESHOLD && update < maxUpdates);
+
+ pl.done();
+
+ return label;
+ }
+
+ /**
+ * Computes the final permutation of the graph using the {@linkplain #MAX_UPDATES default maximum number of updates} and
+ * the {@linkplain #DEFAULT_GAMMAS default gammas}.
+ *
+ * @param cluster if not {@code null}, clusters will be saved to a file with this name.
+ * @return the final permutation of the graph.
+ */
+ public int[] computePermutation(final String cluster) throws IOException {
+ return computePermutation(DEFAULT_GAMMAS, cluster, MAX_UPDATES);
+ }
+
+ /**
+ * Computes the final permutation of the graph using the {@linkplain #MAX_UPDATES default maximum number of updates}.
+ *
+ * @param gammas a set of parameters that will be used to generate labellings.
+ * @param cluster if not {@code null}, clusters will be saved to a file with this name.
+ * @return the final permutation of the graph.
+ */
+ public int[] computePermutation(final double[] gammas, final String cluster) throws IOException {
+ return computePermutation(gammas, cluster, MAX_UPDATES);
+ }
+
+ /**
+ * Computes the final permutation of the graph.
+ *
+ * @param gammas a set of parameters that will be used to generate labellings.
+ * @param cluster if not {@code null}, clusters will be saved to a file with this name.
+ * @param maxUpdates the maximum number of updates performed.
+ * @return the final permutation of the graph.
+ */
+ public int[] computePermutation(final double[] gammas, final String cluster, final int maxUpdates) throws IOException {
+ final int n = this.n;
+ final int m = gammas.length;
+
+ final double[] gapCosts = new double[m];
+
+ final ProgressLogger plGammas = new ProgressLogger(LOGGER);
+ plGammas.itemsName = "gammas";
+ plGammas.expectedUpdates = m;
+ plGammas.start();
+
+ for (int index = 0; index < m; index++) {
+ init();
+ final double gamma = gammas[index];
+ final String gammaFormatted = GAMMA_FORMAT.format(gamma);
+ double prevObjFun = 0;
+ double gain = 0;
+
+ final ProgressLogger pl = new ProgressLogger(LOGGER, "updates");
+ pl.logger().info("Running " + this.numberOfThreads + " threads");
+ pl.start("Starting iterations with gamma=" + gammaFormatted + " (" + (index + 1) + "/" + m + ") ...");
+
+ update = 0;
+
+ do {
+ prevObjFun = objectiveFunction();
+ update(gamma);
+ pl.updateAndDisplay();
+ gain = 1 - (prevObjFun / objectiveFunction());
+ LOGGER.info("Gain: " + gain);
+ LOGGER.info("Modified: " + modified.get());
+ update++;
+ } while (modified.get() > 0 && gain > GAIN_TRESHOLD && update < maxUpdates);
+
+ pl.done();
+
+ final int length = label.length();
+ final DataOutputStream dos = new DataOutputStream(new FastBufferedOutputStream(new FileOutputStream(labelling + "-" + index)));
+ for (int i = 0; i < length; i++)
+ dos.writeInt(label.get(i));
+ dos.close();
+
+ if (!labelBasenameSet) new File(labelling + "-" + index).deleteOnExit();
+
+
+ gapCost.setValue(0);
+
+ computeGapCost(updateList);
+ gapCosts[index] = gapCost.doubleValue();
+ LOGGER.info("Completed iteration with gamma " + gammaFormatted + " (" + (index + 1) + "/" + m + "), gap cost: " + gapCost.doubleValue());
+ plGammas.updateAndDisplay();
+ }
+ plGammas.done();
+
+ label = null; // We no longer need the atomic array
+
+ final int[] best = Util.identity(m);
+ IntArrays.quickSort(best, 0, best.length, (x, y) -> (int)Math.signum(gapCosts[y] - gapCosts[x]));
+
+ final int bestGamma = best[m - 1];
+ LOGGER.info("Best gamma: " + GAMMA_FORMAT.format(gammas[bestGamma]) + "\twith GapCost: " + gapCosts[bestGamma]);
+ LOGGER.info("Worst gamma: " + GAMMA_FORMAT.format(gammas[best[0]]) + "\twith GapCost: " + gapCosts[best[0]]);
+
+
+ final int intLabel[] = BinIO.loadInts(labelling + "-" + bestGamma);
+ if (startPerm != null) for (int i = 0; i < n; i++) intLabel[i] = startPerm[intLabel[i]];
+
+
+ for (int step = 0; step < m; step++) {
+ LOGGER.info("Starting step " + step + "...");
+ int[] major = BinIO.loadInts(labelling + "-" + best[step]);
+ combine(intLabel, major, startPerm, updateList);
+ major = BinIO.loadInts(labelling + "-" + bestGamma);
+ final int numberOflabels = combine(intLabel, major, startPerm, updateList);
+ LOGGER.info("Number of labels: " + numberOflabels);
+ LOGGER.info("Finished step " + step);
+ }
+
+
+ final int[] newPerm = this.updateList; // It is no longer necessary: we reuse it.
+ final int[] startPerm = this.startPerm;
+ Util.identity(newPerm);
+ if (startPerm == null) IntArrays.radixSortIndirect(newPerm, intLabel, true);
+ else IntArrays.mergeSort(newPerm, (x, y) -> {
+ final int t = intLabel[x] - intLabel[y];
+ return t != 0 ? t : startPerm[x] - startPerm[y];
+ });
+
+ if (cluster != null) {
+ final DataOutputStream dos = new DataOutputStream(new FastBufferedOutputStream(new FileOutputStream(cluster)));
+
+ // Print clusters: intLabel is reloaded with the labelling for the best gamma
+ BinIO.loadInts(labelling + "-" + bestGamma, intLabel);
+ int current = intLabel[newPerm[0]];
+ int j = 0;
+ for (int i = 0; i < n; i++) {
+ final int tmp = intLabel[newPerm[i]];
+ if (tmp != current) {
+ current = tmp;
+ j++;
+ }
+ dos.writeInt(j);
+ }
+ dos.close();
+ }
+
+ Util.invertPermutationInPlace(newPerm);
+
+ return newPerm;
+ }
+
+
+ public static void main(final String[] args) throws IOException, JSAPException {
+ final SimpleJSAP jsap = new SimpleJSAP(LayeredLabelPropagation.class.getName(), "Runs the Layered Label Propagation algorithm on a graph.", new Parameter[] {
+ new FlaggedOption("gammas", JSAP.STRING_PARSER, "-0,-1,-2,-3,-4,-5,-6,-7,-8,-9,-10,0-0", JSAP.NOT_REQUIRED, 'g', "gammas",
+ "The set of values of gamma, expressed as a comma-separated list of dyadics k/2^j specified as [k]-j (if missing, k=1)."),
+ new FlaggedOption("threads", JSAP.INTSIZE_PARSER, "0", JSAP.NOT_REQUIRED, 'T', "threads", "The number of threads to be used. If 0, the number will be estimated automatically."),
+ new FlaggedOption("maxUpdates", JSAP.INTEGER_PARSER, Integer.toString(MAX_UPDATES), JSAP.NOT_REQUIRED, 'u', "max-updates", "Maximum number of updates."),
+ new FlaggedOption("cluster", JSAP.STRING_PARSER, null, JSAP.NOT_REQUIRED, 'c', "clusters", "Store cluster IDs in the given file."),
+ new Switch("random", 'r', "random", "The graph will be virtually permuted in a random fashion."),
+ new FlaggedOption("randomSeed", JSAP.LONG_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 's', "random-seed", "The random seed."),
+ new FlaggedOption("labelBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'l', "label-basename", "A basename for label files."),
+ new Switch("mapped", 'm', "mapped", "The graph will be mapped into memory, rather than loaded. Moreover, the initial warm-up visit will be skipped."),
+ new UnflaggedOption("symGraph", JSAP.STRING_PARSER, JSAP.REQUIRED, "The basename of a symmetric, loopless version of the graph."),
+ new Switch("longs", 'L', "longs", "The permutation will be saved as a list of longs."),
+ new UnflaggedOption("perm", JSAP.STRING_PARSER, JSAP.REQUIRED, "The output permutation."), });
+
+ final JSAPResult jsapResult = jsap.parse(args);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final boolean mapped = jsapResult.getBoolean("mapped");
+ final boolean random = jsapResult.getBoolean("random");
+ final int threads = jsapResult.getInt("threads");
+
+ final ImmutableGraph symGraph = mapped ? ImmutableGraph.loadMapped(jsapResult.getString("symGraph")) : ImmutableGraph.load(jsapResult.getString("symGraph"));
+ final int n = symGraph.numNodes();
+
+ int[] startPerm = mapped && ! random ? null : Util.identity(n);
+ final XoRoShiRo128PlusRandom r = jsapResult.userSpecified("randomSeed") ? new XoRoShiRo128PlusRandom(jsapResult.getLong("randomSeed")) : new XoRoShiRo128PlusRandom();
+ if (random) IntArrays.shuffle(startPerm, r);
+
+ if (startPerm != null && ! mapped) startPerm = Util.invertPermutationInPlace(DFS.dfsperm(symGraph, startPerm));
+
+ final LayeredLabelPropagation clustering = new LayeredLabelPropagation(symGraph, startPerm, threads, jsapResult.userSpecified("randomSeed") ? jsapResult.getLong("randomSeed") : r.nextLong(), false);
+ if (jsapResult.userSpecified("labelBasename")) clustering.labelBasename(jsapResult.getString("labelBasename"));
+
+ final DoubleArrayList gammas = new DoubleArrayList();
+ for (final String gamma : jsapResult.getString("gammas").split(",")) {
+ final String[] p = gamma.split("-");
+ gammas.add((p[0].length() != 0 ? Integer.parseInt(p[0]) : 1) * Math.pow(1. / 2, Integer.parseInt(p[1])));
+ }
+ Collections.sort(gammas);
+ final int[] permutation = clustering.computePermutation(gammas.toDoubleArray(), jsapResult.getString("cluster"), jsapResult.getInt("maxUpdates"));
+ if (jsapResult.userSpecified("longs")) {
+ final int length = permutation.length;
+ final DataOutputStream dos = new DataOutputStream(new FastBufferedOutputStream(new FileOutputStream(jsapResult.getString("perm"))));
+ for(int i = 0; i < length; i++) dos.writeLong(permutation[i]);
+ dos.close();
+ }
+ else BinIO.storeInts(permutation, jsapResult.getString("perm"));
+ }
+}
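The --gammas option of main accepts dyadic values written as [k]-j, meaning k/2^j with k defaulting to 1. The default string -0,-1,...,-10,0-0 therefore expands to the same twelve values as DEFAULT_GAMMAS, in ascending rather than descending order. The standalone sketch below reproduces that parsing outside the library. The class name GammaSpec and the method parseGammas are hypothetical. The sketch also deliberately uses Math.scalb instead of the original Math.pow(1. / 2, j), so that each power of two is computed exactly.

```java
import java.util.Arrays;

/** Hypothetical helper mirroring the gamma syntax accepted by LayeredLabelPropagation's main method. */
public class GammaSpec {
    /** Parses a comma-separated list of dyadics "[k]-j" into k / 2^j (k defaults to 1), sorted ascending. */
    public static double[] parseGammas(final String spec) {
        final String[] items = spec.split(",");
        final double[] gammas = new double[items.length];
        for (int i = 0; i < items.length; i++) {
            // "-3".split("-") yields ["", "3"]: an empty k means k = 1
            final String[] p = items[i].split("-");
            final int k = p[0].isEmpty() ? 1 : Integer.parseInt(p[0]);
            final int j = Integer.parseInt(p[1]);
            // scalb(1.0, -j) is exactly 2^-j for moderate j, so no rounding creeps in
            gammas[i] = k * Math.scalb(1.0, -j);
        }
        Arrays.sort(gammas); // main also sorts the gammas before running LLP
        return gammas;
    }

    public static void main(final String[] args) {
        System.out.println(Arrays.toString(parseGammas("-0,-1,-2,0-0")));
    }
}
```

Because the list is sorted, the order in which values are given on the command line does not matter. A value such as 3-2 (that is, 3/4) is also accepted, even though the defaults only use k = 1 and k = 0.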
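computePermutation picks the best gamma by the gap cost of the permutation each labelling induces. The measure works as follows:

- For every node, its successors are renumbered through the permutation and sorted.
- The cost adds ceil(log2(g)) for each gap g between consecutive successors.
- The first gap is measured from the node's own new index, as in GapCostThread.

The following is a single-threaded sketch of the same measure over plain adjacency arrays, assuming a symmetric, loopless graph so that no gap is zero. GapCost and gapCost are hypothetical names. ceilLog2 reimplements the rounding with Integer.numberOfLeadingZeros rather than calling dsiutils' Fast.ceilLog2.

```java
import java.util.Arrays;

/** Hypothetical sequential version of the gap-cost measure computed by GapCostThread. */
public class GapCost {
    /** ceil(log2(x)) for x >= 1 (returns 0 for x = 1). */
    public static int ceilLog2(final int x) {
        return 32 - Integer.numberOfLeadingZeros(x - 1);
    }

    /**
     * Sums ceil(log2(gap)) over the sorted, renumbered successor lists.
     * perm maps each node to its new index; successors[x] lists the successors of node x.
     */
    public static double gapCost(final int[][] successors, final int[] perm) {
        double cost = 0;
        for (int x = 0; x < successors.length; x++) {
            final int[] s = new int[successors[x].length];
            for (int j = 0; j < s.length; j++) s[j] = perm[successors[x][j]];
            Arrays.sort(s);
            int prev = perm[x]; // the first gap is measured from the node itself
            for (final int t : s) {
                cost += ceilLog2(Math.abs(prev - t));
                prev = t;
            }
        }
        return cost;
    }
}
```

On the path 0-1-2 (successor lists {1}, {0, 2}, {1}), the identity permutation costs 1.0. Swapping the positions of nodes 1 and 2 costs 2.0, so the measure favours orderings that keep neighbours close.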
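In each update, IterationThread does the following for a node:

- It removes the node from the volume of its current label.
- It counts the labels of the node's neighbours.
- It keeps every label l that maximises freq(l) - gamma * (volume(l) + 1 - freq(l)).

The current label is always a candidate, even with frequency zero, and ties are then broken uniformly at random. The sketch below isolates that selection rule with standard collections instead of the class's OpenHashTableCounter. LabelChoice and majorities are hypothetical names, and the random tie-break is left to the caller.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical isolation of the label-selection rule used in IterationThread. */
public class LabelChoice {
    /**
     * Returns all labels maximising freq - gamma * (volume + 1 - freq), where freq counts the
     * neighbours carrying the label and volume is the label's size with the current node removed.
     */
    public static List<Integer> majorities(final int[] neighbourLabels, final Map<Integer, Integer> volume,
                                           final int currentLabel, final double gamma) {
        final Map<Integer, Integer> freq = new HashMap<>();
        for (final int l : neighbourLabels) freq.merge(l, 1, Integer::sum);
        freq.putIfAbsent(currentLabel, 0); // the current label is always a candidate
        double max = Double.NEGATIVE_INFINITY;
        final List<Integer> best = new ArrayList<>();
        for (final Map.Entry<Integer, Integer> e : freq.entrySet()) {
            final int f = e.getValue();
            final double val = f - gamma * (volume.getOrDefault(e.getKey(), 0) + 1 - f);
            if (val > max) { // strictly better: restart the list of majorities
                max = val;
                best.clear();
            }
            if (val == max) best.add(e.getKey());
        }
        return best;
    }
}
```

With gamma = 0 the most frequent neighbour label wins. Raising gamma penalises labels whose volume is large compared with their frequency, which is why the different gamma values yield clusterings of different granularity.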
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/graph/RemoveHubs.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/graph/RemoveHubs.java
new file mode 100644
index 0000000..74893d7
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/graph/RemoveHubs.java
@@ -0,0 +1,307 @@
+package it.unimi.dsi.law.graph;
+
+/*
+ * Copyright (C) 2010-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.util.Arrays;
+import java.util.concurrent.atomic.AtomicIntegerArray;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Charsets;
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+import it.unimi.dsi.Util;
+import it.unimi.dsi.fastutil.doubles.DoubleArrayList;
+import it.unimi.dsi.fastutil.ints.IntArrays;
+import it.unimi.dsi.fastutil.ints.IntOpenHashSet;
+import it.unimi.dsi.fastutil.ints.IntSet;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.longs.LongOpenHashBigSet;
+import it.unimi.dsi.io.FastBufferedReader;
+import it.unimi.dsi.lang.MutableString;
+import it.unimi.dsi.law.rank.PageRank;
+import it.unimi.dsi.law.rank.PageRankParallelPowerSeries;
+import it.unimi.dsi.law.rank.SpectralRanking;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.util.XoRoShiRo128PlusRandom;
+import it.unimi.dsi.webgraph.BVGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.ImmutableSubgraph;
+import it.unimi.dsi.webgraph.LazyIntIterator;
+import it.unimi.dsi.webgraph.NodeIterator;
+import it.unimi.dsi.webgraph.Transform;
+
+//RELEASE-STATUS: DIST
+
+/**
+ * Removes nodes from a graph following a number of strategies.
+ *
+ * <p>This class has been used to perform the experiments described by Paolo Boldi, Marco Rosa, and
+ * Sebastiano Vigna in &ldquo;Robustness of Social Networks: Comparative Results Based on Distance
+ * Distributions&rdquo;, <i>Proceedings of the Third international Conference, SocInfo 2011</i>,
+ * volume 6894 of Lecture Notes in Computer Science, pages 8&ndash;21, Springer, 2011. The
+ * implemented removal strategies are largest outdegree, label propagation, PageRank, PageRank on
+ * the symmetrized graph, random and near-root (see the paper).
+ *
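+ * <p>For each method and for each fraction of arcs, two graphs will be stored, with the following
+ * basenames: <code><var>dest</var>-method-fraction</code> and
+ * <code><var>dest</var>-method-fraction-t</code> (its transpose). Moreover, the removal order is
+ * stored in <code><var>dest</var>-method.perm</code>, and the sorted list of the nodes of each
+ * subgraph in <code><var>dest</var>-method-fraction.subgraph</code>.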
+ *
+ * <p>For each strategy and each fraction of arcs to be removed, nodes are removed from the original
+ * graph (and its transpose) according to the total order specified by the strategy until the
+ * specified fraction of arcs is removed.
+ *
+ * @author Marco Rosa
+ * @author Sebastiano Vigna
+ */
+
+public class RemoveHubs {
+ private final static Logger LOGGER = LoggerFactory.getLogger(RemoveHubs.class);
+
+ private final static int[] reverse(final int[] perm) {
+ final int length = perm.length;
+ for(int i = length / 2; i-- != 0;) {
+ final int t = perm[i];
+ perm[i] = perm[length - i - 1];
+ perm[length - i - 1] = t;
+ }
+
+ return perm;
+ }
+
+ protected static int[] store(final ImmutableGraph g, final ImmutableGraph gt, final double[] fraction, final int[] perm, final String dest, final String method) throws IOException {
+ final int n = g.numNodes();
+ final long m = g.numArcs();
+ final IntSet sub = new IntOpenHashSet(Util.identity(n));
+ final LongOpenHashBigSet removedArcs = new LongOpenHashBigSet();
+ final int[] cut = new int[fraction.length];
+ int count = 0;
+ int i = n;
+
+ BinIO.storeInts(perm, dest + "-" + method + ".perm");
+
+ for (int j = 0; j < fraction.length; j++) {
+ LOGGER.info("Storing fraction " + fraction[j] + "...");
+ // Check the arc budget before decrementing i, so that i moves only past nodes actually removed.
+ for (; count < (fraction[j] * m) && i-- != 0;) {
+ final int x = perm[i];
+ sub.remove(x);
+ final LazyIntIterator successors = g.successors(x);
+ for(int s; (s = successors.nextInt()) != -1;) {
+ if (removedArcs.add((long)x << 32 | s)) count++;
+ }
+ final LazyIntIterator predecessors = gt.successors(x);
+ for(int p; (p = predecessors.nextInt()) != -1;) {
+ if (removedArcs.add((long)p << 32 | x)) count++;
+ }
+ }
+ if (dest != null) {
+ final int[] node = sub.toIntArray();
+ Arrays.sort(node);
+ BinIO.storeInts(node, dest + "-" + method + "-" + fraction[j] + ".subgraph");
+ BVGraph.store(new ImmutableSubgraph(g, sub), dest + "-" + method + "-" + fraction[j], 4, 2, 0, -1, 0);
+ BVGraph.store(new ImmutableSubgraph(gt, sub), dest + "-" + method + "-" + fraction[j] + "-t", 4, 2, 0, -1, 0);
+ }
+ cut[j] = i;
+ }
+ return cut;
+ }
+
+ protected static double[] pr(final ImmutableGraph gt) throws IOException {
+ final PageRankParallelPowerSeries pr = new PageRankParallelPowerSeries(gt, 0, LOGGER);
+ pr.alpha = .85;
+ pr.stepUntil(PageRank.or(new SpectralRanking.NormStoppingCriterion(1E-14), new SpectralRanking.IterationNumberStoppingCriterion(PageRank.DEFAULT_MAX_ITER)));
+ return pr.rank;
+ }
+
+ protected static int[] largestOutdegree(final ImmutableGraph g) {
+ LOGGER.info("Removing by outdegree...");
+ final int n = g.numNodes();
+
+ final int[] degree = new int[n];
+ for (final NodeIterator i = g.nodeIterator(); i.hasNext(); degree[i.nextInt()] = i.outdegree());
+
+ final int[] perm = Util.identity(n);
+ IntArrays.radixSortIndirect(perm, degree, false);
+
+ return perm;
+ }
+
+ protected static int[] largestIndegree(final ImmutableGraph gt) {
+ LOGGER.info("Removing by indegree...");
+ final int n = gt.numNodes();
+
+ final int[] degree = new int[n];
+ for (final NodeIterator i = gt.nodeIterator(); i.hasNext(); degree[i.nextInt()] = i.outdegree());
+
+ final int[] perm = Util.identity(n);
+ IntArrays.radixSortIndirect(perm, degree, false);
+
+ return perm;
+ }
+
+ protected static int[] labelPropagation(final ImmutableGraph symGraph) throws IOException {
+ LOGGER.info("Removing by LP...");
+ final int n = symGraph.numNodes();
+
+ final LayeredLabelPropagation clustering = new LayeredLabelPropagation(symGraph, null, 0);
+ final AtomicIntegerArray l = clustering.computeLabels(0);
+
+ final int[] external = new int[n];
+ for (final NodeIterator i = symGraph.nodeIterator(); i.hasNext();) {
+ final int node = i.nextInt();
+ final int deg = i.outdegree();
+ final int[] succ = i.successorArray();
+ final int label = l.get(node);
+ for (int j = 0; j < deg; j++) {
+ if (l.get(succ[j]) != label)
+ external[node]++;
+ }
+ }
+
+ final int[] perm = Util.identity(n);
+ IntArrays.quickSort(perm, (x, y) -> {
+ final int tmp = l.get(x) - l.get(y);
+ if (tmp != 0) return tmp;
+ else return external[y] - external[x];
+ });
+
+ int pos = 0;
+ int currentLabel = l.get(perm[0]);
+ for (int i = 0; i < n; i++) {
+ final int node = perm[i];
+ final int label = l.get(node);
+ if (label != currentLabel) {
+ pos = 0;
+ currentLabel = l.get(node);
+ }
+ external[node] = ++pos;
+ }
+
+ IntArrays.radixSortIndirect(perm, external, false);
+ return reverse(perm);
+ }
+
+ protected static int[] pageRank(final ImmutableGraph gt) throws IOException {
+ LOGGER.info("Removing by PageRank...");
+ return rank(gt, pr(gt));
+ }
+
+ protected static int[] rank(final ImmutableGraph g, final double[] rank) {
+ final int n = g.numNodes();
+
+ final int[] perm = Util.identity(n);
+ IntArrays.quickSort(perm, (x, y) -> (int)Math.signum(rank[x] - rank[y]));
+
+ return perm;
+ }
+
+ protected static int[] random(final ImmutableGraph g) {
+ LOGGER.info("Removing randomly...");
+ final int n = g.numNodes();
+ final int[] perm = IntArrays.shuffle(Util.identity(n), new XoRoShiRo128PlusRandom(0));
+ return perm;
+ }
+
+ protected static int[] symPageRank(final ImmutableGraph g, final ImmutableGraph gt) throws IOException {
+ LOGGER.info("Removing by symmetric PageRank...");
+ final int n = g.numNodes();
+
+ final double[] rank = pr(Transform.symmetrize(g, gt, new ProgressLogger(LOGGER)));
+ final int[] perm = Util.identity(n);
+ IntArrays.quickSort(perm, (x, y) -> (int)Math.signum(rank[x] - rank[y]));
+
+ return perm;
+ }
+
+ protected static int[] url(final ImmutableGraph g, final FastBufferedReader fastBufferedReader) throws IOException {
+ LOGGER.info("Removing roots...");
+ final int n = g.numNodes();
+
+ final int[] slashes = new int[n];
+
+ final MutableString s = new MutableString();
+ for (int i = 0; i < n; i++) {
+ fastBufferedReader.readLine(s);
+ final char[] a = s.array();
+ int t = 0;
+ for (int j = s.length()-1; j-- != 0;) if (a[j] == '/') t++;
+ slashes[i] = t;
+ }
+
+ final int[] perm = Util.identity(n);
+ IntArrays.radixSortIndirect(perm, slashes, false);
+ return reverse(perm);
+ }
+
+ public static void main(final String[] args) throws IOException, JSAPException {
+ final SimpleJSAP jsap = new SimpleJSAP(RemoveHubs.class.getName(), "Searches and removes hubs in a given graph.", new Parameter[] {
+ new Switch("urls", 'u', "urls", "removes homepages (expects the URLs, one per line, on standard input)."),
+ new Switch("lp", 'l', "lp", "removes hubs by label propagation."),
+ new Switch("degree", 'd', "degree", "removes hubs by largest outdegree."),
+ new Switch("pr", 'p', "pr", "removes hubs by PageRank."),
+ new FlaggedOption("rank", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'R', "rank", "removes hubs using an explicit rank."),
+ new Switch("random", 'r', "random", "removes hubs randomly."),
+ new FlaggedOption("fraction", JSAP.STRING_PARSER, "0.05,0.1,0.15,0.2,0.3", JSAP.REQUIRED, 'f', "fraction", "A list of comma-separated values representing the fraction of arcs that must be removed."),
+ new UnflaggedOption("g", JSAP.STRING_PARSER, JSAP.REQUIRED, "The basename of the graph."),
+ new UnflaggedOption("gt", JSAP.STRING_PARSER, JSAP.REQUIRED, "The basename of the transposed graph."),
+ new UnflaggedOption("sym", JSAP.STRING_PARSER, JSAP.REQUIRED, "The basename of the symmetrized graph."),
+ new UnflaggedOption("dest", JSAP.STRING_PARSER, JSAP.REQUIRED, "The basename of the resulting subgraphs.") });
+
+ final JSAPResult jsapResult = jsap.parse(args);
+ if (jsap.messagePrinted())
+ System.exit(1);
+
+ final ImmutableGraph g = ImmutableGraph.load(jsapResult.getString("g"));
+ final ImmutableGraph gt = ImmutableGraph.load(jsapResult.getString("gt"));
+ final ImmutableGraph sym = ImmutableGraph.load(jsapResult.getString("sym"));
+ final String dest = jsapResult.getString("dest");
+
+ final DoubleArrayList fraction = new DoubleArrayList();
+ for (final String f : jsapResult.getString("fraction").split(","))
+ fraction.add(Double.parseDouble(f));
+ final double[] f = fraction.toDoubleArray();
+ Arrays.sort(f);
+
+ if (jsapResult.userSpecified("urls"))
+ store(g, gt, f, url(g, new FastBufferedReader(new InputStreamReader(System.in, Charsets.ISO_8859_1))), dest, "urls");
+ if (jsapResult.userSpecified("lp"))
+ store(g, gt, f, labelPropagation(sym), dest, "lp");
+ if (jsapResult.userSpecified("pr"))
+ store(g, gt, f, pageRank(gt), dest, "pr");
+ if (jsapResult.userSpecified("random"))
+ store(g, gt, f, random(g), dest, "rnd");
+ if (jsapResult.userSpecified("degree")) {
+ store(g, gt, f, largestOutdegree(g), dest, "outdeg");
+ store(g, gt, f, largestIndegree(gt), dest, "indeg");
+ }
+ if (jsapResult.userSpecified("rank")) {
+ store(g, gt, f, rank(g, BinIO.loadDoubles(jsapResult.getString("rank"))), dest, "rank");
+ }
+ }
+}
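In store(), the permutation is consumed from its last element backwards, so the strategies that sort nodes by increasing score (outdegree, indegree, PageRank, explicit rank) remove the highest-scoring nodes first. The removal loop can be modelled without WebGraph. The following is a hypothetical, simplified sketch (the class RemovalOrderSketch is not part of LAW): it uses plain int[][] successor lists instead of ImmutableGraph, derives predecessor lists in place of the transpose, and only computes the cut points instead of storing subgraphs. It deduplicates arcs with the same (long)x << 32 | y encoding, and it tests the arc budget before decrementing i so that i only moves past nodes that were actually removed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical, simplified model of the removal loop of RemoveHubs.store(); not part of LAW. */
public class RemovalOrderSketch {

    /**
     * Removes nodes in the order perm[n - 1], perm[n - 2], ... and returns, for each
     * fraction f (sorted increasingly), the index in perm of the last node removed
     * once at least f * m arcs are gone. succ[x] lists the successors of x.
     */
    static int[] cuts(final int[][] succ, final int[] perm, final double[] fraction) {
        final int n = succ.length;
        long m = 0;
        for (final int[] s : succ) m += s.length;

        // Build predecessor lists: they play the role of the transpose gt.
        final int[] indegree = new int[n];
        for (final int[] s : succ) for (final int y : s) indegree[y]++;
        final int[][] pred = new int[n][];
        for (int y = 0; y < n; y++) pred[y] = new int[indegree[y]];
        final int[] filled = new int[n];
        for (int x = 0; x < n; x++) for (final int y : succ[x]) pred[y][filled[y]++] = x;

        final Set<Long> removedArcs = new HashSet<>();
        final int[] cut = new int[fraction.length];
        long count = 0;
        int i = n;
        for (int j = 0; j < fraction.length; j++) {
            // Check the arc budget before decrementing i, so i only moves past removed nodes.
            while (count < fraction[j] * m && i-- != 0) {
                final int x = perm[i];
                // Arc (x, y) is encoded as (long)x << 32 | y: an arc between two removed nodes counts once.
                for (final int y : succ[x]) if (removedArcs.add((long)x << 32 | y)) count++;
                for (final int p : pred[x]) if (removedArcs.add((long)p << 32 | x)) count++;
            }
            cut[j] = i;
        }
        return cut;
    }

    public static void main(final String[] args) {
        // Node 0 points to 1, 2, 3 and node 1 points to 2 (m = 4 arcs).
        final int[][] succ = { { 1, 2, 3 }, { 2 }, {}, {} };
        // Nodes sorted by increasing outdegree, like the permutation largestOutdegree() returns.
        final int[] perm = { 2, 3, 1, 0 };
        System.out.println(Arrays.toString(cuts(succ, perm, new double[] { 0.5, 1.0 })));
    }
}
```

For this input the sketch prints [3, 2]: removing node 0 alone deletes 3 of the 4 arcs, which meets the 0.5 budget, and removing node 1 then deletes the remaining arc 1 -> 2.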
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/graph/package.html b/third_party/law-2.5.1/src/it/unimi/dsi/law/graph/package.html
new file mode 100644
index 0000000..b2c5612
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/graph/package.html
@@ -0,0 +1,13 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<!-- RELEASE-STATUS: DIST -->
+<html>
+ <head>
+ <title>LAW software</title>
+ </head>
+
+ <body>
+
+ <P>Graph-related classes.
+
+ </body>
+</html>
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/io/tool/DataInput2Text.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/io/tool/DataInput2Text.java
new file mode 100644
index 0000000..813ef7e
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/io/tool/DataInput2Text.java
@@ -0,0 +1,138 @@
+package it.unimi.dsi.law.io.tool;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import java.io.DataInputStream;
+import java.io.DataOutput;
+import java.io.EOFException;
+import java.io.FileInputStream;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.PrintStream;
+
+import com.google.common.base.Charsets;
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+import it.unimi.dsi.fastutil.io.FastBufferedInputStream;
+import it.unimi.dsi.fastutil.io.FastBufferedOutputStream;
+import it.unimi.dsi.fastutil.objects.Reference2IntOpenHashMap;
+
+// RELEASE-STATUS: DIST
+
+/**
+ * The main method of this class converts a binary {@link DataOutput} file containing numbers to text format.
+ */
+public class DataInput2Text {
+
+ private DataInput2Text() {}
+
+ protected static enum Type { BYTE, SHORT, INT, LONG, FLOAT, DOUBLE };
+ private static final Reference2IntOpenHashMap<Type> type2size = new Reference2IntOpenHashMap<>(
+ new Type[] { Type.BYTE, Type.SHORT, Type.INT, Type.LONG, Type.FLOAT, Type.DOUBLE }, new int[] { 1, 2, 4, 8, 4, 8 }
+ );
+
+ /** Skips the given number of items; returns true if we went beyond EOF. */
+ private static boolean skipItems(final FastBufferedInputStream fbis, final long l, final int sizeOfItem) throws IOException {
+ if (l == 0) return false;
+ final long toBeSkipped = sizeOfItem * l;
+ return fbis.skip(toBeSkipped) < toBeSkipped;
+ }
+
+ public static void main(final String[] arg) throws IOException, JSAPException {
+ final SimpleJSAP jsap = new SimpleJSAP(DataInput2Text.class.getName(), "Converts a binary (DataOutput) file containing numbers to text format.",
+ new Parameter[] {
+ new UnflaggedOption("binaryFile", JSAP.STRING_PARSER, "-", JSAP.NOT_REQUIRED, JSAP.NOT_GREEDY, "The input binary file, - for stdin."),
+ new UnflaggedOption("textFile", JSAP.STRING_PARSER, "-", JSAP.NOT_REQUIRED, JSAP.NOT_GREEDY, "The output text file, - for stdout."),
+ new FlaggedOption("type", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, 't', "type", "The binary data type (byte, int, short, long, double, float)."),
+ new FlaggedOption("from", JSAP.LONGSIZE_PARSER, "0", JSAP.NOT_REQUIRED, 'f', "from", "Start from the given item (inclusive), 0-based."),
+ new FlaggedOption("number", JSAP.LONGSIZE_PARSER, Long.toString(Long.MAX_VALUE), JSAP.NOT_REQUIRED, 'n', "number", "Maximum number of elements that will be output."),
+ new FlaggedOption("step", JSAP.LONGSIZE_PARSER, "1", JSAP.NOT_REQUIRED, 's', "step", "Only output one out of step elements."),
+ });
+
+ final JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final Type type = Type.valueOf(jsapResult.getString("type").toUpperCase());
+ final int sizeOfItem = type2size.getInt(type);
+ final long fromItem = jsapResult.getLong("from");
+ final long step = jsapResult.getLong("step");
+ long howManyItems = jsapResult.getLong("number");
+
+ final String inFilename = jsapResult.getString("binaryFile");
+ final String outFilename = jsapResult.getString("textFile");
+
+ // This double-buffers stdin, but gives us a uniform interface to skipping.
+ final FastBufferedInputStream fbis = new FastBufferedInputStream(inFilename.equals("-") ? System.in : new FileInputStream(inFilename));
+ final DataInputStream dis = new DataInputStream(fbis);
+ final PrintStream out = outFilename.equals("-") ? System.out : new PrintStream(new FastBufferedOutputStream(new FileOutputStream(outFilename)), false, Charsets.ISO_8859_1.toString());
+
+ skipItems(fbis, fromItem, sizeOfItem);
+
+ try {
+ switch (type) {
+ case BYTE:
+ while (howManyItems-- > 0) {
+ out.println(dis.readByte());
+ if (step > 1 && skipItems(fbis, step - 1, sizeOfItem)) break;
+ }
+ break;
+ case SHORT:
+ while (howManyItems-- > 0) {
+ out.println(dis.readShort());
+ if (step > 1 && skipItems(fbis, step - 1, sizeOfItem)) break;
+ }
+ break;
+ case INT:
+ while (howManyItems-- > 0) {
+ out.println(dis.readInt());
+ if (step > 1 && skipItems(fbis, step - 1, sizeOfItem)) break;
+ }
+ break;
+ case LONG:
+ while (howManyItems-- > 0) {
+ out.println(dis.readLong());
+ if (step > 1 && skipItems(fbis, step - 1, sizeOfItem)) break;
+ }
+ break;
+ case FLOAT:
+ while (howManyItems-- > 0) {
+ out.println(dis.readFloat());
+ if (step > 1 && skipItems(fbis, step - 1, sizeOfItem)) break;
+ }
+ break;
+ case DOUBLE:
+ while (howManyItems-- > 0) {
+ out.println(dis.readDouble());
+ if (step > 1 && skipItems(fbis, step - 1, sizeOfItem)) break;
+ }
+ break;
+ }
+ }
+ catch (final EOFException eof) {}
+
+ dis.close();
+ out.close();
+ }
+}
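The interplay of --from, --step and --number in DataInput2Text can be reproduced with the standard java.io classes. The sketch below (hypothetical class StepReadSketch, not part of LAW) writes ints with DataOutputStream, the big-endian format DataInput2Text reads, and selects them as the INT branch of main() does, with DataInputStream.skipBytes() standing in for FastBufferedInputStream.skip(). The casts to int passed to skipBytes() limit it to small inputs.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the from/step/number logic of DataInput2Text for ints; not part of LAW. */
public class StepReadSketch {

    /** Writes the values in the big-endian format of DataOutput.writeInt(). */
    static byte[] encode(final int... values) {
        try {
            final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            final DataOutputStream dos = new DataOutputStream(bytes);
            for (final int v : values) dos.writeInt(v);
            dos.flush();
            return bytes.toByteArray();
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the items at positions from, from + step, from + 2 * step, ..., at most number of them. */
    static List<Integer> select(final byte[] data, final long from, final long step, long number) {
        final int size = Integer.BYTES; // 4, the size DataInput2Text associates with INT
        final List<Integer> result = new ArrayList<>();
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(data))) {
            dis.skipBytes((int)(from * size));
            while (number-- > 0) {
                result.add(dis.readInt());
                // Like skipItems(): stop if skipping the next step - 1 items runs past the end.
                if (step > 1 && dis.skipBytes((int)((step - 1) * size)) < (step - 1) * size) break;
            }
        } catch (final EOFException e) {
            // Reading past the end simply ends the output, as in DataInput2Text.
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(final String[] args) {
        final byte[] data = encode(0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
        System.out.println(select(data, 2, 3, Long.MAX_VALUE)); // items 2, 5 and 8
        System.out.println(select(data, 0, 1, 3));              // the first three items
    }
}
```

Running it prints [2, 5, 8] and [0, 1, 2]: the first selection stops because skipping two items after item 8 would run past the end of the ten-item input.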
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/io/tool/NumberDistinctLines.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/io/tool/NumberDistinctLines.java
new file mode 100644
index 0000000..3486e71
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/io/tool/NumberDistinctLines.java
@@ -0,0 +1,112 @@
+package it.unimi.dsi.law.io.tool;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import java.io.BufferedOutputStream;
+import java.io.DataOutputStream;
+import java.io.FileInputStream;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.io.OutputStream;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.longs.Long2IntOpenHashMap;
+import it.unimi.dsi.io.FastBufferedReader;
+import it.unimi.dsi.lang.MutableString;
+import it.unimi.dsi.law.util.CRC64;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.util.StringMap;
+
+// RELEASE-STATUS: DIST
+
+/**
+ * The main method of this class reads a UTF-8 file containing a newline-separated
+ * list of strings and writes a {@link java.io.DataOutputStream} containing a
+ * list of ints such that the <var>i</var>-th int is equal to the <var>j</var>-th
+ * int iff the ({@linkplain it.unimi.dsi.law.util.CRC64 crc} of the) <var>i</var>-th
+ * string is equal to the ({@linkplain it.unimi.dsi.law.util.CRC64 crc} of
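+ * the) <var>j</var>-th string. The minimum int will be 0 and the maximum int
+ * will be equal to the number of different strings minus one. If a restriction map is
+ * given, lines it does not contain are numbered -1 and are not counted as different strings.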
+ */
+
+public class NumberDistinctLines {
+ private final static Logger LOGGER = LoggerFactory.getLogger(NumberDistinctLines.class);
+
+ private final static int BUFFER_SIZE = 1024 * 1024;
+
+ private NumberDistinctLines() {}
+
+ @SuppressWarnings("unchecked")
+ public static void main(final String arg[]) throws IOException, JSAPException, ClassNotFoundException {
+ final SimpleJSAP jsap = new SimpleJSAP(NumberDistinctLines.class.getName(), "Numbers distinct lines.",
+ new Parameter[] {
+ new FlaggedOption("restrictFile", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'r', "restrict", "A serialized signed StringMap of the only strings to be considered; other lines are numbered -1."),
+ new UnflaggedOption("stringsFile", JSAP.STRING_PARSER, JSAP.REQUIRED, "The input file of UTF-8 strings to number."),
+ new UnflaggedOption("intsFile", JSAP.STRING_PARSER, JSAP.REQUIRED, "The output file of ints numbering distinct lines.")
+ });
+ final JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final Long2IntOpenHashMap crc2int = new Long2IntOpenHashMap();
+ crc2int.defaultReturnValue(-1);
+
+ final String inFileName = jsapResult.getString("stringsFile"), outFileName = jsapResult.getString("intsFile");
+
+ final FastBufferedReader in = new FastBufferedReader(new InputStreamReader(inFileName.equals("-") ? System.in : new FileInputStream(inFileName), "UTF-8"), BUFFER_SIZE);
+ final DataOutputStream out = new DataOutputStream(new BufferedOutputStream(outFileName.equals("-") ? (OutputStream)System.out : new FileOutputStream(outFileName), BUFFER_SIZE));
+
+ final StringMap<? extends CharSequence> restrict = jsapResult.contains("restrictFile") ? (StringMap<? extends CharSequence>)BinIO.loadObject(jsapResult.getString("restrictFile")) : null;
+
+ final ProgressLogger pm = new ProgressLogger(LOGGER, "lines");
+ pm.start("Converting...");
+
+ final MutableString s = new MutableString();
+ int numStrings = 0;
+ while ((in.readLine(s)) != null) {
+ if (restrict != null && restrict.getLong(s) == -1) {
+ out.writeInt(-1);
+ } else {
+ int i;
+ final long crc = CRC64.compute(s);
+ if ((i = crc2int.get(crc)) == -1) crc2int.put(crc, i = numStrings++);
+ out.writeInt(i);
+ }
+ pm.update();
+ }
+
+ in.close();
+ out.close();
+
+ pm.stop("Done. Number of different strings " + numStrings + " (map size " + crc2int.size() + ").");
+
+ }
+}
+
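The numbering performed by NumberDistinctLines can be illustrated with ordinary collections. This is a hypothetical sketch (class NumberDistinctLinesSketch, not part of LAW) that keys a HashMap on the lines themselves rather than on their CRC64 as the original does, so unlike the original it can never give the same number to two different lines whose CRCs collide; a Set<String> stands in for the StringMap restriction.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the numbering performed by NumberDistinctLines; not part of LAW. */
public class NumberDistinctLinesSketch {

    /**
     * Numbers lines in order of first appearance: equal lines get equal ints. Lines missing
     * from restrict (when it is not null) get -1 and do not consume a number.
     */
    static int[] number(final List<String> lines, final Set<String> restrict) {
        final Map<String, Integer> line2int = new HashMap<>();
        final int[] result = new int[lines.size()];
        int numStrings = 0;
        for (int k = 0; k < lines.size(); k++) {
            final String line = lines.get(k);
            if (restrict != null && !restrict.contains(line)) {
                result[k] = -1;
            } else {
                Integer i = line2int.get(line);
                // First occurrence: assign the next free number.
                if (i == null) line2int.put(line, i = numStrings++);
                result[k] = i;
            }
        }
        return result;
    }

    public static void main(final String[] args) {
        final List<String> lines = Arrays.asList("a", "b", "a", "c", "b");
        System.out.println(Arrays.toString(number(lines, null)));
        System.out.println(Arrays.toString(number(lines, new HashSet<>(Arrays.asList("a", "c")))));
    }
}
```

It prints [0, 1, 0, 2, 1] and then [0, -1, 0, 1, -1]: excluded lines get -1 and do not consume a number, which is why c is numbered 1 in the second run.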
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/io/tool/Text2DataOutput.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/io/tool/Text2DataOutput.java
new file mode 100644
index 0000000..8810cc2
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/io/tool/Text2DataOutput.java
@@ -0,0 +1,94 @@
+package it.unimi.dsi.law.io.tool;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import java.io.DataOutput;
+import java.io.DataOutputStream;
+import java.io.FileInputStream;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.InputStreamReader;
+
+import com.google.common.base.Charsets;
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+import it.unimi.dsi.fastutil.io.FastBufferedOutputStream;
+import it.unimi.dsi.io.FastBufferedReader;
+import it.unimi.dsi.lang.MutableString;
+import it.unimi.dsi.law.io.tool.DataInput2Text.Type;
+
+// RELEASE-STATUS: DIST
+
+/** The main method of this class converts a text file containing numbers to binary {@link DataOutput} format. Integer inputs must be in a format understandable by
+ * the {@code decode()} method of the corresponding primitive wrapper class (e.g., {@link Integer#decode(String)}); floating-point inputs are parsed by {@link Float#parseFloat(String)} or {@link Double#parseDouble(String)}. */
+
+public class Text2DataOutput {
+
+ private Text2DataOutput() {}
+
+ public static void main(final String[] arg) throws IOException, JSAPException {
+ final SimpleJSAP jsap = new SimpleJSAP(Text2DataOutput.class.getName(), "Converts a text file containing numbers to binary (DataOutput) format. Inputs of integer type can be written in any format accepted by the decode() family of parsing methods.",
+ new Parameter[] {
+ new FlaggedOption("type", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, 't', "type", "The binary data type (byte, int, short, long, double, float)."),
+ new UnflaggedOption("textFile", JSAP.STRING_PARSER, "-", JSAP.NOT_REQUIRED, JSAP.NOT_GREEDY, "The input text file, - for stdin."),
+ new UnflaggedOption("binaryFile", JSAP.STRING_PARSER, "-", JSAP.NOT_REQUIRED, JSAP.NOT_GREEDY, "The output binary file, - for stdout."),
+ });
+
+ final JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final Type type = Type.valueOf(jsapResult.getString("type").toUpperCase());
+ final String inFilename = jsapResult.getString("textFile");
+ final String outFilename = jsapResult.getString("binaryFile");
+ final FastBufferedReader in = new FastBufferedReader(new InputStreamReader(inFilename.equals("-") ? System.in : new FileInputStream(inFilename), Charsets.ISO_8859_1.toString()));
+ final DataOutputStream out = new DataOutputStream(new FastBufferedOutputStream(outFilename.equals("-") ? System.out : new FileOutputStream(outFilename)));
+
+ final MutableString s = new MutableString();
+
+ switch (type) {
+ case BYTE:
+ while ((in.readLine(s)) != null) out.writeByte(Byte.decode(s.trimRight().toString()).byteValue());
+ break;
+ case INT:
+ while ((in.readLine(s)) != null) out.writeInt(Integer.decode(s.trimRight().toString()).intValue());
+ break;
+ case SHORT:
+ while ((in.readLine(s)) != null) out.writeShort(Short.decode(s.trimRight().toString()).shortValue());
+ break;
+ case LONG:
+ while ((in.readLine(s)) != null) out.writeLong(Long.decode(s.trimRight().toString()).longValue());
+ break;
+ case FLOAT:
+ while ((in.readLine(s)) != null) out.writeFloat(Float.parseFloat(s.trimRight().toString()));
+ break;
+ case DOUBLE:
+ while ((in.readLine(s)) != null) out.writeDouble(Double.parseDouble(s.trimRight().toString()));
+ break;
+ }
+
+ in.close();
+ out.close();
+ }
+}
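Text2DataOutput accepts any integer syntax understood by decode(), which includes hexadecimal (0x or # prefix), octal (leading 0) and signed values. The sketch below (hypothetical class DecodeSketch, not part of LAW) reproduces the INT branch on in-memory strings and reads the bytes back with DataInputStream, as DataInput2Text would; removing trailing whitespace with a regular expression is only an approximation of MutableString.trimRight().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

/** Hypothetical sketch of the INT branch of Text2DataOutput and its inverse; not part of LAW. */
public class DecodeSketch {

    /** Parses each line with Integer.decode() after removing trailing whitespace, and writes it with writeInt(). */
    static byte[] toBinary(final String... lines) {
        try {
            final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            final DataOutputStream out = new DataOutputStream(bytes);
            // Only trailing whitespace is removed; decode() would reject leading blanks.
            for (final String line : lines) out.writeInt(Integer.decode(line.replaceAll("\\s+$", "")));
            out.flush();
            return bytes.toByteArray();
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads back big-endian ints, as DataInput2Text with type int would. */
    static int[] fromBinary(final byte[] data) {
        final int[] values = new int[data.length / Integer.BYTES];
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < values.length; i++) values[i] = in.readInt();
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    public static void main(final String[] args) {
        // Decimal, hexadecimal with 0x and #, octal with a leading 0, negative with a trailing blank.
        final byte[] data = toBinary("42", "0x1F", "#1F", "010", "-7 ");
        System.out.println(data.length + " bytes: " + Arrays.toString(fromBinary(data)));
    }
}
```

It prints 20 bytes: [42, 31, 31, 8, -7]. Note that 010 is read as octal, which may surprise users feeding zero-padded decimal numbers.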
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/io/tool/package.html b/third_party/law-2.5.1/src/it/unimi/dsi/law/io/tool/package.html
new file mode 100644
index 0000000..ee080cb
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/io/tool/package.html
@@ -0,0 +1,13 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<!-- RELEASE-STATUS: DIST -->
+<html>
+ <head>
+ <title>LAW software</title>
+ </head>
+
+ <body>
+
+ <P>Tools manipulating and converting files.
+
+ </body>
+</html>
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/package.html b/third_party/law-2.5.1/src/it/unimi/dsi/law/package.html
new file mode 100644
index 0000000..b013ed8
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/package.html
@@ -0,0 +1,13 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<!-- RELEASE-STATUS: DIST -->
+<html>
+ <head>
+ <title>Basic classes</title>
+ </head>
+
+ <body>
+
+ <P>Basic classes.
+
+ </body>
+</html>
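The next file implements the power method for the left dominant eigenvector, and its Javadoc strongly suggests applying a shift. As an illustration of why, here is a minimal sequential sketch (hypothetical class ShiftedPowerMethodSketch, not LAW code) that iterates with A + I on successor lists, normalizes in &#x2113;2 (the class's default norm) and estimates the eigenvalue with a Rayleigh quotient. We assume, without having checked the implementation, that adding I plays the role of the suggested shift of -1; parallelism, Kahan summation and the Markovian option are omitted. Computing xA means summing over the predecessors of each node, which is presumably why the constructor takes the transpose.

```java
import java.util.Arrays;

/** Hypothetical sequential sketch of a shifted power method; not the LAW implementation. */
public class ShiftedPowerMethodSketch {

    /** A unit (L2) approximation of the left dominant eigenvector and its Rayleigh-quotient eigenvalue. */
    static final class Result {
        final double[] vector;
        final double lambda;

        Result(final double[] vector, final double lambda) {
            this.vector = vector;
            this.lambda = lambda;
        }
    }

    /** Computes x A, where A is the adjacency matrix given by successor lists: y[j] sums x[i] over arcs i -> j. */
    static double[] leftMultiply(final int[][] succ, final double[] x) {
        final double[] y = new double[x.length];
        for (int i = 0; i < succ.length; i++) for (final int j : succ[i]) y[j] += x[i];
        return y;
    }

    /** Iterates x <- x (A + I) / ||x (A + I)||_2 from e_0 until consecutive vectors differ by less than tolerance. */
    static Result dominant(final int[][] succ, final double tolerance, final int maxIter) {
        final int n = succ.length;
        double[] x = new double[n];
        x[0] = 1; // a deliberately non-uniform starting vector
        for (int it = 0; it < maxIter; it++) {
            final double[] y = leftMultiply(succ, x);
            for (int i = 0; i < n; i++) y[i] += x[i]; // adding I moves every eigenvalue right by 1
            double norm = 0;
            for (final double v : y) norm += v * v;
            norm = Math.sqrt(norm);
            double delta = 0;
            for (int i = 0; i < n; i++) {
                y[i] /= norm;
                delta += (y[i] - x[i]) * (y[i] - x[i]);
            }
            x = y;
            if (Math.sqrt(delta) < tolerance) break;
        }
        // Rayleigh quotient of the unshifted matrix: lambda = (x . xA) / (x . x).
        final double[] xa = leftMultiply(succ, x);
        double num = 0, den = 0;
        for (int i = 0; i < n; i++) {
            num += x[i] * xa[i];
            den += x[i] * x[i];
        }
        return new Result(x, num / den);
    }

    public static void main(final String[] args) {
        // Directed 3-cycle: its eigenvalues are the cube roots of unity, all of modulus 1.
        final Result r = dominant(new int[][] { { 1 }, { 2 }, { 0 } }, 1e-12, 1000);
        System.out.println(r.lambda + " " + Arrays.toString(r.vector));
    }
}
```

On the 3-cycle the plain iteration x <- xA would just cycle through e_0, e_1, e_2 forever, because all three eigenvalues have modulus 1. With A + I the Perron eigenvalue becomes 2 while the others become 1 + ω and 1 + ω², both of modulus 1, so the error roughly halves at each step and the sketch converges to an eigenvalue close to 1 with all components close to 1/√3.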
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/DominantEigenvectorParallelPowerMethod.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/DominantEigenvectorParallelPowerMethod.java
new file mode 100644
index 0000000..6a40ab3
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/DominantEigenvectorParallelPowerMethod.java
@@ -0,0 +1,382 @@
+package it.unimi.dsi.law.rank;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.concurrent.CyclicBarrier;
+import java.util.concurrent.atomic.AtomicLong;
+
+import org.apache.commons.configuration.ConfigurationException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.law.util.KahanSummation;
+import it.unimi.dsi.law.util.Norm;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.util.Properties;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.LazyIntIterator;
+import it.unimi.dsi.webgraph.NodeIterator;
+
+// RELEASE-STATUS: DIST
+
+/** Computes the left dominant eigenvalue and eigenvector of a graph using a parallel implementation of the power method.
+ * At the end of the computation, {@link #lambda} will contain an approximation of the dominant eigenvalue.
+ *
+ * <ul>
+ * <li>If the {@linkplain #markovian Markovian flag} has been set, the rows of the adjacency matrix will be &#x2113;<sub>1</sub>-normalized.
+ *
+ * <li>{@link #normDelta()} returns the difference in {@linkplain #norm} between the previous approximation of the dominant eigenvector
+ * and the product of the previous approximation by the graph, divided by the current estimate of the dominant eigenvalue obtained
+ * by Rayleigh quotients.
+ *
+ * <li>The {@link #step()} method is not available: due to the need for some synchronization logic, only {@link #stepUntil(StoppingCriterion)}
+ * is available.
+ *
+ * <li>The computed eigenvector will be a unit vector in the specified
+ * {@linkplain #norm norm}. In particular, in the Markovian case if you use a norm different from {@link Norm#L_1} the dominant eigenvector will need to be
+ * &#x2113;<sub>1</sub>-normalized to be a distribution.
+ *
+ * <li>It is <strong>strongly suggested</strong> that you apply a <em>{@linkplain #shift}</em> (e.g., -1). A negative shift
+ * guarantees convergence even in the presence of eigenvalues of maximum modulus that are not real positive (see a numerical
+ * linear algebra textbook for details).
+ *
+ * </ul>
+ * @see SpectralRanking
+ *
+ * @author Sebastiano Vigna
+ */
+
+public class DominantEigenvectorParallelPowerMethod extends SpectralRanking {
+ private final static Logger LOGGER = LoggerFactory.getLogger(DominantEigenvectorParallelPowerMethod.class);
+
+ /** The default norm ({@link Norm#L_2}). Works well with Rayleigh-quotient estimation of the dominant eigenvalue. */
+ public final static Norm DEFAULT_DOMINANT_EIGENVECTOR_NORM = Norm.L_2;
+
+ /** A progress logger monitoring each iteration. */
+ private final ProgressLogger progressLogger;
+ /** A progress logger monitoring the iterations. */
+ private final ProgressLogger iterationLogger;
+ /** The number of threads. */
+ private final int numberOfThreads;
+ /** The next node to be picked. */
+ private final AtomicLong nextNode;
+ /** Accumulates the numerator of the Rayleigh quotient. */
+ private final KahanSummation rayleighQuotientNumerator;
+ /** Accumulates the denominator of the Rayleigh quotient. */
+ private final KahanSummation rayleighQuotientDenominator;
+ /** If true, the computation is over. */
+ private volatile boolean completed;
+ /** The barrier used to synchronize threads. */
+ private volatile CyclicBarrier barrier;
+ /** Keeps track of problems in threads. */
+ private volatile Throwable threadThrowable;
+ /** The outdegree of each node ({@code null} if {@link #markovian} is false). */
+ private int[] outdegree;
+ /** The rank vector after the last iteration (only meaningful after at least one step). */
+ public double[] previousRank;
+ /** If true, the matrix will be stochasticized, i.e., each row will be divided by the outdegree of the corresponding node. */
+ public boolean markovian;
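+ /** The shift: each iteration multiplies the current approximation by the graph matrix minus {@code shift} times the identity. */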
+ public double shift;
+ /** The dominant eigenvalue. */
+ public double lambda;
+ /** The norm. */
+ public Norm norm = DEFAULT_DOMINANT_EIGENVECTOR_NORM;
+
+ /** Creates a new instance.
+ *
+ * @param transpose the transpose of the graph.
+ * @param requestedThreads the requested number of threads (0 for {@link Runtime#availableProcessors()}).
+ * @param logger a logger that will be passed to <code>super()</code>.
+ */
+ public DominantEigenvectorParallelPowerMethod(final ImmutableGraph transpose, final int requestedThreads, final Logger logger) {
+ super(transpose, logger);
+ progressLogger = new ProgressLogger(logger, "nodes");
+ iterationLogger = new ProgressLogger(logger, "iterations");
+ numberOfThreads = requestedThreads != 0 ? requestedThreads : Runtime.getRuntime().availableProcessors();
+ nextNode = new AtomicLong();
+ rayleighQuotientNumerator = new KahanSummation();
+ rayleighQuotientDenominator = new KahanSummation();
+ }
+
+ /** Creates a new instance.
+ *
+ * @param transpose the transpose of the graph.
+ * @param logger a logger that will be passed to <code>super()</code>.
+ */
+ public DominantEigenvectorParallelPowerMethod(final ImmutableGraph transpose, final Logger logger) {
+ this(transpose, 0, logger);
+ }
+
+ /** Creates a new instance.
+ *
+ * @param transpose the transpose of the graph.
+ */
+ public DominantEigenvectorParallelPowerMethod(final ImmutableGraph transpose) {
+ this(transpose, LOGGER);
+ }
+
+ @Override
+ public void init() throws IOException {
+ super.init();
+ completed = false;
+ logger.info("Norm: " + norm);
+ logger.info("Shift: " + shift);
+ logger.info("Markovian: " + markovian);
+
+ if (! markovian) outdegree = null;
+ else if (outdegree == null) {
+ // Compute outdegrees: successors in the transpose are predecessors in the original graph
+ outdegree = new int[n];
+ final NodeIterator nodeIterator = graph.nodeIterator();
+ for(int i = n; i-- != 0;) {
+ nodeIterator.nextInt();
+ LazyIntIterator successors = nodeIterator.successors();
+ for(int s; (s = successors.nextInt()) != -1;) outdegree[s]++;
+ }
+ }
+
+ // Creates the arrays, if necessary
+ if (previousRank == null) previousRank = new double[n];
+ Arrays.fill(rank, 1);
+ norm.normalize(rank, 1);
+
+ rayleighQuotientNumerator.reset();
+ rayleighQuotientDenominator.reset();
+
+ logger.info("Completed.");
+ iterationLogger.start();
+ }
+
+ private final class IterationThread extends Thread {
+ private static final int GRANULARITY = 10000;
+
+ public void run() {
+ try {
+ // We cache frequently used fields.
+ final ImmutableGraph graph = DominantEigenvectorParallelPowerMethod.this.graph.copy();
+ final int n = DominantEigenvectorParallelPowerMethod.this.n;
+ final int[] outdegree = DominantEigenvectorParallelPowerMethod.this.outdegree;
+ final boolean markovian = DominantEigenvectorParallelPowerMethod.this.markovian;
+ final double shift = DominantEigenvectorParallelPowerMethod.this.shift;
+ final KahanSummation s = new KahanSummation(), rayleighQuotientNumerator = new KahanSummation(), rayleighQuotientDenominator = new KahanSummation();
+
+ for(;;) {
+ barrier.await();
+ if (completed) return;
+ final double[] oldRank = rank, newRank = previousRank;
+
+ rayleighQuotientNumerator.reset();
+ rayleighQuotientDenominator.reset();
+
+ for(;;) {
+ // Try to get another piece of work.
+ final long start = nextNode.getAndAdd(GRANULARITY);
+ if (start >= n) {
+ nextNode.getAndAdd(-GRANULARITY);
+ break;
+ }
+
+ final int end = (int)(Math.min(n, start + GRANULARITY));
+
+ // for each node, enumerate predecessors and compute an updated value
+ final NodeIterator nodeIterator = graph.nodeIterator((int)start);
+
+ for(int i = (int)start; i < end; i++) {
+ nodeIterator.nextInt();
+ int indegree = nodeIterator.outdegree();
+ s.reset();
+
+ if (indegree != 0) {
+ final int[] pred = nodeIterator.successorArray();
+ if (markovian) while (indegree-- != 0) s.add(oldRank[pred[indegree]] / outdegree[pred[indegree]]);
+ else while (indegree-- != 0) s.add(oldRank[pred[indegree]]);
+ }
+
+ final double old = oldRank[i];
+
+ newRank[i] = s.value();
+ if (shift != 0) newRank[i] -= shift * old;
+
+ rayleighQuotientNumerator.add(newRank[i] * old);
+ rayleighQuotientDenominator.add(old * old);
+ }
+
+ synchronized (progressLogger) {
+ progressLogger.update(end - start);
+ }
+ }
+
+ synchronized(DominantEigenvectorParallelPowerMethod.this) {
+ DominantEigenvectorParallelPowerMethod.this.rayleighQuotientNumerator.add(rayleighQuotientNumerator.value());
+ DominantEigenvectorParallelPowerMethod.this.rayleighQuotientDenominator.add(rayleighQuotientDenominator.value());
+ }
+ }
+ }
+ catch(Throwable t) {
+ threadThrowable = t;
+ }
+ }
+ }
+
+ @Override
+ public void step() throws IOException {
+ throw new UnsupportedOperationException();
+ }
+
+ @Override
+ public void stepUntil(final StoppingCriterion stoppingCriterion) throws IOException {
+ init();
+ final IterationThread[] thread = new IterationThread[numberOfThreads];
+ for(int i = thread.length; i-- != 0;) thread[i] = new IterationThread();
+
+ barrier = new CyclicBarrier(numberOfThreads, new Runnable() {
+ @Override
+ public void run() {
+ if (iteration > 0) {
+ progressLogger.done();
+
+ final double t[] = rank;
+ rank = previousRank;
+ previousRank = t;
+
+ // Rayleigh quotient
+ lambda = rayleighQuotientNumerator.value() / rayleighQuotientDenominator.value();
+ final double oneOverLambda = 1 / lambda;
+ for(int i = rank.length; i-- != 0;) rank[i] *= oneOverLambda;
+ lambda += shift;
+ logger.info("Current estimate of the dominant eigenvalue: " + lambda);
+
+ //System.err.println("lambda: " + lambda + " mu: " + norm.compute(rank) + " difference: " + (lambda - norm.compute(rank)));
+
+ if (stoppingCriterion.shouldStop(DominantEigenvectorParallelPowerMethod.this)) completed = true;
+
+ norm.normalize(rank, 1);
+ rayleighQuotientNumerator.reset();
+ rayleighQuotientDenominator.reset();
+
+ // for(int i = n; i-- != 0;) rank[i] = (rank[i] + previousRank[i]) / 2;
+ iterationLogger.setAndDisplay(iteration);
+ if (completed) return;
+ }
+
+ nextNode.set(0);
+ progressLogger.expectedUpdates = n;
+ progressLogger.start("Iteration " + (iteration++) + "...");
+ }
+ }
+ );
+
+ for(int i = thread.length; i-- != 0;) thread[i].start();
+ for(int i = thread.length; i-- != 0;)
+ try {
+ thread[i].join();
+ }
+ catch (InterruptedException e) {
+ throw new RuntimeException(e);
+ }
+
+ if (threadThrowable != null) throw new RuntimeException(threadThrowable);
+ if (progressLogger != null) progressLogger.done();
+
+ iterationLogger.done();
+ }
+
+ @Override
+ public double normDelta() {
+ return norm.compute(rank, previousRank);
+ }
+
+ @Override
+ public void clear() {
+ super.clear();
+ previousRank = null;
+ outdegree = null;
+ }
+
+ /**
+ * Returns a Properties object that contains all the parameters used by the computation.
+ *
+ * @param graphBasename the basename of the graph.
+ * @return a properties object that represents all the parameters used to compute the rank.
+ */
+ @Override
+ public Properties buildProperties(String graphBasename) {
+ final Properties prop = super.buildProperties(graphBasename);
+ prop.setProperty("norm", norm);
+ prop.setProperty("lambda", Double.toString(lambda)); // lambda already has the shift added back in stepUntil()
+ prop.setProperty("markovian", Boolean.toString(markovian));
+ prop.setProperty("shift", shift);
+ return prop;
+ }
+
+ public static void main(final String[] arg) throws IOException, JSAPException, ConfigurationException {
+
+ SimpleJSAP jsap = new SimpleJSAP(DominantEigenvectorParallelPowerMethod.class.getName(), "Computes the dominant eigenvalue and eigenvector of a graph, given its transpose, using a parallel implementation of the power method."
+ + " The file <rankBasename>.properties stores metadata about the computation, whereas the file <rankBasename>.ranks stores the result as a sequence of doubles in DataInput format.",
+ new Parameter[] {
+ new FlaggedOption("maxIter", JSAP.INTEGER_PARSER, Integer.toString(DEFAULT_MAX_ITER), JSAP.NOT_REQUIRED, 'i', "max-iter", "Maximum number of iterations."),
+ new FlaggedOption("threshold", JSAP.DOUBLE_PARSER, Double.toString(DEFAULT_THRESHOLD), JSAP.NOT_REQUIRED, 't', "threshold", "Threshold to determine whether to stop."),
+ new FlaggedOption("shift", JSAP.DOUBLE_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 's', "shift", "A shift for the power method."),
+ new Switch("markovian", 'M', "markovian", "Stochasticise the matrix."),
+ new Switch("mapped", 'm', "mapped", "Use loadMapped() to load the graph."),
+ new FlaggedOption("norm", JSAP.STRING_PARSER, DEFAULT_DOMINANT_EIGENVECTOR_NORM.toString(), JSAP.NOT_REQUIRED, 'n', "norm", "Norm type. Possible values: " + Arrays.toString(Norm.values())),
+ new FlaggedOption("threads", JSAP.INTSIZE_PARSER, "0", JSAP.NOT_REQUIRED, 'T', "threads", "The number of threads to be used. If 0, the number will be estimated automatically."),
+ new UnflaggedOption("transposeBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the transpose of the graph."),
+ new UnflaggedOption("rankBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the file where the resulting rank (doubles in binary form) is stored.")
+ }
+ );
+
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final boolean mapped = jsapResult.getBoolean("mapped");
+ final boolean markovian = jsapResult.getBoolean("markovian");
+ final String graphBasename = jsapResult.getString("transposeBasename");
+ final String rankBasename = jsapResult.getString("rankBasename");
+ final String norm = jsapResult.getString("norm");
+ final double shift = jsapResult.getDouble("shift", 0);
+ final int threads = jsapResult.getInt("threads");
+ final ProgressLogger progressLogger = new ProgressLogger(LOGGER, "nodes");
+
+ ImmutableGraph graph = mapped? ImmutableGraph.loadMapped(graphBasename, progressLogger) : ImmutableGraph.load(graphBasename, progressLogger);
+
+ DominantEigenvectorParallelPowerMethod pr = new DominantEigenvectorParallelPowerMethod(graph, threads, LOGGER);
+ pr.markovian = markovian;
+ pr.norm = Norm.valueOf(norm);
+ pr.shift = shift;
+
+ pr.stepUntil(or(new SpectralRanking.NormStoppingCriterion(jsapResult.getDouble("threshold")), new SpectralRanking.IterationNumberStoppingCriterion(jsapResult.getInt("maxIter"))));
+
+ BinIO.storeDoubles(pr.rank, rankBasename +".ranks");
+ pr.buildProperties(graphBasename).save(rankBasename + ".properties");
+ }
+}
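The iteration performed by DominantEigenvectorParallelPowerMethod can be illustrated by the following minimal single-threaded sketch. It is not part of LAW: the class name ShiftedPowerMethodSketch is hypothetical, the graph is assumed to be given as an array of predecessor lists (standing in for the transpose ImmutableGraph), the Markovian option is omitted, and normalization is always in the L2 norm (the class's default). Each step computes x(G - shift I), estimates the dominant eigenvalue of the shifted matrix with a Rayleigh quotient, and then adds the shift back.

```java
import java.util.Arrays;

/**
 * Hypothetical single-threaded sketch of a shifted power method with
 * Rayleigh-quotient eigenvalue estimation (not LAW code).
 * pred[i] lists the predecessors of node i, i.e., its successors in the transpose.
 */
final class ShiftedPowerMethodSketch {
    /** Approximate dominant eigenvalue and L2-normalized eigenvector. */
    static final class Result {
        final double lambda;
        final double[] vector;
        Result(final double lambda, final double[] vector) {
            this.lambda = lambda;
            this.vector = vector;
        }
    }

    static Result run(final int[][] pred, final double shift, final int iterations) {
        final int n = pred.length;
        double[] x = new double[n];
        Arrays.fill(x, 1);
        normalizeL2(x);
        double lambda = 0;
        for (int it = 0; it < iterations; it++) {
            final double[] y = new double[n];
            double num = 0, den = 0;
            for (int i = 0; i < n; i++) {
                double s = 0;
                for (final int p : pred[i]) s += x[p]; // (xG)_i: sum over predecessors
                y[i] = s - shift * x[i];             // (x(G - shift I))_i
                num += y[i] * x[i];                  // Rayleigh quotient numerator
                den += x[i] * x[i];                  // Rayleigh quotient denominator
            }
            final double mu = num / den;             // eigenvalue estimate for the shifted matrix
            for (int i = 0; i < n; i++) y[i] /= mu;  // mirrors the division by the estimate
            normalizeL2(y);
            x = y;
            lambda = mu + shift;                     // undo the shift
        }
        return new Result(lambda, x);
    }

    /** Scales v in place so that its Euclidean norm is 1. */
    static void normalizeL2(final double[] v) {
        double sum = 0;
        for (final double d : v) sum += d * d;
        final double norm = Math.sqrt(sum);
        for (int i = 0; i < v.length; i++) v[i] /= norm;
    }
}
```

On the undirected path on nodes 0, 1, 2 the eigenvalues are sqrt(2), 0 and -sqrt(2); since two of them have maximum modulus, the unshifted iteration started from the uniform vector oscillates between (1, 1, 1) and (1, 2, 1) up to scaling. With a shift of -1 the sketch converges to lambda = sqrt(2) and the eigenvector (1/2, 1/sqrt(2), 1/2), in line with the advice above to apply a negative shift.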
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/KatzParallelGaussSeidel.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/KatzParallelGaussSeidel.java
new file mode 100644
index 0000000..05005a7
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/KatzParallelGaussSeidel.java
@@ -0,0 +1,415 @@
+package it.unimi.dsi.law.rank;
+
+import java.io.DataInput;
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.concurrent.CyclicBarrier;
+import java.util.concurrent.atomic.AtomicLong;
+
+import org.apache.commons.configuration.ConfigurationException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/*
+ * Copyright (C) 2011-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.doubles.DoubleArrayList;
+import it.unimi.dsi.fastutil.doubles.DoubleIterators;
+import it.unimi.dsi.fastutil.doubles.DoubleList;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.law.util.KahanSummation;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.util.Properties;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.NodeIterator;
+
+// RELEASE-STATUS: DIST
+
+/** Computes Katz's index using a parallel implementation of the Gau&szlig;&ndash;Seidel method; this is the implementation of choice to be used when computing Katz's index.
+ * It uses less memory (just one vector of doubles) and, experimentally, converges faster than any other implementation. Moreover, it
+ * scales linearly with the number of cores.
+ *
+ * <p><strong>Warning</strong>: Since we need to enumerate the <em>predecessors</em> of a node,
+ * you must pass to the {@linkplain #KatzParallelGaussSeidel(ImmutableGraph, int, Logger) constructor} the <strong>transpose</strong>
+ * of the graph.
+ *
+ * <p>This class approximates the infinite summation
+ * <div style="text-align: center">
+ * <var><b>v</b></var> <big>&Sigma;</big><sub><var>k</var> &ge; 0</sub> (&alpha;<var>G</var>)<sup><var>k</var></sup> = <var><b>v</b></var>(1 &minus; &alpha;<var>G</var>)<sup>-1</sup>
+ * </div>
+ * by solving iteratively the linear system
+ * <div style="text-align: center">
+ * <var><b>x</b></var> (1 &minus; &alpha;<var>G</var>) = <var><b>v</b></var>,
+ * </div>
+ * where <var><b>v</b></var> is a <em>{@linkplain #preference preference vector}</em> that defaults to <b>1</b>, and
+ * <var>G</var> is the graph adjacency matrix. Note that the {@link #step()} method is not available: due to the need for some synchronization logic, only {@link #stepUntil(StoppingCriterion)}
+ * is available.
+ *
+ * <p>Technically, the iteration performed by this class is a <em>step-asynchronous</em> Gau&szlig;&ndash;Seidel iteration: we simply start a number of
+ * threads, and each thread updates a value using a Gau&szlig;&ndash;Seidel-like rule.
+ * As a result, each update uses some old and some new values: in other
+ * words, the <em>regular splitting</em>
+ * <div style="text-align: center; margin: 1em">
+ * <var>M &minus; N</var> = 1 &minus; &alpha;<var>G</var>
+ * </div>
+ * of the matrix associated to each update is always different (in a Gau&szlig;&ndash;Seidel iteration,
+ * <var>M</var> is upper triangular, and <var>N</var> is strictly lower triangular). Nonetheless, it is easy to check that
+ * <var>M</var> is still (up to permutation) upper triangular and invertible, independently of the specific update sequence.
+ *
+ * <p>{@link #normDelta()} returns the following values:
+ * <ul>
+ * <li>if a {@linkplain #normVector(double[], double) suitable norm vector has been set}, an upper bound on the error (the &#x2113;<sub>&#x221E;</sub> distance from the rank to be computed);
+ * <li>otherwise, an estimate of the &#x2113;<sub>&#x221E;</sub> norm of the residual obtained by multiplying the &#x2113;<sub>&#x221E;</sub> norm
+ * of the difference between the last two approximations by the &#x2113;<sub>&#x221E;</sub> norm of &alpha;<var>G</var>
+ * (i.e., the maximum indegree multiplied by &alpha;).</ul>
+ *
+ * <p>To be able to set a norm vector, you need to use {@link PowerSeries} to compute a suitable vector.
+ * To do so, you must provide an &alpha; and use the {@link PowerSeries#MAX_RATIO_STOPPING_CRITERION}. If the computation
+ * terminates without errors with maximum ratio &sigma;, the {@linkplain PowerSeries#previousRank resulting vector} can be used
+ * with this class for all &alpha; &lt; 1 / &sigma; (strictness
+ * is essential). Note that this is the <em>only</em> way to obtain a bound on the error (unless
+ * your graph is so small that you can evaluate the &#x2113;<sub>&#x221E;</sub> norm of (1 &minus; &alpha;<var>G</var>)<sup>-1</sup> and use it to bound the error using the residual).
+ * Details about the method are described by Sebastiano Vigna in &ldquo;<a href="http://vigna.di.unimi.it/papers.php#VigSNCSASOM">Supremum-Norm Convergence for Step-Asynchronous Successive Overrelaxation on M-matrices</a>&rdquo;, 2014.
+ *
+ * @see PageRank
+ * @see SpectralRanking
+ * @see PowerSeries
+ *
+ * @author Sebastiano Vigna
+ */
+
+public class KatzParallelGaussSeidel extends SpectralRanking {
+ private final static Logger LOGGER = LoggerFactory.getLogger(KatzParallelGaussSeidel.class);
+
+ /** A progress logger monitoring each iteration. */
+ private final ProgressLogger progressLogger;
+ /** A progress logger monitoring the iterations. */
+ private final ProgressLogger iterationLogger;
+ /** The number of threads. */
+ private final int numberOfThreads;
+ /** The next node to be picked. */
+ private final AtomicLong nextNode;
+ /** The norm of the difference vector between the new approximation and the previous one. */
+ private double normDelta;
+ /** If true, the computation is over. */
+ private volatile boolean completed;
+ /** The barrier used to synchronize threads. */
+ private volatile CyclicBarrier barrier;
+ /** Keeps track of problems in threads. */
+ private volatile Throwable threadThrowable;
+ /** An array of bytes containing the opposite of a lower bound on the binary logarithm of the elements of a norm vector, or {@code null} to stop the computation using residual estimation. */
+ private byte[] normVector;
+ /** The value for which {@link #normVector} is suitable. */
+ private double sigma;
+ /** The maximum indegree, or -1 if the maximum indegree has not been computed yet. */
+ private int maxIndegree = -1;
+
+ /** The attenuation factor. Must be smaller than the reciprocal of the dominant eigenvalue. */
+ public double alpha;
+ /** The preference vector to be used (or {@code null} if the uniform preference vector should be used). */
+ public DoubleList preference;
+
+ /** Creates a new instance.
+ *
+ * @param transpose the transpose of the graph on which to compute Katz's index.
+ * @param requestedThreads the requested number of threads (0 for {@link Runtime#availableProcessors()}).
+ * @param logger a logger that will be passed to <code>super()</code>.
+ */
+ public KatzParallelGaussSeidel(final ImmutableGraph transpose, final int requestedThreads, final Logger logger) {
+ super(transpose, logger);
+ progressLogger = new ProgressLogger(logger, "nodes");
+ iterationLogger = new ProgressLogger(logger, "iterations");
+ numberOfThreads = requestedThreads != 0 ? requestedThreads : Runtime.getRuntime().availableProcessors();
+ nextNode = new AtomicLong();
+ }
+
+ /** Creates a new instance.
+ *
+ * @param transpose the transpose of the graph on which to compute Katz's index.
+ */
+ public KatzParallelGaussSeidel(final ImmutableGraph transpose) {
+ this(transpose, 0, LOGGER);
+ }
+
+ /** Sets the norm vector.
+ *
+ * @param normVectorFilename a file containing a norm vector as a list of doubles in {@link DataInput} format, or {@code null} for no norm vector.
+ * @param sigma the value for which the provided norm vector is suitable.
+ */
+ public void normVector(final String normVectorFilename, final double sigma) throws IOException {
+ normVector = normVectorFilename == null ? null : approximateNormVector(BinIO.asDoubleIterator(normVectorFilename));
+ this.sigma = sigma;
+ }
+
+ /** Sets the norm vector.
+ *
+ * @param normVector the new norm vector.
+ * @param sigma the value for which the provided norm vector is suitable.
+ */
+ public void normVector(final double[] normVector, final double sigma) {
+ this.normVector = approximateNormVector(DoubleIterators.wrap(normVector));
+ this.sigma = sigma;
+ }
+
+ @Override
+ public void init() throws IOException {
+ super.init();
+
+ logger.info("Attenuation factor: " + alpha);
+
+ if (normVector != null && alpha >= 1 / sigma) throw new IllegalStateException("The specified norm vector can be used only with values of alpha smaller than " + 1 / sigma);
+
+ // Check the preference vector
+ if (preference != null) {
+ if (preference.size() != n) throw new IllegalArgumentException("The preference vector size (" + preference.size() + ") is different from the graph dimension (" + n + ").");
+ logger.info("Using a specified preference vector");
+ for(int i = n; i-- != 0;) rank[i] = preference.getDouble(i);
+ }
+ else {
+ logger.info("Using the uniform preference vector");
+ Arrays.fill(rank, 1);
+ }
+
+ if (normVector == null && maxIndegree == -1) {
+ // TODO: refactor using outdegrees().
+ final NodeIterator nodeIterator = graph.nodeIterator();
+ for(int i = n; i-- != 0;) {
+ nodeIterator.nextInt();
+ maxIndegree = Math.max(maxIndegree, nodeIterator.outdegree());
+ }
+ }
+
+ // Replace initialization
+ progressLogger.expectedUpdates = n;
+ progressLogger.start("Computing initial dangling rank...");
+
+ progressLogger.done();
+
+ completed = false;
+ logger.info("Completed.");
+ iterationLogger.start();
+ }
+
+ private final class IterationThread extends Thread {
+ private static final int GRANULARITY = 10000;
+
+ public void run() {
+ try {
+ // We cache frequently used fields.
+ final ImmutableGraph graph = KatzParallelGaussSeidel.this.graph.copy();
+ final int n = KatzParallelGaussSeidel.this.n;
+ final double alpha = KatzParallelGaussSeidel.this.alpha;
+ final double rank[] = KatzParallelGaussSeidel.this.rank;
+ final byte[] normVector = KatzParallelGaussSeidel.this.normVector;
+ final DoubleList preference = KatzParallelGaussSeidel.this.preference;
+ final KahanSummation s = new KahanSummation();
+
+ for(;;) {
+ barrier.await();
+ if (completed) return;
+
+ for(;;) {
+ // Try to get another piece of work.
+ final long start = nextNode.getAndAdd(GRANULARITY);
+ if (start >= n) {
+ nextNode.getAndAdd(-GRANULARITY);
+ break;
+ }
+
+ final int end = (int)(Math.min(n, start + GRANULARITY));
+
+ // for each node, enumerate predecessors and compute an updated value
+ double normDelta = 0;
+ final NodeIterator nodeIterator = graph.nodeIterator((int)start);
+
+ for(int i = (int)start; i < end; i++) {
+ nodeIterator.nextInt();
+
+ double sigma = 0;
+ double selfLoopFactor = 1;
+ s.reset();
+ int indegree = nodeIterator.outdegree();
+
+ if (indegree != 0) {
+ final int[] pred = nodeIterator.successorArray();
+ //Determine the rank from all incoming real links except possibly for a self-loop.
+ while(indegree-- != 0) {
+ final int currPred = pred[indegree];
+ if (i == currPred) selfLoopFactor = 1 - alpha;
+ else sigma += rank[currPred];
+ }
+ }
+
+ sigma = ((preference != null ? preference.getDouble(i) : 1) + alpha * sigma) / selfLoopFactor;
+
+ // update the supremum of vector difference between the new and old rank
+
+ if (normVector != null) normDelta = Math.max(normDelta, Math.abs(sigma - rank[i]) * (1L << (0xFF & normVector[i])));
+ else normDelta = Math.max(normDelta, Math.abs(sigma - rank[i]));
+
+ //update the rank
+ rank[i] = sigma;
+ }
+
+ synchronized (progressLogger) {
+ progressLogger.update(end - start);
+ }
+
+ synchronized (KatzParallelGaussSeidel.this) {
+ KatzParallelGaussSeidel.this.normDelta = Math.max(KatzParallelGaussSeidel.this.normDelta, normDelta);
+ }
+ }
+ }
+ }
+ catch(Throwable t) {
+ threadThrowable = t;
+ }
+ }
+ }
+
+ @Override
+ public void step() throws IOException {
+ throw new UnsupportedOperationException();
+ }
+
+ @Override
+ public void stepUntil(final StoppingCriterion stoppingCriterion) throws IOException {
+ init();
+ final IterationThread[] thread = new IterationThread[numberOfThreads];
+ for(int i = thread.length; i-- != 0;) thread[i] = new IterationThread();
+
+ barrier = new CyclicBarrier(numberOfThreads, new Runnable() {
+ @Override
+ public void run() {
+ if (iteration > 0) {
+ progressLogger.done();
+
+ iterationLogger.setAndDisplay(iteration);
+
+ if (stoppingCriterion.shouldStop(KatzParallelGaussSeidel.this)) {
+ completed = true;
+ return;
+ }
+ }
+
+ normDelta = 0;
+ nextNode.set(0);
+ progressLogger.expectedUpdates = n;
+ progressLogger.start("Iteration " + iteration++ + "...");
+ }
+ }
+ );
+
+ for(int i = thread.length; i-- != 0;) thread[i].start();
+ for(int i = thread.length; i-- != 0;)
+ try {
+ thread[i].join();
+ }
+ catch (InterruptedException e) {
+ throw new RuntimeException(e);
+ }
+
+ if (threadThrowable != null) throw new RuntimeException(threadThrowable);
+ if (progressLogger != null) progressLogger.done();
+
+ iterationLogger.done();
+ }
+
+ @Override
+ public double normDelta() {
+ return normVector == null ? alpha * normDelta * maxIndegree : (alpha * sigma) * normDelta / (1 - alpha * sigma);
+ }
+
+ /**
+ * Returns a Properties object that contains all the parameters used by the computation.
+ *
+ * @param graphBasename the basename of the graph.
+ * @param preferenceFilename the file name of the preference vector, or {@code null}.
+ * @return a properties object that represents all the parameters used to compute the rank.
+ */
+ public Properties buildProperties(String graphBasename, String preferenceFilename) {
+ final Properties prop = super.buildProperties(graphBasename);
+ prop.setProperty("alpha", Double.toString(alpha));
+ prop.setProperty("norm", normDelta);
+ prop.setProperty("preferencefilename", preferenceFilename);
+ return prop;
+ }
+
+ public static void main(final String[] arg) throws IOException, JSAPException, ConfigurationException, ClassNotFoundException {
+
+ SimpleJSAP jsap = new SimpleJSAP(KatzParallelGaussSeidel.class.getName(), "Computes Katz's index of a graph, given its transpose, using a parallel implementation of Gauss-Seidel's method."
+ + " The file <rankBasename>.properties stores metadata about the computation, whereas the file <rankBasename>.ranks stores the result as a sequence of doubles in DataInput format.",
+ new Parameter[] {
+ new FlaggedOption("alpha", JSAP.DOUBLE_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, 'a', "alpha", "Attenuation factor (must be smaller than the reciprocal of the dominant eigenvalue)."),
+ new FlaggedOption("maxIter", JSAP.INTEGER_PARSER, Integer.toString(DEFAULT_MAX_ITER), JSAP.NOT_REQUIRED, 'i', "max-iter", "Maximum number of iterations."),
+ new FlaggedOption("threshold", JSAP.DOUBLE_PARSER, Double.toString(DEFAULT_THRESHOLD), JSAP.NOT_REQUIRED, 't', "threshold", "Threshold to determine whether to stop."),
+ new FlaggedOption("preferenceVector", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'p', "preference-vector", "A preference vector stored as a vector of binary doubles."),
+ new FlaggedOption("preferenceObject", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'P', "preference-object", "A preference vector stored as a serialised DoubleList."),
+ new FlaggedOption("normVector", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'n', "norm-vector", "A vector inducing the correct weighted supremum norm."),
+ new FlaggedOption("sigma", JSAP.DOUBLE_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 's', "sigma", "The value for which the norm vector is suitable (i.e., the maximum ratio from its properties)."),
+ new Switch("mapped", 'm', "mapped", "Use loadMapped() to load the graph."),
+ new FlaggedOption("threads", JSAP.INTSIZE_PARSER, "0", JSAP.NOT_REQUIRED, 'T', "threads", "The number of threads to be used. If 0, the number will be estimated automatically."),
+ new UnflaggedOption("transposeBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the transpose of the graph."),
+ new UnflaggedOption("rankBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the file where the resulting rank (doubles in binary form) is stored.")
+ }
+ );
+
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final boolean mapped = jsapResult.getBoolean("mapped", false);
+ final String graphBasename = jsapResult.getString("transposeBasename");
+ final String rankBasename = jsapResult.getString("rankBasename");
+ final String normVectorFilename = jsapResult.getString("normVector");
+ if (normVectorFilename != null && ! jsapResult.userSpecified("sigma")) throw new IllegalArgumentException("You must specify the sigma for which the norm vector is suitable");
+ final int threads = jsapResult.getInt("threads");
+ final ProgressLogger progressLogger = new ProgressLogger(LOGGER, "nodes");
+
+ ImmutableGraph graph = mapped? ImmutableGraph.loadMapped(graphBasename, progressLogger) : ImmutableGraph.load(graphBasename, progressLogger);
+
+ DoubleList preference = null;
+ String preferenceFilename = null;
+ if (jsapResult.userSpecified("preferenceVector"))
+ preference = DoubleArrayList.wrap(BinIO.loadDoubles(preferenceFilename = jsapResult.getString("preferenceVector")));
+
+ if (jsapResult.userSpecified("preferenceObject")) {
+ if (jsapResult.userSpecified("preferenceVector")) throw new IllegalArgumentException("You cannot specify the preference vector twice");
+ preference = (DoubleList)BinIO.loadObject(preferenceFilename = jsapResult.getString("preferenceObject"));
+ }
+
+ final KatzParallelGaussSeidel pr = new KatzParallelGaussSeidel(graph, threads, LOGGER);
+ pr.alpha = jsapResult.getDouble("alpha");
+ pr.preference = preference;
+ if (normVectorFilename != null) pr.normVector(normVectorFilename, jsapResult.getDouble("sigma"));
+
+ pr.stepUntil(or(new SpectralRanking.NormStoppingCriterion(jsapResult.getDouble("threshold")), new SpectralRanking.IterationNumberStoppingCriterion(jsapResult.getInt("maxIter"))));
+
+ BinIO.storeDoubles(pr.rank, rankBasename + ".ranks");
+ pr.buildProperties(graphBasename, preferenceFilename).save(rankBasename + ".properties");
+
+ }
+}
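The update rule of KatzParallelGaussSeidel can be illustrated by the following minimal sequential sketch. It is not part of LAW: the class name KatzGaussSeidelSketch is hypothetical, the graph is assumed to be given as predecessor lists, only the uniform preference vector is supported, and no norm vector is used, so it stops on the residual estimate alpha * maxIndegree * (sup-norm of the last change), as normDelta() does when no norm vector is set. Being single-threaded, it performs a plain Gauss-Seidel sweep rather than the step-asynchronous parallel variant.

```java
import java.util.Arrays;

/**
 * Hypothetical sequential Gauss-Seidel sketch for Katz's index (not LAW code):
 * solves x (I - alpha G) = 1 in place. pred[i] lists the predecessors of node i.
 */
final class KatzGaussSeidelSketch {
    static double[] run(final int[][] pred, final double alpha, final double threshold, final int maxIter) {
        final int n = pred.length;
        int maxIndegree = 0;
        for (final int[] p : pred) maxIndegree = Math.max(maxIndegree, p.length);
        final double[] x = new double[n];
        Arrays.fill(x, 1); // start from the (uniform) preference vector
        for (int it = 0; it < maxIter; it++) {
            double delta = 0;
            for (int i = 0; i < n; i++) {
                double sum = 0, selfLoopFactor = 1;
                for (final int p : pred[i]) {
                    if (p == i) selfLoopFactor = 1 - alpha; // diagonal term moved to the left-hand side
                    else sum += x[p]; // already-updated value whenever p < i
                }
                final double updated = (1 + alpha * sum) / selfLoopFactor;
                delta = Math.max(delta, Math.abs(updated - x[i]));
                x[i] = updated; // overwrite in place: a single vector is needed
            }
            // Residual estimate: the sup norm of alpha G is alpha times the maximum indegree.
            if (alpha * maxIndegree * delta < threshold) break;
        }
        return x;
    }
}
```

On a directed 3-cycle every node has indegree 1, so 1 G = 1 and the exact index of every node is 1 / (1 - alpha); with alpha = 0.5 the sketch returns 2 for each node. A single node with a self-loop gives the same value after one sweep, because the self-loop is handled by dividing by 1 - alpha instead of iterating on it.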
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/LeftSingularVectorParallelPowerMethod.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/LeftSingularVectorParallelPowerMethod.java
new file mode 100644
index 0000000..f1d0084
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/LeftSingularVectorParallelPowerMethod.java
@@ -0,0 +1,369 @@
+package it.unimi.dsi.law.rank;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.concurrent.CyclicBarrier;
+import java.util.concurrent.atomic.AtomicLong;
+
+import org.apache.commons.configuration.ConfigurationException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.law.util.KahanSummation;
+import it.unimi.dsi.law.util.Norm;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.util.Properties;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.NodeIterator;
+
+// RELEASE-STATUS: DIST
+
+/** Computes the left singular vector of a graph using a parallel implementation of the power method.
+ *
+ * <p>This class computes iteratively and in parallel an approximation of the left singular vector of a graph
+ * with adjacency matrix <var>M</var>, that is, the dominant eigenvector of <var>M</var><var>M</var><sup>T</sup>.
+ * Note that the {@link #step()} method is not available: due to the need for some synchronization logic, only {@link #stepUntil(StoppingCriterion)}
+ * is available.
+ *
+ * <p>This class can be run using a {@link SpectralRanking.NormStoppingCriterion}, which stops, as usual, when {@link #normDelta()}
+ * (the {@linkplain #norm} of the difference between the last two approximations)
+ * is below a specified threshold.
+ *
+ * <p>It is <strong>strongly suggested</strong> that you apply a <em>{@linkplain #shift}</em> (e.g., -1). A negative shift
+ * guarantees convergence even in the presence of eigenvalues of maximum modulus that are not real positive (see a numerical
+ * linear algebra textbook for details).
+ *
+ * @author Sebastiano Vigna
+ */
+
+public class LeftSingularVectorParallelPowerMethod extends SpectralRanking {
+ private final static Logger LOGGER = LoggerFactory.getLogger(LeftSingularVectorParallelPowerMethod.class);
+
+ /** The transpose of the current graph. */
+ private ImmutableGraph transpose;
+ /** A progress logger monitoring each iteration. */
+ private final ProgressLogger progressLogger;
+ /** A progress logger monitoring the iterations. */
+ private final ProgressLogger iterationLogger;
+ /** The norm. */
+ public Norm norm = DEFAULT_NORM;
+ /** The number of threads. */
+ private final int numberOfThreads;
+ /** The next node to be picked. */
+ private final AtomicLong nextNode;
+ /** If true, the computation is over. */
+ private volatile boolean completed;
+ /** If true, the computation was interrupted by the detection of an error condition. */
+ private volatile boolean interrupted;
+ /** If true, we are performing the first half of a round (i.e., the first multiplication). */
+ private volatile boolean roundFirstHalf;
+ /** The barrier used to synchronize threads. */
+ private volatile CyclicBarrier barrier;
+ /** Keeps track of problems in threads. */
+ private volatile Throwable threadThrowable;
+ /** Compute the SALSA score (only for historical and testing reasons: please use the {@link Salsa} class instead). */
+ public boolean salsa;
+ /** A shift. */
+ public double shift;
+ /** The rank vector obtained after the first half of a round. */
+ public double[] intermediateRank;
+ /** The rank vector at the end of the last round. */
+ public double[] previousRank;
+
+
+
+ /** Creates a new instance.
+ *
+ * @param graph the graph on which the computation must be performed.
+ * @param transpose the transpose of the graph on which the computation must be performed.
+ * @param requestedThreads the requested number of threads (0 for {@link Runtime#availableProcessors()}).
+ * @param logger a logger that will be passed to <code>super()</code>.
+ */
+ public LeftSingularVectorParallelPowerMethod(final ImmutableGraph graph, final ImmutableGraph transpose, final int requestedThreads, final Logger logger) {
+ super(graph, logger);
+ this.transpose = transpose;
+ progressLogger = new ProgressLogger(logger, "nodes");
+ iterationLogger = new ProgressLogger(logger, "iterations");
+ numberOfThreads = requestedThreads != 0 ? requestedThreads : Runtime.getRuntime().availableProcessors();
+ nextNode = new AtomicLong();
+ }
+
+ /** Creates a new instance.
+ *
+ * @param graph the graph on which the computation must be performed.
+ * @param transpose the transpose of the graph on which the computation must be performed.
+ * @param logger a logger that will be passed to <code>super()</code>.
+ */
+ public LeftSingularVectorParallelPowerMethod(final ImmutableGraph graph, final ImmutableGraph transpose, final Logger logger) {
+ this(graph, transpose, 0, logger);
+ }
+
+ /** Creates a new instance.
+ *
+ * @param graph the graph on which the computation must be performed.
+ * @param transpose the transpose of the graph on which the computation must be performed.
+ * @param requestedThreads the requested number of threads (0 for {@link Runtime#availableProcessors()}).
+ */
+ public LeftSingularVectorParallelPowerMethod(final ImmutableGraph graph, final ImmutableGraph transpose, final int requestedThreads) {
+ this(graph, transpose, requestedThreads, LOGGER);
+ }
+
+ /** Creates a new instance.
+ *
+ * @param graph the graph on which the computation must be performed.
+ * @param transpose the transpose of the graph on which the computation must be performed.
+ */
+ public LeftSingularVectorParallelPowerMethod(final ImmutableGraph graph, final ImmutableGraph transpose) {
+ this(graph, transpose, 0);
+ }
+
+ @Override
+ public void init() throws IOException {
+ super.init();
+
+ logger.info("Norm: " + norm);
+ logger.info("Shift: " + shift);
+ logger.info("SALSA: " + salsa);
+
+ interrupted = completed = false;
+ roundFirstHalf = true;
+ // Creates the arrays, if necessary
+ if (previousRank == null) previousRank = new double[n];
+ if (intermediateRank == null) intermediateRank = new double[n];
+
+ // Start from the uniform vector (this class has no preference vector)
+ logger.info("Using the uniform preference vector");
+ Arrays.fill(rank, 1);
+ norm.normalize(rank, 1);
+
+ logger.info("Completed.");
+ iterationLogger.start();
+ }
+
+ private final class IterationThread extends Thread {
+ private static final int GRANULARITY = 10000;
+
+ public void run() {
+ try {
+ // We cache frequently used fields.
+ final ImmutableGraph graph = LeftSingularVectorParallelPowerMethod.this.graph.copy();
+ final ImmutableGraph transpose = LeftSingularVectorParallelPowerMethod.this.transpose.copy();
+ final int n = LeftSingularVectorParallelPowerMethod.this.n;
+ final double shift = LeftSingularVectorParallelPowerMethod.this.shift;
+ final boolean salsa = LeftSingularVectorParallelPowerMethod.this.salsa;
+ final KahanSummation s = new KahanSummation();
+
+ for(;;) {
+ barrier.await();
+ if (completed) return;
+
+ final boolean roundFirstHalf = LeftSingularVectorParallelPowerMethod.this.roundFirstHalf;
+ final ImmutableGraph graphToBeUsed = roundFirstHalf? graph : transpose;
+ final ImmutableGraph otherGraph = roundFirstHalf? transpose : graph;
+ final double[] rank = LeftSingularVectorParallelPowerMethod.this.rank;
+ final double[] oldRank = roundFirstHalf? rank : intermediateRank;
+ final double[] newRank = roundFirstHalf? intermediateRank : previousRank;
+ final boolean doShift = ! roundFirstHalf && shift != 0;
+
+ for(;;) {
+ // Try to get another piece of work.
+ final long start = nextNode.getAndAdd(GRANULARITY);
+ if (start >= n) {
+ nextNode.getAndAdd(-GRANULARITY);
+ break;
+ }
+
+ final int end = (int)(Math.min(n, start + GRANULARITY));
+
+ // for each node, enumerate predecessors and compute an updated value
+ final NodeIterator nodeIterator = graphToBeUsed.nodeIterator((int)start);
+
+ for(int i = (int)start; i < end; i++) {
+ nodeIterator.nextInt();
+ int indegree = nodeIterator.outdegree();
+ s.reset();
+
+ if (indegree != 0) {
+ final int[] pred = nodeIterator.successorArray();
+ if (salsa) while (indegree-- != 0) {
+ final int p = pred[indegree];
+ s.add(oldRank[p] / otherGraph.outdegree(p));
+ }
+ else while (indegree-- != 0) s.add(oldRank[pred[indegree]]);
+ }
+
+ newRank[i] = s.value();
+ if (doShift) newRank[i] -= shift * rank[i];
+ }
+
+ synchronized (progressLogger) {
+ progressLogger.update(end - start);
+ }
+ }
+
+ synchronized(LeftSingularVectorParallelPowerMethod.this) {
+ }
+ }
+ }
+ catch(Throwable t) {
+ threadThrowable = t;
+ }
+ }
+ }
+
+ @Override
+ public void step() throws IOException {
+ throw new UnsupportedOperationException();
+ }
+
+ @Override
+ public void stepUntil(final StoppingCriterion stoppingCriterion) throws IOException {
+ init();
+ final IterationThread[] thread = new IterationThread[numberOfThreads];
+ for(int i = thread.length; i-- != 0;) thread[i] = new IterationThread();
+
+ barrier = new CyclicBarrier(numberOfThreads, new Runnable() {
+
+ @Override
+ public void run() {
+ if (iteration > 0) {
+ progressLogger.done();
+
+ if (!roundFirstHalf) {
+ final double t[] = rank;
+ rank = previousRank;
+ previousRank = t;
+
+ norm.normalize(rank, 1); // Normalize
+ iterationLogger.setAndDisplay(iteration);
+
+ if (stoppingCriterion.shouldStop(LeftSingularVectorParallelPowerMethod.this)) {
+ completed = true;
+ return;
+ }
+ }
+ roundFirstHalf = !roundFirstHalf;
+ }
+
+ nextNode.set(0);
+ progressLogger.expectedUpdates = n;
+ progressLogger.start("Iteration " + (iteration++) + "...");
+ }
+ }
+ );
+
+ for(int i = thread.length; i-- != 0;) thread[i].start();
+ for(int i = thread.length; i-- != 0;)
+ try {
+ thread[i].join();
+ }
+ catch (InterruptedException e) {
+ throw new RuntimeException(e);
+ }
+
+ if (threadThrowable != null) throw new RuntimeException(threadThrowable);
+ if (interrupted) throw new RuntimeException("Computation interrupted.");
+ if (progressLogger != null) progressLogger.done();
+
+ iterationLogger.done();
+ }
+
+ @Override
+ public double normDelta() {
+ return norm.compute(rank, previousRank);
+ }
+
+ @Override
+ public void clear() {
+ super.clear();
+ previousRank = null;
+ intermediateRank = null;
+ }
+
+ /**
+ * Returns a Properties object that contains all the parameters used by the computation.
+ *
+ * @param graphBasename file name of the graph
+ * @return a properties object that represents all the parameters used to calculate the rank.
+ */
+ public Properties buildProperties(String graphBasename) {
+ final Properties prop = super.buildProperties(graphBasename);
+ prop.setProperty("norm", norm);
+ prop.setProperty("salsa", salsa);
+ prop.setProperty("shift", shift);
+ return prop;
+ }
+
+ public static void main(final String[] arg) throws IOException, JSAPException, ConfigurationException {
+
+ SimpleJSAP jsap = new SimpleJSAP(LeftSingularVectorParallelPowerMethod.class.getName(), "Computes the left singular vector of a graph using a parallel implementation." +
+ " The file <rankBasename>.properties stores metadata about the computation, whereas the file <rankBasename>.ranks stores the result as a sequence of doubles in DataInput format.",
+ new Parameter[] {
+ new FlaggedOption("norm", JSAP.STRING_PARSER, DEFAULT_NORM.toString(), JSAP.NOT_REQUIRED, 'n', "norm", "Norm type. Possible values: " + Arrays.toString(Norm.values())),
+ new FlaggedOption("maxIter", JSAP.INTEGER_PARSER, Integer.toString(DEFAULT_MAX_ITER), JSAP.NOT_REQUIRED, 'i', "max-iter", "Maximum number of iterations."),
+ new FlaggedOption("threshold", JSAP.DOUBLE_PARSER, Double.toString(DEFAULT_THRESHOLD), JSAP.NOT_REQUIRED, 't', "threshold", "Threshold to determine whether to stop (not use for suitable vectors)."),
+ new FlaggedOption("shift", JSAP.DOUBLE_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 's', "shift", "A shift for the power method."),
+ new Switch("salsa", 'S', "salsa", "Compute SALSA (only for historical and testing reasons: please use the Salsa class instead)."),
+ new Switch("mapped", 'm', "mapped", "Use loadMapped() to load the graph."),
+ new FlaggedOption("threads", JSAP.INTSIZE_PARSER, "0", JSAP.NOT_REQUIRED, 'T', "threads", "The number of threads to be used. If 0, the number will be estimated automatically."),
+ new UnflaggedOption("graphBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the graph."),
+ new UnflaggedOption("transposeBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the transpose of the graph."),
+ new UnflaggedOption("rankBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the files where the resulting ranks (doubles in binary form) and the properties are stored.")
+ }
+ );
+
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final boolean mapped = jsapResult.getBoolean("mapped", false);
+ final boolean salsa = jsapResult.getBoolean("salsa", false);
+ final String graphBasename = jsapResult.getString("graphBasename");
+ final String transposeBasename = jsapResult.getString("transposeBasename");
+ final String rankBasename = jsapResult.getString("rankBasename");
+ final String norm = jsapResult.getString("norm");
+ final double shift = jsapResult.getDouble("shift", 0);
+ final int threads = jsapResult.getInt("threads");
+ final ProgressLogger progressLogger = new ProgressLogger(LOGGER, "nodes");
+
+ final ImmutableGraph graph = mapped? ImmutableGraph.loadMapped(graphBasename, progressLogger) : ImmutableGraph.load(graphBasename, progressLogger);
+ final ImmutableGraph transpose = mapped? ImmutableGraph.loadMapped(transposeBasename, progressLogger) : ImmutableGraph.load(transposeBasename, progressLogger);
+
+ LeftSingularVectorParallelPowerMethod lsv = new LeftSingularVectorParallelPowerMethod(graph, transpose, threads, LOGGER);
+ lsv.norm = Norm.valueOf(norm);
+ lsv.salsa = salsa;
+ lsv.shift = shift;
+
+ lsv.stepUntil(or(new SpectralRanking.NormStoppingCriterion(jsapResult.getDouble("threshold")), new SpectralRanking.IterationNumberStoppingCriterion(jsapResult.getInt("maxIter"))));
+
+ BinIO.storeDoubles(lsv.rank, rankBasename + ".ranks");
+ lsv.buildProperties(graphBasename).save(rankBasename + ".properties");
+ }
+}
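The two half-rounds performed by `IterationThread` can be illustrated with a small sequential sketch. It is not part of LAW and is not the parallel implementation: the class name `LeftSingularVectorSketch` is hypothetical, the graph is assumed to be an in-memory array of successor lists `succ` (in place of `ImmutableGraph` and its transpose), and the vector is normalized in the ℓ<sub>2</sub> norm instead of the configurable `norm` field. Each round first sums, for every node, the current scores of its successors; it then sums these intermediate scores over each node's predecessors, and finally subtracts `shift` times the previous score, as the second half-round does when `shift` is nonzero.

```java
import java.util.Arrays;

/** Hypothetical sequential sketch of the two half-rounds; not part of LAW. */
final class LeftSingularVectorSketch {
    /**
     * Runs the given number of rounds on a graph described by successor lists and
     * returns the final score vector, normalized in the L2 norm (an assumption: the
     * LAW class uses its configurable norm field instead).
     */
    static double[] compute(final int[][] succ, final double shift, final int rounds) {
        final int n = succ.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1); // uniform starting vector, as in init()
        normalize(rank);
        final double[] intermediate = new double[n];

        for (int r = 0; r < rounds; r++) {
            // First half-round: intermediate[i] is the sum of rank[j] over the successors j of i.
            for (int i = 0; i < n; i++) {
                double s = 0;
                for (final int j : succ[i]) s += rank[j];
                intermediate[i] = s;
            }
            // Second half-round: next[j] is the sum of intermediate[i] over the predecessors i of j.
            // Here we scatter along the arcs instead of reading a transposed graph.
            final double[] next = new double[n];
            for (int i = 0; i < n; i++)
                for (final int j : succ[i]) next[j] += intermediate[i];
            // Shift, subtracted at the end of the round as in the second half-round.
            for (int i = 0; i < n; i++) next[i] -= shift * rank[i];
            normalize(next);
            rank = next;
        }
        return rank;
    }

    /** Divides v in place by its L2 norm (no-op for the zero vector). */
    private static void normalize(final double[] v) {
        double s = 0;
        for (final double x : v) s += x * x;
        final double norm = Math.sqrt(s);
        if (norm == 0) return;
        for (int i = 0; i < v.length; i++) v[i] /= norm;
    }

    public static void main(final String[] args) {
        // Arcs 0->1, 0->2, 1->2; a shift of -1, as the class documentation suggests.
        final int[][] succ = { { 1, 2 }, { 2 }, {} };
        System.out.println(Arrays.toString(compute(succ, -1, 100)));
    }
}
```

On this toy graph node 0 has no predecessors, so its normalized score decays towards zero, while the ratio between the scores of nodes 2 and 1 approaches the golden ratio (1 + √5) / 2.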
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/PageRank.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/PageRank.java
new file mode 100644
index 0000000..32ce9e5
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/PageRank.java
@@ -0,0 +1,139 @@
+package it.unimi.dsi.law.rank;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.BitSet;
+
+import org.slf4j.Logger;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.doubles.DoubleList;
+import it.unimi.dsi.util.Properties;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+
+
+// RELEASE-STATUS: DIST
+
+/** An abstract class defining methods and attributes supporting PageRank computations. Provides
+ * settable damping factor, preference vector and starting vector.
+ *
+ * <h2>Formulae and preferences</h2>
+ *
+ * <p>There are two main formulae for PageRank in the literature. The first one, which we shall
+ * call <em>weakly preferential</em>, patches all dangling nodes by adding a uniform transition towards
+ * all other nodes. The second one, which we shall call <em>strongly preferential</em>, patches all
+ * dangling nodes by adding transitions weighted according to the preference vector <var><b>v</b></var>.
+ * We can consider the two formulae together, letting <var><b>u</b></var> be a vector that is uniform
+ * in the weak case and coincides with <var><b>v</b></var> in the strong case.
+ *
+ * <P>If we denote with <var>P</var> the normalised adjacency matrix of the graph, with <var><b>d</b></var>
+ * the characteristic vector of dangling nodes, and with &alpha; the damping factor, the generic
+ * equation is
+ * <div style="text-align: center">
+ * <var><b>x</b></var> = <var><b>x</b></var> (&alpha; <var>P</var> + &alpha; <var><b>d</b><sup><i>T</i></sup></var><var><b>u</b></var> + (1 &minus; &alpha;) <b>1</b><sup><i>T</i></sup> <var><b>v</b></var>),
+ * </div>
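+ * which, distributing over the sum and using the fact that <var><b>x</b></var> <b>1</b><sup><i>T</i></sup> = 1 (as <var><b>x</b></var> is a distribution), makes it possible to express PageRank as
+ * <div style="text-align: center">
+ * (1 &minus; &alpha;) <var><b>v</b></var><big>(</big> <var>I</var> &minus; &alpha; (<var>P</var> + <var><b>d</b><sup><i>T</i></sup></var><var><b>u</b></var>) <big>)</big><sup>-1</sup>,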
+ * </div>
+ * or even
+ * <div style="text-align: center">
+ * (1 &minus; &alpha;) <var><b>v</b></var> <big>&Sigma;</big><sub><var>k</var> &ge; 0</sub> &alpha;<sup><var>k</var></sup> <big>(</big> <var>P</var> + <var><b>d</b><sup><i>T</i></sup></var><var><b>u</b></var> <big>)</big><sup><var>k</var></sup>.
+ * </div>
+ *
+ * <p>By default, weakly preferential PageRank is computed; strongly preferential
+ * PageRank computation is enforced by setting {@link #stronglyPreferential} to true.
+ *
+ * @see SpectralRanking
+ */
+
+public abstract class PageRank extends SpectralRanking {
+ /** The default damping factor. */
+ public final static double DEFAULT_ALPHA = 0.85;
+
+ /** The damping factor. In the random surfer interpretation, this is the probability that the
+ * surfer will follow a link in the current page. */
+ public double alpha = DEFAULT_ALPHA;
+ /** The preference vector to be used (or {@code null} if the uniform preference vector should be used). */
+ public DoubleList preference;
+ /** The vector used to patch null rows of the adjacency matrix (<b><var>u</var></b> in the general formula).
+ * It coincides with the preference vector if {@link #stronglyPreferential} is true. If {@code null},
+ * the uniform distribution will be used. */
+ public DoubleList danglingNodeDistribution;
+ /** If not {@code null}, the set of buckets of {@link #graph}. */
+ public BitSet buckets;
+ /** Decides whether we use the strongly or weakly (the default) preferential algorithm. */
+ public boolean stronglyPreferential;
+
+ /** Creates a new instance.
+ *
+ * @param g the graph.
+ * @param logger a logger.
+ */
+ public PageRank(final ImmutableGraph g, final Logger logger) {
+ super(g, logger);
+ }
+
+ /** Returns a {@link Properties} object that contains all the parameters used by the computation.
+ *
+ * @param graphBasename the basename of the graph.
+ * @param preferenceFilename the filename of preference vector. It can be {@code null}.
+ * @param danglingFilename the filename of dangling-node distribution. It can be {@code null}.
+ * @return a properties object that represent all the parameters used to calculate the rank.
+ */
+ public Properties buildProperties(String graphBasename, String preferenceFilename, String danglingFilename) {
+ final Properties prop = super.buildProperties(graphBasename);
+ prop.setProperty("alpha", Double.toString(alpha));
+ prop.setProperty("norm", normDelta());
+ prop.setProperty("stronglypreferential", stronglyPreferential);
+ if (preferenceFilename != null) prop.setProperty("preferencefilename", preferenceFilename);
+ if (danglingFilename != null) prop.setProperty("danglingfilename", danglingFilename);
+ return prop;
+ }
+
+ /** Basic initialization: we log the damping factor, check that the preference vector is correctly sized and stochastic,
+ * fill {@link #rank} with the preference vector and set the dangling-node distribution
+ * depending on the value of {@link #stronglyPreferential}.
+ */
+ @Override
+ public void init() throws IOException {
+ super.init();
+ logger.info("Damping factor: " + alpha);
+
+ // Check the preference vector
+ if (preference != null) {
+ if (preference.size() != n) throw new IllegalArgumentException("The preference vector size (" + preference.size() + ") is different from graph dimension (" + n + ").");
+ if (! isStochastic(preference)) throw new IllegalArgumentException("The preference vector is not a stochastic vector. ");
+ logger.info("Using a specified preference vector");
+ }
+ else logger.info("Using the uniform preference vector");
+
+ if (preference != null) preference.toArray(rank);
+ else Arrays.fill(rank, 1. / n);
+
+ // Initializes the dangling-node distribution
+ if (stronglyPreferential) {
+ if (preference == null) throw new IllegalArgumentException("The strongly preferential flag is true but the preference vector has not been set.");
+ danglingNodeDistribution = preference;
+ }
+ else danglingNodeDistribution = null;
+ logger.info("Computing " + (stronglyPreferential ? "strongly" : "weakly") + " preferential PageRank");
+ }
+}
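The generic equation in the class documentation can be iterated directly. The following is a minimal power-method sketch under stated assumptions, not LAW's implementation (`PageRank` is abstract, and subclasses such as `PageRankParallelGaussSeidel` solve the equivalent linear system instead): the class `PageRankPowerSketch` is hypothetical and the graph is assumed to be an in-memory array of successor lists. Each step computes <var><b>x</b></var>(&alpha;<var>P</var> + &alpha;<var><b>d</b></var><sup><i>T</i></sup><var><b>u</b></var> + (1 &minus; &alpha;)<b>1</b><sup><i>T</i></sup><var><b>v</b></var>), with <var><b>u</b></var> uniform in the weak case and equal to <var><b>v</b></var> in the strong case.

```java
import java.util.Arrays;

/** Hypothetical power-method sketch of the generic PageRank equation; not part of LAW. */
final class PageRankPowerSketch {
    /**
     * @param succ successor lists; P is the row-normalized adjacency matrix.
     * @param alpha the damping factor.
     * @param v the preference vector, or null for the uniform one.
     * @param stronglyPreferential if true, dangling nodes are patched with v; otherwise uniformly.
     * @param iterations the number of power-method steps.
     */
    static double[] compute(final int[][] succ, final double alpha, final double[] v,
                            final boolean stronglyPreferential, final int iterations) {
        if (stronglyPreferential && v == null)
            throw new IllegalArgumentException("Strongly preferential PageRank needs a preference vector");
        final int n = succ.length;
        final double[] pref = v != null ? v : uniform(n);
        final double[] u = stronglyPreferential ? pref : uniform(n);
        double[] x = pref.clone(); // start from the preference vector, as init() does

        for (int it = 0; it < iterations; it++) {
            final double[] next = new double[n];
            double dangling = 0; // x d^T: total score on dangling nodes
            double sum = 0;      // x 1^T: equal to 1 up to rounding
            for (int i = 0; i < n; i++) {
                sum += x[i];
                if (succ[i].length == 0) dangling += x[i];
                else for (final int j : succ[i]) next[j] += alpha * x[i] / succ[i].length;
            }
            for (int j = 0; j < n; j++)
                next[j] += alpha * dangling * u[j] + (1 - alpha) * sum * pref[j];
            x = next;
        }
        return x;
    }

    private static double[] uniform(final int n) {
        final double[] a = new double[n];
        Arrays.fill(a, 1.0 / n);
        return a;
    }
}
```

For the two-node graph with the single arc 0&rarr;1 (node 1 dangling) and a uniform preference vector, the weakly preferential fixed point has <var>x</var><sub>0</sub> = 1 / (2 + &alpha;); since each step is a contraction by a factor &alpha; on distributions, a few hundred iterations are enough for double precision when &alpha; = 0.85.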
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/PageRankFromCoefficients.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/PageRankFromCoefficients.java
new file mode 100644
index 0000000..74e3f2c
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/PageRankFromCoefficients.java
@@ -0,0 +1,178 @@
+package it.unimi.dsi.law.rank;
+
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/*
+ * Copyright (C) 2007-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.doubles.DoubleIterator;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.io.FastBufferedInputStream;
+import it.unimi.dsi.fastutil.io.FastBufferedOutputStream;
+import it.unimi.dsi.logging.ProgressLogger;
+
+// RELEASE-STATUS: DIST
+
+/** Computes PageRank using its power series.
+ *
+ * <p>This class uses the vectors of coefficients
+ * of the PageRank power series computed optionally by {@link PageRankPowerSeries}
+ * to approximate PageRank and any of its derivatives for a given value of the damping factor &alpha;.
+ * The computation is based on the results
+ * described in &ldquo;PageRank: Functional dependencies&rdquo;, by Paolo Boldi, Massimo Santini, and Sebastiano Vigna,
+ * <i>ACM Trans. Inf. Sys.</i>, 27(4):1&ndash;23, 2009.
+ */
+
+public class PageRankFromCoefficients {
+ private final static Logger LOGGER = LoggerFactory.getLogger(PageRankFromCoefficients.class);
+
+ private final static int BUFFER_SIZE = 1024 * 1024;
+
+ /** Computes PageRank and its derivatives for given damping factor values.
+ *
+ * <p>For each pair of damping factor and derivative order passed to this method, a
+ * suitable rank file will be generated. Information about the precision guarantee will be logged.
+ *
+ * @param coeffBasename the basename of coefficient files (as computed by {@link PageRankPowerSeries}); the
+ * actual names are obtained appending <code>-<var>i</var></code> for the <var>i</var>-th coefficient.
+ * @param numCoeff the number of coefficient files.
+ * @param rankBasename the basename of the computed ranks.
+ * @param alpha an array of values of the damping factor, one for each rank to compute.
+ * @param order an array of derivative orders (0 means PageRank), parallel to <code>alpha</code>.
+ */
+
+ public static void compute(final String coeffBasename, final int numCoeff, final String rankBasename, final double[] alpha, final int[] order) throws IOException {
+ LOGGER.info("Opening files...");
+ final int numDerivatives = order.length;
+ final int numAlphas = alpha.length;
+
+ final DoubleIterator[] coeff = new DoubleIterator[numCoeff];
+ final DataInputStream[] coeffStream = new DataInputStream[numCoeff];
+ for(int i = numCoeff; i-- != 0;) coeff[i] = BinIO.asDoubleIterator(coeffStream[i] = new DataInputStream(new FastBufferedInputStream(new FileInputStream(coeffBasename + "-" + i), BUFFER_SIZE)));
+ final DataOutputStream[][] result = new DataOutputStream[numAlphas][numDerivatives];
+ for(int l = numAlphas; l-- != 0;)
+ for(int i = numDerivatives; i-- != 0;)
+ result[l][i] = new DataOutputStream(new FastBufferedOutputStream(new FileOutputStream(rankBasename + "-" + alpha[l] + (order[i] == 0 ? ".ranks" : ".der-" + order[i])), BUFFER_SIZE));
+
+
+ LOGGER.info("Computing coefficients...");
+
+ // a[l][i][n] is (n falling order[i]) * alpha[l]^(n - order[i]), the weight of the n-th coefficient in the order[i]-th derivative
+ final double[][][] a = new double[numAlphas][numDerivatives][numCoeff];
+ for(int i = numDerivatives; i-- != 0;) {
+ final int k = order[i];
+ for(int l = numAlphas; l-- != 0;) {
+ double alphaNMinusK = 1;
+ a[l][i][0] = it.unimi.dsi.law.Util.falling(0, k);
+ for(int n = 1; n < numCoeff; n++) {
+ // Note that falling(n, k) = 0 while n < k, so the value of alphaNMinusK is irrelevant there.
+ if (n > k) alphaNMinusK *= alpha[l];
+ a[l][i][n] = alphaNMinusK * it.unimi.dsi.law.Util.falling(n, k);
+ }
+
+ //System.err.println("Order " + order[i] + ": " + Arrays.toString(a[i]));
+ }
+ }
+
+
+ final int numNodes = (int)(new File(coeffBasename + "-0").length() / (Double.SIZE / 8));
+ final double[][] partialSum = new double[numAlphas][numDerivatives];
+ final double[][] infinityNorm = new double[numAlphas][numDerivatives];
+
+ final ProgressLogger pl = new ProgressLogger(LOGGER);
+ pl.itemsName = "nodes";
+ pl.expectedUpdates = numNodes;
+
+ pl.start("Computing PageRank and derivatives...");
+ for(int i = 0; i < numNodes; i++) {
+ for(int l = numAlphas; l-- != 0;) Arrays.fill(partialSum[l], 0);
+ double t = 0;
+
+ for(int n = 0; n < numCoeff; n++) {
+ t = coeff[n].nextDouble();
+ for(int l = numAlphas; l-- != 0;)
+ for(int j = numDerivatives; j-- != 0;)
+ partialSum[l][j] += a[l][j][n] * t;
+ }
+
+ for(int l = numAlphas; l-- != 0;)
+ for(int j = numDerivatives; j-- != 0;) {
+ infinityNorm[l][j] = Math.max(infinityNorm[l][j], a[l][j][numCoeff - 1] * t);
+ result[l][j].writeDouble(partialSum[l][j]);
+ }
+
+ pl.update();
+ }
+
+ pl.done();
+
+ for(int i = 0; i < numCoeff; i++) coeffStream[i].close();
+ for(int l = numAlphas; l-- != 0;) {
+ for(int i = 0; i < numDerivatives; i++) {
+ if (numCoeff - 1 < order[i] / (1 - alpha[l])) LOGGER.info("Error bound for derivative of order " + order[i] + " (alpha=" + alpha[l] + "): unknown");
+ else {
+ final double delta = alpha[l] * (numCoeff - 1) / (numCoeff - 1 + order[i]);
+ LOGGER.info("Error bound for derivative of order " + order[i] + " (alpha=" + alpha[l] + "): " + infinityNorm[l][i] * delta / (1 - delta));
+ }
+ result[l][i].close();
+ }
+ }
+ }
+
+ public static void main(final String[] arg) throws IOException, JSAPException {
+
+ final SimpleJSAP jsap = new SimpleJSAP(PageRankFromCoefficients.class.getName(),
+ "Computes PageRank and possibly its derivatives using the coefficients of the PageRank power series (usually computed by PageRankPowerSeries).",
+ new Parameter[] {
+ new FlaggedOption("alpha", JSAP.DOUBLE_PARSER, Double.toString(PageRank.DEFAULT_ALPHA), JSAP.NOT_REQUIRED, 'a', "alpha", "Damping factor(s), one for each desired output.").setAllowMultipleDeclarations(true),
+ new FlaggedOption("numCoeff", JSAP.INTEGER_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, 'n', "num-coeff", "The number of coefficients to use."),
+ new FlaggedOption("derivative", JSAP.INTEGER_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'd', "derivative", "The order(s) of the derivative(s) to be computed (0 means PageRank); the orders are interpreted as a list of specifications parallel to that of the damping factors.").setAllowMultipleDeclarations(true),
+ new UnflaggedOption("coeffBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the coefficients."),
+ new UnflaggedOption("rankBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the files where the resulting ranks and derivatives (doubles in binary form) are stored.")
+ }
+ );
+
+ final JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final String coeffBasename = jsapResult.getString("coeffBasename");
+ final String rankBasename = jsapResult.getString("rankBasename");
+ final double[] alpha = jsapResult.getDoubleArray("alpha");
+ final int[] order = jsapResult.getIntArray("derivative");
+ final int numCoeff = jsapResult.getInt("numCoeff");
+
+ compute(coeffBasename, numCoeff, rankBasename, alpha, order);
+ }
+}
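The core of `compute()` is a weighted sum of coefficient vectors. Assuming, as the order-0 case of the code implies, that the coefficient vectors <var><b>c</b></var><sub><var>n</var></sub> satisfy PageRank(&alpha;) = &Sigma;<sub><var>n</var></sub> &alpha;<sup><var>n</var></sup> <var><b>c</b></var><sub><var>n</var></sub>, the <var>k</var>-th derivative is &Sigma;<sub><var>n</var></sub> <var>n</var><sup>(<var>k</var>)</sup> &alpha;<sup><var>n</var>&minus;<var>k</var></sup> <var><b>c</b></var><sub><var>n</var></sub>, where <var>n</var><sup>(<var>k</var>)</sup> is the falling factorial. The in-memory sketch below computes the same truncated sum; the class `PowerSeriesSketch` and its `falling()` helper are hypothetical stand-ins (not `it.unimi.dsi.law.Util.falling`), and the coefficients are held in a `double[][]` rather than streamed from files.

```java
/** Hypothetical in-memory sketch of the truncated power-series evaluation; not part of LAW. */
final class PowerSeriesSketch {
    /** Falling factorial n (n - 1) ... (n - k + 1); zero when n < k, one when k = 0. */
    static double falling(final int n, final int k) {
        double result = 1;
        for (int i = 0; i < k; i++) result *= n - i;
        return result;
    }

    /**
     * Returns sum_n falling(n, k) alpha^(n - k) coeff[n], that is, the k-th derivative with
     * respect to alpha of sum_n alpha^n coeff[n], truncated to coeff.length terms.
     *
     * @param coeff coeff[n][i] is the n-th coefficient for node i.
     */
    static double[] evaluate(final double[][] coeff, final double alpha, final int k) {
        final int numNodes = coeff[0].length;
        final double[] result = new double[numNodes];
        double alphaNMinusK = 1; // alpha^(n - k) once n >= k; irrelevant before, as falling(n, k) = 0
        for (int n = 0; n < coeff.length; n++) {
            if (n > k) alphaNMinusK *= alpha;
            final double a = alphaNMinusK * falling(n, k);
            for (int i = 0; i < numNodes; i++) result[i] += a * coeff[n][i];
        }
        return result;
    }
}
```

For instance, with coefficient vectors (1, 2), (3, 4), (5, 6) and &alpha; = 0.5, order 0 yields (3.75, 5.5), order 1 yields (8, 10) and order 2 yields (10, 12).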
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/PageRankGaussSeidel.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/PageRankGaussSeidel.java
new file mode 100644
index 0000000..a60452b
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/PageRankGaussSeidel.java
@@ -0,0 +1,382 @@
+package it.unimi.dsi.law.rank;
+
+import java.io.IOException;
+import java.util.BitSet;
+
+import org.apache.commons.configuration.ConfigurationException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/*
+ * Copyright (C) 2005-2019 Paolo Boldi, Roberto Posenato, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+import it.unimi.dsi.fastutil.doubles.DoubleArrayList;
+import it.unimi.dsi.fastutil.doubles.DoubleList;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.NodeIterator;
+
+// RELEASE-STATUS: DIST
+
+/** Computes PageRank of a graph using the Gau&szlig;&ndash;Seidel method. This class is now mainly of historical
+ * interest, as {@link PageRankParallelGaussSeidel} is faster and provides exact bounds via
+ * {@linkplain PageRankParallelGaussSeidel#normVector(double[], double) vector norms}.
+ *
+ * <p>The general formula described in {@link it.unimi.dsi.law.rank.PageRank} can be rewritten as the following linear system:
+ * <div style="text-align: center">
+ * <var><b>x</b></var> (<var>I</var> &minus; &alpha; (<var>P</var> + <var><b>d</b><sup><i>T</i></sup></var><var><b>u</b></var>)) = (1 &minus; &alpha;)<var><b>v</b></var>.
+ * </div>
+ *
+ * <p>The system
+ * <div style="text-align: center">
+ * <var><b>x</b></var> <var>M</var> = <var><b>b</b></var>
+ * </div>
+ * can be solved using the Gau&szlig;&ndash;Seidel method,
+ * which updates a <em>single</em> vector in place, using the formula
+ * <div style="text-align: center">
+ * <var>x<sub>i</sub></var><sup>(<var>t</var>+1)</sup> =
+ * <big>(</big> <var>b<sub>i</sub></var>
+ * &minus; <big>&Sigma;</big><sub><var>j</var>&lt;<var>i</var></sub> <var>m<sub>ji</sub></var><var>x<sub>j</sub></var><sup>(<var>t</var>+1)</sup>
+ * &minus; <big>&Sigma;</big><sub><var>j</var>&gt;<var>i</var></sub> <var>m<sub>ji</sub></var><var>x<sub>j</sub></var><sup>(<var>t</var>)</sup>
+ * <big>)</big>
+ * <big>&frasl;</big> <var>m<sub>ii</sub></var>.
+ * </div>
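+ *
+ * <p>For illustration only, one in-place sweep of this rule can be sketched for a small dense matrix as follows
+ * (<code>gaussSeidelSweep</code> is a hypothetical helper, not part of this class). Since <var><b>x</b></var> is a row vector,
+ * the <var>i</var>-th equation of the system is &Sigma;<sub><var>j</var></sub> <var>x<sub>j</sub></var> <var>m<sub>ji</sub></var> = <var>b<sub>i</sub></var>,
+ * so the sketch reads column <var>i</var> of <var>M</var>:
+ * <pre>{@code
+ * // Hypothetical helper: one Gauss-Seidel sweep for the row-vector system x M = b.
+ * static void gaussSeidelSweep(final double[][] m, final double[] b, final double[] x) {
+ *     final int n = x.length;
+ *     for (int i = 0; i < n; i++) {
+ *         double s = b[i];
+ *         // x[j] already holds the new value for j < i and still the old one for j > i.
+ *         for (int j = 0; j < n; j++) if (j != i) s -= m[j][i] * x[j];
+ *         x[i] = s / m[i][i];
+ *     }
+ * }
+ * }</pre>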
+ *
+ * <p>The {@link #normDelta()} method returns an upper bound on the &#x2113;<sub>1</sub> norm of the error, obtained by multiplying
+ * the &#x2113;<sub>1</sub> norm of the difference between the last two approximations by &alpha; / (1 &minus; &alpha;) (this idea arose in discussions with David Gleich).
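+ * For instance, with &alpha; = 0.85 the multiplier is 0.85 / 0.15 &asymp; 5.67, so bringing the bound below 10<sup>&minus;7</sup>
+ * requires the last two approximations to differ by less than about 1.76&middot;10<sup>&minus;8</sup> in &#x2113;<sub>1</sub> norm.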
+ *
+ * <p><strong>Warning</strong>: Since we need to enumerate the <em>predecessors</em> of a node,
+ * you must pass to the {@linkplain #PageRankGaussSeidel(ImmutableGraph, Logger) constructor} the <strong>transpose</strong>
+ * of the graph.
+ *
+ * <p>Substituting for <var>M</var> and <var><b>b</b></var> the corresponding terms of the first formula, we obtain the following update rule:
+ * <div style="text-align: center">
+ * <var>x<sub>i</sub></var><sup>(<var>t</var>+1)</sup> =
+ * <big>(</big>
+ * (1 &minus; &alpha;) <var>v<sub>i</sub></var> + &alpha;
+ * <big>(</big>
+ * <big>&Sigma;</big><sub><var>j</var>&lt;<var>i</var></sub> (<var>p<sub>ji</sub></var> + <var>u<sub>i</sub>d<sub>j</sub></var>) <var>x<sub>j</sub></var><sup>(<var>t</var>+1)</sup> +
+ * <big>&Sigma;</big><sub><var>j</var>&gt;<var>i</var></sub> (<var>p<sub>ji</sub></var> + <var>u<sub>i</sub>d<sub>j</sub></var>) <var>x<sub>j</sub></var><sup>(<var>t</var>)</sup>
+ * <big>)</big>
+ * <big>)</big>
+ * <big>&frasl;</big>
+ * (1 &minus; &alpha;<var>p<sub>ii</sub></var> &minus; &alpha;<var>u<sub>i</sub>d<sub>i</sub></var>)
+ * </div>
+ *
+ * <p>We can rearrange the two sums of the previous formula into two different sums: one over the predecessors <var>j</var> &rarr; <var>i</var>
+ * and one over the dangling nodes (nondangling nodes that are not predecessors of <var>i</var>
+ * give no contribution). The Gau&szlig;&ndash;Seidel method can thus be implemented as follows:
+ * <ul>
+ * <li>initialize <var><b>x</b></var> as the uniform vector;
+ * <li>while some stopping criterion has not been met,
+ * <ul>
+ * <li>for all <var>i</var> = 0,1,&hellip;, <var>n</var>&minus;1
+ * <ul>
+ * <li>&sigma; = 0;
+ * <li>for all <var>j</var> &rarr; <var>i</var>, with <var>j</var> &#x2260; <var>i</var>, &sigma; += <var>p<sub>ji</sub></var> <var>x<sub>j</sub></var>
+ * <li>for all dangling <var>j</var>, with <var>j</var> &#x2260; <var>i</var>, &sigma; += <var>u<sub>i</sub></var> <var>x<sub>j</sub></var>
+ * <li><var>x<sub>i</sub></var> = <big>(</big> (1 &minus; &alpha;) <var>v<sub>i</sub></var> + &alpha;&sigma; <big>)</big>
+ * <big>&frasl;</big>
+ * (1 &minus; &alpha;<var>p<sub>ii</sub></var> &minus; &alpha;<var>d<sub>i</sub>u<sub>i</sub></var>)
+ * </ul>
+ * </ul>
+ * </ul>
+ * <p>Recall that <var>u<sub>i</sub></var> is 1&frasl;<var>n</var> for the <em>weakly preferential</em> version of the PageRank formula,
+ * and <var>v<sub>i</sub></var> for the <em>strongly preferential</em> one.
+ *
+ * <p>The second &ldquo;for all&rdquo; can be avoided by keeping track, in two variables <var>B</var> (<em>before</em> <var>i</var>)
+ * and <var>A</var> (<em>after</em> <var>i</var>), of the sum of the ranks of the dangling nodes with index smaller than <var>i</var>
+ * (which have already been updated in the current iteration) and of the sum of the ranks of the dangling nodes with index
+ * greater than or equal to <var>i</var> (which still hold the values of the previous iteration):
+ * <ul>
+ * <li>initialize <var><b>x</b></var> as the uniform vector;
+ * <li><var>B</var> = 0;
+ * <li>for all dangling <var>j</var>, <var>A</var> += <var>x<sub>j</sub></var>;
+ * <li>while some stopping criterion has not been met,
+ * <ul>
+ * <li>for all <var>i</var> = 0,1,&hellip;,<var>n</var>&minus;1
+ * <ul>
+ * <li>&sigma; = 0;
+ * <li>for all <var>j</var> &rarr; <var>i</var>, with <var>j</var> &#x2260; <var>i</var>, &sigma; += <var>p<sub>ji</sub></var> <var>x<sub>j</sub></var>
+ * <li>&sigma; += (<var>A</var> + <var>B</var> &minus; <var>d<sub>i</sub> x<sub>i</sub></var>) <var>u<sub>i</sub></var>
+ * <li>&sigma; = <big>(</big> (1 &minus; &alpha;) <var>v<sub>i</sub></var> + &alpha;&sigma; <big>)</big>
+ * <big>&frasl;</big>
+ * (1 &minus; &alpha;<var>p<sub>ii</sub></var> &minus; &alpha;<var>d<sub>i</sub> u<sub>i</sub></var>)
+ * <li>if <var>i</var> is dangling
+ * <ul>
+ * <li><var>B</var> += &sigma;
+ * <li><var>A</var> &minus;= <var>x<sub>i</sub></var>
+ * </ul>
+ * <li><var>x<sub>i</sub></var> = &sigma;
+ * </ul>
+ * </ul>
+ * </ul>
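+ *
+ * <p>For illustration only, one sweep of the algorithm above can be sketched as follows. The sketch makes simplifying
+ * assumptions that this class does not make: uniform preference and dangling-node distributions
+ * (<var>v<sub>i</sub></var> = <var>u<sub>i</sub></var> = 1&frasl;<var>n</var>), no buckets, no pseudorank, and predecessor
+ * lists given as a hypothetical <code>int[][] pred</code> array (the actual {@link #step()} method enumerates them from the transpose graph).
+ * The hypothetical method <code>sweep</code> receives the initial value of <var>A</var> and returns the final value of <var>B</var>,
+ * which is the value of <var>A</var> at the start of the next sweep:
+ * <pre>{@code
+ * static double sweep(final int[][] pred, final int[] outdegree, final double[] x, final double alpha, double a) {
+ *     final int n = x.length;
+ *     double b = 0; // B: sum of the new ranks of the dangling nodes before i
+ *     for (int i = 0; i < n; i++) {
+ *         final boolean dangling = outdegree[i] == 0;
+ *         double sigma = 0;
+ *         boolean hasLoop = false;
+ *         for (final int j : pred[i]) {
+ *             if (j == i) hasLoop = true;
+ *             else sigma += x[j] / outdegree[j];
+ *         }
+ *         // Dangling contribution, excluding i itself (its term moves to the diagonal).
+ *         sigma += (a + b - (dangling ? x[i] : 0)) / n;
+ *         final double diag = 1 - (hasLoop ? alpha / outdegree[i] : 0) - (dangling ? alpha / n : 0);
+ *         sigma = ((1 - alpha) / n + alpha * sigma) / diag;
+ *         if (dangling) {
+ *             b += sigma;
+ *             a -= x[i];
+ *         }
+ *         x[i] = sigma;
+ *     }
+ *     return b; // every dangling rank is now new: this is A for the next sweep
+ * }
+ * }</pre>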
+ *
+ *
+ * @see PageRank
+ * @see SpectralRanking
+ *
+ * @author Sebastiano Vigna
+ */
+
+public class PageRankGaussSeidel extends PageRank {
+ /** The class logger */
+ private final static Logger LOGGER = LoggerFactory.getLogger(PageRankGaussSeidel.class);
+
+ /** A progress logger monitoring each iteration. */
+ private final ProgressLogger progressLogger;
+ /** A progress logger monitoring the iterations. */
+ private final ProgressLogger iterationLogger;
+ /** The amount of ranking in dangling nodes. */
+ private double danglingRank;
+ /** The &#x2113;<sub>1</sub> norm of the difference between the new PageRank vector and the previous one. */
+ private double norm;
+ /** The outdegree of each node of the original graph. */
+ private int[] outdegree;
+
+ /** If true, an everywhere zero dangling-node distribution will be simulated, resulting in the computation of a pseudorank. */
+ public boolean pseudoRank;
+
+ /** Creates a new instance.
+ *
+ * @param transpose the transpose of the graph on which to compute PageRank.
+ * @param logger a logger that will be passed to <code>super()</code>.
+ */
+ protected PageRankGaussSeidel(final ImmutableGraph transpose, final Logger logger) {
+ super(transpose, logger);
+ progressLogger = new ProgressLogger(logger, "nodes");
+ iterationLogger = new ProgressLogger(logger, "iterations");
+ }
+
+ /** Creates a new instance.
+ *
+ * @param transpose the transpose of the graph on which to compute PageRank.
+ */
+ public PageRankGaussSeidel(final ImmutableGraph transpose) {
+ this(transpose, LOGGER);
+ }
+
+ @Override
+ public void init() throws IOException {
+ super.init();
+
+ if (outdegree == null) {
+ // We allocate and compute the outdegree vector.
+ outdegree = new int[n];
+
+ progressLogger.expectedUpdates = n;
+ progressLogger.start("Computing outdegrees...");
+
+ final NodeIterator nodeIterator = graph.nodeIterator();
+ for(int i = n; i-- != 0;) {
+ nodeIterator.nextInt();
+ final int[] pred = nodeIterator.successorArray();
+ for (int d = nodeIterator.outdegree(); d-- != 0;) outdegree[pred[d]]++;
+ progressLogger.lightUpdate();
+ }
+
+ progressLogger.done();
+ }
+
+ progressLogger.expectedUpdates = n;
+ progressLogger.start("Computing initial dangling rank...");
+
+ danglingRank = 0;
+ /* The number of dangling nodes. */
+ int danglingNodes = 0;
+ for (int i = n; i-- != 0;) {
+ if (outdegree[i] == 0 || buckets != null && buckets.get(i)) {
+ danglingRank += rank[i];
+ if (outdegree[i] == 0) danglingNodes++;
+ }
+ }
+
+ progressLogger.done();
+ logger.info(danglingNodes + " dangling nodes");
+ if (buckets != null) logger.info(buckets.cardinality() + " buckets");
+ logger.info("Initial dangling rank: " + danglingRank);
+ logger.info("Completed.");
+ iterationLogger.start();
+ }
+
+ @Override
+ public void clear() {
+ super.clear();
+ outdegree = null;
+ }
+ /** Returns an upper bound on the &#x2113;<sub>1</sub> norm of the error, obtained by multiplying the &#x2113;<sub>1</sub> norm
+ * of the difference between the last two approximations by &alpha; / (1 &minus; &alpha;) (this idea arose in discussions with David Gleich).
+ *
+ * @return an upper bound to the &#x2113;<sub>1</sub> norm of the error.
+ */
+ @Override
+ public double normDelta() {
+ return norm * alpha / (1 - alpha);
+ }
+
+
+ public void step() {
+ final double oneMinusAlpha = 1.0 - alpha,
+ oneMinusAlphaOverNumNodes = oneMinusAlpha / n;
+ final NodeIterator nodeIterator = graph.nodeIterator();
+ int inDegree, currPred;
+ int[] pred;
+ double sigma,
+ selfLoopFactor,
+ selfDanglingRank,
+ B = 0, // See class description
+ A = danglingRank;
+ boolean hasLoop;
+ final double[] rank = this.rank;
+ final int[] outdegree = this.outdegree;
+ final BitSet buckets = this.buckets;
+ final boolean pseudoRank = this.pseudoRank;
+ final double alpha = this.alpha;
+ final DoubleList danglingNodeDistribution = this.danglingNodeDistribution;
+ final DoubleList preference = this.preference;
+ final int n = this.n;
+
+ progressLogger.expectedUpdates = n;
+ progressLogger.start("Iteration " + iteration++ + "...");
+
+ norm = 0;
+ for(int i = 0; i < n; i++) {
+ nodeIterator.nextInt();
+ inDegree = nodeIterator.outdegree();
+ pred = nodeIterator.successorArray();
+ sigma = 0.0;
+ hasLoop = false;
+
+ //Determine the rank from all incoming real links except possibly self link.
+ for (int j = inDegree; j-- != 0;) {
+ currPred = pred[j];
+ // Skip buckets
+ if (buckets != null && buckets.get(pred[j])) continue;
+ if (i == currPred) hasLoop = true;
+ else sigma += rank[currPred] / outdegree[currPred];
+ }
+
+ //Determine the diagonal rank contribution
+ if (outdegree[i] == 0 || buckets != null && buckets.get(i)) { //i is a dangling node
+ selfDanglingRank = rank[i];
+ selfLoopFactor = pseudoRank ? 1.0 :
+ (danglingNodeDistribution != null) ? 1.0 - alpha * danglingNodeDistribution.getDouble(i)
+ : 1.0 - alpha / n;
+ } else {
+ selfDanglingRank = 0.0;
+ selfLoopFactor = (hasLoop) ? 1.0 - alpha / outdegree[i]
+ : 1.0; //i has no selfloop and it is not dangling
+ }
+
+ sigma += pseudoRank ? 0 :
+ (danglingNodeDistribution != null) ? (B + A - selfDanglingRank) * danglingNodeDistribution.getDouble(i)
+ : (B + A - selfDanglingRank) / n;
+
+ sigma = (preference != null) ? (oneMinusAlpha * preference.getDouble(i) + alpha * sigma) / selfLoopFactor
+ : (oneMinusAlphaOverNumNodes + alpha * sigma) / selfLoopFactor;
+
+ if (outdegree[i] == 0 || buckets != null && buckets.get(i)) {
+ B += sigma;
+ A -= rank[i];
+ }
+
+ //update the L_1 norm of vector difference between the new and old rank
+ norm += Math.abs(sigma - rank[i]);
+ //update the rank
+ rank[i] = sigma;
+
+ progressLogger.update();
+ }
+ danglingRank = B;
+ progressLogger.done();
+ iterationLogger.setAndDisplay(iteration);
+ }
+
+ @Override
+ public void stepUntil(final StoppingCriterion stoppingCriterion) throws IOException {
+ super.stepUntil(stoppingCriterion);
+ iterationLogger.done();
+ }
+
+ public static void main(final String[] arg) throws IOException, JSAPException, ConfigurationException, ClassNotFoundException {
+
+ final SimpleJSAP jsap = new SimpleJSAP(PageRankGaussSeidel.class.getName(), "Computes PageRank of a graph, given its transpose, using Gauss-Seidel's method."
+ + " The file <rankBasename>.properties stores metadata about the computation, whereas the file <rankBasename>.ranks stores the result as a sequence of doubles in DataInput format.",
+ new Parameter[] {
+ new FlaggedOption("alpha", JSAP.DOUBLE_PARSER, Double.toString(PageRank.DEFAULT_ALPHA), JSAP.NOT_REQUIRED, 'a', "alpha", "Damping factor."),
+ new FlaggedOption("maxIter", JSAP.INTEGER_PARSER, Integer.toString(DEFAULT_MAX_ITER), JSAP.NOT_REQUIRED, 'i', "max-iter", "Maximum number of iterations."),
+ new FlaggedOption("threshold", JSAP.DOUBLE_PARSER, Double.toString(DEFAULT_THRESHOLD), JSAP.NOT_REQUIRED, 't', "threshold", "Threshold in l_1 norm to determine whether to stop."),
+ new FlaggedOption("preferenceVector", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'p', "preference-vector", "A preference vector stored as a vector of binary doubles."),
+ new FlaggedOption("preferenceObject", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'P', "preference-object", "A preference vector stored as a serialised DoubleList."),
+ new Switch("pseudoRank", JSAP.NO_SHORTFLAG, "pseudorank", "Compute pseudoranks (the dangling preference is set to 0)."),
+ new FlaggedOption("buckets", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'b', "buckets", "The buckets of the graph; if supplied, buckets will be treated as dangling nodes."),
+ new Switch("offline", 'o', "offline", "No-op for compatibility."),
+ new Switch("strongly", 'S', "strongly", "Use the preference vector to redistribute the dangling rank."),
+ new UnflaggedOption("transposeBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the transpose of the graph."),
+ new UnflaggedOption("rankBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename where the results are stored. <rankBasename>.properties contains the parameter values used in the computation. <rankBasename>.ranks contains ranks (doubles in binary form).")
+ }
+ );
+
+ final JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final boolean strongly = jsapResult.getBoolean("strongly", false);
+ final String buckets = jsapResult.getString("buckets");
+ final String graphBasename = jsapResult.getString("transposeBasename");
+ final String rankBasename = jsapResult.getString("rankBasename");
+
+ final ProgressLogger progressLogger = new ProgressLogger(LOGGER, "nodes");
+ ImmutableGraph graph = null;
+
+ graph = ImmutableGraph.loadOffline(graphBasename, progressLogger);
+
+ DoubleList preference = null;
+ String preferenceFilename = null;
+ if (jsapResult.userSpecified("preferenceVector"))
+ preference = DoubleArrayList.wrap(BinIO.loadDoubles(preferenceFilename = jsapResult.getString("preferenceVector")));
+
+ if (jsapResult.userSpecified("preferenceObject")) {
+ if (jsapResult.userSpecified("preferenceVector")) throw new IllegalArgumentException("You cannot specify twice the preference vector");
+ preference = (DoubleList)BinIO.loadObject(preferenceFilename = jsapResult.getString("preferenceObject"));
+ }
+
+ if (strongly && preference == null) throw new IllegalArgumentException("The 'strongly' option requires a preference vector");
+
+ PageRankGaussSeidel pr = new PageRankGaussSeidel(graph);
+ pr.alpha = jsapResult.getDouble("alpha");
+ pr.preference = preference;
+ pr.buckets = (BitSet)(buckets == null ? null : BinIO.loadObject(buckets));
+ pr.stronglyPreferential = strongly;
+ pr.pseudoRank = jsapResult.getBoolean("pseudoRank");
+
+ // cycle until we reach maxIter iterations or the norm is less than the given threshold (whichever comes first)
+ pr.stepUntil(or(new SpectralRanking.NormStoppingCriterion(jsapResult.getDouble("threshold")), new SpectralRanking.IterationNumberStoppingCriterion(jsapResult.getInt("maxIter"))));
+
+ BinIO.storeDoubles(pr.rank, rankBasename + ".ranks");
+ pr.buildProperties(graphBasename, preferenceFilename, null).save(rankBasename + ".properties");
+ }
+}
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/PageRankParallelGaussSeidel.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/PageRankParallelGaussSeidel.java
new file mode 100644
index 0000000..49ec216
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/PageRankParallelGaussSeidel.java
@@ -0,0 +1,496 @@
+package it.unimi.dsi.law.rank;
+
+import java.io.DataInput;
+import java.io.IOException;
+import java.util.BitSet;
+import java.util.concurrent.CyclicBarrier;
+import java.util.concurrent.atomic.AtomicLong;
+
+import org.apache.commons.configuration.ConfigurationException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/*
+ * Copyright (C) 2011-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.doubles.DoubleArrayList;
+import it.unimi.dsi.fastutil.doubles.DoubleIterators;
+import it.unimi.dsi.fastutil.doubles.DoubleList;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.law.util.KahanSummation;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.NodeIterator;
+
+// RELEASE-STATUS: DIST
+
+/** Computes PageRank using a parallel (multicore) implementation of the {@linkplain PageRankGaussSeidel Gau&szlig;&ndash;Seidel} method.
+ *
+ * <p><strong>Note</strong>: this is the implementation of choice to be used when computing PageRank. It uses less memory (one vector
+ * of doubles plus one vector of integers) and, experimentally, converges faster than any other implementation. Moreover, it
+ * scales linearly with the number of cores.
+ *
+ * <p><strong>Warning</strong>: Since we need to enumerate the <em>predecessors</em> of a node,
+ * you must pass to the {@linkplain #PageRankParallelGaussSeidel(ImmutableGraph, int, Logger) constructor} the <strong>transpose</strong>
+ * of the graph.
+ *
+ * <p>Technically, the iteration performed by this class is <em>not</em> a Gau&szlig;&ndash;Seidel iteration: we simply start a number of
+ * threads, and each thread updates blocks of consecutive values using a Gau&szlig;&ndash;Seidel-like rule.
+ * As a result, each update uses some old and some new values; in other
+ * words, the <em>regular splitting</em>
+ * <div style="text-align: center; margin: 1em">
+ * <var>M &minus; N</var> = <var>I</var> &minus; &alpha; (<var>P</var> + <var><b>d</b></var><sup><i>T</i></sup><var><b>u</b></var>)
+ * </div>
+ * of the matrix associated with each iteration depends on the order in which the threads perform their updates, and thus may change from one
+ * iteration to the next (in a Gau&szlig;&ndash;Seidel iteration, <var>M</var> is upper triangular and <var>N</var> is strictly lower triangular).
+ * Nonetheless, it is easy to check that <var>M</var> is still (up to a permutation) upper triangular and invertible, independently of the specific update sequence.
+ *
+ * <p>Note that the {@link #step()} method is not available: due to the need for some synchronization logic, only {@link #stepUntil(StoppingCriterion)}
+ * is available.
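+ *
+ * <p>A minimal usage sketch, modelled on {@link #main(String[])}. The basename <code>example-t</code>, the threshold and the
+ * iteration cap are hypothetical values chosen for illustration; as with the command-line tool, the graph passed to the
+ * constructor must be the transpose:
+ * <pre>{@code
+ * final ImmutableGraph transpose = ImmutableGraph.load("example-t"); // hypothetical basename of the transpose
+ * final PageRankParallelGaussSeidel pr = new PageRankParallelGaussSeidel(transpose); // 0 threads: one per available processor
+ * pr.alpha = 0.85;
+ * // Stop when normDelta() is below the threshold or after 100 iterations, whichever comes first.
+ * pr.stepUntil(SpectralRanking.or(new SpectralRanking.NormStoppingCriterion(1E-7),
+ *         new SpectralRanking.IterationNumberStoppingCriterion(100)));
+ * final double[] ranks = pr.rank; // one score per node
+ * }</pre>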
+ *
+ * <p>The {@link #normDelta()} method returns the following values:
+ * <ul>
+ * <li>if a {@linkplain #normVector(double[], double) suitable norm vector has been set}, an upper bound on the error (the &#x2113;<sub>&#x221E;</sub> distance from the rank to be computed);
+ * <li>otherwise, an upper bound on the &#x2113;<sub>1</sub> norm of the error, obtained by multiplying the &#x2113;<sub>1</sub> norm
+ * of the difference between the last two approximations by &alpha; / (1 &minus; &alpha;) (this idea arose in discussions with David Gleich).
+ * </ul>
+ *
+ * <p>To be able to set a norm vector, you need to set the {@link #pseudoRank} flag and use {@link PowerSeries}
+ * (setting the {@linkplain PowerSeries#markovian Markovian} flag) to compute a suitable vector.
+ * To do so, you must provide an &alpha; and use the {@link PowerSeries#MAX_RATIO_STOPPING_CRITERION}. If the computation
+ * terminates without errors with maximum ratio &sigma;, the {@linkplain PowerSeries#previousRank resulting vector} can be used
+ * with this class to compute pseudoranks for all &alpha; &lt; 1 / &sigma; (strictness
+ * is essential). Note that the &#x2113;<sub>1</sub> norm of the error is an upper bound on its &#x2113;<sub>&#x221E;</sub> norm.
+ *
+ * <p>With respect to the description of the exact algorithm in {@link PageRankGaussSeidel}, we make a simplification that is
+ * essential to obtain a coherent update without incurring too much synchronization: the rank associated with
+ * dangling nodes is computed at the end of each iteration, and used unchanged throughout the whole next iteration. This corresponds to
+ * permuting the array so that dangling nodes come last.
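+ *
+ * <p>A simplified sketch of the work performed by each thread during one iteration, for illustration only. The method name
+ * <code>work</code> and the <code>int[][] pred</code> predecessor lists are hypothetical; the block size mirrors {@code GRANULARITY}.
+ * Uniform preference and dangling-node distributions, no self-loops, no buckets and no pseudorank are assumed. Threads claim blocks of
+ * consecutive nodes through a shared counter, and every update uses the dangling rank frozen at the start of the iteration:
+ * <pre>{@code
+ * // Returns the new rank of the dangling nodes updated by this thread; the caller must sum these
+ * // values over all threads to obtain the frozen dangling rank of the next iteration.
+ * static double work(final AtomicLong next, final int[][] pred, final int[] outdegree, final double[] x,
+ *         final double alpha, final double frozenDanglingRank) {
+ *     final int n = x.length, block = 10000;
+ *     double newDanglingRank = 0;
+ *     for (;;) {
+ *         final long start = next.getAndAdd(block); // claim the next block of nodes
+ *         if (start >= n) break;
+ *         final int end = (int)Math.min(n, start + block);
+ *         for (int i = (int)start; i < end; i++) {
+ *             final boolean dangling = outdegree[i] == 0;
+ *             double sigma = 0;
+ *             // Ranks read here may be old or new, depending on what other threads have already updated.
+ *             for (final int j : pred[i]) sigma += x[j] / outdegree[j];
+ *             // Frozen dangling rank, minus the term of i itself, which moves to the diagonal.
+ *             sigma += (frozenDanglingRank - (dangling ? x[i] : 0)) / n;
+ *             x[i] = ((1 - alpha) / n + alpha * sigma) / (dangling ? 1 - alpha / n : 1);
+ *             if (dangling) newDanglingRank += x[i];
+ *         }
+ *     }
+ *     return newDanglingRank;
+ * }
+ * }</pre>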
+ *
+ * @see PageRankGaussSeidel
+ * @see PageRank
+ * @see SpectralRanking
+ *
+ * @author Sebastiano Vigna
+ */
+
+public class PageRankParallelGaussSeidel extends PageRank {
+ private final static Logger LOGGER = LoggerFactory.getLogger(PageRankParallelGaussSeidel.class);
+
+ /** A progress logger monitoring each iteration. */
+ private final ProgressLogger progressLogger;
+ /** A progress logger monitoring the iterations. */
+ private final ProgressLogger iterationLogger;
+ /** The number of threads. */
+ private final int numberOfThreads;
+ /** The next node to be picked. */
+ private final AtomicLong nextNode;
+ /** The rank lost through dangling nodes, accumulated incrementally. */
+ private double danglingRankAccumulator;
+ /** The amount of ranking in dangling nodes computed at the previous iteration. */
+ private double danglingRank;
+ /** The &#x2113;<sub>1</sub> norm of the difference between the new approximation and the previous one,
+ * if {@link #normVector} is {@code null}; the {@link #normVector}-weighted supremum norm of the same vector, otherwise. */
+ private double normDelta;
+ /** The outdegree of each node (initialized after the first computation). */
+ public int[] outdegree;
+ /** If true, the computation is over. */
+ private volatile boolean completed;
+ /** The barrier used to synchronize threads. */
+ private volatile CyclicBarrier barrier;
+ /** Keeps track of problems in threads. */
+ private volatile Throwable threadThrowable;
+ /** An array of bytes containing the opposite of a lower bound on the binary logarithm of the elements of a norm vector, or {@code null} to stop the computation using residue estimation. */
+ private byte[] normVector;
+ /** The value for which {@link #normVector} is suitable. */
+ private double sigma;
+ /** If true, an everywhere zero dangling-node distribution will be simulated, resulting in the computation of a pseudorank. */
+ public boolean pseudoRank;
+
+
+ /** Creates a new instance.
+ *
+ * @param transpose the transpose of the graph on which to compute PageRank.
+ * @param requestedThreads the requested number of threads (0 for {@link Runtime#availableProcessors()}).
+ * @param logger a logger that will be passed to <code>super()</code>.
+ */
+ public PageRankParallelGaussSeidel(final ImmutableGraph transpose, final int requestedThreads, final Logger logger) {
+ super(transpose, logger);
+ progressLogger = new ProgressLogger(logger, "nodes");
+ iterationLogger = new ProgressLogger(logger, "iterations");
+ numberOfThreads = requestedThreads != 0 ? requestedThreads : Runtime.getRuntime().availableProcessors();
+ nextNode = new AtomicLong();
+ }
+
+ /** Creates a new instance.
+ *
+ * @param transpose the transpose of the graph on which to compute PageRank.
+ */
+ public PageRankParallelGaussSeidel(final ImmutableGraph transpose) {
+ this(transpose, 0, LOGGER);
+ }
+
+ /** Creates a new instance.
+ *
+ * @param transpose the transpose of the graph on which to compute PageRank.
+ * @param logger a logger that will be passed to <code>super()</code>.
+ */
+ public PageRankParallelGaussSeidel(final ImmutableGraph transpose, final Logger logger) {
+ this(transpose, 0, logger);
+ }
+
+ /** Sets the norm vector.
+ *
+ * @param normVectorFilename a file containing a norm vector as a list of doubles in {@link DataInput} format, or {@code null} for no norm vector.
+ * @param sigma the value for which the provided norm vector is suitable.
+ */
+ public void normVector(final String normVectorFilename, final double sigma) throws IOException {
+ normVector = normVectorFilename == null ? null : approximateNormVector(BinIO.asDoubleIterator(normVectorFilename));
+ this.sigma = sigma;
+ }
+
+ /** Sets the norm vector.
+ *
+ * @param normVector the new norm vector.
+ * @param sigma the value for which the provided norm vector is suitable.
+ */
+ public void normVector(final double[] normVector, final double sigma) {
+ this.normVector = approximateNormVector(DoubleIterators.wrap(normVector));
+ this.sigma = sigma;
+ }
+
+ @Override
+ public void init() throws IOException {
+ super.init();
+
+ if (normVector != null) {
+ if (! pseudoRank) throw new IllegalStateException("Norm vectors can be used only when computing pseudoranks");
+ if (alpha >= 1 / sigma) throw new IllegalStateException("The specified norm vector can be used only with values of alpha smaller than " + 1 / sigma);
+ }
+
+ if (outdegree == null) {
+ // We allocate and compute the outdegree vector.
+ outdegree = new int[n];
+ // TODO: refactor using .outdegrees()
+ progressLogger.expectedUpdates = n;
+ progressLogger.start("Computing outdegrees...");
+
+ final NodeIterator nodeIterator = graph.nodeIterator();
+ for(int i = n; i-- != 0;) {
+ nodeIterator.nextInt();
+ final int[] pred = nodeIterator.successorArray();
+ for (int d = nodeIterator.outdegree(); d-- != 0;) outdegree[pred[d]]++;
+ progressLogger.lightUpdate();
+ }
+
+ progressLogger.done();
+ }
+
+ progressLogger.expectedUpdates = n;
+ progressLogger.start("Computing initial dangling rank...");
+
+ danglingRank = 0;
+ /* The number of dangling nodes. */
+ int danglingNodes = 0;
+ for (int i = n; i-- != 0;) {
+ if (outdegree[i] == 0 || buckets != null && buckets.get(i)) {
+ danglingRank += rank[i];
+ if (outdegree[i] == 0) danglingNodes++;
+ }
+ }
+
+ progressLogger.done();
+ logger.info(danglingNodes + " dangling nodes");
+ if (buckets != null) logger.info(buckets.cardinality() + " buckets");
+ logger.info("Initial dangling rank: " + danglingRank);
+
+ normDelta = danglingRankAccumulator = 0;
+ completed = false;
+ logger.info("Completed.");
+ iterationLogger.start();
+ }
+
+ private final class IterationThread extends Thread {
+ private static final int GRANULARITY = 10000;
+
+ public void run() {
+ try {
+ // We cache frequently used fields.
+ final ImmutableGraph graph = PageRankParallelGaussSeidel.this.graph.copy();
+ final int n = PageRankParallelGaussSeidel.this.n;
+ final double oneMinusAlpha = 1 - alpha;
+ final double oneMinusAlphaOverN = oneMinusAlpha / n;
+ final double[] rank = PageRankParallelGaussSeidel.this.rank;
+ final int[] outdegree = PageRankParallelGaussSeidel.this.outdegree;
+ final BitSet buckets = PageRankParallelGaussSeidel.this.buckets;
+ final boolean pseudoRank = PageRankParallelGaussSeidel.this.pseudoRank;
+ final double alpha = PageRankParallelGaussSeidel.this.alpha;
+ final DoubleList danglingNodeDistribution = PageRankParallelGaussSeidel.this.danglingNodeDistribution;
+ final DoubleList preference = PageRankParallelGaussSeidel.this.preference;
+ final KahanSummation s = new KahanSummation();
+
+ for(;;) {
+ barrier.await();
+ if (completed) return;
+ final double danglingRank = PageRankParallelGaussSeidel.this.danglingRank;
+
+ for(;;) {
+ // Try to get another piece of work.
+ final long start = nextNode.getAndAdd(GRANULARITY);
+ if (start >= n) {
+ nextNode.getAndAdd(-GRANULARITY);
+ break;
+ }
+
+ final int end = (int)(Math.min(n, start + GRANULARITY));
+
+ // for each node, enumerate predecessors and compute an updated value
+ double danglingRankAccumulator = 0, norm = 0;
+ final NodeIterator nodeIterator = graph.nodeIterator((int)start);
+
+ for(int i = (int)start; i < end; i++) {
+ nodeIterator.nextInt();
+ s.reset();
+ boolean hasLoop = false;
+
+ //Determine the rank from all incoming real links except possibly self link.
+ final int[] pred = nodeIterator.successorArray();
+ for(int indegree = nodeIterator.outdegree(); indegree-- != 0;) {
+ final int currPred = pred[indegree];
+ // Skip buckets
+ if (buckets != null && buckets.get(pred[indegree])) continue;
+ if (i == currPred) hasLoop = true;
+ else s.add(rank[currPred] / outdegree[currPred]);
+ }
+
+ double selfDanglingRank, selfLoopFactor;
+ //Determine the diagonal rank contribution
+ if (outdegree[i] == 0 || buckets != null && buckets.get(i)) { //i is a dangling node
+ selfDanglingRank = rank[i];
+ selfLoopFactor = pseudoRank ? 1 :
+ (danglingNodeDistribution != null) ? 1 - alpha * danglingNodeDistribution.getDouble(i)
+ : 1.0 - alpha / n;
+ } else {
+ selfDanglingRank = 0;
+ selfLoopFactor = hasLoop ? 1 - alpha / outdegree[i] : 1; //i has no selfloop and it is not dangling
+ }
+
+ if (! pseudoRank) s.add(danglingNodeDistribution != null ? (danglingRank - selfDanglingRank) * danglingNodeDistribution.getDouble(i) : (danglingRank - selfDanglingRank) / n);
+
+ final double newRank = ((preference != null ? oneMinusAlpha * preference.getDouble(i) : oneMinusAlphaOverN) + alpha * s.value()) / selfLoopFactor;
+
+ if (outdegree[i] == 0 || buckets != null && buckets.get(i)) danglingRankAccumulator += newRank;
+
+ if (normVector != null) norm = Math.max(norm, Math.abs(newRank - rank[i]) * (1L << (0xFF & normVector[i])));
+ else norm += Math.abs(newRank - rank[i]);
+
+ //update the rank
+ rank[i] = newRank;
+ }
+
+ synchronized (progressLogger) {
+ progressLogger.update(end - start);
+ }
+
+ synchronized (PageRankParallelGaussSeidel.this) {
+ PageRankParallelGaussSeidel.this.danglingRankAccumulator += danglingRankAccumulator;
+ if (normVector != null) PageRankParallelGaussSeidel.this.normDelta = Math.max(PageRankParallelGaussSeidel.this.normDelta, norm);
+ else PageRankParallelGaussSeidel.this.normDelta += norm;
+ }
+ }
+ }
+ }
+ catch(Throwable t) {
+ threadThrowable = t;
+ }
+ }
+ }
+
+ @Override
+ public void step() throws IOException {
+ throw new UnsupportedOperationException();
+ }
+
+ @Override
+ public void stepUntil(final StoppingCriterion stoppingCriterion) throws IOException {
+ init();
+ final IterationThread[] thread = new IterationThread[numberOfThreads];
+ for(int i = thread.length; i-- != 0;) thread[i] = new IterationThread();
+
+ barrier = new CyclicBarrier(numberOfThreads, new Runnable() {
+ @Override
+ public void run() {
+ if (iteration > 0) {
+ progressLogger.done();
+ iterationLogger.setAndDisplay(iteration);
+
+ /*
+ // Compute the supremum norm of the residual
+ double res = 0;
+ double res1 = 0;
+ double err = 0;
+ final NodeIterator nodeIterator = graph.nodeIterator();
+ for(int i = 0; i < n ; i++) {
+ nodeIterator.nextInt();
+ double prod = 0;
+ final LazyIntIterator successors = nodeIterator.successors();
+ for(int s; (s = successors.nextInt()) != -1;) prod += rank[s] / outdegree[s];
+ final double pref = preference == null ? 1. / n : preference.getDouble(i);
+ final double delta = Math.abs(rank[i]
+ - alpha * prod
+ - alpha * danglingRankAccumulator * pref
+ - (1 - alpha) * pref);
+ if (res < delta) res = delta;
+ res1 += delta;
+ }
+
+ LOGGER.info("Supremum norm of the residual: " + res);
+ LOGGER.info("l_1 norm of the residual: " + res1);
+ LOGGER.info("Bound on the l_1 norm of the error: " + normDelta() / (1 - alpha));
+ LOGGER.info("Bound on the supremum norm of the error: " + (1 + alpha) * res / (1 - alpha));
+ LOGGER.info("Supremum norm of the error: " + err);
+ if (err > (1 + alpha) * res / (1 - alpha)) LOGGER.warn("Wrong bound on error");
+ if (res1 > normDelta()) LOGGER.warn("Wrong bound on residual: " + res1 + " > " + normDelta());
+ */
+
+ if (stoppingCriterion.shouldStop(PageRankParallelGaussSeidel.this)) {
+ completed = true;
+ return;
+ }
+
+ danglingRank = danglingRankAccumulator;
+ danglingRankAccumulator = 0;
+ }
+
+ normDelta = danglingRankAccumulator = 0;
+ nextNode.set(0);
+ progressLogger.expectedUpdates = n;
+ progressLogger.start("Iteration " + iteration++ + "...");
+ }
+ }
+ );
+
+ for(int i = thread.length; i-- != 0;) thread[i].start();
+ for(int i = thread.length; i-- != 0;)
+ try {
+ thread[i].join();
+ }
+ catch (InterruptedException e) {
+ throw new RuntimeException(e);
+ }
+
+ if (threadThrowable != null) throw new RuntimeException(threadThrowable);
+ if (progressLogger != null) progressLogger.done();
+
+ iterationLogger.done();
+ }
+
+ /** Returns, if a {@linkplain #normVector(double[], double) suitable norm vector has been set}, an upper bound on the error (the weighted &#x2113;<sub>&#x221E;</sub> distance from the rank to be computed);
+ * otherwise, an upper bound on the &#x2113;<sub>1</sub> norm of the error, obtained by multiplying the &#x2113;<sub>1</sub> norm of the difference
+ * between the last two approximations by &alpha; / (1 &minus; &alpha;) (this idea arose in discussions with David Gleich).
+ *
+ * @return an upper bound on the error.
+ */
+ @Override
+ public double normDelta() {
+ return normVector == null ? normDelta * alpha / (1 - alpha) : (alpha * sigma) * normDelta / (1 - alpha * sigma);
+ }
+
+ @Override
+ public void clear() {
+ super.clear();
+ outdegree = null;
+ }
+
+ public static void main(final String[] arg) throws IOException, JSAPException, ConfigurationException, ClassNotFoundException {
+
+ final SimpleJSAP jsap = new SimpleJSAP(PageRankParallelGaussSeidel.class.getName(), "Computes PageRank of a graph, given its transpose, using a parallel implementation of Gauss-Seidel's method."
+ + " The file <rankBasename>.properties stores metadata about the computation, whereas the file <rankBasename>.ranks stores the result as a sequence of doubles in DataInput format.",
+ new Parameter[] {
+ new Switch("expand", 'e', "expand", "Expand the graph to increase speed (no compression)."),
+ new FlaggedOption("alpha", JSAP.DOUBLE_PARSER, Double.toString(PageRank.DEFAULT_ALPHA), JSAP.NOT_REQUIRED, 'a', "alpha", "Damping factor."),
+ new FlaggedOption("maxIter", JSAP.INTEGER_PARSER, Integer.toString(DEFAULT_MAX_ITER), JSAP.NOT_REQUIRED, 'i', "max-iter", "Maximum number of iterations."),
+ new FlaggedOption("threshold", JSAP.DOUBLE_PARSER, Double.toString(DEFAULT_THRESHOLD), JSAP.NOT_REQUIRED, 't', "threshold", "Threshold (in l_1 norm, if no norm vector has been specified; in the weighted supremum norm otherwise) to determine whether to stop."),
+ new FlaggedOption("preferenceVector", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'p', "preference-vector", "A preference vector stored as a vector of binary doubles."),
+ new FlaggedOption("preferenceObject", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'P', "preference-object", "A preference vector stored as a serialised DoubleList."),
+ new Switch("pseudoRank", JSAP.NO_SHORTFLAG, "pseudorank", "Compute pseudoranks (the dangling preference is set to 0)."),
+ new FlaggedOption("normVector", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'n', "norm-vector", "A vector inducing the correct weighted supremum norm."),
+ new FlaggedOption("sigma", JSAP.DOUBLE_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 's', "sigma", "The value for which the norm vector is suitable (i.e., the maximum ratio from its properties)."),
+ new FlaggedOption("buckets", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'b', "buckets", "The buckets of the graph; if supplied, buckets will be treated as dangling nodes."),
+ new Switch("mapped", 'm', "mapped", "Use loadMapped() to load the graph."),
+ new Switch("strongly", 'S', "strongly", "Use the preference vector to redistribute the dangling rank."),
+ new FlaggedOption("threads", JSAP.INTSIZE_PARSER, "0", JSAP.NOT_REQUIRED, 'T', "threads", "The number of threads to be used. If 0, the number will be estimated automatically."),
+ new UnflaggedOption("transposeBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the transpose of the graph."),
+ new UnflaggedOption("rankBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the files where the resulting rank (doubles in binary form) and the computation metadata are stored.")
+ }
+ );
+
+ final JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final boolean mapped = jsapResult.getBoolean("mapped", false);
+ final boolean strongly = jsapResult.getBoolean("strongly", false);
+ final String graphBasename = jsapResult.getString("transposeBasename");
+ final String rankBasename = jsapResult.getString("rankBasename");
+ final String normVectorFilename = jsapResult.getString("normVector");
+ if (normVectorFilename != null && ! jsapResult.userSpecified("sigma")) throw new IllegalArgumentException("You must specify the sigma for which the norm vector is suitable");
+ final String buckets = jsapResult.getString("buckets");
+ final int threads = jsapResult.getInt("threads");
+ final ProgressLogger progressLogger = new ProgressLogger(LOGGER, "nodes");
+
+ ImmutableGraph graph = mapped? ImmutableGraph.loadMapped(graphBasename, progressLogger) : ImmutableGraph.load(graphBasename, progressLogger);
+
+ DoubleList preference = null;
+ String preferenceFilename = null;
+ if (jsapResult.userSpecified("preferenceVector"))
+ preference = DoubleArrayList.wrap(BinIO.loadDoubles(preferenceFilename = jsapResult.getString("preferenceVector")));
+
+ if (jsapResult.userSpecified("preferenceObject")) {
+ if (jsapResult.userSpecified("preferenceVector")) throw new IllegalArgumentException("You cannot specify the preference vector twice");
+ preference = (DoubleList)BinIO.loadObject(preferenceFilename = jsapResult.getString("preferenceObject"));
+ }
+
+ if (strongly && preference == null) throw new IllegalArgumentException("The 'strongly' option requires a preference vector");
+
+ if (jsapResult.userSpecified("expand")) graph = new ArrayListMutableGraph(graph).immutableView();
+
+ PageRankParallelGaussSeidel pr = new PageRankParallelGaussSeidel(graph, threads, LOGGER);
+ pr.alpha = jsapResult.getDouble("alpha");
+ pr.preference = preference;
+ pr.buckets = (BitSet)(buckets == null ? null : BinIO.loadObject(buckets));
+ pr.stronglyPreferential = strongly;
+ pr.pseudoRank = jsapResult.userSpecified("pseudoRank");
+ if (normVectorFilename != null) pr.normVector(normVectorFilename, jsapResult.getDouble("sigma"));
+
+ // cycle until we reach maxIter iterations or the norm is less than the given threshold (whichever comes first)
+ pr.stepUntil(or(new SpectralRanking.NormStoppingCriterion(jsapResult.getDouble("threshold")), new SpectralRanking.IterationNumberStoppingCriterion(jsapResult.getInt("maxIter"))));
+
+ BinIO.storeDoubles(pr.rank, rankBasename + ".ranks");
+ pr.buildProperties(graphBasename, preferenceFilename, null).save(rankBasename + ".properties");
+ }
+}
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/PageRankParallelPowerSeries.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/PageRankParallelPowerSeries.java
new file mode 100644
index 0000000..3d265a8
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/PageRankParallelPowerSeries.java
@@ -0,0 +1,348 @@
+package it.unimi.dsi.law.rank;
+
+import java.io.IOException;
+import java.util.BitSet;
+import java.util.concurrent.CyclicBarrier;
+import java.util.concurrent.atomic.AtomicLong;
+
+import org.apache.commons.configuration.ConfigurationException;
+import org.apache.commons.lang.mutable.MutableDouble;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/*
+ * Copyright (C) 2011-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.doubles.DoubleArrayList;
+import it.unimi.dsi.fastutil.doubles.DoubleList;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.law.util.KahanSummation;
+import it.unimi.dsi.law.util.Norm;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.NodeIterator;
+
+// RELEASE-STATUS: DIST
+
+/** Computes PageRank using a parallel (multicore) implementation of the {@linkplain PageRankPowerSeries power-series method}, which runs
+ * the power method starting from the preference vector, thus evaluating the truncated PageRank power series (see {@link PageRankPowerSeries}).
+ *
+ * <p>Note that the {@link #step()} method is not available: due to the need for some synchronization logic, only {@link #stepUntil(StoppingCriterion)}
+ * is available.
+ *
+ * <p><em>This class exists for debugging and comparison purposes only</em>. The class of choice for computing PageRank is {@link PageRankParallelGaussSeidel}.
+ *
+ * <p><strong>Warning</strong>: Since we need to enumerate the <em>predecessors</em> of a node,
+ * you must pass to the {@linkplain #PageRankParallelPowerSeries(ImmutableGraph, int, Logger) constructor} the <strong>transpose</strong>
+ * of the graph.
+ *
+ * @see PageRankPowerSeries
+ * @see PageRank
+ * @see SpectralRanking
+ *
+ * @author Sebastiano Vigna
+ */
+
+public class PageRankParallelPowerSeries extends PageRank {
+ private final static Logger LOGGER = LoggerFactory.getLogger(PageRankParallelPowerSeries.class);
+
+ /** A progress logger monitoring each iteration. */
+ private final ProgressLogger progressLogger;
+ /** A progress logger monitoring the iterations. */
+ private final ProgressLogger iterationLogger;
+ /** The number of threads. */
+ private final int numberOfThreads;
+ /** The next node to be picked. */
+ private final AtomicLong nextNode;
+ /** The rank lost through dangling nodes. */
+ private final MutableDouble danglingRankAccumulator;
+ /** The outdegree of each node. */
+ private int[] outdegree;
+ /** If true, the computation is over. */
+ private volatile boolean completed;
+ /** The barrier used to synchronize threads. */
+ private volatile CyclicBarrier barrier;
+ /** Keeps track of problems in threads. */
+ private volatile Throwable threadThrowable;
+ /** The rank vector at the previous iteration (only meaningful after at least one step). */
+ public double[] previousRank;
+
+ /** Creates a new instance.
+ *
+ * @param transpose the transpose of the graph on which to compute PageRank.
+ * @param requestedThreads the requested number of threads (0 for {@link Runtime#availableProcessors()}).
+ * @param logger a logger that will be passed to <code>super()</code>.
+ */
+ public PageRankParallelPowerSeries(final ImmutableGraph transpose, final int requestedThreads, final Logger logger) {
+ super(transpose, logger);
+ progressLogger = new ProgressLogger(logger, "nodes");
+ iterationLogger = new ProgressLogger(logger, "iterations");
+ numberOfThreads = requestedThreads != 0 ? requestedThreads : Runtime.getRuntime().availableProcessors();
+ nextNode = new AtomicLong();
+ danglingRankAccumulator = new MutableDouble();
+ }
+
+ /** Creates a new instance.
+ *
+ * @param transpose the transpose of the graph on which to compute PageRank.
+ */
+ public PageRankParallelPowerSeries(final ImmutableGraph transpose) {
+ this(transpose, 0, LOGGER);
+ }
+
+ @Override
+ public void init() throws IOException {
+ super.init();
+
+ // Creates the arrays, if necessary
+ if (previousRank == null) previousRank = new double[n];
+
+ if (outdegree == null) {
+ // We allocate and compute the outdegree vector.
+ outdegree = new int[n];
+ // TODO: refactor using .outdegrees().
+ progressLogger.expectedUpdates = n;
+ progressLogger.start("Computing outdegrees...");
+
+ final NodeIterator nodeIterator = graph.nodeIterator();
+ for(int i = n; i-- != 0;) {
+ nodeIterator.nextInt();
+ final int[] pred = nodeIterator.successorArray();
+ for (int d = nodeIterator.outdegree(); d-- != 0;) outdegree[pred[d]]++;
+ progressLogger.lightUpdate();
+ }
+
+ progressLogger.done();
+ }
+
+ danglingRankAccumulator.setValue(0);
+ completed = false;
+ logger.info("Completed.");
+ iterationLogger.start();
+ }
+
+ private final class IterationThread extends Thread {
+ private static final int GRANULARITY = 10000;
+
+ public void run() {
+ try {
+ // We cache frequently used fields.
+ final ImmutableGraph graph = PageRankParallelPowerSeries.this.graph.copy();
+ final BitSet buckets = PageRankParallelPowerSeries.this.buckets;
+ final int[] outdegree = PageRankParallelPowerSeries.this.outdegree;
+ final int n = PageRankParallelPowerSeries.this.n;
+ final double alpha = PageRankParallelPowerSeries.this.alpha;
+ final DoubleList preference = PageRankParallelPowerSeries.this.preference;
+ final KahanSummation s = new KahanSummation();
+
+ for(;;) {
+ barrier.await();
+ if (completed) return;
+ final double[] oldRank = rank, newRank = previousRank;
+
+ for(;;) {
+ // Try to get another piece of work.
+ final long start = nextNode.getAndAdd(GRANULARITY);
+ if (start >= n) {
+ nextNode.getAndAdd(-GRANULARITY);
+ break;
+ }
+
+ final int end = (int)(Math.min(n, start + GRANULARITY));
+
+ // for each node, enumerate predecessors and compute an updated value
+ double accum = 0.0;
+ final NodeIterator nodeIterator = graph.nodeIterator((int)start);
+
+ for(int i = (int)start; i < end; i++) {
+ nodeIterator.nextInt();
+ if (outdegree[i] == 0 || buckets != null && buckets.get(i)) accum += oldRank[i];
+ int indegree = nodeIterator.outdegree();
+ s.reset();
+ if (indegree != 0) {
+ final int[] pred = nodeIterator.successorArray();
+ if (buckets == null) while (indegree-- != 0) s.add(oldRank[pred[indegree]] / outdegree[pred[indegree]]);
+ else while (indegree-- != 0) if (! buckets.get(pred[indegree])) s.add(oldRank[pred[indegree]] / outdegree[pred[indegree]]);
+ }
+
+ if (preference != null) newRank[i] = alpha * s.value() + (1 - alpha) * preference.getDouble(i);
+ else newRank[i] = alpha * s.value() + (1 - alpha) / n;
+ }
+
+ synchronized (progressLogger) {
+ progressLogger.update(end - start);
+ }
+
+ synchronized (danglingRankAccumulator) {
+ danglingRankAccumulator.add(accum);
+ }
+ }
+ }
+ }
+ catch(Throwable t) {
+ threadThrowable = t;
+ }
+ }
+ }
+
+ @Override
+ public void step() throws IOException {
+ throw new UnsupportedOperationException();
+ }
+
+ @Override
+ public void stepUntil(final StoppingCriterion stoppingCriterion) throws IOException {
+ init();
+ final IterationThread[] thread = new IterationThread[numberOfThreads];
+ for(int i = thread.length; i-- != 0;) thread[i] = new IterationThread();
+
+ barrier = new CyclicBarrier(numberOfThreads, new Runnable() {
+ @Override
+ public void run() {
+ if (iteration > 0) {
+ progressLogger.done();
+
+ final double t[] = rank;
+ rank = previousRank;
+ previousRank = t;
+
+ final double adjustment = danglingNodeDistribution == null ? alpha * danglingRankAccumulator.doubleValue() / n : alpha * danglingRankAccumulator.doubleValue();
+ if (preference != null)
+ if (danglingNodeDistribution == null)
+ for(int i = n; i-- != 0;) rank[i] += adjustment;
+ else
+ for(int i = n; i-- != 0;) rank[i] += adjustment * danglingNodeDistribution.getDouble(i);
+ else
+ if (danglingNodeDistribution == null)
+ for(int i = n; i-- != 0;) rank[i] += adjustment;
+ else
+ for(int i = n; i-- != 0;) rank[i] += adjustment * danglingNodeDistribution.getDouble(i);
+
+ iterationLogger.setAndDisplay(iteration);
+
+ if (stoppingCriterion.shouldStop(PageRankParallelPowerSeries.this)) {
+ completed = true;
+ return;
+ }
+ }
+
+ danglingRankAccumulator.setValue(0);
+ nextNode.set(0);
+ progressLogger.expectedUpdates = n;
+ progressLogger.start("Iteration " + iteration++ + "...");
+ }
+ }
+ );
+
+ for(int i = thread.length; i-- != 0;) thread[i].start();
+ for(int i = thread.length; i-- != 0;)
+ try {
+ thread[i].join();
+ }
+ catch (InterruptedException e) {
+ throw new RuntimeException(e);
+ }
+
+ if (threadThrowable != null) throw new RuntimeException(threadThrowable);
+ if (progressLogger != null) progressLogger.done();
+
+ iterationLogger.done();
+ }
+
+
+ @Override
+ public double normDelta() {
+ return Norm.L_1.compute(rank, previousRank) * alpha / (1 - alpha);
+ }
+
+ @Override
+ public void clear() {
+ super.clear();
+ previousRank = null;
+ outdegree = null;
+ }
+
+ public static void main(final String[] arg) throws IOException, JSAPException, ConfigurationException, ClassNotFoundException {
+
+ SimpleJSAP jsap = new SimpleJSAP(PageRankParallelPowerSeries.class.getName(), "Computes PageRank of a graph, given its transpose, using a parallel implementation of the power-series method."
+ + " The file <rankBasename>.properties stores metadata about the computation, whereas the file <rankBasename>.ranks stores the result as a sequence of doubles in DataInput format.",
+ new Parameter[] {
+ new Switch("expand", 'e', "expand", "Expand the graph to increase speed (no compression)."),
+ new FlaggedOption("alpha", JSAP.DOUBLE_PARSER, Double.toString(PageRank.DEFAULT_ALPHA), JSAP.NOT_REQUIRED, 'a', "alpha", "Damping factor."),
+ new FlaggedOption("maxIter", JSAP.INTEGER_PARSER, Integer.toString(DEFAULT_MAX_ITER), JSAP.NOT_REQUIRED, 'i', "max-iter", "Maximum number of iterations."),
+ new FlaggedOption("threshold", JSAP.DOUBLE_PARSER, Double.toString(DEFAULT_THRESHOLD), JSAP.NOT_REQUIRED, 't', "threshold", "Threshold to determine whether to stop."),
+ new FlaggedOption("preferenceVector", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'p', "preference-vector", "A preference vector stored as a vector of binary doubles."),
+ new FlaggedOption("preferenceObject", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'P', "preference-object", "A preference vector stored as a serialised DoubleList."),
+ new FlaggedOption("buckets", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'b', "buckets", "The buckets of the graph; if supplied, buckets will be treated as dangling nodes."),
+ new Switch("mapped", 'm', "mapped", "Use loadMapped() to load the graph."),
+ new Switch("strongly", 'S', "strongly", "Use the preference vector to redistribute the dangling rank."),
+ new FlaggedOption("threads", JSAP.INTSIZE_PARSER, "0", JSAP.NOT_REQUIRED, 'T', "threads", "The number of threads to be used. If 0, the number will be estimated automatically."),
+ new UnflaggedOption("transposeBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the transpose of the graph."),
+ new UnflaggedOption("rankBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the files where the resulting rank (doubles in binary form) and the computation metadata are stored.")
+ }
+ );
+
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final boolean mapped = jsapResult.getBoolean("mapped", false);
+ final boolean strongly = jsapResult.getBoolean("strongly", false);
+ final String graphBasename = jsapResult.getString("transposeBasename");
+ final String rankBasename = jsapResult.getString("rankBasename");
+ final String buckets = jsapResult.getString("buckets");
+ final int threads = jsapResult.getInt("threads");
+ final ProgressLogger progressLogger = new ProgressLogger(LOGGER, "nodes");
+
+ ImmutableGraph graph = mapped? ImmutableGraph.loadMapped(graphBasename, progressLogger) : ImmutableGraph.load(graphBasename, progressLogger);
+
+ DoubleList preference = null;
+ String preferenceFilename = null;
+ if (jsapResult.userSpecified("preferenceVector"))
+ preference = DoubleArrayList.wrap(BinIO.loadDoubles(preferenceFilename = jsapResult.getString("preferenceVector")));
+
+ if (jsapResult.userSpecified("preferenceObject")) {
+ if (jsapResult.userSpecified("preferenceVector")) throw new IllegalArgumentException("You cannot specify the preference vector twice");
+ preference = (DoubleList)BinIO.loadObject(preferenceFilename = jsapResult.getString("preferenceObject"));
+ }
+
+ if (strongly && preference == null) throw new IllegalArgumentException("The 'strongly' option requires a preference vector");
+
+ if (jsapResult.userSpecified("expand")) graph = new ArrayListMutableGraph(graph).immutableView();
+
+ final PageRankParallelPowerSeries pr = new PageRankParallelPowerSeries(graph, threads, LOGGER);
+ pr.alpha = jsapResult.getDouble("alpha");
+ pr.preference = preference;
+ pr.buckets = (BitSet)(buckets == null ? null : BinIO.loadObject(buckets));
+ pr.stronglyPreferential = strongly;
+
+ pr.stepUntil(or(new SpectralRanking.NormStoppingCriterion(jsapResult.getDouble("threshold")), new SpectralRanking.IterationNumberStoppingCriterion(jsapResult.getInt("maxIter"))));
+
+ BinIO.storeDoubles(pr.rank, rankBasename + ".ranks");
+ pr.buildProperties(graphBasename, preferenceFilename, null).save(rankBasename + ".properties");
+ }
+}
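The parallel power-series scheme in `PageRankParallelPowerSeries` uses three coordination mechanisms:

- Chunks of `GRANULARITY` nodes are handed out through an atomic counter.
- Each thread accumulates the rank lost through dangling nodes.
- A `CyclicBarrier` action swaps the two rank vectors, redistributes the dangling rank and evaluates the stopping criterion.

The stand-alone sketch below illustrates this scheme. It is not LAW code; the following are assumptions made for illustration:

- the class name `ParallelPowerSeriesSketch` and the method `pageRank`;
- representing the transpose as predecessor lists `pred`;
- a uniform preference vector with uniform dangling redistribution;
- no buckets, and plain summation instead of `KahanSummation`.

The stopping test mirrors `normDelta()`: it multiplies the &#x2113;<sub>1</sub> distance between consecutive iterates by &alpha; / (1 &minus; &alpha;).

```java
import java.util.Arrays;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch, not part of LAW: parallel power method for PageRank with a uniform
 *  preference vector and uniform redistribution of the rank of dangling nodes. */
public class ParallelPowerSeriesSketch {
    /** Nodes handed out per request (tiny, so that several threads share the work). */
    private static final int GRANULARITY = 2;

    /** @param pred pred[i] lists the predecessors of node i (i.e., the transpose graph). */
    public static double[] pageRank(final int[][] pred, final double alpha, final double threshold,
                                    final int maxIter, final int threads) {
        final int n = pred.length;
        final int[] outdegree = new int[n];
        for (final int[] p : pred) for (final int j : p) outdegree[j]++;

        final double[][] vec = { new double[n], new double[n] }; // vec[0]: current rank, vec[1]: next rank
        Arrays.fill(vec[0], 1.0 / n); // start from the (uniform) preference vector
        final AtomicInteger nextNode = new AtomicInteger();
        final double[] dangling = new double[1]; // rank found on dangling nodes in this iteration
        final int[] iteration = { 0 };
        final boolean[] completed = { false };

        // The barrier action runs in the last thread to arrive, before any thread is released,
        // so the plain fields it writes are visible to every worker once await() returns.
        final CyclicBarrier barrier = new CyclicBarrier(threads, () -> {
            if (iteration[0] > 0) {
                final double[] t = vec[0]; vec[0] = vec[1]; vec[1] = t; // vec[0] now holds the new rank
                final double adjustment = alpha * dangling[0] / n;
                double delta = 0;
                for (int i = 0; i < n; i++) {
                    vec[0][i] += adjustment; // redistribute the dangling rank uniformly
                    delta += Math.abs(vec[0][i] - vec[1][i]);
                }
                if (delta * alpha / (1 - alpha) < threshold || iteration[0] >= maxIter) {
                    completed[0] = true;
                    return;
                }
            }
            dangling[0] = 0;
            nextNode.set(0);
            iteration[0]++;
        });

        final Runnable worker = () -> {
            try {
                for (;;) {
                    barrier.await();
                    if (completed[0]) return;
                    final double[] oldRank = vec[0], newRank = vec[1];
                    double accum = 0;
                    for (;;) {
                        final int start = nextNode.getAndAdd(GRANULARITY); // grab the next chunk
                        if (start >= n) break;
                        final int end = Math.min(n, start + GRANULARITY);
                        for (int i = start; i < end; i++) {
                            if (outdegree[i] == 0) accum += oldRank[i];
                            double s = 0;
                            for (final int j : pred[i]) s += oldRank[j] / outdegree[j];
                            newRank[i] = alpha * s + (1 - alpha) / n;
                        }
                    }
                    synchronized (dangling) { dangling[0] += accum; }
                }
            } catch (InterruptedException | BrokenBarrierException e) {
                throw new RuntimeException(e);
            }
        };

        final Thread[] thread = new Thread[threads];
        for (int i = 0; i < threads; i++) (thread[i] = new Thread(worker)).start();
        try {
            for (final Thread t : thread) t.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return vec[0];
    }
}
```

For example, consider the graph with edges 0&rarr;1, 0&rarr;2, 1&rarr;2, 2&rarr;0 and 2&rarr;3, where node 3 is dangling. Calling `pageRank(new int[][] { {2}, {0}, {0, 1}, {2} }, 0.85, 1e-12, 1000, 3)` should return a vector that sums to 1. That vector should satisfy the fixed-point formula of `PageRankPowerSeries` with &kappa; equal to the rank of node 3.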
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/PageRankPowerSeries.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/PageRankPowerSeries.java
new file mode 100644
index 0000000..42ca878
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/PageRankPowerSeries.java
@@ -0,0 +1,351 @@
+package it.unimi.dsi.law.rank;
+
+import java.io.DataOutputStream;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.BitSet;
+
+import org.apache.commons.configuration.ConfigurationException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.doubles.DoubleArrayList;
+import it.unimi.dsi.fastutil.doubles.DoubleList;
+import it.unimi.dsi.fastutil.ints.IntArrayList;
+import it.unimi.dsi.fastutil.ints.IntArrays;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.io.FastBufferedOutputStream;
+import it.unimi.dsi.law.util.Norm;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.NodeIterator;
+
+// RELEASE-STATUS: DIST
+
+/** Computes PageRank (and possibly its derivatives in the damping factor) using its power series.
+ *
+ * <P>PageRank has a power series expansion in &alpha; (see {@link PageRank} for definitions):
+ *
+ * <div style="text-align: center; margin: 1em">
+ * <var><b>r</b></var> = <var><b>v</b></var> +
+ * <var><b>v</b></var><big>&Sigma;</big><sub><var>n</var>&ge;<var>1</var></sub>&alpha;<sup><var>n</var></sup><big>(</big> (<var>P</var> + <var><b>d</b></var><sup><var>T</var></sup><var><b>u</b></var>)<sup><var>n</var></sup> &minus; (<var>P</var> + <var><b>d</b></var><sup><var>T</var></sup><var><b>u</b></var>)<sup><var>n&minus;1</var></sup> <big>)</big>
+ * </div>
+ *
+ * <p>In &ldquo;PageRank: Functional dependencies&rdquo;, by Paolo Boldi, Massimo Santini, and Sebastiano Vigna,
+ * <i>ACM Trans. Inf. Syst.</i>, 27(4):1&ndash;23, 2009, we show that the truncation of order <var>k</var> of the power series
+ * is exactly the value attained by the power method using <var><b>v</b></var> as starting vector. This class exploits this equivalence to iteratively compute the Maclaurin polynomials
+ * of order <var>k</var>:
+ *
+ * <div style="text-align: center">
+ * <var><b>r</b></var><sup>(<var>k</var>)</sup> =
+ * <var><b>v</b></var> <big>(</big> &alpha; <var>P</var> + &alpha; <var><b>d</b><sup>T</sup></var><var><b>u</b></var> + (1&minus;&alpha;)<b>1</b><sup><var>T</var></sup> <var><b>v</b></var> <big>)</big><sup><var>k</var></sup> =
+ * <var><b>v</b></var> +
+ * <var><b>v</b></var><big>&Sigma;</big><sub>1&le;<var>n</var>&le;<var>k</var></sub> &alpha;<sup><var>n</var></sup><big>(</big> (<var>P</var> + <var><b>d</b></var><sup><var>T</var></sup><var><b>u</b></var>)<sup><var>n</var></sup> &minus; (<var>P</var> + <var><b>d</b></var><sup><var>T</var></sup><var><b>u</b></var>)<sup><var>n&minus;1</var></sup> <big>)</big>.
+ * </div>
+ *
+ * <p>The remainder (i.e., the difference from PageRank) at the <var>k</var>-th iteration can be expressed exactly
+ * in terms of the difference <var><b>r</b></var><sup>(<var>k</var>)</sup>&nbsp;&minus;&nbsp;<var><b>r</b></var><sup>(<var>k</var>&minus;1)</sup>, as
+ * <div style="text-align: center; margin: 1em">
+ * <var><b>r</b></var> &minus; <var><b>r</b></var><sup>(<var>k</var>)</sup>
+ * = <var><b>v</b></var><big>&Sigma;</big><sub><var>n</var>&ge;<var>k</var>+1</sub> &alpha;<sup><var>n</var></sup><big>(</big> (<var>P</var> + <var><b>d</b></var><sup><var>T</var></sup><var><b>u</b></var>)<sup><var>n</var></sup> &minus; (<var>P</var> + <var><b>d</b></var><sup><var>T</var></sup><var><b>u</b></var>)<sup><var>n&minus;1</var></sup> <big>)</big>
+ * = <var><b>v</b></var> &alpha;<sup><var>k</var></sup><big>(</big> (<var>P</var> + <var><b>d</b></var><sup><var>T</var></sup><var><b>u</b></var>)<sup><var>k</var></sup>
+ * &minus; (<var>P</var> + <var><b>d</b></var><sup><var>T</var></sup><var><b>u</b></var>)<sup><var>k&minus;1</var></sup> <big>)</big> <big>&Sigma;</big><sub><var>n</var>&ge;1</sub>&alpha;<sup><var>n</var></sup>(<var>P</var> + <var><b>d</b></var><sup><var>T</var></sup><var><b>u</b></var>)<sup><var>n</var></sup>
+ * = (<var><b>r</b></var><sup>(<var>k</var>)</sup>&nbsp;&minus;&nbsp;<var><b>r</b></var><sup>(<var>k</var>&minus;1)</sup>) <big>&Sigma;</big><sub><var>n</var>&ge;1</sub>&alpha;<sup><var>n</var></sup>(<var>P</var> + <var><b>d</b></var><sup><var>T</var></sup><var><b>u</b></var>)<sup><var>n</var></sup>.
+ * </div>
+ *
+ * <p>Hence,
+ * <div style="text-align: center; margin: 1em">
+ * &#x2016;<var><b>r</b></var> &minus; <var><b>r</b></var><sup>(<var>k</var>)</sup>&#x2016;<sub>1</sub> &le;
+ *
+ * &#x2016;<var><b>r</b></var><sup>(<var>k</var>)</sup>&nbsp;&minus;&nbsp;<var><b>r</b></var><sup>(<var>k</var>&minus;1)</sup>&#x2016;<sub>1</sub> &#x2016;<big>&Sigma;</big><sub><var>n</var>&ge;1</sub>&alpha;<sup><var>n</var></sup>(<var>P</var> + <var><b>d</b></var><sup><var>T</var></sup><var><b>u</b></var>)<sup><var>n</var></sup>&#x2016;<sub>1</sub>
+ * &le; &#x2016;<var><b>r</b><sup>(k)</sup></var>&nbsp;&minus;&nbsp;<var><b>r</b></var><sup>(<var>k</var>&minus;1)</sup>&#x2016;<sub>1</sub> &alpha; / (1 &minus; &alpha;),
+ * </div>
+ * and this is the value returned by {@link #normDelta()}.
+ *
+ * <p>It is worth remarking that a folklore justification for convergence, that is, the spectral gap between the dominant eigenvalue and the second eigenvalue, does <em>not</em> apply
+ * directly to PageRank, as it is applicable only to normal (i.e., diagonalizable) matrices. The inequality above, instead, makes it possible to bound precisely the
+ * error of the current estimate.
+ *
+ * <p>Note that
+ * <div style="text-align: center; margin: 1em">
+ * <var><b>r</b></var><sup>(<var>t</var>+1)</sup> =
+ * <var><b>r</b></var><sup>(<var>t</var>)</sup> (&alpha; <var>P</var> + &alpha; <var><b>d</b><sup>T</sup></var><var><b>u</b></var> + (1&minus;&alpha;)<b>1</b><sup><var>T</var></sup> <var><b>v</b></var>) =
+ * &alpha;<var><b>r</b></var><sup>(<var>t</var>)</sup> <var>P</var> + &alpha; <var><b>r</b></var><sup>(<var>t</var>)</sup><var><b>d</b></var> <var><b>u</b></var> + (1&minus;&alpha;) <var><b>v</b></var>.
+ * </div>
+ *
+ * <p>The latter formula means that
+ * <div style="text-align: center; margin: 1em">
+ * <var>r<sub>i</sub></var><sup>(<var>t</var>+1)</sup>=
+ * &alpha;<big>&Sigma;</big><sub><var>j</var> &rarr; <var>i</var></sub> <var>p<sub>ji</sub></var><var>r<sub>j</sub></var><sup>(<var>t</var>)</sup>
+ * + &alpha;&kappa;<var>u</var><sub><var>i</var></sub>
+ * + (1&minus;&alpha;)<var>v</var><sub><var>i</var></sub>,
+ * </div>
+ * where &kappa; is the sum of <var>r<sub>j</sub></var><sup>(<var>t</var>)</sup>
+ * over all dangling nodes. This is the formula used in the code.
+ *
+ * <P>The attribute {@link #previousRank} represents the ranking at the previous step.
+ *
+ * <h2>Derivatives and coefficients of the Maclaurin polynomials</h2>
+ *
+ * <p>Using results from &ldquo;PageRank: Functional dependencies&rdquo;, by Paolo Boldi, Massimo Santini, and Sebastiano Vigna,
+ * <i>ACM Trans. Inf. Syst.</i>, 27(4):1&ndash;23, 2009, this class is also able to approximate the derivatives
+ * of PageRank in {@linkplain #alpha &alpha;}, and to compute, for each node, the
+ * coefficients of the Maclaurin polynomials. You have to set a non-empty {@link #order} array specifying
+ * the order of the derivatives desired, or a {@linkplain #coeffBasename basename for the coefficients},
+ * respectively. The derivatives will be evaluated (as PageRank is) at the value set for &alpha;.
+ *
+ * @see PageRank
+ * @see SpectralRanking
+ */
+
+public class PageRankPowerSeries extends PageRank {
+ private final static Logger LOGGER = LoggerFactory.getLogger(PageRankPowerSeries.class);
+
+ /** The rank vector at the previous iteration (only meaningful after at least one step). */
+ public double[] previousRank;
+ /** A progress logger monitoring each iteration. */
+ private final ProgressLogger progressLogger;
+ /** A progress logger monitoring the iterations. */
+ private final ProgressLogger iterationLogger;
+ /** If not {@code null}, the subset of nodes over which the derivatives should be computed. */
+ public int[] subset;
+ /** The value of derivatives (only for the subset of nodes specified in {@link #subset}, if not {@code null}). */
+ public double[][] derivative;
+ /** The order of the derivatives. Must be non-{@code null}, but it can be the empty array. */
+ public int[] order = IntArrays.EMPTY_ARRAY;
+ /** If not {@code null}, the basename for coefficients. */
+ public String coeffBasename;
+
+ /** Creates a new instance.
+ *
+ * @param graph the graph.
+ * @param logger a logger that will be passed to <code>super()</code>.
+ */
+ public PageRankPowerSeries(final ImmutableGraph graph, final Logger logger) {
+ super(graph, logger);
+ progressLogger = new ProgressLogger(logger, "nodes");
+ iterationLogger = new ProgressLogger(logger, "iterations");
+ }
+
+ /** Creates a new instance.
+ *
+ * @param graph the graph.
+ */
+ public PageRankPowerSeries(final ImmutableGraph graph) {
+ this(graph, LOGGER);
+ }
+
+ @Override
+ public void init() throws IOException {
+ super.init();
+ // Creates the arrays, if necessary
+ if (previousRank == null) previousRank = new double[n];
+ derivative = new double[order.length][subset != null ? subset.length : n];
+ if (IntArrayList.wrap(order).indexOf(0) != -1) throw new IllegalArgumentException("You cannot compute the derivative of order 0 (use PageRank instead)");
+ if (coeffBasename != null) BinIO.storeDoubles(rank, coeffBasename + "-" + 0);
+
+ logger.info("Completed.");
+ iterationLogger.start();
+ }
+
+ @Override
+ public void step() throws IOException {
+ final double[] oldRank = rank, newRank = previousRank;
+ Arrays.fill(newRank, 0);
+
+ // for each node, calculate its outdegree and redistribute its rank among pointed nodes
+ double accum = 0.0;
+
+ progressLogger.expectedUpdates = n;
+ progressLogger.start("Iteration " + iteration++ + "...");
+
+ final NodeIterator nodeIterator = graph.nodeIterator();
+ int outdegree;
+ int[] succ;
+
+ for(int i = 0; i < n; i++) {
+ nodeIterator.nextInt();
+ outdegree = nodeIterator.outdegree();
+
+ if (outdegree == 0 || buckets != null && buckets.get(i)) accum += oldRank[i];
+ else {
+ int j = outdegree;
+ succ = nodeIterator.successorArray();
+ while (j-- != 0) newRank[succ[j]] += oldRank[i] / outdegree;
+ }
+ progressLogger.update();
+ }
+ progressLogger.done();
+
+ final double accumOverNumNodes = accum / n;
+
+ final double oneOverNumNodes = 1.0 / n;
+ if (preference != null)
+ if (danglingNodeDistribution == null)
+ for(int i = n; i-- != 0;) newRank[i] = alpha * newRank[i] + (1 - alpha) * preference.getDouble(i) + alpha * accumOverNumNodes;
+ else
+ for(int i = n; i-- != 0;) newRank[i] = alpha * newRank[i] + (1 - alpha) * preference.getDouble(i) + alpha * accum * danglingNodeDistribution.getDouble(i);
+ else
+ if (danglingNodeDistribution == null)
+ for(int i = n; i-- != 0;) newRank[i] = alpha * newRank[i] + (1 - alpha) * oneOverNumNodes + alpha * accumOverNumNodes;
+ else
+ for(int i = n; i-- != 0;) newRank[i] = alpha * newRank[i] + (1 - alpha) * oneOverNumNodes + alpha * accum * danglingNodeDistribution.getDouble(i);
+
+ //make the rank just computed the new rank
+ rank = newRank;
+ previousRank = oldRank;
+
+ // Compute derivatives.
+ if (subset == null) {
+ for(int i = 0; i < order.length; i++) {
+ final int k = order[i];
+ final double alphak = Math.pow(alpha, k);
+ final double nFallingK = it.unimi.dsi.law.Util.falling(iteration, k);
+ for(int j = 0; j < n; j++) derivative[i][j] += nFallingK * (rank[j] - previousRank[j]) / alphak;
+ }
+ }
+ else {
+ for(int i = 0; i < order.length; i++) {
+ final int k = order[i];
+ final double alphak = Math.pow(alpha, k);
+ final double nFallingK = it.unimi.dsi.law.Util.falling(iteration, k);
+
+ for(int j = 0; j < subset.length; j++) derivative[i][j] += nFallingK * (rank[subset[j]] - previousRank[subset[j]]) / alphak;
+ }
+ }
+
+ // Compute coefficients, if required.
+
+ if (coeffBasename != null) {
+ final DataOutputStream coefficients = new DataOutputStream(new FastBufferedOutputStream(new FileOutputStream(coeffBasename + "-" + (iteration))));
+ final double alphaN = Math.pow(alpha, iteration);
+ for(int i = 0; i < n; i++) coefficients.writeDouble((rank[i] - previousRank[i]) / alphaN);
+ coefficients.close();
+ }
+
+ iterationLogger.setAndDisplay(iteration);
+ }
+
+ @Override
+ public void stepUntil(final StoppingCriterion stoppingCriterion) throws IOException {
+ super.stepUntil(stoppingCriterion);
+
+ for(int i = 0; i < order.length; i++) {
+ if (iteration < order[i] / (1 - alpha)) LOGGER.info("Error bound for derivative of order " + order[i] + " (alpha=" + alpha + "): unknown");
+ else {
+ final int k = order[i];
+ final double delta = alpha * iteration / (iteration + k);
+ final double alphak = Math.pow(alpha, k);
+ final double nFallingK = it.unimi.dsi.law.Util.falling(iteration, k);
+ double infinityNorm = 0;
+ for(int j = 0; j < n; j++) infinityNorm = Math.max(infinityNorm, nFallingK * (rank[j] - previousRank[j]) / alphak);
+
+ LOGGER.info("Error bound for derivative of order " + k + " (alpha=" + alpha + "): " + infinityNorm * delta / (1 - delta));
+ }
+ }
+
+ iterationLogger.done();
+ }
+
+ @Override
+ public double normDelta() {
+ return Norm.L_1.compute(rank, previousRank) * alpha / (1 - alpha);
+ }
+
+ @Override
+ public void clear() {
+ super.clear();
+ previousRank = null;
+ derivative = null;
+ }
+
+ public static void main(final String[] arg) throws IOException, JSAPException, ConfigurationException, ClassNotFoundException {
+
+ final SimpleJSAP jsap = new SimpleJSAP(PageRankPowerSeries.class.getName(), "Computes PageRank of a graph using the power-series method. Additionally, computes derivatives and coefficients of Maclaurin polynomials."
+ + " The file <rankBasename>.properties stores metadata about the computation, whereas the file <rankBasename>.ranks stores the result as a sequence of doubles in DataInput format.",
+ new Parameter[] {
+ new FlaggedOption("alpha", JSAP.DOUBLE_PARSER, Double.toString(PageRank.DEFAULT_ALPHA), JSAP.NOT_REQUIRED, 'a', "alpha", "Damping factor."),
+ new FlaggedOption("maxIter", JSAP.INTEGER_PARSER, Integer.toString(DEFAULT_MAX_ITER), JSAP.NOT_REQUIRED, 'i', "max-iter", "Maximum number of iterations."),
+ new FlaggedOption("threshold", JSAP.DOUBLE_PARSER, Double.toString(DEFAULT_THRESHOLD), JSAP.NOT_REQUIRED, 't', "threshold", "Threshold to determine whether to stop."),
+ new FlaggedOption("coeff", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'c', "coeff", "Save the k-th coefficient of the Maclaurin polynomial using this basename."),
+ new FlaggedOption("derivative", JSAP.INTEGER_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'd', "derivative", "The order(s) of the derivative(s) to be computed (>0).").setAllowMultipleDeclarations(true),
+ new FlaggedOption("preferenceVector", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'p', "preference-vector", "A preference vector stored as a vector of binary doubles."),
+ new FlaggedOption("preferenceObject", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'P', "preference-object", "A preference vector stored as a serialised DoubleList."),
+ new FlaggedOption("buckets", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'b', "buckets", "The buckets of the graph; if supplied, buckets will be treated as dangling nodes."),
+ new Switch("offline", 'o', "offline", "No-op for compatibility."),
+ new Switch("strongly", 'S', "strongly", "Use the preference vector to redistribute the dangling rank."),
+ new UnflaggedOption("graphBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the graph."),
+ new UnflaggedOption("rankBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename where the resulting ranks (doubles in binary form) are stored.")
+ }
+ );
+
+ final JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final boolean strongly = jsapResult.getBoolean("strongly", false);
+ final int[] order = jsapResult.getIntArray("derivative");
+ final String graphBasename = jsapResult.getString("graphBasename");
+ final String rankBasename = jsapResult.getString("rankBasename");
+ final String buckets = jsapResult.getString("buckets");
+ final String coeffBasename = jsapResult.getString("coeff");
+ final ProgressLogger progressLogger = new ProgressLogger(LOGGER, "nodes");
+
+ final ImmutableGraph graph = ImmutableGraph.loadOffline(graphBasename, progressLogger);
+
+ DoubleList preference = null;
+ String preferenceFilename = null;
+ if (jsapResult.userSpecified("preferenceVector"))
+ preference = DoubleArrayList.wrap(BinIO.loadDoubles(preferenceFilename = jsapResult.getString("preferenceVector")));
+
+ if (jsapResult.userSpecified("preferenceObject")) {
+ if (jsapResult.userSpecified("preferenceVector")) throw new IllegalArgumentException("You cannot specify the preference vector twice");
+ preference = (DoubleList)BinIO.loadObject(preferenceFilename = jsapResult.getString("preferenceObject"));
+ }
+
+ if (strongly && preference == null) throw new IllegalArgumentException("The 'strongly' option requires a preference vector");
+
+ final PageRankPowerSeries pr = new PageRankPowerSeries(graph);
+ pr.alpha = jsapResult.getDouble("alpha");
+ pr.preference = preference;
+ pr.buckets = (BitSet)(buckets == null ? null : BinIO.loadObject(buckets));
+ pr.stronglyPreferential = strongly;
+ pr.order = order != null ? order : IntArrays.EMPTY_ARRAY;
+ pr.coeffBasename = coeffBasename;
+
+ // cycle until we reach maxIter iterations or the norm is less than the given threshold (whichever comes first)
+ pr.stepUntil(or(new SpectralRanking.NormStoppingCriterion(jsapResult.getDouble("threshold")), new SpectralRanking.IterationNumberStoppingCriterion(jsapResult.getInt("maxIter"))));
+
+ BinIO.storeDoubles(pr.rank, rankBasename + ".ranks");
+ pr.buildProperties(graphBasename, preferenceFilename, null).save(rankBasename + ".properties");
+
+ if (order != null) for(int i = 0; i < order.length; i++) BinIO.storeDoubles(pr.derivative[i], rankBasename + ".der-" + order[i]);
+ }
+}
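The derivative and coefficient updates in `PageRankPowerSeries.step()` treat the power-method iterates as partial sums of a power series in &alpha;. At iteration n, the coefficient written to disk is c_n = (rank &minus; previousRank) / &alpha;^n. Differentiating the term c_n &alpha;^n k times gives n(n-1)...(n-k+1) c_n &alpha;^(n-k), which equals the falling factorial times (rank &minus; previousRank) / &alpha;^k, i.e., exactly what the code adds to `derivative`. The minimal sketch below checks that accumulation rule on a scalar series with known derivatives, 1/(1 &minus; &alpha;) = &Sigma; &alpha;^n. The class `MaclaurinDerivativeSketch` and its `falling` helper are hypothetical and not part of LAW. `falling` is assumed to behave like `it.unimi.dsi.law.Util.falling`, that is, as a falling factorial.

```java
/**
 * Hypothetical sketch (not part of LAW): accumulates k-th derivatives of a
 * power series from consecutive partial sums, mirroring the update rule used
 * in PageRankPowerSeries.step().
 */
final class MaclaurinDerivativeSketch {
    /** Falling factorial n(n-1)...(n-k+1); it is 0 whenever k > n. */
    static double falling(final int n, final int k) {
        double result = 1;
        for (int i = 0; i < k; i++) result *= n - i;
        return result;
    }

    /**
     * Returns the sum over n >= 1 of falling(n, k) * (s[n] - s[n - 1]) / alpha^k,
     * where s[n] is the n-th partial sum of the series evaluated at alpha,
     * so that s[n] - s[n - 1] = c_n * alpha^n.
     */
    static double derivative(final double[] partialSums, final double alpha, final int k) {
        final double alphaK = Math.pow(alpha, k);
        double result = 0;
        for (int n = 1; n < partialSums.length; n++)
            result += falling(n, k) * (partialSums[n] - partialSums[n - 1]) / alphaK;
        return result;
    }

    public static void main(final String[] args) {
        // Partial sums of 1 / (1 - alpha) = sum_n alpha^n: every coefficient c_n is 1.
        final double alpha = 0.5;
        final double[] partialSums = new double[201];
        partialSums[0] = 1;
        for (int n = 1; n < partialSums.length; n++)
            partialSums[n] = partialSums[n - 1] + Math.pow(alpha, n);
        System.out.println(derivative(partialSums, alpha, 1)); // approximately 1 / (1 - alpha)^2 = 4
        System.out.println(derivative(partialSums, alpha, 2)); // approximately 2 / (1 - alpha)^3 = 16
    }
}
```

The truncation error shrinks as more partial sums are supplied. This is why the derivatives produced by the class are only as accurate as the number of iterations performed.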
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/PageRankPush.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/PageRankPush.java
new file mode 100644
index 0000000..30e6578
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/PageRankPush.java
@@ -0,0 +1,430 @@
+package it.unimi.dsi.law.rank;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.NoSuchElementException;
+
+import org.apache.commons.configuration.ConfigurationException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/*
+ * Copyright (C) 2010-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+import it.unimi.dsi.fastutil.booleans.BooleanArrays;
+import it.unimi.dsi.fastutil.doubles.DoubleArrays;
+import it.unimi.dsi.fastutil.ints.Int2IntOpenHashMap;
+import it.unimi.dsi.fastutil.ints.IntArrayFIFOQueue;
+import it.unimi.dsi.fastutil.ints.IntArrays;
+import it.unimi.dsi.fastutil.ints.IntPriorityQueue;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.util.Properties;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.LazyIntIterator;
+
+// RELEASE-STATUS: DIST
+
+/** Computes strongly preferential PageRank for a preference vector concentrated on a node using the push algorithm.
+ *
+ * <p>The <em>push algorithm</em> is an incremental way of computing PageRank that is particularly useful when the preference
+ * vector is nonzero on a single node, the root (i.e., the entire probability mass of the preference vector is in a single node).
+ * It was first proposed by Glen Jeh and Jennifer Widom in &ldquo;Scaling personalized web search&rdquo;, <i>Proc. of the
+ * Twelfth International World Wide Web Conference</i>, pages 271&ndash;279, ACM Press, 2003.
+ *
+ * <p>This implementation, in particular, computes strongly preferential PageRank for a single root node (i.e., dangling nodes donate all their rank to the root).
+ * Since the set of nodes involved in the computation is often a small fraction of the graph, we represent such nodes implicitly using their discovery order. As a result,
+ * the {@link PageRank#rank} vector does <em>not</em> contain the ranks, which can be recovered using the following code instead:
+ * <pre>
+ * double[] rank = new double[graph.numNodes()];
+ * for(int i = pageRank.node2Seen.size(); i-- != 0;) rank[pageRank.seen2Node[i]] = pageRank.rank[i] / pageRank.pNorm;
+ * </pre>
+ *
+ * <p>In case you are interested in the <em>pseudorank</em>, instead, you should use
+ * <pre>
+ * double[] rank = new double[graph.numNodes()];
+ * for(int i = pageRank.node2Seen.size(); i-- != 0;) rank[pageRank.seen2Node[i]] = pageRank.rank[i] / (1 - pageRank.backToRoot);
+ * </pre>
+ *
+ * <p>Details on the push algorithm have been given by Paolo Boldi and Sebastiano Vigna in
+ * &ldquo;<a href="http://vigna.di.unimi.it/papers.php#BoVPASR">The Push Algorithm for Spectral Ranking</a>&rdquo;.
+ * We implement both a priority-based update rule and a simple FIFO update rule. Moreover, we implement <em>loop elimination</em> at the root, as described
+ * by Pavel Berkhin in &ldquo;Bookmark-coloring approach to personalized PageRank computing&rdquo;, <i>Internet Math.</i>, 3(1):41&ndash;62, 2006.
+ *
+ * @author Sebastiano Vigna
+ */
+public class PageRankPush extends PageRank {
+ private static final int INITIAL_SIZE = 16;
+
+ private final static Logger LOGGER = LoggerFactory.getLogger(PageRankPush.class);
+
+ /** A progress logger. */
+ public final ProgressLogger progressLogger;
+ /** The node where the preference vector is concentrated. */
+ public int root;
+ /** The threshold for stopping. */
+ public double threshold = DEFAULT_THRESHOLD;
+ /** The vector <var>r</var> (the r&ocirc;le of <var>p</var> is played by {@link #rank}). */
+ public double[] residual;
+ /** Whether we should use a {@link #fifoQueue} instead of an {@link #indirectQueue}. */
+ private final boolean fifo;
+ /** The update FIFO queue. You can choose at construction time whether to use this queue or {@link #indirectQueue}. */
+ private IntPriorityQueue fifoQueue;
+ /** The update priority queue. You can choose at construction time whether to use this queue or {@link #fifoQueue}. */
+ private IntHeapIndirectPriorityQueue indirectQueue;
+ /** The norm of the {@link #residual}. */
+ private double rNorm;
+ /** The stopping threshold divided by the number of nodes of the graph. */
+ private double thresholdByNumNodes;
+ /** Represents implicitly the set of elements in {@link #fifoQueue}, if the latter is used. */
+ private boolean[] inQueue;
+
+ /** The norm of the {@link #rank}. */
+ public double pNorm;
+ /** A map from nodes to the seen-order. */
+ public Int2IntOpenHashMap node2Seen;
+ /** A map from seen-order to nodes. */
+ public int[] seen2Node;
+ /** The amount of ranking going back to the root. */
+ public double backToRoot;
+
+ public final static class IntHeapIndirectPriorityQueue {
+ /** The reference array. */
+ public double[] refArray;
+ /** The inversion set. */
+ private int[] inv;
+ /** The number of elements currently in the queue. */
+ private int size;
+ /** The heap. */
+ private int[] heap;
+
+ public IntHeapIndirectPriorityQueue() {
+ this.inv = new int[INITIAL_SIZE];
+ Arrays.fill(inv, -1);
+ heap = new int[INITIAL_SIZE];
+ }
+
+ public static int upHeap(final double[] refArray, final int[] heap, final int[] inv, int i) {
+ final int e = heap[i];
+ final double E = refArray[e];
+ int parent;
+ while (i != 0 && (parent = (i - 1) / 2) >= 0) {
+ if (refArray[heap[parent]] >= E) break;
+ heap[i] = heap[parent];
+ inv[heap[i]] = i;
+ i = parent;
+ }
+ heap[i] = e;
+ inv[e] = i;
+ return i;
+ }
+
+ public static int downHeap(final double[] refArray, final int[] heap, final int[] inv, final int size, int i) {
+ final int e = heap[i];
+ final double E = refArray[e];
+ int child;
+ while ((child = 2 * i + 1) < size) {
+ if (child + 1 < size && refArray[heap[child + 1]] > refArray[heap[child]]) child++;
+ if (E >= refArray[heap[child]]) break;
+ heap[i] = heap[child];
+ inv[heap[i]] = i;
+ i = child;
+ }
+ heap[i] = e;
+ inv[e] = i;
+ return i;
+ }
+
+ public void enqueue(final int x) {
+ if (contains(x)) throw new IllegalArgumentException("Index " + x + " already belongs to the queue");
+ if (size == heap.length) heap = IntArrays.grow(heap, size + 1);
+
+ if (x >= inv.length) {
+ final int l = inv.length;
+ inv = IntArrays.grow(inv, x + 1);
+ Arrays.fill(inv, l, inv.length, -1);
+ }
+
+ inv[heap[size] = x] = size++;
+
+ upHeap(refArray, heap, inv, size - 1);
+ }
+
+ public boolean contains(final int index) {
+ return index < inv.length && inv[index] >= 0;
+ }
+
+ public int dequeue() {
+ if (size == 0) throw new NoSuchElementException();
+ final int result = heap[0];
+ if (--size != 0) inv[heap[0] = heap[size]] = 0;
+ inv[result] = -1;
+
+ if (size != 0) downHeap(refArray, heap, inv, size, 0);
+ return result;
+ }
+
+ public void changed() {
+ downHeap(refArray, heap, inv, size, 0);
+ }
+
+ public boolean changed(final int index) {
+ if (index >= inv.length) return false;
+ final int pos = inv[index];
+ if (pos < 0) return false;
+ final int newPos = upHeap(refArray, heap, inv, pos);
+ downHeap(refArray, heap, inv, size, newPos);
+ return true;
+ }
+
+ public void clear() {
+ size = 0;
+ Arrays.fill(inv, -1);
+ }
+
+ public boolean isEmpty() {
+ return size == 0;
+ }
+ }
+
+
+ /** Creates a new instance.
+ *
+ * @param graph the graph on which to compute PageRank.
+ * @param logger a logger that will be passed to <code>super()</code>.
+ * @param fifo whether to use a FIFO queue instead of a priority queue to choose the next node to update.
+ */
+ protected PageRankPush(final ImmutableGraph graph, final Logger logger, boolean fifo) {
+ super(graph, logger);
+ this.fifo = fifo;
+ progressLogger = new ProgressLogger(logger);
+ }
+
+
+ /** Creates a new instance.
+ *
+ * @param graph the graph on which to compute PageRank.
+ * @param fifo whether to use a FIFO queue instead of a priority queue to choose the next node to update.
+ */
+ public PageRankPush(final ImmutableGraph graph, final boolean fifo) {
+ this(graph, LOGGER, fifo);
+ }
+
+ @Override
+ public void clear() {
+ fifoQueue = null;
+ indirectQueue = null;
+ residual = null;
+ rank = null;
+ seen2Node = null;
+ node2Seen = null;
+ inQueue = null;
+ }
+
+ @Override
+ public void init() throws IOException {
+ // We do not call super.init(), as we do not want to initialize rank.
+ int prevSize = -1;
+ if (node2Seen == null) (node2Seen = new Int2IntOpenHashMap()).defaultReturnValue(-1);
+ else {
+ prevSize = node2Seen.size();
+ node2Seen.clear();
+ }
+
+ if (seen2Node == null) seen2Node = new int[INITIAL_SIZE];
+
+ if (residual == null) residual = new double[INITIAL_SIZE];
+ else Arrays.fill(residual, 0, prevSize, 0);
+
+ if (rank == null) rank = new double[INITIAL_SIZE];
+ else Arrays.fill(rank, 0, prevSize, 0);
+
+ if (fifo) {
+ if (fifoQueue == null) fifoQueue = new IntArrayFIFOQueue();
+ else fifoQueue.clear();
+ if (inQueue == null) inQueue = new boolean[INITIAL_SIZE];
+ else Arrays.fill(inQueue, 0, prevSize, false);
+ }
+ else {
+ if (indirectQueue == null) indirectQueue = new IntHeapIndirectPriorityQueue();
+ else indirectQueue.clear();
+ }
+
+ logger.info("Initialising...");
+ logger.info("alpha = " + alpha);
+
+ if (fifoQueue == null) indirectQueue.refArray = residual;
+
+ pNorm = 0;
+ rNorm = 1;
+ backToRoot = 0;
+ thresholdByNumNodes = threshold / n;
+
+ node2Seen.put(root, 0);
+ seen2Node[0] = root;
+ if (fifo) fifoQueue.enqueue(0);
+ else indirectQueue.enqueue(0);
+ residual[0] = 1;
+
+ progressLogger.itemsName = "pushes";
+ progressLogger.start("Computing...");
+ }
+
+
+ public void step() {
+ final boolean fifo = this.fifo;
+ int curr = fifo ? fifoQueue.dequeueInt() : indirectQueue.dequeue();
+ if (fifo) inQueue[curr] = false;
+ double residualCurr = residual[curr];
+ rank[curr] += (1 - alpha) * residualCurr;
+ pNorm += (1 - alpha) * residualCurr;
+
+ final int node = seen2Node[curr];
+ final int d = graph.outdegree(node);
+ final LazyIntIterator successors = graph.successors(node);
+
+ final double alphaByd = alpha / d;
+ int nonRoot = d;
+
+ for(int i = d; i-- != 0;) {
+ final int s = successors.nextInt();
+ if (s == root) {
+ // Berkhin's loop elimination
+ backToRoot += alphaByd * residualCurr;
+ nonRoot--;
+ continue;
+ }
+
+ int u = node2Seen.get(s);
+ if (u == -1) {
+ node2Seen.put(s, u = node2Seen.size());
+ if (u >= residual.length) {
+ rank = DoubleArrays.grow(rank, u + 1);
+ residual = DoubleArrays.grow(residual, u + 1);
+ seen2Node = IntArrays.grow(seen2Node, u + 1);
+ if (fifo) inQueue = BooleanArrays.grow(inQueue, u + 1);
+ else indirectQueue.refArray = residual;
+ }
+ seen2Node[u] = s;
+ }
+ else if (u == curr) throw new IllegalArgumentException("The graph must be loopless");
+
+ residual[u] += alphaByd * residualCurr;
+
+ if (fifo) {
+ if ((1 - backToRoot) * residual[u] >= thresholdByNumNodes * pNorm && ! inQueue[u]) {
+ inQueue[u] = true;
+ fifoQueue.enqueue(u);
+ }
+ }
+ else {
+ if (indirectQueue.contains(u)) indirectQueue.changed(u);
+ else if (residual[u] * (1 - backToRoot) >= thresholdByNumNodes * pNorm) indirectQueue.enqueue(u);
+ }
+ }
+
+ residual[curr] = 0;
+ rNorm -= residualCurr;
+
+ if (d != 0) rNorm += (alphaByd * nonRoot) * residualCurr;
+
+ progressLogger.lightUpdate();
+ }
+
+ public boolean queueIsEmpty() {
+ return fifo ? fifoQueue.isEmpty() : indirectQueue.isEmpty();
+ }
+
+ public static class EmptyQueueStoppingCritertion implements StoppingCriterion {
+ @Override
+ public boolean shouldStop(SpectralRanking p) {
+ return ((PageRankPush)p).queueIsEmpty();
+ }
+ }
+
+ public static class L1NormStoppingCritertion implements StoppingCriterion {
+ @Override
+ public boolean shouldStop(SpectralRanking p) {
+ final PageRankPush pracl = (PageRankPush)p;
+ return pracl.queueIsEmpty() || (1 - pracl.backToRoot) * pracl.rNorm <= pracl.pNorm * pracl.threshold;
+ }
+ }
+
+ public static void main(final String[] arg) throws IOException, JSAPException, ConfigurationException {
+
+ final SimpleJSAP jsap = new SimpleJSAP(PageRankPush.class.getName(),
+ "Computes strongly preferential PageRank for a preference vector concentrated on a node using the push algorithm.",
+ new Parameter[] {
+ new Switch("expand", 'e', "expand", "Expand the graph to increase speed (no compression)."),
+ new FlaggedOption("alpha", JSAP.DOUBLE_PARSER, Double.toString(PageRank.DEFAULT_ALPHA), JSAP.NOT_REQUIRED, 'a', "alpha", "The damping factor."),
+ new FlaggedOption("root", JSAP.INTEGER_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, 'r', "root", "The node where the preference vector is concentrated."),
+ new FlaggedOption("threshold", JSAP.DOUBLE_PARSER, Double.toString(DEFAULT_THRESHOLD), JSAP.NOT_REQUIRED, 't', "threshold", "The L1-norm threshold."),
+ new Switch("l1Norm", '1', "l1-norm", "Use the relativized L1-norm as stopping criterion."),
+ new Switch("fifo", 'f', "FIFO", "Whether to use a FIFO queue instead of a priority queue to choose the next node to update."),
+ new FlaggedOption("maxIter", JSAP.INTEGER_PARSER, Integer.toString(DEFAULT_MAX_ITER), JSAP.NOT_REQUIRED, 'i', "max-iter", "Maximum number of iterations."),
+ new UnflaggedOption("graphBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the graph (rank is pushed along outgoing arcs)."),
+ new UnflaggedOption("rankBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename where the results will be stored. <rankBasename>.properties will contain the parameter values used in the computation. <rankBasename>.ranks will contain the ranks (doubles in binary form).")
+ }
+ );
+
+ final JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final String graphBasename = jsapResult.getString("graphBasename");
+ final String rankBasename = jsapResult.getString("rankBasename");
+ final double threshold = jsapResult.getDouble("threshold");
+
+ final ProgressLogger progressLogger = new ProgressLogger(LOGGER, "nodes");
+ ImmutableGraph graph = null;
+
+ graph = ImmutableGraph.load(graphBasename, progressLogger);
+ if (jsapResult.userSpecified("expand")) graph = new ArrayListMutableGraph(graph).immutableView();
+
+ PageRankPush pr = new PageRankPush(graph, LOGGER, jsapResult.userSpecified("fifo"));
+ pr.alpha = jsapResult.getDouble("alpha");
+ pr.root = jsapResult.getInt("root");
+ pr.threshold = threshold;
+
+ if (jsapResult.userSpecified("l1Norm")) pr.stepUntil(new L1NormStoppingCritertion());
+ else pr.stepUntil(new EmptyQueueStoppingCritertion());
+ pr.progressLogger.done();
+
+ System.err.print("Saving ranks...");
+ double[] rank = new double[graph.numNodes()];
+ for(int i = pr.node2Seen.size(); i-- != 0;) rank[pr.seen2Node[i]] = pr.rank[i] / pr.pNorm;
+
+ BinIO.storeDoubles(rank, rankBasename +".ranks");
+ Properties prop = new Properties();
+ prop.setProperty("rank.alpha", Double.toString(pr.alpha));
+ prop.setProperty("graph.fileName", graphBasename);
+ prop.setProperty("root", pr.root);
+ prop.setProperty("fifo", jsapResult.userSpecified("fifo"));
+ prop.save(rankBasename + ".properties");
+
+ System.err.println(" done.");
+ }
+}
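Each call to `PageRankPush.step()` moves a (1 &minus; &alpha;) fraction of a node's residual into its rank and spreads the remaining &alpha; fraction evenly over its successors. Below is a minimal FIFO sketch of that idea. It assumes the graph is given as a hypothetical `int[][]` successor list instead of an `ImmutableGraph`. Unlike the LAW class, it does not implement Berkhin's loop elimination and does not offer a priority queue. Dangling nodes explicitly send their mass back to the root, which yields strongly preferential PageRank. The result is normalized by the total settled rank, much as `main` divides by `pNorm`.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/**
 * Hypothetical sketch (not the LAW implementation): FIFO push algorithm for
 * strongly preferential PageRank with the preference vector concentrated on root.
 */
final class PushSketch {
    static double[] push(final int[][] succ, final int root, final double alpha, final double eps) {
        final int n = succ.length;
        final double[] rank = new double[n];      // settled rank (p)
        final double[] residual = new double[n];  // mass still to be pushed (r)
        final boolean[] inQueue = new boolean[n];
        final ArrayDeque<Integer> queue = new ArrayDeque<>();
        residual[root] = 1;
        queue.add(root);
        inQueue[root] = true;
        while (!queue.isEmpty()) {
            final int u = queue.poll();
            inQueue[u] = false;
            final double ru = residual[u];
            residual[u] = 0;
            rank[u] += (1 - alpha) * ru;
            // A dangling node sends its alpha fraction back to the root.
            final int[] targets = succ[u].length == 0 ? new int[] { root } : succ[u];
            final double share = alpha * ru / targets.length;
            for (final int v : targets) {
                residual[v] += share;
                // Enqueue a node once its residual reaches the threshold.
                if (residual[v] >= eps && !inQueue[v]) {
                    inQueue[v] = true;
                    queue.add(v);
                }
            }
        }
        double total = 0;
        for (final double x : rank) total += x;
        for (int i = 0; i < n; i++) rank[i] /= total;
        return rank;
    }

    public static void main(final String[] args) {
        // Arcs 0->1, 0->2, 1->2, 2->0, 2->3; node 3 is dangling.
        final int[][] succ = { { 1, 2 }, { 2 }, { 0, 3 }, {} };
        System.out.println(Arrays.toString(push(succ, 0, 0.85, 1e-9)));
    }
}
```

When the queue empties, every leftover residual is below `eps`. Their sum bounds the &#x2113;<sub>1</sub> error of the unnormalized ranks and is less than n&middot;`eps`.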
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/PowerSeries.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/PowerSeries.java
new file mode 100644
index 0000000..cac53b0
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/PowerSeries.java
@@ -0,0 +1,497 @@
+package it.unimi.dsi.law.rank;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.concurrent.CyclicBarrier;
+import java.util.concurrent.atomic.AtomicLong;
+
+import org.apache.commons.configuration.ConfigurationException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.doubles.DoubleArrayList;
+import it.unimi.dsi.fastutil.doubles.DoubleList;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.law.util.KahanSummation;
+import it.unimi.dsi.law.util.Norm;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.util.Properties;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.NodeIterator;
+
+// RELEASE-STATUS: DIST
+
+/** Computes a power series on a graph using a parallel implementation.
+ *
+ * <p>This class is a generic power series approximator. It iteratively computes finite truncations of power series of the form
+ * <div style="text-align: center">
+ * <var><b>v</b></var> <big>&Sigma;</big><sub><var>k</var> &ge; 0</sub> (&alpha;<var>M</var>)<sup><var>k</var></sup>,
+ * </div>
+ * where <var><b>v</b></var> is a <em>{@linkplain #preference preference vector}</em> that defaults to <b>1</b>, and
+ * <var>M</var> is the graph adjacency matrix, possibly with stochasticised rows if {@link #markovian} is true.
+ * Note that the {@link #step()} method is not available: due to the need for some synchronization logic, only {@link #stepUntil(StoppingCriterion)}
+ * is available.
+ *
+ * <p><strong>Warning</strong>: Since we need to enumerate the <em>predecessors</em> of a node,
+ * you must pass to the {@linkplain #PowerSeries(ImmutableGraph, int, Logger) constructor} the <strong>transpose</strong>
+ * of the graph.
+ *
+ * <p>This class can be run using two different stopping criteria:
+ * <ul>
+ * <li>{@link #MAX_RATIO_STOPPING_CRITERION} stops when the maximum ratio between a component of the vector given by
+ * the previous approximation multiplied by <var>M</var> and the respective component in the previous approximation
+ * is smaller than the reciprocal of {@linkplain #alpha &alpha;};
+ * <li>{@link SpectralRanking.NormStoppingCriterion} stops when {@link #normDelta()},
+ * which returns
+ * the &#x2113;<sub>&#x221E;</sub> norm of the difference between the last two approximations,
+ * is below a specified threshold.
+ * </ul>
+ *
+ * <p>In the first case, this class computes <em>suitable vectors</em> that can be used to control
+ * the error of Gau&szlig;&ndash;Seidel's method applied to {@linkplain KatzParallelGaussSeidel Katz's index} or {@linkplain PageRankParallelGaussSeidel PageRank}.
+ * Details about the method are described by Sebastiano Vigna in &ldquo;<a href="http://vigna.di.unimi.it/papers.php#VigSNCSASOM">Supremum-Norm Convergence for Step-Asynchronous Successive Overrelaxation on M-matrices</a>&rdquo;, 2014.
+ *
+ * <p>In the second case, we compute Katz's index, or a pseudorank divided by 1 &minus; &alpha; (in both cases, the computation converges
+ * more slowly than using {@link KatzParallelGaussSeidel} or even {@link PageRankParallelPowerSeries}, so
+ * this feature is of marginal interest).
+ *
+ * <p>At the end of the computation, {@link #scale} contains the scaling that has been applied to the {@linkplain #rank result}
+ * so that it is normalized in &#x2113;<sub>&#x221E;</sub> norm. You can obtain the unscaled result by dividing all components of
+ * the result by {@link #scale}. Note that when using the {@link #MAX_RATIO_STOPPING_CRITERION}, if the parameter {@linkplain #alpha} is not smaller
+ * than the reciprocal of the dominant eigenvalue, the computation will stop with an error, either because a lower bound proves this fact or because
+ * the scale will go below {@link #MIN_SCALE} (the latter event might also be due to a very bad non-normal transient behaviour, but
+ * this shouldn't happen with real, non-pathological data).
+ *
+ * <p>During the computation, the maximum and minimum ratios between a component of the vector given by
+ * the previous approximation multiplied by <var>M</var> and the respective component in the previous approximation are printed;
+ * they provide upper and lower bounds to the dominant eigenvalue by Collatz's theorem. At the end of the computation,
+ * the current bounds can be found in {@link #maxRatio} and {@link #minRatio}.
+ *
+ * <p><strong>Warning</strong>: if the computation stops because of the {@link #MAX_RATIO_STOPPING_CRITERION},
+ * the vector suitable for {@link #maxRatio} is stored in {@link #previousRank}, not in {@link #rank rank}, as the maximum ratio was evaluated
+ * for {@link #previousRank} while {@link #rank rank} was computed. Moreover, if you provided a {@linkplain #preference preference vector}
+ * with some zero component, you <strong>must</strong> check manually that the suitable vector obtained contains no zero entries.
+ *
+ * @see SpectralRanking
+ *
+ * @author Sebastiano Vigna
+ */
+
+public class PowerSeries extends SpectralRanking {
+ private final static Logger LOGGER = LoggerFactory.getLogger(PowerSeries.class);
+
+ /** Below this scale, we stop the iterative process. */
+ public final static double MIN_SCALE = 1E-300;
+
+ private final static class MaxRatioStoppingCriterion implements StoppingCriterion {
+ public boolean shouldStop(final SpectralRanking spectralRanking) {
+ if (! (spectralRanking instanceof PowerSeries)) throw new IllegalArgumentException(MaxRatioStoppingCriterion.class.getName() + " can be used with instances of " + PowerSeries.class.getName() + " only.");
+ final PowerSeries powerSeries = (PowerSeries)spectralRanking;
+ powerSeries.logger.info("Current max ratio: " + powerSeries.maxRatio + " (will stop below " + 1 / powerSeries.alpha + ")");
+ return powerSeries.maxRatio * powerSeries.alpha < 1;
+ }
+ }
+
+ /** A stopping criterion that stops when {@link PowerSeries#maxRatio} is smaller than the reciprocal of {@link PowerSeries#alpha}.
+ *
+ * <p>Note that this criterion can be applied to instances of {@link PowerSeries} only.
+ */
+ public static StoppingCriterion MAX_RATIO_STOPPING_CRITERION = new MaxRatioStoppingCriterion();
+
+ /** If {@link #markovian} is true, the outdegrees. */
+ private int[] outdegree;
+ /** A progress logger monitoring each iteration. */
+ private final ProgressLogger progressLogger;
+ /** A progress logger monitoring the iterations. */
+ private final ProgressLogger iterationLogger;
+ /** The number of threads. */
+ private final int numberOfThreads;
+ /** The next node to be picked. */
+ private final AtomicLong nextNode;
+ /** If true, the computation is over. */
+ private volatile boolean completed;
+ /** If true, the computation was interrupted by the detection of an error condition. */
+ private volatile boolean interrupted;
+ /** The barrier used to synchronize threads. */
+ private volatile CyclicBarrier barrier;
+ /** Keeps track of problems in threads. */
+ private volatile Throwable threadThrowable;
+ /** The accumulator for the supremum norm of {@link #rank}. */
+ private double norm;
+ /** The accumulator for the supremum norm of {@link #rank} minus {@link #previousRank}. */
+ private double normDelta;
+ /** The maximum ratio between components. */
+ public double maxRatio;
+ /** The minimum ratio between components. */
+ public double minRatio;
+ /** The attenuation factor. Must be smaller than the reciprocal of the dominant eigenvalue. */
+ public double alpha;
+ /** If true, the adjacency matrix of the graph will be stochasticised, thus computing a pseudorank. */
+ public boolean markovian;
+ /** The overall scaling that has been applied to the current approximation. */
+ public double scale;
+ /** The preference vector to be used (or {@code null} if the uniform preference vector should be used). */
+ public DoubleList preference;
+ /** The approximation obtained after the last iteration (only meaningful after at least one step). */
+ public double[] previousRank;
+
+ /** Creates a new instance.
+ *
+ * @param transpose the transpose of the graph on which the power series must be computed.
+ * @param requestedThreads the requested number of threads (0 for {@link Runtime#availableProcessors()}).
+ * @param logger a logger that will be passed to <code>super()</code>.
+ */
+ public PowerSeries(final ImmutableGraph transpose, final int requestedThreads, final Logger logger) {
+ super(transpose, logger);
+ progressLogger = new ProgressLogger(logger, "nodes");
+ iterationLogger = new ProgressLogger(logger, "iterations");
+ numberOfThreads = requestedThreads != 0 ? requestedThreads : Runtime.getRuntime().availableProcessors();
+ nextNode = new AtomicLong();
+ }
+
+ /** Creates a new instance.
+ *
+ * @param transpose the transpose of the graph on which the power series must be computed.
+ * @param requestedThreads the requested number of threads (0 for {@link Runtime#availableProcessors()}).
+ */
+ public PowerSeries(final ImmutableGraph transpose, final int requestedThreads) {
+ this(transpose, requestedThreads, LOGGER);
+ }
+
+ /** Creates a new instance.
+ *
+ * @param transpose the transpose of the graph on which the power series must be computed.
+ * @param logger a logger that will be passed to <code>super()</code>.
+ */
+ public PowerSeries(final ImmutableGraph transpose, final Logger logger) {
+ this(transpose, 0, logger);
+ }
+
+ /** Creates a new instance.
+ *
+ * @param transpose the transpose of the graph on which the power series must be computed.
+ */
+ public PowerSeries(final ImmutableGraph transpose) {
+ this(transpose, 0);
+ }
+
+ @Override
+ public void init() throws IOException {
+ super.init();
+ if (alpha == 0) throw new IllegalArgumentException("The attenuation factor must be nonzero");
+ logger.info("Attenuation factor: " + alpha);
+ maxRatio = Double.NEGATIVE_INFINITY;
+ minRatio = Double.POSITIVE_INFINITY;
+ normDelta = norm = 0;
+ interrupted = completed = false;
+ // Creates the arrays, if necessary
+ if (previousRank == null) previousRank = new double[n];
+
+ // Check the preference vector
+ if (preference != null) {
+ if (preference.size() != n) throw new IllegalArgumentException("The preference vector size (" + preference.size() + ") is different from graph dimension (" + n + ").");
+ logger.info("Using a specified preference vector");
+ for(int i = n; i-- != 0;) rank[i] = preference.getDouble(i);
+ scale = 1 / Norm.L_INFINITY.compute(rank);
+ for(int i = n; i-- != 0;) rank[i] *= scale;
+ }
+ else {
+ logger.info("Using the uniform preference vector");
+ scale = 1;
+ Arrays.fill(rank, 1);
+ }
+
+ if (markovian && outdegree == null) {
+ // We allocate and compute the outdegree vector.
+ outdegree = new int[n];
+ // TODO: refactor using .outdegrees().
+ progressLogger.expectedUpdates = n;
+ progressLogger.start("Computing outdegrees...");
+
+ final NodeIterator nodeIterator = graph.nodeIterator();
+ for(int i = n; i-- != 0;) {
+ nodeIterator.nextInt();
+ final int[] pred = nodeIterator.successorArray();
+ for (int d = nodeIterator.outdegree(); d-- != 0;) outdegree[pred[d]]++;
+ progressLogger.lightUpdate();
+ }
+
+ progressLogger.done();
+ }
+
+ logger.info("Completed.");
+ iterationLogger.start();
+ }
+
+ private final class IterationThread extends Thread {
+ private static final int GRANULARITY = 10000;
+
+ public void run() {
+ try {
+ // We cache frequently used fields.
+ final ImmutableGraph graph = PowerSeries.this.graph.copy();
+ final int n = PowerSeries.this.n;
+ final KahanSummation s = new KahanSummation();
+ final boolean markovian = PowerSeries.this.markovian;
+ final int[] outdegree = PowerSeries.this.outdegree;
+ final double alpha = PowerSeries.this.alpha;
+ final DoubleList preference = PowerSeries.this.preference;
+
+ for(;;) {
+ barrier.await();
+ if (completed) return;
+ final double[] oldRank = rank, newRank = previousRank;
+
+ final double scale = PowerSeries.this.scale;
+ double norm = 0, normDelta = 0, maxRatio = Double.NEGATIVE_INFINITY, minRatio = Double.POSITIVE_INFINITY;
+
+ for(;;) {
+ // Try to get another piece of work.
+ final long start = nextNode.getAndAdd(GRANULARITY);
+ if (start >= n) {
+ nextNode.getAndAdd(-GRANULARITY);
+ break;
+ }
+
+ final int end = (int)(Math.min(n, start + GRANULARITY));
+
+ // for each node, enumerate predecessors and compute an updated value
+ final NodeIterator nodeIterator = graph.nodeIterator((int)start);
+
+ for(int i = (int)start; i < end; i++) {
+ nodeIterator.nextInt();
+ int indegree = nodeIterator.outdegree();
+ s.reset();
+
+ if (indegree != 0) {
+ final int[] pred = nodeIterator.successorArray();
+ if (markovian) while (indegree-- != 0) s.add(oldRank[pred[indegree]] / outdegree[pred[indegree]]);
+ else while (indegree-- != 0) s.add(oldRank[pred[indegree]]);
+ }
+
+ final double t = alpha * s.value() + (preference != null ? scale * preference.getDouble(i) : scale);
+ newRank[i] = t;
+ norm = Math.max(norm, t);
+ normDelta = Math.max(normDelta, Math.abs(t - oldRank[i]));
+ if (oldRank[i] != 0) {
+ final double ratio = s.value() / oldRank[i];
+ maxRatio = Math.max(maxRatio, ratio);
+ minRatio = Math.min(minRatio, ratio);
+ }
+ }
+
+ synchronized (progressLogger) {
+ progressLogger.update(end - start);
+ }
+
+ }
+
+ synchronized(PowerSeries.this) {
+ PowerSeries.this.normDelta = Math.max(PowerSeries.this.normDelta, normDelta);
+ PowerSeries.this.norm = Math.max(PowerSeries.this.norm, norm);
+ PowerSeries.this.maxRatio = Math.max(PowerSeries.this.maxRatio, maxRatio);
+ PowerSeries.this.minRatio = Math.min(PowerSeries.this.minRatio, minRatio);
+ }
+ }
+ }
+ catch(Throwable t) {
+ threadThrowable = t;
+ }
+ }
+ }
+
+ @Override
+ public void step() throws IOException {
+ throw new UnsupportedOperationException();
+ }
+
+ @Override
+ public void stepUntil(final StoppingCriterion stoppingCriterion) throws IOException {
+ init();
+ final IterationThread[] thread = new IterationThread[numberOfThreads];
+ for(int i = thread.length; i-- != 0;) thread[i] = new IterationThread();
+
+ barrier = new CyclicBarrier(numberOfThreads, new Runnable() {
+
+ @Override
+ public void run() {
+ if (iteration > 0) {
+ progressLogger.done();
+
+ final double t[] = rank;
+ rank = previousRank;
+ previousRank = t;
+
+ final double s = 1 / norm;
+ for(int i = n; i-- != 0;) {
+ // We must keep rank and previousRank scaled in the same way.
+ rank[i] *= s;
+ previousRank[i] *= s;
+ }
+ scale *= s;
+
+ logger.info("Scale: " + scale + "; min ratio: " + minRatio + "; max ratio: " + maxRatio);
+
+ if (maxRatio == Double.NEGATIVE_INFINITY) {
+ logger.error("The preference vector is null");
+ interrupted = completed = true;
+ return;
+ }
+
+ if (alpha * minRatio >= 1) {
+ logger.error("The current lower bound for the spectral radius (" + minRatio + ") is larger than or equal to the inverse of the attenuation factor (" + 1 / alpha + ")");
+ interrupted = completed = true;
+ return;
+ }
+
+ if (scale < MIN_SCALE) {
+ logger.error("Scale went below " + MIN_SCALE + ": " + 1 / alpha + " is likely to be larger than or equal to the spectral radius (current estimate: [" + minRatio + ".." + maxRatio + "])");
+ interrupted = completed = true;
+ return;
+ }
+
+ iterationLogger.setAndDisplay(iteration);
+
+ if (stoppingCriterion.shouldStop(PowerSeries.this)) {
+ completed = true;
+ return;
+ }
+ }
+
+ maxRatio = Double.NEGATIVE_INFINITY; normDelta = norm = 0;
+ minRatio = Double.POSITIVE_INFINITY;
+ nextNode.set(0);
+ progressLogger.expectedUpdates = n;
+ progressLogger.start("Iteration " + (iteration++) + "...");
+ }
+ }
+ );
+
+ for(int i = thread.length; i-- != 0;) thread[i].start();
+ for(int i = thread.length; i-- != 0;)
+ try {
+ thread[i].join();
+ }
+ catch (InterruptedException e) {
+ throw new RuntimeException(e);
+ }
+
+ if (threadThrowable != null) throw new RuntimeException(threadThrowable);
+ if (interrupted) throw new RuntimeException("Computation interrupted.");
+ if (progressLogger != null) progressLogger.done();
+
+ iterationLogger.done();
+ }
+
+ @Override
+ public double normDelta() {
+ return normDelta;
+ }
+
+ @Override
+ public void clear() {
+ super.clear();
+ previousRank = null;
+ outdegree = null;
+ }
+
+ /**
+ * Returns a Properties object that contains all the parameters used by the computation.
+ *
+ * @param graphBasename basename of the graph.
+ * @param preferenceFilename file name of the preference vector (may be {@code null}).
+ * @return a properties object that represents all the parameters used to compute the rank.
+ */
+ public Properties buildProperties(String graphBasename, String preferenceFilename) {
+ final Properties prop = super.buildProperties(graphBasename);
+ prop.setProperty("alpha", Double.toString(alpha));
+ prop.setProperty("maxratio", maxRatio);
+ prop.setProperty("minratio", minRatio);
+ prop.setProperty("normdelta", normDelta);
+ prop.setProperty("markovian", markovian);
+ prop.setProperty("scale", Double.toString(scale));
+ prop.setProperty("preferencefilename", preferenceFilename);
+ return prop;
+ }
+
+ public static void main(final String[] arg) throws IOException, JSAPException, ConfigurationException, ClassNotFoundException {
+
+ SimpleJSAP jsap = new SimpleJSAP(PowerSeries.class.getName(), "Computes a power series on a graph, given its transpose, using a parallel implementation."
+ + " Alternatively, computes a suitable vector for the graph adjacency matrix, possibly stochasticised."
+ + " The file <rankBasename>.properties stores metadata about the computation, whereas the file <rankBasename>.ranks stores the result as a sequence of doubles in DataInput format.",
+ new Parameter[] {
+ new FlaggedOption("alpha", JSAP.DOUBLE_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'a', "alpha", "Attenuation factor (must be smaller than the reciprocal of the dominant eigenvalue)."),
+ new Switch("markovian", 'M', "markovian", "Stochasticise the matrix."),
+ new Switch("suitable", 'S', "suitable", "Compute a vector suitable for attenuation factors smaller than alpha."),
+ new FlaggedOption("maxIter", JSAP.INTEGER_PARSER, Integer.toString(DEFAULT_MAX_ITER), JSAP.NOT_REQUIRED, 'i', "max-iter", "Maximum number of iterations."),
+ new FlaggedOption("threshold", JSAP.DOUBLE_PARSER, Double.toString(DEFAULT_THRESHOLD), JSAP.NOT_REQUIRED, 't', "threshold", "Threshold to determine whether to stop (not used for suitable vectors)."),
+ new FlaggedOption("preferenceVector", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'p', "preference-vector", "A preference vector stored as a vector of binary doubles."),
+ new FlaggedOption("preferenceObject", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'P', "preference-object", "A preference vector stored as a serialised DoubleList."),
+ new Switch("mapped", 'm', "mapped", "Use loadMapped() to load the graph."),
+ new FlaggedOption("threads", JSAP.INTSIZE_PARSER, "0", JSAP.NOT_REQUIRED, 'T', "threads", "The number of threads to be used. If 0, the number will be estimated automatically."),
+ new UnflaggedOption("transposeBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the transpose of the graph."),
+ new UnflaggedOption("rankBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the output files: the resulting rank (doubles in binary form) is stored in <rankBasename>.ranks, and metadata in <rankBasename>.properties.")
+ }
+ );
+
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final double alpha = jsapResult.getDouble("alpha");
+ final boolean mapped = jsapResult.getBoolean("mapped", false);
+ final boolean suitable = jsapResult.getBoolean("suitable");
+ final String transposeBasename = jsapResult.getString("transposeBasename");
+ final String rankBasename = jsapResult.getString("rankBasename");
+ final int threads = jsapResult.getInt("threads");
+ final ProgressLogger progressLogger = new ProgressLogger(LOGGER, "nodes");
+
+ final ImmutableGraph transpose = mapped? ImmutableGraph.loadMapped(transposeBasename, progressLogger) : ImmutableGraph.load(transposeBasename, progressLogger);
+
+ DoubleList preference = null;
+ String preferenceFilename = null;
+ if (jsapResult.userSpecified("preferenceVector"))
+ preference = DoubleArrayList.wrap(BinIO.loadDoubles(preferenceFilename = jsapResult.getString("preferenceVector")));
+
+ if (jsapResult.userSpecified("preferenceObject")) {
+ if (jsapResult.userSpecified("preferenceVector")) throw new IllegalArgumentException("You cannot specify the preference vector twice");
+ preference = (DoubleList)BinIO.loadObject(preferenceFilename = jsapResult.getString("preferenceObject"));
+ }
+
+ PowerSeries pr = new PowerSeries(transpose, threads, LOGGER);
+ pr.alpha = alpha;
+ pr.markovian = jsapResult.getBoolean("markovian");
+ pr.preference = preference;
+
+ pr.stepUntil(or(suitable ? MAX_RATIO_STOPPING_CRITERION : new SpectralRanking.NormStoppingCriterion(jsapResult.getDouble("threshold")), new SpectralRanking.IterationNumberStoppingCriterion(jsapResult.getInt("maxIter"))));
+
+ BinIO.storeDoubles(suitable ? pr.previousRank : pr.rank, rankBasename + ".ranks");
+ pr.buildProperties(transposeBasename, preferenceFilename).save(rankBasename + ".properties");
+ }
+}
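The parallel iteration in `PowerSeries` can be sketched sequentially. That iteration is the `IterationThread` loop together with the barrier action in `stepUntil()`. The sketch below is minimal and hypothetical, not LAW API. The class `PowerSeriesSketch`, its fields, and the `int[][]` predecessor-list input are assumptions. It also covers only the uniform preference vector and the non-Markovian case, and it uses plain summation rather than `KahanSummation`.

The sketch maintains the invariant `rank[i] == scale * y[i]`, where `y` is the unscaled partial sum of the series. It also records the Collatz ratios, whose minimum and maximum bound the dominant eigenvalue.

```java
import java.util.Arrays;

/** Hypothetical sequential sketch of the PowerSeries iteration (uniform preference, non-Markovian). */
final class PowerSeriesSketch {
    static final double MIN_SCALE = 1E-300;

    double[] rank;     // current approximation, normalized in the sup norm
    double scale;      // overall scaling: rank[i] == scale * (unscaled partial sum at node i)
    double maxRatio;   // Collatz upper bound on the dominant eigenvalue
    double minRatio;   // Collatz lower bound on the dominant eigenvalue
    double normDelta;  // sup norm of (new iterate - old iterate), computed before rescaling
    int iteration;

    /** pred[i] lists the predecessors of node i, i.e., its successors in the transpose. */
    void compute(final int[][] pred, final double alpha, final double threshold, final int maxIter) {
        final int n = pred.length;
        rank = new double[n];
        Arrays.fill(rank, 1); // uniform preference vector, already of unit sup norm
        scale = 1;
        iteration = 0;
        final double[] next = new double[n];
        for (;;) {
            double norm = 0;
            normDelta = 0;
            maxRatio = Double.NEGATIVE_INFINITY;
            minRatio = Double.POSITIVE_INFINITY;
            for (int i = 0; i < n; i++) {
                double s = 0; // sum of the old values of the predecessors of i
                for (final int j : pred[i]) s += rank[j];
                final double t = alpha * s + scale; // the preference component is scale * 1
                next[i] = t;
                norm = Math.max(norm, t);
                normDelta = Math.max(normDelta, Math.abs(t - rank[i]));
                if (rank[i] != 0) { // Collatz ratio between the multiplied and the old component
                    maxRatio = Math.max(maxRatio, s / rank[i]);
                    minRatio = Math.min(minRatio, s / rank[i]);
                }
            }
            // Renormalize in the sup norm and keep track of the overall scaling.
            final double r = 1 / norm;
            for (int i = 0; i < n; i++) rank[i] = next[i] * r;
            scale *= r;
            iteration++;
            if (alpha * minRatio >= 1) throw new IllegalStateException("Lower bound " + minRatio + " >= 1 / alpha");
            if (scale < MIN_SCALE) throw new IllegalStateException("Scale went below " + MIN_SCALE);
            if (normDelta < threshold || iteration >= maxIter) return;
        }
    }

    /** Divides every component by the overall scale, recovering the unscaled power series. */
    double[] unscaled() {
        final double[] y = rank.clone();
        for (int i = 0; i < y.length; i++) y[i] /= scale;
        return y;
    }
}
```

Take the directed 3-cycle, whose dominant eigenvalue is 1. With `alpha = 0.5`, every component of the unscaled vector should approach 1/(1 - 0.5) = 2. With `alpha = 1`, the lower bound `minRatio = 1` makes the sketch fail at the first iteration. This mirrors the corresponding check in the barrier action.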
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/SpectralRanking.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/SpectralRanking.java
new file mode 100644
index 0000000..ac886ac
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/SpectralRanking.java
@@ -0,0 +1,299 @@
+package it.unimi.dsi.law.rank;
+
+import java.io.IOException;
+
+import org.slf4j.Logger;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.doubles.DoubleIterator;
+import it.unimi.dsi.fastutil.doubles.DoubleList;
+import it.unimi.dsi.law.stat.KendallTau;
+import it.unimi.dsi.law.util.Norm;
+import it.unimi.dsi.util.Properties;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+
+
+// RELEASE-STATUS: DIST
+
+/** A base abstract class defining methods and attributes supporting computations
+ * of graph spectral rankings such as {@linkplain DominantEigenvectorParallelPowerMethod the dominant eigenvector},
+ * {@linkplain PageRank PageRank} or {@linkplain PowerSeries Katz's index}. For some elaboration on the
+ * relationships between these rankings, see &ldquo;<a href="http://arxiv.org/abs/0912.0238">Spectral Ranking</a>&rdquo; by Sebastiano Vigna.
+ *
+ * <P>The usage pattern for concrete subclasses is as follows: first create an instance specifying the graph over which the spectral ranking should be computed.
+ * Then, modify the publicly available attributes to fine-tune the ranking algorithm. Finally,
+ * <UL>
+ * <LI>either call the {@link #init()} method, which initializes the state, and then repeatedly call the
+ * {@link #step()} method; every call will compute the next approximation; the current approximation is contained
+ * in the {@link #rank} attribute;
+ * <LI>or call the {@link #stepUntil(SpectralRanking.StoppingCriterion)} method, which calls {@link #init()} and then iterates
+ * the {@link #step()} method until a certain stopping criterion is met; a {@link SpectralRanking.StoppingCriterion}
+ * is a class that decides whether the iteration can be stopped.
+ * The {@link SpectralRanking} class provides two ready-to-use implementations of stopping criteria:
+ * <UL>
+ * <LI>{@linkplain SpectralRanking.NormStoppingCriterion one} that decides to stop depending on whether
+ * the value returned by {@link #normDelta()} (if implemented) is smaller than a certain threshold.
+ * <LI>{@linkplain SpectralRanking.IterationNumberStoppingCriterion another} that decides to stop on the basis of the number of iterations.
+ * </UL>
+ *
+ * <P>Moreover, this class provides two static methods that compose stopping criteria in a {@linkplain #and(SpectralRanking.StoppingCriterion, SpectralRanking.StoppingCriterion) conjunctive}
+ * or {@linkplain #or(SpectralRanking.StoppingCriterion, SpectralRanking.StoppingCriterion) disjunctive} way.
+ * </UL>
+ *
+ * <P>At any time, the user may re-initialize the computation by calling the {@link #init()} method, or call
+ * {@link #clear()} to get rid of the large arrays that the implementing subclasses usually manage. In the latter case, the arrays
+ * are rebuilt on the next call to {@link #init()}.
+ *
+ * <h2>Choosing a threshold</h2>
+ *
+ * <p>The stopping threshold used by {@link NormStoppingCriterion} should be set so as to obtain a reasonable number
+ * of significant digits. In some cases this requires adapting the threshold to the graph: for instance, {@link PageRank}
+ * is a stochastic vector, so its entries tend to be very small (of order 1/<var>n</var>, where <var>n</var> is the number of
+ * nodes in the graph). You should be wary of digits that are not significant, as they can lead to very biased results when comparing,
+ * using {@linkplain KendallTau Kendall's &tau;}, rankings with a significant amount of ties (see &ldquo;Traps and pitfalls of topic-biased PageRank&rdquo;, by
+ * Paolo Boldi, Roberto Posenato, Massimo Santini, and Sebastiano Vigna,
+ * <i>WAW 2006. Fourth Workshop on Algorithms and Models for the Web-Graph</i>, volume 4936 of Lecture Notes in Computer Science, pages 107&ndash;116, Springer, 2008).
+ *
+ * <p>Note that, depending on the implementation, {@link #normDelta()} might return a bound on the norm of the difference
+ * between the current iterate and the target ranking (e.g., {@link PageRankPowerSeries} and {@link PageRankParallelPowerSeries},
+ * or any {@linkplain KatzParallelGaussSeidel Gau&szlig;&ndash;Seidel implementation} using a {@linkplain KatzParallelGaussSeidel#normVector(double[], double) <var><b>w</b></var>-norm}), a tentative estimate
+ * of the same bound (e.g., {@link DominantEigenvectorParallelPowerMethod})
+ * or simply the norm of the difference between two successive iterates. Precision up to the
+ * chosen threshold is guaranteed only in the first case.
+ *
+ * @author Sebastiano Vigna
+ */
+
+public abstract class SpectralRanking {
+ /** Default threshold (note that this value is used as a default by main methods). */
+ public final static double DEFAULT_THRESHOLD = 1E-6;
+ /** Default maximum number of iterations (note that this value is used as a default by main methods). */
+ public final static int DEFAULT_MAX_ITER = Integer.MAX_VALUE;
+ /** The default norm ({@link Norm#L_INFINITY}). */
+ public final static Norm DEFAULT_NORM = Norm.L_INFINITY;
+ /** The admitted tolerance in the {@linkplain #isStochastic(DoubleList) verification that a vector is a stochastic one}.
+ * A stochastic vector is nonnegative and has &#x2113;<sub>1</sub> norm equal to 1 &plusmn; {@link #STOCHASTIC_TOLERANCE}. */
+ protected final static double STOCHASTIC_TOLERANCE = 1E-6;
+
+ /** The graph. */
+ public final ImmutableGraph graph;
+ /** The number of nodes of {@link #graph}, cached. */
+ public final int n;
+ /** A logger defined by the implementing subclasses. */
+ public final Logger logger;
+ /** The current rank vector. */
+ public double[] rank;
+ /** The current step (0 after {@linkplain #init() initialization}). */
+ public int iteration;
+
+ /** Creates a new instance.
+ *
+ * @param graph the graph.
+ * @param logger a logger.
+ */
+ public SpectralRanking(final ImmutableGraph graph, final Logger logger) {
+ this.graph = graph;
+ this.logger = logger;
+ this.n = graph.numNodes();
+ logger.info("Nodes: " + n);
+ }
+
+
+ /** A strategy that decides when a computation should be stopped. */
+ public interface StoppingCriterion {
+ /** Determines if the computation should be stopped.
+ *
+ * @param spectralRanking the instance encapsulating the computation.
+ * @return true if the computation should be stopped.
+ */
+ public boolean shouldStop(SpectralRanking spectralRanking);
+ }
+
+ /** A stopping criterion that stops as soon as the number of iterations reaches a given bound. */
+ public static class IterationNumberStoppingCriterion implements StoppingCriterion {
+ private final int maxIter;
+ /** Creates an instance with a given number of iterations.
+ *
+ * @param maxIter the maximum number of iterations.
+ */
+ public IterationNumberStoppingCriterion(final int maxIter) {
+ this.maxIter = maxIter;
+ }
+
+ public boolean shouldStop(final SpectralRanking spectralRanking) {
+ // If maxIter is Integer.MAX_VALUE, the number of iterations is unbounded: never stop.
+ if (maxIter == Integer.MAX_VALUE) return false;
+ spectralRanking.logger.info("Iterations performed: " + spectralRanking.iteration + " (will stop after " + maxIter + ")");
+ return spectralRanking.iteration >= maxIter;
+ }
+ }
+
+ /** A stopping criterion that evaluates {@link SpectralRanking#normDelta()}, and stops
+ * if this value is smaller than a given threshold.
+ *
+ * <p>Note that this criterion assumes {@link SpectralRanking#normDelta()} has been properly implemented.
+ */
+ public static class NormStoppingCriterion implements StoppingCriterion {
+ private final double threshold;
+
+ /** Creates an instance with given threshold.
+ *
+ * @param threshold the threshold.
+ */
+ public NormStoppingCriterion(final double threshold) {
+ this.threshold = threshold;
+ }
+
+ public boolean shouldStop(final SpectralRanking spectralRanking) {
+ spectralRanking.logger.info("Current norm delta: " + spectralRanking.normDelta() + " (will stop below " + threshold + ")");
+ return spectralRanking.normDelta() < threshold;
+ }
+ }
+
+ /** Composes two stopping criteria, producing a single stopping criterion (the computation stops iff both
+ * conditions become true; lazy boolean evaluation is applied).
+ *
+ * @param stop1 a stopping criterion.
+ * @param stop2 a stopping criterion.
+ * @return a criterion that decides to stop as soon as both criteria are satisfied.
+ */
+ public static StoppingCriterion and(final StoppingCriterion stop1, final StoppingCriterion stop2) {
+ return new StoppingCriterion() {
+ public boolean shouldStop(final SpectralRanking p) {
+ return stop1.shouldStop(p) && stop2.shouldStop(p);
+ }
+ };
+ }
+
+ /** Composes two stopping criteria, producing a single stopping criterion (the computation stops iff either
+ * condition becomes true; lazy boolean evaluation is applied).
+ *
+ * @param stop1 a stopping criterion.
+ * @param stop2 a stopping criterion.
+ * @return a criterion that decides to stop as soon as one of the two criteria is satisfied.
+ */
+ public static StoppingCriterion or(final StoppingCriterion stop1, final StoppingCriterion stop2) {
+ return new StoppingCriterion() {
+ public boolean shouldStop(final SpectralRanking p) {
+ return stop1.shouldStop(p) || stop2.shouldStop(p);
+ }
+ };
+ }
+
+ /** Convenience method checking whether a vector is stochastic (nonnegative entries summing up to one within {@link #STOCHASTIC_TOLERANCE}).
+ *
+ * <p>This method uses <a href="http://en.wikipedia.org/wiki/Kahan_summation_algorithm">Kahan's summation algorithm</a>.
+ *
+ * @param v the vector to check.
+ * @return true if the vector is stochastic.
+ */
+ protected static boolean isStochastic(DoubleList v) {
+ double normL1 = 0.0, c = 0.0, t, y;
+ int i;
+ // Kahan summation, to minimize rounding errors when summing doubles.
+ for (i = v.size(); i-- != 0 && v.getDouble(i) >= 0;) {
+ y = v.getDouble(i) - c;
+ t = (normL1 + y);
+ c = (t - normL1) - y;
+ normL1 = t;
+ }
+ return (i == -1 && Math.abs(normL1 - 1.0) <= STOCHASTIC_TOLERANCE);
+ }
+
+ /** Returns a {@link Properties} object that contains all parameters used by the computation.
+ *
+ * <p>Implementing subclasses should extend this method by calling <code>super.buildProperties()</code>
+ * and setting additional properties on the resulting {@link Properties}.
+ *
+ * @param graphBasename basename of the graph
+ * @return a properties object that represents all the parameters used to compute the ranking.
+ */
+ public Properties buildProperties(final String graphBasename) {
+ final Properties prop = new Properties();
+ prop.setProperty("iterations", iteration);
+ prop.setProperty("normdelta", Double.toString(normDelta()));
+ prop.setProperty("nodes", n);
+ prop.setProperty("graph", graphBasename);
+ return prop;
+ }
+
+ /** Initializes the rank vector, zeroes {@link #iteration} and logs basic data. Please extend this method to handle additional attributes. */
+ @SuppressWarnings("unused")
+ public void init() throws IOException {
+ logger.info("Initializing...");
+ iteration = 0;
+ // Creates the array, if necessary
+ if (rank == null) rank = new double[n];
+ }
+
+
+ /** Performs one computation step. */
+ public abstract void step() throws IOException;
+
+ /** Returns the norm of an estimation of the distance to the limit of the iterative process: depending
+ * on the implementation, this can be an actual bound or, for example, just the difference between the
+ * last two approximations.
+ *
+ * <p>This method must be implemented by concrete subclasses if you want to use {@link NormStoppingCriterion}.
+ *
+ * @return the norm of an estimation of the distance to the limit.
+ * @throws IllegalStateException if called before the first iteration.
+ * @throws UnsupportedOperationException if it is not possible to compute a norm.
+ */
+ public double normDelta() {
+ throw new UnsupportedOperationException();
+ }
+
+ /** Calls {@link #init()} and steps until a given stopping criterion is met.
+ * The criterion is checked <i>a posteriori</i> (i.e., after each step); this means that
+ * at least one step is performed.
+ *
+ * @param stoppingCriterion the stopping criterion to be used.
+ */
+ public void stepUntil(final StoppingCriterion stoppingCriterion) throws IOException {
+ init();
+ do step(); while (!stoppingCriterion.shouldStop(this));
+ }
+
+ /** Clears all data and releases resources by nulling {@link #rank} (i.e., results will no longer be available).
+ * Please extend this method to handle additional attributes. */
+ public void clear() {
+ rank = null;
+ }
+
+ /** Returns a compact logarithmic approximation of a norm vector.
+ *
+ * @param doubleIterator an iterator enumerating a norm vector.
+ * @return an array of bytes containing the opposite of a lower bound on the binary logarithm of the doubles returned by the iterator.
+ */
+ protected byte[] approximateNormVector(final DoubleIterator doubleIterator) {
+ final byte[] normVector = new byte[n];
+ for(int i = 0; i < n; i++) {
+ final double e = doubleIterator.nextDouble();
+ if (e == 0) throw new IllegalArgumentException("A norm vector cannot contain zeroes");
+ if (e > 1) throw new IllegalArgumentException("The norm vector contains an entry larger than one: " + e);
+ final int approx = (int)Math.ceil(- Math.log(e) / Math.log(2));
+ if (approx > 62) throw new IllegalArgumentException("The norm vector has an entry smaller than 1/2^62 (" + e + ")");
+ normVector[i] = (byte)approx;
+ }
+
+ return normVector;
+ }
+}
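The `SpectralRanking` Javadoc describes two behaviours of stopping criteria. First, `stepUntil()` checks the criterion a posteriori, so at least one step is always performed. Second, `and()` and `or()` compose criteria with lazy evaluation. The self-contained sketch below illustrates both.

The names `RankingState`, `StoppingRule` and `StepUntilSketch` are hypothetical stand-ins for `SpectralRanking`, `SpectralRanking.StoppingCriterion` and `stepUntil()`. The simulated `step()` simply halves the norm delta at each iteration.

```java
/** Hypothetical stand-in for the state inspected by a stopping criterion. */
final class RankingState {
    int iteration;
    double normDelta = Double.POSITIVE_INFINITY;
}

/** Hypothetical analogue of SpectralRanking.StoppingCriterion. */
interface StoppingRule {
    boolean shouldStop(RankingState state);

    /** Stops once the number of iterations reaches maxIter. */
    static StoppingRule iterations(final int maxIter) {
        return s -> s.iteration >= maxIter;
    }

    /** Stops once normDelta falls below threshold. */
    static StoppingRule norm(final double threshold) {
        return s -> s.normDelta < threshold;
    }

    /** Disjunction with short-circuit (lazy) evaluation. */
    static StoppingRule or(final StoppingRule a, final StoppingRule b) {
        return s -> a.shouldStop(s) || b.shouldStop(s);
    }

    /** Conjunction with short-circuit (lazy) evaluation. */
    static StoppingRule and(final StoppingRule a, final StoppingRule b) {
        return s -> a.shouldStop(s) && b.shouldStop(s);
    }
}

final class StepUntilSketch {
    /** Like stepUntil(): the rule is checked after each step, so at least one step is performed. */
    static RankingState run(final StoppingRule rule) {
        final RankingState state = new RankingState(); // plays the role of init()
        do {
            // Hypothetical step(): the norm delta halves at every iteration.
            state.iteration++;
            state.normDelta = Math.pow(0.5, state.iteration);
        } while (!rule.shouldStop(state));
        return state;
    }
}
```

With `or(norm(1E-3), iterations(100))`, the run should stop at iteration 10, the first with 2^-k below 10^-3. By contrast, `iterations(0)` still performs one step, because the rule is only checked after stepping.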
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/package.html b/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/package.html
new file mode 100644
index 0000000..8fa6e2f
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/rank/package.html
@@ -0,0 +1,12 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<!-- RELEASE-STATUS: DIST -->
+<html>
+ <head>
+ <title>Spectral Ranking</title>
+ </head>
+
+ <body>
+
+ <P>Computation of {@linkplain it.unimi.dsi.law.rank.SpectralRanking spectral rankings} and associated utilities.
+ </body>
+</html>
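The last two helpers of `SpectralRanking` can be exercised in isolation with the sketch below:

- `isStochastic()`, a Kahan-compensated &#x2113;<sub>1</sub> check.
- `approximateNormVector()`, which stores ceil(-log<sub>2</sub> <var>e</var>) in a byte, so that 2<sup>-result</sup> is a lower bound on <var>e</var>.

`StochasticSketch` is a hypothetical class that operates on plain `double[]` instead of `DoubleList` and `DoubleIterator`. Unlike the original, it also rejects negative norm-vector entries. The tolerance value is copied from `STOCHASTIC_TOLERANCE`.

```java
/** Hypothetical re-implementation of SpectralRanking's numeric helpers on plain arrays. */
final class StochasticSketch {
    /** Same value as SpectralRanking.STOCHASTIC_TOLERANCE. */
    static final double TOLERANCE = 1E-6;

    /** True if all entries are nonnegative and their Kahan-compensated sum is within TOLERANCE of 1. */
    static boolean isStochastic(final double[] v) {
        double sum = 0, c = 0; // c holds the low-order bits lost so far
        for (final double x : v) {
            if (x < 0) return false;
            final double y = x - c;   // correct the next term by the running compensation
            final double t = sum + y; // low-order bits of y may be lost here
            c = (t - sum) - y;        // recover them (algebraically zero)
            sum = t;
        }
        return Math.abs(sum - 1) <= TOLERANCE;
    }

    /** For each entry e in (0..1], stores ceil(-log2 e), so that 2^-result <= e. */
    static byte[] approximateNormVector(final double[] norm) {
        final byte[] result = new byte[norm.length];
        for (int i = 0; i < norm.length; i++) {
            final double e = norm[i];
            if (!(e > 0 && e <= 1)) throw new IllegalArgumentException("Entry outside (0..1]: " + e);
            final int approx = (int) Math.ceil(-Math.log(e) / Math.log(2));
            if (approx > 62) throw new IllegalArgumentException("Entry smaller than 1/2^62: " + e);
            result[i] = (byte) approx;
        }
        return result;
    }
}
```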
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/stat/AveragePrecisionCorrelation.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/stat/AveragePrecisionCorrelation.java
new file mode 100644
index 0000000..b7e2934
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/stat/AveragePrecisionCorrelation.java
@@ -0,0 +1,193 @@
+package it.unimi.dsi.law.stat;
+
+import java.io.IOException;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/*
+ * Copyright (C) 2011-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.Util;
+import it.unimi.dsi.fastutil.doubles.DoubleArrays;
+import it.unimi.dsi.fastutil.ints.IntArrays;
+import it.unimi.dsi.law.util.Precision;
+
+// RELEASE-STATUS: DIST
+
+/** Computes the AP (average-precision) correlation between two score vectors without ties. More precisely,
+ * this class computes the formula given by
+ * Emine Yilmaz, Javed A. Aslam, and Stephen Robertson in &ldquo;A new rank correlation coefficient for information retrieval&rdquo;,
+ * <i>Proc. of the 31st annual international ACM SIGIR conference on Research and development in information retrieval</i>,
+ * pages 587&minus;594, ACM, 2008,
+ * using the algorithm described by
+ * Sebastiano Vigna in &ldquo;<a href="http://vigna.di.unimi.it/papers.php#VigWCIRT">A weighted correlation index for rankings
+ * with ties</a>&rdquo;, 2014.
+ *
+ * <p>This class is a singleton: methods must be invoked on {@link #INSTANCE}.
+ * Additional methods inherited from {@link CorrelationIndex} make it possible to
+ * compute AP correlation directly between two files, to bound the number of significant digits, or
+ * to reverse the standard association between scores and ranks (by default,
+ * a larger score corresponds to a higher rank, i.e., to a smaller rank index; the largest score gets
+ * rank 0).
+ *
+ * <p>A main method is provided for command-line usage.
+ */
+
+public class AveragePrecisionCorrelation extends CorrelationIndex {
+ private AveragePrecisionCorrelation() {}
+
+ /** The singleton instance of this class. */
+ public static final AveragePrecisionCorrelation INSTANCE = new AveragePrecisionCorrelation();
+
+ private static final class ExchangeWeigher {
+ /** A support array used by MergeSort. */
+ private final int[] temp;
+ /** The first score vector. */
+ private final double[] v0;
+ /** An array of integers, initially sorted by the second score vector. */
+ private final int perm[];
+ /** The inverse of {@link #perm}.*/
+ private final int[] rank;
+
+ public ExchangeWeigher(final double v0[], double[] v1) {
+ final int length = v0.length;
+ this.v0 = v0;
+ this.temp = new int[length];
+ perm = Util.identity(length);
+ // First of all we sort perm stably by the second score vector, v1, and then reverse it so that larger scores come first.
+ DoubleArrays.radixSortIndirect(perm, v1, true);
+ IntArrays.reverse(perm);
+ // We null v1 so that the garbage collector can reclaim it.
+ v1 = null;
+ rank = Util.invertPermutation(perm);
+ }
+
+ public double weigh() {
+ return weigh(0, perm.length);
+ }
+
+ /** Orders an array fragment of {@link #perm} and returns the weight of the necessary exchanges.
+ *
+ * @param offset the starting element of the array fragment.
+ * @param length the number of elements of the array fragment.
+ * @return the weight of exchanges used to sort the array fragment.
+ */
+ private double weigh(final int offset, final int length) {
+ /* Using a non-recursive sort for small subarrays gives no noticeable
+ * improvement, as most of the cost is given by floating-point computations. */
+ if (length == 1) return 0;
+
+ final int length0 = length / 2, length1 = length - length / 2, middle = offset + length0;
+ double weight = weigh(offset, length0);
+ weight += weigh(middle, length1);
+
+ /* If the last element of the first subarray is larger than or equal to the first element of
+ * the second subarray, there is nothing to do. */
+ if (v0[perm[middle - 1]] < v0[perm[middle]]) {
+ // We merge the lists into temp, adding to weight the cost of the exchanges: an element x moved forward
+ // past the remaining elements of the first subarray contributes 1 / rank[x] for each of them.
+ int i = 0, j = 0, k = 0;
+ while(j < length0 && k < length1) {
+ if (v0[perm[offset + j]] > v0[perm[middle + k]]) {
+ temp[i] = perm[offset + j++];
+ }
+ else {
+ temp[i] = perm[middle + k++];
+ weight += (length0 - j) * (1. / rank[temp[i]]);
+ }
+ i++;
+ }
+
+ System.arraycopy(perm, offset + j, perm, offset + i, length0 - j);
+ System.arraycopy(temp, 0, perm, offset, i);
+ }
+
+ return weight;
+ }
+ }
+
+
+ /** Computes AP correlation between two score vectors.
+ *
+ * <p>Note that this method must be called with some care. More precisely, the two
+ * arguments should be built on-the-fly in the method call, and not stored in variables,
+ * as the reference to the second argument array will be {@code null}'d during the execution of this method
+ * to free some memory: if the array is referenced elsewhere the garbage collector will not
+ * be able to collect it.
+ *
+ * @param v0 the first score vector.
+ * @param v1 the second score vector (inducing the reference ranking).
+ * @return AP correlation.
+ */
+ public double compute(double v0[], final double v1[]) {
+ if (v0.length != v1.length) throw new IllegalArgumentException("Array lengths differ: " + v0.length + ", " + v1.length);
+ final int length = v0.length;
+ if (length == 0) throw new IllegalArgumentException("AP correlation is undefined on empty score vectors");
+
+ final double e = new ExchangeWeigher(v0, v1).weigh() / (length - 1);
+
+ // Ensure interval [-1..1] (small deviations might appear because of numerical errors).
+ return Math.min(1, Math.max(-1, (1 - 2 * e)));
+ }
+
+ public static void main(String[] arg) throws NumberFormatException, IOException, JSAPException {
+ final SimpleJSAP jsap = new SimpleJSAP(AveragePrecisionCorrelation.class.getName(),
+ "Computes the AP correlation between the score vectors contained in two given files. " +
+ "The two files must contain the same number of doubles, written " +
+ "in Java binary format. The option -t makes it possible to specify a different " +
+ "type (possibly for each input file)." +
+ "\n" +
+ "If one or more truncations are specified with the option -T, the values of " +
+ "AP correlation for the given files truncated to the given number of binary " +
+ "fractional digits, in the same order, will be printed to standard output. " +
+ "If there is more than one value, the vectors will be loaded in memory just " +
+ "once and copied across computations.",
+ new Parameter[] {
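+ new Switch("reverse", 'r', "reverse", "Use reversed ranks."),
+ new FlaggedOption("digits", JSAP.INTEGER_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'T', "truncate", "Truncate inputs to the given number of binary fractional digits.").setAllowMultipleDeclarations(true),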
+ new FlaggedOption("type", JSAP.STRING_PARSER, "double", JSAP.NOT_REQUIRED, 't', "type", "The type of the input files, of the form kind[:kind] where kind is one of int, long, float, double, text"),
+ new UnflaggedOption("file0", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The first rank file."),
+ new UnflaggedOption("file1", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The second rank file."),
+ }
+ );
+
+ final JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final String f0 = jsapResult.getString("file0");
+ final String f1 = jsapResult.getString("file1");
+ final boolean reverse = jsapResult.userSpecified("reverse");
+ final Class<?>[] inputType = parseInputTypes(jsapResult);
+
+ int[] digits = jsapResult.getIntArray("digits");
+ if (digits.length == 0) digits = new int[] { Integer.MAX_VALUE };
+
+ if (digits.length == 1) System.out.println(INSTANCE.compute(f0, inputType[0], f1, inputType[1], reverse, digits[0]));
+ else {
+ final double[] v0 = loadAsDoubles(f0, inputType[0], reverse), v1 = loadAsDoubles(f1, inputType[1], reverse);
+ for(int d: digits) System.out.println(INSTANCE.compute(Precision.truncate(v0.clone(), d), Precision.truncate(v1.clone(), d)));
+ }
+ }
+}
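As a hedged cross-check for `AveragePrecisionCorrelation.compute(v0, v1)`, here is a minimal quadratic-time sketch of AP correlation for tie-free score vectors. It assumes, from a reading of `ExchangeWeigher` above, that positions are induced by `v1` (largest score at position 0) and that, for the item at position p ≥ 1, C(p) counts how many of the p items above it in that ranking also have a larger `v0` score; the result is (2/(n − 1)) Σ C(p)/p − 1, summing over p = 1, …, n − 1. The class name `NaiveApCorrelation` is hypothetical and not part of LAW.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical O(n^2) reference implementation of AP correlation for tie-free scores; not part of LAW. */
final class NaiveApCorrelation {
    private NaiveApCorrelation() {}

    /**
     * @param v0 the first score vector.
     * @param v1 the second score vector; it induces the positions (largest score at position 0).
     * @return (2 / (n - 1)) times the sum over positions p >= 1 of C(p) / p, minus 1.
     */
    public static double compute(final double[] v0, final double[] v1) {
        if (v0.length != v1.length) throw new IllegalArgumentException("Array lengths differ: " + v0.length + ", " + v1.length);
        final int n = v0.length;
        // With fewer than two items the normalization 1 / (n - 1) is undefined.
        if (n < 2) throw new IllegalArgumentException("AP correlation needs at least two items");
        // order[p] is the item at position p in the ranking induced by v1 (descending scores).
        final Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble((Integer k) -> v1[k]).reversed());
        double sum = 0;
        for (int p = 1; p < n; p++) {
            final int item = order[p];
            // C(p): items above position p in the v1 ranking that v0 also scores higher.
            int concordant = 0;
            for (int q = 0; q < p; q++) if (v0[order[q]] > v0[item]) concordant++;
            sum += (double) concordant / p;
        }
        return 2 * sum / (n - 1) - 1;
    }
}
```

With v1 = {4, 3, 2, 1}, swapping the two bottom items in v0 = {4, 3, 1, 2} gives 7/9, while swapping the two top items in v0 = {3, 4, 2, 1} gives 1/3 (Kendall's τ would be 2/3 in both cases), which shows how AP correlation penalizes disagreements near the top more heavily. If the assumed argument roles are right, the sketch should agree, up to rounding, with the library's merge-based computation on small tie-free inputs.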
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/stat/CorrelationIndex.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/stat/CorrelationIndex.java
new file mode 100644
index 0000000..365443e
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/stat/CorrelationIndex.java
@@ -0,0 +1,344 @@
+package it.unimi.dsi.law.stat;
+
+import java.io.DataInputStream;
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.IOException;
+
+import com.martiansoftware.jsap.JSAPResult;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.doubles.DoubleIterators;
+import it.unimi.dsi.fastutil.io.FastBufferedInputStream;
+import it.unimi.dsi.fastutil.io.TextIO;
+import it.unimi.dsi.law.util.Precision;
+
+// RELEASE-STATUS: DIST
+
+/** An abstract class providing basic infrastructure for all classes computing some correlation index between two score vectors,
+ * such as {@link KendallTau}, {@link WeightedTau} and {@link AveragePrecisionCorrelation}.
+ *
+ * <p>Implementing classes just have to implement {@link #compute(double[], double[])} to get a wealth of support methods,
+ * including {@linkplain #loadAsDoubles(CharSequence, Class, boolean) loading data in different formats}
+ * and {@linkplain #parseInputTypes(JSAPResult) parsing file types}.
+ */
+
+public abstract class CorrelationIndex {
+
+ protected CorrelationIndex() {}
+
+ /** Computes the correlation between two score vectors.
+ *
+ * <p>Note that this method must be called with some care if you are tight on memory. More precisely, the two
+ * arguments should be built on the fly in the method call, and not stored in variables,
+ * as some of the argument arrays might be {@code null}'d during the execution of this method
+ * to free some memory: if the array is referenced elsewhere the garbage collector will not
+ * be able to collect it.
+ *
+ * @param v0 the first score vector.
+ * @param v1 the second score vector; in asymmetric correlation indices, this should be the reference score.
+ * @return the correlation.
+ */
+ public abstract double compute(double v0[], double v1[]);
+
+ /** Computes the correlation between two score vectors.
+ *
+ * @param f0 the binary file of doubles containing the first score vector.
+ * @param f1 the binary file of doubles containing the second score vector.
+ * @return the correlation.
+ */
+ public double computeDoubles(final CharSequence f0, final CharSequence f1) throws IOException {
+ return computeDoubles(f0, f1, Integer.MAX_VALUE);
+ }
+
+ /** Computes the correlation between two (possibly reversed) score vectors.
+ *
+ * @param f0 the binary file of doubles containing the first score vector.
+ * @param f1 the binary file of doubles containing the second score vector.
+ * @param reverse whether to reverse the ranking induced by the score vectors by loading opposite values.
+ * @return the correlation.
+ */
+ public double computeDoubles(final CharSequence f0, final CharSequence f1, final boolean reverse) throws IOException {
+ return computeDoubles(f0, f1, reverse, Integer.MAX_VALUE);
+ }
+
+ /** Computes the correlation between two score vectors with a given precision.
+ *
+ * @param f0 the binary file of doubles containing the first score vector.
+ * @param f1 the binary file of doubles containing the second score vector.
+ * @param digits the number of digits to be preserved when computing the correlation.
+ * @return the correlation.
+ * @see Precision#truncate(double[], int)
+ */
+ public double computeDoubles(final CharSequence f0, final CharSequence f1, final int digits) throws IOException {
+ return computeDoubles(f0, f1, false, digits);
+ }
+
+ /** Computes the correlation between two (possibly reversed) score vectors with a given precision.
+ *
+ * @param f0 the binary file of doubles containing the first score vector.
+ * @param f1 the binary file of doubles containing the second score vector.
+ * @param reverse whether to reverse the ranking induced by the score vectors by loading opposite values.
+ * @param digits the number of digits to be preserved when computing the correlation.
+ * @return the correlation.
+ * @see Precision#truncate(double[], int)
+ */
+ public double computeDoubles(final CharSequence f0, final CharSequence f1, final boolean reverse, final int digits) throws IOException {
+ return compute(f0, Double.class, f1, Double.class, reverse, digits);
+ }
+
+ /** Computes the correlation between two score vectors.
+ *
+ * @param f0 the binary file of floats containing the first score vector.
+ * @param f1 the binary file of floats containing the second score vector.
+ * @return the correlation.
+ */
+ public double computeFloats(final CharSequence f0, final CharSequence f1) throws IOException {
+ return computeFloats(f0, f1, Integer.MAX_VALUE);
+ }
+
+ /** Computes the correlation between two (possibly reversed) score vectors.
+ *
+ * @param f0 the binary file of floats containing the first score vector.
+ * @param f1 the binary file of floats containing the second score vector.
+ * @param reverse whether to reverse the ranking induced by the score vectors by loading opposite values.
+ * @return the correlation.
+ */
+ public double computeFloats(final CharSequence f0, final CharSequence f1, final boolean reverse) throws IOException {
+ return computeFloats(f0, f1, reverse, Integer.MAX_VALUE);
+ }
+
+ /** Computes the correlation between two score vectors with a given precision.
+ *
+ * @param f0 the binary file of floats containing the first score vector.
+ * @param f1 the binary file of floats containing the second score vector.
+ * @param digits the number of digits to be preserved when computing the correlation.
+ * @return the correlation.
+ * @see Precision#truncate(double[], int)
+ */
+ public double computeFloats(final CharSequence f0, final CharSequence f1, final int digits) throws IOException {
+ return computeFloats(f0, f1, false, digits);
+ }
+
+ /** Computes the correlation between two (possibly reversed) score vectors with a given precision.
+ *
+ * @param f0 the binary file of floats containing the first score vector.
+ * @param f1 the binary file of floats containing the second score vector.
+ * @param reverse whether to reverse the ranking induced by the score vectors by loading opposite values.
+ * @param digits the number of digits to be preserved when computing the correlation.
+ * @return the correlation.
+ * @see Precision#truncate(double[], int)
+ */
+ public double computeFloats(final CharSequence f0, final CharSequence f1, final boolean reverse, final int digits) throws IOException {
+ return compute(f0, Float.class, f1, Float.class, reverse, digits);
+ }
+
+ /** Computes the correlation between two score vectors.
+ *
+ * @param f0 the binary file of integers containing the first score vector.
+ * @param f1 the binary file of integers containing the second score vector.
+ * @return the correlation.
+ */
+ public double computeInts(final CharSequence f0, final CharSequence f1) throws IOException {
+ return computeInts(f0, f1, false);
+ }
+
+ /** Computes the correlation between two (possibly reversed) score vectors.
+ *
+ * @param f0 the binary file of integers containing the first score vector.
+ * @param f1 the binary file of integers containing the second score vector.
+ * @param reverse whether to reverse the ranking induced by the score vectors by loading opposite values.
+ * @return the correlation.
+ */
+ public double computeInts(final CharSequence f0, final CharSequence f1, final boolean reverse) throws IOException {
+ return compute(f0, Integer.class, f1, Integer.class, reverse, Integer.MAX_VALUE);
+ }
+
+ /** Computes the correlation between two score vectors.
+ *
+ * @param f0 the binary file of longs containing the first score vector.
+ * @param f1 the binary file of longs containing the second score vector.
+ * @return the correlation.
+ */
+ public double computeLongs(final CharSequence f0, final CharSequence f1) throws IOException {
+ return computeLongs(f0, f1, false);
+ }
+
+ /** Computes the correlation between two (possibly reversed) score vectors.
+ *
+ * @param f0 the binary file of longs containing the first score vector.
+ * @param f1 the binary file of longs containing the second score vector.
+ * @param reverse whether to reverse the ranking induced by the score vectors by loading opposite values.
+ * @return the correlation.
+ */
+ public double computeLongs(final CharSequence f0, final CharSequence f1, final boolean reverse) throws IOException {
+ return compute(f0, Long.class, f1, Long.class, reverse, Integer.MAX_VALUE);
+ }
+
+ /** Computes the correlation between two (possibly reversed) score vectors with a given precision.
+ *
+ * @param f0 the file containing the first score vector.
+ * @param inputType0 the input type of the first score vector.
+ * @param f1 the file containing the second score vector.
+ * @param inputType1 the input type of the second score vector.
+ * @param reverse whether to reverse the ranking induced by the score vectors by loading opposite values.
+ * @param digits the number of digits to be preserved when computing the correlation.
+ * @return the correlation.
+ * @see Precision#truncate(double[], int)
+ */
+ public double compute(final CharSequence f0, final Class<?> inputType0, final CharSequence f1, final Class<?> inputType1, final boolean reverse, final int digits) throws IOException {
+ return compute(Precision.truncate(loadAsDoubles(f0, inputType0, reverse), digits), Precision.truncate(loadAsDoubles(f1, inputType1, reverse), digits));
+ }
+
+ /** Computes the correlation between two (possibly reversed) score vectors.
+ *
+ * @param f0 the file containing the first score vector.
+ * @param inputType0 the input type of the first score vector.
+ * @param f1 the file containing the second score vector.
+ * @param inputType1 the input type of the second score vector.
+ * @param reverse whether to reverse the ranking induced by the score vectors by loading opposite values.
+ * @return the correlation.
+ * @see Precision#truncate(double[], int)
+ */
+ public double compute(final CharSequence f0, final Class<?> inputType0, final CharSequence f1, final Class<?> inputType1, final boolean reverse) throws IOException {
+ return compute(Precision.truncate(loadAsDoubles(f0, inputType0, reverse), Integer.MAX_VALUE), Precision.truncate(loadAsDoubles(f1, inputType1, reverse), Integer.MAX_VALUE));
+ }
+
+ /** Computes the correlation between two score vectors with a given precision.
+ *
+ * @param f0 the file containing the first score vector.
+ * @param inputType0 the input type of the first score vector.
+ * @param f1 the file containing the second score vector.
+ * @param inputType1 the input type of the second score vector.
+ * @param digits the number of digits to be preserved when computing the correlation.
+ * @return the correlation.
+ * @see Precision#truncate(double[], int)
+ */
+ public double compute(final CharSequence f0, final Class<?> inputType0, final CharSequence f1, final Class<?> inputType1, final int digits) throws IOException {
+ return compute(Precision.truncate(loadAsDoubles(f0, inputType0, false), digits), Precision.truncate(loadAsDoubles(f1, inputType1, false), digits));
+ }
+
+ /** Computes the correlation between two score vectors.
+ *
+ * @param f0 the file containing the first score vector.
+ * @param f1 the file containing the second score vector.
+ * @param inputType the input type.
+ * @return the correlation.
+ * @see Precision#truncate(double[], int)
+ */
+ public double compute(final CharSequence f0, final CharSequence f1, final Class<?> inputType) throws IOException {
+ return compute(f0, inputType, f1, inputType, Integer.MAX_VALUE);
+ }
+
+ /** Loads a vector of doubles, either in binary or textual form.
+ *
+ * @param f a filename.
+ * @param inputType the input type, expressed as a class: {@link Double}, {@link Float}, {@link Integer}, {@link Long}
+ * or {@link String} to denote a text file.
+ * @param reverse whether to reverse the ranking induced by the score vector by loading opposite values.
+ * @return an array of doubles obtained by reading <code>f</code>.
+ * @throws IllegalArgumentException if {@code reverse} is true, the type is integer or long and
+ * {@link Integer#MIN_VALUE} or {@link Long#MIN_VALUE}, respectively, appears in the file, as its
+ * opposite cannot be represented.
+ */
+ public static double[] loadAsDoubles(final CharSequence f, final Class<?> inputType, final boolean reverse) throws IOException {
+ final double[] array;
+ if (inputType == String.class) {
+ array = DoubleIterators.unwrap(TextIO.asDoubleIterator(f));
+ if (reverse) for(int i = array.length; i-- != 0;) array[i] = -array[i];
+ return array;
+ }
+ final File file = new File(f.toString());
+ long length;
+ final FileInputStream fis = new FileInputStream(file);
+ final DataInputStream dis = new DataInputStream(new FastBufferedInputStream(fis));
+ try {
+ if (inputType == Integer.class || inputType == Float.class) length = fis.getChannel().size() / 4;
+ else if (inputType == Long.class || inputType == Double.class) length = fis.getChannel().size() / 8;
+ else throw new IllegalArgumentException("Unsupported input type: " + inputType);
+ if (length > Integer.MAX_VALUE) throw new IllegalArgumentException("File too long: " + fis.getChannel().size()+ " bytes (" + length + " elements)");
+ array = new double[(int)length];
+
+ if (reverse) {
+ if (inputType == Float.class) for(int i = 0; i < length; i++) array[i] = -dis.readFloat();
+ if (inputType == Double.class) for(int i = 0; i < length; i++) array[i] = -dis.readDouble();
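+ if (inputType == Integer.class)
+ for(int i = 0; i < length; i++) {
+ final int x = dis.readInt();
+ // Check before negating: -Integer.MIN_VALUE overflows back to Integer.MIN_VALUE.
+ if (x == Integer.MIN_VALUE) throw new IllegalArgumentException("The score vector " + f + " contains Integer.MIN_VALUE, whose opposite cannot be represented");
+ array[i] = -x;
+ }
+ if (inputType == Long.class)
+ for(int i = 0; i < length; i++) {
+ final long x = dis.readLong();
+ // Check the long itself: comparing the converted double would also reject values close to
+ // Long.MAX_VALUE, whose opposites round to (double)Long.MIN_VALUE.
+ if (x == Long.MIN_VALUE) throw new IllegalArgumentException("The score vector " + f + " contains Long.MIN_VALUE, whose opposite cannot be represented");
+ array[i] = -x;
+ }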
+ }
+ else {
+ if (inputType == Float.class) for(int i = 0; i < length; i++) array[i] = dis.readFloat();
+ if (inputType == Double.class) for(int i = 0; i < length; i++) array[i] = dis.readDouble();
+ if (inputType == Integer.class) for(int i = 0; i < length; i++) array[i] = dis.readInt();
+ if (inputType == Long.class) for(int i = 0; i < length; i++) array[i] = dis.readLong();
+ }
+ }
+ finally {
+ dis.close();
+ }
+ return array;
+ }
+
+ private static final Class<?>[] DOUBLES_DOUBLES = new Class<?>[] { Double.class, Double.class };
+
+ /** Convenience method to extract from a {@link JSAPResult} instance the file type information provided by
+ * the user, or supply the default (doubles in binary form). We read the parameter {@code type}, which may
+ * contain either a single type (applied to both files) or two types separated by a colon. The types can be
+ * {@code double}, {@code float}, {@code int}, {@code long} or {@code text}. If the parameter is not
+ * specified, we return the type {@link Double} for both files.
+ *
+ * @param jsapResult the result of the parsing of a command line.
+ * @return an array containing two classes, representing the types of the files to be loaded (using {@link #loadAsDoubles(CharSequence, Class, boolean)}'s
+ * conventions).
+ * @throws IllegalArgumentException if a type is not one of {@code int}, {@code long}, {@code float}, {@code double} or {@code text}.
+ */
+ public static Class<?>[] parseInputTypes(final JSAPResult jsapResult) {
+ if (jsapResult.userSpecified("type")) {
+ Class<?>[] inputType = new Class<?>[2];
+ String[] type = new String[2];
+ String types = jsapResult.getString("type");
+ int pos = types.indexOf(':');
+ if (pos >= 0) {
+ type[0] = types.substring(0, pos);
+ type[1] = types.substring(pos + 1);
+ }
+ else type[0] = type[1] = types;
+ for (int i = 0; i < 2; i++)
+ if (type[i].equals("int")) inputType[i] = Integer.class;
+ else if (type[i].equals("long")) inputType[i] = Long.class;
+ else if (type[i].equals("float")) inputType[i] = Float.class;
+ else if (type[i].equals("double")) inputType[i] = Double.class;
+ else if (type[i].equals("text")) inputType[i] = String.class;
+ else throw new IllegalArgumentException("Type \"" + type[i] + "\" is not one of int, long, float, double, text");
+
+ return inputType;
+ }
+ else return DOUBLES_DOUBLES;
+ }
+}
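The `KendallTau` class that follows classifies each pair as concordant, discordant, an r-tie, an s-tie or a joint tie, and computes τ = (C − D) / [(N − T_r)(N − T_s)]^(1/2) in O(n log n) time with an `ExchangeCounter`. A minimal quadratic-time sketch that counts the five classes directly can serve as a test oracle on small inputs. `NaiveKendallTau` is a hypothetical name, not part of LAW, and unlike the library it does not special-case the situation in which both vectors consist entirely of ties (the library returns 1 there; the sketch returns NaN, as 0/0).

```java
/** Hypothetical O(n^2) reference implementation of Kendall's tau (tau_b); not part of LAW. */
final class NaiveKendallTau {
    private NaiveKendallTau() {}

    /** Returns -1, 0 or 1; unlike Double.compare, treats 0.0 and -0.0 as equal, as == does. */
    private static int sign(final double a, final double b) {
        return a < b ? -1 : a > b ? 1 : 0;
    }

    public static double compute(final double[] r, final double[] s) {
        if (r.length != s.length) throw new IllegalArgumentException("Array lengths differ: " + r.length + ", " + s.length);
        final int n = r.length;
        long concordant = 0, discordant = 0, rTies = 0, sTies = 0, jointTies = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                final int dr = sign(r[i], r[j]), ds = sign(s[i], s[j]);
                if (dr == 0) rTies++;
                if (ds == 0) sTies++;
                if (dr == 0 && ds == 0) jointTies++;
                // Both differences non-zero: same sign is concordant, opposite signs discordant.
                if (dr * ds > 0) concordant++;
                else if (dr * ds < 0) discordant++;
            }
        }
        final long pairs = (long) n * (n - 1) / 2;
        // Every pair is concordant, discordant or tied in at least one vector;
        // joint ties are counted in both rTies and sTies, hence the subtraction.
        assert concordant + discordant + rTies + sTies - jointTies == pairs;
        return (concordant - discordant) / Math.sqrt((double) (pairs - rTies) * (pairs - sTies));
    }
}
```

For example, r = {1, 1, 2} and s = {1, 2, 3} give C = 2, D = 0, T_r = 1, T_s = 0 and N = 3, hence τ = 2 / √6 ≈ 0.816.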
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/stat/KendallTau.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/stat/KendallTau.java
new file mode 100644
index 0000000..4c8f629
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/stat/KendallTau.java
@@ -0,0 +1,210 @@
+package it.unimi.dsi.law.stat;
+
+import java.io.IOException;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.Util;
+import it.unimi.dsi.fastutil.doubles.DoubleArrays;
+import it.unimi.dsi.law.util.ExchangeCounter;
+import it.unimi.dsi.law.util.Precision;
+
+// RELEASE-STATUS: DIST
+
+/** Computes Kendall's &tau; between two score vectors. More precisely, this class computes the formula given by
+ * Kendall in &ldquo;The treatment of ties in ranking problems&rdquo;, <i>Biometrika</i> 33:239&minus;251, 1945.
+ *
+ * <p>Note that in the literature the 1945 definition is often called &tau;<sub><i>b</i></sub>, and &tau; is reserved for
+ * the original coefficient (&ldquo;A new measure of rank correlation&rdquo;, <i>Biometrika</i> 30:81&minus;93, 1938). But
+ * this distinction is pointless, as the 1938 paper defines &tau; only for rankings with no ties, and the generalisation in the
+ * 1945 paper reduces exactly to the original definition if there are no ties.
+ *
+ * <P>Given two score vectors for a list of items,
+ * this class provides a {@linkplain #compute(double[], double[]) method to compute efficiently Kendall's &tau;}
+ * using an {@link ExchangeCounter}.
+ *
+ * <p>This class is a singleton: methods must be invoked on {@link #INSTANCE}.
+ * Additional methods inherited from {@link CorrelationIndex} make it possible to
+ * compute the correlation directly between two files, or to bound the number of significant digits.
+ *
+ * <p>More precisely, given <var>r</var><sub><var>i</var></sub> and <var>s</var><sub><var>i</var></sub>
+ * (<var>i</var> = 0, 1,&nbsp;&hellip;, <var>n</var>&nbsp;&minus;&nbsp;1), we say that a pair (<var>i</var>, <var>j</var>), <var>i</var>&lt;<var>j</var>, is
+ * <ul>
+ * <li><em>concordant</em> iff <var>r</var><sub><var>i</var></sub> &minus; <var>r</var><sub><var>j</var></sub> and
+ * <var>s</var><sub><var>i</var></sub> &minus; <var>s</var><sub><var>j</var></sub> are both non-zero and
+ * have the same sign;
+ * <li><em>discordant</em> iff <var>r</var><sub><var>i</var></sub> &minus; <var>r</var><sub><var>j</var></sub> and
+ * <var>s</var><sub><var>i</var></sub> &minus; <var>s</var><sub><var>j</var></sub> are both non-zero and
+ * have opposite signs;
+ * <li> an <em><var>r</var>-tie</em> iff <var>r</var><sub><var>i</var></sub> &minus; <var>r</var><sub><var>j</var></sub> = 0;
+ * <li> an <em><var>s</var>-tie</em> iff <var>s</var><sub><var>i</var></sub> &minus; <var>s</var><sub><var>j</var></sub> = 0;
+ * <li> a <em>joint tie</em> iff <var>r</var><sub><var>i</var></sub> &minus; <var>r</var><sub><var>j</var></sub> = 0
+ * and <var>s</var><sub><var>i</var></sub> &minus; <var>s</var><sub><var>j</var></sub> = 0.
+ * </ul>
+ *
+ * <P>Let <var>C</var>, <var>D</var>, <var>T<sub>r</sub></var>, <var>T<sub>s</sub></var>, <var>J</var>
+ * be the number of concordant pairs, discordant pairs,
+ * <var>r</var>-ties, <var>s</var>-ties and joint ties, respectively, and <var>N</var> = <var>n</var>(<var>n</var> &minus; 1)/2. Of course
+ * <var>C</var>+<var>D</var>+<var>T<sub>r</sub></var>+<var>T<sub>s</sub></var> &minus; <var>J</var> = <var>N</var>.
+ * Kendall's &tau; is now
+ * <blockquote>
+ * &tau; = (<var>C</var> &minus; <var>D</var>) / [(<var>N</var> &minus; <var>T<sub>r</sub></var>)(<var>N</var> &minus; <var>T<sub>s</sub></var>)]<sup>1/2</sup>
+ * </blockquote>
+ *
+ * <p>A main method is provided for command-line usage.
+ */
+
+public class KendallTau extends CorrelationIndex {
+ private static final Logger LOGGER = LoggerFactory.getLogger(KendallTau.class);
+
+ private KendallTau() {}
+
+ /** The singleton instance of this class. */
+ public static final KendallTau INSTANCE = new KendallTau();
+
+ /** Computes Kendall's &tau; between two score vectors.
+ *
+ * <p>Note that this method must be called with some care. More precisely, the two
+ * arguments should be built on-the-fly in the method call, and not stored in variables,
+ * as the first argument array will be {@code null}'d during the execution of this method
+ * to free some memory: if the array is referenced elsewhere the garbage collector will not
+ * be able to collect it.
+ *
+ * @param v0 the first score vector.
+ * @param v1 the second score vector.
+ * @return Kendall's &tau;.
+ */
+ public double compute(double v0[], final double v1[]) {
+ if (v0.length != v1.length) throw new IllegalArgumentException("Array lengths differ: " + v0.length + ", " + v1.length);
+ final int length = v0.length;
+ if (length == 0) throw new IllegalArgumentException("Kendall's τ is undefined on empty rankings");
+
+ final int[] perm = Util.identity(length);
+
+ // First of all we sort perm stably by the first rank vector (higher ranks come first!), and then by the second in case of a tie.
+ DoubleArrays.radixSortIndirect(perm, v0, v1, true);
+
+ // Next, we compute the number of joint ties.
+ int i, first = 0;
+ long t = 0;
+ for(i = 1; i < length; i++) {
+ if (v0[perm[first]] != v0[perm[i]] || v1[perm[first]] != v1[perm[i]]) {
+ t += ((i - first) * (i - first - 1L)) / 2;
+ first = i;
+ }
+ }
+
+ t += ((i - first) * (i - first - 1L)) / 2; // Last block
+
+ if (LOGGER.isDebugEnabled()) LOGGER.debug("Joint ties: " + t);
+
+ // Now we compute the number of ties in the first score vector (v0).
+ first = 0;
+ long u = 0;
+ for(i = 1; i < length; i++) {
+ if (v0[perm[first]] != v0[perm[i]]) {
+ u += ((i - first) * (i - first - 1L)) / 2;
+ first = i;
+ }
+ }
+
+ u += ((i - first) * (i - first - 1L)) / 2; // Last block
+
+ if (LOGGER.isDebugEnabled()) LOGGER.debug("Ties after first ordering: " + u);
+
+ // We null v0 so that the garbage collector can reclaim it.
+ v0 = null;
+
+ // Now we use an exchange counter to order stably by the second rank and count the number of exchanges (i.e., discordances).
+ final long exchanges = new ExchangeCounter(perm, v1).count();
+
+ if (LOGGER.isDebugEnabled()) LOGGER.debug("Exchanges: " + exchanges);
+
+ // Now we compute the number of ties in the second score vector (v1), by which perm is now sorted.
+ first = 0;
+ long v = 0;
+ for(i = 1; i < length; i++) {
+ if (v1[perm[first]] != v1[perm[i]]) {
+ v += ((i - first) * (i - first - 1L)) / 2;
+ first = i;
+ }
+ }
+
+ v += ((i - first) * (i - first - 1L)) / 2; // Last block
+
+ if (LOGGER.isDebugEnabled()) LOGGER.debug("Ties after second ordering: " + v);
+
+ final long tot = (length * (length - 1L)) / 2;
+
+ if (LOGGER.isDebugEnabled()) LOGGER.debug("Combinations of order two: " + tot);
+
+ // Special case for all ties in both ranks
+ if (tot == u && tot == v) return 1;
+ // Ensure interval [-1..1] (small deviations might appear because of numerical errors).
+ return Math.min(1, Math.max(-1, ((tot - (v + u - t)) - 2.0 * exchanges) / (Math.sqrt((double) (tot - u) * (double) (tot - v)))));
+ }
+
+ public static void main(String[] arg) throws NumberFormatException, IOException, JSAPException {
+ final SimpleJSAP jsap = new SimpleJSAP(KendallTau.class.getName(),
+ "Computes Kendall's τ between the score vectors contained in two given files. " +
+ "The two files must contain the same number of doubles, written " +
+ "in Java binary format. The option -t makes it possible to specify a different " +
+ "type (possibly for each input file)." +
+ "\n" +
+ "If one or more truncations are specified with the option -T, the values of " +
+ "Kendall's τ for the given files truncated to the given number of binary " +
+ "fractional digits, in the same order, will be printed to standard output. " +
+ "If there is more than one value, the vectors will be loaded in memory just " +
+ "once and copied across computations.",
+ new Parameter[] {
+ new FlaggedOption("type", JSAP.STRING_PARSER, "double", JSAP.NOT_REQUIRED, 't', "type", "The type of the input files, of the form type[:type] where type is one of int, long, float, double, text"),
+ new FlaggedOption("digits", JSAP.INTEGER_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'T', "truncate", "Truncate inputs to the given number of binary fractional digits.").setAllowMultipleDeclarations(true),
+ new UnflaggedOption("file0", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The first rank file."),
+ new UnflaggedOption("file1", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The second rank file."),
+ }
+ );
+
+ final JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final String f0 = jsapResult.getString("file0");
+ final String f1 = jsapResult.getString("file1");
+ final Class<?>[] inputType = parseInputTypes(jsapResult);
+
+ int[] digits = jsapResult.getIntArray("digits");
+ if (digits.length == 0) digits = new int[] { Integer.MAX_VALUE };
+
+ if (digits.length == 1) System.out.println(INSTANCE.compute(f0, inputType[0], f1, inputType[1], digits[0]));
+ else {
+ final double[] v0 = loadAsDoubles(f0, inputType[0], false), v1 = loadAsDoubles(f1, inputType[1], false);
+ for(int d: digits) System.out.println(INSTANCE.compute(Precision.truncate(v0.clone(), d), Precision.truncate(v1.clone(), d)));
+ }
+ }
+}
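The value returned by KendallTau.compute above, ((tot - (v + u - t)) - 2 * exchanges) / sqrt((tot - u) * (tot - v)), is the number of concordant pairs minus the number of discordant pairs, divided by the geometric mean of the counts of pairs not tied in each vector (the tie-corrected form usually called tau-b). The method gets there by sorting and counting exchanges. For cross-checking small inputs, a quadratic reference that evaluates the same quantity pair by pair may be handy. This is a minimal sketch under stated assumptions: the class KendallTauBruteForce and its methods are hypothetical and not part of LAW, NaN scores are not handled, and the special case returning 1 when both vectors are entirely tied is copied from the code above.

```java
// Hypothetical O(n^2) reference for the tie-corrected Kendall's tau computed by
// KendallTau.compute(): (concordant - discordant) / sqrt((tot - u) * (tot - v)).
// Not part of LAW; meant only for cross-checking small inputs.
class KendallTauBruteForce {

    // Sign of a - b, using the same equality test (==) as the LAW code; NaNs are not handled.
    static int sgn(double a, double b) {
        return a < b ? -1 : a > b ? 1 : 0;
    }

    static double tauB(double[] x, double[] y) {
        if (x.length != y.length) throw new IllegalArgumentException("Array lengths differ: " + x.length + ", " + y.length);
        if (x.length == 0) throw new IllegalArgumentException("Kendall's tau is undefined on empty rankings");
        long concordantMinusDiscordant = 0;
        long untiedX = 0; // pairs not tied in x, i.e., tot - u
        long untiedY = 0; // pairs not tied in y, i.e., tot - v
        for (int i = 0; i < x.length; i++) {
            for (int j = i + 1; j < x.length; j++) {
                final int sx = sgn(x[i], x[j]), sy = sgn(y[i], y[j]);
                if (sx != 0) untiedX++;
                if (sy != 0) untiedY++;
                concordantMinusDiscordant += sx * sy; // +1 concordant, -1 discordant, 0 if tied in either
            }
        }
        // Same special case as the LAW code: all ties in both vectors.
        if (untiedX == 0 && untiedY == 0) return 1;
        return concordantMinusDiscordant / Math.sqrt((double) untiedX * untiedY);
    }

    public static void main(String[] args) {
        System.out.println(tauB(new double[] { 1, 2, 3 }, new double[] { 1, 3, 2 })); // 1/3
        System.out.println(tauB(new double[] { 1, 1, 2 }, new double[] { 1, 2, 3 })); // 2/sqrt(6), about 0.8165
    }
}
```

In the second call the pair tied in x drops out of both the numerator and the first untied count, which matches the LAW computation (tot = 3, u = 1, v = t = 0, no exchanges).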
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/stat/WeightedTau.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/stat/WeightedTau.java
new file mode 100644
index 0000000..3afe05e
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/stat/WeightedTau.java
@@ -0,0 +1,386 @@
+package it.unimi.dsi.law.stat;
+
+import java.io.IOException;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/*
+ * Copyright (C) 2013-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.Util;
+import it.unimi.dsi.fastutil.doubles.DoubleArrays;
+import it.unimi.dsi.fastutil.ints.AbstractInt2DoubleFunction;
+import it.unimi.dsi.fastutil.ints.Int2DoubleFunction;
+import it.unimi.dsi.fastutil.ints.IntArrays;
+import it.unimi.dsi.law.util.ExchangeWeigher;
+import it.unimi.dsi.law.util.Precision;
+
+// RELEASE-STATUS: DIST
+
+/** Computes the weighted &tau; between two score vectors. More precisely, this class computes the formula given by
+ * Sebastiano Vigna in &ldquo;<a href="http://vigna.di.unimi.it/papers.php#VigWCIRT">A weighted correlation index for rankings
+ * with ties</a>&rdquo;, <i>Proc&#46; of the 24th International World Wide Web
+ * Conference</i>, pages 1166&ndash;1176, 2015, ACM Press, using the algorithm described therein (see details below).
+ *
+ * <P>Given two score vectors for a list of items,
+ * this class provides a {@linkplain #compute(double[], double[]) method to efficiently compute the weighted &tau;}
+ * using an {@link ExchangeWeigher}.
+ *
+ * <p>Instances of this class are immutable. At creation time you can specify a
+ * <em>weigher</em> that turns indices into weights, and
+ * whether to combine weights additively or multiplicatively.
+ * Ready-made weighers include {@link #HYPERBOLIC_WEIGHER}, which is the weigher of choice. Alternatives include
+ * {@link #LOGARITHMIC_WEIGHER} and {@link #QUADRATIC_WEIGHER}.
+ *
+ * <p>Additional methods inherited from {@link CorrelationIndex} make it possible to
+ * compute the weighted &tau; between two files directly, to bound the number of significant digits, or
+ * to reverse the standard association between scores and ranks (by default,
+ * a larger score corresponds to a higher rank, i.e., to a smaller rank index; the largest score gets
+ * rank 0).
+ *
+ * <p>The <em>weighted</em> &tau; is defined as follows: consider a <em>rank</em> function &rho; (returning
+ * natural numbers or &infin;) that provides a <em>ground truth</em>&mdash;it tells us which elements are more or less important. Consider
+ * also a weight function <var>w</var>(&minus;, &minus;) associating with each pair of ranks a nonnegative real number. We define
+ * the <em>rank-weighted &tau;</em> by
+ * <div align=center>
+ * &#x3008;<b><var>r</var></b>, <b><var>s</var></b>&#x3009;<sub>&rho;,<var>w</var></sub> = <big>&Sigma;</big><sub><var>i</var>,&nbsp;<var>j</var></sub>
+ * sgn(<var>r</var><sub><var>i</var></sub> &minus; <var>r</var><sub><var>j</var></sub>)
+ * sgn(<var>s</var><sub><var>i</var></sub> &minus; <var>s</var><sub><var>j</var></sub>) <var>w</var>(&rho;(<var>i</var>), &rho;(<var>j</var>))
+ * </div>
+ * <div align=center>
+ * &#x2016;<b><var>r</var></b>&#x2016;<sub>&rho;,<var>w</var></sub> = &#x3008;<b><var>r</var></b>, <b><var>r</var></b>&#x3009;<sub>&rho;,<var>w</var></sub><sup>1/2</sup>
+ * </div>
+ * <div align=center>
+ * &tau;<sub>&rho;,<var>w</var></sub>(<b><var>r</var></b>, <b><var>s</var></b>) = &#x3008;<b><var>r</var></b>, <b><var>s</var></b>&#x3009;<sub>&rho;,<var>w</var></sub> / (&#x2016;<b><var>r</var></b>&#x2016;<sub>&rho;,<var>w</var></sub> &#x2016;<b><var>s</var></b>&#x2016;<sub>&rho;,<var>w</var></sub>).
+ * </div>
+ *
+ * <p>The weight function can be specified by giving a weigher <var>f</var> (e.g., {@link #HYPERBOLIC_WEIGHER}) and a combination
+ * strategy, which can be additive or multiplicative.
+ * The weight of the exchange between <var>i</var> and <var>j</var>
+ * is then <var>f</var>(&rho;(<var>i</var>)) &#9679; <var>f</var>(&rho;(<var>j</var>)), where &#9679; is the chosen combinator (addition or multiplication).
+ *
+ * <p>Now, consider the rank function &rho;<sub><b><var>r</var></b>, <b><var>s</var></b></sub> induced
+ * by the lexicographical order by <b><var>r</var></b> and <b><var>s</var></b>. We define
+ * <div align=center>
+ * &tau;<sub><var>w</var></sub> = (&tau;<sub>&rho;<sub><b><var>r</var></b>, <b><var>s</var></b></sub>, <var>w</var></sub> + &tau;<sub>&rho;<sub><b><var>s</var></b>, <b><var>r</var></b></sub>, <var>w</var></sub>) / 2.
+ * </div>
+ *
+ * <p>In particular, the (additive) <em>hyperbolic &tau;</em> is defined by the weigher <var>h</var>(<var>i</var>) = 1 / (<var>i</var> + 1), combined additively:
+ * <div align=center>
+ * &tau;<sub>h</sub> = (&tau;<sub>&rho;<sub><b><var>r</var></b>, <b><var>s</var></b></sub>, <var>h</var></sub> + &tau;<sub>&rho;<sub><b><var>s</var></b>, <b><var>r</var></b></sub>, <var>h</var></sub>) / 2.
+ * </div>
+ *
+ * <p>The methods inherited from {@link CorrelationIndex} compute the formula above using the provided weigher
+ * and combination method. A ready-made instance {@link #HYPERBOLIC} can be used to compute the additive hyperbolic &tau;. An
+ * <i>ad hoc</i> {@linkplain #compute(double[], double[], int[]) method} can instead compute &tau;<sub>&rho;,<var>w</var></sub>.
+ *
+ * <p>A main method is provided for command-line usage.
+ */
+
+public class WeightedTau extends CorrelationIndex {
+ private final static Logger LOGGER = LoggerFactory.getLogger(WeightedTau.class);
+
+ public static abstract class AbstractWeigher extends AbstractInt2DoubleFunction {
+ private static final long serialVersionUID = 1L;
+ @Override
+ public boolean containsKey(final int x) {
+ return x >= 0;
+ }
+ @Override
+ public int size() {
+ return -1;
+ }
+ }
+
+ private static final class HyperbolicWeigher extends AbstractWeigher {
+ private static final long serialVersionUID = 1L;
+ @Override
+ public double get(final int x) {
+ return 1. / (x + 1);
+ }
+ }
+
+ /** A hyperbolic weigher (the default one). Rank <var>x</var> has weight 1 / (<var>x</var> + 1). */
+ public static final Int2DoubleFunction HYPERBOLIC_WEIGHER = new HyperbolicWeigher();
+
+ private static final class QuadraticWeigher extends AbstractWeigher {
+ private static final long serialVersionUID = 1L;
+ @Override
+ public double get(final int x) {
+ double xPlus1 = x + 1.;
+ return 1. / (xPlus1 * xPlus1);
+ }
+ }
+
+ /** A quadratic weigher. Rank <var>x</var> has weight 1 / (<var>x</var> + 1)<sup>2</sup>. */
+ public static final Int2DoubleFunction QUADRATIC_WEIGHER = new QuadraticWeigher();
+
+ private static final class LogarithmicWeigher extends AbstractWeigher {
+ private static final long serialVersionUID = 1L;
+ @Override
+ public double get(final int x) {
+ return 1. / Math.log(x + Math.E);
+ }
+ }
+
+ /** A logarithmic weigher. Rank <var>x</var> has weight 1 / ln(<var>x</var> + <var>e</var>). */
+ public static final Int2DoubleFunction LOGARITHMIC_WEIGHER = new LogarithmicWeigher();
+
+ private static final class ZeroWeigher extends AbstractWeigher {
+ private static final long serialVersionUID = 1L;
+ @Override
+ public double get(final int x) {
+ return 0;
+ }
+ }
+
+ /** A constant zero weigher. */
+ public static final Int2DoubleFunction ZERO_WEIGHER = new ZeroWeigher();
+
+ /** A singleton instance of the symmetric hyperbolic additive &tau;.*/
+ public final static WeightedTau HYPERBOLIC = new WeightedTau();
+
+ /** The weigher. */
+ private final Int2DoubleFunction weigher;
+ /** Whether to multiply weights, rather than adding them. */
+ private final boolean multiplicative;
+
+ /** Create an additive hyperbolic &tau;.
+ */
+ public WeightedTau() {
+ this(HYPERBOLIC_WEIGHER);
+ }
+
+ /** Create an additive weighted &tau; using the specified weigher.
+ *
+ * @param weigher a weigher.
+ */
+ public WeightedTau(final Int2DoubleFunction weigher) {
+ this(weigher, false);
+ }
+
+ /** Create an additive or multiplicative weighted &tau; using the specified weigher and combination strategy.
+ *
+ * @param weigher a weigher.
+ * @param multiplicative if true, weights are combined multiplicatively, rather than additively.
+ */
+ public WeightedTau(final Int2DoubleFunction weigher, final boolean multiplicative) {
+ this.weigher = weigher;
+ this.multiplicative = multiplicative;
+ }
+
+ /** Computes the symmetrized weighted &tau; between two score vectors.
+ *
+ * @param v0 the first score vector.
+ * @param v1 the second score vector.
+ * @return the symmetric weighted &tau;.
+ */
+ public double compute(final double v0[], final double v1[]) {
+ // Ensure interval [-1..1] (small deviations might appear because of numerical errors).
+ return Math.min(1, Math.max(-1, (compute(v0, v1, null) + compute(v1, v0, null)) / 2));
+ }
+
+ /** Computes the weighted &tau; between two score vectors, given a reference rank.
+ *
+ * <p>Note that this method must be called with some care. More precisely, the two
+ * arguments should be built on-the-fly in the method call, and not stored in variables,
+ * as the second argument array will be {@code null}'d during the execution of this method
+ * to free some memory: if the array is referenced elsewhere the garbage collector will not
+ * be able to collect it.
+ *
+ * @param v0 the first score vector.
+ * @param v1 the second score vector.
+ * @param rank the &ldquo;ground truth&rdquo; ranking used to weight exchanges, or {@code null} to use the
+ * ranking induced lexicographically by {@code v1} and {@code v0} as ground truth.
+ * @return the weighted &tau;.
+ */
+ public double compute(final double v0[], double v1[], int[] rank) {
+ if (v0.length != v1.length) throw new IllegalArgumentException("Array lengths differ: " + v0.length + ", " + v1.length);
+ final int length = v0.length;
+ if (length == 0) throw new IllegalArgumentException("The weighted τ is undefined on empty rankings");
+ if (rank != null && rank.length != length) throw new IllegalArgumentException("The score array length (" + length + ") and the rank array length (" + rank.length + ") do not match");
+
+ final int[] perm = Util.identity(length);
+
+ // First of all we sort perm stably by the second score vector, and then by the first in case of a tie.
+ DoubleArrays.radixSortIndirect(perm, v1, v0, true);
+
+ if (rank == null) {
+ // To generate a rank array, we must first reverse the permutation (to get higher ranks first) and then invert it.
+ rank = perm.clone();
+ IntArrays.reverse(rank);
+ Util.invertPermutationInPlace(rank);
+ }
+
+ // Next, we compute the weight of joint ties.
+ int i, first = 0;
+ double t = 0;
+ double w = weigher.get(rank[perm[first]]);
+ double s = w;
+ double sq = w * w;
+
+ for(i = 1; i < length; i++) {
+ if (v0[perm[first]] != v0[perm[i]] || v1[perm[first]] != v1[perm[i]]) {
+ t += multiplicative ? (s * s - sq) / 2 : s * (i - first - 1);
+ first = i;
+ s = sq = 0;
+ }
+ w = weigher.get(rank[perm[i]]);
+ s += w;
+ sq += w * w;
+ }
+
+ t += multiplicative ? (s * s - sq) / 2 : s * (i - first - 1); // Last block
+
+ if (LOGGER.isDebugEnabled()) LOGGER.debug("Weight of joint ties: " + t);
+
+ // Now we compute the weight of ties in the second score vector.
+ first = 0;
+ double v = 0;
+ w = weigher.get(rank[perm[first]]);
+ s = w;
+ sq = w * w;
+ for(i = 1; i < length; i++) {
+ if (v1[perm[first]] != v1[perm[i]]) {
+ v += multiplicative ? (s * s - sq) / 2 : s * (i - first - 1);
+ first = i;
+ s = sq = 0;
+ }
+ w = weigher.get(rank[perm[i]]);
+ s += w;
+ sq += w * w;
+ }
+
+ v += multiplicative ? (s * s - sq) / 2 : s * (i - first - 1); // Last block
+
+ if (LOGGER.isDebugEnabled()) LOGGER.debug("Weight of ties in the second score vector: " + v);
+
+ // We null v1 so that the garbage collector can reclaim it.
+ v1 = null;
+
+ // Now we use an exchange weigher to order stably by the first score vector and weigh the exchanges (i.e., discordances).
+ final double exchanges = new ExchangeWeigher(weigher, perm, v0, rank, multiplicative, new int[length]).weigh();
+ if (LOGGER.isDebugEnabled()) LOGGER.debug("Weight of exchanges: " + exchanges);
+
+ // Now we compute the weight of ties in the first score vector.
+ first = 0;
+ double u = 0;
+ w = weigher.get(rank[perm[first]]);
+ s = w;
+ sq = w * w;
+ for(i = 1; i < length; i++) {
+ if (v0[perm[first]] != v0[perm[i]]) {
+ u += multiplicative ? (s * s - sq) / 2 : s * (i - first - 1);
+ first = i;
+ s = sq = 0;
+ }
+ w = weigher.get(rank[perm[i]]);
+ s += w;
+ sq += w * w;
+ }
+
+ u += multiplicative ? (s * s - sq) / 2 : s * (i - first - 1); // Last block
+
+ if (LOGGER.isDebugEnabled()) LOGGER.debug("Weight of ties in the first score vector: " + u);
+
+ s = sq = 0;
+ for(i = 0; i < length; i++) {
+ w = weigher.get(rank[perm[i]]);
+ s += w;
+ sq += w * w;
+ }
+
+ final double tot = multiplicative ? (s * s - sq) / 2 : s * (length - 1);
+
+ if (LOGGER.isDebugEnabled()) LOGGER.debug("Total weight: " + tot);
+ // Special case for all ties in both ranks
+ if (tot == u && tot == v) return 1;
+
+ // System.out.println(tot + " " + u + " " + v + " " + exchanges + " -> " + Math.min(1, Math.max(-1, (tot - v - u + t - 2 * exchanges) / Math.sqrt((tot - u) * (tot - v)))));
+
+ return Math.min(1, Math.max(-1, (tot - v - u + t - 2 * exchanges) / Math.sqrt((tot - u) * (tot - v))));
+ }
+
+
+ public static void main(String[] arg) throws NumberFormatException, IOException, JSAPException {
+ final SimpleJSAP jsap = new SimpleJSAP(WeightedTau.class.getName(),
+ "Computes a weighted correlation index between two given score files. " +
+ "By default, the index is a symmetric additive hyperbolic τ, but you can set a different choice " +
+ "using the available options. Note that scores need not be distinct (i.e., you can have an arbitrary number of ties)." +
+ "\n" +
+ "By default, the two files must contain the same number of doubles, written " +
+ "in Java binary (DataOutput) format. The option -t makes it possible to specify a different " +
+ "type (possibly for each input file)." +
+ "\n" +
+ "If one or more truncations are specified with the option -T, the values " +
+ "of the specified weighted correlation index for the given files truncated to the given number of binary " +
+ "fractional digits, in the same order, will be printed to standard output. " +
+ "If there is more than one value, the vectors will be loaded in memory just " +
+ "once and copied across computations.",
+ new Parameter[] {
+ new Switch("reverse", 'r', "reverse", "Use reverse ranks (that is, rank decreases as score increases)."),
+ new Switch("logarithmic", 'l', "logarithmic", "Use a logarithmic (instead of hyperbolic) weight."),
+ new Switch("quadratic", 'q', "quadratic", "Use a quadratic (instead of hyperbolic) weight."),
+ new Switch("multiplicative", 'm', "multiplicative", "Use a multiplicative (instead of additive) combination of weights."),
+ new FlaggedOption("type", JSAP.STRING_PARSER, "double", JSAP.NOT_REQUIRED, 't', "type", "The type of the input files, of the form type[:type] where type is one of int, long, float, double, text"),
+ new FlaggedOption("digits", JSAP.INTEGER_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'T', "truncate", "Truncate inputs to the given number of binary fractional digits.").setAllowMultipleDeclarations(true),
+ new UnflaggedOption("file0", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The first score file."),
+ new UnflaggedOption("file1", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The second score file."),
+ }
+ );
+
+ final JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final String f0 = jsapResult.getString("file0");
+ final String f1 = jsapResult.getString("file1");
+ final boolean reverse = jsapResult.userSpecified("reverse");
+
+ final boolean logarithmic = jsapResult.userSpecified("logarithmic");
+ final boolean quadratic = jsapResult.userSpecified("quadratic");
+ final boolean multiplicative = jsapResult.userSpecified("multiplicative");
+ if (logarithmic && quadratic) throw new IllegalArgumentException("You cannot specify logarithmic and quadratic weighting at the same time");
+
+ final Class<?>[] inputType = parseInputTypes(jsapResult);
+
+ int[] digits = jsapResult.getIntArray("digits");
+ if (digits.length == 0) digits = new int[] { Integer.MAX_VALUE };
+
+ final WeightedTau weightedTau = new WeightedTau(logarithmic
+ ? LOGARITHMIC_WEIGHER
+ : quadratic
+ ? QUADRATIC_WEIGHER
+ : HYPERBOLIC_WEIGHER, multiplicative);
+
+ if (digits.length == 1) System.out.println(weightedTau.compute(f0, inputType[0], f1, inputType[1], reverse, digits[0]));
+ else {
+ final double[] v0 = loadAsDoubles(f0, inputType[0], reverse), v1 = loadAsDoubles(f1, inputType[1], reverse);
+ for(int d: digits) System.out.println(weightedTau.compute(Precision.truncate(v0.clone(), d), Precision.truncate(v1.clone(), d)));
+ }
+ }
+}
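To see the definition in the WeightedTau Javadoc at work on small inputs, the quadratic sketch below evaluates the rank-weighted inner product pair by pair with the hyperbolic weigher 1 / (x + 1), combines the two weights additively or multiplicatively, and averages over the two lexicographic ranks induced by (r, s) and by (s, r). It is an illustration under stated assumptions, not LAW's sort-based algorithm built on ExchangeWeigher: the class WeightedTauBruteForce and its methods are hypothetical, and NaN scores are not handled.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical O(n^2) reference for the symmetric weighted tau defined in the WeightedTau
// Javadoc, using the hyperbolic weigher 1/(x + 1). Not part of LAW; for small inputs only.
class WeightedTauBruteForce {

    static double hyperbolic(int rank) {
        return 1.0 / (rank + 1);
    }

    static int sgn(double a, double b) {
        return a < b ? -1 : a > b ? 1 : 0;
    }

    // Rank induced by the lexicographic order on (primary, secondary): the largest pair gets rank 0.
    // Items tied in both vectors get consecutive ranks in arbitrary order; the result does not depend on it.
    static int[] lexRank(double[] primary, double[] secondary) {
        Integer[] order = IntStream.range(0, primary.length).boxed().toArray(Integer[]::new);
        Arrays.sort(order, (a, b) -> {
            int c = sgn(primary[b], primary[a]); // descending by primary score
            return c != 0 ? c : sgn(secondary[b], secondary[a]);
        });
        int[] rank = new int[primary.length];
        for (int k = 0; k < order.length; k++) rank[order[k]] = k;
        return rank;
    }

    // tau_{rho,w}(r, s) straight from the definition. Summing over i < j instead of all (i, j)
    // halves the inner product and both squared norms, which cancels in the ratio.
    static double tauRho(double[] r, double[] s, int[] rho, boolean multiplicative) {
        double rs = 0, rr = 0, ss = 0;
        for (int i = 0; i < r.length; i++) {
            for (int j = i + 1; j < r.length; j++) {
                double fi = hyperbolic(rho[i]), fj = hyperbolic(rho[j]);
                double w = multiplicative ? fi * fj : fi + fj;
                int a = sgn(r[i], r[j]), b = sgn(s[i], s[j]);
                rs += a * b * w;
                rr += a * a * w;
                ss += b * b * w;
            }
        }
        if (rr == 0 && ss == 0) return 1; // all ties in both vectors, as in the LAW code
        return rs / Math.sqrt(rr * ss);
    }

    // Symmetric weighted tau: average over the ranks induced by (r, s) and by (s, r).
    static double tau(double[] r, double[] s, boolean multiplicative) {
        return (tauRho(r, s, lexRank(r, s), multiplicative) + tauRho(r, s, lexRank(s, r), multiplicative)) / 2;
    }

    public static void main(String[] args) {
        double[] r = { 4, 3, 2, 1 };
        System.out.println(tau(r, new double[] { 3, 4, 2, 1 }, false)); // swap at the top: approximately 0.52
        System.out.println(tau(r, new double[] { 4, 3, 1, 2 }, false)); // swap at the bottom: approximately 0.813
    }
}
```

With r = {4, 3, 2, 1}, a single exchange between the two top items costs weight 1 + 1/2 out of a total of 6.25, while the same exchange between the two bottom items costs only 1/3 + 1/4, so the hyperbolic weigher penalizes disagreements at the top much more heavily.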
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/stat/package.html b/third_party/law-2.5.1/src/it/unimi/dsi/law/stat/package.html
new file mode 100644
index 0000000..05df658
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/stat/package.html
@@ -0,0 +1,12 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<!-- RELEASE-STATUS: DIST -->
+<html>
+ <head>
+ <title>Statistical tools</title>
+ </head>
+
+ <body>
+
+ <P>Statistical tools (in particular, {@linkplain it.unimi.dsi.law.stat.KendallTau Kendall's &tau;}) for large-size data.
+ </body>
+</html>
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/util/CRC64.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/util/CRC64.java
new file mode 100644
index 0000000..11fe87f
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/util/CRC64.java
@@ -0,0 +1,146 @@
+package it.unimi.dsi.law.util;
+
+/*
+ * Copyright (C) 2007-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.lang.MutableString;
+
+//RELEASE-STATUS: DIST
+
+/** Provides static methods to compute 64-bit CRCs of strings and byte arrays. It
+ * uses the primitive polynomial
+ * <var>x</var><sup>64</sup>+<var>x</var><sup>4</sup>+<var>x</var><sup>3</sup>+<var>x</var>+1.
+ */
+public class CRC64 {
+
+ private CRC64() {}
+
+ /** This array stores precomputed values: T[i][j] is the CRC of {@code j << (64 + 8 * i)}. */
+ private final static long[][] T =
+ {
+ { 0x0000000000000000L, 0x1b00000000000000L, 0x3600000000000000L, 0x2d00000000000000L, 0x6c00000000000000L, 0x7700000000000000L, 0x5a00000000000000L, 0x4100000000000000L, 0xd800000000000000L, 0xc300000000000000L, 0xee00000000000000L, 0xf500000000000000L, 0xb400000000000000L, 0xaf00000000000000L, 0x8200000000000000L, 0x9900000000000000L, 0xb00000000000001bL, 0xab0000000000001bL, 0x860000000000001bL, 0x9d0000000000001bL, 0xdc0000000000001bL, 0xc70000000000001bL, 0xea0000000000001bL, 0xf10000000000001bL, 0x680000000000001bL, 0x730000000000001bL, 0x5e0000000000001bL, 0x450000000000001bL, 0x040000000000001bL, 0x1f0000000000001bL, 0x320000000000001bL, 0x290000000000001bL, 0x600000000000002dL, 0x7b0000000000002dL, 0x560000000000002dL, 0x4d0000000000002dL, 0x0c0000000000002dL, 0x170000000000002dL, 0x3a0000000000002dL, 0x210000000000002dL, 0xb80000000000002dL, 0xa30000000000002dL, 0x8e0000000000002dL, 0x950000000000002dL, 0xd40000000000002dL, 0xcf0000000000002dL, 0xe20000000000002dL, 0xf90000000000002dL, 0xd000000000000036L, 0xcb00000000000036L, 0xe600000000000036L, 0xfd00000000000036L, 0xbc00000000000036L, 0xa700000000000036L, 0x8a00000000000036L, 0x9100000000000036L, 0x0800000000000036L, 0x1300000000000036L, 0x3e00000000000036L, 0x2500000000000036L, 0x6400000000000036L, 0x7f00000000000036L, 0x5200000000000036L, 0x4900000000000036L, 0xc00000000000005aL, 0xdb0000000000005aL, 0xf60000000000005aL, 0xed0000000000005aL, 0xac0000000000005aL, 0xb70000000000005aL, 0x9a0000000000005aL, 0x810000000000005aL, 0x180000000000005aL, 0x030000000000005aL, 0x2e0000000000005aL, 0x350000000000005aL, 0x740000000000005aL, 0x6f0000000000005aL, 0x420000000000005aL, 0x590000000000005aL, 0x7000000000000041L, 0x6b00000000000041L, 0x4600000000000041L, 0x5d00000000000041L, 0x1c00000000000041L, 0x0700000000000041L, 0x2a00000000000041L, 0x3100000000000041L, 0xa800000000000041L, 0xb300000000000041L, 0x9e00000000000041L, 0x8500000000000041L, 0xc400000000000041L, 0xdf00000000000041L, 0xf200000000000041L, 
0xe900000000000041L, 0xa000000000000077L, 0xbb00000000000077L, 0x9600000000000077L, 0x8d00000000000077L, 0xcc00000000000077L, 0xd700000000000077L, 0xfa00000000000077L, 0xe100000000000077L, 0x7800000000000077L, 0x6300000000000077L, 0x4e00000000000077L, 0x5500000000000077L, 0x1400000000000077L, 0x0f00000000000077L, 0x2200000000000077L, 0x3900000000000077L, 0x100000000000006cL, 0x0b0000000000006cL, 0x260000000000006cL, 0x3d0000000000006cL, 0x7c0000000000006cL, 0x670000000000006cL, 0x4a0000000000006cL, 0x510000000000006cL, 0xc80000000000006cL, 0xd30000000000006cL, 0xfe0000000000006cL, 0xe50000000000006cL, 0xa40000000000006cL, 0xbf0000000000006cL, 0x920000000000006cL, 0x890000000000006cL, 0x80000000000000afL, 0x9b000000000000afL, 0xb6000000000000afL, 0xad000000000000afL, 0xec000000000000afL, 0xf7000000000000afL, 0xda000000000000afL, 0xc1000000000000afL, 0x58000000000000afL, 0x43000000000000afL, 0x6e000000000000afL, 0x75000000000000afL, 0x34000000000000afL, 0x2f000000000000afL, 0x02000000000000afL, 0x19000000000000afL, 0x30000000000000b4L, 0x2b000000000000b4L, 0x06000000000000b4L, 0x1d000000000000b4L, 0x5c000000000000b4L, 0x47000000000000b4L, 0x6a000000000000b4L, 0x71000000000000b4L, 0xe8000000000000b4L, 0xf3000000000000b4L, 0xde000000000000b4L, 0xc5000000000000b4L, 0x84000000000000b4L, 0x9f000000000000b4L, 0xb2000000000000b4L, 0xa9000000000000b4L, 0xe000000000000082L, 0xfb00000000000082L, 0xd600000000000082L, 0xcd00000000000082L, 0x8c00000000000082L, 0x9700000000000082L, 0xba00000000000082L, 0xa100000000000082L, 0x3800000000000082L, 0x2300000000000082L, 0x0e00000000000082L, 0x1500000000000082L, 0x5400000000000082L, 0x4f00000000000082L, 0x6200000000000082L, 0x7900000000000082L, 0x5000000000000099L, 0x4b00000000000099L, 0x6600000000000099L, 0x7d00000000000099L, 0x3c00000000000099L, 0x2700000000000099L, 0x0a00000000000099L, 0x1100000000000099L, 0x8800000000000099L, 0x9300000000000099L, 0xbe00000000000099L, 0xa500000000000099L, 0xe400000000000099L, 0xff00000000000099L, 
0xd200000000000099L, 0xc900000000000099L, 0x40000000000000f5L, 0x5b000000000000f5L, 0x76000000000000f5L, 0x6d000000000000f5L, 0x2c000000000000f5L, 0x37000000000000f5L, 0x1a000000000000f5L, 0x01000000000000f5L, 0x98000000000000f5L, 0x83000000000000f5L, 0xae000000000000f5L, 0xb5000000000000f5L, 0xf4000000000000f5L, 0xef000000000000f5L, 0xc2000000000000f5L, 0xd9000000000000f5L, 0xf0000000000000eeL, 0xeb000000000000eeL, 0xc6000000000000eeL, 0xdd000000000000eeL, 0x9c000000000000eeL, 0x87000000000000eeL, 0xaa000000000000eeL, 0xb1000000000000eeL, 0x28000000000000eeL, 0x33000000000000eeL, 0x1e000000000000eeL, 0x05000000000000eeL, 0x44000000000000eeL, 0x5f000000000000eeL, 0x72000000000000eeL, 0x69000000000000eeL, 0x20000000000000d8L, 0x3b000000000000d8L, 0x16000000000000d8L, 0x0d000000000000d8L, 0x4c000000000000d8L, 0x57000000000000d8L, 0x7a000000000000d8L, 0x61000000000000d8L, 0xf8000000000000d8L, 0xe3000000000000d8L, 0xce000000000000d8L, 0xd5000000000000d8L, 0x94000000000000d8L, 0x8f000000000000d8L, 0xa2000000000000d8L, 0xb9000000000000d8L, 0x90000000000000c3L, 0x8b000000000000c3L, 0xa6000000000000c3L, 0xbd000000000000c3L, 0xfc000000000000c3L, 0xe7000000000000c3L, 0xca000000000000c3L, 0xd1000000000000c3L, 0x48000000000000c3L, 0x53000000000000c3L, 0x7e000000000000c3L, 0x65000000000000c3L, 0x24000000000000c3L, 0x3f000000000000c3L, 0x12000000000000c3L, 0x09000000000000c3L },
+ { 0x0000000000000000L, 0x001b000000000000L, 0x0036000000000000L, 0x002d000000000000L, 0x006c000000000000L, 0x0077000000000000L, 0x005a000000000000L, 0x0041000000000000L, 0x00d8000000000000L, 0x00c3000000000000L, 0x00ee000000000000L, 0x00f5000000000000L, 0x00b4000000000000L, 0x00af000000000000L, 0x0082000000000000L, 0x0099000000000000L, 0x01b0000000000000L, 0x01ab000000000000L, 0x0186000000000000L, 0x019d000000000000L, 0x01dc000000000000L, 0x01c7000000000000L, 0x01ea000000000000L, 0x01f1000000000000L, 0x0168000000000000L, 0x0173000000000000L, 0x015e000000000000L, 0x0145000000000000L, 0x0104000000000000L, 0x011f000000000000L, 0x0132000000000000L, 0x0129000000000000L, 0x0360000000000000L, 0x037b000000000000L, 0x0356000000000000L, 0x034d000000000000L, 0x030c000000000000L, 0x0317000000000000L, 0x033a000000000000L, 0x0321000000000000L, 0x03b8000000000000L, 0x03a3000000000000L, 0x038e000000000000L, 0x0395000000000000L, 0x03d4000000000000L, 0x03cf000000000000L, 0x03e2000000000000L, 0x03f9000000000000L, 0x02d0000000000000L, 0x02cb000000000000L, 0x02e6000000000000L, 0x02fd000000000000L, 0x02bc000000000000L, 0x02a7000000000000L, 0x028a000000000000L, 0x0291000000000000L, 0x0208000000000000L, 0x0213000000000000L, 0x023e000000000000L, 0x0225000000000000L, 0x0264000000000000L, 0x027f000000000000L, 0x0252000000000000L, 0x0249000000000000L, 0x06c0000000000000L, 0x06db000000000000L, 0x06f6000000000000L, 0x06ed000000000000L, 0x06ac000000000000L, 0x06b7000000000000L, 0x069a000000000000L, 0x0681000000000000L, 0x0618000000000000L, 0x0603000000000000L, 0x062e000000000000L, 0x0635000000000000L, 0x0674000000000000L, 0x066f000000000000L, 0x0642000000000000L, 0x0659000000000000L, 0x0770000000000000L, 0x076b000000000000L, 0x0746000000000000L, 0x075d000000000000L, 0x071c000000000000L, 0x0707000000000000L, 0x072a000000000000L, 0x0731000000000000L, 0x07a8000000000000L, 0x07b3000000000000L, 0x079e000000000000L, 0x0785000000000000L, 0x07c4000000000000L, 0x07df000000000000L, 0x07f2000000000000L, 
0x07e9000000000000L, 0x05a0000000000000L, 0x05bb000000000000L, 0x0596000000000000L, 0x058d000000000000L, 0x05cc000000000000L, 0x05d7000000000000L, 0x05fa000000000000L, 0x05e1000000000000L, 0x0578000000000000L, 0x0563000000000000L, 0x054e000000000000L, 0x0555000000000000L, 0x0514000000000000L, 0x050f000000000000L, 0x0522000000000000L, 0x0539000000000000L, 0x0410000000000000L, 0x040b000000000000L, 0x0426000000000000L, 0x043d000000000000L, 0x047c000000000000L, 0x0467000000000000L, 0x044a000000000000L, 0x0451000000000000L, 0x04c8000000000000L, 0x04d3000000000000L, 0x04fe000000000000L, 0x04e5000000000000L, 0x04a4000000000000L, 0x04bf000000000000L, 0x0492000000000000L, 0x0489000000000000L, 0x0d80000000000000L, 0x0d9b000000000000L, 0x0db6000000000000L, 0x0dad000000000000L, 0x0dec000000000000L, 0x0df7000000000000L, 0x0dda000000000000L, 0x0dc1000000000000L, 0x0d58000000000000L, 0x0d43000000000000L, 0x0d6e000000000000L, 0x0d75000000000000L, 0x0d34000000000000L, 0x0d2f000000000000L, 0x0d02000000000000L, 0x0d19000000000000L, 0x0c30000000000000L, 0x0c2b000000000000L, 0x0c06000000000000L, 0x0c1d000000000000L, 0x0c5c000000000000L, 0x0c47000000000000L, 0x0c6a000000000000L, 0x0c71000000000000L, 0x0ce8000000000000L, 0x0cf3000000000000L, 0x0cde000000000000L, 0x0cc5000000000000L, 0x0c84000000000000L, 0x0c9f000000000000L, 0x0cb2000000000000L, 0x0ca9000000000000L, 0x0ee0000000000000L, 0x0efb000000000000L, 0x0ed6000000000000L, 0x0ecd000000000000L, 0x0e8c000000000000L, 0x0e97000000000000L, 0x0eba000000000000L, 0x0ea1000000000000L, 0x0e38000000000000L, 0x0e23000000000000L, 0x0e0e000000000000L, 0x0e15000000000000L, 0x0e54000000000000L, 0x0e4f000000000000L, 0x0e62000000000000L, 0x0e79000000000000L, 0x0f50000000000000L, 0x0f4b000000000000L, 0x0f66000000000000L, 0x0f7d000000000000L, 0x0f3c000000000000L, 0x0f27000000000000L, 0x0f0a000000000000L, 0x0f11000000000000L, 0x0f88000000000000L, 0x0f93000000000000L, 0x0fbe000000000000L, 0x0fa5000000000000L, 0x0fe4000000000000L, 0x0fff000000000000L, 
0x0fd2000000000000L, 0x0fc9000000000000L, 0x0b40000000000000L, 0x0b5b000000000000L, 0x0b76000000000000L, 0x0b6d000000000000L, 0x0b2c000000000000L, 0x0b37000000000000L, 0x0b1a000000000000L, 0x0b01000000000000L, 0x0b98000000000000L, 0x0b83000000000000L, 0x0bae000000000000L, 0x0bb5000000000000L, 0x0bf4000000000000L, 0x0bef000000000000L, 0x0bc2000000000000L, 0x0bd9000000000000L, 0x0af0000000000000L, 0x0aeb000000000000L, 0x0ac6000000000000L, 0x0add000000000000L, 0x0a9c000000000000L, 0x0a87000000000000L, 0x0aaa000000000000L, 0x0ab1000000000000L, 0x0a28000000000000L, 0x0a33000000000000L, 0x0a1e000000000000L, 0x0a05000000000000L, 0x0a44000000000000L, 0x0a5f000000000000L, 0x0a72000000000000L, 0x0a69000000000000L, 0x0820000000000000L, 0x083b000000000000L, 0x0816000000000000L, 0x080d000000000000L, 0x084c000000000000L, 0x0857000000000000L, 0x087a000000000000L, 0x0861000000000000L, 0x08f8000000000000L, 0x08e3000000000000L, 0x08ce000000000000L, 0x08d5000000000000L, 0x0894000000000000L, 0x088f000000000000L, 0x08a2000000000000L, 0x08b9000000000000L, 0x0990000000000000L, 0x098b000000000000L, 0x09a6000000000000L, 0x09bd000000000000L, 0x09fc000000000000L, 0x09e7000000000000L, 0x09ca000000000000L, 0x09d1000000000000L, 0x0948000000000000L, 0x0953000000000000L, 0x097e000000000000L, 0x0965000000000000L, 0x0924000000000000L, 0x093f000000000000L, 0x0912000000000000L, 0x0909000000000000L },
+ { 0x0000000000000000L, 0x00001b0000000000L, 0x0000360000000000L, 0x00002d0000000000L, 0x00006c0000000000L, 0x0000770000000000L, 0x00005a0000000000L, 0x0000410000000000L, 0x0000d80000000000L, 0x0000c30000000000L, 0x0000ee0000000000L, 0x0000f50000000000L, 0x0000b40000000000L, 0x0000af0000000000L, 0x0000820000000000L, 0x0000990000000000L, 0x0001b00000000000L, 0x0001ab0000000000L, 0x0001860000000000L, 0x00019d0000000000L, 0x0001dc0000000000L, 0x0001c70000000000L, 0x0001ea0000000000L, 0x0001f10000000000L, 0x0001680000000000L, 0x0001730000000000L, 0x00015e0000000000L, 0x0001450000000000L, 0x0001040000000000L, 0x00011f0000000000L, 0x0001320000000000L, 0x0001290000000000L, 0x0003600000000000L, 0x00037b0000000000L, 0x0003560000000000L, 0x00034d0000000000L, 0x00030c0000000000L, 0x0003170000000000L, 0x00033a0000000000L, 0x0003210000000000L, 0x0003b80000000000L, 0x0003a30000000000L, 0x00038e0000000000L, 0x0003950000000000L, 0x0003d40000000000L, 0x0003cf0000000000L, 0x0003e20000000000L, 0x0003f90000000000L, 0x0002d00000000000L, 0x0002cb0000000000L, 0x0002e60000000000L, 0x0002fd0000000000L, 0x0002bc0000000000L, 0x0002a70000000000L, 0x00028a0000000000L, 0x0002910000000000L, 0x0002080000000000L, 0x0002130000000000L, 0x00023e0000000000L, 0x0002250000000000L, 0x0002640000000000L, 0x00027f0000000000L, 0x0002520000000000L, 0x0002490000000000L, 0x0006c00000000000L, 0x0006db0000000000L, 0x0006f60000000000L, 0x0006ed0000000000L, 0x0006ac0000000000L, 0x0006b70000000000L, 0x00069a0000000000L, 0x0006810000000000L, 0x0006180000000000L, 0x0006030000000000L, 0x00062e0000000000L, 0x0006350000000000L, 0x0006740000000000L, 0x00066f0000000000L, 0x0006420000000000L, 0x0006590000000000L, 0x0007700000000000L, 0x00076b0000000000L, 0x0007460000000000L, 0x00075d0000000000L, 0x00071c0000000000L, 0x0007070000000000L, 0x00072a0000000000L, 0x0007310000000000L, 0x0007a80000000000L, 0x0007b30000000000L, 0x00079e0000000000L, 0x0007850000000000L, 0x0007c40000000000L, 0x0007df0000000000L, 0x0007f20000000000L, 
0x0007e90000000000L, 0x0005a00000000000L, 0x0005bb0000000000L, 0x0005960000000000L, 0x00058d0000000000L, 0x0005cc0000000000L, 0x0005d70000000000L, 0x0005fa0000000000L, 0x0005e10000000000L, 0x0005780000000000L, 0x0005630000000000L, 0x00054e0000000000L, 0x0005550000000000L, 0x0005140000000000L, 0x00050f0000000000L, 0x0005220000000000L, 0x0005390000000000L, 0x0004100000000000L, 0x00040b0000000000L, 0x0004260000000000L, 0x00043d0000000000L, 0x00047c0000000000L, 0x0004670000000000L, 0x00044a0000000000L, 0x0004510000000000L, 0x0004c80000000000L, 0x0004d30000000000L, 0x0004fe0000000000L, 0x0004e50000000000L, 0x0004a40000000000L, 0x0004bf0000000000L, 0x0004920000000000L, 0x0004890000000000L, 0x000d800000000000L, 0x000d9b0000000000L, 0x000db60000000000L, 0x000dad0000000000L, 0x000dec0000000000L, 0x000df70000000000L, 0x000dda0000000000L, 0x000dc10000000000L, 0x000d580000000000L, 0x000d430000000000L, 0x000d6e0000000000L, 0x000d750000000000L, 0x000d340000000000L, 0x000d2f0000000000L, 0x000d020000000000L, 0x000d190000000000L, 0x000c300000000000L, 0x000c2b0000000000L, 0x000c060000000000L, 0x000c1d0000000000L, 0x000c5c0000000000L, 0x000c470000000000L, 0x000c6a0000000000L, 0x000c710000000000L, 0x000ce80000000000L, 0x000cf30000000000L, 0x000cde0000000000L, 0x000cc50000000000L, 0x000c840000000000L, 0x000c9f0000000000L, 0x000cb20000000000L, 0x000ca90000000000L, 0x000ee00000000000L, 0x000efb0000000000L, 0x000ed60000000000L, 0x000ecd0000000000L, 0x000e8c0000000000L, 0x000e970000000000L, 0x000eba0000000000L, 0x000ea10000000000L, 0x000e380000000000L, 0x000e230000000000L, 0x000e0e0000000000L, 0x000e150000000000L, 0x000e540000000000L, 0x000e4f0000000000L, 0x000e620000000000L, 0x000e790000000000L, 0x000f500000000000L, 0x000f4b0000000000L, 0x000f660000000000L, 0x000f7d0000000000L, 0x000f3c0000000000L, 0x000f270000000000L, 0x000f0a0000000000L, 0x000f110000000000L, 0x000f880000000000L, 0x000f930000000000L, 0x000fbe0000000000L, 0x000fa50000000000L, 0x000fe40000000000L, 0x000fff0000000000L, 
0x000fd20000000000L, 0x000fc90000000000L, 0x000b400000000000L, 0x000b5b0000000000L, 0x000b760000000000L, 0x000b6d0000000000L, 0x000b2c0000000000L, 0x000b370000000000L, 0x000b1a0000000000L, 0x000b010000000000L, 0x000b980000000000L, 0x000b830000000000L, 0x000bae0000000000L, 0x000bb50000000000L, 0x000bf40000000000L, 0x000bef0000000000L, 0x000bc20000000000L, 0x000bd90000000000L, 0x000af00000000000L, 0x000aeb0000000000L, 0x000ac60000000000L, 0x000add0000000000L, 0x000a9c0000000000L, 0x000a870000000000L, 0x000aaa0000000000L, 0x000ab10000000000L, 0x000a280000000000L, 0x000a330000000000L, 0x000a1e0000000000L, 0x000a050000000000L, 0x000a440000000000L, 0x000a5f0000000000L, 0x000a720000000000L, 0x000a690000000000L, 0x0008200000000000L, 0x00083b0000000000L, 0x0008160000000000L, 0x00080d0000000000L, 0x00084c0000000000L, 0x0008570000000000L, 0x00087a0000000000L, 0x0008610000000000L, 0x0008f80000000000L, 0x0008e30000000000L, 0x0008ce0000000000L, 0x0008d50000000000L, 0x0008940000000000L, 0x00088f0000000000L, 0x0008a20000000000L, 0x0008b90000000000L, 0x0009900000000000L, 0x00098b0000000000L, 0x0009a60000000000L, 0x0009bd0000000000L, 0x0009fc0000000000L, 0x0009e70000000000L, 0x0009ca0000000000L, 0x0009d10000000000L, 0x0009480000000000L, 0x0009530000000000L, 0x00097e0000000000L, 0x0009650000000000L, 0x0009240000000000L, 0x00093f0000000000L, 0x0009120000000000L, 0x0009090000000000L },
+ { 0x0000000000000000L, 0x0000001b00000000L, 0x0000003600000000L, 0x0000002d00000000L, 0x0000006c00000000L, 0x0000007700000000L, 0x0000005a00000000L, 0x0000004100000000L, 0x000000d800000000L, 0x000000c300000000L, 0x000000ee00000000L, 0x000000f500000000L, 0x000000b400000000L, 0x000000af00000000L, 0x0000008200000000L, 0x0000009900000000L, 0x000001b000000000L, 0x000001ab00000000L, 0x0000018600000000L, 0x0000019d00000000L, 0x000001dc00000000L, 0x000001c700000000L, 0x000001ea00000000L, 0x000001f100000000L, 0x0000016800000000L, 0x0000017300000000L, 0x0000015e00000000L, 0x0000014500000000L, 0x0000010400000000L, 0x0000011f00000000L, 0x0000013200000000L, 0x0000012900000000L, 0x0000036000000000L, 0x0000037b00000000L, 0x0000035600000000L, 0x0000034d00000000L, 0x0000030c00000000L, 0x0000031700000000L, 0x0000033a00000000L, 0x0000032100000000L, 0x000003b800000000L, 0x000003a300000000L, 0x0000038e00000000L, 0x0000039500000000L, 0x000003d400000000L, 0x000003cf00000000L, 0x000003e200000000L, 0x000003f900000000L, 0x000002d000000000L, 0x000002cb00000000L, 0x000002e600000000L, 0x000002fd00000000L, 0x000002bc00000000L, 0x000002a700000000L, 0x0000028a00000000L, 0x0000029100000000L, 0x0000020800000000L, 0x0000021300000000L, 0x0000023e00000000L, 0x0000022500000000L, 0x0000026400000000L, 0x0000027f00000000L, 0x0000025200000000L, 0x0000024900000000L, 0x000006c000000000L, 0x000006db00000000L, 0x000006f600000000L, 0x000006ed00000000L, 0x000006ac00000000L, 0x000006b700000000L, 0x0000069a00000000L, 0x0000068100000000L, 0x0000061800000000L, 0x0000060300000000L, 0x0000062e00000000L, 0x0000063500000000L, 0x0000067400000000L, 0x0000066f00000000L, 0x0000064200000000L, 0x0000065900000000L, 0x0000077000000000L, 0x0000076b00000000L, 0x0000074600000000L, 0x0000075d00000000L, 0x0000071c00000000L, 0x0000070700000000L, 0x0000072a00000000L, 0x0000073100000000L, 0x000007a800000000L, 0x000007b300000000L, 0x0000079e00000000L, 0x0000078500000000L, 0x000007c400000000L, 0x000007df00000000L, 0x000007f200000000L, 
0x000007e900000000L, 0x000005a000000000L, 0x000005bb00000000L, 0x0000059600000000L, 0x0000058d00000000L, 0x000005cc00000000L, 0x000005d700000000L, 0x000005fa00000000L, 0x000005e100000000L, 0x0000057800000000L, 0x0000056300000000L, 0x0000054e00000000L, 0x0000055500000000L, 0x0000051400000000L, 0x0000050f00000000L, 0x0000052200000000L, 0x0000053900000000L, 0x0000041000000000L, 0x0000040b00000000L, 0x0000042600000000L, 0x0000043d00000000L, 0x0000047c00000000L, 0x0000046700000000L, 0x0000044a00000000L, 0x0000045100000000L, 0x000004c800000000L, 0x000004d300000000L, 0x000004fe00000000L, 0x000004e500000000L, 0x000004a400000000L, 0x000004bf00000000L, 0x0000049200000000L, 0x0000048900000000L, 0x00000d8000000000L, 0x00000d9b00000000L, 0x00000db600000000L, 0x00000dad00000000L, 0x00000dec00000000L, 0x00000df700000000L, 0x00000dda00000000L, 0x00000dc100000000L, 0x00000d5800000000L, 0x00000d4300000000L, 0x00000d6e00000000L, 0x00000d7500000000L, 0x00000d3400000000L, 0x00000d2f00000000L, 0x00000d0200000000L, 0x00000d1900000000L, 0x00000c3000000000L, 0x00000c2b00000000L, 0x00000c0600000000L, 0x00000c1d00000000L, 0x00000c5c00000000L, 0x00000c4700000000L, 0x00000c6a00000000L, 0x00000c7100000000L, 0x00000ce800000000L, 0x00000cf300000000L, 0x00000cde00000000L, 0x00000cc500000000L, 0x00000c8400000000L, 0x00000c9f00000000L, 0x00000cb200000000L, 0x00000ca900000000L, 0x00000ee000000000L, 0x00000efb00000000L, 0x00000ed600000000L, 0x00000ecd00000000L, 0x00000e8c00000000L, 0x00000e9700000000L, 0x00000eba00000000L, 0x00000ea100000000L, 0x00000e3800000000L, 0x00000e2300000000L, 0x00000e0e00000000L, 0x00000e1500000000L, 0x00000e5400000000L, 0x00000e4f00000000L, 0x00000e6200000000L, 0x00000e7900000000L, 0x00000f5000000000L, 0x00000f4b00000000L, 0x00000f6600000000L, 0x00000f7d00000000L, 0x00000f3c00000000L, 0x00000f2700000000L, 0x00000f0a00000000L, 0x00000f1100000000L, 0x00000f8800000000L, 0x00000f9300000000L, 0x00000fbe00000000L, 0x00000fa500000000L, 0x00000fe400000000L, 0x00000fff00000000L, 
0x00000fd200000000L, 0x00000fc900000000L, 0x00000b4000000000L, 0x00000b5b00000000L, 0x00000b7600000000L, 0x00000b6d00000000L, 0x00000b2c00000000L, 0x00000b3700000000L, 0x00000b1a00000000L, 0x00000b0100000000L, 0x00000b9800000000L, 0x00000b8300000000L, 0x00000bae00000000L, 0x00000bb500000000L, 0x00000bf400000000L, 0x00000bef00000000L, 0x00000bc200000000L, 0x00000bd900000000L, 0x00000af000000000L, 0x00000aeb00000000L, 0x00000ac600000000L, 0x00000add00000000L, 0x00000a9c00000000L, 0x00000a8700000000L, 0x00000aaa00000000L, 0x00000ab100000000L, 0x00000a2800000000L, 0x00000a3300000000L, 0x00000a1e00000000L, 0x00000a0500000000L, 0x00000a4400000000L, 0x00000a5f00000000L, 0x00000a7200000000L, 0x00000a6900000000L, 0x0000082000000000L, 0x0000083b00000000L, 0x0000081600000000L, 0x0000080d00000000L, 0x0000084c00000000L, 0x0000085700000000L, 0x0000087a00000000L, 0x0000086100000000L, 0x000008f800000000L, 0x000008e300000000L, 0x000008ce00000000L, 0x000008d500000000L, 0x0000089400000000L, 0x0000088f00000000L, 0x000008a200000000L, 0x000008b900000000L, 0x0000099000000000L, 0x0000098b00000000L, 0x000009a600000000L, 0x000009bd00000000L, 0x000009fc00000000L, 0x000009e700000000L, 0x000009ca00000000L, 0x000009d100000000L, 0x0000094800000000L, 0x0000095300000000L, 0x0000097e00000000L, 0x0000096500000000L, 0x0000092400000000L, 0x0000093f00000000L, 0x0000091200000000L, 0x0000090900000000L },
+ { 0x0000000000000000L, 0x000000001b000000L, 0x0000000036000000L, 0x000000002d000000L, 0x000000006c000000L, 0x0000000077000000L, 0x000000005a000000L, 0x0000000041000000L, 0x00000000d8000000L, 0x00000000c3000000L, 0x00000000ee000000L, 0x00000000f5000000L, 0x00000000b4000000L, 0x00000000af000000L, 0x0000000082000000L, 0x0000000099000000L, 0x00000001b0000000L, 0x00000001ab000000L, 0x0000000186000000L, 0x000000019d000000L, 0x00000001dc000000L, 0x00000001c7000000L, 0x00000001ea000000L, 0x00000001f1000000L, 0x0000000168000000L, 0x0000000173000000L, 0x000000015e000000L, 0x0000000145000000L, 0x0000000104000000L, 0x000000011f000000L, 0x0000000132000000L, 0x0000000129000000L, 0x0000000360000000L, 0x000000037b000000L, 0x0000000356000000L, 0x000000034d000000L, 0x000000030c000000L, 0x0000000317000000L, 0x000000033a000000L, 0x0000000321000000L, 0x00000003b8000000L, 0x00000003a3000000L, 0x000000038e000000L, 0x0000000395000000L, 0x00000003d4000000L, 0x00000003cf000000L, 0x00000003e2000000L, 0x00000003f9000000L, 0x00000002d0000000L, 0x00000002cb000000L, 0x00000002e6000000L, 0x00000002fd000000L, 0x00000002bc000000L, 0x00000002a7000000L, 0x000000028a000000L, 0x0000000291000000L, 0x0000000208000000L, 0x0000000213000000L, 0x000000023e000000L, 0x0000000225000000L, 0x0000000264000000L, 0x000000027f000000L, 0x0000000252000000L, 0x0000000249000000L, 0x00000006c0000000L, 0x00000006db000000L, 0x00000006f6000000L, 0x00000006ed000000L, 0x00000006ac000000L, 0x00000006b7000000L, 0x000000069a000000L, 0x0000000681000000L, 0x0000000618000000L, 0x0000000603000000L, 0x000000062e000000L, 0x0000000635000000L, 0x0000000674000000L, 0x000000066f000000L, 0x0000000642000000L, 0x0000000659000000L, 0x0000000770000000L, 0x000000076b000000L, 0x0000000746000000L, 0x000000075d000000L, 0x000000071c000000L, 0x0000000707000000L, 0x000000072a000000L, 0x0000000731000000L, 0x00000007a8000000L, 0x00000007b3000000L, 0x000000079e000000L, 0x0000000785000000L, 0x00000007c4000000L, 0x00000007df000000L, 0x00000007f2000000L, 
0x00000007e9000000L, 0x00000005a0000000L, 0x00000005bb000000L, 0x0000000596000000L, 0x000000058d000000L, 0x00000005cc000000L, 0x00000005d7000000L, 0x00000005fa000000L, 0x00000005e1000000L, 0x0000000578000000L, 0x0000000563000000L, 0x000000054e000000L, 0x0000000555000000L, 0x0000000514000000L, 0x000000050f000000L, 0x0000000522000000L, 0x0000000539000000L, 0x0000000410000000L, 0x000000040b000000L, 0x0000000426000000L, 0x000000043d000000L, 0x000000047c000000L, 0x0000000467000000L, 0x000000044a000000L, 0x0000000451000000L, 0x00000004c8000000L, 0x00000004d3000000L, 0x00000004fe000000L, 0x00000004e5000000L, 0x00000004a4000000L, 0x00000004bf000000L, 0x0000000492000000L, 0x0000000489000000L, 0x0000000d80000000L, 0x0000000d9b000000L, 0x0000000db6000000L, 0x0000000dad000000L, 0x0000000dec000000L, 0x0000000df7000000L, 0x0000000dda000000L, 0x0000000dc1000000L, 0x0000000d58000000L, 0x0000000d43000000L, 0x0000000d6e000000L, 0x0000000d75000000L, 0x0000000d34000000L, 0x0000000d2f000000L, 0x0000000d02000000L, 0x0000000d19000000L, 0x0000000c30000000L, 0x0000000c2b000000L, 0x0000000c06000000L, 0x0000000c1d000000L, 0x0000000c5c000000L, 0x0000000c47000000L, 0x0000000c6a000000L, 0x0000000c71000000L, 0x0000000ce8000000L, 0x0000000cf3000000L, 0x0000000cde000000L, 0x0000000cc5000000L, 0x0000000c84000000L, 0x0000000c9f000000L, 0x0000000cb2000000L, 0x0000000ca9000000L, 0x0000000ee0000000L, 0x0000000efb000000L, 0x0000000ed6000000L, 0x0000000ecd000000L, 0x0000000e8c000000L, 0x0000000e97000000L, 0x0000000eba000000L, 0x0000000ea1000000L, 0x0000000e38000000L, 0x0000000e23000000L, 0x0000000e0e000000L, 0x0000000e15000000L, 0x0000000e54000000L, 0x0000000e4f000000L, 0x0000000e62000000L, 0x0000000e79000000L, 0x0000000f50000000L, 0x0000000f4b000000L, 0x0000000f66000000L, 0x0000000f7d000000L, 0x0000000f3c000000L, 0x0000000f27000000L, 0x0000000f0a000000L, 0x0000000f11000000L, 0x0000000f88000000L, 0x0000000f93000000L, 0x0000000fbe000000L, 0x0000000fa5000000L, 0x0000000fe4000000L, 0x0000000fff000000L, 
0x0000000fd2000000L, 0x0000000fc9000000L, 0x0000000b40000000L, 0x0000000b5b000000L, 0x0000000b76000000L, 0x0000000b6d000000L, 0x0000000b2c000000L, 0x0000000b37000000L, 0x0000000b1a000000L, 0x0000000b01000000L, 0x0000000b98000000L, 0x0000000b83000000L, 0x0000000bae000000L, 0x0000000bb5000000L, 0x0000000bf4000000L, 0x0000000bef000000L, 0x0000000bc2000000L, 0x0000000bd9000000L, 0x0000000af0000000L, 0x0000000aeb000000L, 0x0000000ac6000000L, 0x0000000add000000L, 0x0000000a9c000000L, 0x0000000a87000000L, 0x0000000aaa000000L, 0x0000000ab1000000L, 0x0000000a28000000L, 0x0000000a33000000L, 0x0000000a1e000000L, 0x0000000a05000000L, 0x0000000a44000000L, 0x0000000a5f000000L, 0x0000000a72000000L, 0x0000000a69000000L, 0x0000000820000000L, 0x000000083b000000L, 0x0000000816000000L, 0x000000080d000000L, 0x000000084c000000L, 0x0000000857000000L, 0x000000087a000000L, 0x0000000861000000L, 0x00000008f8000000L, 0x00000008e3000000L, 0x00000008ce000000L, 0x00000008d5000000L, 0x0000000894000000L, 0x000000088f000000L, 0x00000008a2000000L, 0x00000008b9000000L, 0x0000000990000000L, 0x000000098b000000L, 0x00000009a6000000L, 0x00000009bd000000L, 0x00000009fc000000L, 0x00000009e7000000L, 0x00000009ca000000L, 0x00000009d1000000L, 0x0000000948000000L, 0x0000000953000000L, 0x000000097e000000L, 0x0000000965000000L, 0x0000000924000000L, 0x000000093f000000L, 0x0000000912000000L, 0x0000000909000000L },
+ { 0x0000000000000000L, 0x00000000001b0000L, 0x0000000000360000L, 0x00000000002d0000L, 0x00000000006c0000L, 0x0000000000770000L, 0x00000000005a0000L, 0x0000000000410000L, 0x0000000000d80000L, 0x0000000000c30000L, 0x0000000000ee0000L, 0x0000000000f50000L, 0x0000000000b40000L, 0x0000000000af0000L, 0x0000000000820000L, 0x0000000000990000L, 0x0000000001b00000L, 0x0000000001ab0000L, 0x0000000001860000L, 0x00000000019d0000L, 0x0000000001dc0000L, 0x0000000001c70000L, 0x0000000001ea0000L, 0x0000000001f10000L, 0x0000000001680000L, 0x0000000001730000L, 0x00000000015e0000L, 0x0000000001450000L, 0x0000000001040000L, 0x00000000011f0000L, 0x0000000001320000L, 0x0000000001290000L, 0x0000000003600000L, 0x00000000037b0000L, 0x0000000003560000L, 0x00000000034d0000L, 0x00000000030c0000L, 0x0000000003170000L, 0x00000000033a0000L, 0x0000000003210000L, 0x0000000003b80000L, 0x0000000003a30000L, 0x00000000038e0000L, 0x0000000003950000L, 0x0000000003d40000L, 0x0000000003cf0000L, 0x0000000003e20000L, 0x0000000003f90000L, 0x0000000002d00000L, 0x0000000002cb0000L, 0x0000000002e60000L, 0x0000000002fd0000L, 0x0000000002bc0000L, 0x0000000002a70000L, 0x00000000028a0000L, 0x0000000002910000L, 0x0000000002080000L, 0x0000000002130000L, 0x00000000023e0000L, 0x0000000002250000L, 0x0000000002640000L, 0x00000000027f0000L, 0x0000000002520000L, 0x0000000002490000L, 0x0000000006c00000L, 0x0000000006db0000L, 0x0000000006f60000L, 0x0000000006ed0000L, 0x0000000006ac0000L, 0x0000000006b70000L, 0x00000000069a0000L, 0x0000000006810000L, 0x0000000006180000L, 0x0000000006030000L, 0x00000000062e0000L, 0x0000000006350000L, 0x0000000006740000L, 0x00000000066f0000L, 0x0000000006420000L, 0x0000000006590000L, 0x0000000007700000L, 0x00000000076b0000L, 0x0000000007460000L, 0x00000000075d0000L, 0x00000000071c0000L, 0x0000000007070000L, 0x00000000072a0000L, 0x0000000007310000L, 0x0000000007a80000L, 0x0000000007b30000L, 0x00000000079e0000L, 0x0000000007850000L, 0x0000000007c40000L, 0x0000000007df0000L, 0x0000000007f20000L, 
0x0000000007e90000L, 0x0000000005a00000L, 0x0000000005bb0000L, 0x0000000005960000L, 0x00000000058d0000L, 0x0000000005cc0000L, 0x0000000005d70000L, 0x0000000005fa0000L, 0x0000000005e10000L, 0x0000000005780000L, 0x0000000005630000L, 0x00000000054e0000L, 0x0000000005550000L, 0x0000000005140000L, 0x00000000050f0000L, 0x0000000005220000L, 0x0000000005390000L, 0x0000000004100000L, 0x00000000040b0000L, 0x0000000004260000L, 0x00000000043d0000L, 0x00000000047c0000L, 0x0000000004670000L, 0x00000000044a0000L, 0x0000000004510000L, 0x0000000004c80000L, 0x0000000004d30000L, 0x0000000004fe0000L, 0x0000000004e50000L, 0x0000000004a40000L, 0x0000000004bf0000L, 0x0000000004920000L, 0x0000000004890000L, 0x000000000d800000L, 0x000000000d9b0000L, 0x000000000db60000L, 0x000000000dad0000L, 0x000000000dec0000L, 0x000000000df70000L, 0x000000000dda0000L, 0x000000000dc10000L, 0x000000000d580000L, 0x000000000d430000L, 0x000000000d6e0000L, 0x000000000d750000L, 0x000000000d340000L, 0x000000000d2f0000L, 0x000000000d020000L, 0x000000000d190000L, 0x000000000c300000L, 0x000000000c2b0000L, 0x000000000c060000L, 0x000000000c1d0000L, 0x000000000c5c0000L, 0x000000000c470000L, 0x000000000c6a0000L, 0x000000000c710000L, 0x000000000ce80000L, 0x000000000cf30000L, 0x000000000cde0000L, 0x000000000cc50000L, 0x000000000c840000L, 0x000000000c9f0000L, 0x000000000cb20000L, 0x000000000ca90000L, 0x000000000ee00000L, 0x000000000efb0000L, 0x000000000ed60000L, 0x000000000ecd0000L, 0x000000000e8c0000L, 0x000000000e970000L, 0x000000000eba0000L, 0x000000000ea10000L, 0x000000000e380000L, 0x000000000e230000L, 0x000000000e0e0000L, 0x000000000e150000L, 0x000000000e540000L, 0x000000000e4f0000L, 0x000000000e620000L, 0x000000000e790000L, 0x000000000f500000L, 0x000000000f4b0000L, 0x000000000f660000L, 0x000000000f7d0000L, 0x000000000f3c0000L, 0x000000000f270000L, 0x000000000f0a0000L, 0x000000000f110000L, 0x000000000f880000L, 0x000000000f930000L, 0x000000000fbe0000L, 0x000000000fa50000L, 0x000000000fe40000L, 0x000000000fff0000L, 
0x000000000fd20000L, 0x000000000fc90000L, 0x000000000b400000L, 0x000000000b5b0000L, 0x000000000b760000L, 0x000000000b6d0000L, 0x000000000b2c0000L, 0x000000000b370000L, 0x000000000b1a0000L, 0x000000000b010000L, 0x000000000b980000L, 0x000000000b830000L, 0x000000000bae0000L, 0x000000000bb50000L, 0x000000000bf40000L, 0x000000000bef0000L, 0x000000000bc20000L, 0x000000000bd90000L, 0x000000000af00000L, 0x000000000aeb0000L, 0x000000000ac60000L, 0x000000000add0000L, 0x000000000a9c0000L, 0x000000000a870000L, 0x000000000aaa0000L, 0x000000000ab10000L, 0x000000000a280000L, 0x000000000a330000L, 0x000000000a1e0000L, 0x000000000a050000L, 0x000000000a440000L, 0x000000000a5f0000L, 0x000000000a720000L, 0x000000000a690000L, 0x0000000008200000L, 0x00000000083b0000L, 0x0000000008160000L, 0x00000000080d0000L, 0x00000000084c0000L, 0x0000000008570000L, 0x00000000087a0000L, 0x0000000008610000L, 0x0000000008f80000L, 0x0000000008e30000L, 0x0000000008ce0000L, 0x0000000008d50000L, 0x0000000008940000L, 0x00000000088f0000L, 0x0000000008a20000L, 0x0000000008b90000L, 0x0000000009900000L, 0x00000000098b0000L, 0x0000000009a60000L, 0x0000000009bd0000L, 0x0000000009fc0000L, 0x0000000009e70000L, 0x0000000009ca0000L, 0x0000000009d10000L, 0x0000000009480000L, 0x0000000009530000L, 0x00000000097e0000L, 0x0000000009650000L, 0x0000000009240000L, 0x00000000093f0000L, 0x0000000009120000L, 0x0000000009090000L },
+ { 0x0000000000000000L, 0x0000000000001b00L, 0x0000000000003600L, 0x0000000000002d00L, 0x0000000000006c00L, 0x0000000000007700L, 0x0000000000005a00L, 0x0000000000004100L, 0x000000000000d800L, 0x000000000000c300L, 0x000000000000ee00L, 0x000000000000f500L, 0x000000000000b400L, 0x000000000000af00L, 0x0000000000008200L, 0x0000000000009900L, 0x000000000001b000L, 0x000000000001ab00L, 0x0000000000018600L, 0x0000000000019d00L, 0x000000000001dc00L, 0x000000000001c700L, 0x000000000001ea00L, 0x000000000001f100L, 0x0000000000016800L, 0x0000000000017300L, 0x0000000000015e00L, 0x0000000000014500L, 0x0000000000010400L, 0x0000000000011f00L, 0x0000000000013200L, 0x0000000000012900L, 0x0000000000036000L, 0x0000000000037b00L, 0x0000000000035600L, 0x0000000000034d00L, 0x0000000000030c00L, 0x0000000000031700L, 0x0000000000033a00L, 0x0000000000032100L, 0x000000000003b800L, 0x000000000003a300L, 0x0000000000038e00L, 0x0000000000039500L, 0x000000000003d400L, 0x000000000003cf00L, 0x000000000003e200L, 0x000000000003f900L, 0x000000000002d000L, 0x000000000002cb00L, 0x000000000002e600L, 0x000000000002fd00L, 0x000000000002bc00L, 0x000000000002a700L, 0x0000000000028a00L, 0x0000000000029100L, 0x0000000000020800L, 0x0000000000021300L, 0x0000000000023e00L, 0x0000000000022500L, 0x0000000000026400L, 0x0000000000027f00L, 0x0000000000025200L, 0x0000000000024900L, 0x000000000006c000L, 0x000000000006db00L, 0x000000000006f600L, 0x000000000006ed00L, 0x000000000006ac00L, 0x000000000006b700L, 0x0000000000069a00L, 0x0000000000068100L, 0x0000000000061800L, 0x0000000000060300L, 0x0000000000062e00L, 0x0000000000063500L, 0x0000000000067400L, 0x0000000000066f00L, 0x0000000000064200L, 0x0000000000065900L, 0x0000000000077000L, 0x0000000000076b00L, 0x0000000000074600L, 0x0000000000075d00L, 0x0000000000071c00L, 0x0000000000070700L, 0x0000000000072a00L, 0x0000000000073100L, 0x000000000007a800L, 0x000000000007b300L, 0x0000000000079e00L, 0x0000000000078500L, 0x000000000007c400L, 0x000000000007df00L, 0x000000000007f200L, 
0x000000000007e900L, 0x000000000005a000L, 0x000000000005bb00L, 0x0000000000059600L, 0x0000000000058d00L, 0x000000000005cc00L, 0x000000000005d700L, 0x000000000005fa00L, 0x000000000005e100L, 0x0000000000057800L, 0x0000000000056300L, 0x0000000000054e00L, 0x0000000000055500L, 0x0000000000051400L, 0x0000000000050f00L, 0x0000000000052200L, 0x0000000000053900L, 0x0000000000041000L, 0x0000000000040b00L, 0x0000000000042600L, 0x0000000000043d00L, 0x0000000000047c00L, 0x0000000000046700L, 0x0000000000044a00L, 0x0000000000045100L, 0x000000000004c800L, 0x000000000004d300L, 0x000000000004fe00L, 0x000000000004e500L, 0x000000000004a400L, 0x000000000004bf00L, 0x0000000000049200L, 0x0000000000048900L, 0x00000000000d8000L, 0x00000000000d9b00L, 0x00000000000db600L, 0x00000000000dad00L, 0x00000000000dec00L, 0x00000000000df700L, 0x00000000000dda00L, 0x00000000000dc100L, 0x00000000000d5800L, 0x00000000000d4300L, 0x00000000000d6e00L, 0x00000000000d7500L, 0x00000000000d3400L, 0x00000000000d2f00L, 0x00000000000d0200L, 0x00000000000d1900L, 0x00000000000c3000L, 0x00000000000c2b00L, 0x00000000000c0600L, 0x00000000000c1d00L, 0x00000000000c5c00L, 0x00000000000c4700L, 0x00000000000c6a00L, 0x00000000000c7100L, 0x00000000000ce800L, 0x00000000000cf300L, 0x00000000000cde00L, 0x00000000000cc500L, 0x00000000000c8400L, 0x00000000000c9f00L, 0x00000000000cb200L, 0x00000000000ca900L, 0x00000000000ee000L, 0x00000000000efb00L, 0x00000000000ed600L, 0x00000000000ecd00L, 0x00000000000e8c00L, 0x00000000000e9700L, 0x00000000000eba00L, 0x00000000000ea100L, 0x00000000000e3800L, 0x00000000000e2300L, 0x00000000000e0e00L, 0x00000000000e1500L, 0x00000000000e5400L, 0x00000000000e4f00L, 0x00000000000e6200L, 0x00000000000e7900L, 0x00000000000f5000L, 0x00000000000f4b00L, 0x00000000000f6600L, 0x00000000000f7d00L, 0x00000000000f3c00L, 0x00000000000f2700L, 0x00000000000f0a00L, 0x00000000000f1100L, 0x00000000000f8800L, 0x00000000000f9300L, 0x00000000000fbe00L, 0x00000000000fa500L, 0x00000000000fe400L, 0x00000000000fff00L, 
0x00000000000fd200L, 0x00000000000fc900L, 0x00000000000b4000L, 0x00000000000b5b00L, 0x00000000000b7600L, 0x00000000000b6d00L, 0x00000000000b2c00L, 0x00000000000b3700L, 0x00000000000b1a00L, 0x00000000000b0100L, 0x00000000000b9800L, 0x00000000000b8300L, 0x00000000000bae00L, 0x00000000000bb500L, 0x00000000000bf400L, 0x00000000000bef00L, 0x00000000000bc200L, 0x00000000000bd900L, 0x00000000000af000L, 0x00000000000aeb00L, 0x00000000000ac600L, 0x00000000000add00L, 0x00000000000a9c00L, 0x00000000000a8700L, 0x00000000000aaa00L, 0x00000000000ab100L, 0x00000000000a2800L, 0x00000000000a3300L, 0x00000000000a1e00L, 0x00000000000a0500L, 0x00000000000a4400L, 0x00000000000a5f00L, 0x00000000000a7200L, 0x00000000000a6900L, 0x0000000000082000L, 0x0000000000083b00L, 0x0000000000081600L, 0x0000000000080d00L, 0x0000000000084c00L, 0x0000000000085700L, 0x0000000000087a00L, 0x0000000000086100L, 0x000000000008f800L, 0x000000000008e300L, 0x000000000008ce00L, 0x000000000008d500L, 0x0000000000089400L, 0x0000000000088f00L, 0x000000000008a200L, 0x000000000008b900L, 0x0000000000099000L, 0x0000000000098b00L, 0x000000000009a600L, 0x000000000009bd00L, 0x000000000009fc00L, 0x000000000009e700L, 0x000000000009ca00L, 0x000000000009d100L, 0x0000000000094800L, 0x0000000000095300L, 0x0000000000097e00L, 0x0000000000096500L, 0x0000000000092400L, 0x0000000000093f00L, 0x0000000000091200L, 0x0000000000090900L },
+ { 0x0000000000000000L, 0x000000000000001bL, 0x0000000000000036L, 0x000000000000002dL, 0x000000000000006cL, 0x0000000000000077L, 0x000000000000005aL, 0x0000000000000041L, 0x00000000000000d8L, 0x00000000000000c3L, 0x00000000000000eeL, 0x00000000000000f5L, 0x00000000000000b4L, 0x00000000000000afL, 0x0000000000000082L, 0x0000000000000099L, 0x00000000000001b0L, 0x00000000000001abL, 0x0000000000000186L, 0x000000000000019dL, 0x00000000000001dcL, 0x00000000000001c7L, 0x00000000000001eaL, 0x00000000000001f1L, 0x0000000000000168L, 0x0000000000000173L, 0x000000000000015eL, 0x0000000000000145L, 0x0000000000000104L, 0x000000000000011fL, 0x0000000000000132L, 0x0000000000000129L, 0x0000000000000360L, 0x000000000000037bL, 0x0000000000000356L, 0x000000000000034dL, 0x000000000000030cL, 0x0000000000000317L, 0x000000000000033aL, 0x0000000000000321L, 0x00000000000003b8L, 0x00000000000003a3L, 0x000000000000038eL, 0x0000000000000395L, 0x00000000000003d4L, 0x00000000000003cfL, 0x00000000000003e2L, 0x00000000000003f9L, 0x00000000000002d0L, 0x00000000000002cbL, 0x00000000000002e6L, 0x00000000000002fdL, 0x00000000000002bcL, 0x00000000000002a7L, 0x000000000000028aL, 0x0000000000000291L, 0x0000000000000208L, 0x0000000000000213L, 0x000000000000023eL, 0x0000000000000225L, 0x0000000000000264L, 0x000000000000027fL, 0x0000000000000252L, 0x0000000000000249L, 0x00000000000006c0L, 0x00000000000006dbL, 0x00000000000006f6L, 0x00000000000006edL, 0x00000000000006acL, 0x00000000000006b7L, 0x000000000000069aL, 0x0000000000000681L, 0x0000000000000618L, 0x0000000000000603L, 0x000000000000062eL, 0x0000000000000635L, 0x0000000000000674L, 0x000000000000066fL, 0x0000000000000642L, 0x0000000000000659L, 0x0000000000000770L, 0x000000000000076bL, 0x0000000000000746L, 0x000000000000075dL, 0x000000000000071cL, 0x0000000000000707L, 0x000000000000072aL, 0x0000000000000731L, 0x00000000000007a8L, 0x00000000000007b3L, 0x000000000000079eL, 0x0000000000000785L, 0x00000000000007c4L, 0x00000000000007dfL, 0x00000000000007f2L, 
0x00000000000007e9L, 0x00000000000005a0L, 0x00000000000005bbL, 0x0000000000000596L, 0x000000000000058dL, 0x00000000000005ccL, 0x00000000000005d7L, 0x00000000000005faL, 0x00000000000005e1L, 0x0000000000000578L, 0x0000000000000563L, 0x000000000000054eL, 0x0000000000000555L, 0x0000000000000514L, 0x000000000000050fL, 0x0000000000000522L, 0x0000000000000539L, 0x0000000000000410L, 0x000000000000040bL, 0x0000000000000426L, 0x000000000000043dL, 0x000000000000047cL, 0x0000000000000467L, 0x000000000000044aL, 0x0000000000000451L, 0x00000000000004c8L, 0x00000000000004d3L, 0x00000000000004feL, 0x00000000000004e5L, 0x00000000000004a4L, 0x00000000000004bfL, 0x0000000000000492L, 0x0000000000000489L, 0x0000000000000d80L, 0x0000000000000d9bL, 0x0000000000000db6L, 0x0000000000000dadL, 0x0000000000000decL, 0x0000000000000df7L, 0x0000000000000ddaL, 0x0000000000000dc1L, 0x0000000000000d58L, 0x0000000000000d43L, 0x0000000000000d6eL, 0x0000000000000d75L, 0x0000000000000d34L, 0x0000000000000d2fL, 0x0000000000000d02L, 0x0000000000000d19L, 0x0000000000000c30L, 0x0000000000000c2bL, 0x0000000000000c06L, 0x0000000000000c1dL, 0x0000000000000c5cL, 0x0000000000000c47L, 0x0000000000000c6aL, 0x0000000000000c71L, 0x0000000000000ce8L, 0x0000000000000cf3L, 0x0000000000000cdeL, 0x0000000000000cc5L, 0x0000000000000c84L, 0x0000000000000c9fL, 0x0000000000000cb2L, 0x0000000000000ca9L, 0x0000000000000ee0L, 0x0000000000000efbL, 0x0000000000000ed6L, 0x0000000000000ecdL, 0x0000000000000e8cL, 0x0000000000000e97L, 0x0000000000000ebaL, 0x0000000000000ea1L, 0x0000000000000e38L, 0x0000000000000e23L, 0x0000000000000e0eL, 0x0000000000000e15L, 0x0000000000000e54L, 0x0000000000000e4fL, 0x0000000000000e62L, 0x0000000000000e79L, 0x0000000000000f50L, 0x0000000000000f4bL, 0x0000000000000f66L, 0x0000000000000f7dL, 0x0000000000000f3cL, 0x0000000000000f27L, 0x0000000000000f0aL, 0x0000000000000f11L, 0x0000000000000f88L, 0x0000000000000f93L, 0x0000000000000fbeL, 0x0000000000000fa5L, 0x0000000000000fe4L, 0x0000000000000fffL, 
0x0000000000000fd2L, 0x0000000000000fc9L, 0x0000000000000b40L, 0x0000000000000b5bL, 0x0000000000000b76L, 0x0000000000000b6dL, 0x0000000000000b2cL, 0x0000000000000b37L, 0x0000000000000b1aL, 0x0000000000000b01L, 0x0000000000000b98L, 0x0000000000000b83L, 0x0000000000000baeL, 0x0000000000000bb5L, 0x0000000000000bf4L, 0x0000000000000befL, 0x0000000000000bc2L, 0x0000000000000bd9L, 0x0000000000000af0L, 0x0000000000000aebL, 0x0000000000000ac6L, 0x0000000000000addL, 0x0000000000000a9cL, 0x0000000000000a87L, 0x0000000000000aaaL, 0x0000000000000ab1L, 0x0000000000000a28L, 0x0000000000000a33L, 0x0000000000000a1eL, 0x0000000000000a05L, 0x0000000000000a44L, 0x0000000000000a5fL, 0x0000000000000a72L, 0x0000000000000a69L, 0x0000000000000820L, 0x000000000000083bL, 0x0000000000000816L, 0x000000000000080dL, 0x000000000000084cL, 0x0000000000000857L, 0x000000000000087aL, 0x0000000000000861L, 0x00000000000008f8L, 0x00000000000008e3L, 0x00000000000008ceL, 0x00000000000008d5L, 0x0000000000000894L, 0x000000000000088fL, 0x00000000000008a2L, 0x00000000000008b9L, 0x0000000000000990L, 0x000000000000098bL, 0x00000000000009a6L, 0x00000000000009bdL, 0x00000000000009fcL, 0x00000000000009e7L, 0x00000000000009caL, 0x00000000000009d1L, 0x0000000000000948L, 0x0000000000000953L, 0x000000000000097eL, 0x0000000000000965L, 0x0000000000000924L, 0x000000000000093fL, 0x0000000000000912L, 0x0000000000000909L },
+ };
+
+ /**
+ * Computes the 64-bit CRC of a byte array.
+ *
+ * @param x the byte array to CRC.
+ * @return the CRC.
+ */
+ public static long compute(final byte x[]) {
+ long w1 = 0, w2 = 0;
+ int i;
+ final int l = x.length, /* The length of the argument. */
+ r = l % 8; /* The length mod 8. */
+
+ /* We must prefix the array with 1. */
+
+ w1 = 1L << 8 * r;
+ for (i = 0; i < r; i++)
+ w1 |= ((long)x[i] & 0xFF) << 8 * (r - 1 - i);
+
+ w2 = T[0][(int)(w1 >>> 56)] ^ T[1][(int)(w1 >>> 48 & 0xFF)] ^ T[2][(int)(w1 >>> 40 & 0xFF)] ^ T[3][(int)(w1 >>> 32 & 0xFF)] ^ T[4][(int)(w1 >>> 24 & 0xFF)]
+ ^ T[5][(int)(w1 >>> 16 & 0xFF)] ^ T[6][(int)(w1 >>> 8 & 0xFF)] ^ T[7][(int)(w1 & 0xFF)];
+
+ for (; i < l; i += 8) {
+
+
+ w1 = w2
+ ^ (((long)x[i] & 0xFF) << 56 | ((long)x[i + 1] & 0xFF) << 48 | ((long)x[i + 2] & 0xFF) << 40 | ((long)x[i + 3] & 0xFF) << 32 | ((long)x[i + 4] & 0xFF) << 24
+ | ((long)x[i + 5] & 0xFF) << 16 | ((long)x[i + 6] & 0xFF) << 8 | ((long)x[i + 7] & 0xFF));
+
+ w2 = T[0][(int)(w1 >>> 56)] ^ T[1][(int)(w1 >>> 48 & 0xFF)] ^ T[2][(int)(w1 >>> 40 & 0xFF)] ^ T[3][(int)(w1 >>> 32 & 0xFF)] ^ T[4][(int)(w1 >>> 24 & 0xFF)]
+ ^ T[5][(int)(w1 >>> 16 & 0xFF)] ^ T[6][(int)(w1 >>> 8 & 0xFF)] ^ T[7][(int)(w1 & 0xFF)];
+
+ }
+
+ return w1;
+ }
+
+ /**
+ * Computes the 64-bit CRC of a character array fragment; only the low 8 bits of each character are used.
+ *
+ * @param x the array to CRC.
+ * @param off the offset inside <code>x</code>.
+ * @param len the number of characters to CRC.
+ * @return the CRC.
+ */
+ public static long compute(final char x[], final int off, final int len) {
+ long w1 = 0, w2 = 0;
+ int i, r = len % 8; /* The length mod 8. */
+
+ /* We must prefix the array with 1. */
+
+ w1 = 1L << 8 * r;
+ for (i = 0; i < r; i++)
+ w1 |= ((long)x[off + i] & 0xFF) << 8 * (r - 1 - i);
+
+ w2 = T[0][(int)(w1 >>> 56)] ^ T[1][(int)(w1 >>> 48 & 0xFF)] ^ T[2][(int)(w1 >>> 40 & 0xFF)] ^ T[3][(int)(w1 >>> 32 & 0xFF)] ^ T[4][(int)(w1 >>> 24 & 0xFF)]
+ ^ T[5][(int)(w1 >>> 16 & 0xFF)] ^ T[6][(int)(w1 >>> 8 & 0xFF)] ^ T[7][(int)(w1 & 0xFF)];
+
+ for (; i < len; i += 8) {
+
+
+ w1 = w2
+ ^ (((long)x[off + i] & 0xFF) << 56 | ((long)x[off + i + 1] & 0xFF) << 48 | ((long)x[off + i + 2] & 0xFF) << 40 | ((long)x[off + i + 3] & 0xFF) << 32
+ | ((long)x[off + i + 4] & 0xFF) << 24 | ((long)x[off + i + 5] & 0xFF) << 16 | ((long)x[off + i + 6] & 0xFF) << 8 | ((long)x[off + i + 7] & 0xFF));
+
+ w2 = T[0][(int)(w1 >>> 56)] ^ T[1][(int)(w1 >>> 48 & 0xFF)] ^ T[2][(int)(w1 >>> 40 & 0xFF)] ^ T[3][(int)(w1 >>> 32 & 0xFF)] ^ T[4][(int)(w1 >>> 24 & 0xFF)]
+ ^ T[5][(int)(w1 >>> 16 & 0xFF)] ^ T[6][(int)(w1 >>> 8 & 0xFF)] ^ T[7][(int)(w1 & 0xFF)];
+
+ }
+
+ return w1;
+ }
+
+ /**
+ * Computes the 64-bit CRC of a string, using the ISO8859-1 representation of its Unicode
+ * characters.
+ *
+ * @param s the string to CRC.
+ * @return the CRC.
+ */
+ public static long compute(String s) {
+ try {
+ return compute(s.getBytes("ISO8859-1"));
+ }
+ catch (java.io.UnsupportedEncodingException cantHappen) {
+ // Every Java release should know about ISO8859-1 encoding!
+ return 0;
+ }
+ }
+
+ /**
+ * Computes the 64-bit CRC of a mutable string, using the low 8 bits of each character (i.e., the
+ * ISO8859-1 representation of the characters that have one).
+ *
+ * @param s the mutable string to CRC.
+ * @return the CRC.
+ */
+ public static long compute(MutableString s) {
+ return compute(s.array(), 0, s.length());
+ }
+
+}
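The table-driven compute(byte[]) above can be cross-checked against a bit-at-a-time sketch. This is a minimal sketch under an assumption the source does not state: that the reduction polynomial is x^64 + x^4 + x^3 + x + 1, i.e. that x^64 mod P is 0x1B, which is what the last row of T (0x00, 0x1b, 0x36, 0x2d, ...) suggests. The class name BitwiseCrc64Sketch is hypothetical. As in the original, a 1 bit is prepended, so that leading zero bytes still change the result.

```java
/**
 * Hypothetical bit-at-a-time reference for the table-driven compute(byte[]).
 * ASSUMPTION: the reduction polynomial is P(x) = x^64 + x^4 + x^3 + x + 1,
 * i.e. x^64 mod P = 0x1B, as suggested by the last row of the table T.
 */
final class BitwiseCrc64Sketch {
    private static final long LOW_TERMS = 0x1BL; // x^4 + x^3 + x + 1

    /** Returns (w * x + bit) mod P for a residue w of degree < 64. */
    private static long shiftIn(long w, final int bit) {
        final boolean overflow = w < 0;      // the x^63 coefficient is about to become x^64
        w = (w << 1) | bit;
        return overflow ? w ^ LOW_TERMS : w; // replace the lost x^64 by x^4 + x^3 + x + 1
    }

    /** Residue mod P of the bit string "1" followed by the bytes of x, most significant bit first. */
    static long compute(final byte[] x) {
        long w = shiftIn(0, 1);              // the leading 1 that compute(byte[]) prepends
        for (final byte b : x)
            for (int k = 7; k >= 0; k--) w = shiftIn(w, (b >>> k) & 1);
        return w;
    }
}
```

For inputs shorter than eight bytes no reduction takes place, so the result is simply a 1 followed by the bytes (0x141 for the single byte 0x41), matching the loop-free path of the table-driven code. Eight and sixteen zero bytes give 0x1B and 0x145, which are entries 1 and 0x1b of the last row of T.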
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/util/ConsistentHashFunction.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/util/ConsistentHashFunction.java
new file mode 100644
index 0000000..2dd2f8f
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/util/ConsistentHashFunction.java
@@ -0,0 +1,380 @@
+package it.unimi.dsi.law.util;
+
+import java.util.NoSuchElementException;
+import java.util.Set;
+import java.util.SortedSet;
+import java.util.TreeSet;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.longs.AbstractLong2ObjectSortedMap;
+import it.unimi.dsi.fastutil.longs.Long2ObjectAVLTreeMap;
+import it.unimi.dsi.fastutil.longs.Long2ObjectSortedMap;
+import it.unimi.dsi.fastutil.objects.Object2IntMap;
+import it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap;
+import it.unimi.dsi.fastutil.objects.ObjectArrays;
+import it.unimi.dsi.fastutil.objects.ObjectBidirectionalIterator;
+import it.unimi.dsi.fastutil.objects.ObjectLinkedOpenHashSet;
+import it.unimi.dsi.fastutil.objects.ObjectSortedSet;
+import it.unimi.dsi.util.XoRoShiRo128PlusRandomGenerator;
+
+// RELEASE-STATUS: DIST
+
+/**
+ * Provides an implementation of consistent hashing. Consistent hashing was introduced in
+ * <blockquote> <P>Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving
+ * Hot Spots on the World Wide Web, by David R. Karger, Eric Lehman, Frank T. Leighton, Rina
+ * Panigrahy, Matthew S. Levine and Daniel Lewin, Proc. of the twenty-ninth annual ACM symposium on
+ * Theory of computing, El Paso, Texas, United States, 1997, pages 654&minus;663. </blockquote> This
+ * class provides two extensions to the original definition: weighted buckets and skippable buckets.
+ * More precisely, keys are distributed on buckets proportionally to the weight of the buckets, and
+ * it is possible to specify a {@linkplain ConsistentHashFunction.SkipStrategy skip strategy} for
+ * buckets.
+ *
+ * <H3>Consistent Hash Function: Properties</H3>
+ *
+ * <P>A consistent hash function consists, at any time, of a set of objects, called
+ * <em>buckets</em>, each with a specified weight (a positive integer). At the beginning, there
+ * are no buckets, but you can {@linkplain #add(Comparable, int) add a new bucket}, or
+ * {@linkplain #remove(Comparable) remove an existing bucket}.
+ *
+ * <P>The method {@link #hash(long)} can be used to hash a <em>point</em> (a long) to a bucket.
+ * More precisely, when applied to a given point, this method will return one of the buckets, and
+ * the method will satisfy the following properties:
+ *
+ * <OL> <LI>the bucket returned by {@link #hash(long)} is one of the buckets currently present in
+ * the consistent hash function; <LI>the fraction of points that are hashed to a specific bucket is
+ * approximately proportional to the weight of the bucket; for example, if there are only two
+ * buckets <var>A</var> and <var>B</var>, and the weight of <var>A</var> is 2 whereas the weight
+ * of <var>B</var> is 3, then about 2/5 of all longs will be hashed to <var>A</var> and about 3/5
+ * will be hashed to <var>B</var>; <LI>every time you add a new bucket, some of the points will of
+ * course be hashed to the new bucket, but it is impossible that a point that was hashed to an old
+ * bucket will now be hashed to some other old bucket; more formally, consider the following
+ * sequence of instructions:
+ *
+ * <pre>
+ * Object A = chf.hash(x);
+ * chf.add(B, 1);
+ * Object C = chf.hash(x);
+ * </pre>
+ *
+ * at the end either <code>A==C</code> (i.e., the hash of x has not changed after adding B), or
+ * <code>C==B</code> (i.e., now x is hashed to the new object). </OL>
+ *
+ * <P>In other words, if a new bucket is added, then the number of keys that change their
+ * assignment is the minimum necessary; more importantly, it is impossible that a key changes its
+ * bucket assignment towards a bucket that already existed: when a bucket is added old buckets can
+ * only lose keys towards the new bucket.
+ *
+ * <P>It is easy to see that the last property stated above can be equivalently stated by saying
+ * that every point determines a (total) order among buckets; the {@link #hash(long)} method only
+ * returns the first element of this order. It is also possible, using {@link #hash(long, int)} to
+ * obtain an <em>array</em> of buckets containing the (first part of the) order. In particular, if
+ * the latter method is called with a specified length corresponding to the number of buckets, the
+ * whole order will be returned.
+ *
+ * <H3>Implementation Details</H3>
+ *
+ * <P>With each bucket, we associate a number of points, called replicae, located on the unit
+ * circle (the unit circle itself is represented approximately on the whole range of
+ * <code>long</code>s). Then, given a point <var>p</var>, one can {@linkplain #hash(long) get the
+ * bucket} of the replica that is closest to <var>p</var>, that is, the first replica following <var>p</var> on the circle.
+ *
+ * <P>The method that {@linkplain #hash(long, int) gets an array} containing the buckets looks at
+ * the buckets that are closest to a point, in distance order, without repetitions. In particular,
+ * by computing an array as large as the number of buckets, you will obtain a permutation of the
+ * buckets themselves. Indeed, another viewpoint on consistent hashing is that it associates a
+ * random permutation of the buckets to each point (albeit the interpretation of weights, in that
+ * case, becomes a bit twisted).
+ *
+ * <P>The number of replicas associated with a bucket is fixed when the bucket is inserted in the
+ * map: it is the weight of the bucket multiplied by the constant {@link #REPLICAE_PER_BUCKET}.
+ *
+ * <P>This class handles overlaps (i.e., conflicts in replica creation). In that case, the
+ * conflicting buckets are kept in a sorted set, so that a local deterministic ordering is generated
+ * using the natural ordering of the buckets (which is why they must be {@link Comparable}).
+ *
+ * <P>The hashing function is deterministically based on the hash code of the buckets, which should
+ * be of good quality. This is essential to ensure deterministic replicability of the results of
+ * this function across different instances.
+ *
+ */
+
+public final class ConsistentHashFunction<T extends Comparable<? super T>> {
+
+ /** Each bucket is replicated this number of times for each unit of weight. */
+ public final static int REPLICAE_PER_BUCKET = 200;
+
+ /** Maps points on the unit circle to buckets, or to sorted sets of buckets in case of conflicts. */
+ final protected Long2ObjectSortedMap<Object> replicae = new Long2ObjectAVLTreeMap<Object>();
+
+ /** The cached entry set of {@link #replicae}. */
+ final protected ObjectSortedSet<it.unimi.dsi.fastutil.longs.Long2ObjectMap.Entry<Object>> entrySet = replicae.long2ObjectEntrySet();
+
+
+ /** For each bucket, its weight. */
+ final protected Object2IntMap<T> sizes = new Object2IntOpenHashMap<T>();
+
+ /** The cached key set of {@link #sizes}. */
+ final protected Set<T> buckets = sizes.keySet();
+
+ /** The optional strategy to skip buckets, or {@code null}. */
+ final protected SkipStrategy<T> skipStrategy;
+
+ private final static boolean DEBUG = false;
+
+ /**
+ * Allows skipping suitable buckets when searching for the closest replica.
+ *
+ * <P>Sometimes it is useful to restrict the set of buckets that can be returned without
+ * modifying a consistent hash function (if nothing else, because any change requires removing or
+ * adding {@link ConsistentHashFunction#REPLICAE_PER_BUCKET} replicae).
+ *
+ * <P>To do so, it is possible to
+ * {@linkplain ConsistentHashFunction#ConsistentHashFunction(ConsistentHashFunction.SkipStrategy)
+ * provide at construction time} a strategy that, at each call to
+ * {@link ConsistentHashFunction#hash(long)}, will be used to test whether the bucket of a
+ * certain replica can be returned or not. Of course, in the latter case the search will
+ * continue with the next replica.
+ */
+
+ public static interface SkipStrategy<T> {
+
+ /**
+ * Checks whether a bucket can be returned or should be skipped.
+ *
+ * @param bucket the bucket to test.
+ * @return true if the bucket should be skipped.
+ */
+ public boolean isSkippable(T bucket);
+ }
+
+
+
+ /** Creates a new consistent hash function. */
+ public ConsistentHashFunction() {
+ this(null);
+ }
+
+ /**
+ * Creates a new consistent hash function with given skip strategy.
+ *
+ * @param skipStrategy a skip strategy, or {@code null}.
+ */
+ public ConsistentHashFunction(final SkipStrategy<T> skipStrategy) {
+ this.skipStrategy = skipStrategy;
+ }
+
+
+ /**
+ * Adds a bucket to the map.
+ *
+ * @param bucket the new bucket.
+ * @param weight the weight of the new bucket; buckets with a larger weight are returned
+ * proportionately more often.
+ * @return false if the bucket was already present.
+ */
+
+
+
+ @SuppressWarnings("unchecked")
+ public boolean add(final T bucket, final int weight) {
+
+ if (sizes.containsKey(bucket)) return false;
+ sizes.put(bucket, weight);
+
+
+ final XoRoShiRo128PlusRandomGenerator random = new XoRoShiRo128PlusRandomGenerator(bucket.hashCode());
+
+ long point;
+ Object o;
+ SortedSet<T> conflictSet;
+
+ for (int i = 0; i < weight * REPLICAE_PER_BUCKET; i++) {
+
+ point = random.nextLong();
+ if (DEBUG) point = point % 1024;
+
+ if ((o = replicae.get(point)) != null) {
+ if (o != bucket) { // o == bucket should happen with very low probability.
+ if (o instanceof SortedSet) ((SortedSet<T>)o).add(bucket);
+ else {
+ if (DEBUG) System.err.println("Creating conflict set...");
+ conflictSet = new TreeSet<T>();
+ conflictSet.add((T)o);
+ conflictSet.add(bucket);
+ replicae.put(point, conflictSet);
+ }
+ }
+ }
+ else replicae.put(point, bucket);
+ }
+
+ return true;
+ }
+
+ /**
+ * Removes a bucket.
+ *
+ * @param bucket the bucket to be removed.
+ * @return false if the bucket was not present.
+ */
+
+ @SuppressWarnings("unchecked")
+ public boolean remove(final T bucket) {
+
+ if (!sizes.containsKey(bucket)) return false;
+
+ final XoRoShiRo128PlusRandomGenerator random = new XoRoShiRo128PlusRandomGenerator(bucket.hashCode());
+ final int size = sizes.removeInt(bucket);
+
+ long point;
+ Object o;
+ SortedSet<T> conflictSet;
+
+ for (int i = 0; i < size * REPLICAE_PER_BUCKET; i++) {
+ point = random.nextLong();
+
+ if (DEBUG) point = point % 1024;
+ o = replicae.remove(point);
+ if (o instanceof SortedSet) {
+ if (DEBUG) System.err.println("Removing from " + point + " conflict set...");
+ conflictSet = (SortedSet<T>)o;
+ conflictSet.remove(bucket);
+ if (conflictSet.size() > 1) replicae.put(point, conflictSet);
+ else replicae.put(point, conflictSet.first());
+ }
+ else if (o != null && ((T)o).compareTo(bucket) != 0) replicae.put(point, o);
+ }
+
+ return true;
+ }
+
+
+ /**
+ * Returns an array of buckets whose replicae are close to the given point. The first element
+ * will be the bucket of the replica closest to the point, followed by the bucket of the next
+ * closest replica (whose bucket is not the first, of course) and so on.
+ *
+ * @param point a point on the unit circle.
+ * @param n the number of closest buckets to return.
+ * @return an array of distinct buckets of the closest replicas; the array could be shorter than
+ * <code>n</code> if there are not enough buckets and, in case a skip strategy has been
+ * specified, it could be empty even if the bucket set is nonempty.
+ */
+
+ @SuppressWarnings("unchecked")
+ public Object[] hash(long point, int n) {
+ if (n == 0 || buckets.size() == 0) return ObjectArrays.EMPTY_ARRAY;
+
+ if (DEBUG) point %= 1024;
+ final ObjectLinkedOpenHashSet<Object> result = new ObjectLinkedOpenHashSet<Object>(n, .5f);
+
+ ObjectBidirectionalIterator<it.unimi.dsi.fastutil.longs.Long2ObjectMap.Entry<Object>> i = replicae.long2ObjectEntrySet().iterator(new AbstractLong2ObjectSortedMap.BasicEntry<Object>(point, null));
+
+ Object value;
+
+ for (int pass = 2; pass-- != 0;) {
+ while (i.hasNext()) {
+ value = i.next().getValue();
+
+ if (value instanceof SortedSet) {
+ for (T p : (SortedSet<T>)value) {
+ if ((skipStrategy == null || !skipStrategy.isSkippable(p)) && result.add(p) && --n == 0) return result.toArray();
+ }
+ }
+ else if ((skipStrategy == null || !skipStrategy.isSkippable((T)value)) && result.add(value) && --n == 0) return result.toArray();
+ }
+ // Restart from the first element
+ i = replicae.long2ObjectEntrySet().iterator();
+ }
+
+ return result.toArray();
+ }
+
+ /**
+ * Returns an array of buckets whose replicae are close to the given object.
+ *
+ * <P>This method just uses {@code hashCode() << 32} as the point for
+ * {@link #hash(long,int)}.
+ *
+ * @param key an object to hash.
+ * @param n the number of close buckets to return.
+ * @return an array of distinct buckets of the closest replicas; the array could be shorter than
+ * <code>n</code> if there are not enough buckets and, in case a skip strategy has been
+ * specified, it could be empty even if the bucket set is nonempty.
+ * @see #hash(long, int)
+ */
+
+ public Object[] hash(final Object key, final int n) {
+ return hash((long)key.hashCode() << 32, n);
+ }
+
+ /**
+ * Returns the bucket of the replica that is closest to the given point.
+ *
+ * @param point a point on the unit circle.
+ * @return the bucket of the closest replica; this method never returns {@code null}, as it
+ * throws an exception when no bucket can be returned.
+ * @see #hash(long, int)
+ * @throws NoSuchElementException if there are no buckets, or if a skip strategy has been
+ * specified and it skipped all existing buckets.
+ */
+
+ @SuppressWarnings("unchecked")
+ public T hash(long point) {
+ final Object result[] = hash(point, 1);
+ if (result.length == 0) throw new NoSuchElementException();
+ else return (T)result[0];
+ }
+
+
+ /**
+ * Returns the bucket of the replica that is closest to the given key.
+ *
+ * @param key an object to hash.
+ * @return the bucket of the closest replica; this method never returns {@code null}, as it
+ * throws an exception when no bucket can be returned.
+ * @see #hash(Object, int)
+ * @throws NoSuchElementException if there are no buckets, or if a skip strategy has been
+ * specified and it skipped all existing buckets.
+ */
+
+ @SuppressWarnings("unchecked")
+ public T hash(final Object key) {
+ final Object result[] = hash(key, 1);
+ if (result.length == 0) throw new NoSuchElementException();
+ else return (T)result[0];
+ }
+
+
+ /**
+ * Returns the set of buckets of this consistent hash function.
+ *
+ * @return the set of buckets.
+ */
+
+ public Set<T> buckets() {
+ return buckets;
+ }
+
+ public String toString() {
+ return replicae.toString();
+ }
+}
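The replica mechanism described in the class comment can be illustrated with a small, self-contained sketch built on java.util.TreeMap instead of fastutil. Everything here is hypothetical and simplified: the class RingSketch, the use of SplittableRandom in place of XoRoShiRo128PlusRandomGenerator, and the collision policy (the earlier bucket keeps the point, whereas ConsistentHashFunction builds a conflict set). As in hash(long, int), the walk starts at the first replica strictly after the point, wraps around, and skips repeated buckets.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.SplittableRandom;
import java.util.TreeMap;

/**
 * Hypothetical, simplified sketch of weighted consistent hashing on a ring of longs.
 * Not the ConsistentHashFunction API: no removal, no skip strategy, and a replica
 * collision keeps the earlier bucket instead of building a conflict set.
 */
final class RingSketch {
    // Replicas per unit of weight, mirroring REPLICAE_PER_BUCKET.
    static final int REPLICAS_PER_UNIT = 200;
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void add(final String bucket, final int weight) {
        // SplittableRandom stands in for XoRoShiRo128PlusRandomGenerator; seeding it with
        // hashCode() makes the replica positions reproducible across instances.
        final SplittableRandom random = new SplittableRandom(bucket.hashCode());
        for (int i = 0; i < weight * REPLICAS_PER_UNIT; i++) ring.putIfAbsent(random.nextLong(), bucket);
    }

    /** At most n distinct buckets, in the order in which their replicas follow point on the ring. */
    List<String> hash(final long point, final int n) {
        final LinkedHashSet<String> result = new LinkedHashSet<>();
        if (n == 0) return new ArrayList<>(result);
        // First the replicas strictly after point, then wrap around to the start of the ring.
        for (final Map<Long, String> part : List.of(ring.tailMap(point, false), ring.headMap(point, true)))
            for (final String bucket : part.values())
                if (result.add(bucket) && result.size() == n) return new ArrayList<>(result);
        return new ArrayList<>(result);
    }

    /** The bucket of the first replica after point; throws if the ring is empty. */
    String hash(final long point) {
        final List<String> result = hash(point, 1);
        if (result.isEmpty()) throw new NoSuchElementException();
        return result.get(0);
    }
}
```

Hashing a fixed set of points before and after adding a bucket should show the monotonicity property stated in the class comment: every point either keeps its bucket or moves to the new one, and with weights 2 and 3 roughly 2/5 of the points go to the first bucket.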
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/util/ExchangeCounter.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/util/ExchangeCounter.java
new file mode 100644
index 0000000..1414e69
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/util/ExchangeCounter.java
@@ -0,0 +1,152 @@
+package it.unimi.dsi.law.util;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.law.stat.KendallTau;
+
+// RELEASE-STATUS: DIST
+
+/** Computes the number of discordances between two score vectors
+ * using Knight's O(<var>n</var>&nbsp;log&nbsp;<var>n</var>)
+ * MergeSort-based algorithm.
+ *
+ * <P>The number of <em>discordances</em> between two score vectors is the number
+ * of unordered pairs whose mutual relationship is opposite in the two orders (ties on either side are not counted).
+ *
+ * <P>The computation of the number of discordances is the most onerous step
+ * in {@linkplain KendallTau the computation of Kendall's &tau;}.
+ * It is possible to compute the
+ * number of discordances trivially using BubbleSort and counting the exchanges that are
+ * necessary to turn from the ranking induced by the first score vector into the ranking induced by the second
+ * score vector, but Knight noted that the same can be done
+ * in time O(<var>n</var> log <var>n</var>) using a stable sorting algorithm [William R. Knight,
+ * &ldquo;A Computer Method for Calculating Kendall's Tau with Ungrouped Data&rdquo;, <i>J. Amer. Statist. Assoc.</i>,
+ * 61(314):436&minus;439, 1966].
+ *
+ * <P>This class makes it possible to count the number of exchanges that will change a given ranking, specified by
+ * an array of integers, into another order, specified by a score vector.
+ * You must {@linkplain #ExchangeCounter(int[], double[]) create an exchange counter} first,
+ * and then invoke {@link #count()}. You are welcome to use a one-liner (e.g., <code>new ExchangeCounter(v, c).count()</code>),
+ * so that the large support array allocated by an instance of this class is collected quickly.
+ * Optionally, {@linkplain #ExchangeCounter(int[], double[], int[]) you can pass a support array explicitly}.
+ *
+ * <P>The slightly awkward structure is due to the necessity (in the computation of Kendall's &tau;)
+ * of computing the first order externally, and to the need to avoid passing around several additional
+ * parameters in recursive calls.
+ */
+public class ExchangeCounter {
+ /** Below this number of elements we use insertion sort. */
+ private final static int SMALL = 32;
+ /** A support array used by MergeSort. Must be at least as large as {@link #perm}. */
+ private final int[] temp;
+ /** The score vector used to perform comparisons. */
+ private final double[] v;
+ /** An array of integers (representing a previously built order). */
+ private final int perm[];
+
+ /** Creates a new exchange counter with a provided support array.
+ *
+ * <P>This constructor avoids the need to allocate a support array, in case one is already available.
+ *
+ * @param perm the array to be sorted.
+ * @param v the score vector used to compare the elements of {@code perm}.
+ * @param support an array that will be used as temporary storage during the computation (its content will be erased);
+ * must be at least as long as {@code perm}.
+ */
+ public ExchangeCounter(final int perm[], final double[] v, final int[] support) {
+ this.perm = perm;
+ this.v = v;
+ if (support.length < perm.length) throw new IllegalArgumentException("The support array length (" + support.length + ") is smaller than the main array length (" + perm.length + ")");
+ this.temp = support;
+ }
+
+ /** Creates a new exchange counter.
+ *
+ * @param perm the array to be sorted.
+ * @param v the score vector used to compare the elements of {@code perm}.
+ */
+ public ExchangeCounter(final int perm[], final double[] v) {
+ this(perm, v, new int[perm.length]);
+ }
+
+ /** Computes the number of exchanges.
+ *
+ * <P>Note that a call to this method will actually sort the permutation provided at creation time.
+ *
+ * @return the number of exchanges that will order the permutation provided at creation time using the score vector
+ * provided at creation time.
+ */
+ public long count() {
+ return count(0, perm.length);
+ }
+
+ /** Orders a subarray of {@link #perm} and returns the number of exchanges.
+ *
+ * @param offset the starting element to order.
+ * @param length the number of elements to order.
+ * @return the number of exchanges in the subarray.
+ */
+
+
+ private long count(final int offset, final int length) {
+ long exchanges = 0;
+ final int[] perm = this.perm;
+
+ if (length < SMALL) {
+ final int end = offset + length;
+ for (int i = offset; ++i < end;) {
+ int t = perm[i];
+ int j = i;
+ for (int u = perm[j - 1]; v[t] < v[u]; u = perm[j - 1]) {
+ exchanges++;
+ perm[j--] = u;
+ if (offset == j) break;
+ }
+ perm[j] = t;
+ }
+
+ return exchanges;
+ }
+
+ final int length0 = length / 2, length1 = length - length / 2, middle = offset + length0;
+ exchanges += count(offset, length0);
+ exchanges += count(middle, length1);
+
+ /* If the last element of the first subarray is smaller than the first element of
+ * the second subarray, there is nothing to do. */
+ if (v[perm[middle - 1]] >= v[perm[middle]]) {
+ // We merge the lists into temp, adding the number of forward moves to exchanges.
+ int i = 0, j = 0, k = 0;
+ while(j < length0 && k < length1) {
+ if (v[perm[offset + j]] <= v[perm[middle + k]]) {
+ temp[i] = perm[offset + j++];
+ }
+ else {
+ temp[i] = perm[middle + k++];
+ exchanges += length0 - j;
+ }
+ i++;
+ }
+
+ System.arraycopy(perm, offset + j, perm, offset + i, length0 - j);
+ System.arraycopy(temp, 0, perm, offset, i);
+ }
+ return exchanges;
+ }
+}
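Knight's observation can be sketched in isolation. The number of discordances counted by ExchangeCounter is the number of inversions of the sequence v[perm[0]], v[perm[1]], ..., and a stable merge sort counts them by adding, whenever an element of the right half is merged first, the number of left elements it overtakes. This is a minimal, hypothetical sketch: the class InversionCountSketch and its bruteForce reference are illustrative names, and unlike ExchangeCounter it sorts the scores directly and never switches to insertion sort.

```java
/**
 * Hypothetical sketch of Knight's merge-sort-based count of discordances
 * (inversions) of a score sequence, with a quadratic reference for comparison.
 */
final class InversionCountSketch {
    /** Number of pairs i < j with a[i] > a[j]; ties are not counted. Sorts a in place. */
    static long count(final double[] a) {
        return count(a, new double[a.length], 0, a.length);
    }

    private static long count(final double[] a, final double[] tmp, final int from, final int to) {
        if (to - from < 2) return 0;
        final int mid = (from + to) >>> 1;
        long inversions = count(a, tmp, from, mid) + count(a, tmp, mid, to);
        int i = from, j = mid, k = from;
        while (i < mid && j < to) {
            if (a[i] <= a[j]) tmp[k++] = a[i++]; // "<=" keeps the merge stable, so ties never count
            else {
                inversions += mid - i;           // a[j] overtakes every left element not merged yet
                tmp[k++] = a[j++];
            }
        }
        while (i < mid) tmp[k++] = a[i++];
        while (j < to) tmp[k++] = a[j++];
        System.arraycopy(tmp, from, a, from, to - from);
        return inversions;
    }

    /** Quadratic reference: the BubbleSort-style definition of the count. */
    static long bruteForce(final double[] a) {
        long inversions = 0;
        for (int i = 0; i < a.length; i++)
            for (int j = i + 1; j < a.length; j++)
                if (a[i] > a[j]) inversions++;
        return inversions;
    }
}
```

The stable comparison is what makes ties free: equal scores are never reordered, so they never contribute to the count.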
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/util/ExchangeWeigher.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/util/ExchangeWeigher.java
new file mode 100644
index 0000000..4852094
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/util/ExchangeWeigher.java
@@ -0,0 +1,119 @@
+package it.unimi.dsi.law.util;
+
+/*
+ * Copyright (C) 2013-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.ints.Int2DoubleFunction;
+
+// RELEASE-STATUS: DIST
+
+/** Computes the weight of discordances using a generalisation of Knight's algorithm. */
+
+public class ExchangeWeigher {
+ /** The function used to weigh elements. */
+ private final Int2DoubleFunction weigher;
+ /** A rank on the elements of {@link #perm}. The weight is computed by applying {@link #weigher} to the rank. */
+ private final int[] rank;
+ /** The score vector used to perform comparisons. */
+ private final double[] v;
+ /** A support array used by MergeSort. Must be at least as large as {@link #perm}. */
+ private final int[] temp;
+ /** An array of integers representing a previously built order. */
+ private final int perm[];
+ /** Whether to combine weights multiplicatively, rather than additively (the default). */
+ private final boolean multiplicative;
+ /** The currently accumulated exchange weight. */
+ private double exchangeWeight;
+
+ /** Creates a new exchange weigher.
+ *
+ * @param weigher the function used to weigh indices.
+ * @param perm the array to be sorted.
+ * @param v the score vector used to compare the elements of {@code perm}.
+ * @param rank a rank on the indices.
+ * @param multiplicative whether to combine weights multiplicatively, rather than additively, which is the default.
+ * @param support an array that will be used as temporary storage during the computation; its content will be erased, and it
+ * must be at least as long as {@code perm}. If {@code null}, it will be allocated.
+ */
+ public ExchangeWeigher(final Int2DoubleFunction weigher, final int[] perm, final double[] v, final int[] rank, final boolean multiplicative, final int[] support) {
+ this.weigher = weigher;
+ this.perm = perm;
+ this.v = v;
+ this.rank = rank;
+ this.multiplicative = multiplicative;
+ if (rank.length != perm.length) throw new IllegalArgumentException("The permutation array length (" + perm.length + ") and the rank array length (" + rank.length + ") do not match");
+ if (support != null && support.length < perm.length) throw new IllegalArgumentException("The support array length (" + support.length + ") is smaller than the main array length (" + perm.length + ")");
+ this.temp = support == null ? new int[perm.length] : support;
+ }
+
+ /** Computes the weight of exchanges for the current data.
+ *
+ * <P>Note that a call to this method will actually order the array
+ * provided at creation time using the score vector provided
+ * at creation time; thus, subsequent calls will always return 0.
+ *
+ * @return the weight of exchanges that will order the array provided at creation time using the score vector provided
+ * at creation time.
+ */
+ public double weigh() {
+ exchangeWeight = 0;
+ weigh(0, perm.length);
+ return exchangeWeight;
+ }
+
+ /** Orders a subarray of {@link #perm} and returns the sum of the weights of its elements.
+ *
+ * @param offset the starting element to order.
+ * @param length the number of elements to order.
+ * @return the weight of the elements in the subarray.
+ */
+
+ private double weigh(final int offset, final int length) {
+ /* Using a non-recursive sort for small subarrays gives no noticeable
+ * improvement, as most of the cost is given by floating-point computations. */
+ if (length == 1) return weigher.get(rank[perm[offset]]);
+
+ final int length0 = length / 2, length1 = length - length / 2, middle = offset + length0;
+ double residual = weigh(offset, length0);
+ final double weight = weigh(middle, length1) + residual;
+
+ /* If the last element of the first subarray is smaller than the first element of
+ * the second subarray, there is nothing to do. */
+ if (v[perm[middle - 1]] >= v[perm[middle]]) {
+ // We merge the lists into temp, adding the weights of the resulting exchanges.
+ int i = 0, j = 0, k = 0;
+ while(j < length0 && k < length1) {
+ if (v[perm[offset + j]] <= v[perm[middle + k]]) {
+ temp[i] = perm[offset + j++];
+ residual -= weigher.get(rank[temp[i]]);
+ }
+ else {
+ temp[i] = perm[middle + k++];
+ exchangeWeight += multiplicative
+ ? weigher.get(rank[temp[i]]) * residual
+ : weigher.get(rank[temp[i]]) * (length0 - j) + residual;
+ }
+ i++;
+ }
+
+ System.arraycopy(perm, offset + j, perm, offset + i, length0 - j);
+ System.arraycopy(temp, 0, perm, offset, i);
+ }
+ return weight;
+ }
+}
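The same merge can accumulate weights instead of counts, which is what ExchangeWeigher does. The hypothetical sketch below assumes that a discordant pair (x before y in perm, with v[x] > v[y]) contributes w(x) + w(y) in the additive case and w(x) * w(y) in the multiplicative case. That reading of the update rule above is an assumption, checked here only against a quadratic brute-force definition. For simplicity, w is applied to the element itself, whereas ExchangeWeigher applies its weigher to rank[element]; the class name WeightedInversionSketch is illustrative.

```java
import java.util.function.IntToDoubleFunction;

/**
 * Hypothetical sketch of a weighted discordance count in the style of ExchangeWeigher.
 * ASSUMPTION: a discordant pair (x before y in perm, v[x] > v[y]) contributes
 * w(x) + w(y), or w(x) * w(y) when multiplicative is true.
 */
final class WeightedInversionSketch {
    private final int[] perm;
    private final int[] tmp;
    private final double[] v;
    private final IntToDoubleFunction w;
    private final boolean multiplicative;
    private double discordanceWeight;

    WeightedInversionSketch(final int[] perm, final double[] v, final IntToDoubleFunction w, final boolean multiplicative) {
        this.perm = perm;
        this.tmp = new int[perm.length];
        this.v = v;
        this.w = w;
        this.multiplicative = multiplicative;
    }

    /** O(n log n) weight of all discordant pairs; sorts perm by v as a side effect. */
    double weigh() {
        discordanceWeight = 0;
        if (perm.length > 0) weigh(0, perm.length);
        return discordanceWeight;
    }

    /** Sorts perm[from, to) by v and returns the total weight of its elements. */
    private double weigh(final int from, final int to) {
        if (to - from == 1) return w.applyAsDouble(perm[from]);
        final int mid = (from + to) >>> 1;
        double residual = weigh(from, mid); // weight of the left elements not merged yet
        final double total = residual + weigh(mid, to);
        int i = from, j = mid, k = from;
        while (i < mid && j < to) {
            if (v[perm[i]] <= v[perm[j]]) {
                residual -= w.applyAsDouble(perm[i]);
                tmp[k++] = perm[i++];
            } else {
                // perm[j] is discordant with each of the mid - i left elements not merged yet.
                final double wy = w.applyAsDouble(perm[j]);
                discordanceWeight += multiplicative ? wy * residual : wy * (mid - i) + residual;
                tmp[k++] = perm[j++];
            }
        }
        while (i < mid) tmp[k++] = perm[i++];
        while (j < to) tmp[k++] = perm[j++];
        System.arraycopy(tmp, from, perm, from, to - from);
        return total;
    }

    /** Quadratic reference definition; does not modify perm. */
    static double bruteForce(final int[] perm, final double[] v, final IntToDoubleFunction w, final boolean multiplicative) {
        double sum = 0;
        for (int a = 0; a < perm.length; a++)
            for (int b = a + 1; b < perm.length; b++)
                if (v[perm[a]] > v[perm[b]]) {
                    final double wx = w.applyAsDouble(perm[a]), wy = w.applyAsDouble(perm[b]);
                    sum += multiplicative ? wx * wy : wx + wy;
                }
        return sum;
    }
}
```

When a right element y is merged first, it is discordant with each of the mid - i remaining left elements, whose total weight is residual; this gives w(y) * residual in the multiplicative case and w(y) * (mid - i) + residual in the additive one, the same two expressions used by ExchangeWeigher.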
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/util/KahanSummation.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/util/KahanSummation.java
new file mode 100644
index 0000000..4db5dd1
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/util/KahanSummation.java
@@ -0,0 +1,52 @@
+package it.unimi.dsi.law.util;
+
+/*
+ * Copyright (C) 2011-2019 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+//RELEASE-STATUS: DIST
+
+/** <a href="http://en.wikipedia.org/wiki/Kahan_summation_algorithm">Kahan's
+ * summation algorithm</a> encapsulated in an object. */
+
+public class KahanSummation {
+ /** The current value of the sum. */
+ private double value;
+ /** The current correction. */
+ private double c;
+
+ /** Adds a value.
+ * @param v the value to be added to the sum.
+ */
+ public void add(final double v) {
+ final double y = v - c;
+ final double t = value + y;
+ c = (t - value) - y;
+ value = t;
+ }
+
+ /** Returns the sum computed so far.
+ * @return the sum computed so far.
+ */
+ public double value() {
+ return value;
+ }
+
+ /** Resets the current value and correction to zero. */
+ public void reset() {
+ value = c = 0;
+ }
+}
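To illustrate why KahanSummation keeps a correction term, the sketch below adds 10,000 copies of 1e-16 to 1.0. Each term is below half an ulp of 1.0, so naive summation never moves away from 1.0, whereas the compensated loop (the same update as add() above) ends up close to 1 + 1e-12. The KahanDemo class is hypothetical and does not depend on the library.

```java
import java.util.Arrays;

/** Hypothetical demo class: naive summation versus Kahan's compensated summation. */
final class KahanDemo {
    private KahanDemo() {}

    static double naiveSum(final double[] values) {
        double sum = 0;
        for (final double v : values) sum += v;
        return sum;
    }

    /** Same update as KahanSummation.add(): c carries the rounding error of the last addition. */
    static double kahanSum(final double[] values) {
        double sum = 0, c = 0;
        for (final double v : values) {
            final double y = v - c;   // compensate for the error of the previous step
            final double t = sum + y; // low-order bits of y may be lost here
            c = (t - sum) - y;        // what was actually added minus what we meant to add
            sum = t;
        }
        return sum;
    }

    /** 1.0 followed by n copies of 1e-16. */
    static double[] example(final int n) {
        final double[] values = new double[n + 1];
        values[0] = 1.0;
        Arrays.fill(values, 1, n + 1, 1e-16);
        return values;
    }
}
```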
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/util/Norm.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/util/Norm.java
new file mode 100644
index 0000000..57b3bce
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/util/Norm.java
@@ -0,0 +1,287 @@
+package it.unimi.dsi.law.util;
+
+/*
+ * Copyright (C) 2011-2019 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+//RELEASE-STATUS: DIST
+
+import java.io.IOException;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+import it.unimi.dsi.fastutil.doubles.DoubleBigArrays;
+import it.unimi.dsi.law.stat.CorrelationIndex;
+import it.unimi.dsi.law.stat.KendallTau;
+
+/** An {@link Enum} providing different &#x2113; norms. */
+
+public enum Norm {
+ /** The {@linkplain #compute(double[]) &#x2113;<sub>1</sub> norm} of a vector is the sum of the absolute
+ * values of its components. We use <a href="http://en.wikipedia.org/wiki/Kahan_summation_algorithm">Kahan's
+ * summation algorithm</a> to contain numerical errors. */
+ L_1 {
+ @Override
+ public double compute(final double[] v) {
+ double normL1 = 0, c = 0;
+ for (int i = v.length; i-- != 0;) {
+ final double y = Math.abs(v[i]) - c;
+ final double t = normL1 + y;
+ c = (t - normL1) - y;
+ normL1 = t;
+ }
+ return normL1;
+ }
+
+ @Override
+ public double compute(final double[][] bv) {
+ double normL1 = 0, c = 0;
+ for(final double[] v: bv) {
+ for (int i = v.length; i-- != 0;) {
+ final double y = Math.abs(v[i]) - c;
+ final double t = normL1 + y;
+ c = (t - normL1) - y;
+ normL1 = t;
+ }
+ }
+ return normL1;
+ }
+
+ @Override
+ public double compute(final double[] v, final double[] w) {
+ if (v.length != w.length) throw new IllegalArgumentException("The two vectors have different sizes: " + v.length + " != " + w.length);
+
+ double normL1 = 0, c = 0;
+ for (int i = v.length; i-- != 0;) {
+ final double y = Math.abs(v[i] - w[i]) - c;
+ final double t = normL1 + y;
+ c = (t - normL1) - y;
+ normL1 = t;
+ }
+ return normL1;
+ }
+
+ @Override
+ public double compute(final double[][] bv, final double[][] bw) {
+ if (DoubleBigArrays.length(bv) != DoubleBigArrays.length(bw)) throw new IllegalArgumentException("The two big vectors have different sizes: " + DoubleBigArrays.length(bv) + " != " + DoubleBigArrays.length(bw));
+
+ double normL1 = 0, c = 0;
+ for(int s = bv.length; s-- != 0;) {
+ final double[] v = bv[s];
+ final double[] w = bw[s];
+ for (int i = v.length; i-- != 0;) {
+ final double y = Math.abs(v[i] - w[i]) - c;
+ final double t = normL1 + y;
+ c = (t - normL1) - y;
+ normL1 = t;
+ }
+ }
+ return normL1;
+ }
+ },
+ /** The {@linkplain #compute(double[]) &#x2113;<sub>2</sub> norm} of a vector is the square root of the sum of the squares
+ * of its components. We use <a href="http://en.wikipedia.org/wiki/Kahan_summation_algorithm">Kahan's
+ * summation algorithm</a> to contain numerical errors. */
+ L_2 {
+ @Override
+ public double compute(final double[] v) {
+ double sumOfSquares = 0, c = 0;
+ for(int i = v.length ; i-- != 0;) {
+ final double y = (v[i] * v[i]) - c;
+ final double t = sumOfSquares + y;
+ c = (t - sumOfSquares) - y;
+ sumOfSquares = t;
+ }
+ return Math.sqrt(sumOfSquares);
+ }
+
+ @Override
+ public double compute(final double[][] bv) {
+ double sumOfSquares = 0, c = 0;
+ for(final double[] v: bv) {
+ for(int i = v.length ; i-- != 0;) {
+ final double y = (v[i] * v[i]) - c;
+ final double t = sumOfSquares + y;
+ c = (t - sumOfSquares) - y;
+ sumOfSquares = t;
+ }
+ }
+ return Math.sqrt(sumOfSquares);
+ }
+
+ @Override
+ public double compute(final double[] v, final double[] w) {
+ if (v.length != w.length) throw new IllegalArgumentException("The two vectors have different sizes: " + v.length + " != " + w.length);
+
+ double sumOfSquares = 0, c = 0;
+ for(int i = v.length; i-- != 0;) {
+ final double y = (v[i] - w[i]) * (v[i] - w[i]) - c;
+ final double t = sumOfSquares + y;
+ c = (t - sumOfSquares) - y;
+ sumOfSquares = t;
+ }
+ return Math.sqrt(sumOfSquares);
+ }
+
+ @Override
+ public double compute(final double[][] bv, final double[][] bw) {
+ if (DoubleBigArrays.length(bv) != DoubleBigArrays.length(bw)) throw new IllegalArgumentException("The two big vectors have different sizes: " + DoubleBigArrays.length(bv) + " != " + DoubleBigArrays.length(bw));
+
+ double sumOfSquares = 0, c = 0;
+ for(int s = bv.length; s-- != 0; ) {
+ final double[] v = bv[s];
+ final double[] w = bw[s];
+ for (int i = v.length; i-- != 0;) {
+ final double y = (v[i] - w[i]) * (v[i] - w[i]) - c;
+ final double t = sumOfSquares + y;
+ c = (t - sumOfSquares) - y;
+ sumOfSquares = t;
+ }
+ }
+ return Math.sqrt(sumOfSquares);
+ }
+ },
+ /** The {@linkplain #compute(double[]) &#x2113;<sub>&#x221E;</sub> norm} of a vector is the maximum of the absolute
+ * values of its components. */
+ L_INFINITY {
+ @Override
+ public double compute(final double[] v) {
+ double norm = 0;
+ for (int i = v.length; i-- != 0;) norm = Math.max(norm, Math.abs(v[i]));
+ return norm;
+ }
+
+ @Override
+ public double compute(final double[][] bv) {
+ double norm = 0;
+ for(final double[] v : bv)
+ for (int i = v.length; i-- != 0;) norm = Math.max(norm, Math.abs(v[i]));
+ return norm;
+ }
+
+ @Override
+ public double compute(final double[] v, final double[] w) {
+ if (v.length != w.length) throw new IllegalArgumentException("The two vectors have different sizes: " + v.length + " != " + w.length);
+
+ double norm = 0;
+ for (int i = v.length; i-- != 0;) norm = Math.max(norm, Math.abs(v[i] - w[i]));
+ return norm;
+ }
+
+ @Override
+ public double compute(final double[][] bv, final double[][] bw) {
+ if (DoubleBigArrays.length(bv) != DoubleBigArrays.length(bw)) throw new IllegalArgumentException("The two big vectors have different sizes: " + DoubleBigArrays.length(bv) + " != " + DoubleBigArrays.length(bw));
+
+ double norm = 0;
+ for(int s = bv.length; s-- != 0; ) {
+ final double[] v = bv[s];
+ final double[] w = bw[s];
+ for (int i = v.length; i-- != 0;) norm = Math.max(norm, Math.abs(v[i] - w[i]));
+ }
+ return norm;
+ }
+ };
+
+ /** Computes the norm of a vector.
+ *
+ * @param v a vector.
+ * @return the norm of <code>v</code>.
+ */
+ public abstract double compute(final double[] v);
+
+ /** Computes the norm of the difference of two vectors.
+ *
+ * @param v the first vector.
+ * @param w the second vector.
+ * @return the norm of <code>v</code>&nbsp;&minus;&nbsp;<code>w</code>.
+ */
+ public abstract double compute(final double[] v, final double[] w);
+
+ /** Computes the norm of a big vector.
+ *
+ * @param v a big vector.
+ * @return the norm of <code>v</code>.
+ */
+ public abstract double compute(final double[][] v);
+
+ /** Computes the norm of the difference of two big vectors.
+ *
+ * @param v the first big vector.
+ * @param w the second big vector.
+ * @return the norm of <code>v</code>&nbsp;&minus;&nbsp;<code>w</code>.
+ */
+ public abstract double compute(final double[][] v, final double[][] w);
+
+ /** Normalizes a vector to a given norm value.
+ *
+ * @param v the vector to be normalized.
+ * @param norm the new norm value (nonnegative).
+ * @return <code>v</code>.
+ */
+ public double[] normalize(final double[] v, final double norm) {
+ if (norm < 0) throw new IllegalArgumentException("Negative norm: " + norm);
+ final double c = norm / compute(v);
+ for (int i = v.length; i-- != 0;) v[i] *= c;
+ return v;
+ }
+
+
+ public static void main(final String[] arg) throws NumberFormatException, IOException, JSAPException {
+ final SimpleJSAP jsap = new SimpleJSAP(Norm.class.getName(),
+ "Computes the L1- or L2-norm of a vector of doubles (or of the difference between two vectors of doubles) contained in one (or two) given files. " +
+ "The files must be written in Java binary format (or in some other format if -t is specified) " +
+ "and, if two are given, must contain the same number of doubles.",
+ new Parameter[] {
+ new FlaggedOption("type", JSAP.STRING_PARSER, "double", JSAP.NOT_REQUIRED, 't', "type", "The type of the input files, of the form type[:type] where type is one of int, long, float, double, text"),
+ new FlaggedOption("norm", JSAP.INTEGER_PARSER, "2", JSAP.NOT_REQUIRED, 'n', "norm", "The type of norm (1=L1, 2=L2)."),
+ new UnflaggedOption("file0", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The first input file."),
+ new UnflaggedOption("file1", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, JSAP.NOT_GREEDY, "The second input file (optional)."),
+ }
+ );
+
+ final JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final int normNumber = jsapResult.getInt("norm");
+ if (normNumber != 1 && normNumber != 2) throw new IllegalArgumentException("Norm must be 1 or 2: " + normNumber);
+ final Norm norm = normNumber == 1? L_1 : L_2;
+
+ final Class<?>[] inputType = CorrelationIndex.parseInputTypes(jsapResult);
+
+ final String f0 = jsapResult.getString("file0");
+ final String f1 = jsapResult.getString("file1");
+
+ final double[] v0 = CorrelationIndex.loadAsDoubles(f0, inputType[0], false);
+ final double[] v1 = f1 == null? null : CorrelationIndex.loadAsDoubles(f1, inputType[1], false);
+
+ final double result = v1 == null? norm.compute(v0) : norm.compute(v0, v1);
+ System.out.println(result);
+
+ }
+
+}
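Because Norm is an enum, the choice of norm can be handled as a value, as main() does when it maps -n 1 or -n 2 to L_1 or L_2. The sketch below is a hypothetical, simplified MiniNorm enum with plain (non-compensated) summation, meant only to show what each constant computes on a small vector. Unlike Norm.normalize(), which would fill a zero vector with NaN values, its normalize() rejects the zero vector explicitly; that is a design choice of this sketch, not of the library.

```java
/** Hypothetical, simplified counterpart of Norm (no Kahan summation, no big arrays). */
enum MiniNorm {
    L_1 {
        @Override double compute(final double[] v) {
            double sum = 0;
            for (final double x : v) sum += Math.abs(x);
            return sum;
        }
    },
    L_2 {
        @Override double compute(final double[] v) {
            double sumOfSquares = 0;
            for (final double x : v) sumOfSquares += x * x;
            return Math.sqrt(sumOfSquares);
        }
    },
    L_INFINITY {
        @Override double compute(final double[] v) {
            double max = 0;
            for (final double x : v) max = Math.max(max, Math.abs(x));
            return max;
        }
    };

    abstract double compute(double[] v);

    /** Scales v in place so that its norm becomes norm; rejects the zero vector. */
    double[] normalize(final double[] v, final double norm) {
        if (norm < 0) throw new IllegalArgumentException("Negative norm: " + norm);
        final double current = compute(v);
        if (current == 0) throw new IllegalArgumentException("Cannot normalize the zero vector");
        final double c = norm / current;
        for (int i = v.length; i-- != 0;) v[i] *= c;
        return v;
    }
}
```

For the vector (3, -4) the three constants give 7, 5 and 4, respectively.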
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/util/NormL1.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/util/NormL1.java
new file mode 100644
index 0000000..36987f4
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/util/NormL1.java
@@ -0,0 +1,60 @@
+package it.unimi.dsi.law.util;
+
+/*
+ * Copyright (C) 2005-2019 Roberto Posenato and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+//RELEASE-STATUS: DIST
+
+/** Static methods that compute &#x2113;<sub>1</sub> norms.
+ * @deprecated Use {@link Norm#L_1}.
+ */
+
+@Deprecated
+public class NormL1 {
+
+ private NormL1() {}
+
+ public static double compute(final double[] v) {
+ double normL1 = 0, c = 0;
+ for (int i = v.length; i-- != 0;) {
+ final double y = Math.abs(v[i]) - c;
+ final double t = normL1 + y;
+ c = (t - normL1) - y;
+ normL1 = t;
+ }
+ return normL1;
+ }
+
+ public static double compute(double[] v, double[] w) {
+ if (v.length != w.length) throw new IllegalArgumentException("The two vectors have different sizes: " + v.length + " != " + w.length);
+
+ double normL1 = 0, c = 0;
+ for (int i = v.length; i-- != 0;) {
+ final double y = Math.abs(v[i] - w[i]) - c;
+ final double t = normL1 + y;
+ c = (t - normL1) - y;
+ normL1 = t;
+ }
+ return normL1;
+ }
+
+ public static void normalize(final double[] v, final double norm) {
+ if (norm < 0) throw new IllegalArgumentException("Negative norm: " + norm);
+ double c = norm / NormL1.compute(v);
+ for (int i = v.length; i-- != 0;) v[i] *= c;
+ }
+}
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/util/NormL2.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/util/NormL2.java
new file mode 100644
index 0000000..9ee08a6
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/util/NormL2.java
@@ -0,0 +1,60 @@
+package it.unimi.dsi.law.util;
+
+/*
+ * Copyright (C) 2005-2019 Roberto Posenato and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+// RELEASE-STATUS: DIST
+
+/** Static methods that compute &#x2113;<sub>2</sub> norms.
+ * @deprecated Use {@link Norm#L_2}. */
+
+@Deprecated
+public class NormL2 {
+
+ private NormL2() {}
+
+ public static double compute(final double[] v) {
+ double sumOfSquares = 0, c = 0;
+ for(int i = v.length ; i-- != 0;) {
+ final double y = (v[i] * v[i]) - c;
+ final double t = sumOfSquares + y;
+ c = (t - sumOfSquares) - y;
+ sumOfSquares = t;
+ }
+ return Math.sqrt(sumOfSquares);
+ }
+
+
+ public static double compute(final double[] v, final double[] w) {
+ if (v.length != w.length) throw new IllegalArgumentException("The two vectors have different sizes: " + v.length + " != " + w.length);
+
+ double sumOfSquares = 0, c = 0;
+ for(int i = v.length; i-- != 0;) {
+ final double y = (v[i] - w[i]) * (v[i] - w[i]) - c;
+ final double t = sumOfSquares + y;
+ c = (t - sumOfSquares) - y;
+ sumOfSquares = t;
+ }
+ return Math.sqrt(sumOfSquares);
+ }
+
+ public static void normalize(final double[] v, final double norm) {
+ if (norm < 0) throw new IllegalArgumentException("Negative norm: " + norm);
+ double c = norm / Norm.L_2.compute(v);
+ for (int i = v.length; i-- != 0;) v[i] *= c;
+ }
+}
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/util/Precision.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/util/Precision.java
new file mode 100644
index 0000000..52cb0d1
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/util/Precision.java
@@ -0,0 +1,69 @@
+package it.unimi.dsi.law.util;
+
+/*
+ * Copyright (C) 2006-2019 Paolo Boldi, Roberto Posenato, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+
+// RELEASE-STATUS: DIST
+
+/**
+ * A set of convenience methods to manipulate the precision of doubles.
+ *
+ */
+public class Precision {
+
+ /** Truncates the given double value to the given number of fractional binary digits. This is
+ * equivalent to multiplying <code>value</code> by 2<sup><code>significantFractionalBinaryDigits</code></sup>,
+ * truncating the result toward zero and then multiplying it by 2<sup><code>-significantFractionalBinaryDigits</code></sup>
+ * (the computation should be performed in arbitrary precision). By choice, this method does not apply any kind of rounding.
+ *
+ * <p>Note that <code>significantFractionalBinaryDigits</code> can be negative: in that case, on positive
+ * values the method is equivalent to applying the mask <code>-1L << -significantFractionalBinaryDigits</code> to the integer part of <code>value</code>.
+ *
+ * <p>As an example, if you have an estimate <var>v</var> &pm; <var>e</var>, the right value to be passed for
+ * <code>significantFractionalBinaryDigits</code> is <code>Math.floor(-Math.log(e)/Math.log(2))-1</code>.
+ *
+ * @param value the value to be truncated.
+ * @param significantFractionalBinaryDigits the number of significant fractional binary digits ({@link Integer#MAX_VALUE} causes <code>value</code> to be returned unmodified).
+ * @return the truncated value.
+ */
+ public static double truncate(final double value, final int significantFractionalBinaryDigits) {
+ final long bits = Double.doubleToLongBits(value);
+ // 52 - the exponent.
+ final int negExponent = - (int)((bits >> 52) & 0x7FFL) + 1075;
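+ // The number of digits lost at the end of the significand. Subnormals (biased exponent 0, i.e., negExponent == 1075)
+ // have an effective exponent of -1022, hence the correction; long arithmetic avoids overflow for arguments such as Integer.MAX_VALUE.
+ final long lostDigits = (long)negExponent - significantFractionalBinaryDigits - (negExponent == 1075 ? 1 : 0);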
+ if (lostDigits <= 0) return value;
+ if (lostDigits > 52) return 0;
+ return Double.longBitsToDouble(bits & (-1L << lostDigits));
+ }
+
+ /** Applies {@link #truncate(double, int)} to the given array.
+ *
+ * <p><strong>Warning</strong>: previous implementations of this method used the special value -1 to indicate
+ * that <code>value</code> was to be left unchanged. The current version uses {@link Integer#MAX_VALUE} for this purpose.
+ *
+ * @param value an array.
+ * @param significantFractionalBinaryDigits the number of significant fractional binary digits ({@link Integer#MAX_VALUE} causes the contents of <code>value</code> to be returned unmodified).
+ * @return <code>value</code>.
+ * @see #truncate(double, int)
+ */
+ public static double[] truncate(final double[] value, final int significantFractionalBinaryDigits) {
+ if (significantFractionalBinaryDigits != Integer.MAX_VALUE) for (int i = value.length; i-- != 0;) value[i] = truncate(value[i], significantFractionalBinaryDigits);
+ return value;
+ }
+}
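The hypothetical TruncateSketch below follows the same bit-masking approach as truncate(double, int) and checks it against the floor-based definition from the Javadoc. That definition is exact for positive values and moderate digit counts, because scaling by a power of two introduces no rounding there. In this sketch the subnormal correction is applied when the biased exponent is zero (negExponent == 1075), and the digit count is computed in long arithmetic so that Integer.MAX_VALUE cannot overflow it.

```java
/** Hypothetical sketch of fractional-digit truncation by masking significand bits. */
final class TruncateSketch {
    private TruncateSketch() {}

    static double truncate(final double value, final int significantFractionalBinaryDigits) {
        final long bits = Double.doubleToLongBits(value);
        // 52 minus the unbiased exponent: the last significand bit weighs 2^-negExponent (normal values).
        final int negExponent = -(int)((bits >> 52) & 0x7FFL) + 1075;
        // Subnormals behave as if the exponent were -1022, hence the extra -1.
        final long lostDigits = (long)negExponent - significantFractionalBinaryDigits - (negExponent == 1075 ? 1 : 0);
        if (lostDigits <= 0) return value;  // nothing to drop
        if (lostDigits > 52) return 0;      // even the leading bit is below the cut
        return Double.longBitsToDouble(bits & (-1L << lostDigits));
    }

    /** Reference for positive values and moderate digit counts. */
    static double reference(final double value, final int digits) {
        final double scale = Math.scalb(1.0, digits);
        return Math.floor(value * scale) / scale;
    }
}
```

For instance, truncating Math.PI (binary 11.001001...) to 3 fractional digits keeps 11.001, that is, 3.125.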
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/util/package.html b/third_party/law-2.5.1/src/it/unimi/dsi/law/util/package.html
new file mode 100644
index 0000000..e5f34e3
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/util/package.html
@@ -0,0 +1,13 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<!-- RELEASE-STATUS: DIST -->
+<html>
+ <head>
+ <title>Utilities</title>
+ </head>
+
+ <body>
+
+ <P>Utility classes.
+
+ </body>
+</html>
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/vector/CosineSimilarityStrategy.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/vector/CosineSimilarityStrategy.java
new file mode 100644
index 0000000..1e15ff9
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/vector/CosineSimilarityStrategy.java
@@ -0,0 +1,33 @@
+package it.unimi.dsi.law.vector;
+
+/*
+ * Copyright (C) 2008-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+//RELEASE-STATUS: DIST
+
+/** A class that computes the similarity between patterns using cosine similarity. */
+public class CosineSimilarityStrategy implements SimilarityStrategy {
+
+ static final long serialVersionUID = 2006001L;
+
+ public double similarity(Vector v0, Vector v1) {
+ double dot = v0.dotProduct(v1);
+ return dot == 0.0 ? 0.0 : dot / (v0.ell2Norm() * v1.ell2Norm());
+ }
+
+}
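CosineSimilarityStrategy returns dot(v0, v1) / (||v0|| ||v1||), with Euclidean norms, and short-circuits to 0 when the dot product is zero; the short-circuit also avoids 0/0 = NaN when one of the vectors is zero. A hypothetical array-based version of the same formula, without the Vector class hierarchy, could look like this:

```java
/** Hypothetical array-based counterpart of CosineSimilarityStrategy.similarity(). */
final class CosineSketch {
    private CosineSketch() {}

    static double cosine(final double[] v0, final double[] v1) {
        if (v0.length != v1.length) throw new IllegalArgumentException("vectors with different size");
        double dot = 0, squares0 = 0, squares1 = 0;
        for (int i = 0; i < v0.length; i++) {
            dot += v0[i] * v1[i];
            squares0 += v0[i] * v0[i];
            squares1 += v1[i] * v1[i];
        }
        // A zero dot product (orthogonal or zero vectors) yields 0 instead of dividing.
        return dot == 0.0 ? 0.0 : dot / (Math.sqrt(squares0) * Math.sqrt(squares1));
    }
}
```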
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/vector/DenseVector.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/vector/DenseVector.java
new file mode 100644
index 0000000..1c3857d
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/vector/DenseVector.java
@@ -0,0 +1,212 @@
+package it.unimi.dsi.law.vector;
+
+/*
+ * Copyright (C) 2008-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import java.util.Arrays;
+
+// RELEASE-STATUS: DIST
+
+/** A mutable implementation of {@link Vector} optimized for dense vectors. */
+public class DenseVector extends Vector {
+
+ static final long serialVersionUID = 2006002L;
+
+ /** An array containing the vector values. */
+ final double[] value;
+
+ /**
+ * Builds a vector of given size with zero values.
+ *
+ * @param size the size.
+ * @param id the id of description of this vector.
+ */
+ private DenseVector(final int size, final int id) {
+ super(size, true, id);
+ this.value = new double[size];
+ ell2norm = ell1norm = 0.0;
+ }
+
+ /**
+ * Builds a vector from an array of values.
+ *
+ * @param value an array of values.
+ * @param id the id of description of this vector.
+ */
+ private DenseVector(final double[] value, final int id) {
+ super(value.length, true, id);
+ this.value = value;
+ ell2norm = ell1norm = INVALID_NORM;
+ }
+
+ /**
+ * Returns an instance of given size with zero values.
+ *
+ * @param size the size.
+ * @param id the id of description of this vector.
+ * @return a vector.
+ */
+ public static DenseVector getInstance(final int size, final int id) {
+ return new DenseVector(size, id);
+ }
+
+ /**
+ * Returns an instance from an array of values.
+ *
+ * @param value an array of values.
+ * @param id the id of description of this vector.
+ * @return a vector.
+ */
+ public static DenseVector getInstance(final double[] value, final int id) {
+ return new DenseVector(value, id);
+ }
+
+ public void set(final int idx, final double val) {
+ if (idx < 0 || idx >= size) throw new IllegalArgumentException("index out of range");
+
+ if (ell1norm != INVALID_NORM) ell1norm = ell1norm - Math.abs(value[idx]) + Math.abs(val);
+
+ value[idx] = val;
+ ell2norm = INVALID_NORM;
+ }
+
+ public double get(final int idx) {
+ if (idx < 0 || idx >= size) throw new IllegalArgumentException("index out of range");
+
+ return value[idx];
+ }
+
+ public void add(final double alpha, final Vector v) {
+ // check size
+ if (size != v.size) throw new IllegalArgumentException("vectors with different size");
+
+ if (v instanceof DenseVector) {
+ final double[] dvValue = ((DenseVector)v).value;
+
+ for (int i = size; i-- != 0;)
+ value[i] += dvValue[i] * alpha;
+ }
+ else {
+ if (v instanceof ImmutableSparseVector) {
+ ImmutableSparseVector sv = (ImmutableSparseVector)v;
+ final double[] svValue = sv.value;
+ final int[] svIndex = sv.index;
+
+ for (int i = sv.nonZero; i-- != 0;)
+ value[svIndex[i]] += svValue[i] * alpha;
+ }
+ else super.add(alpha, v);
+ }
+
+ ell2norm = ell1norm = INVALID_NORM;
+ }
+
+ public void scale(final double alpha) {
+ for (int i = size; i-- != 0;)
+ value[i] *= alpha;
+
+ // update the cached norms, if valid
+ if (ell2norm != INVALID_NORM) ell2norm *= Math.abs(alpha);
+ if (ell1norm != INVALID_NORM) ell1norm *= Math.abs(alpha);
+ }
+
+ public void zero() {
+ Arrays.fill(value, 0.0);
+ ell2norm = ell1norm = 0.0; // the zero vector has both norms equal to zero
+ }
+
+ public double dotProduct(final Vector v) {
+ // check size
+ if (size != v.size) throw new IllegalArgumentException("vectors with different size");
+
+ // compute dot product
+ if (v instanceof DenseVector) {
+ final double[] dvValue = ((DenseVector)v).value;
+
+ double dot = 0.0;
+
+ double c = 0.0, t, y;
+ for (int i = size; i-- != 0;) {
+ y = (value[i] * dvValue[i]) - c;
+ t = dot + y;
+ c = (t - dot) - y;
+ dot = t;
+
+ // dot += value[i] * dvValue[i];
+ }
+
+ return dot;
+ }
+ else if (v instanceof ImmutableSparseVector) return v.dotProduct(this);
+ else return super.dotProduct(v);
+ }
+
+ public double euclideanDistance(final Vector v) {
+ // check size
+ if (size != v.size) throw new IllegalArgumentException("vectors with different size");
+
+ // compute distance
+ if (v instanceof DenseVector) {
+ final double[] dvValue = ((DenseVector)v).value;
+
+ double dist = 0.0, temp;
+
+ double c = 0.0, t, y;
+ for (int i = size; i-- != 0;) {
+ temp = value[i] - dvValue[i];
+ y = (temp * temp) - c;
+ t = dist + y;
+ c = (t - dist) - y;
+ dist = t;
+
+ // temp = value[i] - dvValue[i];
+ // dist += temp * temp;
+ }
+
+ return Math.sqrt(Math.abs(dist));
+ }
+ else if (v instanceof ImmutableSparseVector) return Math.sqrt(Math.abs(dotProduct(this) + v.dotProduct(v) - 2 * v.dotProduct(this)));
+ else return super.euclideanDistance(v);
+ }
+
+ public double ell2Norm() {
+ if (ell2norm == INVALID_NORM) {
+ double tempNorm = 0.0;
+ for (int i = size; i-- != 0;)
+ tempNorm += value[i] * value[i];
+
+ ell2norm = Math.sqrt(Math.abs(tempNorm));
+ }
+
+ return ell2norm;
+ }
+
+ @Override
+ public double ell1Norm() {
+ if (ell1norm == INVALID_NORM)
+ {
+ double tempNorm = 0.0;
+ for (int i = size; i-- != 0 ;)
+ tempNorm += Math.abs(value[i]);
+ ell1norm = tempNorm;
+ }
+
+ return ell1norm;
+ }
+
+}
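DenseVector caches its two norms: set() patches the ℓ<sub>1</sub> norm in constant time and invalidates the ℓ<sub>2</sub> norm, scale() rescales both, and zero() must reset both caches. The stripped-down CachedNormVector below illustrates that pattern. The class is hypothetical, and INVALID_NORM = -1 is an assumption (the library's actual sentinel is defined outside this excerpt); any negative value works, since norms are nonnegative.

```java
import java.util.Arrays;

/** Hypothetical dense vector that caches its norms the way DenseVector does. */
final class CachedNormVector {
    /** Assumed sentinel marking a cached norm as stale. */
    static final double INVALID_NORM = -1;

    final double[] value;
    private double ell1norm = 0, ell2norm = 0; // a fresh vector is zero, so both norms are known

    CachedNormVector(final int size) { value = new double[size]; }

    void set(final int idx, final double val) {
        // The l1 norm can be patched in O(1); the l2 norm is simply invalidated.
        if (ell1norm != INVALID_NORM) ell1norm = ell1norm - Math.abs(value[idx]) + Math.abs(val);
        value[idx] = val;
        ell2norm = INVALID_NORM;
    }

    void scale(final double alpha) {
        for (int i = value.length; i-- != 0;) value[i] *= alpha;
        if (ell1norm != INVALID_NORM) ell1norm *= Math.abs(alpha);
        if (ell2norm != INVALID_NORM) ell2norm *= Math.abs(alpha);
    }

    void zero() {
        Arrays.fill(value, 0.0);
        ell1norm = ell2norm = 0.0; // both caches must be reset
    }

    double ell1Norm() {
        if (ell1norm == INVALID_NORM) {
            double sum = 0;
            for (final double x : value) sum += Math.abs(x);
            ell1norm = sum;
        }
        return ell1norm;
    }

    double ell2Norm() {
        if (ell2norm == INVALID_NORM) {
            double sumOfSquares = 0;
            for (final double x : value) sumOfSquares += x * x;
            ell2norm = Math.sqrt(sumOfSquares);
        }
        return ell2norm;
    }
}
```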
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/vector/EuclideanSimilarityStrategy.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/vector/EuclideanSimilarityStrategy.java
new file mode 100644
index 0000000..9cbf692
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/vector/EuclideanSimilarityStrategy.java
@@ -0,0 +1,34 @@
+package it.unimi.dsi.law.vector;
+
+/*
+ * Copyright (C) 2008-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+//RELEASE-STATUS: DIST
+
+/** A class that computes the similarity between patterns using the Euclidean distance. */
+public class EuclideanSimilarityStrategy implements SimilarityStrategy {
+
+ static final long serialVersionUID = 2006001L;
+
+ public double similarity(Vector v0, Vector v1) {
+ double dist = v0.euclideanDistance(v1);
+ return dist < 1.0 ? 1.0 : 1.0 / dist;
+ }
+
+}
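EuclideanSimilarityStrategy maps a distance d to 1 when d &lt; 1 and to 1/d otherwise, so identical points never cause a division by zero and the similarity stays in (0, 1] for finite distances. A hypothetical array-based sketch of the same mapping:

```java
/** Hypothetical array-based counterpart of EuclideanSimilarityStrategy.similarity(). */
final class EuclideanSketch {
    private EuclideanSketch() {}

    static double distance(final double[] v0, final double[] v1) {
        if (v0.length != v1.length) throw new IllegalArgumentException("vectors with different size");
        double sum = 0;
        for (int i = 0; i < v0.length; i++) {
            final double d = v0[i] - v1[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** 1 for points closer than 1 (including identical points), 1/d otherwise. */
    static double similarity(final double[] v0, final double[] v1) {
        final double dist = distance(v0, v1);
        return dist < 1.0 ? 1.0 : 1.0 / dist;
    }
}
```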
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/vector/ImmutableSparseVector.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/vector/ImmutableSparseVector.java
new file mode 100644
index 0000000..b8ba511
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/vector/ImmutableSparseVector.java
@@ -0,0 +1,303 @@
+package it.unimi.dsi.law.vector;
+
+import java.util.Arrays;
+
+/*
+ * Copyright (C) 2008-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import it.unimi.dsi.fastutil.doubles.DoubleArrayList;
+import it.unimi.dsi.fastutil.ints.IntArrayList;
+
+//RELEASE-STATUS: DIST
+
+/** An immutable implementation of {@link Vector} optimized for sparse vectors. */
+public class ImmutableSparseVector extends Vector {
+
+ static final long serialVersionUID = 2006002L;
+
+ /** An array containing the vector values stored at the positions listed in {@link #index}. */
+ final public double[] value;
+
+ /** An array containing the indices of the values in {@link #value}. */
+ final public int[] index;
+
+ /** The number of nonzero entries in this vector. */
+ final public int nonZero;
+
+ /** The index in {@link #index} of the last value returned by {@link #get(int)}. */
+ private int lastIndex;
+
+ /** The value of <code>index[lastIndex]</code>. */
+ private int lastIndexValue;
+
+ /** The value of <code>index[lastIndex + 1]</code> (or <code>size</code> if <code>lastIndex == nonZero - 1</code>). */
+ private int lastIndexNextValue;
+
+ // ALERT: document that index is supposed to be SORTED!
+
+ /** Builds a vector of given size from an array of values.
+ *
+ * @param size the size.
+ * @param value the value.
+ * @param index the index.
+ * @param id the id of description of this vector.
+ */
+ private ImmutableSparseVector (final int size, final double[] value, final int[] index, final int id) {
+ super (size, false, id);
+ this.value = value;
+ this.index = index;
+ this.nonZero = index.length;
+ // reset last index
+ lastIndex = lastIndexValue = lastIndexNextValue = -1;
+ }
+
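+ /** Returns an instance of given size from parallel arrays of indices and values.
+ *
+ * @param size the size.
+ * @param value the values, aligned with <code>index</code>.
+ * @param index the indices of the values, sorted in strictly increasing order.
+ * @param id the description ID of this vector.
+ * @return an immutable vector.
+ * @throws IllegalArgumentException if the arrays have different lengths, or if <code>index</code>
+ * contains out-of-range values or is not sorted in strictly increasing order.
+ */
+ public static ImmutableSparseVector getInstance (final int size, final double[] value, final int[] index, final int id) {
+ // check for different length
+ if (value.length != index.length)
+ throw new IllegalArgumentException ("array with different size");
+
+ // check that indices are in range and strictly increasing (get() and dotProduct() rely on it)
+ for(int i = index.length; i-- != 0;) {
+ if (index[i] < 0 || index[i] >= size)
+ throw new IllegalArgumentException ("index out of range");
+ if (i > 0 && index[i - 1] >= index[i])
+ throw new IllegalArgumentException ("indices not sorted in strictly increasing order");
+ }
+
+ return new ImmutableSparseVector (size, value, index, id);
+ }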
+
+ /** Returns an instance containing all the values of a given vector whose absolute value is larger than a given threshold.
+ * The resulting vector has the same size as <code>v</code>.
+ *
+ * @param v the vector.
+ * @param threshold the (nonnegative) threshold.
+ * @param id the description ID of this vector.
+ * @return an immutable vector.
+ */
+ public static ImmutableSparseVector getInstance (final Vector v, final double threshold, final int id) {
+ final IntArrayList indexList = new IntArrayList();
+ final DoubleArrayList valueList = new DoubleArrayList();
+ final int size = v.size;
+
+ // add values whose absolute value is greater than threshold; scan in increasing order so that indices come out sorted
+ for (int i = 0; i < size; i++) {
+ final double d = v.get(i);
+ if (d < -threshold || d > threshold) {
+ indexList.add (i);
+ valueList.add (d);
+ }
+ }
+
+ return new ImmutableSparseVector (size, valueList.toDoubleArray(), indexList.toIntArray(), id);
+ }
+
+ /** Returns an instance containing all the values of a given array whose absolute value is larger than a given threshold.
+ * The resulting vector has size equal to the array length.
+ *
+ * @param value the array of values.
+ * @param threshold the (nonnegative) threshold.
+ * @param id the description ID of this vector.
+ * @return an immutable vector.
+ */
+ public static ImmutableSparseVector getInstance (final double[] value, final double threshold, final int id) {
+ final IntArrayList indexList = new IntArrayList();
+ final DoubleArrayList valueList = new DoubleArrayList();
+ final int size = value.length;
+
+ // add values whose absolute value is greater than threshold
+ for (int i = 0; i < size; i++) {
+ final double d = value[i];
+ if (d < -threshold || d > threshold) {
+ indexList.add (i);
+ valueList.add (d);
+ }
+ }
+
+ return new ImmutableSparseVector (size, valueList.toDoubleArray(), indexList.toIntArray(), id);
+ }
+
+ /** Returns an instance containing all the values of a given sparse representation that are greater than or equal to a given threshold.
+ * The resulting vector has the given size.
+ *
+ * @param index the indices of the values, sorted in strictly increasing order.
+ * @param value the values, aligned with <code>index</code>.
+ * @param size the size.
+ * @param threshold the threshold (values are compared directly, not in absolute value).
+ * @param id the description ID of this vector.
+ * @return an immutable vector.
+ */
+ public static ImmutableSparseVector getInstance (final int[] index, final double[] value, final int size, final double threshold, final int id) {
+ // check for different length
+ if (value.length != index.length)
+ throw new IllegalArgumentException ("array with different size");
+
+ final IntArrayList indexList = new IntArrayList();
+ final DoubleArrayList valueList = new DoubleArrayList();
+
+ // add values greater than or equal to threshold
+ for (int i = 0; i < index.length; i++) {
+ if (value[i] >= threshold) {
+ indexList.add (index[i]);
+ valueList.add (value[i]);
+ }
+ }
+
+ return new ImmutableSparseVector (size, valueList.toDoubleArray(), indexList.toIntArray(), id);
+ }
+
+ public void set (final int idx, final double val) {
+ throw new UnsupportedOperationException ("ImmutableSparseVector is immutable");
+ }
+
+ public double get (final int idx) {
+ if (idx < 0 || idx >= size)
+ throw new IllegalArgumentException ("index out of range");
+ // shortcut: idx lies strictly between the last returned index and the next stored one
+ if (idx > lastIndexValue && idx < lastIndexNextValue)
+ return 0.0;
+ if (idx == lastIndexNextValue) { // next index
+ lastIndex++;
+ lastIndexValue = index[lastIndex];
+ lastIndexNextValue = lastIndex == nonZero - 1 ? size : index[lastIndex + 1];
+ return value[lastIndex];
+ }
+
+ // shortcut failed: fall back to binary search
+ int pos = Arrays.binarySearch(index, idx);
+ if (pos < 0) { // value not found -> reset
+ lastIndexValue = lastIndexNextValue = -1;
+ return 0.0;
+ }
+
+ lastIndex = pos;
+ lastIndexValue = index[lastIndex];
+ lastIndexNextValue = lastIndex == nonZero - 1 ? size : index[lastIndex + 1];
+ return value[pos];
+ }
+
+ public void add (final double alpha, final Vector v) {
+ throw new UnsupportedOperationException ("ImmutableSparseVector is immutable");
+ }
+
+ public void scale (final double alpha) {
+ throw new UnsupportedOperationException ("ImmutableSparseVector is immutable");
+ }
+
+ public void zero () {
+ throw new UnsupportedOperationException ("ImmutableSparseVector is immutable");
+ }
+
+ public double dotProduct (final Vector v) {
+ // check size
+ if (size != v.size)
+ throw new IllegalArgumentException ("vectors with different size");
+
+ // compute dot product
+ if (v instanceof ImmutableSparseVector) {
+ final ImmutableSparseVector sVector = (ImmutableSparseVector) v;
+ final double[] svValue = sVector.value;
+ final int[] svIndex = sVector.index;
+ final int last0 = nonZero, last1 = sVector.nonZero;
+
+ int i = 0, j = 0;
+ double dot = 0.0;
+
+ // Kahan compensated summation: c carries the low-order bits lost by the previous addition
+ double c = 0.0, t, y;
+ while (i < last0 && j < last1) {
+ if (index[i] < svIndex[j])
+ i++;
+ else {
+ if (svIndex[j] < index[i])
+ j++;
+ else {
+ y = (value[i] * svValue[j]) - c;
+ t = dot + y;
+ c = (t - dot) - y;
+ dot = t;
+
+ //dot += value[i] * svValue[j];
+ i++;
+ j++;
+ }
+ }
+ }
+
+ return dot;
+ }
+ else if (v instanceof DenseVector) {
+ final double[] dvValue = ((DenseVector)v).value;
+
+ double dot = 0.0;
+ // Kahan compensated summation: c carries the low-order bits lost by the previous addition
+ double c = 0.0, t, y;
+
+ for(int i = nonZero; i-- != 0;) {
+ y = (dvValue[index[i]] * value[i]) - c;
+ t = dot + y;
+ c = (t - dot) - y;
+ dot = t;
+
+ //dot += dvValue[index[i]] * value[i];
+ }
+
+ return dot;
+ }
+ else
+ return super.dotProduct (v);
+ }
+
+ public double euclideanDistance (final Vector v) {
+ // check size
+ if (size != v.size)
+ throw new IllegalArgumentException ("vectors with different size");
+
+ // compute distance
+ if (v instanceof DenseVector)
+ return v.euclideanDistance(this);
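+ else if (v instanceof ImmutableSparseVector)
+ // uses |x - y|^2 = x.x + y.y - 2 x.y; Math.abs guards against small negative values caused by rounding
+ return Math.sqrt (Math.abs (dotProduct (this) + v.dotProduct (v) - 2 * dotProduct (v)));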
+ else
+ return super.euclideanDistance (v);
+ }
+
+ public double ell2Norm () {
+ if (ell2norm == INVALID_NORM) {
+ double tempNorm = 0.0;
+ int i = nonZero;
+ while (--i >= 0)
+ tempNorm += value[i] * value[i];
+ ell2norm = Math.sqrt (Math.abs (tempNorm));
+ }
+
+ return ell2norm;
+ }
+
+ public double ell1Norm() {
+ if (ell1norm == INVALID_NORM) {
+ double tempNorm = 0.0;
+ for(int i = nonZero; i-- != 0 ;)
+ tempNorm += Math.abs(value[i]);
+ ell1norm = tempNorm;
+ }
+
+ return ell1norm;
+ }
+
+}
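The lookup cache in get() and the sorted-index merge in dotProduct() are easier to follow in isolation. What follows is a minimal stand-alone sketch, assuming plain sorted int[]/double[] arrays; SparseSketch is a hypothetical class written for illustration, not part of the library, and it omits the dense-vector and fallback branches of the original.

```java
import java.util.Arrays;

/** Hypothetical stand-alone sketch (not the library class): a sparse vector with sorted indices. */
final class SparseSketch {
    final int size;
    final int[] index;    // strictly increasing
    final double[] value; // aligned with index
    // Lookup cache: position of the last hit, its index, and the next stored index (or size).
    private int last = -1, lastIdx = -1, nextIdx = -1;

    SparseSketch(int size, int[] index, double[] value) {
        this.size = size;
        this.index = index;
        this.value = value;
    }

    double get(int idx) {
        if (idx < 0 || idx >= size) throw new IllegalArgumentException("index out of range");
        // Strictly between the last hit and the next stored index: the entry is zero.
        if (idx > lastIdx && idx < nextIdx) return 0.0;
        int pos;
        if (idx == nextIdx) pos = last + 1; // sequential access: the next stored entry
        else {
            pos = Arrays.binarySearch(index, idx);
            if (pos < 0) { lastIdx = nextIdx = -1; return 0.0; } // miss: invalidate the cache
        }
        last = pos;
        lastIdx = index[pos];
        nextIdx = pos == index.length - 1 ? size : index[pos + 1];
        return value[pos];
    }

    /** Merge-based dot product over the two sorted index arrays, with Kahan summation. */
    double dot(SparseSketch o) {
        if (size != o.size) throw new IllegalArgumentException("vectors with different size");
        double sum = 0.0, c = 0.0;
        int i = 0, j = 0;
        while (i < index.length && j < o.index.length) {
            if (index[i] < o.index[j]) i++;
            else if (o.index[j] < index[i]) j++;
            else {
                double y = value[i++] * o.value[j++] - c;
                double t = sum + y;
                c = (t - sum) - y;
                sum = t;
            }
        }
        return sum;
    }
}
```

A full sequential scan get(0), get(1), ... performs binary searches only until the first stored index is reached; afterwards the two shortcut branches answer every call. The merge visits each stored entry at most once, so the dot product takes time linear in the number of stored entries of both vectors.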
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/vector/Int2DoubleMapVector.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/vector/Int2DoubleMapVector.java
new file mode 100644
index 0000000..2a96b5a
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/vector/Int2DoubleMapVector.java
@@ -0,0 +1,104 @@
+package it.unimi.dsi.law.vector;
+
+/*
+ * Copyright (C) 2008-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import it.unimi.dsi.fastutil.ints.Int2DoubleAVLTreeMap;
+import it.unimi.dsi.fastutil.ints.Int2DoubleMap;
+
+//RELEASE-STATUS: DIST
+
+/** A mutable implementation of {@link Vector} for sparse vectors.
+ * A sparse vector can be built incrementally by setting its values one at a time. Except for {@link #set(int, double)} and {@link #get(int)},
+ * this implementation relies on the superclass methods, which scan all <code>size</code> entries, for every operation;
+ * it should therefore be converted with {@link #toImmutableSparseVector()} before being actually used.
+ */
+public class Int2DoubleMapVector extends Vector {
+
+ static final long serialVersionUID = 2006002L;
+
+ /** Storage for vector values. */
+ private final Int2DoubleMap value;
+
+ /** Builds a vector of given size with zero values.
+ *
+ * @param size the size.
+ * @param id the description ID of this vector.
+ */
+ private Int2DoubleMapVector (final int size, final int id) {
+ super (size, true, id);
+ value = new Int2DoubleAVLTreeMap();
+ ell2norm = ell1norm = 0.0;
+ }
+
+ /** Builds a vector from the given {@link Int2DoubleMap}.
+ *
+ * @param size the size.
+ * @param value a map from indices to values.
+ * @param id the description ID of this vector.
+ */
+ private Int2DoubleMapVector (final int size, final Int2DoubleMap value, final int id) {
+ super (size, true, id);
+ this.value = value;
+ ell2norm = INVALID_NORM;
+ }
+
+ /** Returns an instance of given size with zero values.
+ *
+ * @param size the size.
+ * @param id the description ID of this vector.
+ * @return a vector.
+ */
+ public static Int2DoubleMapVector getInstance (final int size, final int id) {
+ return new Int2DoubleMapVector (size, id);
+ }
+
+ /** Returns an instance from the given {@link Int2DoubleMap}.
+ *
+ * @param size the size.
+ * @param value a map from indices to values.
+ * @param id the description ID of this vector.
+ * @return a vector.
+ */
+ public static Int2DoubleMapVector getInstance (final int size, final Int2DoubleMap value, final int id) {
+ return new Int2DoubleMapVector (size, value, id);
+ }
+
+ public void set (final int idx, final double val) {
+ if (idx < 0 || idx >= size)
+ throw new IllegalArgumentException ("index out of range");
+
+ value.put (idx, val);
+ ell2norm = ell1norm = INVALID_NORM;
+ }
+
+ public double get (final int idx) {
+ if (idx < 0 || idx >= size)
+ throw new IllegalArgumentException ("index out of range");
+
+ return value.get (idx);
+ }
+
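+ /** Returns an immutable copy of this vector.
+ *
+ * @return an {@link ImmutableSparseVector} with the same size, values and description ID.
+ */
+ public ImmutableSparseVector toImmutableSparseVector () {
+ // a map passed to getInstance(int, Int2DoubleMap, int) need not be sorted (e.g., a hash map),
+ // but ImmutableSparseVector requires strictly increasing indices: sort the keys, then fetch the values
+ final int[] index = this.value.keySet().toIntArray();
+ java.util.Arrays.sort (index);
+ final double[] value = new double[index.length];
+ for (int i = index.length; i-- != 0;)
+ value[i] = this.value.get (index[i]);
+
+ return ImmutableSparseVector.getInstance (size, value, index, id);
+ }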
+
+}
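Int2DoubleMapVector collects entries in a map and then freezes them into sorted parallel arrays. The sketch below shows the same pattern using only the JDK, assuming a java.util.TreeMap in place of fastutil's Int2DoubleAVLTreeMap; SparseBuilderSketch is hypothetical. Unlike the library class, it drops explicit zeros in set(), a design choice made here so that the frozen arrays hold only nonzero entries.

```java
import java.util.TreeMap;

/** Hypothetical JDK-only builder: collects (index, value) pairs, then freezes them into sorted parallel arrays. */
final class SparseBuilderSketch {
    private final int size;
    // TreeMap iterates keys in increasing order, which is what the frozen representation requires.
    private final TreeMap<Integer, Double> entries = new TreeMap<>();

    SparseBuilderSketch(int size) {
        this.size = size;
    }

    void set(int idx, double val) {
        if (idx < 0 || idx >= size) throw new IllegalArgumentException("index out of range");
        if (val == 0.0) entries.remove(idx); // design choice: do not store explicit zeros
        else entries.put(idx, val);
    }

    double get(int idx) {
        if (idx < 0 || idx >= size) throw new IllegalArgumentException("index out of range");
        return entries.getOrDefault(idx, 0.0);
    }

    /** Stored indices, in strictly increasing order. */
    int[] indices() {
        int[] idx = new int[entries.size()];
        int i = 0;
        for (int k : entries.keySet()) idx[i++] = k;
        return idx;
    }

    /** Stored values, aligned with indices(). */
    double[] values() {
        double[] val = new double[entries.size()];
        int i = 0;
        for (double v : entries.values()) val[i++] = v;
        return val;
    }
}
```

Because keySet() and values() of the same TreeMap iterate in the same key order, the two arrays stay aligned without an explicit sort.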
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/vector/SimilarityStrategy.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/vector/SimilarityStrategy.java
new file mode 100644
index 0000000..3c9a92e
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/vector/SimilarityStrategy.java
@@ -0,0 +1,37 @@
+package it.unimi.dsi.law.vector;
+
+/*
+ * Copyright (C) 2008-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import java.io.Serializable;
+
+//RELEASE-STATUS: DIST
+
+/** An interface specifying how to compute the similarity between two patterns (vectors). */
+public interface SimilarityStrategy extends Serializable {
+
+ /** Returns the similarity value between two vectors.
+ *
+ * @param v0 the first vector.
+ * @param v1 the second vector.
+ * @return the similarity between vectors.
+ */
+ public double similarity (Vector v0, Vector v1);
+
+}
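As an illustration of how a SimilarityStrategy might be implemented, here is a cosine-similarity sketch. To stay self-contained it works on double[] through a hypothetical ArraySimilarity interface rather than on Vector; returning 0 when either vector is zero is an assumed convention, not something the interface specifies.

```java
import java.io.Serializable;

/** Hypothetical stand-in for SimilarityStrategy, working on plain arrays instead of Vector. */
interface ArraySimilarity extends Serializable {
    double similarity(double[] v0, double[] v1);
}

/** Cosine similarity: dot(v0, v1) / (||v0|| * ||v1||), taken to be 0 if either vector is zero. */
final class CosineSimilaritySketch implements ArraySimilarity {
    private static final long serialVersionUID = 1L;

    @Override
    public double similarity(double[] v0, double[] v1) {
        if (v0.length != v1.length) throw new IllegalArgumentException("vectors with different size");
        double dot = 0.0, n0 = 0.0, n1 = 0.0;
        for (int i = 0; i < v0.length; i++) {
            dot += v0[i] * v1[i];
            n0 += v0[i] * v0[i];
            n1 += v1[i] * v1[i];
        }
        if (n0 == 0.0 || n1 == 0.0) return 0.0; // assumed convention for zero vectors
        return dot / (Math.sqrt(n0) * Math.sqrt(n1));
    }
}
```

Making the strategy an object (rather than a static method) lets a caller, such as a clustering routine, be parameterized by the similarity it uses; Serializable lets such a configured strategy be stored alongside the data it was applied to.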
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/vector/Vector.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/vector/Vector.java
new file mode 100644
index 0000000..92f5084
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/vector/Vector.java
@@ -0,0 +1,203 @@
+package it.unimi.dsi.law.vector;
+
+/*
+ * Copyright (C) 2008-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import java.io.Serializable;
+
+//RELEASE-STATUS: DIST
+
+/** A class representing a vector of <code>double</code>s. Different implementations can provide mutable or immutable values
+ * through suitable implementations of the {@link #set(int, double)} and {@link #get(int)} methods. Immutable implementations should throw an
+ * {@link java.lang.UnsupportedOperationException} if a method that causes a mutation is called.
+ */
+public abstract class Vector implements Serializable {
+ private static final long serialVersionUID = 1L;
+
+ /** A value indicating an invalid (i.e., unspecified) description ID. */
+ public static final int INVALID_ID = -1;
+
+ /** A value indicating that the norm is not computed for current values. */
+ public static final double INVALID_NORM = -1.0;
+
+ /** The vector size (immutable). */
+ public final int size;
+
+ /** The description ID associated with this vector (immutable). */
+ public final int id;
+
+ /** The cached l<sub>2</sub> norm, or {@link #INVALID_NORM} if it has not been computed for the current values. */
+ protected double ell2norm;
+
+ /** The cached l<sub>1</sub> norm, or {@link #INVALID_NORM} if it has not been computed for the current values. */
+ protected double ell1norm;
+
+ /** The mutability status of this vector. */
+ private final boolean mutable;
+
+ /** Builds a vector of given size and sets its mutability status.
+ *
+ * @param size the size.
+ * @param mutable the mutability status.
+ * @param id the description ID of this vector.
+ */
+ protected Vector (final int size, final boolean mutable, final int id) {
+ this.size = size;
+ this.mutable = mutable;
+ this.id = id;
+ ell2norm = ell1norm = INVALID_NORM;
+ }
+
+ /** Sets the value <var>val</var> at index <var>idx</var>.
+ *
+ * @param idx the index.
+ * @param val the value.
+ */
+ public abstract void set (final int idx, final double val);
+
+ /** Gets the value at index <var>idx</var>.
+ *
+ * @param idx the index.
+ * @return the value at index <var>idx</var>.
+ */
+ public abstract double get (final int idx);
+
+ /** Adds values in vector <var>v</var> scaled by <var>alpha</var> to this vector.
+ *
+ * @param alpha the scaling factor.
+ * @param v the vector to add.
+ */
+ public void add (final double alpha, final Vector v) {
+ // check size
+ if (size != v.size)
+ throw new IllegalArgumentException ("vectors with different size");
+
+ for(int i = size; i-- != 0;)
+ set (i, get(i) + v.get(i) * alpha);
+
+ ell2norm = ell1norm = INVALID_NORM; // both cached norms are now stale
+ }
+
+ /** Scales the values of this vector by a factor <var>alpha</var>.
+ *
+ * @param alpha the scaling factor.
+ */
+ public void scale (final double alpha) {
+ for(int i = size; i-- != 0;)
+ set (i, alpha * get (i));
+
+ // update norm
+ if (ell2norm != INVALID_NORM)
+ ell2norm *= Math.abs (alpha);
+ if (ell1norm != INVALID_NORM)
+ ell1norm *= Math.abs(alpha);
+ }
+
+ /** Sets all the values of this vector to zero. */
+ public void zero () {
+ for (int i = 0; i < size; i++)
+ set (i, 0.0);
+
+ ell2norm = ell1norm = 0.0; // update norm
+ }
+
+ /** Returns the mutability status of this vector.
+ *
+ * @return <code>true</code> if the vector is mutable; <code>false</code> otherwise.
+ */
+ public boolean isMutable () {
+ return mutable;
+ }
+
+ /** Returns the dot product between <var>v</var> and this vector.
+ *
+ * @param v the vector.
+ * @return the dot product.
+ */
+ public double dotProduct (final Vector v) {
+ // check size
+ if (size != v.size)
+ throw new IllegalArgumentException ("vectors with different size");
+
+ // compute dot product
+ double dot = 0.0;
+
+ // Kahan compensated summation: c carries the low-order bits lost by the previous addition
+ double c = 0.0, t, y;
+ for(int i = size; i-- != 0;) {
+ y = (get(i) * v.get(i)) - c;
+ t = dot + y;
+ c = (t - dot) - y;
+ dot = t;
+
+ //dot += get (i) * v.get(i);
+ }
+
+ return dot;
+ }
+
+ /** Returns the Euclidean distance between <var>v</var> and this vector.
+ *
+ * @param v the vector.
+ * @return the Euclidean distance.
+ */
+ public double euclideanDistance (final Vector v) {
+ // check size
+ if (size != v.size)
+ throw new IllegalArgumentException ("vectors with different size");
+
+ // compute distance
+ double dist = 0.0, temp;
+
+ for(int i = size; i-- != 0;) {
+ temp = get(i) - v.get(i);
+ dist += temp * temp;
+ }
+
+ return Math.sqrt (Math.abs (dist));
+ }
+
+ /** Returns the l<sub>2</sub> norm of this vector.
+ *
+ * @return the l<sub>2</sub> norm.
+ */
+ public double ell2Norm () {
+ if (ell2norm == INVALID_NORM)
+ ell2norm = Math.sqrt (Math.abs (dotProduct(this))); // just a shortcut
+
+ return ell2norm;
+ }
+
+ /** Returns the l<sub>1</sub> norm of this vector.
+ *
+ * @return the l<sub>1</sub> norm.
+ */
+ public double ell1Norm() {
+ if (ell1norm == INVALID_NORM) {
+ double tempNorm = 0.0;
+ for(int i = size; i-- != 0;)
+ tempNorm += Math.abs (get(i));
+ ell1norm = tempNorm;
+ }
+ return ell1norm;
+ }
+
+
+}
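Vector.dotProduct() and its overrides use Kahan compensated summation instead of the commented-out naive loop. The sketch below, with hypothetical names, compares the two on a sum in which every small term is below half an ulp of the running total, so the naive loop discards each one. Note that this sketch sums from the first element to the last, whereas Vector.dotProduct() iterates from the last index down; summation order affects the naive result.

```java
/** Hypothetical comparison of a naive dot product and a Kahan-compensated one. */
final class KahanDotSketch {
    static double naiveDot(double[] x, double[] y) {
        double dot = 0.0;
        for (int i = 0; i < x.length; i++) dot += x[i] * y[i];
        return dot;
    }

    static double kahanDot(double[] x, double[] y) {
        double dot = 0.0, c = 0.0; // c: running compensation for lost low-order bits
        for (int i = 0; i < x.length; i++) {
            double term = x[i] * y[i] - c; // correct the next term by the previous error
            double t = dot + term;         // low-order bits of term may be lost here...
            c = (t - dot) - term;          // ...and are recovered algebraically
            dot = t;
        }
        return dot;
    }
}
```

With x = {1.0, followed by ten copies of 1e-16} and y all ones, naiveDot returns exactly 1.0, while kahanDot returns a value close to the true sum 1 + 1e-15.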
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/AbstractFilter.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/AbstractFilter.java
new file mode 100644
index 0000000..66d3ec7
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/AbstractFilter.java
@@ -0,0 +1,54 @@
+package it.unimi.dsi.law.warc.filters;
+
+import org.apache.commons.lang.StringUtils;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+// RELEASE-STATUS: DIST
+
+/** An abstract implementation of a {@link Filter} providing a {@link #toString(Object...)} method
+ * that helps implement {@link #toString()} properly for atomic (i.e., class-based) filters. */
+
+public abstract class AbstractFilter<T> implements Filter<T> {
+
+ /** A helper method that generates a string version of this filter (mainly
+ * useful for atomic, i.e., class-based, filters).
+ *
+ * <p>The output format is
+ *
+ * <code>&lt;classname&gt;(&lt;arg&gt;,&lt;arg&gt;,...)</code>
+ *
+ * where &lt;classname&gt; is the simple name of the filter class if it belongs
+ * to the {@link #FILTER_PACKAGE_NAME} package (its fully qualified name otherwise),
+ * and the &lt;arg&gt;s are the string representations of the arguments of this method,
+ * separated by bare commas.
+ *
+ * @param arg arguments for the string representation above.
+ * @return the string representation specified above.
+ */
+ protected String toString(final Object... arg) {
+ // TODO: handle commas inside arguments
+ if (this.getClass().getPackage().getName().equals(AbstractFilter.FILTER_PACKAGE_NAME))
+ return this.getClass().getSimpleName() + "(" + StringUtils.join(arg, ',') + ")";
+ else
+ return this.getClass().getName() + "(" + StringUtils.join(arg, ',') + ")";
+ }
+}
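The naming rule of AbstractFilter.toString(Object...) can be exercised without the library. This hypothetical helper takes the filter package as a parameter (standing in for FILTER_PACKAGE_NAME) and, like StringUtils.join(arg, ','), separates arguments with a bare comma and renders null arguments as empty strings.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

/** Hypothetical helper mirroring the naming rule of AbstractFilter.toString(Object...). */
final class FilterNameSketch {
    static String describe(Class<?> filterClass, String filterPackage, Object... args) {
        String name = filterClass.getName();
        int dot = name.lastIndexOf('.');
        String pkg = dot < 0 ? "" : name.substring(0, dot);
        // Simple name for filters in the filter package, fully qualified name otherwise.
        String shown = pkg.equals(filterPackage) ? filterClass.getSimpleName() : name;
        // Null arguments become empty strings; no space follows the comma.
        String joined = Arrays.stream(args)
                .map(a -> a == null ? "" : a.toString())
                .collect(Collectors.joining(","));
        return shown + "(" + joined + ")";
    }
}
```

Printing filters in the package by simple name keeps specifications such as ContentTypeStartsWith(text/) short, while filters from other packages remain unambiguous.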
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/ContentTypeStartsWith.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/ContentTypeStartsWith.java
new file mode 100644
index 0000000..f1cabec
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/ContentTypeStartsWith.java
@@ -0,0 +1,55 @@
+package it.unimi.dsi.law.warc.filters;
+
+import com.google.common.net.HttpHeaders;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.law.warc.util.HttpResponse;
+
+// RELEASE-STATUS: DIST
+
+/** Accepts only fetched responses whose content type starts with a given string.
+ *
+ * <p>Typical usage: <code>ContentTypeStartsWith(text/)</code>.
+ */
+public class ContentTypeStartsWith extends AbstractFilter<HttpResponse> {
+ /** The prefix of accepted content types. */
+ private final String prefix;
+
+ public ContentTypeStartsWith(final String prefix) {
+ this.prefix = prefix;
+ }
+
+ public boolean apply(HttpResponse x) {
+ final String header = x.headers().get(HttpHeaders.CONTENT_TYPE);
+ return header != null && header.startsWith(prefix);
+ }
+
+ public static ContentTypeStartsWith valueOf(String spec) {
+ return new ContentTypeStartsWith(spec);
+ }
+
+ public String toString() {
+ return toString(prefix);
+ }
+
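+ public boolean equals(Object x) {
+ return x instanceof ContentTypeStartsWith && ((ContentTypeStartsWith)x).prefix.equals(prefix);
+ }
+
+ // equals() is overridden, so hashCode() must be consistent with it
+ public int hashCode() {
+ return prefix.hashCode();
+ }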
+}
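A filter that overrides equals() needs a matching hashCode() to behave correctly in hash-based collections. The sketch below shows the complete contract on a stand-alone predicate over a header map; ContentTypePrefixSketch is hypothetical, and the Map-based headers with an exact "Content-Type" key lookup are assumptions standing in for HttpResponse.headers().

```java
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical stand-alone version of a content-type prefix filter over a header map. */
final class ContentTypePrefixSketch implements Predicate<Map<String, String>> {
    private final String prefix;

    ContentTypePrefixSketch(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public boolean test(Map<String, String> headers) {
        // Exact-key lookup of "Content-Type" is an assumption of this sketch.
        String header = headers.get("Content-Type");
        return header != null && header.startsWith(prefix);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ContentTypePrefixSketch && ((ContentTypePrefixSketch) o).prefix.equals(prefix);
    }

    // Equal filters must have equal hash codes, or hash-based collections misbehave.
    @Override
    public int hashCode() {
        return prefix.hashCode();
    }
}
```

Two filters built from the same prefix are interchangeable, so a HashSet of filters keeps only one of them.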
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/DigestEquals.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/DigestEquals.java
new file mode 100644
index 0000000..5fc42f9
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/DigestEquals.java
@@ -0,0 +1,50 @@
+package it.unimi.dsi.law.warc.filters;
+
+import java.util.Arrays;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.law.warc.io.WarcRecord;
+import it.unimi.dsi.law.warc.util.HttpResponse;
+import it.unimi.dsi.law.warc.util.Util;
+
+// RELEASE-STATUS: DIST
+
+/** Accepts only records of given digest, specified as a hexadecimal string. */
+public class DigestEquals extends AbstractFilter<WarcRecord> {
+ private final byte[] digest;
+
+ private DigestEquals(final byte[] digest) {
+ this.digest = digest;
+ }
+
+ public boolean apply(final WarcRecord x) {
+ String s = x.header.anvlFields.get(HttpResponse.DIGEST_HEADER);
+ return s != null && Arrays.equals(digest, Util.fromHexString(s));
+ }
+
+ public static DigestEquals valueOf(final String spec) {
+ return new DigestEquals(Util.fromHexString(spec));
+ }
+
+ public String toString() {
+ return toString(Util.toHexString(digest));
+ }
+
+}
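DigestEquals parses the hexadecimal digest once, in valueOf(), and compares byte arrays with Arrays.equals() for every record. A hypothetical JDK-only sketch of that comparison follows; fromHex here is an illustration, not the library's Util.fromHexString, and its choices (accepting both upper- and lower-case digits, rejecting odd-length strings) are assumptions of this sketch.

```java
import java.util.Arrays;

/** Hypothetical helper: compares a digest against one given as a hexadecimal string. */
final class HexDigestSketch {
    static byte[] fromHex(String s) {
        if (s.length() % 2 != 0) throw new IllegalArgumentException("odd-length hex string");
        byte[] b = new byte[s.length() / 2];
        for (int i = 0; i < b.length; i++) {
            int hi = Character.digit(s.charAt(2 * i), 16);
            int lo = Character.digit(s.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) throw new IllegalArgumentException("not a hex digit");
            b[i] = (byte) (hi << 4 | lo);
        }
        return b;
    }

    /** Returns true if hex is non-null and denotes exactly the bytes of expected. */
    static boolean digestEquals(byte[] expected, String hex) {
        return hex != null && Arrays.equals(expected, fromHex(hex));
    }
}
```

Comparing decoded bytes rather than strings makes the test independent of the case of the hexadecimal digits.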
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/DuplicateSegmentsLessThan.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/DuplicateSegmentsLessThan.java
new file mode 100644
index 0000000..f3a4b9b
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/DuplicateSegmentsLessThan.java
@@ -0,0 +1,326 @@
+package it.unimi.dsi.law.warc.filters;
+
+import java.net.URI;
+import java.util.Arrays;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.ints.AbstractIntComparator;
+import it.unimi.dsi.fastutil.ints.IntArrayList;
+import it.unimi.dsi.fastutil.ints.IntArrays;
+import it.unimi.dsi.law.bubing.util.BURL;
+
+// RELEASE-STATUS: DIST
+
+/** Accepts only URIs whose path does not contain too many duplicate segments.
+ *
+ * <p>It is not uncommon to find URIs generated by badly configured 404
+ * pages that look like <code>http://example.com/foo/bar/foo/bar/&hellip;</code>.
+ * This filter does not accept such URIs if some sequence of consecutive segments
+ * is repeated consecutively at least as many times as a given threshold.
+ *
+ * <p>This implementation uses ideas from <i>Linear-Time Longest-Common-Prefix
+ * Computation in Suffix Arrays and Its Applications</i>, by Toru Kasai, Gunho Lee, Hiroki Arimura,
+ * Setsuo Arikawa, and Kunsoo Park, in <i>Proc. of the 12th Annual Symposium on
+ * Combinatorial Pattern Matching</i>,
+ * number 2089 of Lecture Notes in Computer Science, pages 181&ndash;192, Springer-Verlag, 2001, to simulate
+ * a suffix-tree visit on a suffix array, and ideas from
+ * <i>Simple and flexible detection of contiguous repeats using a suffix tree</i>, by
+ * Jens Stoye and Dan Gusfield, <i>Theoret. Comput. Sci.</i> 270:843&ndash;856, 2002,
+ * for the linear-time detection of tandem arrays using suffix trees.
+ *
+ * <p>The resulting code is one order of magnitude faster than an equivalent regular-expression test.
+ */
+
+public class DuplicateSegmentsLessThan extends AbstractFilter<URI> {
+
+ private static final boolean DEBUG = false;
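+ /** Whether to cross-check results against a (slow) regular-expression implementation; on only when assertions are enabled for this class. */
+ private static final boolean ASSERTS = DuplicateSegmentsLessThan.class.desiredAssertionStatus();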
+ /** The extra symbol (usually denoted with $ in the literature) added at the end of
+ * a string to force shorter suffixes to come after in lexicographical ordering. */
+ private static final char EXTRA_SYMBOL = 65535;
+ /** URIs in which some sequence of consecutive segments is repeated consecutively at least this number of times are not accepted. */
+ private final int threshold;
+
+ /** Creates a filter that only accepts URIs whose path does not contain any sequence of consecutive segments
+ * repeated consecutively <code>threshold</code> or more times.
+ *
+ * @param threshold the duplicate-segment threshold (at least 2); a URI is accepted if every sequence
+ * of consecutive segments is repeated consecutively fewer than this number of times.
+ */
+ public DuplicateSegmentsLessThan(final int threshold) {
+ if (threshold < 2) throw new IllegalArgumentException("This filter requires a threshold larger than one");
+ this.threshold = threshold;
+ }
+
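+ /** Cross-checks, using (slow) regular expressions, that <code>b</code> is the correct outcome of {@link #apply(URI)}
+ * for the path <code>s</code>; the check is effective only when assertions are enabled. */
+ private void matches(final boolean b, final String s) {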
+ final Matcher m0 = Pattern.compile(".*(/.*)\\1{" + (threshold - 1) + ",}/.*").matcher(s);
+ final Matcher m1 = Pattern.compile(".*(/.*)\\1{" + (threshold - 1) + ",}").matcher(s);
+ assert b != (m0.matches() || m1.matches()) : s + " (" + ! b + (! b ? "" : ", " +
+ (m0.matches() ? m0.group(1) : m1.group(1))) + ")";
+ }
+
+ @Override
+ public boolean apply(final URI url) {
+
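+ final String s = url.getRawPath();
+ // opaque URIs have an undefined (null) path, and URIs such as http://example.com an empty one: nothing to check
+ if (s == null || s.isEmpty()) return true;
+ final int length = s.length();
+ final boolean pathEndsWithSlash = s.charAt(length - 1) == '/';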
+
+ final char[] path = new char[length + 1 + (! pathEndsWithSlash? 1 : 0)];
+ path[path.length - 1] = EXTRA_SYMBOL; // Usual suffix-array trick
+ if (!pathEndsWithSlash) path[path.length - 2] = '/'; // To guarantee that each segment ends with a slash
+ s.getChars(0, length, path, 0);
+
+ // Phase 1: count slashes
+ int c = 0;
+ for(int i = length; i-- != 0;) if (path[i] == '/') c++;
+ if (c < threshold) {
+ if (ASSERTS) matches(true, s);
+ return true; // fewer segments than the threshold: no such repetition is possible
+ }
+
+ // Phase 2: allocate and fill start array
+ final int[] start = new int[c];
+ c = 0;
+ for(int i = 0; i < length; i++) if (path[i] == '/') start[c++] = i;
+
+ // Phase 3: build suffix array for path components and compute largest number of common path segments
+
+ final int[] a = new int[c];
+ for(int i = c; i-- != 0;) a[i] = i;
+
+ IntArrays.quickSort(a, 0, c, new AbstractIntComparator() {
+ public int compare(final int x, final int y) {
+ if (x == y) return 0;
+ int j = start[x], k = start[y];
+ while(path[++j] == path[++k]);
+ return path[j] - path[k];
+ }
+ });
+
+ // Linear-time LCP computation, from the paper by Kasai et al.; r is the inverse of the suffix array a
+ final int[] r = new int[c];
+ for(int i = c; i-- != 0;) r[a[i]] = i;
+
+ final int[] lcp = new int[c + 1]; // counts common segments, not characters; the last element accounts for the $ element
+ int h = 0;
+ int p = 1;
+ boolean maxNonZero = false;
+
+ for(int i = 0; i < c; i++) {
+ if (r[i] > 0) {
+ int j = a[r[i] - 1];
+ final int starti = start[i];
+ final int startj = start[j];
+ while(path[starti + p] == path[startj + p]) {
+ if (path[starti + p] == '/') h++;
+ p++;
+ }
+
+ lcp[r[i]] = h;
+ if (h > 0) {
+ maxNonZero = true;
+ // Discard first common segment
+ int k = 1;
+ while(path[starti + k] != '/') k++;
+ p -= k;
+ h--;
+ }
+ else p = 1;
+ }
+ }
+
+ if (! maxNonZero) {
+ if (ASSERTS) matches(true, s);
+ return true; // Not a single common prefix
+ }
+
+ if (ASSERTS) {
+ final int[] lcp2 = new int[c + 1];
+ for(int i = c; i-- != 1;) {
+ final int starti = start[a[i - 1]];
+ final int startipp = start[a[i]];
+
+ int k = 1;
+ int n = 0;
+ while(path[starti + k] == path[startipp + k]) {
+ if (path[starti + k] == '/') n++;
+ k++;
+ }
+
+ lcp2[i] = n;
+ }
+
+ assert Arrays.equals(lcp2, lcp);
+ }
+
+ if (DEBUG) System.err.println("Path: " + Arrays.toString(path));
+ if (DEBUG) System.err.println("Start: " + Arrays.toString(start));
+ if (DEBUG) System.err.println("Suffixes: " + Arrays.toString(a));
+ if (DEBUG) System.err.println("Common paths: " + Arrays.toString(lcp));
+
+ // Phase 4: Simulate depth-first visit of the suffix tree
+
+ // Simulated visit of the associated suffix tree, again following Kasai et al.
+
+ // A stack for left extremes and depth, initialised with -1, -1.
+ final int[] ls = new int[c + 1], ds = new int[c + 1];
+ /* A support array where, while visiting a node, we will store the length of the
+ * maximal arithmetic progression of ratio d among the leaves of the current
+ * node. */
+ final int[] prog = new int[c];
+ ls[0] = ds[0] = -1;
+ p = 1;
+
+ int llca, dlca;
+ int l, d;
+
+ for(int i = 0; i < c; i++) {
+ llca = i;
+ dlca = lcp[i + 1]; // Note that when i == c - 1 then lcp[i + 1] == 0.
+
+ while(ds[p - 1] > dlca) {
+ // Pop (l,d) off the stack
+ l = ls[--p];
+ d = ds[p];
+
+ if (DEBUG) System.err.println("Got triple <" + l + ", " + i + ", " + d + ">");
+ if (DEBUG) System.err.println(IntArrayList.wrap(a).subList(l, i + 1));
+ // Now we have a visit interval starting at l and ending at i, of depth d
+ if (i - l + 1 >= threshold) {
+ /* Now we have a list of leaves which share a common prefix of d segments.
+ * Stoye and Gusfield note that we can find an arithmetic progression of
+ * ratio d among those leaves (i.e., we can find leaves whose associated positions are
+ * i, i+d, i+2d, ..., i+(k-1)d) iff those positions
+ * are the starting points of a tandem array of length k.
+ *
+ * To do this in linear time, we exploit the fact (noted by Stoye and Gusfield)
+ * that for l <= j <= i, r[a[j] + t * d] is the position in the suffix array of the
+ * suffix starting at a[j] + t * d, which means that a[j] + t * d is in the set of
+ * leaves under examination (i.e., a[l..i]) iff r[a[j] + t * d]
+ * is between l and i (inclusive).
+ *
+ * To avoid testing all elements separately (which would potentially require
+ * (i - l + 1) * k tests) we use prog either to remember the length of the longest
+ * increasing progression found starting with the corresponding element of a,
+ * or to remember that an element need not be examined because it cannot lead
+ * to maximal progressions.
+ *
+ * Starting from each leaf a[j], we try to
+ * greedily extend an arithmetic progression of ratio d, and record its length
+ * in prog[j]. When examining the following elements, if following the progression
+ * we hit an element with nonzero prog, we can just add the number found there
+ * to the current length and break the loop, as the maximal arithmetic
+ * progression of ratio d from our current position has already been computed.
+ */
+ Arrays.fill(prog, l, i + 1, 0);
+ for(int j = l; j <= i; j++) {
+ if (prog[j] != 0) continue;
+ int t = 1, u = a[j], k = u, pos;
+ for(;;) {
+ k += d; // The next element of the progression
+ if (k >= c) break;
+ pos = r[k]; // Its position (in [l..i])
+ if (pos < l || i < pos) break;
+ else if (prog[pos] != 0) {
+ if (ASSERTS) assert prog[pos] > 0 : "l=" + l + " , i=" + i + ", j=" + j + ", t=" + t + ", a=" + Arrays.toString(a) + ", prog=" + Arrays.toString(prog);
+ t += prog[pos];
+ break;
+ }
+ t++;
+ }
+ if (t >= threshold) {
+ if (ASSERTS) matches(false, s);
+ return false;
+ }
+ prog[j] = t;
+ // We backtrack, putting -1 in all intermediate entries so we won't examine them further
+ while((k -= d) != u) prog[r[k]] = -1;
+ }
+ }
+ llca = l;
+
+ }
+
+ if (ds[p - 1] < dlca) {
+ // Push (llca, dlca) on the stack
+ ls[p] = llca;
+ ds[p++] = dlca;
+ }
+ }
+
+ if (ASSERTS) matches(true, s);
+ return true;
+ }
+
+ public static DuplicateSegmentsLessThan valueOf(String spec) {
+ return new DuplicateSegmentsLessThan(Integer.parseInt(spec));
+ }
+
+ public String toString() {
+ return toString(Integer.toString(threshold));
+ }
+
+ public boolean equals(Object x) {
+ return x instanceof DuplicateSegmentsLessThan && ((DuplicateSegmentsLessThan)x).threshold == threshold;
+ }
+
+ public static void main(String arg[]) {
+ // A simple speed test for this filter.
+
+ final int rep = Integer.parseInt(arg[0]);
+ final long times = Long.parseLong(arg[1]);
+
+ Pattern p = Pattern.compile(".*/(.*/)\\1{" + (rep - 1) + ",}.*");
+ Matcher m;
+ //String url = "http://example.com/test/foo/bar/foo/bar/foo/mu/foo/bar/foo/bar/foo/bar/";
+ //String url = "http://example.com/test/foo/bar1/foo/bar2/foo/mu/foo/bar3/foo/bar4/foo/bar5/test/";
+ String uri = "http://example.com/test/foo/bar1/foo/bar2/foo/mu/foo/bar3/foo/bar4/foo/bar5/test/foo/bar1/foo/bar2/foo/mu/foo/bar3/foo/bar4/foo/bar5/test/";
+ DuplicateSegmentsLessThan filter = new DuplicateSegmentsLessThan(rep);
+ URI buri = BURL.parse(uri);
+
+ long start;
+
+ System.err.println("Regex: " + (! p.matcher(uri).matches()));
+ System.err.println("Filter: " + filter.apply(buri));
+
+ for(int k = 10; k-- != 0;) {
+
+ start = -System.currentTimeMillis();
+
+ for(long i = times; i-- != 0;) {
+ m = p.matcher(uri);
+ m.matches();
+ }
+
+ start += System.currentTimeMillis();
+ System.err.printf("Regex: %f Kcalls/s\n", Double.valueOf(times / (double)start));
+
+ start = -System.currentTimeMillis();
+
+ for(long i = times; i-- != 0;) filter.apply(buri);
+
+ start += System.currentTimeMillis();
+ System.err.printf("Filter: %f Kcalls/s\n", Double.valueOf(times / (double)start));
+ }
+ }
+}
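The linear-time suffix-tree visit above is hard to verify by inspection. As a cross-check, here is a brute-force sketch of the property that the regular expression in `main` expresses: a path is rejected when some block of consecutive complete segments (pieces followed by a `/`) is repeated at least `threshold` times in a row. This is not the library's algorithm; the class `NaiveDuplicateSegments` and its methods are hypothetical, the sketch assumes `threshold >= 2` and a path that starts with `/`, and it runs in roughly cubic time, so it is only meant for testing small inputs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical brute-force reference; not part of the LAW library. */
final class NaiveDuplicateSegments {
    /** Returns every piece of the path lying between two consecutive slashes;
     *  whatever follows the last slash is dropped. */
    static List<String> segments(String path) {
        List<String> result = new ArrayList<>();
        int from = path.indexOf('/');
        if (from < 0) return result;
        for (int to = path.indexOf('/', from + 1); to >= 0; to = path.indexOf('/', from + 1)) {
            result.add(path.substring(from + 1, to));
            from = to;
        }
        return result;
    }

    /** Length of the longest run of identical adjacent blocks of d segments, over all d. */
    static int maxRepetitions(List<String> s) {
        int n = s.size();
        int best = n == 0 ? 0 : 1;
        for (int d = 1; 2 * d <= n; d++) {          // block length: the ratio d of the progression
            for (int i = 0; i + 2 * d <= n; i++) {  // block start
                List<String> block = s.subList(i, i + d);
                int k = 1;                          // copies of block seen so far
                while (i + (k + 1) * d <= n && block.equals(s.subList(i + k * d, i + (k + 1) * d))) k++;
                best = Math.max(best, k);
            }
        }
        return best;
    }

    /** True iff no block of segments repeats threshold or more times consecutively. */
    static boolean accepts(String path, int threshold) {
        return maxRepetitions(segments(path)) < threshold;
    }
}
```

For `/test/foo/bar/foo/bar/foo/mu/` the segments are `[test, foo, bar, foo, bar, foo, mu]`, the block `foo/bar/` repeats twice, so the sketch accepts the path with threshold 3 and rejects it with threshold 2.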
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/Filter.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/Filter.java
new file mode 100644
index 0000000..0dc3837
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/Filter.java
@@ -0,0 +1,54 @@
+package it.unimi.dsi.law.warc.filters;
+
+import com.google.common.base.Predicate;
+
+import it.unimi.dsi.law.bubing.util.BURL;
+import it.unimi.dsi.law.warc.util.Response;
+
+/*
+ * Copyright (C) 2012-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+//RELEASE-STATUS: DIST
+
+/** A filter is a strategy to decide whether to accept a given
+ * object or not. Typically <code>T</code> will be either {@link BURL} or
+ * {@link Response}. Technically it is identical to the Google Guava
+ * {@link Predicate} interface, but there are some conventions listed
+ * below that apply only to filters.
+ *
+ * <p>By contract, every filter that is an instance of a non-anonymous
+ * filter class is supposed to have a <strong>static</strong>
+ * method with the following signature
+ * <pre>public static Filter&lt;T&gt; valueOf(String x)</pre>
+ * that returns a filter (typically, a filter of its own kind) from
+ * a string. Moreover, it is required that, for every filter class <code>F</code>
+ * and for every instance <code>f</code>, <code>toString()</code> returns
+ * <pre><var>classname</var>(<var>spec</var>)</pre>
+ * for some <var>spec</var> such that <pre>f.equals(F.valueOf(<var>spec</var>))</pre> holds.
+ *
+ * <p>Note that <code><var>classname</var></code> can omit the package name if
+ * it is {@link #FILTER_PACKAGE_NAME}.
+ */
+
+public interface Filter<T> extends Predicate<T> {
+
+ /** The name of the package that contains this interface as well as
+ * most filters.
+ */
+ public final static String FILTER_PACKAGE_NAME = Filter.class.getPackage().getName();
+}
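To make the `valueOf`/`toString` contract concrete, here is a minimal sketch of a class that honours it. It uses `java.util.function.Predicate` instead of Guava's `Predicate` so that it compiles on its own, and the class name `MaxLength` and the helper `specOf` are hypothetical; real filters in this package extend `AbstractFilter`, whose helpers are not reproduced here.

```java
import java.util.function.Predicate;

/** Hypothetical filter accepting strings shorter than a bound, written to follow
 *  the valueOf/toString round-trip convention described for Filter. */
final class MaxLength implements Predicate<String> {
    private final int bound;

    MaxLength(int bound) { this.bound = bound; }

    @Override public boolean test(String s) { return s.length() < bound; }

    /** Builds a filter from its specification, e.g. "80". */
    static MaxLength valueOf(String spec) { return new MaxLength(Integer.parseInt(spec.trim())); }

    /** Prints classname(spec), so that valueOf(spec) rebuilds an equal filter. */
    @Override public String toString() { return getClass().getSimpleName() + "(" + bound + ")"; }

    @Override public boolean equals(Object o) { return o instanceof MaxLength && ((MaxLength) o).bound == bound; }

    @Override public int hashCode() { return Integer.hashCode(bound); }

    /** Extracts spec from a string of the form classname(spec). */
    static String specOf(String printed) {
        return printed.substring(printed.indexOf('(') + 1, printed.lastIndexOf(')'));
    }
}
```

Because `toString()` is built from the same value that `valueOf` parses, printing a filter and parsing the printed specification gives back an equal object, which is what the contract asks for.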
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/Filters.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/Filters.java
new file mode 100644
index 0000000..db363cb
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/Filters.java
@@ -0,0 +1,243 @@
+package it.unimi.dsi.law.warc.filters;
+
+import java.lang.reflect.Method;
+import java.net.URI;
+
+import org.apache.commons.lang.StringUtils;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.objects.ObjectOpenHashSet;
+import it.unimi.dsi.law.warc.filters.parser.ParseException;
+import it.unimi.dsi.law.warc.io.WarcRecord;
+import it.unimi.dsi.law.warc.util.HttpResponse;
+
+// RELEASE-STATUS: DIST
+
+/** A collection of static methods to deal with {@link Filter filters}. */
+public class Filters {
+
+ public static final Filter<?>[] EMPTY_ARRAY = {};
+
+ /** A set containing all standard filters in this package. This explicit list is needed
+ * because reflection cannot enumerate the classes in a package. */
+ @SuppressWarnings("unchecked")
+ private static final ObjectOpenHashSet<Class<? extends Filter<?>>> FILTERS = new ObjectOpenHashSet<Class<? extends Filter<?>>>(
+ // TODO: periodically check that this list is complete.
+ new Class[] { ContentTypeStartsWith.class, DigestEquals.class, DuplicateSegmentsLessThan.class,
+ HostEndsWith.class, HostEquals.class, IsHttpResponse.class, PathEndsWithOneOf.class,
+ SchemeEquals.class, StatusCategory.class, URLEquals.class,
+ URLMatchesRegex.class, URLShorterThan.class, IsProbablyBinary.class
+ }
+ );
+
+ /** Produces the conjunction of the given filters.
+ *
+ * @param <T> the type of objects that the filters deal with.
+ * @param f the filters.
+ * @return the conjunction.
+ */
+ @SafeVarargs
+ public static<T> Filter<T> and(final Filter<T>... f) {
+ return new Filter<T>() {
+ public boolean apply(final T x) {
+ for (Filter<T> filter: f) if (! filter.apply(x)) return false;
+ return true;
+ }
+
+ public String toString() {
+ return "(" + StringUtils.join(f, " and ") + ")";
+ }
+ };
+ }
+
+ /** Produces the disjunction of the given filters.
+ *
+ * @param <T> the type of objects that the filters deal with.
+ * @param f the filters.
+ * @return the disjunction.
+ */
+ @SafeVarargs
+ public static<T> Filter<T> or(final Filter<T>... f) {
+ return new Filter<T>() {
+ public boolean apply(final T x) {
+ for (Filter<T> filter: f) if (filter.apply(x)) return true;
+ return false;
+ }
+
+ public String toString() {
+ return "(" + StringUtils.join(f, " or ") + ")";
+ }
+
+ };
+ }
+
+ /** Produces the negation of the given filter.
+ *
+ * @param <T> the type of objects that the filter deals with.
+ * @param filter the filter.
+ * @return the negation of the given filter.
+ */
+ public static<T> Filter<T> not(final Filter<T> filter) {
+ return new AbstractFilter<T>() {
+ public boolean apply(final T x) {
+ return ! filter.apply(x);
+ }
+
+ public String toString() {
+ return "(not " + filter + ")";
+ }
+ };
+ }
+
+ // TODO: change this to a static, correctly typed method.
+ /** The constantly true filter. */
+ @SuppressWarnings("rawtypes")
+ public static Filter TRUE = new Filter() {
+ public boolean apply(Object x) {
+ return true;
+ }
+
+ public String toString() {
+ return "true";
+ }
+ };
+
+ /** The constantly false filter. */
+ @SuppressWarnings("rawtypes")
+ public static Filter FALSE = new Filter() {
+ public boolean apply(Object x) {
+ return false;
+ }
+
+ public String toString() {
+ return "false";
+ }
+ };
+
+
+ /** Creates a filter from a filter class name and an external form.
+ *
+ * @param className the name of a filter class; it may either be a single class name (in which case it
+ * will be qualified with {@link Filter#FILTER_PACKAGE_NAME}) or a fully qualified classname.
+ * @param spec the specification from which the filter will be created, using the <tt>valueOf(String)</tt> method (see {@link Filter}).
+ * @param tClass the base class of the filter that is desired: it should coincide with <code>T</code>; if the base type <code>D</code> of
+ * the filter differs from <code>T</code>, the method will try to adapt it by using a static method in the Filters class whose signature is
+ * <pre>public static Filter&lt;T&gt; adaptFilterD2T(Filter&lt;D&gt;)</pre>
+ * @return the filter.
+ * @throws ParseException if <code>className</code> does not denote a filter class.
+ */
+ @SuppressWarnings("unchecked")
+ public static<T> Filter<T> getFilterFromSpec(String className, String spec, Class<T> tClass) throws ParseException {
+ String filterClassName;
+
+ if (className.indexOf('.') >= 0) filterClassName = className;
+ else filterClassName = Filter.FILTER_PACKAGE_NAME + "." + className;
+ try {
+ // Produce the filter
+ Class<?> c = Class.forName(filterClassName);
+ if (! Filter.class.isAssignableFrom(c)) throw new ParseException(filterClassName + " is not a valid filter class");
+ Filter<T> filter = (Filter<T>)c.getMethod("valueOf", String.class).invoke(null, spec);
+
+ // Extract its base type
+ final Method method[] = filter.getClass().getMethods();
+ int i;
+ for (i = 0; i < method.length; i++) if (! method[i].isSynthetic() && method[i].getName().equals("apply")) break;
+ if (i == method.length) throw new NoSuchMethodException("Could not find apply method in filter " + filter);
+ final Class<?>[] parameterTypes = method[i].getParameterTypes();
+ if (parameterTypes.length != 1) throw new NoSuchMethodException("Could not find one-argument apply method in filter " + filter);
+ final Class<?> toClass = parameterTypes[0];
+
+ // Possibly: adapt the filter
+ if (toClass.equals(tClass)) return filter;
+ else {
+ Method adaptMethod;
+ try {
+ adaptMethod = Filters.class.getMethod("adaptFilter" + toClass.getSimpleName() + "2" + tClass.getSimpleName(), Filter.class);
+ } catch (NoSuchMethodException e) {
+ throw new NoSuchMethodException("Cannot adapt a Filter<" + toClass.getSimpleName() + "> into Filter<" + tClass.getSimpleName() + ">");
+ }
+ return (Filter<T>)adaptMethod.invoke(null, filter);
+ }
+ }
+ catch(ParseException e) {
+ throw e;
+ }
+ catch (Exception e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+ /** Adapts a filter with {@link String} base type to a filter with {@link URI} base type. For testing purposes only.
+ *
+ * @param original the original filter.
+ * @return the adapted filter.
+ */
+ public static Filter<URI> adaptFilterString2URI(final Filter<String> original) {
+ return new AbstractFilter<URI>() {
+ public boolean apply(final URI uri) {
+ return original.apply(uri.toString());
+ }
+ public String toString() {
+ return original.toString();
+ }
+ };
+ }
+
+ /** Adapts a filter with {@link URI} base type to a filter with {@link HttpResponse} base type.
+ *
+ * @param original the original filter.
+ * @return the adapted filter.
+ */
+ public static Filter<HttpResponse> adaptFilterURI2HttpResponse(final Filter<URI> original) {
+ return new AbstractFilter<HttpResponse>() {
+ public boolean apply(final HttpResponse response) {
+ return original.apply(response.uri());
+ }
+ public String toString() {
+ return original.toString();
+ }
+ };
+ }
+
+ /** Adapts a filter with {@link URI} base type to a filter with {@link WarcRecord} base type.
+ *
+ * @param original the original filter.
+ * @return the adapted filter.
+ */
+ public static Filter<WarcRecord> adaptFilterURI2WarcRecord(final Filter<URI> original) {
+ return new AbstractFilter<WarcRecord>() {
+ public boolean apply(WarcRecord x) {
+ return original.apply(x.header.subjectUri);
+ }
+ public String toString() {
+ return original.toString();
+ }
+ };
+ }
+
+ /** Returns a list of the standard filter classes.
+ *
+ * @return a list of standard filter classes.
+ */
+ @SuppressWarnings("unchecked")
+ public static Class<? extends Filter<?>>[] standardFilters() {
+ return FILTERS.toArray(new Class[FILTERS.size()]);
+ }
+
+}
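The combinators above build new filters whose `toString()` nests the operands' descriptions. The sketch below reproduces that behaviour on top of `java.util.function.Predicate` so it runs without Guava or LAW; the class `Combinators` and the helper `named` are hypothetical, and unlike `Filters` it computes each description eagerly when the combined predicate is created.

```java
import java.util.Arrays;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical stand-in for Filters.and/or/not built on java.util.function.Predicate. */
final class Combinators {
    /** Wraps a predicate so that toString() returns the given description. */
    static <T> Predicate<T> named(Predicate<T> p, String description) {
        return new Predicate<T>() {
            @Override public boolean test(T x) { return p.test(x); }
            @Override public String toString() { return description; }
        };
    }

    /** Joins the operands' descriptions, e.g. "(a and b)". */
    @SafeVarargs
    private static <T> String join(String op, Predicate<T>... f) {
        return "(" + Arrays.stream(f).map(Object::toString).collect(Collectors.joining(" " + op + " ")) + ")";
    }

    @SafeVarargs
    static <T> Predicate<T> and(Predicate<T>... f) {
        // Stops at the first operand that rejects x, like Filters.and.
        return named(x -> { for (Predicate<T> p : f) if (!p.test(x)) return false; return true; }, join("and", f));
    }

    @SafeVarargs
    static <T> Predicate<T> or(Predicate<T>... f) {
        // Stops at the first operand that accepts x, like Filters.or.
        return named(x -> { for (Predicate<T> p : f) if (p.test(x)) return true; return false; }, join("or", f));
    }

    static <T> Predicate<T> not(Predicate<T> p) {
        return named(x -> !p.test(x), "(not " + p + ")");
    }
}
```

Combining a predicate described as `SchemeEquals(http)` with one described as `Short` yields a conjunction that prints as `(SchemeEquals(http) and Short)`.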
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/HostEndsWith.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/HostEndsWith.java
new file mode 100644
index 0000000..4de4108
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/HostEndsWith.java
@@ -0,0 +1,61 @@
+package it.unimi.dsi.law.warc.filters;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import java.net.URI;
+
+// RELEASE-STATUS: DIST
+
+/** Accepts only URIs whose host ends with (case-insensitively) a certain suffix.
+ *
+ * <p>Note that {@link #apply(URI)} will throw an {@link IllegalArgumentException}
+ * if the argument has a null {@linkplain URI#getHost() host}.
+ */
+public class HostEndsWith extends AbstractFilter<URI> {
+
+ /** The accepted host suffix (lowercased). */
+ private final String suffix;
+
+ /** Creates a filter that only accepts URIs whose host ends with a given suffix.
+ *
+ * @param suffix the accepted host suffix.
+ */
+ public HostEndsWith(final String suffix) {
+ this.suffix = suffix.toLowerCase();
+ }
+
+ public boolean apply(final URI uri) {
+ if (uri.getHost() == null) throw new IllegalArgumentException("URI \"" + uri + "\" has no host");
+ // BURL hosts are always lower cased
+ return uri.getHost().endsWith(suffix);
+ }
+
+ public static HostEndsWith valueOf(String spec) {
+ return new HostEndsWith(spec);
+ }
+
+ public String toString() {
+ return toString(suffix);
+ }
+
+ public boolean equals(Object x) {
+ if (x instanceof HostEndsWith) return ((HostEndsWith)x).suffix.equals(suffix);
+ else return false;
+ }
+}
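The same test can be tried on plain `java.net.URI` objects, with one caveat: `java.net.URI` keeps the host exactly as written, whereas the filter relies on BURL having lowercased it already. This sketch (the class `HostSuffix` is hypothetical) therefore lowercases both sides itself, and, like the filter, rejects URIs without a host.

```java
import java.net.URI;
import java.util.Locale;

/** Hypothetical stand-alone version of the HostEndsWith check for plain java.net.URI objects. */
final class HostSuffix {
    static boolean hostEndsWith(URI uri, String suffix) {
        String host = uri.getHost();
        if (host == null) throw new IllegalArgumentException("URI \"" + uri + "\" has no host");
        // java.net.URI does not normalise host case, so both sides are lowercased here.
        return host.toLowerCase(Locale.ROOT).endsWith(suffix.toLowerCase(Locale.ROOT));
    }
}
```

Note that a bare suffix such as `example.com` also matches `notexample.com`; a suffix with a leading dot (`.example.com`) only matches subdomains.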
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/HostEquals.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/HostEquals.java
new file mode 100644
index 0000000..4a246a3
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/HostEquals.java
@@ -0,0 +1,62 @@
+package it.unimi.dsi.law.warc.filters;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import java.net.URI;
+
+// RELEASE-STATUS: DIST
+
+/** Accepts only URIs whose host equals (case-insensitively) a certain string.
+ *
+ * <p>Note that {@link #apply(URI)} will throw an {@link IllegalArgumentException}
+ * if the argument has a {@code null} {@linkplain URI#getHost() host}.
+ */
+public class HostEquals extends AbstractFilter<URI> {
+
+ /** The accepted host (lowercased). */
+ private final String host;
+
+ /** Creates a filter that only accepts URIs with a given host.
+ *
+ * @param host the accepted host.
+ */
+ public HostEquals(final String host) {
+ this.host = host.toLowerCase();
+ }
+
+ public boolean apply(final URI uri) {
+ if (uri.getHost() == null) throw new IllegalArgumentException("URI \"" + uri + "\" has no host");
+ // BURL hosts are always lower cased
+ return uri.getHost().equals(host);
+ }
+
+ public static HostEquals valueOf(String spec) {
+ return new HostEquals(spec);
+ }
+
+ public String toString() {
+ return toString(host);
+ }
+
+ public boolean equals(Object x) {
+ if (x instanceof HostEquals) return ((HostEquals)x).host.equals(host);
+ else return false;
+ }
+
+}
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/IsHttpResponse.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/IsHttpResponse.java
new file mode 100644
index 0000000..fb3cdb5
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/IsHttpResponse.java
@@ -0,0 +1,47 @@
+package it.unimi.dsi.law.warc.filters;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.law.warc.io.WarcRecord;
+
+// RELEASE-STATUS: DIST
+
+/** Accepts only records that are http/https responses. */
+public class IsHttpResponse extends AbstractFilter<WarcRecord> {
+
+ public final static IsHttpResponse INSTANCE = new IsHttpResponse();
+
+ private IsHttpResponse() {}
+
+ public boolean apply(final WarcRecord x) {
+ WarcRecord.Header header = x.header;
+ return (header.recordType == WarcRecord.RecordType.RESPONSE &&
+ (header.contentType == WarcRecord.ContentType.HTTP ||
+ header.contentType == WarcRecord.ContentType.HTTPS));
+ }
+
+ public static IsHttpResponse valueOf(final String emptySpec) {
+ if (emptySpec.length() > 0) throw new IllegalArgumentException();
+ return INSTANCE;
+ }
+
+ public String toString() {
+ return getClass().getSimpleName() + "()";
+ }
+}
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/IsProbablyBinary.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/IsProbablyBinary.java
new file mode 100644
index 0000000..5f939f5
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/IsProbablyBinary.java
@@ -0,0 +1,74 @@
+package it.unimi.dsi.law.warc.filters;
+
+import java.io.IOException;
+import java.io.InputStream;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.law.warc.util.HttpResponse;
+
+// RELEASE-STATUS: DIST
+
+/** Accepts only http responses whose content stream appears to be binary. */
+public class IsProbablyBinary extends AbstractFilter<HttpResponse> {
+
+ public static final IsProbablyBinary INSTANCE = new IsProbablyBinary();
+ public static final int BINARY_CHECK_SCAN_LENGTH = 1000;
+ /** The number of zeroes that must appear to cause the page to be considered probably
+ * binary. Some misconfigured servers emit one or two ASCII NULs at the start of their
+ * pages, so we use a relatively safe value. */
+ public static final int THRESHOLD = 3;
+
+ private IsProbablyBinary() {}
+
+ /** This method implements a simple heuristic for guessing whether a page is binary.
+ *
+ * <P>The first {@link #BINARY_CHECK_SCAN_LENGTH} bytes are scanned: if we find at least
+ * {@link #THRESHOLD} zeroes, we deduce that this page is binary. Note that this also works
+ * with UTF-8, as a zero byte never appears in the UTF-8 encoding of any character other
+ * than U+0000, which we do not expect in text pages.
+ *
+ * @return <code>true</code> iff this page most probably has binary content.
+ * @throws NullPointerException if the page has no byte content.
+ */
+ public boolean apply(final HttpResponse httpResponse) {
+ try {
+ final InputStream content = httpResponse.contentAsStream();
+ int count = 0;
+ for(int i = BINARY_CHECK_SCAN_LENGTH; i-- != 0;) {
+ final int b = content.read();
+ if (b == -1) return false;
+ if (b == 0 && ++count == THRESHOLD) return true;
+ }
+ }
+ catch(IOException shouldntReallyHappen) {
+ throw new RuntimeException(shouldntReallyHappen);
+ }
+ return false;
+ }
+
+ public static IsProbablyBinary valueOf(final String emptySpec) {
+ if (emptySpec.length() > 0) throw new IllegalArgumentException();
+ return INSTANCE;
+ }
+
+ public String toString() {
+ return getClass().getSimpleName() + "()";
+ }
+}
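The heuristic can be exercised without an `HttpResponse` by applying the same loop to any `InputStream`. This sketch mirrors the class constants (scan length 1000, threshold 3); the class name `BinarySniffer` is hypothetical, and it wraps I/O errors in `UncheckedIOException` rather than a plain `RuntimeException`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical stream-based version of the IsProbablyBinary heuristic. */
final class BinarySniffer {
    static final int SCAN_LENGTH = 1000; // bytes inspected at most
    static final int THRESHOLD = 3;      // zero bytes needed to call the content binary

    static boolean probablyBinary(InputStream in) {
        try {
            int zeroes = 0;
            for (int i = 0; i < SCAN_LENGTH; i++) {
                int b = in.read();
                if (b == -1) return false; // stream ended before enough zeroes were seen
                if (b == 0 && ++zeroes == THRESHOLD) return true;
            }
            return false; // zeroes beyond the scanned prefix are ignored
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static boolean probablyBinary(byte[] bytes) {
        return probablyBinary(new ByteArrayInputStream(bytes));
    }
}
```

Only the first 1000 bytes count: a page whose third zero byte sits at offset 1000 or later is still classified as text.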
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/PathEndsWithOneOf.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/PathEndsWithOneOf.java
new file mode 100644
index 0000000..723a3e4
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/PathEndsWithOneOf.java
@@ -0,0 +1,68 @@
+package it.unimi.dsi.law.warc.filters;
+
+import java.io.IOException;
+import java.net.URI;
+import java.util.Set;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.objects.ObjectOpenHashSet;
+import it.unimi.dsi.law.warc.util.Util;
+
+// RELEASE-STATUS: DIST
+
+/** Accepts only URIs whose path ends (case-insensitively) with one of a given set of suffixes. */
+public class PathEndsWithOneOf extends AbstractFilter<URI> {
+
+ /** The accepted suffixes, downcased. */
+ private final String[] suffixes;
+
+ /** Creates a filter that only accepts URLs whose path ends with one of a given set of suffixes.
+ *
+ * @param suffixes the accepted suffixes.
+ */
+ public PathEndsWithOneOf(final String[] suffixes) {
+ this.suffixes = new String[suffixes.length];
+ for (int i = 0; i < suffixes.length; i++) this.suffixes[i] = suffixes[i].toLowerCase();
+ }
+
+ @Override
+ public boolean apply(final URI uri) {
+ String file = uri.getRawPath().toLowerCase();
+ for (String suffix: suffixes) if (file.endsWith(suffix)) return true;
+ return false;
+ }
+
+ public static PathEndsWithOneOf valueOf(String spec) throws IOException {
+ return new PathEndsWithOneOf(Util.parseCommaSeparatedProperty(spec));
+ }
+
+ public String toString() {
+ return toString((Object[])suffixes);
+ }
+
+ public boolean equals(Object x) {
+ if (x instanceof PathEndsWithOneOf) {
+ Set<String> suffixSet = new ObjectOpenHashSet<String>(suffixes);
+ Set<String> xSuffixSet = new ObjectOpenHashSet<String>(((PathEndsWithOneOf)x).suffixes);
+ return suffixSet.equals(xSuffixSet);
+ }
+ else return false;
+ }
+}
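The matching rule can be tried directly on `java.net.URI` objects. In this sketch (the class `PathSuffixes` is hypothetical) suffixes are lowercased once, the raw path (which excludes query and fragment) is lowercased per call, and equality ignores suffix order as in `equals` above. One deliberate difference, an assumption of the sketch: opaque URIs such as `mailto:` have a `null` raw path, and the sketch rejects them instead of failing with a `NullPointerException`.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;

/** Hypothetical stand-alone version of the PathEndsWithOneOf check. */
final class PathSuffixes {
    private final String[] suffixes; // stored lowercased

    PathSuffixes(String... suffixes) {
        this.suffixes = Arrays.stream(suffixes).map(s -> s.toLowerCase(Locale.ROOT)).toArray(String[]::new);
    }

    boolean test(URI uri) {
        String path = uri.getRawPath(); // no query or fragment
        if (path == null) return false; // opaque URI: no path to match (assumption)
        path = path.toLowerCase(Locale.ROOT);
        for (String s : suffixes) if (path.endsWith(s)) return true;
        return false;
    }

    @Override public boolean equals(Object o) {
        return o instanceof PathSuffixes
            && new HashSet<>(Arrays.asList(suffixes)).equals(new HashSet<>(Arrays.asList(((PathSuffixes) o).suffixes)));
    }

    @Override public int hashCode() { return new HashSet<>(Arrays.asList(suffixes)).hashCode(); }
}
```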
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/SchemeEquals.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/SchemeEquals.java
new file mode 100644
index 0000000..0e77f71
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/SchemeEquals.java
@@ -0,0 +1,60 @@
+package it.unimi.dsi.law.warc.filters;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import java.net.URI;
+
+// RELEASE-STATUS: DIST
+
+/** Accepts only URIs whose scheme equals a certain string (typically, <code>http</code>).
+ *
+ * <p>Note that {@link #apply(URI)} will throw an {@link IllegalArgumentException}
+ * if the argument has a {@code null} {@linkplain URI#getScheme() scheme}.
+ */
+public class SchemeEquals extends AbstractFilter<URI> {
+
+ /** The accepted scheme. */
+ private final String scheme;
+
+ /** Creates a filter that only accepts URIs with a given scheme.
+ *
+ * @param scheme the accepted scheme.
+ */
+ public SchemeEquals(final String scheme) {
+ this.scheme = scheme;
+ }
+
+ public boolean apply(final URI uri) {
+ if (uri.getScheme() == null) throw new IllegalArgumentException("URI \"" + uri + "\" has no scheme");
+ return scheme.equals(uri.getScheme());
+ }
+
+ public static SchemeEquals valueOf(String spec) {
+ return new SchemeEquals(spec);
+ }
+
+ public String toString() {
+ return toString(scheme);
+ }
+
+ public boolean equals(Object x) {
+ if (x instanceof SchemeEquals) return ((SchemeEquals)x).scheme.equals(scheme);
+ else return false;
+ }
+}
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/StatusCategory.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/StatusCategory.java
new file mode 100644
index 0000000..4bd90fb
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/StatusCategory.java
@@ -0,0 +1,57 @@
+package it.unimi.dsi.law.warc.filters;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.law.warc.util.HttpResponse;
+
+// RELEASE-STATUS: DIST
+
+/** Accepts only fetched responses whose status category (status/100) has a certain value.
+ */
+public class StatusCategory extends AbstractFilter<HttpResponse> {
+
+ /** The accepted category (e.g., 2 for 2xx). */
+ private final int category;
+
+ /** Creates a filter that only accepts responses of the given category.
+ *
+ * @param category the accepted category.
+ */
+ public StatusCategory(final int category) {
+ this.category = category;
+ }
+
+ public boolean apply(HttpResponse x) {
+ return x.status() / 100 == category;
+ }
+
+ public static StatusCategory valueOf(String spec) {
+ return new StatusCategory(Integer.parseInt(spec));
+ }
+
+ public String toString() {
+ return toString(String.valueOf(category));
+ }
+
+ public boolean equals(Object x) {
+ if (x instanceof StatusCategory) return ((StatusCategory)x).category == category;
+ else return false;
+ }
+
+}
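The acceptance test of `StatusCategory` is plain integer division: `status / 100` keeps only the hundreds digit, so category 2 accepts every status from 200 through 299. The following is a standalone sketch that assumes a bare `int` status code in place of `HttpResponse`. The class name `StatusCategorySketch` is hypothetical.

```java
import java.util.function.IntPredicate;

/** Hypothetical standalone sketch of the status-category test performed by StatusCategory. */
final class StatusCategorySketch {
    private StatusCategorySketch() {}

    /** Returns a predicate accepting status codes whose category (status / 100) equals {@code category}. */
    static IntPredicate category(final int category) {
        // Integer division drops the last two digits: 204 / 100 == 2, 404 / 100 == 4.
        return status -> status / 100 == category;
    }

    /** Parses a specification such as "2", in the same way as StatusCategory.valueOf(String). */
    static IntPredicate valueOf(final String spec) {
        return category(Integer.parseInt(spec));
    }
}
```

For example, `StatusCategorySketch.valueOf("2").test(204)` is true, and `.test(301)` is false.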
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/URLEquals.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/URLEquals.java
new file mode 100644
index 0000000..1f553bd
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/URLEquals.java
@@ -0,0 +1,58 @@
+package it.unimi.dsi.law.warc.filters;
+
+import java.net.URI;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.law.bubing.util.BURL;
+
+// RELEASE-STATUS: DIST
+
+/** Accepts only URIs equal to a given URI. */
+public class URLEquals extends AbstractFilter<URI> {
+
+ /** The URL to be matched. */
+ private final URI uri;
+
+ /** Creates a filter that only accepts URIs equal to a given URI.
+ *
+ * @param uri a URI.
+ */
+ public URLEquals(final String uri) {
+ this.uri = BURL.parse(uri);
+ if (this.uri == null) throw new IllegalArgumentException("Unparsable URI " + uri);
+ }
+
+ public boolean apply(final URI uri) {
+ return this.uri.equals(uri); // "this." is required: the parameter shadows the field
+ }
+
+ public static URLEquals valueOf(String spec) {
+ return new URLEquals(spec);
+ }
+
+ public String toString() {
+ return toString(uri);
+ }
+
+ public boolean equals(Object x) {
+ if (x instanceof URLEquals) return ((URLEquals)x).uri.equals(uri);
+ else return false;
+ }
+}
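In `apply`, the parameter `uri` shadows the field of the same name. Unless the field is reached through `this.uri`, the expression `uri.equals(uri)` compares the argument with itself and accepts every URI. The sketch below contrasts the two variants using plain `java.net.URI`. It uses `URI.create` as a stand-in for `BURL.parse`, which is an assumption, because `BURL` may normalize URIs differently. The class name `UriEqualsSketch` is hypothetical.

```java
import java.net.URI;

/** Hypothetical sketch contrasting the shadowed and the corrected equality test. */
final class UriEqualsSketch {
    private final URI uri;

    UriEqualsSketch(final String spec) {
        // Stand-in for BURL.parse(spec); URI.create throws IllegalArgumentException on bad syntax.
        this.uri = URI.create(spec);
    }

    /** Shadowed variant: the parameter hides the field, so this always returns true. */
    boolean applyShadowed(final URI uri) {
        return uri.equals(uri);
    }

    /** Corrected variant: compares the stored URI with the argument. */
    boolean apply(final URI uri) {
        return this.uri.equals(uri);
    }
}
```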
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/URLMatchesRegex.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/URLMatchesRegex.java
new file mode 100644
index 0000000..7d74c1b
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/URLMatchesRegex.java
@@ -0,0 +1,56 @@
+package it.unimi.dsi.law.warc.filters;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import java.net.URI;
+import java.util.regex.Pattern;
+
+// RELEASE-STATUS: DIST
+
+/** Accepts only URIs that match a certain regular expression. */
+public class URLMatchesRegex extends AbstractFilter<URI> {
+
+ /** The pattern containing the compiled regular expression. */
+ private final Pattern pattern;
+
+ /** Creates a filter that only accepts URLs matching a given regular expression.
+ *
+ * @param expr the regular expression.
+ */
+ public URLMatchesRegex(final String expr) {
+ pattern = Pattern.compile(expr);
+ }
+
+ public boolean apply(final URI uri) {
+ return pattern.matcher(uri.toString()).matches();
+ }
+
+ public static URLMatchesRegex valueOf(String spec) {
+ return new URLMatchesRegex(spec);
+ }
+
+ public String toString() {
+ return toString(pattern.pattern());
+ }
+
+ public boolean equals(Object x) {
+ if (x instanceof URLMatchesRegex) return ((URLMatchesRegex)x).pattern.pattern().equals(pattern.pattern()); // Pattern does not override equals()
+ else return false;
+ }
+}
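Two details of `URLMatchesRegex` are easy to miss. First, `Matcher.matches()` requires the whole string form of the URI to match, whereas `find()` would accept a substring match. Second, `java.util.regex.Pattern` does not override `equals()`, so comparing two compiled patterns compares their identities. The following sketch compares the source expressions instead. The class name `UriRegexSketch` is hypothetical.

```java
import java.net.URI;
import java.util.regex.Pattern;

/** Hypothetical sketch of whole-string regex matching on URIs. */
final class UriRegexSketch {
    private final Pattern pattern;

    UriRegexSketch(final String expr) {
        this.pattern = Pattern.compile(expr);
    }

    /** Accepts a URI only if its entire string form matches (Matcher.matches, not find). */
    boolean apply(final URI uri) {
        return pattern.matcher(uri.toString()).matches();
    }

    /** Pattern does not override equals, so compare the source expressions instead. */
    @Override
    public boolean equals(final Object o) {
        return o instanceof UriRegexSketch && ((UriRegexSketch) o).pattern.pattern().equals(pattern.pattern());
    }

    @Override
    public int hashCode() {
        return pattern.pattern().hashCode();
    }
}
```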
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/URLShorterThan.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/URLShorterThan.java
new file mode 100644
index 0000000..f2e45c7
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/URLShorterThan.java
@@ -0,0 +1,54 @@
+package it.unimi.dsi.law.warc.filters;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import java.net.URI;
+
+// RELEASE-STATUS: DIST
+
+/** Accepts only URIs whose overall length is below a given threshold. */
+public class URLShorterThan extends AbstractFilter<URI> {
+
+ /** URIs whose length is greater than or equal to this threshold are not accepted. */
+ private final int threshold;
+
+ /** Creates a filter that only accepts URLs shorter than the given threshold.
+ *
+ * @param threshold the acceptance threshold.
+ */
+ public URLShorterThan(final int threshold) {
+ this.threshold = threshold;
+ }
+
+ public boolean apply(final URI uri) {
+ return uri.toString().length() < threshold;
+ }
+
+ public static URLShorterThan valueOf(String spec) {
+ return new URLShorterThan(Integer.parseInt(spec));
+ }
+
+ public String toString() {
+ return toString(Integer.toString(threshold));
+ }
+
+ public boolean equals(Object x) {
+ return x instanceof URLShorterThan && ((URLShorterThan)x).threshold == threshold;
+ }
+}
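`URLShorterThan` applies a strict comparison to the length of the URI's string form, so a URI whose length equals the threshold is rejected. The following minimal sketch shows this boundary behavior. The helper name `UriLengthSketch.shorterThan` is hypothetical.

```java
import java.net.URI;

/** Hypothetical sketch of the strict length test used by URLShorterThan. */
final class UriLengthSketch {
    private UriLengthSketch() {}

    /** Accepts only URIs whose string form has strictly fewer than {@code threshold} characters. */
    static boolean shorterThan(final URI uri, final int threshold) {
        return uri.toString().length() < threshold;
    }
}
```

For example, `"http://a.b/"` has 11 characters. It is rejected with a threshold of 11 and accepted with a threshold of 12.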
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/package.html b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/package.html
new file mode 100644
index 0000000..194a9df
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/package.html
@@ -0,0 +1,61 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<!-- RELEASE-STATUS: DIST -->
+<html>
+ <head>
+ <title>LAW software</title>
+ </head>
+
+ <body>
+
+ <p>A comprehensive filtering system.
+
+ <p>A filter
+ is a strategy to decide whether a certain object should be accepted or not; the type of objects
+ a filter considers is called the <em>base type</em> of the filter. In most cases, the base type
+ is going to be a URL or a fetched page. More precisely, a <em>prefetch filter</em> is one that
+ has {@link java.net.URI} as its base type (typically, to decide whether a
+ URL should be scheduled for a later visit or should be fetched); a <em>postfetch filter</em>
+ is one that has {@link it.unimi.dsi.law.warc.util.HttpResponse} as its base type
+ and decides whether to do something with that response (typically, to parse
+ it, to store it, etc.).</p>
+
+ <p>Various kinds of filters are available; moreover, they can be composed with boolean operators
+ using the static methods of the <code>Filters</code> class. Additionally, a filter parser is
+ provided in the <tt>it.unimi.dsi.law.warc.filters.parser</tt> package; since the parser itself is written
+ using <a href="https://javacc.dev.java.net/">JavaCC</a>, we provide a description of it here.</p>
+
+ <p>Two filters are called <em>homogeneous</em> if they filter the same kind of objects, <em>heterogeneous</em>
+ otherwise.</p>
+
+ <p>A filter parser is instantiated according to the kind of filters it will return; more precisely, a
+ <code>FilterParser&lt;T&gt;</code> is a filter parser that returns a <code>Filter&lt;T&gt;</code>; for technical
+ reasons, the class <code>T</code> must be provided as the only parameter when the parser is constructed. A parser
+ can be used many times: every time a filter is needed, call the <code>parse(String x)</code> method of the parser,
+ which returns a filter of the correct kind, or throws a <code>ParseException</code>.</p>
+
+ <p>The syntax used by the filter parser is <a href="parser/FilterParser.doc.html">available</a>. Basically,
+ it is a propositional calculus with and (denoted by infix <tt>and</tt>, <tt>&amp;</tt> or <tt>&#8743;</tt>), or (denoted by infix <tt>or</tt>, <tt>|</tt> or <tt>&#8744;</tt>)
+ and not (denoted by prefix <tt>not</tt>, <tt>!</tt> or <tt>~</tt>), plus the constants <tt>true</tt> and <tt>false</tt>; ground terms have the same form as returned
+ by the <tt>toString()</tt> method of the <tt>Filter</tt> class. And binds more tightly than or, and not applies to a single ground term.</p>
+
+ <p>Here are some examples:</p>
+
+ <ul>
+ <li><samp>HostEquals(www.foo.bar)</samp>
+ <li><samp>(HostEndsWith(foo.bar) and not ForbiddenHost(http://xxx.yyy.zzz/list-of-forbidden-hosts)) or NoMoreSlashThan(10)</samp>
+ </ul>
+
+ <p>Usually, an expression should only contain references to homogeneous filters of type <code>T</code>, where
+ <code>T</code> is the type used to instantiate the parser. Nonetheless, if some ground term refers to a
+ filter of some other type <code>D</code>, the parser will try to find a static method in the <code>Filters</code> class
+ having the following signature:</p>
+
+ <pre>
+ public static Filter&lt;T&gt; adaptFilterD2T( Filter&lt;D&gt; f )
+ </pre>
+
+ <p>that adapts the given filter <code>f</code> to a filter of the correct type. If no such method exists,
+ the parser throws an exception.</p>
+
+ </body>
+</html>
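The composition described in the package documentation can be sketched with `java.util.function.Predicate` standing in for `Filter`, and with `and`/`or`/`negate` standing in for the static methods of `Filters`. This is an analogy, not the `Filters` API. The helpers `hostEndsWith`, `hostIn` and `noMoreSlashesThan` are hypothetical approximations of `HostEndsWith`, `ForbiddenHost` and `NoMoreSlashThan`. In particular, `hostIn` uses an in-memory set rather than a list read from a URL, and the slash-counting rule is an assumption.

```java
import java.net.URI;
import java.util.Set;
import java.util.function.Predicate;

/**
 * Hypothetical sketch: Predicate stands in for Filter, and and/or/negate stand in
 * for the static combinators of the Filters class.
 */
final class FilterCompositionSketch {
    private FilterCompositionSketch() {}

    /** Plain suffix test on the host (note that "xfoo.bar" also ends with "foo.bar"). */
    static Predicate<URI> hostEndsWith(final String suffix) {
        return uri -> uri.getHost() != null && uri.getHost().endsWith(suffix);
    }

    /** Stand-in for ForbiddenHost: membership in an in-memory set instead of a list read from a URL. */
    static Predicate<URI> hostIn(final Set<String> hosts) {
        return uri -> uri.getHost() != null && hosts.contains(uri.getHost());
    }

    /** Counts '/' characters in the whole string form of the URI (an assumption about NoMoreSlashThan). */
    static Predicate<URI> noMoreSlashesThan(final int max) {
        return uri -> uri.toString().chars().filter(c -> c == '/').count() <= max;
    }

    /** Mirrors (HostEndsWith(foo.bar) and not ForbiddenHost(...)) or NoMoreSlashThan(10). */
    static Predicate<URI> example(final Set<String> forbidden) {
        return hostEndsWith("foo.bar").and(hostIn(forbidden).negate()).or(noMoreSlashesThan(10));
    }
}
```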
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/parser/FilterParser.jj b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/parser/FilterParser.jj
new file mode 100644
index 0000000..5e71225
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/filters/parser/FilterParser.jj
@@ -0,0 +1,181 @@
+options {
+ STATIC = false;
+ UNICODE_INPUT = true;
+}
+
+PARSER_BEGIN(FilterParser)
+
+package it.unimi.dsi.law.warc.filters.parser;
+
+/*
+ * Copyright (C) 2004-2018 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 2.1 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ */
+
+// RELEASE-STATUS: DIST
+
+import it.unimi.dsi.law.warc.filters.*;
+import it.unimi.dsi.fastutil.objects.*;
+import java.lang.reflect.*;
+import java.util.*;
+import java.io.*;
+
+/** A simple parser that transforms a filter expression into a filter.
+ */
+public class FilterParser<T> {
+
+ private final static boolean DEBUG = false;
+
+ private Class tClass;
+
+ public FilterParser( Class<T> tClass ) {
+ this( new java.io.StringReader( "" ) );
+ this.tClass = tClass;
+ }
+
+ public Filter<T> parse( String filter ) throws ParseException {
+ ReInit( new java.io.StringReader( filter ) );
+ return start();
+ }
+}
+
+PARSER_END(FilterParser)
+
+/** Lexer. */
+
+// This stuff separates terms
+SKIP: { " " | "\t" | "\n" | "\r" }
+
+// Operators
+TOKEN: { < AND: "and" | "&" | "∧" > }
+TOKEN: { < OR: "or" | "|" | "∨" > }
+TOKEN: { < NOT: "not" | "!" | "~" > }
+TOKEN: { < TRUE: "true" | "TRUE" > }
+TOKEN: { < FALSE: "false" | "FALSE" > }
+TOKEN: { < OPENPAREN: "(" > }
+TOKEN: { < CLOSEPAREN: ")" > }
+
+/* A word is a sequence of alphanumeric characters, underscores, dots or dollar signs. */
+TOKEN: { < WORD: ( ["a"-"z","A"-"Z","0"-"9","_",".","$"] )+ > }
+TOKEN: { < ARGS: "(" ( ~["(",")"] )* ")" > }
+
+
+/** Parser. */
+
+Filter<T> start():
+{
+ Filter<T> res;
+}
+{
+ res = or()
+ {
+ return res;
+ }
+}
+
+
+Filter<T> or():
+{
+ Filter<T> res;
+ ObjectArrayList<Filter<T>> qrm = new ObjectArrayList<Filter<T>>();
+}
+{
+ res = and()
+ { qrm.add( res ); }
+ (
+ <OR>
+ res = and()
+ { qrm.add( res ); }
+ )*
+ {
+ if ( qrm.size() == 1 ) return res;
+ return Filters.or( (Filter<T>[])qrm.toArray( Filters.EMPTY_ARRAY ) );
+ }
+}
+
+Filter<T> and():
+{
+ Filter<T> res;
+ ObjectArrayList<Filter<T>> qrm = new ObjectArrayList<Filter<T>>();
+}
+{
+ res = atom()
+ { qrm.add( res ); }
+ (
+ <AND>
+ res = atom()
+ { qrm.add( res ); }
+ )*
+ {
+ if ( qrm.size() == 1 ) return res;
+ return Filters.and( (Filter<T>[])qrm.toArray( Filters.EMPTY_ARRAY ) );
+ }
+}
+
+Filter<T> atom():
+{
+ Filter<T> res;
+}
+{
+ res = ground()
+ { return res; }
+ |
+ <NOT> res = ground()
+ { return Filters.not( res ); }
+}
+
+
+Filter<T> ground():
+{
+ Filter<T> res;
+ Token tclass, targs;
+}
+{
+ (
+ tclass = <WORD>
+ targs = <ARGS>
+ {
+ try {
+ return Filters.getFilterFromSpec( tclass.image, targs.image.substring( 1, targs.image.length() - 1 ).trim(), tClass );
+ } catch ( ParseException e ) {
+ throw e;
+ } catch ( Exception e ) {
+ throw new ParseException( e.toString() );
+ }
+ }
+ )
+ |
+ <TRUE>
+ {
+ return Filters.TRUE;
+ }
+ |
+ <FALSE>
+ {
+ return Filters.FALSE;
+ }
+ |
+ (
+ <OPENPAREN> res = start() <CLOSEPAREN>
+ {
+ return res;
+ }
+ )
+
+}
+
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/BoundedCountingInputStream.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/BoundedCountingInputStream.java
new file mode 100644
index 0000000..78daba6
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/BoundedCountingInputStream.java
@@ -0,0 +1,167 @@
+package it.unimi.dsi.law.warc.io;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.zip.CRC32;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.io.MeasurableInputStream;
+
+// RELEASE-STATUS: DIST
+
+/**
+ * A class that decorates an {@link java.io.InputStream} to obtain a
+ * {@link it.unimi.dsi.fastutil.io.MeasurableInputStream}.
+ *
+ * <p> This class serves two purposes: it wraps an input stream so that no more than a certain number of
+ * bytes may be read from it, and it makes the total number of bytes read so far available. Moreover,
+ * if a {@link java.util.zip.CRC32} is given, it is updated with every byte read, so that the CRC of the content can be computed.
+ *
+ * <p> Observe that the underlying stream is {@code null} if the empty constructor is
+ * called but {@link #setInput(InputStream, long, CRC32)} is not; this will lead to a
+ * {@link java.lang.NullPointerException} in almost every call.
+ */
+
+public class BoundedCountingInputStream extends MeasurableInputStream {
+
+ /** The underlying input stream. */
+ private InputStream is;
+
+ /** The (cached) length of this stream: the minimum between the bound and the length of the underlying stream
+ * (if the latter is not measurable, it is considered to have infinite length).
+ */
+ private long length;
+
+ /** The bound. */
+ private long bound;
+
+ /** The current position. */
+ private long position;
+
+ /** Tells if the underlying stream has reached its end. */
+ private boolean eofReached;
+
+ /** The CRC32 updated with the bytes read, or {@code null} if no CRC is computed. */
+ public CRC32 crc;
+
+ /**
+ * Builds the bounded stream.
+ *
+ * <p> Before actually using an object constructed with this,
+ * {@link #setInput(InputStream, long, CRC32)} must be called.
+ */
+ public BoundedCountingInputStream() {}
+
+ /**
+ * Builds the bounded stream.
+ *
+ * @param is the stream.
+ * @param bound the maximum number of bytes that can be read.
+ * @param crc if not {@code null}, it will be used to compute the crc of read bytes.
+ * @throws IOException
+ */
+ public BoundedCountingInputStream(final InputStream is, final long bound, final CRC32 crc) throws IOException {
+ setInput(is, bound, crc);
+ }
+
+ /**
+ * Builds the bounded stream.
+ *
+ * @param is the stream.
+ * @param bound the maximum number of bytes that can be read.
+ * @throws IOException
+ */
+ public BoundedCountingInputStream(final InputStream is, final long bound) throws IOException {
+ setInput(is, bound);
+ }
+
+ /**
+ * Resets the bounded stream fields, for reusing it.
+ *
+ * @param is the stream.
+ * @param bound the maximum number of bytes that can be read.
+ * @param crc if not {@code null}, it will be used to compute the crc of read bytes.
+ * @throws IOException
+ */
+ public void setInput(final InputStream is, final long bound, final CRC32 crc) throws IOException {
+ if (is == null) throw new IllegalArgumentException();
+ this.is = is;
+ this.bound = bound;
+ this.crc = crc;
+ this.position = 0;
+ eofReached = false;
+ long isLength = Long.MAX_VALUE;
+ if (is instanceof MeasurableInputStream) try {
+ isLength = ((MeasurableInputStream)is).length();
+ } catch (UnsupportedOperationException e) {}
+ length = Math.min(isLength, bound);
+ }
+
+ /**
+ * Resets the bounded stream fields, for reusing it.
+ *
+ * @param is the stream.
+ * @param bound the maximum number of bytes that can be read.
+ * @throws IOException
+ */
+ public void setInput(final InputStream is, final long bound) throws IOException {
+ setInput(is, bound, null);
+ }
+
+ @Override
+ public int available() throws IOException {
+ return (int)Math.min(is.available(), bound - position);
+ }
+
+ public int read() throws IOException {
+ if (position >= bound || eofReached) return -1;
+ int b = is.read();
+ if (b == -1)
+ eofReached = true;
+ else {
+ position++;
+ if (crc != null) crc.update(b);
+ }
+ return b;
+ }
+
+ @Override
+ public int read(final byte[] buf, final int offset, final int length) throws IOException {
+ if (length == 0) return 0;
+ if (position >= bound || eofReached) return -1;
+ int read = is.read(buf, offset, (int)Math.min(length, bound - position));
+ if (read == -1)
+ eofReached = true;
+ else {
+ position += read;
+ if (crc != null) crc.update(buf, offset, read);
+ }
+ return read;
+ }
+
+ public long length() throws IOException {
+ return length;
+ }
+
+ public long position() {
+ return position;
+ }
+
+}
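The same bookkeeping can be sketched on top of `java.io.FilterInputStream` instead of fastutil's `MeasurableInputStream`. This is a deliberate simplification: the sketch has no `length()` and no `setInput` reuse. Each read is clamped to `bound - position`, and only the bytes actually returned are counted and fed to the optional `CRC32`. The class name `BoundedCrcStreamSketch` is hypothetical. `skip` is overridden because `FilterInputStream.skip` would delegate to the wrapped stream and bypass both the bound and the CRC.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

/** Hypothetical sketch: a bounded, counting input stream that optionally updates a CRC32. */
final class BoundedCrcStreamSketch extends FilterInputStream {
    private final long bound;  // maximum number of bytes that may be read
    private final CRC32 crc;   // updated with every byte read; may be null
    private long position;     // number of bytes read so far

    BoundedCrcStreamSketch(final InputStream in, final long bound, final CRC32 crc) {
        super(in);
        this.bound = bound;
        this.crc = crc;
    }

    @Override
    public int read() throws IOException {
        if (position >= bound) return -1;
        final int b = in.read();
        if (b != -1) {
            position++;
            if (crc != null) crc.update(b);
        }
        return b;
    }

    @Override
    public int read(final byte[] buf, final int off, final int len) throws IOException {
        if (len == 0) return 0;
        if (position >= bound) return -1;
        final int n = in.read(buf, off, (int) Math.min(len, bound - position));
        if (n > 0) {
            position += n;
            if (crc != null) crc.update(buf, off, n);
        }
        return n;
    }

    @Override
    public long skip(final long n) throws IOException {
        // Skip by reading, so that the bound, the count and the CRC stay consistent (slow but simple).
        long skipped = 0;
        while (skipped < n && read() != -1) skipped++;
        return skipped;
    }

    @Override
    public int available() throws IOException {
        return (int) Math.min(in.available(), bound - position);
    }

    @Override
    public boolean markSupported() {
        return false; // mark/reset would desynchronize the position and the CRC
    }

    long position() {
        return position;
    }
}
```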
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/GZWarcRecord.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/GZWarcRecord.java
new file mode 100644
index 0000000..909a363
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/GZWarcRecord.java
@@ -0,0 +1,526 @@
+package it.unimi.dsi.law.warc.io;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.util.Arrays;
+import java.util.zip.CRC32;
+import java.util.zip.Deflater;
+import java.util.zip.DeflaterOutputStream;
+import java.util.zip.Inflater;
+import java.util.zip.InflaterInputStream;
+
+import org.apache.commons.lang.ArrayUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.io.FastBufferedInputStream;
+import it.unimi.dsi.fastutil.io.FastByteArrayOutputStream;
+import it.unimi.dsi.lang.MutableString;
+import it.unimi.dsi.law.bubing.util.BURL;
+import it.unimi.dsi.law.warc.util.HttpResponse;
+import it.unimi.dsi.law.warc.util.Util;
+
+
+// RELEASE-STATUS: DIST
+
+/**
+ * A class to read/write WARC/0.9 records in compressed form (for format details, please see the <a
+ * href='http://archive-access.sourceforge.net/warc/warc_file_format-0.9.html'>WARC</a> and <a
+ * href='http://www.gzip.org/zlib/rfc1952.pdf'>GZip</a> format specifications).
+ *
+ * <p> Records written/read with this class use the <code>skip-lengths</code> as detailed in
+ * section 10.2 of the WARC specification.
+ *
+ * <p> Moreover, the optional <code>NAME</code> gzip header field contains the
+ * {@link it.unimi.dsi.law.warc.io.WarcRecord.Header#recordId}, and the optional <code>COMMENT</code>
+ * gzip header field contains the value of the <code>anvl-field</code> corresponding to
+ * the {@link it.unimi.dsi.law.warc.util.HttpResponse#DIGEST_HEADER} key, if present (otherwise the
+ * {@link it.unimi.dsi.law.warc.io.WarcRecord.Header#recordId}), followed by a tab and then by the
+ * {@link it.unimi.dsi.law.warc.io.WarcRecord.Header#subjectUri}.
+ *
+ * <p> As for a {@link it.unimi.dsi.law.warc.io.WarcRecord}, to write a record, set {@link #header}
+ * and {@link #block} appropriately and then call {@link #write(OutputStream)}. After such a call,
+ * some {@link #header} fields will be modified and the {@link #gzheader} fields will be set to
+ * reflect the write operation.
+ *
+ * <p> Again, as in the case of a {@link it.unimi.dsi.law.warc.io.WarcRecord}, to perform a sequence
+ * of consecutive reads/skips, call {@link #read(FastBufferedInputStream)} or
+ * {@link #skip(FastBufferedInputStream)}. After a read, the {@link #block} can (but need not)
+ * be read to obtain the data. The {@link WarcRecord.Header#contentType} field
+ * can be used to determine how to parse the content of {@link #block}.
+ *
+ * <p> As an implementation note: skipping just populates the {@link #gzheader} fields and returns
+ * the value of the <code>compressed-skip-length</code> field of the skipped record. On the other
+ * hand, reading parses the gzip header as well as the WARC <code>header</code>, sets all the
+ * {@link #header} and {@link #gzheader} fields appropriately, and then sets {@link #block} so that
+ * it refers to the <code>block</code> part of the record. After a full read (a read that leaves
+ * fewer than {@link #PARTIAL_UNCOMPRESSED_READ_THRESHOLD} bytes in {@link #uncompressedRecordStream}),
+ * the CRC found in the gzip file is checked against the CRC computed over the record; after a partial read,
+ * the CRC can be checked by calling {@link #checkCRC(FastBufferedInputStream)}, which consumes
+ * {@link #uncompressedRecordStream} to compute the CRC.
+ *
+ * <p>This object can be reused for non-consecutive writes on different streams. On the other hand,
+ * to reuse this object for non-consecutive read/skip, the method {@link #resetRead()} must be
+ * called any time a read/skip does not follow a read/skip from the same stream.
+ *
+ * <p>This class uses internal buffering, hence it is not thread safe.
+ */
+@SuppressWarnings("javadoc")
+public class GZWarcRecord extends WarcRecord {
+ private final static Logger LOGGER = LoggerFactory.getLogger(GZWarcRecord.class);
+
+ @SuppressWarnings("hiding")
+ final public static boolean ASSERTS = true;
+
+ /** Tells what method to use to skip bytes in the input stream. It's here for profiling purposes. */
+ public static final boolean USE_POSITION_INSTEAD_OF_SKIP = false;
+
+ /** A class to contain fields contained in the gzip header. */
+ public static class GZHeader {
+
+ /** The <code>compressed-skip-length</code> warc-required extra gzip field. */
+ public int compressedSkipLength;
+
+ /** The <code>uncompressed-skip-length</code> warc-required extra gzip field. */
+ public int uncompressedSkipLength;
+
+ /** The <code>mtime</code> gzip field. */
+ public int mtime;
+
+ /** The (optional) <code>name</code> gzip field. Here it is used to contain {@link WarcRecord.Header#recordId}. */
+ public byte[] name;
+
+ /** The (optional) <code>comment</code> gzip field. Here it is used to contain the digest (or, if missing, {@link WarcRecord.Header#recordId}), followed by a tab and by {@link WarcRecord.Header#subjectUri}. */
+ public byte[] comment;
+
+ @Override
+ public String toString() {
+ MutableString s = new MutableString();
+ s.append("compressedSkipLength: ");
+ s.append(compressedSkipLength);
+ s.append(", uncompressedSkipLength: ");
+ s.append(uncompressedSkipLength);
+ s.append(", mtime: ");
+ s.append(mtime);
+ s.append(", name: ");
+ s.append(name == null ? "<null>" : Util.getString(name));
+ s.append(", comment: ");
+ s.append(comment == null ? "<null>" : Util.getString(comment));
+ return s.toString();
+ }
+
+ @Override
+ public int hashCode() {
+ // Consistent with equals(), which compares names with Arrays.equals(); also null-safe.
+ return Arrays.hashCode(name);
+ }
+
+ @Override
+ public boolean equals(Object o) {
+ if (! (o instanceof GZWarcRecord.GZHeader)) return false;
+ GZWarcRecord.GZHeader h = (GZWarcRecord.GZHeader)o;
+ if (compressedSkipLength != h.compressedSkipLength) return false;
+ if (uncompressedSkipLength != h.uncompressedSkipLength) return false;
+ if (mtime != h.mtime) return false;
+ if (! Arrays.equals(name, h.name)) return false;
+ if (! Arrays.equals(comment, h.comment)) return false;
+ return true;
+ }
+
+ }
+
+ /* GZip header constants (see RFC 1952, section 2.3). */
+
+ private static final byte XFL = Deflater.BEST_COMPRESSION;
+
+ @SuppressWarnings("unused")
+ private static final byte FTEXT = 1 << 0, FHCRC = 1 << 1, FEXTRA = 1 << 2, FNAME = 1 << 3, FCOMMENT = 1 << 4;
+
+ private static final byte[] GZIP_START = new byte[] {
+ (byte)0x1F, (byte)0x8B, // ID1 ID2
+ Deflater.DEFLATED, // CM
+ FEXTRA | FNAME | FCOMMENT }, // FLG
+ XFL_OS = new byte[] { XFL, (byte)0xFF }, // unknown os
+ SKIP_LEN = new byte[] { (byte)'s', (byte)'l' };
+
+ private static final short SUB_LEN = 8; // 2 ints (compressedSkipLength, uncompressedSkipLength)
+
+ private static final short XLEN = 4 + SUB_LEN; // SI1 + SI2 (2 bytes) + LEN (1 short) + SUB_LEN bytes of subfield data
+
+ private static final short TRAILER_LEN = 8; // 2 ints (CRC32, ISIZE)
+
+ private static final int FIX_LEN = GZIP_START.length +
+ 4 + // 1 int (MTIME)
+ XFL_OS.length +
+ (2 + XLEN) + // 1 short (XLEN) + EXTRA bytes
+ TRAILER_LEN;
+
+ /** If {@link #uncompressedRecordStream} contains more than this amount of bytes, the last read is considered partial. */
+ private static final int PARTIAL_UNCOMPRESSED_READ_THRESHOLD = 16; // must be >= 4, the two last CRLF present in any record
+
+ /** The buffer size used by the {@link #uncompressedRecordStream}. */
+ private static final int UNCOMPRESSED_RECORD_STREAM_BUFFER_SIZE = 1024;
+
+ /** The deflater used to compress the record. */
+ private final Deflater deflater = new Deflater(XFL, true);
+
+ /** The inflater used to decompress the record. */
+ private final Inflater inflater = new Inflater(true);
+
+ /** An output stream used by {@link #write(OutputStream)} to cache the compressed record. */
+ private final FastByteArrayOutputStream compressedOutputStream = new FastByteArrayOutputStream();
+
+ /** The size of {@link #headerBuffer}; must be enough to contain NAME, or COMMENT, or GZIP_START. */
+ private static final int HEADER_BUFFER_SIZE = 16384;
+
+ /** A buffer used in reading/writing the GZip header. */
+ private final byte[] headerBuffer = new byte[HEADER_BUFFER_SIZE];
+
+ /** The position of the first byte of the last read GZip header. It is -1 when no header has been read, or when the trailer of the last header was already read. */
+ private long positionOfLastGZHeader;
+
+ /** The <code>compressed-skip-length</code> found in the last read GZip header (set by {@link #readGZHeader(FastBufferedInputStream)}). */
+ private long compressedDataLengthInLastGZHeader;
+
+ /** An input stream passed to {@link WarcRecord#read(FastBufferedInputStream)} as the last found uncompressed block ({@code null} in case of a skip). */
+ private FastBufferedInputStream uncompressedRecordStream;
+
+ /** The GZip headers used by this object. */
+ final public GZHeader gzheader = new GZHeader();
+
+ public GZWarcRecord() {
+ crc = new CRC32();
+ positionOfLastGZHeader = -1;
+ }
+
+ @Override
+ public void resetRead() {
+ positionOfLastGZHeader = -1;
+ }
+
+ @Override
+ public long skip(FastBufferedInputStream in) throws IOException, FormatException {
+ if (readGZHeader(in) == -1) return -1;
+ uncompressedRecordStream = null; // so that the readGZTrailer will not consume it and compute the CRC
+ readGZTrailer(in);
+ return gzheader.compressedSkipLength;
+ }
+
+ @Override
+ public long read(FastBufferedInputStream in) throws IOException, FormatException {
+
+ if (readGZHeader(in) == -1) return -1;
+
+ // compressed blocks
+
+ inflater.reset();
+ crc.reset();
+ final long remainingCompressedBytes = gzheader.compressedSkipLength - (in.position() - positionOfLastGZHeader) - TRAILER_LEN;
+ if (ASSERTS) assert remainingCompressedBytes > 0; // we always have at least the <code>header-line</code> and the two last CRLFs
+ InflaterInputStream gzin = new InflaterInputStream(new BoundedCountingInputStream(in, remainingCompressedBytes), inflater);
+ uncompressedRecordStream = new FastBufferedInputStream(new BoundedCountingInputStream(gzin, gzheader.uncompressedSkipLength, crc), UNCOMPRESSED_RECORD_STREAM_BUFFER_SIZE);
+ super.resetRead(); // reading from a gzip member does not require consuming the previous input
+ super.read(uncompressedRecordStream);
+
+ return compressedDataLengthInLastGZHeader;
+ }
+
+ @Override
+ public void write(OutputStream out) throws IOException {
+
+ byte[] buffer = headerBuffer; // for efficiency
+
+ /* prepare the compressed block and uncompressed crc */
+
+ deflater.reset();
+ compressedOutputStream.reset();
+ DeflaterOutputStream gzout = new DeflaterOutputStream(compressedOutputStream, deflater);
+ crc.reset();
+ super.write(gzout);
+ gzout.finish();
+ gzout = null;
+
+ /* fill gzheader */
+
+ final byte[] recordByteRepresentation = Util.getASCIIBytes(header.recordId.toString());
+ final String digest = header.anvlFields.get(HttpResponse.DIGEST_HEADER);
+
+ gzheader.name = recordByteRepresentation;
+
+ int commentLength = 0;
+ if (digest != null) {
+ final byte[] digestByteRepresentation = Util.getASCIIBytes(digest);
+ commentLength = digestByteRepresentation.length;
+ System.arraycopy(digestByteRepresentation, 0, buffer, 0, commentLength);
+ } else {
+ commentLength = recordByteRepresentation.length;
+ System.arraycopy(recordByteRepresentation, 0, buffer, 0, commentLength);
+ }
+ buffer[commentLength++] = '\t';
+ final byte[] subjectUriByteRepresentation = BURL.toByteArray(header.subjectUri);
+ System.arraycopy(subjectUriByteRepresentation, 0, buffer, commentLength, subjectUriByteRepresentation.length);
+ commentLength += subjectUriByteRepresentation.length;
+ gzheader.comment = ArrayUtils.subarray(buffer, 0, commentLength);
+
+ gzheader.compressedSkipLength = FIX_LEN + (gzheader.name.length + 1) + (gzheader.comment.length + 1) + compressedOutputStream.length;
+ gzheader.uncompressedSkipLength = (int)(header.dataLength & 0xFFFFFFFFL); // lower 32 bits (gzip ISIZE is the length modulo 2^32)
+ gzheader.mtime = (int)(header.creationDate.getTime() / 1000);
+
+ /* write */
+
+ // ID1 ID2 CM FLG
+
+ out.write(GZIP_START);
+
+ // MTIME
+
+ writeLEInt(out, gzheader.mtime);
+
+ // XFL OS
+
+ out.write(XFL_OS);
+
+ /* EXTRA begin */
+
+ // XLEN
+
+ writeLEShort(out, XLEN);
+
+ // SI1 SI2 (as in warc spec)
+
+ out.write(SKIP_LEN);
+
+ // LEN
+
+ writeLEShort(out, SUB_LEN);
+
+ // compressed-skip-length (as in warc spec)
+
+ writeLEInt(out, gzheader.compressedSkipLength);
+
+ // uncompressed length (as in warc spec)
+
+ writeLEInt(out, gzheader.uncompressedSkipLength);
+
+ /* EXTRA end */
+
+ // NAME
+
+ out.write(gzheader.name);
+ out.write(0);
+
+ // COMMENT
+
+ out.write(gzheader.comment);
+ out.write(0);
+
+ // compressed blocks
+
+ out.write(compressedOutputStream.array, 0, compressedOutputStream.length);
+
+ // CRC32
+
+ writeLEInt(out, (int)(crc.getValue() & 0xFFFFFFFFL));
+
+ // ISIZE
+
+ writeLEInt(out, gzheader.uncompressedSkipLength);
+
+ }
+
+ public void checkCRC(FastBufferedInputStream in) throws IOException, FormatException {
+ if (positionOfLastGZHeader == -1 || uncompressedRecordStream == null) throw new IllegalStateException();
+ consumeUncompressedRecord();
+ readGZTrailer(in);
+ }
+
+ @Override
+ public String toString() {
+ return gzheader.toString() + "\n" + header.toString();
+ }
+
+ private long readGZHeader(FastBufferedInputStream in) throws IOException, FormatException {
+
+ byte[] buffer = headerBuffer; // local copy for efficiency reasons
+
+ // if we haven't done it yet, consume the trailer of the previous read
+
+ if (positionOfLastGZHeader != -1) readGZTrailer(in);
+
+ // ID1 ID2 CM FLG
+
+ positionOfLastGZHeader = in.position();
+ if (in.read(buffer, 0, 4) == -1) return -1;
+
+ if (buffer[0] != GZIP_START[0] || buffer[1] != GZIP_START[1]) throw new FormatException("Missing GZip magic numbers, found: " + buffer[0] + " " + buffer[1]);
+ if (buffer[2] != Deflater.DEFLATED) throw new FormatException("Unknown compression method: " + buffer[2]);
+ int flg = buffer[3];
+
+ // MTIME
+
+ gzheader.mtime = readLEInt(in);
+
+ // XFL OS (ignored)
+
+ in.read(buffer, 0, 2);
+
+ /* EXTRA begin */
+
+ gzheader.compressedSkipLength = -1;
+
+ if ((flg & FEXTRA) != 0) {
+
+ // XLEN
+
+ short xlen = readLEShort(in);
+
+ while (xlen > 0) {
+
+ // SI1 SI2
+
+ in.read(buffer, 0, 2);
+
+ // LEN
+
+ short len = readLEShort(in);
+
+ if (buffer[0] == SKIP_LEN[0] && buffer[1] == SKIP_LEN[1]) {
+ compressedDataLengthInLastGZHeader = gzheader.compressedSkipLength = readLEInt(in);
+ gzheader.uncompressedSkipLength = readLEInt(in);
+ } else in.read(buffer, 0, len);
+
+ xlen -= len + 4; // 2 bytes (SI1, SI2) + 1 short (LEN)
+
+ }
+ }
+
+ if (gzheader.compressedSkipLength < 0) throw new FormatException("Missing SL extra field, or negative compressed-skip-length");
+
+ /* EXTRA end */
+
+ // NAME
+
+ if ((flg & FNAME) != 0) {
+ int l = 0, b;
+ while ((b = in.read()) != 0) {
+ buffer[l++] = (byte)b;
+ }
+ gzheader.name = ArrayUtils.subarray(buffer, 0, l);
+ }
+
+ // COMMENT
+
+ if ((flg & FCOMMENT) != 0) {
+ int l = 0, b;
+ while ((b = in.read()) != 0) {
+ buffer[l++] = (byte)b;
+ }
+ gzheader.comment = ArrayUtils.subarray(buffer, 0, l);
+ }
+
+ // HCRC
+
+ if ((flg & FHCRC) != 0) {
+ in.read(buffer, 0, 2);
+ }
+
+ return compressedDataLengthInLastGZHeader;
+ }
+
+ private void readGZTrailer(FastBufferedInputStream in) throws IOException, FormatException {
+
+ if (positionOfLastGZHeader == -1) return; // we haven't read any new header
+
+ // possibly position correctly according to previous read
+
+ boolean consumed = false;
+ if (uncompressedRecordStream != null) {
+ final long remaining = uncompressedRecordStream.length() - uncompressedRecordStream.position();
+ if (ASSERTS) assert remaining >= 0;
+ if (0 < remaining && remaining < PARTIAL_UNCOMPRESSED_READ_THRESHOLD) { // > 0 is needed to avoid unnecessary consume
+ consumeUncompressedRecord(); // we update the CRC for complete reads
+ consumed = true;
+ } else LOGGER.debug("Omitting CRC check, since the last read was partial.");
+ } else LOGGER.debug("Omitting CRC check, since coming from a skip.");
+
+ long newPosition = positionOfLastGZHeader + compressedDataLengthInLastGZHeader - TRAILER_LEN;
+ if (ASSERTS) assert newPosition >= in.position();
+ if (USE_POSITION_INSTEAD_OF_SKIP)
+ in.position(newPosition);
+ else
+ in.skip(newPosition - in.position());
+
+ // CRC32
+
+ final int expectedCrc = readLEInt(in);
+ if (consumed) {
+ final int actualCrc = (int)(crc.getValue() & 0xFFFFFFFFL);
+ if (expectedCrc != actualCrc) throw new FormatException("CRC32 mismatch, expected: " + expectedCrc + ", actual: " + actualCrc);
+ else LOGGER.debug("CRC check OK.");
+ }
+
+ // ISIZE
+
+ final int iSize = readLEInt(in);
+ if (gzheader.uncompressedSkipLength != iSize) throw new FormatException("Length mismatch between (warc) extra gzip fields uncompressed-skip-length (" + gzheader.uncompressedSkipLength + ") and ISIZE (" + iSize + ")");
+
+ // we have consumed the trailer
+
+ positionOfLastGZHeader = -1;
+
+ }
+
+ private void consumeUncompressedRecord() throws IOException {
+ if (ASSERTS) assert uncompressedRecordStream.length() - uncompressedRecordStream.position() >= 4;
+ byte[] b = new byte[1024];
+ while (uncompressedRecordStream.read(b) != -1); // we use read instead of skip because we want the CRC to be updated!
+ uncompressedRecordStream.skip(Long.MAX_VALUE);
+ }
+
+ private static int readLEInt(InputStream in) throws IOException {
+ int i = in.read() & 0xFF;
+ i |= (in.read() & 0xFF) << 8;
+ i |= (in.read() & 0xFF) << 16;
+ i |= (in.read() & 0xFF) << 24;
+ return i;
+ }
+
+ private static short readLEShort(InputStream in) throws IOException {
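+ final int lo = in.read() & 0xFF; // mask: a sign-extended low byte >= 0x80 would corrupt the high byte
+ final int hi = in.read() & 0xFF;
+ return (short)(lo | hi << 8);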
+ }
+
+ private static void writeLEInt(OutputStream out, int i) throws IOException {
+ out.write((byte)i);
+ out.write((byte)((i >> 8) & 0xFF));
+ out.write((byte)((i >> 16) & 0xFF));
+ out.write((byte)((i >> 24) & 0xFF));
+ }
+
+ private static void writeLEShort(OutputStream out, short s) throws IOException {
+ out.write((byte)s);
+ out.write((byte)((s >> 8) & 0xFF));
+ }
+
+}
+
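The `write` method above assembles each gzip member by hand so that the WARC skip-length subfield (the `SL` extra field that `readGZHeader` looks for) can be placed in FEXTRA. The following self-contained sketch builds one such member with `java.util.zip` primitives and checks that the JDK's standard `GZIPInputStream`, which skips FEXTRA and FNAME, still decodes it. The class and method names are hypothetical and not part of this library. The sketch assumes, as `readGZTrailer` does, that compressed-skip-length spans the whole member from its first header byte through the 8-byte trailer. The MTIME, XFL and OS values are arbitrary choices. Its little-endian helpers mask every byte with `0xFF`, which prevents a low byte of `0x80` or more from being sign-extended into the high byte.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;

/** Hypothetical sketch: one gzip member carrying a WARC-style "SL" skip-length subfield. */
final class SkipLengthGzipDemo {

    static void writeLEShort(ByteArrayOutputStream out, int s) {
        out.write(s & 0xFF);
        out.write((s >> 8) & 0xFF);
    }

    static void writeLEInt(ByteArrayOutputStream out, int i) {
        for (int shift = 0; shift < 32; shift += 8) out.write((i >> shift) & 0xFF);
    }

    static byte[] gzipMember(byte[] data, byte[] name) {
        // Raw deflate (nowrap = true): the gzip header and trailer are written by hand below.
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        while (!deflater.finished()) body.write(chunk, 0, deflater.deflate(chunk));
        deflater.end();

        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);

        // 10 fixed bytes + XLEN (2) + subfield (2 + 2 + 8) + NAME + NUL + body + trailer (8)
        int memberLength = 10 + 2 + 12 + name.length + 1 + body.size() + 8;

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(0x1f); out.write(0x8b);      // ID1 ID2
        out.write(Deflater.DEFLATED);          // CM = 8
        out.write(0x04 | 0x08);                // FLG = FEXTRA | FNAME
        writeLEInt(out, 0);                    // MTIME: not set
        out.write(0); out.write(255);          // XFL, OS = unknown
        writeLEShort(out, 12);                 // XLEN: a single 12-byte subfield
        out.write('S'); out.write('L');        // SI1 SI2
        writeLEShort(out, 8);                  // LEN: two 4-byte integers follow
        writeLEInt(out, memberLength);         // compressed-skip-length
        writeLEInt(out, data.length);          // uncompressed length
        out.write(name, 0, name.length);
        out.write(0);                          // NAME is zero-terminated
        out.write(body.toByteArray(), 0, body.size());
        writeLEInt(out, (int) crc.getValue()); // CRC32 of the uncompressed data
        writeLEInt(out, data.length);          // ISIZE (length mod 2^32)
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "example record payload".getBytes(StandardCharsets.US_ASCII);
        byte[] member = gzipMember(data, "record-1".getBytes(StandardCharsets.US_ASCII));
        // GZIPInputStream skips FEXTRA and FNAME, then checks CRC32 and ISIZE.
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(member))) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.US_ASCII));
        }
    }
}
```

Running `main` should print the original payload. Because skip-length equals the member length, a reader can jump from one member header to the next without inflating anything, which is what the `USE_POSITION_INSTEAD_OF_SKIP` branch of `readGZTrailer` relies on.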
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/HttpResponseFilteredIterator.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/HttpResponseFilteredIterator.java
new file mode 100644
index 0000000..bf68e71
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/HttpResponseFilteredIterator.java
@@ -0,0 +1,119 @@
+package it.unimi.dsi.law.warc.io;
+
+import java.io.IOException;
+import java.util.Iterator;
+import java.util.NoSuchElementException;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.io.FastBufferedInputStream;
+import it.unimi.dsi.law.warc.filters.Filter;
+import it.unimi.dsi.law.warc.io.WarcRecord.FormatException;
+import it.unimi.dsi.law.warc.util.HttpResponse;
+import it.unimi.dsi.law.warc.util.WarcHttpResponse;
+import it.unimi.dsi.logging.ProgressLogger;
+
+// RELEASE-STATUS: DIST
+
+/** An iterator over a WARC stream that returns only the records that are
+ * {@link it.unimi.dsi.law.warc.util.HttpResponse HTTP responses} satisfying a given filter. */
+
+public class HttpResponseFilteredIterator implements Iterator<WarcHttpResponse> {
+
+ private final FastBufferedInputStream in;
+ private final WarcRecord record;
+ private final WarcHttpResponse response;
+ private final Filter<HttpResponse> filter;
+ private final ProgressLogger pl;
+ private boolean eofIsReached;
+ private boolean cached;
+
+ /**
+ * Builds the filtered iterator.
+ *
+ * <p> This constructor takes a {@link WarcRecord} (or a {@link GZWarcRecord} if the stream
+ * contains compressed records) and a {@link WarcHttpResponse} that will be reused (and thus
+ * modified) by calls to {@link #hasNext()} and {@link #next()}.
+ *
+ * @param in the input stream.
+ * @param record the record used for reading.
+ * @param response the response used for reading.
+ * @param filter the filter.
+ * @param pl the progress logger.
+ */
+ public HttpResponseFilteredIterator(final FastBufferedInputStream in, final WarcRecord record, final WarcHttpResponse response, final Filter<HttpResponse> filter, final ProgressLogger pl) {
+ this.in = in;
+ this.record = record;
+ this.response = response;
+ this.filter = filter;
+ this.pl = pl;
+ eofIsReached = false;
+ cached = false;
+ }
+ /**
+ * Builds the filtered iterator.
+ *
+ * <p> This constructor takes a {@link WarcRecord} (or a {@link GZWarcRecord} if the stream
+ * contains compressed records) and a {@link WarcHttpResponse} that will be reused (and thus
+ * modified) by calls to {@link #hasNext()} and {@link #next()}.
+ *
+ * @param in the input stream.
+ * @param record the record used for reading.
+ * @param response the response used for reading.
+ * @param filter2 the filter.
+ */
+ public HttpResponseFilteredIterator(final FastBufferedInputStream in, final WarcRecord record, final WarcHttpResponse response, final Filter<HttpResponse> filter2) {
+ this(in, record, response, filter2, null);
+ }
+
+ public boolean hasNext() {
+ if (eofIsReached) return false;
+ if (cached) return true;
+ try {
+ long read;
+ do {
+ read = record.read(in);
+ if (read == -1) break;
+ if (pl != null) pl.update();
+ if (! response.fromWarcRecord(record)) continue;
+ if (filter.apply(response)) {
+ cached = true;
+ break;
+ }
+ } while (read != -1);
+ eofIsReached = read == -1;
+ } catch (IOException e) {
+ throw new RuntimeException("IOException while reading next record", e);
+ } catch (FormatException e) {
+ throw new RuntimeException("FormatException while reading next record", e);
+ }
+ return ! eofIsReached;
+ }
+
+ public WarcHttpResponse next() {
+ if (! hasNext()) throw new NoSuchElementException();
+ cached = false;
+ return response;
+ }
+
+ public void remove() {
+ throw new UnsupportedOperationException();
+ }
+
+}
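`hasNext()` above advances the stream until it finds a matching record and records that fact in `cached`. As a result, repeated calls to `hasNext()` do not skip records, and `next()` only has to clear the flag. Below is a library-free sketch of the same lookahead pattern over an ordinary `Iterator`, with a `java.util.function.Predicate` standing in for the `Filter`. The class name is hypothetical.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/** Hypothetical sketch of the lookahead used by hasNext()/next(), over a plain Iterator. */
final class LookaheadFilterIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Predicate<? super T> filter;
    private T pending;       // the match found by hasNext(), not yet returned
    private boolean cached;  // whether pending is valid

    LookaheadFilterIterator(Iterator<T> source, Predicate<? super T> filter) {
        this.source = source;
        this.filter = filter;
    }

    @Override
    public boolean hasNext() {
        if (cached) return true;               // a repeated call must not consume another element
        while (source.hasNext()) {
            T candidate = source.next();
            if (filter.test(candidate)) {
                pending = candidate;
                cached = true;
                return true;
            }
        }
        return false;                          // source exhausted
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        cached = false;                        // the next hasNext() will advance the source again
        T result = pending;
        pending = null;
        return result;
    }

    public static void main(String[] args) {
        Iterator<Integer> evens = new LookaheadFilterIterator<>(List.of(1, 2, 3, 4, 5, 6).iterator(), n -> n % 2 == 0);
        while (evens.hasNext()) System.out.println(evens.next());
    }
}
```

Unlike this sketch, `HttpResponseFilteredIterator` returns the same `WarcHttpResponse` object every time. A caller that still needs a response after the next `hasNext()` call must therefore copy what it needs first. The inherited default `Iterator.remove()` already throws `UnsupportedOperationException`, matching the original `remove()`.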
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/InspectableBufferedInputStream.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/InspectableBufferedInputStream.java
new file mode 100644
index 0000000..335c1de
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/InspectableBufferedInputStream.java
@@ -0,0 +1,468 @@
+package it.unimi.dsi.law.warc.io;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.FileNotFoundException;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.channels.FileChannel;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.bytes.ByteArrays;
+import it.unimi.dsi.fastutil.io.FastBufferedInputStream;
+import it.unimi.dsi.fastutil.io.MeasurableInputStream;
+import it.unimi.dsi.law.warc.util.Util;
+
+
+// RELEASE-STATUS: DIST
+
+/** An input stream that wraps an underlying input stream to make it
+ * rewindable and partially inspectable, using a bounded-capacity memory buffer and an overflow file.
+ *
+ * <h2>Stream behaviour</h2>
+ *
+ * <p>In the following description, we let <var>K</var><sub>0</sub> be the buffer
+ * size, <var>K</var> be the number of bytes read from the underlying stream,
+ * <var>P</var> be the index of the next byte that will be returned by this
+ * stream (indices start from 0) and <var>L</var> be the number of bytes that
+ * are requested by a read. Note that <var>P</var>&le;<var>K</var>, and equality
+ * holds if this stream was never rewound; otherwise, <var>P</var> may be
+ * smaller than <var>K</var> (and, in particular, it will be zero just after
+ * a rewind).
+ *
+ * <p>When the stream is connected, up to <var>K</var><sub>0</sub> bytes are read
+ * and stored in the buffer; after that, the buffer itself becomes available for
+ * inspection. Of course, <var>K</var> is set to the number of bytes actually read,
+ * whereas <var>P</var>=0.
+ *
+ * <p>Upon reading, as long as <var>P</var>+<var>L</var>-1&lt;<var>K</var>, no byte needs to be read from the underlying
+ * stream. Otherwise, up to <var>P</var>+<var>L</var>-<var>K</var> bytes are read from the underlying
+ * stream and appended to the overflow file before being returned to the user.
+ *
+ * <h2>Connecting and disposing</h2>
+ *
+ * <p>Objects of this class are reusable by design. At any moment, they may be in one of three states:
+ * connected, ready, disposed:
+ * <ul>
+ * <li> <strong>connected</strong>: this stream is connected to an underlying input stream, and has an
+ * overflow file open and partially filled; notice that, since the overflow file is reused, the file itself
+ * may be larger than the number of bytes written in it;
+ * <li> <strong>ready</strong>: this stream is not connected to any underlying input stream, but it has
+ * an overflow file (not open, but ready to be used); notice that, since the overflow file is reused, the file itself
+ * may be nonempty;
+ * <li> <strong>disposed</strong>: this stream cannot be used anymore: its resources are disposed and, in particular,
+ * its overflow file has been deleted.
+ * </ul>
+ *
+ * <p>At creation, this stream is ready; it can be connected using {@link #connect(InputStream)}. At any time,
+ * it can become ready again by a call to {@link #close()}. The {@link #close()} method does not truncate the
+ * overflow file; users who want to truncate it can do so by calling {@link #truncate(long)} after
+ * closing. The {@link #dispose()} method makes this stream disposed; this method is called on finalization.
+ *
+ * <h2>Buffering</h2>
+ *
+ * <p>This class provides no buffering other than the memory buffer described above. Users should consider providing
+ * a buffered underlying input stream, or wrapping instances of this class in a {@link FastBufferedInputStream}: the
+ * former is appropriate only when {@link #fillAndRewind()} is not used; the latter makes accesses more efficient,
+ * but only if the underlying input stream is often much larger than the buffer.
+ *
+ */
+public class InspectableBufferedInputStream extends MeasurableInputStream {
+
+ public static final Logger LOGGER = LoggerFactory.getLogger(InspectableBufferedInputStream.class);
+ public static final boolean DEBUG = false;
+
+ /** The number of path elements for the hierarchical overflow file (see {@link Util#createHierarchicalTempFile(File, int, String, String)}). */
+ public static final int OVERFLOW_FILE_RANDOM_PATH_ELEMENTS = 3;
+
+ /** The possible states of this stream, as explained above. */
+ public static enum State { CONNECTED, READY, DISPOSED };
+
+ /** The default buffer size (64KiB). */
+ public static final int DEFAULT_BUFFER_SIZE = 64 * 1024;
+
+ /** A private throw-away buffer used by {@link #fill(long)} and {@link #skip(long)}. */
+ private final byte[] b = new byte[8 * 1024];
+
+ /** The buffer. When connected, it is filled with the first portion of the underlying input stream (read at connection).
+ * The buffer is available for inspection, but users should not modify its content; the number of bytes actually available
+ * is {@link #inspectable}.
+ */
+ public byte[] buffer;
+
+ /** Whether we already got a -1 from a read operation. */
+ boolean eof;
+
+ /** The number of bytes read in the buffer, when connected. It is the minimum of <code>buffer.length</code> and the length
+ * of the underlying stream.
+ */
+ public int inspectable;
+
+ /** The overflow file used by this stream: it is created at construction time, and deleted on {@link #dispose()}, finalization,
+ * or exit.
+ */
+ public final File overflowFile;
+
+ /** When connected, this is the output stream of the overflow file where data should be written. */
+ private FileOutputStream overflowOut;
+
+ /** When connected, this is the file channel that underlies the output stream of the overflow file. */
+ private FileChannel overflowOutChannel;
+
+ /** When connected, this is the input stream of the overflow file from which data should be read. It is
+ * positioned at the beginning of the overflow file if <code>position</code>&lt;<code>buffer.length</code>;
+ * otherwise, the next byte it returns is the (<code>position</code>-<code>buffer.length</code>)-th byte
+ * of the overflow file (numbered from 0). In any case, it is never positioned past <code>overflowOut</code>. */
+ private FileInputStream overflowIn;
+
+ /** When connected, this is the file channel that underlies the input stream of the overflow file. */
+ private FileChannel overflowInChannel;
+
+ /** When connected, this is the underlying input stream. */
+ private InputStream underlying;
+
+ /** When connected, this is the number of bytes (ever) read from {@link #underlying}. */
+ private long readBytes;
+
+ /** The position on this stream (i.e., the index of the next byte to be returned). */
+ private long position;
+
+ /** The state of this stream. */
+ private State state;
+
+ /** Whether the length is already known (this happens only once the underlying stream has been read completely). */
+ private boolean lengthKnown;
+
+ /** Creates a new ready stream.
+ *
+ * @param bufferSize the buffer size, in bytes.
+ * @param overflowFileDir the directory where the overflow file should be created, or {@code null} for the default temporary directory.
+ * @throws IOException if some exception occurs during creation.
+ */
+ public InspectableBufferedInputStream(final int bufferSize, File overflowFileDir) throws IOException {
+ if (overflowFileDir != null && ! overflowFileDir.isDirectory()) throw new IllegalArgumentException("Wrong overflow directory " + overflowFileDir);
+ if (bufferSize <=0) throw new IllegalArgumentException("Wrong buffer size " + bufferSize);
+ buffer = new byte[bufferSize];
+ if (overflowFileDir == null) overflowFileDir = new File(System.getProperty("java.io.tmpdir", "/tmp"));
+ overflowFile = Util.createHierarchicalTempFile(overflowFileDir, OVERFLOW_FILE_RANDOM_PATH_ELEMENTS, getClass().getSimpleName() + '.', ".overflow");
+ LOGGER.debug("Creating overflow file " + overflowFile);
+ overflowFile.deleteOnExit();
+ state = State.READY;
+
+ overflowOut = new FileOutputStream(overflowFile);
+ overflowOutChannel = overflowOut.getChannel();
+ overflowIn = new FileInputStream(overflowFile);
+ overflowInChannel = overflowIn.getChannel();
+ }
+
+ /** Creates a new ready stream using the default temporary directory for the overflow file.
+ *
+ * @param bufferSize the buffer size, in bytes.
+ * @throws IOException if some exception occurs during creation.
+ */
+ public InspectableBufferedInputStream(final int bufferSize) throws IOException {
+ this(bufferSize, null);
+ }
+
+ /** Creates a new ready stream with the default buffer size, using the default temporary directory for the overflow file.
+ *
+ * @throws IOException if some exception occurs during creation.
+ */
+ public InspectableBufferedInputStream() throws IOException {
+ this(DEFAULT_BUFFER_SIZE);
+ }
+
+ /** Connects to a given input stream, and fills the buffer accordingly. Can only be called on a non-disposed stream.
+ *
+ * @param underlying the underlying input stream to which we should connect.
+ * @throws IOException if some exception occurs while reading
+ */
+ public void connect(final InputStream underlying) throws IOException {
+ if (state == State.DISPOSED) throw new IllegalStateException("Connecting a disposed stream");
+ if (underlying == null) throw new IllegalArgumentException("Cannot connect to null");
+ this.underlying = underlying;
+
+ inspectable = 0;
+ eof = false;
+ int result;
+ while((result = underlying.read(buffer, inspectable, buffer.length - inspectable)) > 0) inspectable += result;
+ if (result < 0) eof = true;
+ readBytes = inspectable;
+ position = 0;
+
+ overflowInChannel.position(0);
+ overflowOutChannel.position(0);
+ state = State.CONNECTED;
+ lengthKnown = false;
+ }
+
+ /** Truncates the overflow file to a given size. Can only be called when this stream is ready.
+ *
+ * @param size the new size; the final size is guaranteed to be no more than this.
+ * @throws IOException if some exception occurs while truncating the file
+ */
+ public void truncate(final long size) throws FileNotFoundException, IOException {
+ if (state != State.READY) throw new IllegalStateException("Truncation is possible only for non-connected and non-disposed streams");
+ overflowOutChannel.truncate(size);
+ }
+
+ /** The number of bytes read so far from the underlying stream.
+ *
+ * @return the number of bytes read so far from the underlying stream.
+ */
+ public long readBytes() {
+ return readBytes;
+ }
+
+ /** Disposes this stream, deleting the overflow file and nulling the buffer. After this, the stream is unusable. */
+ public void dispose() throws IOException {
+ buffer = null;
+ overflowOut.close();
+ overflowIn.close();
+ overflowFile.delete();
+ state = State.DISPOSED;
+ }
+
+ protected void finalize() throws Throwable {
+ try {
+ if (state != State.DISPOSED) dispose();
+ }
+ finally {
+ super.finalize();
+ }
+ }
+
+ /** Makes this stream ready. Can only be called on a non-disposed stream. If the stream is ready, it does nothing. If the stream
+ * is connected, it closes the underlying stream, making this stream ready for a new {@link #connect(InputStream) connection} or to be
+ * {@link #dispose() disposed}.
+ *
+ */
+ public void close() throws IOException {
+ if (state == State.READY) return;
+ if (state == State.DISPOSED) throw new IllegalStateException("Cannot close a disposed stream");
+ underlying.close();
+ readBytes = position = 0;
+ state = State.READY;
+ }
+
+ /** Rewinds this stream. Can only be called on a connected stream. */
+ public void rewind() throws IOException {
+ if (state != State.CONNECTED) throw new IllegalStateException("Cannot rewind a non-connected (" + state + ") stream");
+ position = 0;
+ overflowInChannel.position(0);
+ }
+
+
+ @Override
+ public int available() throws IOException {
+ if (state != State.CONNECTED) throw new IllegalStateException();
+ long av = readBytes - position;
+ if (! eof) av += underlying.available();
+ return (int)Math.min(Integer.MAX_VALUE, av);
+ }
+
+ /** Reads at most <code>length</code> bytes from the underlying stream into the given buffer, starting
+ * at a given offset, and copies them to the overflow file. Updates the number of bytes read. Unlike the
+ * public methods, this method does not perform any state-consistency check.
+ *
+ * @param buffer the buffer into which the bytes are read.
+ * @param offset the offset in <code>buffer</code> at which the bytes are written.
+ * @param length the maximum number of bytes to be read.
+ * @return the number of bytes actually read.
+ * @throws IOException if some exception occurs while copying.
+ */
+ private int copy(byte[] buffer, int offset, int length) throws IOException {
+ LOGGER.debug("Copying " + length + " more bytes from the underlying stream");
+ if (eof) return 0;
+ int totallyRead = 0, read;
+ do {
+ read = underlying.read(buffer, offset + totallyRead, length - totallyRead);
+ if (read < 0) {
+ eof = true;
+ break;
+ }
+ totallyRead += read;
+ } while (length > totallyRead);
+ readBytes += totallyRead;
+ overflowOut.write(buffer, offset, totallyRead);
+ overflowOut.flush();
+ return totallyRead;
+ }
+
+ @Override
+ public int read(byte[] b, int offset, int length) throws IOException {
+ if (state != State.CONNECTED) throw new IllegalStateException("Cannot read from an unconnected stream");
+ if (b != null) ByteArrays.ensureOffsetLength(b, offset, length);
+ if (length == 0) return 0;
+ int copied = 0; // The overall number of bytes copied onto b
+ LOGGER.debug("Requested to read " + length);
+ if (position < inspectable) {
+ /* The first Math.min(inspectable-position,length) bytes should be taken from the buffer.
+ * inspectable - position = actual number of bytes available to be read */
+ copied = Math.min(inspectable - (int)position, length);
+ LOGGER.debug(" -> from memory buffer " + copied);
+ System.arraycopy(buffer, (int)position, b, offset, copied);
+ position += copied;
+ offset += copied;
+ length -= copied;
+ }
+ LOGGER.debug("After buffer, remaining to read " + length);
+ /* If the underlying file is shorter than the buffer, we stop here.
+ * Notice that if we copied no byte, we must return 0 or -1 depending on
+ * whether we could have returned something (were length positive) or not.
+ */
+ if (readBytes < buffer.length) {
+ LOGGER.debug("Underlying is shorter than buffer; returning " + (copied > 0? copied : (position < readBytes? 0 : -1)));
+ return copied > 0? copied : (position < readBytes? 0 : -1); // Underlying is shorter than buffer
+ }
+ /* If there is still some byte to be copied, we check whether some of
+ * them are available in the overflow file.
+ */
+ if (length > 0) {
+ // Some of them are already available in the overflow file: copy them
+ int toBeReadFromOverflow = (int)Math.min(readBytes - position, length); // compare as longs: the difference may exceed Integer.MAX_VALUE
+ int readFromOverflow = 0, c;
+ do {
+ c = overflowIn.read(b, offset + readFromOverflow, toBeReadFromOverflow - readFromOverflow);
+ if (c < 0) break;
+ readFromOverflow += c;
+ } while (toBeReadFromOverflow > readFromOverflow);
+ LOGGER.debug(" -> from overflow file " + toBeReadFromOverflow);
+ copied += toBeReadFromOverflow;
+ position += toBeReadFromOverflow;
+ offset += toBeReadFromOverflow;
+ length -= toBeReadFromOverflow;
+ }
+ LOGGER.debug("After file, remaining to read " + length);
+ /* If there is still some byte to be copied, we copy it from the
+ * underlying input stream.
+ */
+ if (length > 0) {
+ // Should read from the underlying stream
+ int c = copy(b, offset, length);
+ LOGGER.debug(" -> copied from underlying stream " + c);
+ copied += c;
+ position += c;
+ }
+ LOGGER.debug("Returning " + (copied > 0? copied : -1));
+ return copied > 0? copied : -1;
+ }
+
+ @Override
+ public int read(byte[] b) throws IOException {
+ return read(b, 0, b.length);
+ }
+
+ @Override
+ public long skip(final long n) throws IOException {
+ long skipped = 0;
+ for(int read; (read = read(b, 0, (int)Math.min(n - skipped, b.length))) > 0;) skipped += read;
+ return skipped;
+ }
+
+ @Override
+ public int read() throws IOException {
+ if (state != State.CONNECTED) throw new IllegalStateException("Cannot read from an unconnected stream");
+ if (position < buffer.length) // In memory
+ return position < readBytes? buffer[(int)(position++)] & 0xFF : -1;
+ else {
+ if (position >= readBytes) { // Not read yet: fetch one byte from the underlying stream into the overflow file
+ if (eof) return -1;
+ int read = underlying.read();
+ if (read < 0) {
+ eof = true;
+ return read;
+ }
+ readBytes++;
+ overflowOut.write(read);
+ overflowOut.flush();
+ }
+ position++;
+ return overflowIn.read();
+ }
+ }
+
+ /** Returns the current length of the overflow file.
+ *
+ * @return the length of the overflow file.
+ */
+ public long overflowLength() {
+ return overflowFile.length();
+ }
+
+ /** Reads the underlying input stream up to a given limit.
+ *
+ * @param limit the maximum number of bytes to be read, except for the memory buffer. More precisely, up to <code>Math.max(buffer.length,limit)</code>
+ * bytes are read (because the buffer is filled at connection).
+ */
+ public void fill(long limit) throws IOException {
+ if (state != State.CONNECTED) throw new IllegalStateException("Cannot read from an unconnected stream");
+ while (readBytes < limit && read(b, 0, (int)Math.min(limit - readBytes, b.length)) > 0);
+ }
+
+ /** Reads the underlying input stream fully.
+ * @see #fill(long)
+ */
+ public void fill() throws IOException {
+ fill(Long.MAX_VALUE);
+ }
+
+
+ /** Reads the underlying input stream fully and rewinds this stream.
+ *
+ * @see #fill()
+ * @see #rewind()
+ */
+ public void fillAndRewind() throws IOException {
+ fill();
+ rewind();
+ }
+
+ /** Returns the overall length of this input stream. To compute it, this method calls {@link #fill(long)}
+ * with argument {@link Long#MAX_VALUE} and then restores the current position.
+ *
+ * @throws RuntimeException wrapping an {@link IOException} if the call to {@link #fill(long)} does.
+ */
+ @Override
+ public long length() {
+ if (! lengthKnown) {
+ final long position = position();
+ try {
+ fill(Long.MAX_VALUE); // Read to the end: necessary to know the length.
+ lengthKnown = true;
+ rewind();
+ skip(position);
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ }
+ return readBytes;
+ }
+
+ @Override
+ public long position() {
+ return position;
+ }
+}
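The class comment describes how a read of <var>L</var> bytes at position <var>P</var> is served. Bytes come first from the memory buffer, then from the overflow file, and only then from the underlying stream. The hypothetical helper below, which is not part of the library, reproduces just that bookkeeping from `read(byte[], int, int)` as a pure function, so the three-way split can be checked on concrete numbers without touching any file. It assumes the underlying stream still has at least the requested number of bytes.

```java
/**
 * Hypothetical helper (not part of the library): how one read request is split between
 * the memory buffer, the overflow file and the underlying stream. All values are byte counts.
 */
final class ReadPlan {
    final long fromMemory, fromOverflow, fromUnderlying;

    private ReadPlan(long fromMemory, long fromOverflow, long fromUnderlying) {
        this.fromMemory = fromMemory;
        this.fromOverflow = fromOverflow;
        this.fromUnderlying = fromUnderlying;
    }

    /**
     * @param bufferLength K0, the capacity of the memory buffer
     * @param inspectable  bytes stored in the memory buffer at connection
     * @param readBytes    K, bytes read so far from the underlying stream
     * @param position     P, index of the next byte to return
     * @param length       L, number of bytes requested
     */
    static ReadPlan of(int bufferLength, int inspectable, long readBytes, long position, long length) {
        long fromMemory = 0, fromOverflow = 0;
        if (position < inspectable) {                 // first, the in-memory prefix
            fromMemory = Math.min(inspectable - position, length);
            position += fromMemory;
            length -= fromMemory;
        }
        if (readBytes < bufferLength) {               // underlying stream shorter than the buffer
            return new ReadPlan(fromMemory, 0, 0);
        }
        if (length > 0) {                             // then, bytes already copied to the overflow file
            fromOverflow = Math.min(readBytes - position, length);
            length -= fromOverflow;
        }
        // whatever is left must come fresh from the underlying stream (and be appended to the overflow file)
        return new ReadPlan(fromMemory, fromOverflow, length);
    }

    public static void main(String[] args) {
        ReadPlan a = of(8, 8, 12, 0, 10);
        ReadPlan b = of(8, 8, 12, 0, 20);
        System.out.println(a.fromMemory + " " + a.fromOverflow + " " + a.fromUnderlying); // 8 2 0
        System.out.println(b.fromMemory + " " + b.fromOverflow + " " + b.fromUnderlying); // 8 4 8
    }
}
```

With <var>K</var><sub>0</sub>=8 and <var>K</var>=12, a read of 10 bytes right after `rewind()` needs nothing new from the underlying stream, since <var>P</var>+<var>L</var>-1=9&lt;<var>K</var>. A read of 20 bytes needs <var>P</var>+<var>L</var>-<var>K</var>=8 fresh bytes, as the class comment states.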
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/MeasurableSequenceInputStream.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/MeasurableSequenceInputStream.java
new file mode 100644
index 0000000..c834555
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/MeasurableSequenceInputStream.java
@@ -0,0 +1,118 @@
+package it.unimi.dsi.law.warc.io;
+
+import java.io.IOException;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.io.MeasurableInputStream;
+
+// RELEASE-STATUS: DIST
+
+/**
+ * A {@link it.unimi.dsi.fastutil.io.MeasurableInputStream} version of a {@link java.io.SequenceInputStream}.
+ *
+ */
+public class MeasurableSequenceInputStream extends MeasurableInputStream {
+
+ /** The array of streams. */
+ private final MeasurableInputStream[] streams;
+
+ /** The current stream ({@code null} when no input streams are left). */
+ private MeasurableInputStream currentStream;
+
+ /** The index in {@link #streams} of the {@link #currentStream}. */
+ private int currentStreamIndex;
+
+ /** The overall length of the sequence. */
+ private final long length;
+
+ /** The position (number of bytes read so far). */
+ private long position;
+
+ /**
+ * Constructs a sequence from an array of input streams.
+ *
+ * @param streams the streams (some of which may be {@code null}).
+ * @throws IOException if the length of one of the streams cannot be determined.
+ */
+ public MeasurableSequenceInputStream(MeasurableInputStream... streams) throws IOException {
+ if (streams == null) throw new NullPointerException();
+ this.streams = streams;
+ long l = 0;
+ for (MeasurableInputStream is : streams) if (is != null) l += is.length();
+ length = l;
+ position = 0;
+ currentStreamIndex = -1;
+ nextStream();
+ }
+
+ /**
+ * Updates {@link #currentStream} and {@link #currentStreamIndex} to the next non-null
+ * input stream in {@link #streams}; streams are not closed here (see {@link #close()}).
+ *
+ * @return {@code true} if a next stream exists; otherwise {@link #currentStream} is set to {@code null}.
+ */
+ private boolean nextStream() {
+ do currentStreamIndex++; while (currentStreamIndex < streams.length && streams[currentStreamIndex] == null);
+ if (currentStreamIndex < streams.length) {
+ currentStream = streams[currentStreamIndex];
+ return true;
+ } else { currentStream = null; return false; }
+ }
+
+ public long length() {
+ return length;
+ }
+
+ public long position() {
+ return position;
+ }
+
+ public int read() throws IOException {
+ if (currentStream == null) return -1;
+ int b;
+ do {
+ b = currentStream.read();
+ if (b != -1) position += 1;
+ } while (b == -1 && nextStream());
+ return b;
+ }
+
+ @Override
+ public int read(byte[] buf, int offset, int len) throws IOException {
+ if (buf == null) throw new NullPointerException();
+ if ((offset < 0) || (offset > buf.length) || (len < 0) ||
+ ((offset + len) > buf.length) || ((offset + len) < 0))
+ throw new IndexOutOfBoundsException();
+ if (currentStream == null) return -1;
+ if (len == 0) return 0;
+ int r;
+ do {
+ r = currentStream.read(buf, offset, len);
+ if (r != -1) position += r;
+ } while (r == -1 && nextStream());
+ return r;
+ }
+
+ @Override
+ public void close() throws IOException {
+ for (MeasurableInputStream i : streams) if (i != null) i.close();
+ }
+
+}
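For readers without fastutil at hand, the idea behind MeasurableSequenceInputStream can be sketched with plain java.io. This is a minimal sketch, not the LAW implementation: the class name CountingSequence is hypothetical, and since a plain InputStream does not expose its length, the lengths are passed explicitly (an assumption of this sketch). Like the original, it skips null entries, advances to the next stream on end of file, and counts the bytes returned so far.

```java
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical sketch (not part of LAW): concatenates streams of known length,
 * skipping null entries, and counts the bytes read so far.
 */
class CountingSequence extends InputStream {
    private final InputStream[] streams;
    private final long length; // sum of the declared lengths
    private int index;         // index of the stream currently being read
    private long position;     // bytes returned so far

    /** The caller supplies the lengths, since a plain InputStream does not expose one. */
    CountingSequence(InputStream[] streams, long[] lengths) {
        if (streams.length != lengths.length) throw new IllegalArgumentException("one length per stream");
        this.streams = streams;
        long total = 0;
        for (long l : lengths) total += l;
        this.length = total;
    }

    long length() { return length; }

    long position() { return position; }

    @Override
    public int read() throws IOException {
        // Move past exhausted (or null) streams until a byte is found or none are left.
        for (; index < streams.length; index++) {
            if (streams[index] == null) continue;
            int b = streams[index].read();
            if (b != -1) {
                position++;
                return b;
            }
        }
        return -1;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) return 0;
        // A single call never spans two streams, as in java.io.SequenceInputStream.
        for (; index < streams.length; index++) {
            if (streams[index] == null) continue;
            int r = streams[index].read(buf, off, len);
            if (r != -1) {
                position += r;
                return r;
            }
        }
        return -1;
    }

    @Override
    public void close() throws IOException {
        for (InputStream s : streams) if (s != null) s.close();
    }
}
```

Reading stops at stream boundaries, so a bulk read may return fewer bytes than requested even when later streams still have data; callers should loop until -1.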
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/WarcFilteredIterator.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/WarcFilteredIterator.java
new file mode 100644
index 0000000..d6146fb
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/WarcFilteredIterator.java
@@ -0,0 +1,109 @@
+package it.unimi.dsi.law.warc.io;
+
+import java.io.IOException;
+import java.util.Iterator;
+import java.util.NoSuchElementException;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.io.FastBufferedInputStream;
+import it.unimi.dsi.law.warc.filters.Filter;
+import it.unimi.dsi.law.warc.io.WarcRecord.FormatException;
+import it.unimi.dsi.logging.ProgressLogger;
+
+// RELEASE-STATUS: DIST
+
+/** An iterator over the records of a WARC stream that returns only the records satisfying a given filter. */
+
+public class WarcFilteredIterator implements Iterator<WarcRecord> {
+
+ private final FastBufferedInputStream in;
+ private final WarcRecord record;
+ private final Filter<WarcRecord> filter;
+ private final ProgressLogger pl;
+ private boolean eofIsReached;
+ private boolean cached;
+
+ /**
+ * Builds the filtered iterator.
+ *
+ * <p> This constructor takes a {@link WarcRecord} (or a {@link GZWarcRecord} if the stream
+ * contains compressed records) that will be reused (and thus modified) by calls to
+ * {@link #hasNext()} and {@link #next()}.
+ *
+ * @param in the input stream.
+ * @param record the record used for reading.
+ * @param filter the filter.
+ * @param pl a (pre-initialized) {@link ProgressLogger} that will be updated during reads, or {@code null}.
+ */
+ public WarcFilteredIterator(final FastBufferedInputStream in, final WarcRecord record, final Filter<WarcRecord> filter, final ProgressLogger pl) {
+ this.in = in;
+ this.record = record;
+ this.filter = filter;
+ this.pl = pl;
+ eofIsReached = false;
+ cached = false;
+ }
+
+ /**
+ * Builds the filtered iterator.
+ *
+ * <p> This constructor takes a {@link WarcRecord} (or a {@link GZWarcRecord} if the stream
+ * contains compressed records) that will be reused (and thus modified) by calls to
+ * {@link #hasNext()} and {@link #next()}.
+ *
+ * @param in the input stream.
+ * @param record the record used for reading.
+ * @param filter the filter.
+ */
+ public WarcFilteredIterator(final FastBufferedInputStream in, final WarcRecord record, final Filter<WarcRecord> filter) {
+ this(in, record, filter, null);
+ }
+
+ public boolean hasNext() {
+ if (eofIsReached) return false;
+ if (cached) return true;
+ try {
+ long read;
+ do {
+ read = record.read(in);
+ if (pl != null && read != -1) pl.update();
+ }
+ while (read != -1 && ! filter.apply(record));
+ eofIsReached = read == -1;
+ cached = ! eofIsReached;
+ } catch (IOException e) {
+ throw new RuntimeException("IOException while reading next record", e);
+ } catch (FormatException e) {
+ throw new RuntimeException("FormatException while reading next record", e);
+ }
+ return ! eofIsReached;
+ }
+
+ public WarcRecord next() {
+ if (! hasNext()) throw new NoSuchElementException();
+ cached = false;
+ return record;
+ }
+
+ public void remove() {
+ throw new UnsupportedOperationException();
+ }
+
+}
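WarcFilteredIterator uses a look-ahead pattern: hasNext() reads records until one passes the filter and caches it, and next() hands out the cached record. The same pattern over an arbitrary Iterator can be sketched as follows. The class name FilteredIterator is hypothetical, and java.util.function.Predicate stands in for LAW's Filter; unlike the original, this sketch returns distinct elements rather than one reused (and overwritten) record, and it has no progress logger.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/** Hypothetical generic sketch of the look-ahead filtering used by WarcFilteredIterator. */
class FilteredIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Predicate<? super T> filter;
    private T next;         // element found by hasNext() and not yet returned
    private boolean cached; // true when next holds such an element

    FilteredIterator(Iterator<T> source, Predicate<? super T> filter) {
        this.source = source;
        this.filter = filter;
    }

    @Override
    public boolean hasNext() {
        if (cached) return true; // repeated calls must not consume more input
        // Pull from the source until an element passes the filter or the source ends.
        while (source.hasNext()) {
            T candidate = source.next();
            if (filter.test(candidate)) {
                next = candidate;
                cached = true;
                return true;
            }
        }
        return false;
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        cached = false;
        T result = next;
        next = null; // drop the reference to the returned element
        return result;
    }
}
```

The cached flag is what makes hasNext() idempotent: without it, two consecutive calls would silently skip a matching element.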
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/WarcRecord.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/WarcRecord.java
new file mode 100644
index 0000000..88ac218
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/WarcRecord.java
@@ -0,0 +1,573 @@
+
+package it.unimi.dsi.law.warc.io;
+
+import java.io.IOException;
+import java.io.OutputStream;
+import java.net.URI;
+import java.nio.charset.Charset;
+import java.text.ParseException;
+import java.text.SimpleDateFormat;
+import java.util.Date;
+import java.util.EnumSet;
+import java.util.Map;
+import java.util.UUID;
+import java.util.zip.CRC32;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.bytes.ByteArrays;
+import it.unimi.dsi.fastutil.io.FastBufferedInputStream;
+import it.unimi.dsi.fastutil.io.FastBufferedInputStream.LineTerminator;
+import it.unimi.dsi.fastutil.io.FastByteArrayOutputStream;
+import it.unimi.dsi.fastutil.io.MeasurableInputStream;
+import it.unimi.dsi.fastutil.objects.Object2ObjectMap;
+import it.unimi.dsi.fastutil.objects.Object2ObjectOpenCustomHashMap;
+import it.unimi.dsi.lang.MutableString;
+import it.unimi.dsi.law.bubing.util.BURL;
+import it.unimi.dsi.law.warc.util.Util;
+
+// RELEASE-STATUS: DIST
+
+/**
+ * A class to read/write WARC/0.9 records (for format details, please see the <a
+ * href='http://archive-access.sourceforge.net/warc/warc_file_format-0.9.html'>WARC</a> format specifications).
+ *
+ * <p> To write a record, set {@link #header} and {@link #block} appropriately and then call
+ * {@link #write(OutputStream)}. After such a call, the {@link WarcRecord.Header#dataLength} field
+ * of the {@link #header} will be set to the length of the written record.
+ *
+ * <p> To perform a sequence of consecutive reads/skips, call {@link #read(FastBufferedInputStream)}
+ * or {@link #skip(FastBufferedInputStream)}. After a read, the {@link #block} may (but need not)
+ * be read to obtain the record data. The {@link WarcRecord.Header#contentType contentType}
+ * field of the {@link #header} can be used to determine how to further process the content of
+ * {@link #block}.
+ *
+ * <p> As an implementation note: skipping just returns the value of the <code>data-length</code>
+ * field of the skipped record. Reading, on the other hand, parses the <code>header</code>, sets
+ * all the {@link #header} fields accordingly, and sets {@link #block} so that it refers to
+ * the <code>block</code> part of the record. Since {@link #block} is just a "view"
+ * over the underlying stream, its content and position are not guaranteed to remain valid after
+ * a subsequent read/skip on the same stream.
+ *
+ * <p>This object can be reused for non-consecutive writes on different streams. On the other hand,
+ * to reuse this object for non-consecutive reads/skips, {@link #resetRead()} must be
+ * called whenever a read/skip does not follow a read/skip on the same stream.
+ *
+ * <p> This class uses internal buffering, hence it is not thread safe.
+ */
+public class WarcRecord {
+// private final static Logger LOGGER = LoggerFactory.getLogger(WarcRecord.class);
+ public static final boolean DEBUG = false;
+ public static final boolean ASSERTS = true;
+
+ /** Whether to skip bytes with {@link FastBufferedInputStream#position(long)} instead of {@link java.io.InputStream#skip(long)}; it is here for profiling purposes. */
+ public static final boolean USE_POSITION_INSTEAD_OF_SKIP = false;
+
+ /** Content types. */
+ public static enum ContentType {
+ HTTP("message/http"),
+ HTTPS("message/https");
+ public final byte[] byteRepresentation;
+ ContentType(String name) {
+ byteRepresentation = Util.getASCIIBytes(name);
+ }
+ }
+ public static final Object2ObjectMap<byte[],ContentType> BYTE_REPRESENTATION_TO_CONTENT_TYPE =
+ new Object2ObjectOpenCustomHashMap<byte[],ContentType>(ByteArrays.HASH_STRATEGY);
+
+ /** Record types. */
+ public static enum RecordType {
+ WARCINFO("warcinfo"),
+ RESPONSE("response"),
+ RESOURCE("resource"),
+ REQUEST("request"),
+ METADATA("metadata"),
+ REVISIT("revisit"),
+ CONVERSION("conversion"),
+ CONTINUATION("continuation");
+ public final byte[] byteRepresentation;
+ RecordType(String name) {
+ byteRepresentation = Util.getASCIIBytes(name);
+ }
+ }
+ public static final Object2ObjectMap<byte[],RecordType> BYTE_REPRESENTATION_TO_RECORD_TYPE =
+ new Object2ObjectOpenCustomHashMap<byte[],RecordType>(ByteArrays.HASH_STRATEGY);
+
+ static {
+ for (ContentType ct : ContentType.values())
+ BYTE_REPRESENTATION_TO_CONTENT_TYPE.put(ct.byteRepresentation, ct);
+ for (RecordType rt : RecordType.values())
+ BYTE_REPRESENTATION_TO_RECORD_TYPE.put(rt.byteRepresentation, rt);
+ }
+
+ /** The fields of a warc <code>header</code>. */
+ public static class Header {
+
+ /* These are all public to avoid getter/setters. */
+
+ /** The warc <code>data-length</code>. */
+ public long dataLength;
+
+ /** The warc <code>record-type</code>. */
+ public RecordType recordType;
+
+ /** The warc <code>subject-uri</code>. */
+ public URI subjectUri;
+
+ /** The warc <code>creation-date</code>. */
+ public Date creationDate;
+
+ /** The warc <code>content-type</code>. */
+ public ContentType contentType;
+
+ /** The warc <code>record-id</code>. */
+ public UUID recordId;
+
+ /** The warc <code>anvl-field</code>s. */
+ public final Map<String,String> anvlFields = new Object2ObjectOpenCustomHashMap<String,String>(Util.CASE_INSENSITIVE_STRING_HASH_STRATEGY);
+
+ /**
+ * Copies the fields of another header into this header.
+ *
+ * @param header the header to copy from.
+ */
+ public void copy(final Header header) {
+ dataLength = header.dataLength;
+ recordType = header.recordType;
+ subjectUri = header.subjectUri;
+ creationDate = header.creationDate;
+ contentType = header.contentType;
+ recordId = header.recordId;
+ anvlFields.clear();
+ anvlFields.putAll(header.anvlFields);
+ }
+
+ @Override
+ public int hashCode() {
+ return recordId.hashCode();
+ }
+
+ @Override
+ public boolean equals(Object o) {
+ if (! (o instanceof WarcRecord.Header)) return false;
+ if (! recordId.equals(((WarcRecord.Header)o).recordId)) return false;
+ return true;
+ }
+
+ @Override
+ public String toString() {
+ MutableString s = new MutableString();
+ s.append("dataLength: ");
+ s.append(dataLength);
+ s.append(", recordType: ");
+ s.append(recordType);
+ s.append(", subjectUri: ");
+ s.append(subjectUri);
+ s.append(", creationDate: ");
+ s.append(creationDate);
+ s.append(", contentType: ");
+ s.append(contentType);
+ s.append(", recordId: ");
+ s.append(recordId);
+ s.append(", anvlFields: ");
+ s.append(anvlFields);
+ return s.toString();
+ }
+
+ }
+
+ /** An exception to denote parsing errors during reads. */
+ public static class FormatException extends Exception {
+ private static final long serialVersionUID = -1L;
+ public FormatException(String s) {
+ super(s);
+ }
+ }
+
+ /** A minimalistic parser used for header parsing. */
+ private static class MinimalisticParser {
+ @SuppressWarnings("hiding")
+ private static final boolean DEBUG = false;
+ private int start, end, length;
+ private byte[] buf;
+ @SuppressWarnings("unused")
+ public void setInput(byte[] buf) {
+ setInput(buf, 0, buf.length);
+ }
+ public void setInput(byte[] buf, int offset, int length) {
+ this.buf = buf;
+ start = offset;
+ end = start;
+ this.length = offset + length; // absolute end of the input: the scanning methods compare indices in buf against it
+ }
+ public void positionAtNextWord() {
+ start = end;
+ while ((start < length) && Character.isWhitespace(buf[start])) start++;
+ end = start;
+ while ((end < length) && ! Character.isWhitespace(buf[end])) end++;
+ if (DEBUG) System.err.println("Next word '" + Util.getString(buf, start, end - start) + "'");
+ }
+ public boolean startsWith(byte[] m) {
+ int i = 0, ml = m.length;
+ while (i < ml && start + i < length && buf[start + i] == m[i]) i++;
+ return i == ml;
+ }
+ public int asInt() {
+ int i = start, val = 0;
+ while (i < end) {
+ val = val * 10 + (buf[i] - (byte)'0');
+ i++;
+ }
+ if (DEBUG) System.err.println("Returned int " + val);
+ return val;
+ }
+ public String asAsciiSting() {
+ String ret = Util.getString(buf, start, end - start);
+ if (DEBUG) System.err.println("Returned string '" + ret + "'");
+ return ret;
+ }
+ public byte[] asByteArray() {
+ byte[] b = new byte[end - start];
+ System.arraycopy(buf, start, b, 0, end - start);
+ return b;
+ }
+ }
+
+ /** The default size of the internal buffer used for headers read/write. */
+ public static final int DEFAULT_BUFFER_SIZE = 4 * 1024;
+
+ /** The {@link java.nio.charset.Charset} used to encode <code>anvl-field</code>s. */
+ private static final Charset ANVL_CHARSET = Charset.forName("UTF-8");
+
+ /** Some constant strings in their byte equivalent. */
+ public static final byte[] WARC_ID = Util.getASCIIBytes("warc/0.9"),
+ UUID_FIELD_NAME = Util.getASCIIBytes("uuid"),
+ CRLF = new byte[] { 0x0D, 0x0A };
+
+ /** The line terminator used when reading the warc header with {@link FastBufferedInputStream#readLine(byte[], java.util.EnumSet)}. */
+ final EnumSet<LineTerminator> LINE_TERMINATOR = EnumSet.of(FastBufferedInputStream.LineTerminator.CR_LF);
+
+ /** A formatter for the <code>creation-date</code> field of the warc <code>header-line</code> (one per instance, as {@link SimpleDateFormat} is not thread safe). */
+ private final SimpleDateFormat DATE_FORMAT = new SimpleDateFormat("yyyyMMddHHmmss");
+
+ /** The internal buffer used for headers read/write. */
+ private final byte[] buffer;
+
+ /** The instance of header parser. */
+ private final MinimalisticParser minimalisticParser = new MinimalisticParser();
+
+ /** The position of the first byte of the last read <code>header</code> (set by {@link #readHeaderLine(FastBufferedInputStream)}). */
+ private long positionOfLastHeader;
+
+ /** The <code>data-length</code> found in the last read <code>header</code> (set by {@link #readHeaderLine(FastBufferedInputStream)}). */
+ private long dataLengthInLastHeader;
+
+ /** The checksum updated by {@link #write(OutputStream)} with all the bytes of the record, used by {@link GZWarcRecord}. If {@code null}, no CRC is computed. */
+ protected CRC32 crc = null;
+
+ /** The warc <code>header</code>. */
+ public final Header header = new Header();
+
+ /** The warc <code>block</code>. */
+ public MeasurableInputStream block;
+
+ /** Builds a warc record.
+ *
+ * @param buffer the buffer used for header read/write buffering.
+ */
+ public WarcRecord(byte[] buffer) {
+ this.buffer = buffer;
+ positionOfLastHeader = -1;
+ }
+
+ /**
+ * Builds a warc record.
+ *
+ * <p>It allocates an internal buffer of {@link #DEFAULT_BUFFER_SIZE} bytes to buffer
+ * header reads/writes.
+ */
+ public WarcRecord() {
+ this(new byte[DEFAULT_BUFFER_SIZE]);
+ }
+
+ /**
+ * Copies the fields of another warc record into this record (the {@link #block} is shared, not copied).
+ *
+ * @param record the record to copy from.
+ */
+ public void copy(final WarcRecord record) {
+ header.copy(record.header);
+ block = record.block;
+ }
+
+ /**
+ * Allows this object to be reused for non-consecutive reads/skips (e.g., on a different stream).
+ */
+ public void resetRead() {
+ positionOfLastHeader = -1;
+ }
+
+ /**
+ * A method to skip a record from an {@link java.io.InputStream}.
+ *
+ * @param bin the {@link FastBufferedInputStream} to read from.
+ * @return the value of the <code>data-length</code>, or -1 if eof has been reached.
+ */
+ public long skip(FastBufferedInputStream bin) throws IOException, FormatException {
+ if (readHeaderLine(bin) == -1) return -1;
+ final long newPosition = positionOfLastHeader + dataLengthInLastHeader;
+ if (ASSERTS) assert newPosition >= bin.position();
+ if (USE_POSITION_INSTEAD_OF_SKIP)
+ bin.position(newPosition);
+ else
+ bin.skip(newPosition - bin.position());
+ return dataLengthInLastHeader;
+ }
+
+ /**
+ * A method to read a record from an {@link java.io.InputStream}.
+ *
+ * @param bin the {@link FastBufferedInputStream} to read from.
+ * @return the value of the <code>data-length</code>, or -1 if eof has been reached.
+ */
+ public long read(FastBufferedInputStream bin) throws IOException, FormatException {
+
+ // read the header-line
+
+ if (readHeaderLine(bin) == -1) return -1;
+ header.dataLength = dataLengthInLastHeader;
+
+ // parse the rest of it
+
+ minimalisticParser.positionAtNextWord();
+ header.recordType = BYTE_REPRESENTATION_TO_RECORD_TYPE.get(minimalisticParser.asByteArray());
+
+ minimalisticParser.positionAtNextWord();
+ header.subjectUri = BURL.parse(minimalisticParser.asAsciiSting());
+
+ minimalisticParser.positionAtNextWord();
+ try {
+ header.creationDate = DATE_FORMAT.parse(minimalisticParser.asAsciiSting());
+ }
+ catch (ParseException e) {
+ throw new FormatException("Error parsing creation-date: " + e.getMessage());
+ }
+
+ minimalisticParser.positionAtNextWord();
+ header.contentType = BYTE_REPRESENTATION_TO_CONTENT_TYPE.get(minimalisticParser.asByteArray());
+
+ minimalisticParser.positionAtNextWord();
+ String recordIdAsString = minimalisticParser.asAsciiSting();
+ if (! minimalisticParser.startsWith(UUID_FIELD_NAME)) throw new FormatException("Unknown type of record-id: " + recordIdAsString);
+ try {
+ header.recordId = UUID.fromString(recordIdAsString.substring(UUID_FIELD_NAME.length + 1));
+ }
+ catch (IllegalArgumentException e) {
+ throw new FormatException("Error parsing record-id '" + recordIdAsString + "'; " + e.getMessage());
+ }
+
+ // done with the header, now anvl-fields and block
+ header.anvlFields.clear();
+ Util.readANVLHeaders(bin, header.anvlFields, ANVL_CHARSET);
+ final long blockLength = header.dataLength - (bin.position() - positionOfLastHeader) - 4; // minus the two ending CRLFs (4 bytes)
+ if (ASSERTS) assert blockLength >= 0;
+ block = new BoundedCountingInputStream(bin, blockLength);
+
+ return dataLengthInLastHeader;
+ }
+
+ /**
+ * A method to write this record to an {@link java.io.OutputStream}.
+ *
+ * @param out where to write.
+ */
+ public void write(OutputStream out) throws IOException {
+
+ // prebuffer the header
+
+ FastByteArrayOutputStream hbuf = new FastByteArrayOutputStream(buffer);
+ header.dataLength = prebufferHeader(hbuf);
+
+ // header-line start (warc-id and size)
+
+ byte[] dataLengthAsBytes = Util.getASCIIBytes(Long.toString(header.dataLength));
+
+ out.write(WARC_ID);
+ out.write(' ');
+ out.write(dataLengthAsBytes);
+ out.write(' ');
+
+ // prebuffered stuff (already CRLF terminated)
+
+ out.write(hbuf.array, 0, hbuf.length);
+
+ // update crc
+
+ if (crc != null) {
+ crc.update(WARC_ID);
+ crc.update(' ');
+ crc.update(dataLengthAsBytes);
+ crc.update(' ');
+ crc.update(hbuf.array, 0, hbuf.length);
+ }
+ hbuf = null;
+
+ // block
+
+ int read;
+ long remaining = block.length();
+
+ do {
+ read = block.read(buffer, 0, (int)Math.min(remaining, buffer.length));
+ if (read == -1) break;
+ out.write(buffer, 0, read);
+ if (crc != null) crc.update(buffer, 0, read);
+ remaining -= read;
+ } while (remaining > 0);
+
+ assert remaining == 0 : remaining;
+
+ // two ending CRLFs
+
+ out.write(CRLF);
+ out.write(CRLF);
+
+ if (crc != null) {
+ crc.update(CRLF);
+ crc.update(CRLF);
+ }
+
+ }
+
+ @Override
+ public String toString() {
+ return header.toString();
+ }
+
+ /**
+ * Prebuffers the warc <code>header</code>.
+ *
+ * <p> This method writes to the given stream the header part of the warc record, from the
+ * <code>record-type</code> field (included) up to the last CRLF before the <code>block</code>
+ * (included), and computes and returns the length of the overall record.
+ *
+ * @param hbuf the stream where to prebuffer
+ * @return the length of the record.
+ */
+ private long prebufferHeader(FastByteArrayOutputStream hbuf) throws IOException {
+
+ long dataLength;
+
+ // warc header-line (starting from record-type included)
+
+ hbuf.write(header.recordType.byteRepresentation);
+ hbuf.write(' ');
+ hbuf.write(BURL.toByteArray(header.subjectUri));
+ hbuf.write(' ');
+ hbuf.write(Util.getASCIIBytes(DATE_FORMAT.format(header.creationDate)));
+ hbuf.write(' ');
+ hbuf.write(header.contentType.byteRepresentation);
+ hbuf.write(' ');
+ hbuf.write(UUID_FIELD_NAME);
+ hbuf.write(':');
+ hbuf.write(Util.getASCIIBytes(header.recordId.toString()));
+ hbuf.write(CRLF);
+
+ // warc anvl-field(s) (possibly empty, but to be CRLF terminated in any case)
+
+ Util.writeANVLHeaders(hbuf, header.anvlFields, ANVL_CHARSET);
+ hbuf.write(CRLF);
+
+ /* compute overall size: warc-id + prebuffered part + block + separators (the data-length digits are added below) */
+
+ // (6 = 2 blanks around the data-length, plus 2 ending CRLFs of 2 bytes each)
+ dataLength = WARC_ID.length + hbuf.length + block.length() + 6;
+
+ // the data-length counts its own d(x) decimal digits: d(x + d(x)) >= d(x), and if >, then d(x + d(x) + 1) = d(x) + 1
+ int digits = Util.digits(dataLength);
+ dataLength += Util.digits(dataLength + digits) == digits ? digits : digits + 1;
+
+ return dataLength;
+ }
+
+ /**
+ * Reads the <code>header-line</code> of the next warc record.
+ *
+ * <p> This method first positions the stream at the start of the next record (more precisely,
+ * if {@link #positionOfLastHeader} is not -1, it assumes that it was previously called on this
+ * same stream, so that {@link #positionOfLastHeader} and {@link #dataLengthInLastHeader} are
+ * meaningful, and skips to the end of the previous record).
+ *
+ * <p> It then reads the <code>header-line</code> (skipping any extra CRLFs before
+ * the <code>warc-id</code>), setting {@link #positionOfLastHeader} to the position in the
+ * stream of the first byte of the <code>header-line</code>.
+ *
+ * <p> Finally, it sets this line as the input of {@link #minimalisticParser} and uses the parser to
+ * obtain the value of the <code>data-length</code> field, which it stores in
+ * {@link #dataLengthInLastHeader} and returns.
+ *
+ * @param bin the stream where to read from.
+ * @return the value of the <code>data-length</code> field, or -1 if eof has been reached.
+ */
+ private long readHeaderLine(FastBufferedInputStream bin) throws IOException, FormatException {
+
+ // possibly consume the block part of a previous read
+
+ if (positionOfLastHeader != -1) {
+ long newPosition = positionOfLastHeader + dataLengthInLastHeader;
+ if (ASSERTS) assert newPosition >= bin.position();
+ if (USE_POSITION_INSTEAD_OF_SKIP)
+ bin.position(newPosition);
+ else
+ bin.skip(newPosition - bin.position());
+ }
+
+ // read the header line
+
+ int length = 0, read;
+ byte[] growing = buffer; // replaced by a larger array if the line does not fit
+ do {
+ positionOfLastHeader = bin.position();
+
+ while ((read = bin.readLine(growing, length, growing.length - length, LINE_TERMINATOR)) == growing.length - length) {
+ length += read;
+ growing = ByteArrays.grow(growing, growing.length + 1);
+ }
+ if (read == -1) return -1;
+ length += read;
+
+ if (DEBUG) System.err.println("length: " + length + "\nline: " + Util.getString(growing, 0, length));
+ } while (length == 0);
+
+ // set the parser to the read line
+
+ minimalisticParser.setInput(growing, 0, length);
+
+ // parse warc-id, parse data-length and assign it to dataLength
+
+ minimalisticParser.positionAtNextWord();
+ if (! minimalisticParser.startsWith(WARC_ID)) throw new FormatException("Missing or incorrect warc-id.");
+
+ minimalisticParser.positionAtNextWord();
+ dataLengthInLastHeader = minimalisticParser.asInt();
+
+ return dataLengthInLastHeader;
+ }
+
+}
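The subtlest arithmetic in WarcRecord is in prebufferHeader: the data-length field counts the whole record, including its own decimal digits, so the length must be a fixed point t = x + d(t), where x is the length of everything else. A minimal sketch of that computation follows; the class DataLength is hypothetical, and its digits method is an assumed stand-in for Util.digits, whose implementation is not shown here.

```java
/**
 * Hypothetical helper (not part of LAW) mirroring the data-length
 * arithmetic of WarcRecord.prebufferHeader.
 */
final class DataLength {
    private DataLength() {}

    /** Number of decimal digits of a non-negative value (1 for 0); stands in for Util.digits. */
    static int digits(long x) {
        int d = 1;
        while (x >= 10) {
            x /= 10;
            d++;
        }
        return d;
    }

    /**
     * Given the length x of everything but the data-length field, returns the
     * total record length t, which satisfies t == x + digits(t).
     */
    static long withOwnDigits(long x) {
        int d = digits(x);
        // If adding d digits keeps the digit count at d, t = x + d is a fixed point;
        // otherwise the count becomes d + 1, and t = x + d + 1 has exactly d + 1 digits.
        return x + (digits(x + d) == d ? d : d + 1);
    }
}
```

For instance, 95 bytes of other content give a data-length of 97, while 98 bytes give 101, because adding two digits would reach 100, which has three.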
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/examples/SequentialHttpResponseRead.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/examples/SequentialHttpResponseRead.java
new file mode 100644
index 0000000..d1976e8
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/examples/SequentialHttpResponseRead.java
@@ -0,0 +1,87 @@
+package it.unimi.dsi.law.warc.io.examples;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.InputStreamReader;
+import java.io.Reader;
+import java.nio.charset.Charset;
+import java.nio.charset.IllegalCharsetNameException;
+import java.nio.charset.UnsupportedCharsetException;
+
+import com.google.common.base.Charsets;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.io.FastBufferedInputStream;
+import it.unimi.dsi.law.warc.io.GZWarcRecord;
+import it.unimi.dsi.law.warc.io.WarcRecord;
+import it.unimi.dsi.law.warc.util.HttpResponse;
+import it.unimi.dsi.law.warc.util.WarcHttpResponse;
+
+// RELEASE-STATUS: DIST
+
+public class SequentialHttpResponseRead {
+
+ final static int IO_BUFFER_SIZE = 64 * 1024;
+
+ public static void main(String arg[]) throws Exception {
+
+ final String warcFile = "test";
+ final boolean isGZipped = true;
+
+ final WarcRecord record = isGZipped ? new GZWarcRecord() : new WarcRecord();
+ final WarcHttpResponse response = new WarcHttpResponse();
+
+ final FastBufferedInputStream in = new FastBufferedInputStream(
+ new FileInputStream(new File(warcFile + ".warc" + (isGZipped ? ".gz" : ""))), IO_BUFFER_SIZE);
+
+ for (;;) {
+
+ if (record.read(in) == -1) break;
+ if (isGZipped) System.out.println("GZip header:\n" + ((GZWarcRecord)record).gzheader);
+ System.out.println("WARC header:\n" + record.header);
+
+ if (! response.fromWarcRecord(record)) continue;
+
+ System.out.println("HTTP status line:\n" + response.statusLine());
+ System.out.println("HTTP headers:\n" + response.headers());
+
+ System.out.println("First 100 characters of content:");
+
+ Charset charset = Charsets.ISO_8859_1;
+ String charsetName = record.header.anvlFields.get(HttpResponse.GUESSED_CHARSET_HEADER);
+ if (charsetName != null) try {
+ charset = Charset.forName(charsetName);
+ } catch (IllegalCharsetNameException e) {
+ System.err.println("Illegal charset, using " + Charsets.ISO_8859_1);
+ } catch (UnsupportedCharsetException e) {
+ System.err.println("Unsupported charset, using " + Charsets.ISO_8859_1);
+ }
+
+ final Reader reader = new InputStreamReader(response.contentAsStream(), charset);
+ int n = 100, r;
+ while (n-- > 0 && (r = reader.read()) != -1)
+ System.out.print((char)r);
+
+ System.out.println("\n");
+
+ }
+
+ }
+}
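The charset selection in the loop above can be factored into a small helper. This is a sketch under assumptions: the name CharsetFallback is hypothetical, it merges the two error messages into one, and it uses the JDK's StandardCharsets.ISO_8859_1 in place of Guava's Charsets.ISO_8859_1 (both denote the same charset).

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

/** Hypothetical helper factoring out the charset fallback of the example above. */
final class CharsetFallback {
    private CharsetFallback() {}

    /** Returns the named charset, or ISO-8859-1 if the name is null, illegal or unsupported. */
    static Charset resolve(String charsetName) {
        if (charsetName == null) return StandardCharsets.ISO_8859_1; // no guessed charset header
        try {
            return Charset.forName(charsetName);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            System.err.println("Cannot use charset '" + charsetName + "', using " + StandardCharsets.ISO_8859_1);
            return StandardCharsets.ISO_8859_1;
        }
    }
}
```

With it, the body of the loop reduces to a single call on the value of the guessed-charset ANVL field.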
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/examples/SequentialHttpResponseWrite.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/examples/SequentialHttpResponseWrite.java
new file mode 100644
index 0000000..2c117a8
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/examples/SequentialHttpResponseWrite.java
@@ -0,0 +1,67 @@
+package it.unimi.dsi.law.warc.io.examples;
+
+import java.io.File;
+import java.io.FileOutputStream;
+
+import org.apache.http.ProtocolVersion;
+import org.apache.http.message.BasicStatusLine;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.io.FastBufferedOutputStream;
+import it.unimi.dsi.fastutil.io.FastByteArrayInputStream;
+import it.unimi.dsi.law.bubing.util.BURL;
+import it.unimi.dsi.law.warc.io.GZWarcRecord;
+import it.unimi.dsi.law.warc.io.WarcRecord;
+import it.unimi.dsi.law.warc.util.MutableHttpResponse;
+import it.unimi.dsi.law.warc.util.Util;
+
+// RELEASE-STATUS: DIST
+
+public class SequentialHttpResponseWrite {
+
+ final static int IO_BUFFER_SIZE = 64 * 1024;
+
+ public static void main(String arg[]) throws Exception {
+
+ final String warcFile = "test";
+ final boolean isGZipped = true;
+
+ final WarcRecord record = isGZipped ? new GZWarcRecord() : new WarcRecord();
+ final MutableHttpResponse response = new MutableHttpResponse();
+
+ final FastBufferedOutputStream out = new FastBufferedOutputStream(
+ new FileOutputStream(new File(warcFile + ".warc" + (isGZipped ? ".gz" : ""))), IO_BUFFER_SIZE);
+
+ for (int i = 0; i < 10; i++) {
+
+ response.statusLine(new BasicStatusLine(new ProtocolVersion("HTTP", 1, 0), 200, "OK"));
+ response.uri(BURL.parse("http://localhost/" + i));
+ response.headers(null);
+ response.contentAsStream(new FastByteArrayInputStream(Util.getASCIIBytes("<html><head><title>Doc " + i + "</title></head><body><p>This is document nr. " + i + "</p></body></html>")));
+
+ response.toWarcRecord(record);
+ record.write(out);
+
+ }
+
+ out.close();
+
+ }
+}
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/examples/SequentialWarcRecordRead.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/examples/SequentialWarcRecordRead.java
new file mode 100644
index 0000000..c7de0c2
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/examples/SequentialWarcRecordRead.java
@@ -0,0 +1,59 @@
+package it.unimi.dsi.law.warc.io.examples;
+
+import java.io.File;
+import java.io.FileInputStream;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.io.FastBufferedInputStream;
+import it.unimi.dsi.law.warc.io.GZWarcRecord;
+import it.unimi.dsi.law.warc.io.WarcRecord;
+
+// RELEASE-STATUS: DIST
+
+public class SequentialWarcRecordRead {
+
+ final static int IO_BUFFER_SIZE = 64 * 1024;
+
+ public static void main(String arg[]) throws Exception {
+
+ final String warcFile = "test";
+ final boolean isGZipped = true;
+ final WarcRecord record = isGZipped ? new GZWarcRecord() : new WarcRecord();
+
+ final FastBufferedInputStream in = new FastBufferedInputStream(
+ new FileInputStream(new File(warcFile + ".warc" + (isGZipped ? ".gz" : ""))), IO_BUFFER_SIZE);
+
+ for (;;) {
+
+ if (record.read(in) == -1) break;
+ if (isGZipped) System.out.println("GZip header:\n" + ((GZWarcRecord)record).gzheader);
+ System.out.println("WARC header:\n" + record.header);
+ System.out.println("First ten bytes of block:");
+
+ int n = 10, r;
+ while ((r = record.block.read()) != -1 && n-- > 0)
+ System.out.print(Integer.toHexString(r) + " ");
+
+ System.out.println("\n");
+
+ }
+ in.close();
+ }
+}
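The loop above prints block bytes with Integer.toHexString, which does not zero-pad (a byte 0x0a prints as "a"). Because read() is called before the counter is checked, it also reads an eleventh byte before stopping. The following stdlib-only sketch produces a padded hex prefix and reads no more than it prints. HexPrefix is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical helper (not part of LAW): hex dump of the first bytes of a stream. */
final class HexPrefix {
    private HexPrefix() {}

    /** Returns at most {@code max} leading bytes of {@code in} as space-separated two-digit hex values. */
    static String of(final InputStream in, final int max) {
        final StringBuilder sb = new StringBuilder();
        try {
            int r;
            // Check the counter before reading, so that no byte past the prefix is consumed.
            for (int n = 0; n < max && (r = in.read()) != -1; n++) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(String.format("%02x", r));
            }
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

In the example above, this would amount to printing HexPrefix.of(record.block, 10).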
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/examples/SequentialWarcRecordWrite.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/examples/SequentialWarcRecordWrite.java
new file mode 100644
index 0000000..3f3bbe0
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/examples/SequentialWarcRecordWrite.java
@@ -0,0 +1,71 @@
+package it.unimi.dsi.law.warc.io.examples;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.util.Date;
+import java.util.UUID;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.io.FastBufferedOutputStream;
+import it.unimi.dsi.fastutil.io.FastByteArrayInputStream;
+import it.unimi.dsi.law.bubing.util.BURL;
+import it.unimi.dsi.law.warc.io.GZWarcRecord;
+import it.unimi.dsi.law.warc.io.WarcRecord;
+import it.unimi.dsi.law.warc.io.WarcRecord.ContentType;
+import it.unimi.dsi.law.warc.io.WarcRecord.RecordType;
+import it.unimi.dsi.law.warc.util.Util;
+
+// RELEASE-STATUS: DIST
+
+public class SequentialWarcRecordWrite {
+
+ final static int IO_BUFFER_SIZE = 64 * 1024;
+
+ public static void main(String arg[]) throws Exception {
+
+ final String warcFile = "test";
+ final boolean isGZipped = true;
+
+ final WarcRecord record = isGZipped ? new GZWarcRecord() : new WarcRecord();
+
+ final FastBufferedOutputStream out = new FastBufferedOutputStream(
+ new FileOutputStream(new File(warcFile + ".warc" + (isGZipped ? ".gz" : ""))), IO_BUFFER_SIZE);
+
+ final WarcRecord.Header header = record.header;
+
+ for (int i = 0; i < 10; i++) {
+
+ header.recordType = RecordType.RESPONSE;
+ header.subjectUri = BURL.parse("http://localhost/" + i);
+ header.creationDate = new Date();
+ header.contentType = ContentType.HTTP;
+ header.recordId = UUID.randomUUID();
+ header.anvlFields.clear();
+
+ record.block = new FastByteArrayInputStream(Util.getASCIIBytes("HTTP/1.0 200 OK\r\n\r\n<html><head><title>Doc " + i + "</title><body><p>This is document nr. " + i + "</body></html>"));
+
+ record.write(out);
+
+ }
+
+ out.close();
+
+ }
+}
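The read example prints a separate gzheader for every record. This suggests that GZWarcRecord stores each record as its own gzip member, the record-at-a-time compression that the WARC specification recommends and that makes offset-based random access possible. The following stdlib-only sketch illustrates that layout under this assumption. GzipMembers is a hypothetical name, and this is not how GZWarcRecord is implemented.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper (not the GZWarcRecord implementation): one gzip member per record. */
final class GzipMembers {
    private GzipMembers() {}

    /** Appends each record to {@code target} as a separate gzip member; {@code target} is left open. */
    static void write(final OutputStream target, final String... records) {
        // Shield the shared stream, so that closing a member does not close it.
        final OutputStream shield = new FilterOutputStream(target) {
            @Override
            public void write(final byte[] b, final int off, final int len) throws IOException {
                out.write(b, off, len); // bulk write instead of FilterOutputStream's byte-by-byte default
            }

            @Override
            public void close() throws IOException {
                flush();
            }
        };
        try {
            for (final String record : records) {
                // close() finishes this member (trailer included) and releases its Deflater.
                try (GZIPOutputStream gz = new GZIPOutputStream(shield)) {
                    gz.write(record.getBytes(StandardCharsets.US_ASCII));
                }
            }
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Decompresses all members; GZIPInputStream continues across concatenated members. */
    static String readAll(final byte[] data) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(in.readAllBytes(), StandardCharsets.US_ASCII);
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A reader that knows the offset of a member can start decompressing there. A plain GZIPInputStream over the whole file instead sees one continuous stream.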
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/package.html b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/package.html
new file mode 100644
index 0000000..d1f0f2a
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/io/package.html
@@ -0,0 +1,89 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<!-- RELEASE-STATUS: DIST -->
+<html>
+ <head>
+ <title>WARC I/O</title>
+ </head>
+
+ <body>
+
+ <p>Provides classes performing low- and high-level WARC I/O (for format details, please see the <a href='http://archive-access.sourceforge.net/warc/warc_file_format-0.9.html'>ISO</a> draft).
+
+ <p>This code is designed to be very efficient. In particular, some design choices are worth mentioning:
+
+ <dl>
+
+ <dt><strong>Object reuse</strong></dt>
+
+ <dd>
+
+ <p>Whenever possible, a single object must be used to perform multiple operations.
+
+ <p>For instance, to read a sequence of WARC records, one must instantiate a single {@link it.unimi.dsi.law.warc.io.WarcRecord} object and repeatedly invoke its {@link it.unimi.dsi.law.warc.io.WarcRecord#read(it.unimi.dsi.fastutil.io.FastBufferedInputStream bin) read} method, rather than creating a new {@link it.unimi.dsi.law.warc.io.WarcRecord} object for every record read.
+
+ <p>The convenience iterators offered by {@link it.unimi.dsi.law.warc.io.WarcFilteredIterator} and {@link it.unimi.dsi.law.warc.io.HttpResponseFilteredIterator} follow this approach
+ and require a {@link it.unimi.dsi.law.warc.io.WarcRecord} at construction time; the iterator reuses this object to expose the record read at every iteration.
+
+ <dt><strong>Pull writes</strong></dt>
+
+ <dd>
+
+ <p>To write a record, one should provide a {@link it.unimi.dsi.fastutil.io.MeasurableInputStream} (by setting the {@link it.unimi.dsi.law.warc.io.WarcRecord#block block} field of {@link it.unimi.dsi.law.warc.io.WarcRecord}) from which the {@link it.unimi.dsi.law.warc.io.WarcRecord#write(java.io.OutputStream out) write}
+ method of {@link it.unimi.dsi.law.warc.io.WarcRecord} will pull the data to be written.
+
+ <p>This is especially convenient when the data comes from the network (as during a crawl) or from a read operation (as during batch processing of data), since one can write a record simply by using the available (suitably wrapped) input stream.
+
+ <dt><strong>Public fields</strong></dt>
+
+ <dd>
+
+ <p>To avoid the overhead of getters and setters, some fields (such as the {@link it.unimi.dsi.law.warc.io.WarcRecord#block block} and {@link it.unimi.dsi.law.warc.io.WarcRecord#header header} of {@link it.unimi.dsi.law.warc.io.WarcRecord} and
+ {@link it.unimi.dsi.law.warc.io.GZWarcRecord#gzheader gzheader} of
+ {@link it.unimi.dsi.law.warc.io.GZWarcRecord}) are declared <code>public</code>.
+
+ </dl>
+
+ <h3>Low level I/O</h3>
+
+ <p>Low level I/O can be performed using {@link it.unimi.dsi.law.warc.io.WarcRecord} for the WARC format or {@link it.unimi.dsi.law.warc.io.GZWarcRecord} for the compressed WARC format. For further details, see the class documentation.
+
+ <p>A very simple example of low level <em>sequential reads</em> can be found in the source code of {@link it.unimi.dsi.law.warc.io.examples.SequentialWarcRecordRead}, while an example of low level <em>sequential writes</em> can be found in the source code of {@link it.unimi.dsi.law.warc.io.examples.SequentialWarcRecordWrite}; both examples show how to use the plain and compressed WARC format.
+
+ <p>A (more complex) example of low level <em>random access reads</em> can be found in the source code of the {@link it.unimi.dsi.law.warc.tool.CutWarc} tool (see also the {@link it.unimi.dsi.law.warc.tool.IndexWarc} tool on how to create the index).
+
+ <h3>High level I/O</h3>
+
+ <p>High level I/O is provided at the moment only for (a subset of) the <code>response</code> record type of the WARC specification. More precisely, interfaces and classes are provided to deal with responses with <code>content-type</code> of <code>HTTP</code> (or <code>HTTPS</code>) kind. At the interface level, mainly read-only access is provided: in particular, all the interfaces specify getters for the various properties, but no setters.
+
+ <h4>Interfaces</h4>
+
+ <p>In more detail, the {@link it.unimi.dsi.law.warc.util.Response} interface (besides the above-mentioned getters) prescribes the method {@link it.unimi.dsi.law.warc.util.Response#fromWarcRecord fromWarcRecord}, which makes it possible to obtain a response from a record. The {@link it.unimi.dsi.law.warc.util.HttpResponse} interface specialises it to deal with responses having <code>content-type</code> of <code>HTTP</code> (or <code>HTTPS</code>) kind.
+
+ <h4>Classes</h4>
+
+ <p>The class {@link it.unimi.dsi.law.warc.util.AbstractHttpResponse} provides an abstract implementation of {@link it.unimi.dsi.law.warc.util.HttpResponse}
+ with all the getters except for {@link it.unimi.dsi.law.warc.util.HttpResponse#contentAsStream() contentAsStream}, which concrete subclasses must implement according to the way they represent the response content; the {@link it.unimi.dsi.law.warc.util.Response#fromWarcRecord(WarcRecord record) fromWarcRecord} method is left unimplemented for the same reason. On the other hand, it provides a {@link it.unimi.dsi.law.warc.util.AbstractHttpResponse#toWarcRecord toWarcRecord} method that concrete subclasses can use to populate a WARC record with the data in the response (for example, in order to subsequently write the record).
+
+ <p>The concrete class {@link it.unimi.dsi.law.warc.util.WarcHttpResponse}, for instance, implements the {@link it.unimi.dsi.law.warc.util.HttpResponse#contentAsStream() contentAsStream} and {@link it.unimi.dsi.law.warc.util.Response#fromWarcRecord(WarcRecord record) fromWarcRecord} methods so that the content is read from a WARC file; it is thus the class to use to read a {@link it.unimi.dsi.law.warc.io.WarcRecord} (of suitable <code>record-type</code>) as an {@link it.unimi.dsi.law.warc.util.HttpResponse}.
+
+ <p>An example of how to use this class to read a sequence of responses from a WARC file can be found in the source code of {@link it.unimi.dsi.law.warc.io.examples.SequentialHttpResponseRead}.
+
+ <p>Finally, {@link it.unimi.dsi.law.warc.util.MutableHttpResponse} is a concrete implementation of {@link it.unimi.dsi.law.warc.util.AbstractHttpResponse} endowed with setters that can be used to populate the response and hence, via the {@link it.unimi.dsi.law.warc.util.AbstractHttpResponse#toWarcRecord(WarcRecord record) toWarcRecord} method, a {@link it.unimi.dsi.law.warc.io.WarcRecord} to be written.
+
+ <!-- Once BUbiNG becomes public, here we should also describe it.unimi.dsi.law.bubing.util.FetchedHttpResponse as an example of AbstractHttpResponse that implements contentAsStream by passing the network stream, but does not implement fromWarcRecord for obvious reasons. -->
+
+ <h3 id='dup'>Digest based duplicate detection</h3>
+
+ <p>There are many ways to detect duplicates in a crawl. UbiCrawler adopts a technique based on page content digests. The interface {@link it.unimi.dsi.law.warc.util.DigestBasedDuplicateDetection} is used to represent duplicate information present in WARC files produced by UbiCrawler.
+
+ <!-- Once BUbiNG becomes public, here we should describe the digesting process, with a reference to it.unimi.dsi.law.warc.parser.Digester. -->
+
+ <p>More precisely, the class {@link it.unimi.dsi.law.warc.util.WarcHttpResponse} implements this interface, so after using it to read a WARC record one can call its {@link it.unimi.dsi.law.warc.util.DigestBasedDuplicateDetection#isDuplicate() isDuplicate} method to know whether the response is a duplicate of a previously crawled one, and its {@link it.unimi.dsi.law.warc.util.DigestBasedDuplicateDetection#digest() digest} method to get the digest of the response content.
+
+ <!-- Once BUbiNG becomes public, here we should also mention it.unimi.dsi.law.bubing.util.FetchedHttpResponse. -->
+
+ <p>Observe that {@link it.unimi.dsi.law.warc.util.AbstractHttpResponse} takes this interface into account: if the concrete class on which
+ {@link it.unimi.dsi.law.warc.util.AbstractHttpResponse#toWarcRecord toWarcRecord} is invoked implements this interface, that method will also populate the {@link it.unimi.dsi.law.warc.io.WarcRecord} with the duplicate information.
+
+ </body>
+</html>
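The object reuse, pull write and public field choices described in the package documentation above can be illustrated without the LAW classes. Here is a minimal sketch under stated assumptions. ReusableRecord is a hypothetical class, and its length-prefixed format is a deliberate simplification, not WARC. write pulls the block from any InputStream, much as WarcRecord.write pulls from the block field. read refills a single object and grows its buffer only when a larger record arrives.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/**
 * Hypothetical record class (not WarcRecord) using a simplified format: a 4-byte big-endian
 * length followed by that many bytes of block.
 */
final class ReusableRecord {
    /** Public fields, to avoid getter/setter overhead; valid only until the next call to read(). */
    public byte[] block = new byte[16];
    public int length;

    /** Pull write: copies exactly {@code length} bytes from {@code source}, preceded by the length. */
    static void write(final DataOutputStream out, final InputStream source, final int length) throws IOException {
        out.writeInt(length);
        final byte[] buffer = new byte[8192];
        for (int left = length; left > 0;) {
            final int r = source.read(buffer, 0, Math.min(buffer.length, left));
            if (r == -1) throw new EOFException("source is shorter than the declared length");
            out.write(buffer, 0, r);
            left -= r;
        }
    }

    /** Reads the next record into this object, growing the buffer only when needed; returns -1 at end of stream. */
    int read(final DataInputStream in) throws IOException {
        final int len;
        try {
            len = in.readInt();
        } catch (final EOFException e) {
            return -1; // no further records (a truncated length prefix is also treated as end of stream)
        }
        if (block.length < len) block = Arrays.copyOf(block, Math.max(len, 2 * block.length));
        in.readFully(block, 0, len);
        length = len;
        return len;
    }
}
```

Because read overwrites block, a caller that needs to keep a record beyond the next read must copy it. That is the trade-off for avoiding one allocation per record.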
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/parser/BinaryParser.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/parser/BinaryParser.java
new file mode 100644
index 0000000..40d6e31
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/parser/BinaryParser.java
@@ -0,0 +1,78 @@
+package it.unimi.dsi.law.warc.parser;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.security.MessageDigest;
+import java.security.NoSuchAlgorithmException;
+
+/*
+ * Copyright (C) 2012-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.law.warc.util.HttpResponse;
+import it.unimi.dsi.law.warc.util.Response;
+
+// RELEASE-STATUS: DIST
+
+/** A universal binary parser that just computes digests. */
+
+public class BinaryParser implements Parser {
+ private final MessageDigest messageDigest;
+
+ /** Builds a parser for digesting a page.
+ *
+ * @param messageDigest the digesting algorithm, or {@code null} if no digesting will be performed.
+ */
+ public BinaryParser(final MessageDigest messageDigest) {
+ this.messageDigest = messageDigest;
+ }
+
+ /** Builds a parser for digesting a page.
+ *
+ * @param messageDigestAlgorithm the digesting algorithm (as a string).
+ * @throws NoSuchAlgorithmException if no provider supports the given algorithm.
+ */
+ public BinaryParser(final String messageDigestAlgorithm) throws NoSuchAlgorithmException {
+ this(MessageDigest.getInstance(messageDigestAlgorithm));
+ }
+
+ @Override
+ public byte[] parse(final Response response, final LinkReceiver linkReceiver) throws IOException {
+ if (messageDigest == null) return null;
+ final HttpResponse httpResponse = (HttpResponse)response;
+ final byte[] buffer = new byte[1024];
+ InputStream is = httpResponse.contentAsStream();
+ messageDigest.reset();
+ for(int length; (length = is.read(buffer, 0, buffer.length)) > 0;) messageDigest.update(buffer, 0, length);
+ return messageDigest.digest();
+ }
+
+ @Override
+ public boolean apply(Response response) {
+ return true;
+ }
+
+ @Override
+ public Object clone() {
+ return new BinaryParser(messageDigest);
+ }
+
+ @Override
+ public String guessedCharset() {
+ return null;
+ }
+}
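BinaryParser.parse() digests the response content by streaming it through a fixed-size buffer. The following sketch reduces the same loop to stdlib classes so it can run on its own. StreamDigest is a hypothetical name. One point is worth keeping in mind: clone() above hands the same MessageDigest to the copy. MessageDigest instances are stateful and not thread-safe, so copies used concurrently would each need their own instance, obtained for example as in freshCopy.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper (not part of LAW) mirroring the digesting loop of BinaryParser.parse(). */
final class StreamDigest {
    private StreamDigest() {}

    /** Digests all remaining bytes of {@code is}; the stream is not closed. */
    static byte[] digest(final MessageDigest md, final InputStream is) throws IOException {
        final byte[] buffer = new byte[1024];
        md.reset();
        for (int length; (length = is.read(buffer, 0, buffer.length)) != -1;) md.update(buffer, 0, length);
        return md.digest();
    }

    /** Returns an independent instance of the same algorithm, e.g. for another thread or a clone. */
    static MessageDigest freshCopy(final MessageDigest md) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance(md.getAlgorithm());
    }

    /** Lower-case hexadecimal rendering of a digest. */
    static String hex(final byte[] a) {
        final StringBuilder sb = new StringBuilder(2 * a.length);
        for (final byte b : a) sb.append(String.format("%02x", b & 0xff));
        return sb.toString();
    }
}
```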
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/parser/Digester.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/parser/Digester.java
new file mode 100644
index 0000000..b602d90
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/parser/Digester.java
@@ -0,0 +1,221 @@
+package it.unimi.dsi.law.warc.parser;
+
+import java.lang.reflect.Field;
+import java.net.URI;
+import java.security.MessageDigest;
+import java.security.NoSuchAlgorithmException;
+import java.util.Map;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.objects.Object2ObjectOpenHashMap;
+import it.unimi.dsi.lang.MutableString;
+import it.unimi.dsi.law.warc.util.Util;
+import it.unimi.dsi.parser.Attribute;
+import it.unimi.dsi.parser.BulletParser;
+import it.unimi.dsi.parser.Element;
+import it.unimi.dsi.parser.callback.Callback;
+
+// RELEASE-STATUS: DIST
+
+/** A callback computing the digest of a page.
+ *
+ * <p>The page is somewhat simplified before being passed (as a sequence of bytes obtained
+ * by breaking each character into the upper and lower byte) to {@link MessageDigest#update(byte[])}.
+ * All start/end tags are case-normalized, and all their content (except for the
+ * element-type name) is removed. An exception is made for the <code>SRC</code> attribute of
+ * <code>FRAME</code> and <code>IFRAME</code> elements, as it is necessary to
+ * correctly distinguish framed pages without alternative text. The attribute values are currently
+ * digested verbatim, without being resolved w.r.t. the {@linkplain #url(URI) URL associated to the page}.
+ *
+ * <p>To avoid clashes between digests coming from different sites, you can optionally set a URL
+ * whose authority will be used to update the digest before adding the actual page text.
+ * You can set the URL with {@link #url(URI)}. A good idea is to use
+ * the host name (or even the authority).
+ */
+
+public class Digester implements Callback {
+ /** The size of the internal buffer. */
+ private final static int BYTE_BUFFER_SIZE = 8 * 1024;
+ /** A char array used to separate the {@link #url} from the content of the page. */
+ private final static char AUTHORITY_DELIMITER[] = "\u0000".toCharArray();
+ /** A char array used to delimit attribute values. */
+ private final static char ATTRIBUTE_VALUE_DELIMITER[] = "\"".toCharArray();
+
+ private static final boolean DEBUG = false;
+
+ /** Cached byte representations of all opening tags. */
+ private static final Object2ObjectOpenHashMap<Element,byte[]> startTag;
+ /** Cached byte representations of all closing tags. */
+ private static final Object2ObjectOpenHashMap<Element,byte[]> endTag;
+
+ /** A reusable message digest. */
+ private final MessageDigest md;
+ /** An internal buffer where bytes are accumulated to avoid excessive calls to {@link MessageDigest#update(byte)}. */
+ private byte byteBuffer[] = new byte[BYTE_BUFFER_SIZE];
+ /** The current number of bytes in {@link #byteBuffer}. */
+ private int fill;
+ /** The URI for the next digest, set by {@link #url(URI)}. */
+ private URI url;
+ /** The digest of the last page we parsed. */
+ private byte[] digest;
+
+ static {
+ startTag = new Object2ObjectOpenHashMap<Element,byte[]>();
+ endTag = new Object2ObjectOpenHashMap<Element,byte[]>();
+
+ // Scan all known element types and fill startTag/endTag
+ for(Field f: Element.class.getFields()) {
+ if (f.getType() == Element.class) {
+ Element element;
+ try {
+ element = (Element)f.get(null);
+ }
+ catch (Exception e) {
+ throw new RuntimeException(e);
+ }
+ startTag.put(element, Util.getASCIIBytes("<" + element + ">"));
+ endTag.put(element, Util.getASCIIBytes("</" + element + ">"));
+ }
+ }
+
+ // Set up defaults for bizarre element types
+ startTag.defaultReturnValue(Util.getASCIIBytes("<unknown>"));
+ endTag.defaultReturnValue(Util.getASCIIBytes("</unknown>"));
+ }
+
+
+ /** Creates a new callback using the given message digest.
+ *
+ * @param algorithm a message digest algorithm (to be passed to {@link MessageDigest#getInstance(java.lang.String)}).
+ */
+
+ public Digester(String algorithm) throws NoSuchAlgorithmException {
+ this.md = MessageDigest.getInstance(algorithm);
+ }
+
+
+ public void configure(BulletParser parser) {
+ parser.parseTags(true);
+ parser.parseAttributes(true);
+ parser.parseText(true);
+ parser.parseAttribute(Attribute.SRC);
+ }
+
+ /** Updates the digest with the given array.
+ *
+ * @param c the array from which bytes should be taken.
+ * @see #update(char[], int, int)
+ */
+ private void update(char c[]) {
+ update(c, 0, c.length);
+ }
+
+ /** Updates the digest with the given array fragment.
+ *
+ * <p>This method uses the {@linkplain #byteBuffer internal byte buffer}
+ * to avoid calling {@link MessageDigest#update(byte[])} too many times.
+ *
+ * @param c the array from which bytes should be taken.
+ * @param offset the starting offset.
+ * @param length the number of characters to be translated.
+ */
+ private void update(char c[], int offset, int length) {
+ for(int i = 0; i < length; i++) {
+ if (fill == BYTE_BUFFER_SIZE) {
+ md.update(byteBuffer);
+ fill = 0;
+ }
+ byteBuffer[fill++] = (byte)(c[offset + i] >> 8);
+ byteBuffer[fill++] = (byte)c[offset + i];
+ }
+ }
+
+
+ /** Returns the digest computed.
+ *
+ * @return the digest computed.
+ *
+ */
+
+ public byte[] digest() {
+ return digest;
+ }
+
+ /** Sets the URI that will be used to tune the next digest.
+ *
+ * @param uri a URI, or {@code null} for no URL.
+ */
+
+ public void url(URI uri) {
+ this.url = uri;
+ }
+
+ public void startDocument() {
+ md.reset();
+ fill = 0;
+
+ if (url != null) {
+ update(url.getAuthority().toCharArray());
+ update(AUTHORITY_DELIMITER);
+ }
+ }
+
+
+ public boolean startElement(Element element, Map<Attribute, MutableString> attributes) {
+ md.update(startTag.get(element)); // Bizarre elements
+
+ if (element == Element.FRAME || element == Element.IFRAME) {
+ final MutableString urlSpec = attributes.get(Attribute.SRC);
+ if (urlSpec != null) {
+ // TODO: Should we resolve?
+ update(ATTRIBUTE_VALUE_DELIMITER);
+ update(urlSpec.array(), 0, urlSpec.length());
+ update(ATTRIBUTE_VALUE_DELIMITER);
+ }
+ }
+ return true;
+ }
+
+
+ public boolean endElement(Element element) {
+ md.update(endTag.get(element));
+ return true;
+ }
+
+
+ public boolean characters(char[] data, int offset, int length, boolean flowBroken) {
+ if (DEBUG) System.err.println(new String(data, offset, length));
+ update(data, offset, length);
+ return true;
+ }
+
+
+ public boolean cdata(Element element, char[] data, int offset, int length) {
+ // TODO: maybe we want to use this
+ return true;
+ }
+
+
+ public void endDocument() {
+ md.update(byteBuffer, 0, fill);
+ fill = 0;
+ digest = md.digest();
+ }
+}
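The scheme described in the class comment can be sketched with stdlib classes alone: the authority and a NUL delimiter come first, tags are reduced to their normalized names, each character contributes two bytes, and updates are buffered. PageDigestSketch is hypothetical. By assumption, it lowercases tag names (in Digester, the normalization comes from the Element names of the dsi parser), and it omits the FRAME/IFRAME SRC exception. There is also one deliberate difference. startElement and endElement above pass tag bytes straight to md.update while text may still be waiting in byteBuffer, so tags and text may not be digested in document order. The sketch routes both through one buffer, as DigestAppendable in HTMLParser does. Its digests will therefore not match those of Digester.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

/** Hypothetical, simplified re-creation of the digesting scheme described for Digester (not the class itself). */
final class PageDigestSketch {
    private static final int BUFFER_SIZE = 8 * 1024;
    private final MessageDigest md;
    private final byte[] buffer = new byte[BUFFER_SIZE];
    private int fill;

    PageDigestSketch(final String algorithm) throws NoSuchAlgorithmException {
        md = MessageDigest.getInstance(algorithm);
    }

    /** Starts a new page; if {@code url} has an authority, it is digested first, followed by a NUL delimiter. */
    void start(final URI url) {
        md.reset();
        fill = 0;
        if (url != null && url.getAuthority() != null) {
            text(url.getAuthority());
            text("\0");
        }
    }

    /** Each char contributes two bytes: the upper byte, then the lower byte. */
    void text(final CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            if (fill >= BUFFER_SIZE - 1) flush(); // room for two bytes even if fill is odd
            final char c = s.charAt(i);
            buffer[fill++] = (byte) (c >> 8);
            buffer[fill++] = (byte) c;
        }
    }

    /** Tags are case-normalized (here: to lower case) and stripped of all attributes. */
    void startTag(final String name) {
        ascii("<" + name.toLowerCase(Locale.ROOT) + ">");
    }

    void endTag(final String name) {
        ascii("</" + name.toLowerCase(Locale.ROOT) + ">");
    }

    /** Tag bytes go through the same buffer as text, so everything is digested in document order. */
    private void ascii(final String s) {
        for (final byte b : s.getBytes(StandardCharsets.US_ASCII)) {
            if (fill >= BUFFER_SIZE) flush();
            buffer[fill++] = b;
        }
    }

    private void flush() {
        md.update(buffer, 0, fill);
        fill = 0;
    }

    /** Flushes pending bytes and returns the digest of the page. */
    byte[] finish() {
        flush();
        return md.digest();
    }
}
```

With this scheme, the same markup and text on two pages of one host give equal digests, while the same page on another host does not. This is the clash avoidance the class comment describes.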
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/parser/HTMLParser.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/parser/HTMLParser.java
new file mode 100644
index 0000000..2fc1005
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/parser/HTMLParser.java
@@ -0,0 +1,548 @@
+package it.unimi.dsi.law.warc.parser;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.net.URI;
+import java.nio.charset.Charset;
+import java.nio.charset.IllegalCharsetNameException;
+import java.nio.charset.UnsupportedCharsetException;
+import java.security.MessageDigest;
+import java.security.NoSuchAlgorithmException;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Set;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.apache.http.HttpHeaders;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Charsets;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.objects.ObjectLinkedOpenHashSet;
+import it.unimi.dsi.fastutil.objects.Reference2ObjectOpenHashMap;
+import it.unimi.dsi.law.bubing.util.BURL;
+import it.unimi.dsi.law.warc.io.InspectableBufferedInputStream;
+import it.unimi.dsi.law.warc.util.ByteArrayCharSequence;
+import it.unimi.dsi.law.warc.util.HttpResponse;
+import it.unimi.dsi.law.warc.util.Response;
+import it.unimi.dsi.law.warc.util.Util;
+import it.unimi.dsi.util.TextPattern;
+import net.htmlparser.jericho.CharacterReference;
+import net.htmlparser.jericho.EndTag;
+import net.htmlparser.jericho.EndTagType;
+import net.htmlparser.jericho.HTMLElementName;
+import net.htmlparser.jericho.HTMLElements;
+import net.htmlparser.jericho.Segment;
+import net.htmlparser.jericho.StartTag;
+import net.htmlparser.jericho.StartTagType;
+import net.htmlparser.jericho.StreamedSource;
+
+// RELEASE-STATUS: DIST
+
+/** An HTML parser with additional responsibilities (such as guessing the character encoding
+ * and resolving relative URLs).
+ *
+ * <p>An instance of this class contains buffers and classes that make it possible to
+ * quickly parse a {@link it.unimi.dsi.law.warc.util.HttpResponse}. Instances are heavyweight: they
+ * should be pooled and shared, since their usage is transitory and CPU-intensive.
+ *
+ */
+
+public class HTMLParser implements Parser {
+ private final static Logger LOGGER = LoggerFactory.getLogger(HTMLParser.class);
+
+ public final static class SetLinkReceiver implements LinkReceiver {
+ private final Set<URI> urls = new ObjectLinkedOpenHashSet<>();
+
+ @Override
+ public void location(URI location) {
+ urls.add(location);
+ }
+
+ @Override
+ public void metaLocation(URI location) {
+ urls.add(location);
+ }
+
+ @Override
+ public void metaRefresh(URI refresh) {
+ urls.add(refresh);
+ }
+
+ @Override
+ public void link(URI link) {
+ urls.add(link);
+ }
+
+ @Override
+ public void init(URI responseUrl) {
+ urls.clear();
+ }
+
+ @Override
+ public Iterator<URI> iterator() {
+ return urls.iterator();
+ }
+ }
+
+ /** A class computing the digest of a page.
+ *
+ * <p>The page is somewhat simplified before being passed (as a sequence of bytes obtained
+ * by breaking each character into the upper and lower byte) to {@link MessageDigest#update(byte[])}.
+ * All start/end tags are case-normalized, and all their content (except for the
+ * element-type name) is removed. An exception is made for the <code>SRC</code> attribute of
+ * <code>FRAME</code> and <code>IFRAME</code> elements, as it is necessary to
+ * correctly distinguish framed pages without alternative text. The attribute values are currently
+ * digested verbatim, without being resolved w.r.t. the {@linkplain #init(URI) URL associated to the page}.
+ *
+ * <p>To avoid clashes between digests coming from different sites, you can optionally set a URL
+ * whose authority will be used to update the digest before adding the actual page text.
+ * You can set the URL with {@link #init(URI)}. A good idea is to use
+ * the host name (or even the authority).
+ */
+
+
+ private final static class DigestAppendable implements Appendable {
+ /** The size of the internal buffer. */
+ private final static int BYTE_BUFFER_SIZE = 1024;
+
+ /** Cached byte representations of all opening tags. The map must be queried using {@linkplain HTMLElementName Jericho names}. */
+ private static final Reference2ObjectOpenHashMap<String, byte[]> startTags;
+
+ /** Cached byte representations of all closing tags. The map must be queried using {@linkplain HTMLElementName Jericho names}. */
+ private static final Reference2ObjectOpenHashMap<String, byte[]> endTags;
+
+ static {
+ final List<String> elementNames = HTMLElements.getElementNames();
+ startTags = new Reference2ObjectOpenHashMap<String, byte[]>(elementNames.size());
+ endTags = new Reference2ObjectOpenHashMap<String, byte[]>(elementNames.size());
+
+ // Set up defaults for bizarre element types
+ startTags.defaultReturnValue(Util.getASCIIBytes("<unknown>"));
+ endTags.defaultReturnValue(Util.getASCIIBytes("</unknown>"));
+
+ // Scan all known element types and fill startTag/endTag
+ for (String name : elementNames) {
+ startTags.put(name, Util.getASCIIBytes("<" + name + ">"));
+ endTags.put(name, Util.getASCIIBytes("</" + name + ">"));
+ }
+ }
+
+ /** An internal buffer where bytes are accumulated to avoid excessive calls to
+ * {@link MessageDigest#update(byte)}. */
+ private final byte byteBuffer[] = new byte[BYTE_BUFFER_SIZE];
+
+ /** The algorithm used to compute the digest. */
+ private final MessageDigest digester;
+
+ /** The current number of bytes in {@link #byteBuffer}. */
+ private int fill;
+
+ /** Create a digest appendable using a given algorithm.
+ *
+ * @param digester a digesting algorithm. */
+ public DigestAppendable(final MessageDigest digester) {
+ this.digester = digester;
+ }
+
+ /** Initializes the digest computation.
+ *
+ * @param url a URL, or {@code null} for no URL.
+ */
+ public void init(final URI url) {
+ digester.reset();
+ fill = 0;
+
+ if (url != null) {
+ append(url.getAuthority());
+ append('\0');
+ }
+ }
+
+ @Override
+ public Appendable append(CharSequence csq, int start, int end) {
+ for (int i = start; i < end; i++) {
+ final char c = csq.charAt(i);
+ if (fill >= BYTE_BUFFER_SIZE - 1) {
+ digester.update(byteBuffer, 0, fill);
+ fill = 0;
+ }
+ byteBuffer[fill++] = (byte)(c >> 8);
+ byteBuffer[fill++] = (byte)c;
+ }
+ return this;
+ }
+
+ @Override
+ public Appendable append(char c) {
+ if (fill >= BYTE_BUFFER_SIZE - 1) {
+ digester.update(byteBuffer, 0, fill);
+ fill = 0;
+ }
+ byteBuffer[fill++] = (byte)(c >> 8);
+ byteBuffer[fill++] = (byte)c;
+ return this;
+ }
+
+ @Override
+ public Appendable append(CharSequence csq) {
+ return append(csq, 0, csq.length());
+ }
+
+ public byte[] digest() {
+ digester.update(byteBuffer, 0, fill);
+ fill = 0;
+ return digester.digest();
+ }
+
+ private void update(byte[] a) {
+ for (byte b : a) {
+ if (fill == BYTE_BUFFER_SIZE) {
+ digester.update(byteBuffer);
+ fill = 0;
+ }
+ byteBuffer[fill++] = b;
+ }
+ }
+
+ public void startTag(final StartTag startTag) {
+ final String name = startTag.getName();
+ update(startTags.get(name));
+
+ // IFRAME or FRAME + SRC
+ if (name == HTMLElementName.IFRAME || name == HTMLElementName.FRAME) {
+ String s = startTag.getAttributeValue("src");
+ if (s != null) {
+ append('\"');
+ append(s);
+ append('\"');
+ }
+ }
+ }
+
+ public void endTag(final EndTag endTag) {
+ update(endTags.get(endTag.getName()));
+ }
+ }
+
+ /** The pattern prefixing the URL in a <code>META</code> <code>HTTP-EQUIV</code> element of refresh type. */
+ private static final TextPattern URLEQUAL_PATTERN = new TextPattern("URL=", TextPattern.CASE_INSENSITIVE);
+ /** The size of the internal Jericho buffer. */
+ public static final int CHAR_BUFFER_SIZE = 65536;
+
+ /** The character buffer handed to Jericho's <code>StreamedSource</code>; it is allocated at construction time and reused for every page. */
+ public final char[] buffer;
+ /** The charset we guessed for the last response. */
+ private String guessedCharset;
+ /** The digesting algorithm used, or {@code null} if no digesting is to be performed. */
+ private MessageDigest messageDigest;
+ /** An object embodying the digest logic, or {@code null} for no digest computation. */
+ private final DigestAppendable digestAppendable;
+ /** The location URL from headers of the last response, if any, or {@code null}. */
+ private URI location;
+ /** The location URL from <code>META</code> elements of the last response, if any, or {@code null}. */
+ private URI metaLocation;
+
+ /**
+ * Builds a parser for link extraction and, possibly, digesting a page.
+ *
+ * @param messageDigest the digesting algorithm, or {@code null} if no digesting will be performed.
+ */
+ public HTMLParser(final MessageDigest messageDigest) {
+ buffer = new char[CHAR_BUFFER_SIZE];
+ this.messageDigest = messageDigest;
+ digestAppendable = messageDigest == null ? null : new DigestAppendable(messageDigest);
+ }
+
+
+ /**
+ * Builds a parser for link extraction and, possibly, digesting a page.
+ *
+ * @param messageDigest the digesting algorithm (as a string).
+ * @throws NoSuchAlgorithmException if no provider supports the specified algorithm.
+ */
+ public HTMLParser(final String messageDigest) throws NoSuchAlgorithmException {
+ this(MessageDigest.getInstance(messageDigest));
+ }
+
+
+ /**
+ * Builds a parser for link extraction.
+ */
+ public HTMLParser() {
+ this((MessageDigest)null);
+ }
+
+
+ private void process(final LinkReceiver linkReceiver, final URI base, final String s) {
+ if (s == null) return;
+ URI url = BURL.parse(s);
+ if (url == null) return;
+ linkReceiver.link(base.resolve(url));
+ }
+
+ @Override
+ public byte[] parse(final Response response, final LinkReceiver linkReceiver) throws IOException {
+ final URI responseUrl = response.uri();
+ final HttpResponse httpResponse = (HttpResponse)response;
+
+ guessedCharset = "ISO-8859-1";
+
+ // Try to guess using headers
+ final String contentTypeHeader = httpResponse.headers().get(HttpHeaders.CONTENT_TYPE);
+ if (contentTypeHeader != null) {
+ final String headerCharset = getCharsetNameFromHeader(contentTypeHeader);
+ if (headerCharset != null) guessedCharset = headerCharset;
+ }
+
+ final InputStream contentStream = httpResponse.contentAsStream();
+ if (contentStream instanceof InspectableBufferedInputStream) {
+ final InspectableBufferedInputStream inspectableStream = (InspectableBufferedInputStream)contentStream;
+ final String metaCharset = getCharsetName(inspectableStream.buffer, inspectableStream.inspectable);
+ if (metaCharset != null) guessedCharset = metaCharset;
+ }
+
+ LOGGER.debug("Guessing charset " + guessedCharset + " for URL " + responseUrl);
+
+ Charset charset = Charsets.ISO_8859_1; // Fallback
+ try {
+ charset = Charset.forName(guessedCharset);
+ }
+ catch(IllegalCharsetNameException e) {
+ LOGGER.warn("Response for " + responseUrl + " contained an illegal charset name: " + guessedCharset);
+ }
+ catch(UnsupportedCharsetException e) {
+ LOGGER.warn("Response for " + responseUrl + " contained an unsupported charset: " + guessedCharset);
+ }
+
+ linkReceiver.init(responseUrl);
+
+ // Get location if present
+ location = null;
+ metaLocation = null;
+
+ if (httpResponse.headers().get(HttpHeaders.LOCATION) != null) {
+ final URI location = BURL.parse(httpResponse.headers().get(HttpHeaders.LOCATION));
+ if (location != null) {
+ // This shouldn't happen according to the standard, but unfortunately people do it.
+ if (! location.isAbsolute()) LOGGER.warn("Found relative header location URL: \"" + location + "\"");
+ linkReceiver.location(this.location = responseUrl.resolve(location));
+ }
+ }
+
+ @SuppressWarnings("resource")
+ final StreamedSource streamedSource = new StreamedSource(new InputStreamReader(contentStream, charset));
+ streamedSource.setBuffer(buffer);
+ if (digestAppendable != null) digestAppendable.init(responseUrl);
+ URI base = responseUrl;
+
+ int lastSegmentEnd = 0;
+ int inSpecialText = 0;
+ for (Segment segment : streamedSource) {
+ if (segment.getEnd() > lastSegmentEnd) {
+ lastSegmentEnd = segment.getEnd();
+ if (segment instanceof StartTag) {
+ final StartTag startTag = (StartTag)segment;
+ if (startTag.getTagType() != StartTagType.NORMAL) continue;
+ final String name = startTag.getName();
+ if (name == HTMLElementName.STYLE || name == HTMLElementName.SCRIPT) inSpecialText++;
+
+ if (digestAppendable != null) digestAppendable.startTag(startTag);
+ if (linkReceiver == null) continue; // No link receiver, nothing to do.
+
+ // IFRAME, FRAME or EMBED + SRC
+ if (name == HTMLElementName.IFRAME || name == HTMLElementName.FRAME || name == HTMLElementName.EMBED) process(linkReceiver, base, startTag.getAttributeValue("src"));
+ else if (name == HTMLElementName.IMG || name == HTMLElementName.SCRIPT) process(linkReceiver, base, startTag.getAttributeValue("src"));
+ else if (name == HTMLElementName.OBJECT) process(linkReceiver, base, startTag.getAttributeValue("data"));
+ else if (name == HTMLElementName.A || name == HTMLElementName.AREA || name == HTMLElementName.LINK) process(linkReceiver, base, startTag.getAttributeValue("href"));
+ else if (name == HTMLElementName.BASE) {
+ String s = startTag.getAttributeValue("href");
+ if (s != null) {
+ final URI uri = BURL.parse(s);
+ if (uri != null) {
+ if (uri.isAbsolute()) base = uri;
+ else LOGGER.warn("Found relative BASE URL: \""+ uri + "\"");
+ }
+ }
+ }
+
+ // META REFRESH/LOCATION
+ else if (name == HTMLElementName.META) {
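+ final String httpEquiv = startTag.getAttributeValue("http-equiv");
+ final String content = startTag.getAttributeValue("content");
+ if (httpEquiv != null && content != null) {
+ // toLowerCase() returns a new string, so its result must be kept for the comparisons below.
+ final String equiv = httpEquiv.toLowerCase(java.util.Locale.ROOT);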
+
+ // http-equiv="refresh" content="0;URL=http://foo.bar/..."
+ if (equiv.equals("refresh")) {
+
+ final int pos = URLEQUAL_PATTERN.search(content);
+ if (pos != -1) {
+ final String urlPattern = content.substring(pos + URLEQUAL_PATTERN.length());
+ final URI refresh = BURL.parse(urlPattern);
+ if (refresh != null) {
+ // This shouldn't happen according to the standard, but unfortunately people do it.
+ if (! refresh.isAbsolute()) LOGGER.warn("Found relative META refresh URL: \"" + urlPattern + "\"");
+ linkReceiver.metaRefresh(base.resolve(refresh));
+ }
+ }
+ }
+
+ // http-equiv="location" content="http://foo.bar/..."
+ if (equiv.equals("location")) {
+ final URI metaLocation = BURL.parse(content);
+ if (metaLocation != null) {
+ // This shouldn't happen according to the standard, but unfortunately people do it.
+ if (! metaLocation.isAbsolute()) LOGGER.warn("Found relative META location URL: \"" + content + "\"");
+ linkReceiver.metaLocation(this.metaLocation = base.resolve(metaLocation));
+ }
+ }
+ }
+ }
+ }
+ else if (segment instanceof EndTag) {
+ final EndTag endTag = (EndTag)segment;
+ final String name = endTag.getName();
+ if (name == HTMLElementName.STYLE || name == HTMLElementName.SCRIPT) inSpecialText--;
+
+ if (digestAppendable != null) {
+ if (endTag.getTagType() != EndTagType.NORMAL) continue;
+ digestAppendable.endTag(endTag);
+ }
+ }
+ else if (digestAppendable != null && inSpecialText == 0) {
+ if (segment instanceof CharacterReference) ((CharacterReference)segment).appendCharTo(digestAppendable);
+ else digestAppendable.append(segment);
+ }
+ }
+ }
+
+ return digestAppendable != null ? digestAppendable.digest() : null;
+ }
+
+ public String guessedCharset() {
+ return guessedCharset;
+ }
+
+ /** Returns the location URL from the headers, if present; if it is not present but the page contains a valid
+ * <code>META</code> location, the latter is returned. Otherwise, {@code null} is returned.
+ *
+ * @return the location (or metalocation), if present; {@code null} otherwise.
+ */
+ public URI location() {
+ //TODO: see if we must derelativize
+ if (location != null) return location;
+ else if (metaLocation != null) return metaLocation;
+ else return null;
+ }
+
+ /** Used by {@link #getCharsetName(byte[], int)}. */
+ private static final TextPattern META_PATTERN = new TextPattern("<meta", TextPattern.CASE_INSENSITIVE);
+ /** Used by {@link #getCharsetName(byte[], int)}. */
+ private static final Pattern HTTP_EQUIV_PATTERN = Pattern.compile(".*http-equiv\\s*=\\s*('|\")?content-type('|\")?.*", Pattern.CASE_INSENSITIVE);
+ /** Used by {@link #getCharsetName(byte[], int)}. */
+ private static final Pattern CONTENT_PATTERN = Pattern.compile(".*content\\s*=\\s*('|\")([^'\"]*)('|\").*", Pattern.CASE_INSENSITIVE);
+ /** Used by {@link #getCharsetName(byte[], int)}. */
+ private static final Pattern CHARSET_PATTERN = Pattern.compile (".*charset\\s*=\\s*(([\\041-\\0176&&[^<>\\{\\}\\\\/:,;@?=]])+|\"[^\"]*\").*", Pattern.CASE_INSENSITIVE);
+
+ /** Returns the charset name as indicated by a <code>META</code>
+ * <code>HTTP-EQUIV</code> element, if
+ * present, interpreting the provided byte array as a sequence of
+ * ISO-8859-1-encoded characters. Only the first such occurrence is considered (even if
+ * it might not correspond to a valid or available charset).
+ *
+ * <p><strong>Beware</strong>: it might not work if the
+ * <em>value</em> of some attribute in a <code>meta</code> tag
+ * contains a string matching (case insensitively) the r.e.
+ * <code>http-equiv\s*=\s*('|")content-type('|")</code>, or
+ * <code>content\s*=\s*('|")[^"']*('|")</code>.
+ *
+ * @param buffer a buffer containing raw bytes that will be interpreted as ISO-8859-1 characters.
+ * @param length the number of significant bytes in the buffer.
+ * @return the charset name, or {@code null} if no
+ * charset is specified; note that the charset might not be valid or available.
+ */
+
+ public static String getCharsetName(final byte buffer[], final int length) {
+ int start = 0;
+ while((start = META_PATTERN.search(buffer, start, length)) != -1) {
+
+ /* Look for attribute http-equiv with value content-type,
+ * if present, look for attribute content and, if present,
+ * return its value. */
+
+ int end = start;
+ while(end < length && buffer[end] != '>') end++; // Look for closing '>'
+ if (end == length) return null; // No closing '>'
+
+ final ByteArrayCharSequence tagContent = new ByteArrayCharSequence(buffer, start + META_PATTERN.length(), end - start - META_PATTERN.length());
+ if (HTTP_EQUIV_PATTERN.matcher(tagContent).matches()) {
+ final Matcher m = CONTENT_PATTERN.matcher(tagContent);
+ if (m.matches()) return getCharsetNameFromHeader(m.group(2)); // got it!
+ }
+
+ start = end + 1;
+ }
+
+ return null; // no suitable <meta> element found
+ }
+
+ /** Extracts the charset name from the header value of a <code>content-type</code>
+ * header.
+ *
+ * TODO: explain better
+ * <strong>Warning</strong>: it might not work if someone puts the string <code>charset=</code>
+ * in a string inside some attribute/value pair.
+ *
+ * @param headerValue The value of a <code>content-type</code> header.
+ * @return the charset name, or {@code null} if no
+ * charset is specified; note that the charset might not be valid or available.
+ */
+ public static String getCharsetNameFromHeader(final String headerValue) {
+ final Matcher m = CHARSET_PATTERN.matcher(headerValue);
+ if (m.matches()) {
+ final String s = m.group(1);
+ int start = 0, end = s.length();
+ // TODO: we discard delimiting single/double quotes; is it necessary?
+ if (end > 0 && (s.charAt(0) == '\"' || s.charAt(0) == '\'')) start = 1;
+ if (end > 0 && (s.charAt(end - 1) == '\"' || s.charAt(end - 1) == '\'')) end--;
+ if (start < end) return s.substring(start, end);
+ }
+ return null;
+ }
+
+ @Override
+ public boolean apply(final Response response) {
+ if (! (response instanceof HttpResponse)) return false;
+ HttpResponse httpResponse = (HttpResponse)response;
+ final String contentType = httpResponse.headers().get(HttpHeaders.CONTENT_TYPE);
+ return contentType != null && contentType.startsWith("text/");
+ }
+
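+ @Override
+ public Object clone() {
+ // Each clone gets its own MessageDigest, since digest instances are stateful and not thread-safe.
+ if (messageDigest == null) return new HTMLParser();
+ try {
+ return new HTMLParser(MessageDigest.getInstance(messageDigest.getAlgorithm()));
+ }
+ catch (NoSuchAlgorithmException e) {
+ // Should not happen: the algorithm was available when this parser was built.
+ throw new IllegalStateException(e);
+ }
+ }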
+
+}
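`getCharsetNameFromHeader` pulls the `charset` parameter out of a `Content-Type` value, and `parse` falls back to ISO-8859-1 when the name is missing, illegal or unsupported. The following is a minimal standalone sketch of those two steps. The class and method names are hypothetical, and the regular expression is a simplified assumption rather than the `CHARSET_PATTERN` above: in particular it takes the first `charset=` occurrence, while the greedy leading `.*` of the original selects the last one.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharsetGuessSketch {
    // Simplified (assumed) pattern: a double- or single-quoted value, or a run of characters up to whitespace, ';' or ','.
    private static final Pattern CHARSET = Pattern.compile(
        "charset\\s*=\\s*(\"[^\"]*\"|'[^']*'|[^\\s;,]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the charset name in a Content-Type header value, or null if there is none. */
    static String charsetFromContentType(String headerValue) {
        Matcher m = CHARSET.matcher(headerValue);
        if (!m.find()) return null;
        String s = m.group(1);
        // Strip one matching pair of delimiting quotes, as HTMLParser.getCharsetNameFromHeader does.
        if (s.length() >= 2 && (s.charAt(0) == '"' || s.charAt(0) == '\'') && s.charAt(s.length() - 1) == s.charAt(0)) {
            s = s.substring(1, s.length() - 1);
        }
        return s.isEmpty() ? null : s;
    }

    /** Resolves a charset name, falling back to ISO-8859-1 like HTMLParser.parse. */
    static Charset charsetOrFallback(String name) {
        if (name == null) return StandardCharsets.ISO_8859_1;
        try {
            return Charset.forName(name);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return StandardCharsets.ISO_8859_1;
        }
    }

    public static void main(String[] args) {
        System.out.println(charsetFromContentType("text/html; charset=UTF-8"));          // UTF-8
        System.out.println(charsetFromContentType("text/html; Charset=\"ISO-8859-15\"")); // ISO-8859-15
        System.out.println(charsetFromContentType("text/plain"));                        // null
        System.out.println(charsetOrFallback("no such charset"));                        // ISO-8859-1
    }
}
```

As in the parser above, a syntactically valid but unknown name is only caught when `Charset.forName` is called, which is why name extraction and charset resolution are kept as two separate steps.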
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/parser/Parser.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/parser/Parser.java
new file mode 100644
index 0000000..b2e52da
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/parser/Parser.java
@@ -0,0 +1,98 @@
+package it.unimi.dsi.law.warc.parser;
+
+import java.io.IOException;
+import java.net.URI;
+import java.util.Iterator;
+
+/*
+ * Copyright (C) 2012-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.objects.ObjectSets;
+import it.unimi.dsi.law.warc.filters.Filter;
+import it.unimi.dsi.law.warc.util.Response;
+
+// RELEASE-STATUS: DIST
+
+/** A generic parser for {@link Response responses}. It provides link extraction through a
+ * {@link LinkReceiver} callback and optional digesting.
+ */
+
+public interface Parser extends Filter<Response>, Cloneable {
+ /** An object that can receive URLs discovered during parsing. It may also be used to
+ * iterate over the URLs found in the current page, but what the iterator actually
+ * returns is implementation-dependent. */
+ public static interface LinkReceiver extends Iterable<URI> {
+ /** Handles the location defined by headers.
+ *
+ * @param location the location defined by headers.
+ */
+ public void location(URI location);
+ /** Handles the location defined by a <code>META</code> element.
+ *
+ * @param location the location defined by the <code>META</code> element.
+ */
+ public void metaLocation(URI location);
+ /** Handles the refresh defined by a <code>META</code> element.
+ *
+ * @param refresh the URL defined by the <code>META</code> element.
+ */
+ public void metaRefresh(URI refresh);
+ /** Handles a link.
+ *
+ * @param uri a link discovered during the parsing phase.
+ */
+ public void link(URI uri);
+ /** Initializes this receiver for a new page.
+ *
+ * @param responseUrl the URL of the page to be parsed.
+ */
+ public void init(URI responseUrl);
+ }
+
+ /** A no-op implementation of {@link LinkReceiver}. */
+ public final static LinkReceiver NULL_LINK_RECEIVER = new LinkReceiver() {
+ @Override
+ public void location(URI location) {}
+ @Override
+ public void metaLocation(URI location) {}
+ @Override
+ public void metaRefresh(URI refresh) {}
+ @Override
+ public void link(URI link) {}
+ @Override
+ public void init(URI responseUrl) {}
+ @SuppressWarnings("unchecked")
+ @Override
+ public Iterator<URI> iterator() { return ObjectSets.EMPTY_SET.iterator(); }
+ };
+
+ /** Parses a response.
+ *
+ * @param response a response to parse.
+ * @param linkReceiver a link receiver.
+ * @return a byte digest for the page, or {@code null} if no digest has been computed.
+ * @throws IOException if an I/O error occurs while reading the response.
+ */
+ public byte[] parse(Response response, LinkReceiver linkReceiver) throws IOException;
+
+ /** Returns a guessed charset for the document, or {@code null} if the charset could
+ * not be guessed.
+ *
+ * @return a charset or {@code null}.
+ */
+ public String guessedCharset();
+}
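`Parser.parse` reports what it finds through the `LinkReceiver` callbacks, and `NULL_LINK_RECEIVER` simply discards everything. As an illustration only, here is a hypothetical receiver that keeps the links of the current page in a list. To compile without the LAW jar it implements a local copy of the interface rather than `Parser.LinkReceiver` itself, and treating a META refresh target as an ordinary link is a design assumption of the sketch.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class CollectingLinkReceiverSketch {
    /** Local copy of the Parser.LinkReceiver methods, so that the sketch compiles on its own. */
    interface LinkReceiver extends Iterable<URI> {
        void location(URI location);
        void metaLocation(URI location);
        void metaRefresh(URI refresh);
        void link(URI uri);
        void init(URI responseUrl);
    }

    /** Hypothetical receiver keeping the links of the current page in discovery order. */
    static class CollectingLinkReceiver implements LinkReceiver {
        private final List<URI> links = new ArrayList<>();
        private URI location;

        @Override public void init(URI responseUrl) { links.clear(); location = null; }
        // A header location always wins over a META location, mirroring HTMLParser.location().
        @Override public void location(URI location) { this.location = location; }
        @Override public void metaLocation(URI location) { if (this.location == null) this.location = location; }
        // Design assumption: a META refresh target is kept as an ordinary link.
        @Override public void metaRefresh(URI refresh) { links.add(refresh); }
        @Override public void link(URI uri) { links.add(uri); }
        @Override public Iterator<URI> iterator() { return links.iterator(); }

        URI location() { return location; }
    }

    public static void main(String[] args) {
        CollectingLinkReceiver receiver = new CollectingLinkReceiver();
        URI base = URI.create("http://example.com/dir/page.html");
        receiver.init(base);
        // HTMLParser resolves every attribute value against the current base before calling link().
        receiver.link(base.resolve("other.html"));
        receiver.link(base.resolve("/root.html"));
        for (URI uri : receiver) System.out.println(uri);
        // http://example.com/dir/other.html
        // http://example.com/root.html
    }
}
```

Because `init` clears the state, one receiver instance can be reused across pages, which matches how `HTMLParser.parse` calls `linkReceiver.init(responseUrl)` at the start of every response.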
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/parser/package.html b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/parser/package.html
new file mode 100644
index 0000000..02d5467
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/parser/package.html
@@ -0,0 +1,12 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<!-- RELEASE-STATUS: DIST -->
+<html>
+ <head>
+ <title>LAW software</title>
+ </head>
+
+ <body>
+
+ <p>Extensions of the {@link it.unimi.dsi.parser.BulletParser}.
+ </body>
+</html>
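`Parser.parse` returns a page digest, which `HTMLParser` computes through `DigestAppendable`: each `char` is fed to the `MessageDigest` as two bytes, high byte first, through a byte buffer that is flushed whenever fewer than two free bytes remain. The sketch below (class name hypothetical, buffer shrunk to a few bytes to force several flushes) covers only the text path, not the per-tag byte codes, and checks that this buffering yields the same digest as hashing the UTF-16BE encoding of the whole text in one call. That equivalence holds for strings without unpaired surrogates.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class CharDigestSketch {
    private final MessageDigest digester;
    private final byte[] buffer;
    private int fill;

    /** bufferSize must be at least 2, so that one char always fits after a flush. */
    public CharDigestSketch(MessageDigest digester, int bufferSize) {
        this.digester = digester;
        this.buffer = new byte[bufferSize];
    }

    /** Wraps the checked exception; MD5, SHA-1 and SHA-256 are required on every Java platform. */
    static MessageDigest newDigest(String algorithm) {
        try {
            return MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public void append(char c) {
        // Flush when fewer than two free bytes remain, as DigestAppendable does.
        if (fill >= buffer.length - 1) {
            digester.update(buffer, 0, fill);
            fill = 0;
        }
        buffer[fill++] = (byte)(c >> 8); // high byte first
        buffer[fill++] = (byte)c;        // then low byte
    }

    public void append(CharSequence s) {
        for (int i = 0; i < s.length(); i++) append(s.charAt(i));
    }

    /** Flushes the buffer and returns the digest; MessageDigest.digest() also resets the digester. */
    public byte[] digest() {
        digester.update(buffer, 0, fill);
        fill = 0;
        return digester.digest();
    }

    public static void main(String[] args) {
        CharDigestSketch d = new CharDigestSketch(newDigest("MD5"), 4);
        d.append("example.com"); // the URL authority, as in DigestAppendable.init
        d.append('\0');
        d.append("caff\u00e8 latte");
        byte[] incremental = d.digest();
        byte[] oneShot = newDigest("MD5").digest("example.com\0caff\u00e8 latte".getBytes(StandardCharsets.UTF_16BE));
        System.out.println(Arrays.equals(incremental, oneShot)); // true
    }
}
```

Buffering only reduces the number of `update` calls; the digest depends solely on the byte sequence, so the buffer size can be tuned freely.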
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/tool/CompressWarc.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/tool/CompressWarc.java
new file mode 100644
index 0000000..1eefc09
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/tool/CompressWarc.java
@@ -0,0 +1,108 @@
+package it.unimi.dsi.law.warc.tool;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.OutputStream;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.io.FastBufferedInputStream;
+import it.unimi.dsi.fastutil.io.FastBufferedOutputStream;
+import it.unimi.dsi.law.warc.io.GZWarcRecord;
+import it.unimi.dsi.law.warc.io.WarcRecord;
+import it.unimi.dsi.law.warc.io.WarcRecord.FormatException;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.stat.SummaryStats;
+
+// RELEASE-STATUS: DIST
+
+/** A tool to compress a WARC file. */
+
+public class CompressWarc {
+ private final static Logger LOGGER = LoggerFactory.getLogger(CompressWarc.class);
+
+ /**
+ * Reads a sequence of uncompressed WARC records from the given input stream
+ * and writes their compressed versions to the given output stream.
+ *
+ * @param in the input stream.
+ * @param out the output stream.
+ * @throws IOException if an I/O error occurs.
+ * @throws FormatException if a record is malformed.
+ */
+ public static void run(final FastBufferedInputStream in, final OutputStream out) throws IOException, FormatException {
+ final WarcRecord inRecord = new WarcRecord();
+ final GZWarcRecord outRecord = new GZWarcRecord();
+
+ final SummaryStats compressionRatio = new SummaryStats();
+
+ final ProgressLogger pl = new ProgressLogger(LOGGER, "records");
+ pl.logInterval = ProgressLogger.TEN_SECONDS;
+ pl.info = new Object() {
+ @Override
+ public String toString() {
+ final long size = compressionRatio.size64();
+ return "compression ratio: " + (size != 0 ? (int)(100 * compressionRatio.sum() / size) + "%" : "NA");
+ }
+ };
+
+ pl.start("Compressing...");
+ while (inRecord.read(in) != -1) {
+ outRecord.copy(inRecord);
+ outRecord.write(out);
+ compressionRatio.add((double)outRecord.gzheader.compressedSkipLength / inRecord.header.dataLength);
+ pl.update();
+ }
+ pl.done();
+ }
+
+ final static int IO_BUFFER_SIZE = 64 * 1024;
+
+ public static void main(String arg[]) throws Exception {
+ SimpleJSAP jsap = new SimpleJSAP(CompressWarc.class.getName(), "Compress a warc file.",
+ new Parameter[] {
+ new UnflaggedOption("warcFile", JSAP.STRING_PARSER, "-", JSAP.REQUIRED, JSAP.NOT_GREEDY, "The Warc file basename (if not present, or -, stdin/stdout will be used).")
+ });
+
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) return;
+
+ final String warcFile = jsapResult.getString("warcFile");
+
+ final FastBufferedInputStream in = new FastBufferedInputStream(warcFile.equals("-") ? System.in : new FileInputStream(new File( warcFile + ".warc")), IO_BUFFER_SIZE);
+ final FastBufferedOutputStream out = new FastBufferedOutputStream(warcFile.equals("-") ? System.out : new FileOutputStream(new File(warcFile + ".warc.gz")), IO_BUFFER_SIZE);
+
+ run(in, out);
+
+ in.close();
+ out.close();
+ }
+}
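The progress line of `CompressWarc` reports the mean of the per-record ratios `compressedSkipLength / dataLength`, truncated to a whole percentage, or `NA` before the first record. A dependency-free sketch of that bookkeeping follows; the class name is hypothetical, a plain sum and count stand in for `SummaryStats`, and skipping zero-length records is an assumption of the sketch.

```java
public class CompressionRatioSketch {
    private double sum;
    private long count;

    /** Records one compressed/uncompressed length pair; zero-length records are skipped (assumption). */
    public void add(long compressedLength, long uncompressedLength) {
        if (uncompressedLength == 0) return;
        sum += (double)compressedLength / uncompressedLength;
        count++;
    }

    /** Mean of the per-record ratios as a truncated percentage, or "NA" before any record. */
    @Override
    public String toString() {
        return "compression ratio: " + (count != 0 ? (int)(100 * sum / count) + "%" : "NA");
    }

    public static void main(String[] args) {
        CompressionRatioSketch ratio = new CompressionRatioSketch();
        System.out.println(ratio); // compression ratio: NA
        ratio.add(250, 1000);      // 25%
        ratio.add(500, 1000);      // 50%
        System.out.println(ratio); // compression ratio: 37%
    }
}
```

Note that this is an unweighted mean: a tiny record counts as much as a large one, so the figure can differ from the ratio of total compressed bytes to total input bytes.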
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/tool/CutWarc.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/tool/CutWarc.java
new file mode 100644
index 0000000..f03f88c
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/tool/CutWarc.java
@@ -0,0 +1,150 @@
+package it.unimi.dsi.law.warc.tool;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.io.RandomAccessFile;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.io.FastBufferedInputStream;
+import it.unimi.dsi.fastutil.io.FastBufferedOutputStream;
+import it.unimi.dsi.fastutil.longs.LongArrays;
+import it.unimi.dsi.io.FileLinesCollection;
+import it.unimi.dsi.law.warc.io.GZWarcRecord;
+import it.unimi.dsi.law.warc.io.WarcRecord;
+import it.unimi.dsi.law.warc.io.WarcRecord.FormatException;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.util.StringMap;
+
+// RELEASE-STATUS: DIST
+
+/** A class to extract specific records from a WARC file. */
+
+public class CutWarc {
+ private final static Logger LOGGER = LoggerFactory.getLogger(CutWarc.class);
+
+ public static void run(final FastBufferedInputStream warc, final RandomAccessFile idx, final boolean isGZippedInput, final boolean isGZippedOutput, long[] record, int recordCount, final OutputStream out) throws IOException, FormatException {
+ final WarcRecord inRecord = isGZippedInput ? new GZWarcRecord() : new WarcRecord();
+ final WarcRecord outRecord = isGZippedOutput ? new GZWarcRecord() : new WarcRecord();
+
+ final ProgressLogger logger = new ProgressLogger(LOGGER, "documents");
+
+ logger.start("Cutting documents");
+ for (int i = 0; i < recordCount; i++) {
+ idx.seek(record[i] * 8);
+ final long pos = idx.readLong();
+ warc.position(pos);
+ inRecord.resetRead();
+ inRecord.read(warc);
+ outRecord.copy(inRecord);
+ outRecord.write(out);
+ logger.lightUpdate();
+ }
+
+ logger.stop();
+ }
+
+ final static int IO_BUFFER_SIZE = 64 * 1024;
+
+ @SuppressWarnings("unchecked")
+ public static void main(String arg[]) throws Exception {
+ SimpleJSAP jsap = new SimpleJSAP(CutWarc.class.getName(), "Cuts (that is, extracts records from) a WARC file. It requires an index.",
+ new Parameter[] {
+ new Switch("gzip", 'z', "gzip", "Tells if the input warc is compressed."),
+ new Switch("outzip", 'Z', "outzip", "Tells if the output warc must be compressed."),
+ new Switch("permissive", 'p', "permissive", "Ignore unknown urls instead of throwing an exception"),
+ new FlaggedOption("recordFile", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'r', "recordFile", "A file containing, one per line, the ordinal numbers or URL of records to be output."),
+ new FlaggedOption("urlMap", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'm', "url-map", "The term map from URL to record number."),
+ new UnflaggedOption("warcFile", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The Warc file basename."),
+ new UnflaggedOption("recordSpec", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, JSAP.GREEDY , "The spec (ordinal number or URL) of records to be output."),
+ });
+
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ CharSequence[] recordSpec = null;
+ if (!jsapResult.userSpecified("recordFile") && !jsapResult.userSpecified("recordSpec"))
+ throw new IllegalArgumentException("One of the two options recordFile and recordSpec must be set.");
+ if (jsapResult.userSpecified("recordSpec") && jsapResult.userSpecified("recordFile"))
+ throw new IllegalArgumentException("You cannot specify both recordFile and recordSpec options");
+
+ if (jsapResult.userSpecified("recordSpec"))
+ recordSpec = jsapResult.getStringArray("recordSpec");
+ else
+ // TODO Variable charset spec
+ recordSpec = new FileLinesCollection(jsapResult.getString("recordFile"), "UTF-8").allLines().toArray(new CharSequence[0]);
+
+ final String warcFile = jsapResult.getString("warcFile");
+ final boolean isGZippedInput = jsapResult.getBoolean("gzip");
+ final boolean isGZippedOutput = jsapResult.getBoolean("outzip");
+ final boolean bePermissive = jsapResult.getBoolean("permissive");
+
+ final long[] record = new long[recordSpec.length];
+ final StringMap<? extends CharSequence> map = jsapResult.getString("urlMap") == null? null : (StringMap<? extends CharSequence>)BinIO.loadObject(jsapResult.getString("urlMap"));
+
+ int recordCount = 0;
+
+ for (int i = 0; i < recordSpec.length; i++) {
+ try {
+ record[recordCount] = Long.parseLong(recordSpec[i].toString());
+ if (record[recordCount] >= 0) recordCount++; // Negative record numbers are skipped.
+ } catch (NumberFormatException e) {
+ // Not a number: treat the spec as a URL and resolve it through the map.
+ if (map == null) throw new RuntimeException("URLs cannot be specified if a map is not provided");
+ record[recordCount] = map.getLong(recordSpec[i]);
+ if (record[recordCount] >= 0) recordCount++;
+ else if (!bePermissive) throw new RuntimeException("URL " + recordSpec[i] + " cannot be resolved");
+ }
+ }
+
+
+ // Sort the record numbers so that the index and the WARC file are read sequentially.
+ LongArrays.quickSort(record, 0, recordCount);
+
+
+ final FastBufferedInputStream warc = new FastBufferedInputStream(new FileInputStream(new File(warcFile + ".warc" + (isGZippedInput ? ".gz" : ""))), IO_BUFFER_SIZE);
+ final RandomAccessFile idx = new RandomAccessFile(new File(warcFile + ".warc" + (isGZippedInput ? ".gz" : "") + ".idx"), "r");
+ final FastBufferedOutputStream out = new FastBufferedOutputStream(System.out, IO_BUFFER_SIZE);
+
+ run(warc, idx, isGZippedInput, isGZippedOutput, record, recordCount, out);
+
+ warc.close();
+ idx.close();
+ out.close();
+ }
+}
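`CutWarc.run` locates record `n` by reading the long stored at byte `n * 8` of the `.idx` file and seeking the WARC stream to that offset, after sorting the requested record numbers. The sketch below assumes, as that `seek`/`readLong` pair implies, that the index is a flat sequence of big-endian 8-byte offsets; the class, method names and temporary file are hypothetical and only illustrate the lookup.

```java
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

public class WarcIndexSketch {
    /** Writes record start offsets as consecutive 8-byte big-endian longs (assumed .idx layout). */
    static void writeIndex(File idx, long[] offsets) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(idx))) {
            for (long offset : offsets) out.writeLong(offset);
        }
    }

    /** Returns the start offset of record number n, as CutWarc.run does with seek(n * 8). */
    static long offsetOf(RandomAccessFile idx, long n) throws IOException {
        idx.seek(n * Long.BYTES);
        return idx.readLong();
    }

    /** Sorts the requested record numbers and maps each one to its offset, so the WARC file is read forward. */
    static long[] offsetsFor(File idx, long[] records) throws IOException {
        long[] sorted = records.clone();
        Arrays.sort(sorted);
        long[] offsets = new long[sorted.length];
        try (RandomAccessFile raf = new RandomAccessFile(idx, "r")) {
            for (int i = 0; i < sorted.length; i++) offsets[i] = offsetOf(raf, sorted[i]);
        }
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        File idx = File.createTempFile("sketch", ".idx");
        idx.deleteOnExit();
        writeIndex(idx, new long[] { 0, 1_024, 5_000 });
        System.out.println(Arrays.toString(offsetsFor(idx, new long[] { 2, 0 }))); // [0, 5000]
    }
}
```

Sorting first means the offsets come out in increasing order, so a buffered input stream only ever moves forward through the WARC file.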
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/tool/ExtractDigestUrls.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/tool/ExtractDigestUrls.java
new file mode 100644
index 0000000..e334fe5
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/tool/ExtractDigestUrls.java
@@ -0,0 +1,113 @@
+package it.unimi.dsi.law.warc.tool;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.OutputStreamWriter;
+import java.io.PrintWriter;
+import java.util.concurrent.TimeUnit;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.io.FastBufferedInputStream;
+import it.unimi.dsi.fastutil.io.FastBufferedOutputStream;
+import it.unimi.dsi.law.warc.filters.Filter;
+import it.unimi.dsi.law.warc.filters.parser.FilterParser;
+import it.unimi.dsi.law.warc.io.GZWarcRecord;
+import it.unimi.dsi.law.warc.io.HttpResponseFilteredIterator;
+import it.unimi.dsi.law.warc.io.WarcRecord;
+import it.unimi.dsi.law.warc.util.HttpResponse;
+import it.unimi.dsi.law.warc.util.Util;
+import it.unimi.dsi.law.warc.util.WarcHttpResponse;
+import it.unimi.dsi.logging.ProgressLogger;
+
+// RELEASE-STATUS: DIST
+
+/** A tool to extract digests and URLs from response records of a WARC file. */
+
+public class ExtractDigestUrls {
+ private final static Logger LOGGER = LoggerFactory.getLogger(ExtractDigestUrls.class);
+
+ public static void run(final FastBufferedInputStream in, final boolean isGZipped, final Filter<HttpResponse> filter, final PrintWriter pw) throws IOException {
+ final WarcRecord record = isGZipped ? new GZWarcRecord() : new WarcRecord();
+ final ProgressLogger pl = new ProgressLogger(LOGGER, 1, TimeUnit.MINUTES, "records");
+ final WarcHttpResponse response = new WarcHttpResponse();
+ final HttpResponseFilteredIterator it = new HttpResponseFilteredIterator(in, record, response, filter);
+
+ pl.start("Listing...");
+ long pos = -1;
+ WarcRecord.Header header = null;
+ try {
+ while (it.hasNext()) {
+ // ALERT: meaningless position detection?
+ pos = in.position();
+ it.next();
+ header = record.header;
+ String digest = response.digest() != null ? Util.toHexString(response.digest()) : header.recordId.toString();
+ pw.println(digest + "\t" + header.subjectUri + "\t" + (response.isDuplicate() ? "D" : "N"));
+ pl.update();
+ }
+ } catch (RuntimeException e) {
+ System.err.println("Got " + e);
+ System.err.println("Position: " + pos + ", last url header:\n" + header);
+ throw e;
+ }
+ pl.done();
+ }
+
+ public static final String DEFAULT_BUFFER_SIZE = "64Ki";
+
+ public static void main(String arg[]) throws Exception {
+ SimpleJSAP jsap = new SimpleJSAP(ExtractDigestUrls.class.getName(), "Extracts digests and URLs from http response records of a WARC file.",
+ new Parameter[] {
+ new Switch("gzip", 'z', "gzip", "Whether the Warc file is compressed."),
+ new FlaggedOption("filter", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'f', "filter", "The filter."),
+ new FlaggedOption("bufferSize", JSAP.INTSIZE_PARSER, DEFAULT_BUFFER_SIZE, JSAP.NOT_REQUIRED, 'b', "buffer-size", "The size of an I/O buffer."),
+ new UnflaggedOption("warcFile", JSAP.STRING_PARSER, "-", JSAP.REQUIRED, JSAP.NOT_GREEDY, "The Warc input file basename (if not present, or -, stdin will be used)."),
+ });
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final boolean isGZipped = jsapResult.getBoolean("gzip");
+ final String filterString = jsapResult.getString("filter") == null ? "TRUE" : jsapResult.getString("filter");
+ final int bufferSize = jsapResult.getInt("bufferSize");
+ final String warcFile = jsapResult.getString("warcFile");
+
+ final Filter<HttpResponse> filter = new FilterParser<HttpResponse>(HttpResponse.class).parse(filterString);
+
+ final FastBufferedInputStream in = new FastBufferedInputStream(warcFile.equals("-") ? System.in : new FileInputStream(new File(warcFile + ".warc" + (isGZipped ? ".gz" : ""))), bufferSize);
+ final PrintWriter pw = new PrintWriter(new OutputStreamWriter(new FastBufferedOutputStream(System.out, bufferSize), "ASCII"));
+
+ try {
+ run(in, isGZipped, filter, pw);
+ }
+ finally {
+ in.close();
+ pw.close();
+ }
+ }
+}
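ExtractDigestUrls.run prints one line per accepted response: the hex digest (or, when the response carries no digest, the record ID), the subject URI, and D or N for duplicate or non-duplicate, separated by tabs. A minimal sketch of reading that format back follows; the class DigestUrlLine is hypothetical and not part of LAW.

```java
/**
 * Hypothetical holder for one line printed by ExtractDigestUrls.run:
 * digest (or record ID), subject URI and duplicate flag, tab-separated.
 */
final class DigestUrlLine {
    final String digest;
    final String uri;
    final boolean duplicate;

    DigestUrlLine(String digest, String uri, boolean duplicate) {
        this.digest = digest;
        this.uri = uri;
        this.duplicate = duplicate;
    }

    /** Parses "digest TAB uri TAB D|N"; rejects anything else. */
    static DigestUrlLine parse(String line) {
        // Limit -1 keeps trailing empty fields, so truncated lines fail the length check.
        String[] f = line.split("\t", -1);
        if (f.length != 3) throw new IllegalArgumentException("Expected 3 tab-separated fields: " + line);
        switch (f[2]) {
            case "D": return new DigestUrlLine(f[0], f[1], true);
            case "N": return new DigestUrlLine(f[0], f[1], false);
            default: throw new IllegalArgumentException("Unknown duplicate flag: " + f[2]);
        }
    }

    /** Formats the fields the same way run(...) prints them. */
    String format() {
        return digest + "\t" + uri + "\t" + (duplicate ? "D" : "N");
    }
}
```

Since URIs cannot contain raw tab characters, splitting on tabs is unambiguous for well-formed output.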
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/tool/ExtractLinks.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/tool/ExtractLinks.java
new file mode 100644
index 0000000..1bf5ff9
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/tool/ExtractLinks.java
@@ -0,0 +1,195 @@
+package it.unimi.dsi.law.warc.tool;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.OutputStreamWriter;
+import java.io.PrintWriter;
+import java.net.URI;
+import java.util.concurrent.TimeUnit;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.big.util.StringMap;
+import it.unimi.dsi.fastutil.ints.IntArrays;
+import it.unimi.dsi.fastutil.ints.IntOpenHashSet;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.io.FastBufferedInputStream;
+import it.unimi.dsi.fastutil.io.FastBufferedOutputStream;
+import it.unimi.dsi.law.warc.filters.Filter;
+import it.unimi.dsi.law.warc.filters.parser.FilterParser;
+import it.unimi.dsi.law.warc.io.GZWarcRecord;
+import it.unimi.dsi.law.warc.io.HttpResponseFilteredIterator;
+import it.unimi.dsi.law.warc.io.WarcRecord;
+import it.unimi.dsi.law.warc.parser.HTMLParser;
+import it.unimi.dsi.law.warc.parser.HTMLParser.SetLinkReceiver;
+import it.unimi.dsi.law.warc.util.HttpResponse;
+import it.unimi.dsi.law.warc.util.WarcHttpResponse;
+import it.unimi.dsi.logging.ProgressLogger;
+
+// RELEASE-STATUS: DIST
+
+/** Extracts links from a WARC file.
+ *
+ * <p>This class scans a WARC file, parsing pages and extracting links. Links are resolved using
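+ * a given {@link StringMap}. The resulting successor lists are printed one
+ * per line: the index of the source node followed by its distinct successor indices in increasing order, all tab-separated.
+ * If no {@link StringMap} is given, each line contains instead the URL of a page followed by the URLs it links to.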
+ *
+ * <p>Optionally, it is possible to specify a secondary {@link StringMap} for duplicates. It
+ * must return, for each duplicate, the number of the corresponding archetype (the page that it is equal to). This
+ * map is usually a {@link it.unimi.dsi.law.warc.util.RemappedStringMap}.
+ */
+
+public class ExtractLinks {
+ private final static Logger LOGGER = LoggerFactory.getLogger(ExtractLinks.class);
+
+ final public static String DEFAULT_BUFFER_SIZE = "64Ki";
+
+ /** Extracts links from a WARC file.
+ *
+ * @param in the WARC file as an input stream.
+ * @param isGZipped whether <code>in</code> is compressed.
+ * @param filter the filter.
+ * @param pw a print writer where the links will be printed in ASCII format (node number followed by successors).
+ * @param urls the term map for URLs, or {@code null} to print each page URL followed by the URLs it links to.
+ * @param duplicates the term map for duplicate URLs, or {@code null}.
+ */
+
+ public static void run(final FastBufferedInputStream in, final boolean isGZipped, final Filter<HttpResponse> filter, final PrintWriter pw, final StringMap<? extends CharSequence> urls, final StringMap<? extends CharSequence> duplicates) throws IOException {
+ final WarcRecord record = isGZipped ? new GZWarcRecord() : new WarcRecord();
+ final WarcHttpResponse response = new WarcHttpResponse();
+ final HttpResponseFilteredIterator it = new HttpResponseFilteredIterator(in, record, response, filter);
+ // TODO: check this size
+ final HTMLParser parser = new HTMLParser();
+ final SetLinkReceiver setLinkReceiver = new SetLinkReceiver();
+ final IntOpenHashSet successors = new IntOpenHashSet();
+ int[] successor = IntArrays.EMPTY_ARRAY;
+
+ final ProgressLogger pl = new ProgressLogger(LOGGER, 1, TimeUnit.MINUTES, "pages");
+
+ pl.start("Extracting...");
+ int k, d;
+
+ while (it.hasNext()) {
+ it.next();
+
+ if (urls != null) {
+ k = (int)urls.getLong(response.uri().toString());
+ if (response.isDuplicate()) {
+ if (k >= 0) {
+ LOGGER.error("URL " + response.uri() + " is contained in the URL map but it is a duplicate");
+ pw.println(k);
+ pl.update();
+ }
+ continue;
+ }
+
+ if (k == -1) {
+ LOGGER.error("URL " + response.uri() + " is not contained in the URL map; this may happen if the original digest/URL file was sorted unstably or if there are several non-duplicate pages with the same digest");
+ continue;
+ }
+ pw.print(k);
+ pw.print('\t');
+ parser.parse(response, setLinkReceiver);
+ successors.clear();
+
+ for (URI url : setLinkReceiver) {
+ if ((k = (int)urls.getLong(url.toString())) != -1) {
+ LOGGER.debug("Adding successor " + url + ":" + k);
+ successors.add(k);
+ }
+ else if (duplicates != null && (k = (int)duplicates.getLong(url.toString())) != -1) {
+ LOGGER.debug("Adding duplicate " + url + ":" + k);
+ successors.add(k);
+ }
+ }
+
+ d = successors.size(); // Outdegree
+ successors.toArray(successor = IntArrays.grow(successor, d, 0));
+ IntArrays.quickSort(successor, 0, d);
+
+ for(int i = 0; i < d; i++) {
+ pw.print(successor[i]);
+ pw.print('\t');
+ }
+ }
+ else {
+ pw.print(response.uri());
+ parser.parse(response, setLinkReceiver);
+ for (URI url : setLinkReceiver) {
+ pw.print('\t');
+ pw.print(url);
+ }
+ }
+
+ pw.println();
+ pl.update();
+ }
+ pl.done();
+ }
+ @SuppressWarnings("unchecked")
+ public static void main(String arg[]) throws Exception {
+ SimpleJSAP jsap = new SimpleJSAP(ExtractLinks.class.getName(), "Extract links in pages from a WARC file.",
+ new Parameter[] {
+ new FlaggedOption("bufferSize", JSAP.INTSIZE_PARSER, DEFAULT_BUFFER_SIZE, JSAP.NOT_REQUIRED, 'b', "buffer-size", "The size of an I/O buffer."),
+ new Switch("gzip", 'z', "gzip", "Tells if the warc is compressed."),
+ new FlaggedOption("filter", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'f', "filter", "The filter."),
+ new FlaggedOption("start", JSAP.LONG_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 's', "start", "The starting offset (in bytes) in the WARC file (mainly for debugging purposes)."),
+ new FlaggedOption("duplicates", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'd', "duplicates", "The (remapped) term map for duplicate URLs. If not present, only links pointing to URLs in <urls> will be used."),
+ new FlaggedOption("urls", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'u', "urls", "The term map for the node URLs."),
+ new UnflaggedOption("warcFile", JSAP.STRING_PARSER, "-", JSAP.REQUIRED, JSAP.NOT_GREEDY, "The WARC file basename (if not present, or -, stdin will be used)."),
+ });
+
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final boolean isGZipped = jsapResult.getBoolean("gzip");
+ final String filterString = jsapResult.getString("filter") == null ? "TRUE" : jsapResult.getString("filter");
+ final String warcFile = jsapResult.getString("warcFile");
+ final int bufferSize = jsapResult.getInt("bufferSize");
+
+ final Filter<HttpResponse> filter = new FilterParser<HttpResponse>(HttpResponse.class).parse(filterString);
+
+ final StringMap<? extends CharSequence> urls = (StringMap<? extends CharSequence>)(jsapResult.userSpecified("urls") ? BinIO.loadObject(jsapResult.getString("urls")) : null);
+ final StringMap<? extends CharSequence> duplicates = (StringMap<? extends CharSequence>)(jsapResult.userSpecified("duplicates") ? BinIO.loadObject(jsapResult.getString("duplicates")) : null);
+
+ final FastBufferedInputStream in = new FastBufferedInputStream(warcFile.equals("-") ? System.in : new FileInputStream(new File(warcFile + ".warc" + (isGZipped ? ".gz" : ""))), bufferSize);
+ if (jsapResult.userSpecified("start")) in.skip(jsapResult.getLong("start"));
+ final PrintWriter pw = new PrintWriter(new OutputStreamWriter(new FastBufferedOutputStream(System.out, bufferSize), "ASCII"));
+
+ try {
+ run(in, isGZipped, filter, pw, urls, duplicates);
+ }
+ finally {
+ in.close();
+ pw.close();
+ }
+ }
+}
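With URL maps, ExtractLinks.run resolves each link through urls and then through duplicates, collects the resulting indices in a set, sorts them and prints them after the source index, each value followed by a tab. The sketch below reproduces that line format with plain java.util collections; SuccessorLine is hypothetical, and Map.get returning null stands in for StringMap.getLong returning -1.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical re-creation of one successor-list line as printed by ExtractLinks.run. */
final class SuccessorLine {
    /**
     * Resolves links through urls first, then duplicates (which map a duplicate URL to
     * its archetype's index), drops unresolved links and repeated successors, sorts the
     * rest, and prints the source and each successor, each followed by a tab.
     */
    static String format(int source, List<String> links, Map<String, Integer> urls, Map<String, Integer> duplicates) {
        Set<Integer> successors = new HashSet<>();
        for (String link : links) {
            Integer k = urls.get(link);
            if (k == null && duplicates != null) k = duplicates.get(link);
            if (k != null) successors.add(k);
        }
        int[] sorted = new int[successors.size()];
        int i = 0;
        for (int s : successors) sorted[i++] = s;
        Arrays.sort(sorted);
        StringBuilder sb = new StringBuilder().append(source).append('\t');
        for (int s : sorted) sb.append(s).append('\t');
        return sb.toString();
    }
}
```

Using a set before sorting is what makes two links to the same page (or to a page and one of its duplicates) count once toward the outdegree.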
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/tool/GZWarcStats.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/tool/GZWarcStats.java
new file mode 100644
index 0000000..49a97d8
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/tool/GZWarcStats.java
@@ -0,0 +1,137 @@
+package it.unimi.dsi.law.warc.tool;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.IOException;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.io.FastBufferedInputStream;
+import it.unimi.dsi.law.warc.io.GZWarcRecord;
+import it.unimi.dsi.law.warc.io.GZWarcRecord.GZHeader;
+import it.unimi.dsi.law.warc.io.WarcRecord.FormatException;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.stat.SummaryStats;
+
+// RELEASE-STATUS: DIST
+
+/** A tool to compute some statistics about a gzipped WARC file. */
+
+public class GZWarcStats {
+ private final static Logger LOGGER = LoggerFactory.getLogger(GZWarcStats.class);
+
+ public static long run(final FastBufferedInputStream in, final SummaryStats uncompressedSize, final SummaryStats compressedSize, final SummaryStats compressionRatio) throws IOException, FormatException {
+ final GZWarcRecord r = new GZWarcRecord();
+ final ProgressLogger pl = new ProgressLogger(LOGGER, "records");
+ pl.logInterval = ProgressLogger.TEN_SECONDS;
+ pl.start("Analyzing...");
+ while (r.read(in) != -1) {
+ final GZHeader gzheader = r.gzheader;
+ compressedSize.add(gzheader.compressedSkipLength);
+ uncompressedSize.add(gzheader.uncompressedSkipLength);
+ compressionRatio.add((int)(100 * (double)gzheader.compressedSkipLength / gzheader.uncompressedSkipLength));
+ pl.update();
+ }
+ pl.done();
+ return pl.count;
+ }
+
+ final static int IO_BUFFER_SIZE = 64 * 1024;
+
+ public static void main(String arg[]) throws Exception {
+ SimpleJSAP jsap = new SimpleJSAP(GZWarcStats.class.getName(), "Compute some statistics about a gzipped warc file.",
+ new Parameter[] {
+ new Switch("html", 'h', "html", "Generate output in HTML format."),
+ new Switch("headers", 'H', "header", "Also print the enclosing TABLE element and its header rows (HTML output only)."),
+ new UnflaggedOption("warcFile", JSAP.STRING_PARSER, "-", JSAP.REQUIRED, JSAP.NOT_GREEDY, "The gzipped Warc file basename (if not present, or -, stdin will be used).")
+ });
+
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) return;
+
+ final String warcFile = jsapResult.getString("warcFile");
+ final boolean html = jsapResult.getBoolean("html");
+ final boolean headers = jsapResult.getBoolean("headers");
+
+ final SummaryStats uncompressedSize = new SummaryStats();
+ final SummaryStats compressedSize = new SummaryStats();
+ final SummaryStats compressionRatio = new SummaryStats();
+
+ final FastBufferedInputStream in = new FastBufferedInputStream(warcFile.equals("-") ? System.in : new FileInputStream(new File( warcFile + ".warc.gz")), IO_BUFFER_SIZE);
+ final long n = run(in, uncompressedSize, compressedSize, compressionRatio);
+ in.close();
+
+ if (html) {
+
+ if (headers) {
+ System.out.println("<TABLE border='1'>");
+ System.out.println("<TR><TH rowspan='2'>Name<TH rowspan='2'>Num.<br>Records<TH colspan='5'>Compressed byte size<TH colspan='5'>Uncompressed byte size<TH colspan='4'>Compression ratio (%)");
+ System.out.println("<TR><TH>min<TH>max<TH>average<TH>stdev<TH>sum<TH>min<TH>max<TH>average<TH>stdev<TH>sum<TH>min<TH>max<TH>average<TH>stdev");
+ }
+
+ System.out.print("<tr><td>" + warcFile + "<td>" + n);
+ System.out.print("<td>" + (long)compressedSize.min()
+ + "<td>" + (long)compressedSize.max()
+ + "<td>" + (int)(100 * compressedSize.mean()) / 100.0
+ + "<td>" + (int)(100 * compressedSize.standardDeviation()) / 100.0
+ + "<td>" + (long)compressedSize.sum());
+ System.out.print("<td>" + (long)uncompressedSize.min()
+ + "<td>" + (long)uncompressedSize.max()
+ + "<td>" + (int)(100 * uncompressedSize.mean()) / 100.0
+ + "<td>" + (int)(100 * uncompressedSize.standardDeviation()) / 100.0
+ + "<td>" + (long)uncompressedSize.sum());
+ System.out.print("<td>" + (long)compressionRatio.min()
+ + "<td>" + (long)compressionRatio.max()
+ + "<td>" + (int)(100 * compressionRatio.mean()) / 100.0
+ + "<td>" + (int)(100 * compressionRatio.standardDeviation()) / 100.0);
+ System.out.println();
+
+ if (headers) System.out.println("</TABLE>");
+
+ } else {
+
+ System.out.println("Records: " + n);
+ System.out.println("Compressed size: min = " + (long)compressedSize.min()
+ + ", max = " + (long)compressedSize.max()
+ + ", avg = " + compressedSize.mean()
+ + ", sd = " + compressedSize.standardDeviation()
+ + ", sum = " + (long)compressedSize.sum());
+ System.out.println("Uncompressed size: min = " + (long)uncompressedSize.min()
+ + ", max = " + (long)uncompressedSize.max()
+ + ", avg = " + uncompressedSize.mean()
+ + ", sd = " + uncompressedSize.standardDeviation()
+ + ", sum = " + (long)uncompressedSize.sum());
+ System.out.println("Compression ratio: min = " + (long)compressionRatio.min()
+ + "%, max = " + (long)compressionRatio.max()
+ + "%, avg = " + compressionRatio.mean()
+ + "%, sd = " + compressionRatio.standardDeviation() +"%");
+
+ }
+ }
+}
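GZWarcStats records the compression ratio as an integer percentage and, in HTML mode, truncates means and standard deviations to two decimals via (int)(100 * x) / 100.0. A sketch of that arithmetic follows (CompressionStats is hypothetical). It casts to long rather than int, because the int cast saturates at Integer.MAX_VALUE once 100x exceeds it, and it adds a zero-size guard that the original lacks.

```java
/** Hypothetical helpers mirroring the arithmetic in GZWarcStats. */
final class CompressionStats {
    /** Compressed size as an integer percentage of the uncompressed size, truncated toward zero. */
    static int ratioPercent(long compressed, long uncompressed) {
        // Guard added in this sketch: the original divides without checking for zero.
        if (uncompressed <= 0) throw new IllegalArgumentException("uncompressed size must be positive");
        return (int) (100 * (double) compressed / uncompressed);
    }

    /** Truncates (does not round) to two decimals; the long cast avoids int saturation. */
    static double truncate2(double x) {
        return (long) (100 * x) / 100.0;
    }
}
```

For example, a mean of 3e7 bytes survives truncate2 unchanged, whereas the int-based expression would print 21474836.47.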
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/tool/GrepWarc.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/tool/GrepWarc.java
new file mode 100644
index 0000000..d56eab9
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/tool/GrepWarc.java
@@ -0,0 +1,113 @@
+package it.unimi.dsi.law.warc.tool;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.util.SortedSet;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.io.FastBufferedInputStream;
+import it.unimi.dsi.fastutil.io.FastBufferedOutputStream;
+import it.unimi.dsi.fastutil.objects.ObjectRBTreeSet;
+import it.unimi.dsi.law.warc.filters.AbstractFilter;
+import it.unimi.dsi.law.warc.filters.Filter;
+import it.unimi.dsi.law.warc.filters.Filters;
+import it.unimi.dsi.law.warc.filters.parser.FilterParser;
+import it.unimi.dsi.law.warc.io.GZWarcRecord;
+import it.unimi.dsi.law.warc.io.WarcFilteredIterator;
+import it.unimi.dsi.law.warc.io.WarcRecord;
+import it.unimi.dsi.logging.ProgressLogger;
+
+// RELEASE-STATUS: DIST
+
+/** A "grep" for WARC files. */
+
+public class GrepWarc {
+ private final static Logger LOGGER = LoggerFactory.getLogger(GrepWarc.class);
+
+ /**
+ * This method acts as a sort of "grep" for WARC files.
+ *
+ * <p> It reads from a given input stream a sequence of (possibly compressed)
+ * WARC records, and writes the ones accepted by the specified
+ * {@link AbstractFilter} to a given output stream (uncompressed).
+ *
+ * @param in the input stream.
+ * @param isGZipped tells if the input stream contains compressed WARC records.
+ * @param filter the filter.
+ * @param out the output stream.
+ * @throws IOException
+ */
+ public static void run(final FastBufferedInputStream in, final boolean isGZipped, final Filter<WarcRecord> filter, final OutputStream out) throws IOException {
+ final WarcRecord inRecord = isGZipped ? new GZWarcRecord() : new WarcRecord();
+ final WarcRecord outRecord = new WarcRecord();
+ final ProgressLogger pl = new ProgressLogger(LOGGER, "records");
+ final WarcFilteredIterator it = new WarcFilteredIterator(in, inRecord, filter, pl);
+
+ pl.logInterval = ProgressLogger.TEN_SECONDS;
+ pl.start("Grepping...");
+ while (it.hasNext()) {
+ it.next(); // this will update pl
+ outRecord.copy(inRecord);
+ outRecord.write(out);
+ }
+ pl.done();
+ }
+
+ final static int IO_BUFFER_SIZE = 64 * 1024;
+
+ public static void main(String arg[]) throws Exception {
+ SortedSet<String> filterNames = new ObjectRBTreeSet<String>();
+
+ for(Class<? extends Filter<?>> c : Filters.standardFilters()) filterNames.add(c.getSimpleName());
+
+ SimpleJSAP jsap = new SimpleJSAP(GrepWarc.class.getName(), "Grep for warc files.",
+ new Parameter[] {
+ new Switch("gzip", 'z', "gzip", "Tells if the warc is compressed."),
+ new UnflaggedOption("filter", JSAP.STRING_PARSER, JSAP.REQUIRED, "The filter (standard filters: " + filterNames + ")."),
+ new UnflaggedOption("warcFile", JSAP.STRING_PARSER, "-", JSAP.REQUIRED, JSAP.NOT_GREEDY, "The Warc input file basename (if not present, or -, stdin will be used)."),
+ });
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final boolean isGZipped = jsapResult.getBoolean("gzip");
+ final Filter<WarcRecord> filter = new FilterParser<WarcRecord>(WarcRecord.class).parse(jsapResult.getString("filter"));
+ final String warcFile = jsapResult.getString("warcFile");
+
+ final FastBufferedInputStream in = new FastBufferedInputStream(warcFile.equals("-") ? System.in : new FileInputStream(new File(warcFile + ".warc" + (isGZipped ? ".gz" : ""))), IO_BUFFER_SIZE);
+ final FastBufferedOutputStream out = new FastBufferedOutputStream(System.out, IO_BUFFER_SIZE);
+
+ run(in, isGZipped, filter, out);
+
+ in.close();
+ out.close();
+ }
+}
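GrepWarc.run is a filter-and-copy loop: every record accepted by the filter is copied and written, uncompressed, to the output stream. A minimal sketch of the same loop, assuming plain strings in place of WarcRecord and a java.util.function.Predicate in place of the LAW Filter; RecordGrep is hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.function.Predicate;

/** Hypothetical grep loop: strings stand in for WARC records, a Predicate for the filter. */
final class RecordGrep {
    /** Writes every accepted record to out and returns how many were written. */
    static long run(Iterator<String> records, Predicate<? super String> filter, OutputStream out) {
        long written = 0;
        try {
            while (records.hasNext()) {
                String record = records.next();
                if (!filter.test(record)) continue; // rejected records are skipped, not copied
                out.write(record.getBytes(StandardCharsets.UTF_8));
                written++;
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return written;
    }
}
```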
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/tool/IndexWarc.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/tool/IndexWarc.java
new file mode 100644
index 0000000..624ce83
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/tool/IndexWarc.java
@@ -0,0 +1,101 @@
+package it.unimi.dsi.law.warc.tool;
+
+import java.io.DataOutput;
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.OutputStream;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.io.FastBufferedInputStream;
+import it.unimi.dsi.fastutil.io.FastBufferedOutputStream;
+import it.unimi.dsi.law.warc.io.GZWarcRecord;
+import it.unimi.dsi.law.warc.io.WarcRecord;
+import it.unimi.dsi.law.warc.io.WarcRecord.FormatException;
+import it.unimi.dsi.logging.ProgressLogger;
+
+// RELEASE-STATUS: DIST
+
+/** A tool to index a WARC file. */
+
+public class IndexWarc {
+ private final static Logger LOGGER = LoggerFactory.getLogger(IndexWarc.class);
+
+ /**
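+ * This method reads from a given input stream a sequence of WARC records and writes to a given output stream
+ * their byte offsets as longs (via {@link DataOutput#writeLong(long)}): 0, followed by the position just past each record,
+ * so n records yield n + 1 offsets.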
+ *
+ * @param in the input warc stream.
+ * @param isGZipped tells if the input stream contains compressed WARC records.
+ * @param out the output index stream.
+ * @throws IOException
+ * @throws FormatException
+ */
+ public static void run(final FastBufferedInputStream in, final boolean isGZipped, final OutputStream out) throws IOException, FormatException {
+ final WarcRecord inRecord = isGZipped ? new GZWarcRecord() : new WarcRecord();
+ final ProgressLogger pl = new ProgressLogger(LOGGER, "records");
+ final DataOutput dout = new DataOutputStream(out);
+ pl.logInterval = ProgressLogger.TEN_SECONDS;
+
+ pl.start("Indexing...");
+ dout.writeLong(0);
+ while (inRecord.skip(in) != -1) {
+ dout.writeLong(in.position());
+ pl.update();
+ }
+ pl.done();
+ }
+
+ final static int IO_BUFFER_SIZE = 64 * 1024;
+
+ public static void main(String arg[]) throws Exception {
+ SimpleJSAP jsap = new SimpleJSAP(IndexWarc.class.getName(), "Index a warc file.",
+ new Parameter[] {
+ new Switch("gzip", 'z', "gzip", "Tells if the warc is compressed."),
+ new UnflaggedOption("warcFile", JSAP.STRING_PARSER, "-", JSAP.REQUIRED, JSAP.NOT_GREEDY, "The Warc file basename (if not present, or -, stdin/stdout will be used).")
+ });
+
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) return;
+
+ final String warcFile = jsapResult.getString("warcFile");
+ final boolean isGZipped = jsapResult.getBoolean("gzip");
+
+ final FastBufferedInputStream in = new FastBufferedInputStream(warcFile.equals("-") ? System.in : new FileInputStream(new File(warcFile + ".warc" + (isGZipped ? ".gz" : ""))), IO_BUFFER_SIZE);
+ final FastBufferedOutputStream out = new FastBufferedOutputStream(warcFile.equals("-") ? System.out : new FileOutputStream(new File(warcFile + ".warc" + (isGZipped ? ".gz" : "") + ".idx")), IO_BUFFER_SIZE);
+
+ run(in, isGZipped, out);
+
+ in.close();
+ out.close();
+ }
+}
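IndexWarc.run writes 0 and then, after each skipped record, the current stream position, all as big-endian longs from DataOutput.writeLong. An index for n records therefore holds n + 1 offsets, and record i spans offsets[i] to offsets[i + 1]; for compressed input these are presumably offsets in the compressed file. A sketch of reading such an index back (WarcIndexReader is hypothetical):

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for the .idx files produced by IndexWarc. */
final class WarcIndexReader {
    /** Reads all big-endian longs written by DataOutput.writeLong. */
    static long[] readOffsets(InputStream in) {
        DataInputStream din = new DataInputStream(in);
        List<Long> offsets = new ArrayList<>();
        try {
            while (true) {
                try {
                    offsets.add(din.readLong());
                } catch (EOFException eof) {
                    break; // end of the index (a trailing partial long is also ignored)
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return offsets.stream().mapToLong(Long::longValue).toArray();
    }

    /** Length in bytes of record i: the distance between consecutive offsets. */
    static long recordLength(long[] offsets, int i) {
        return offsets[i + 1] - offsets[i];
    }
}
```

With such an index, a reader could skip directly to record i by seeking to offsets[i].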
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/tool/ListGZWarcComments.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/tool/ListGZWarcComments.java
new file mode 100644
index 0000000..79fc1de
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/tool/ListGZWarcComments.java
@@ -0,0 +1,97 @@
+package it.unimi.dsi.law.warc.tool;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.OutputStreamWriter;
+import java.io.PrintWriter;
+import java.util.concurrent.TimeUnit;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.io.FastBufferedInputStream;
+import it.unimi.dsi.fastutil.io.FastBufferedOutputStream;
+import it.unimi.dsi.law.warc.io.GZWarcRecord;
+import it.unimi.dsi.law.warc.io.WarcRecord.FormatException;
+import it.unimi.dsi.law.warc.util.Util;
+import it.unimi.dsi.logging.ProgressLogger;
+
+// RELEASE-STATUS: DIST
+
+/** A tool to list the GZip header comments contained in a compressed WARC file. */
+
+public class ListGZWarcComments {
+ private final static Logger LOGGER = LoggerFactory.getLogger(ListGZWarcComments.class);
+
+ /**
+ * Writes to the given writer the GZip header comment field of each record, one per line.
+ *
+ * @param in the input stream.
+ * @param pw the writer.
+ * @throws IOException
+ * @throws FormatException
+ */
+ public static void run(final FastBufferedInputStream in, final PrintWriter pw) throws IOException, FormatException {
+ final GZWarcRecord record = new GZWarcRecord();
+ final ProgressLogger pl = new ProgressLogger(LOGGER, 1, TimeUnit.MINUTES, "records");
+
+ pl.start("Listing...");
+ while (record.skip(in) != -1) { // we just need headers
+ pw.println(Util.getString(record.gzheader.comment));
+ pl.update();
+ }
+ pl.done();
+ }
+
+ public static final String DEFAULT_BUFFER_SIZE = "64Ki";
+
+ public static void main(String arg[]) throws Exception {
+ SimpleJSAP jsap = new SimpleJSAP(ListGZWarcComments.class.getName(), "Lists the gzip comments of a compressed warc file.",
+ new Parameter[] {
+ new FlaggedOption("bufferSize", JSAP.INTSIZE_PARSER, DEFAULT_BUFFER_SIZE, JSAP.NOT_REQUIRED, 'b', "buffer-size", "The size of an I/O buffer."),
+ new UnflaggedOption("warcFile", JSAP.STRING_PARSER, "-", JSAP.REQUIRED, JSAP.NOT_GREEDY, "The compressed Warc input file basename (if not present, or -, stdin will be used)."),
+ });
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final String warcFile = jsapResult.getString("warcFile");
+ final int bufferSize = jsapResult.getInt("bufferSize");
+
+ final FastBufferedInputStream in = new FastBufferedInputStream(warcFile.equals("-") ? System.in : new FileInputStream(new File(warcFile + ".warc.gz")), bufferSize);
+ final PrintWriter pw = new PrintWriter(new OutputStreamWriter(new FastBufferedOutputStream(System.out, bufferSize), "ASCII"));
+
+ try {
+ run(in, pw);
+ }
+ finally {
+ pw.close();
+ in.close();
+ }
+ }
+}
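ListGZWarcComments prints, for every record, the comment stored in its gzip member header. Per RFC 1952 that is the optional FCOMMENT field: a zero-terminated ISO 8859-1 string that follows the optional FEXTRA and FNAME fields. java.util.zip.GZIPInputStream skips this field without exposing it, so outside LAW a hand-written reader is needed; the sketch below (hypothetical GzipHeaderComment class) decodes it from the header bytes of a single member.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical reader for the FCOMMENT field of a gzip member header (RFC 1952). */
final class GzipHeaderComment {
    private static final int FEXTRA = 0x04, FNAME = 0x08, FCOMMENT = 0x10;

    /** Returns the comment of the gzip member starting at b[0], or null if FCOMMENT is not set. */
    static String comment(byte[] b) {
        if (b.length < 10 || (b[0] & 0xFF) != 0x1F || (b[1] & 0xFF) != 0x8B || b[2] != 8)
            throw new IllegalArgumentException("not a deflate-compressed gzip header");
        int flg = b[3] & 0xFF;
        int p = 10; // skip ID1, ID2, CM, FLG, MTIME (4 bytes), XFL, OS
        if ((flg & FEXTRA) != 0) {
            int xlen = (b[p] & 0xFF) | (b[p + 1] & 0xFF) << 8; // XLEN is little-endian
            p += 2 + xlen;
        }
        if ((flg & FNAME) != 0) {
            while (b[p] != 0) p++; // zero-terminated original file name
            p++;
        }
        if ((flg & FCOMMENT) == 0) return null;
        int end = p;
        while (b[end] != 0) end++; // a truncated header throws ArrayIndexOutOfBoundsException
        return new String(b, p, end - p, StandardCharsets.ISO_8859_1); // RFC 1952: ISO 8859-1 text
    }
}
```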
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/tool/package.html b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/tool/package.html
new file mode 100644
index 0000000..efa7ee4
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/tool/package.html
@@ -0,0 +1,13 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<!-- RELEASE-STATUS: DIST -->
+<html>
+ <head>
+ <title>LAW software</title>
+ </head>
+
+ <body>
+
+ <p>Command-line tools that manipulate WARC files.
+
+</body>
+</html>
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/util/AbstractHttpResponse.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/util/AbstractHttpResponse.java
new file mode 100644
index 0000000..540f6bf
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/util/AbstractHttpResponse.java
@@ -0,0 +1,110 @@
+package it.unimi.dsi.law.warc.util;
+
+import java.io.IOException;
+import java.io.UnsupportedEncodingException;
+import java.util.Date;
+import java.util.UUID;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.io.FastByteArrayInputStream;
+import it.unimi.dsi.fastutil.io.FastByteArrayOutputStream;
+import it.unimi.dsi.fastutil.io.MeasurableInputStream;
+import it.unimi.dsi.law.warc.io.MeasurableSequenceInputStream;
+import it.unimi.dsi.law.warc.io.WarcRecord;
+import it.unimi.dsi.law.warc.io.WarcRecord.ContentType;
+import it.unimi.dsi.law.warc.io.WarcRecord.RecordType;
+import it.unimi.dsi.util.XorShift128PlusRandomGenerator;
+
+// RELEASE-STATUS: DIST
+
+/** An abstract implementation of {@link HttpResponse} providing a {@link #toWarcRecord(WarcRecord)} method that can
+ * be used to populate a WARC record (in order to write it). */
+
+public abstract class AbstractHttpResponse implements HttpResponse {
+
+ /** A high-quality pseudorandom generator to generate UUIDs. */
+ protected final XorShift128PlusRandomGenerator random = new XorShift128PlusRandomGenerator();
+
+ /**
+ * Populates a WARC record with contents from this response.
+ *
+ * <p>This method uses the getters of the {@link HttpResponse} interface to populate the given record.
+ * For this reason, concrete implementations of this class must provide an implementation for
+ * {@link HttpResponse#contentAsStream()} that will be used to set up the {@link WarcRecord#block}.
+ *
+ * <p>Moreover, if the concrete implementation through which this method is called implements the
+ * {@link DigestBasedDuplicateDetection} interface, the WARC record will also be populated with the
+ * digest and duplicate-status information; for a duplicate, the block contains only the HTTP status line and headers.
+ *
+ * @param record the record.
+ * @throws UnsupportedEncodingException
+ * @throws IOException if an I/O error occurs.
+ */
+ @SuppressWarnings("resource")
+ public void toWarcRecord(WarcRecord record) throws UnsupportedEncodingException, IOException {
+
+ // fill header
+
+ WarcRecord.Header header = record.header;
+
+ header.recordType = RecordType.RESPONSE;
+ header.subjectUri = uri();
+ header.creationDate = new Date();
+
+ String scheme = header.subjectUri.getScheme();
+ if (scheme == null) throw new IllegalArgumentException("No scheme available for " + header.subjectUri);
+ if (scheme.equals("https")) header.contentType = ContentType.HTTPS;
+ else if (scheme.equals("http")) header.contentType = ContentType.HTTP;
+ else throw new IllegalArgumentException("Only http/https schemes are allowed, but the scheme is " + scheme);
+
+ header.recordId = new UUID(random.nextLong(), random.nextLong());
+ header.anvlFields.clear();
+
+ // fill block with headers only
+
+ FastByteArrayOutputStream buf = new FastByteArrayOutputStream();
+
+ assert statusLine() != null;
+ assert statusLine().toString() != null;
+
+ buf.write(statusLine().toString().getBytes(HEADER_CHARSET));
+ buf.write(WarcRecord.CRLF);
+ Util.writeANVLHeaders(buf, headers(), HEADER_CHARSET);
+ buf.write(WarcRecord.CRLF);
+ buf.flush();
+ final MeasurableInputStream headerBlock = new FastByteArrayInputStream(buf.array, 0, buf.length);
+
+ // deal with block according to duplicate information
+
+ if (this instanceof DigestBasedDuplicateDetection) {
+ DigestBasedDuplicateDetection t = (DigestBasedDuplicateDetection)this;
+ if (t.digest() != null) record.header.anvlFields.put(HttpResponse.DIGEST_HEADER, Util.toHexString(t.digest()));
+ if (t.isDuplicate()) {
+ record.header.anvlFields.put(HttpResponse.ISDUPLICATE_HEADER, "true");
+ record.block = headerBlock;
+ return;
+ }
+ }
+
+ // append content
+
+ record.block = new MeasurableSequenceInputStream(headerBlock, contentAsStream());
+ }
+}
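The block assembled by `toWarcRecord` consists of the status line, a CRLF, the headers, and a blank CRLF line, all encoded in ISO-8859-1. For non-duplicates, the content follows. The sketch below illustrates that layout and the scheme check using only the JDK. The class `HeaderBlockSketch` is hypothetical. The sketch also assumes that `Util.writeANVLHeaders` writes one `Name: value` line terminated by CRLF per header; the source shown here does not confirm that format.

```java
import java.io.ByteArrayOutputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Hypothetical, JDK-only sketch of the header block layout used by toWarcRecord. */
final class HeaderBlockSketch {
    private static final byte[] CRLF = { '\r', '\n' };

    /** Mirrors the scheme check: only http and https are accepted. */
    static String contentTypeFor(URI uri) {
        final String scheme = uri.getScheme();
        if (scheme == null) throw new IllegalArgumentException("No scheme available for " + uri);
        if (scheme.equals("https")) return "HTTPS";
        if (scheme.equals("http")) return "HTTP";
        throw new IllegalArgumentException("Only http/https schemes are allowed, but the scheme is " + scheme);
    }

    /** Status line + CRLF, one "Name: value" CRLF line per header (assumed format), then a final CRLF. */
    static byte[] headerBlock(String statusLine, Map<String, String> headers) {
        final ByteArrayOutputStream buf = new ByteArrayOutputStream();
        write(buf, statusLine);
        buf.write(CRLF, 0, CRLF.length);
        for (Map.Entry<String, String> e : headers.entrySet()) {
            write(buf, e.getKey() + ": " + e.getValue());
            buf.write(CRLF, 0, CRLF.length);
        }
        buf.write(CRLF, 0, CRLF.length); // blank line ending the header section
        return buf.toByteArray();
    }

    private static void write(ByteArrayOutputStream buf, String s) {
        final byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1); // HEADER_CHARSET in the original
        buf.write(bytes, 0, bytes.length);
    }
}
```

For a duplicate response, the original stops at this point and stores only these bytes as the block. Otherwise, it concatenates them with `contentAsStream()`.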
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/util/ByteArrayCharSequence.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/util/ByteArrayCharSequence.java
new file mode 100644
index 0000000..9e89bdf
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/util/ByteArrayCharSequence.java
@@ -0,0 +1,120 @@
+package it.unimi.dsi.law.warc.util;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.bytes.ByteArrays;
+
+// RELEASE-STATUS: DIST
+
+/** An adapter exposing a byte array as an ISO-8859-1-encoded
+ * character sequence.
+ *
+ * <p>An instance of this adapter can be reused by {@linkplain #wrap(byte[], int, int)
+ * wrapping a new byte array}.
+ *
+ * <p>Note that for convenience this class exposes a {@link #hashCode()} method that
+ * returns the same result as {@link String#hashCode()} on {@link #toString()}, but equality is identity-based, not by content.
+ */
+
+public class ByteArrayCharSequence implements CharSequence {
+ /** The underlying byte array. */
+ private byte[] b;
+ /** The first valid byte in {@link #b}. */
+ private int offset;
+ /** The number of valid bytes in {@link #b}, starting at {@link #offset}. */
+ private int length;
+
+ /** Creates a new byte-array character sequence using the provided byte-array fragment.
+ *
+ * @param b a byte array.
+ * @param offset the first valid byte in <code>b</code>.
+ * @param length the number of valid bytes in <code>b</code>, starting at <code>offset</code>.
+ */
+ public ByteArrayCharSequence(final byte[] b, int offset, int length) {
+ wrap(b, offset, length);
+ }
+
+ /** Creates a new byte-array character sequence using the provided byte array.
+ *
+ * @param b a byte array.
+ */
+ public ByteArrayCharSequence(final byte[] b) {
+ this(b, 0, b.length);
+ }
+
+ /** Creates a new empty byte-array character sequence.
+ */
+ public ByteArrayCharSequence() {
+ this(ByteArrays.EMPTY_ARRAY);
+ }
+
+ /** Wraps a byte-array fragment into this byte-array character sequence.
+ *
+ * @param b a byte array.
+ * @param offset the first valid byte in <code>b</code>.
+ * @param length the number of valid bytes in <code>b</code>, starting at <code>offset</code>.
+ * @return this byte-array character sequence.
+ */
+ public ByteArrayCharSequence wrap(final byte[] b, int offset, int length) {
+ ByteArrays.ensureOffsetLength(b, offset, length);
+ this.b = b;
+ this.offset = offset;
+ this.length = length;
+ return this;
+ }
+
+ /** Wraps a byte array into this byte-array character sequence.
+ *
+ * @param b a byte array.
+ */
+ public void wrap(final byte[] b) {
+ wrap(b, 0, b.length);
+ }
+
+ @Override
+ public int length() {
+ return length;
+ }
+
+ @Override
+ public char charAt(int index) {
+ if (index < 0 || index >= length) throw new IndexOutOfBoundsException(Integer.toString(index));
+ return (char)(b[offset + index] & 0xFF);
+ }
+
+ @Override
+ public CharSequence subSequence(int start, int end) {
+ if (start < 0 || end > length || end < 0 || end < start) throw new IndexOutOfBoundsException();
+ return new ByteArrayCharSequence(b, start + offset, end - start);
+ }
+
+ @Override
+ public String toString() {
+ final StringBuilder builder = new StringBuilder();
+ for(int i = 0; i < length; i++) builder.append((char)(b[offset + i] & 0xFF));
+ return builder.toString();
+ }
+
+ @Override
+ public int hashCode() {
+ int h = 0;
+ for (int i = 0; i < length; i++) h = 31 * h + (b[offset + i] & 0xFF);
+ return h;
+ }
+}
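Decoding ISO-8859-1 needs no lookup table. Each byte, read as unsigned with `& 0xFF`, is the Unicode code point of the corresponding character. The JDK-only sketch below applies the same idea; the class name `Latin1View` is hypothetical. It shows why the hash must accumulate masked bytes in order to agree with `String.hashCode()`.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a byte-array view decoded as ISO-8859-1. */
final class Latin1View implements CharSequence {
    private final byte[] b;
    private final int offset, length;

    Latin1View(byte[] b, int offset, int length) {
        if (offset < 0 || length < 0 || offset > b.length - length) throw new IndexOutOfBoundsException();
        this.b = b;
        this.offset = offset;
        this.length = length;
    }

    @Override public int length() { return length; }

    @Override public char charAt(int index) {
        if (index < 0 || index >= length) throw new IndexOutOfBoundsException(Integer.toString(index));
        return (char) (b[offset + index] & 0xFF); // unsigned byte == Latin-1 code point
    }

    @Override public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) throw new IndexOutOfBoundsException();
        return new Latin1View(b, offset + start, end - start);
    }

    @Override public String toString() {
        return new String(b, offset, length, StandardCharsets.ISO_8859_1);
    }

    /** Same recurrence as String.hashCode(), applied to unsigned bytes. */
    @Override public int hashCode() {
        int h = 0;
        for (int i = 0; i < length; i++) h = 31 * h + (b[offset + i] & 0xFF);
        return h;
    }
}
```

Without the mask, a byte such as `0xE9` (é) is sign-extended to -23, and the hash diverges from that of the equivalent `String`.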
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/util/DigestBasedDuplicateDetection.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/util/DigestBasedDuplicateDetection.java
new file mode 100644
index 0000000..d5628e7
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/util/DigestBasedDuplicateDetection.java
@@ -0,0 +1,43 @@
+package it.unimi.dsi.law.warc.util;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+// RELEASE-STATUS: DIST
+
+/**
+ * Allows determining whether an {@link HttpResponse} is a duplicate.
+ *
+ * For more details, see the relevant <a href='../io/package-summary.html#dup'>section</a> in the <code>it.unimi.dsi.law.warc.io</code> package description.
+ */
+public interface DigestBasedDuplicateDetection {
+
+
+ /** Returns the content digest.
+ *
+ * @return the digest.
+ */
+ public byte[] digest();
+
+ /** Returns the duplicate status of this response.
+ *
+ * @return whether this response is a duplicate.
+ */
+ public boolean isDuplicate();
+
+}
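Digest-based duplicate detection can be as simple as hashing each response body and remembering which digests have already been seen. The sketch below is hypothetical (`SeenDigests` is not part of LAW) and assumes MD5 via `java.security.MessageDigest` with an in-memory `HashSet`. The interface prescribes neither the algorithm nor the storage. The hex encoding stands in for `Util.toHexString`, whose exact output format is not shown here.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: digest-based duplicate detection with an in-memory set of seen digests. */
final class SeenDigests {
    private final Set<String> seen = new HashSet<>();
    private final MessageDigest md;

    SeenDigests() {
        try {
            md = MessageDigest.getInstance("MD5"); // every Java platform is required to provide MD5
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the digest of a response body (MessageDigest.digest also resets the digester). */
    byte[] digest(byte[] body) {
        return md.digest(body);
    }

    /** Returns true if this digest was seen before; otherwise records it and returns false. */
    boolean isDuplicate(byte[] digest) {
        return !seen.add(toHex(digest));
    }

    static String toHex(byte[] d) {
        final StringBuilder sb = new StringBuilder(d.length * 2);
        for (byte x : d) sb.append(String.format("%02x", x & 0xFF));
        return sb.toString();
    }
}
```

The sketch stores hex strings in the set because `byte[]` uses identity-based `equals` and `hashCode`. Two equal digests held in separate arrays would therefore never match as set elements.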
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/util/HttpComponentsHttpResponse.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/util/HttpComponentsHttpResponse.java
new file mode 100644
index 0000000..52e3eb8
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/util/HttpComponentsHttpResponse.java
@@ -0,0 +1,226 @@
+package it.unimi.dsi.law.warc.util;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.net.URI;
+import java.util.Map;
+import java.util.NoSuchElementException;
+
+import org.apache.http.Header;
+import org.apache.http.HttpEntity;
+import org.apache.http.HttpResponse;
+import org.apache.http.StatusLine;
+import org.apache.http.util.EntityUtils;
+
+import com.google.common.io.ByteStreams;
+
+/*
+ * Copyright (C) 2012-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.io.FastByteArrayInputStream;
+import it.unimi.dsi.fastutil.io.FastByteArrayOutputStream;
+import it.unimi.dsi.fastutil.io.MeasurableInputStream;
+import it.unimi.dsi.fastutil.objects.AbstractObject2ObjectMap;
+import it.unimi.dsi.fastutil.objects.AbstractObjectIterator;
+import it.unimi.dsi.fastutil.objects.AbstractObjectSet;
+import it.unimi.dsi.fastutil.objects.Object2ObjectMap;
+import it.unimi.dsi.fastutil.objects.ObjectIterator;
+import it.unimi.dsi.fastutil.objects.ObjectSet;
+import it.unimi.dsi.law.warc.io.WarcRecord;
+
+// RELEASE-STATUS: DIST
+
+/** A concrete subclass of {@link AbstractHttpResponse} that implements
+ * missing methods by wrapping an Apache HTTP Components {@link HttpResponse}.
+ *
+ * <p>A typical use case of this class is storing in a {@link WarcRecord} the
+ * content of an Apache HTTP Components {@link HttpResponse}. The nested
+ * class {@link HttpResponseHeaderMap} can be used in other classes to expose
+ * as a standard Java map the header-related methods of an Apache HTTP Components {@link HttpResponse}.
+ *
+ * <p>To be able to return a {@link MeasurableInputStream}, this class caches the
+ * result of the underlying Apache HTTP Components {@link HttpResponse}. The cache
+ * is never shrunk, but {@link #clear()} will trim it to length zero. You can override
+ * {@link #contentAsStream()} to provide different ways of performing the caching.
+ */
+
+public class HttpComponentsHttpResponse extends AbstractHttpResponse {
+
+ /** A wrapper class exposing headers in {@link it.unimi.dsi.law.warc.util.HttpResponse#headers()}
+ * format by delegating to an {@link HttpResponse}. */
+ @SuppressWarnings("serial")
+ public final static class HttpResponseHeaderMap extends AbstractObject2ObjectMap<String, String> {
+ private HttpResponse httpResponse;
+
+ /** Sets the response whose headers will be wrapped by this map.
+ *
+ * @param httpResponse a response whose headers will be exposed by this map.
+ */
+ public void response(final HttpResponse httpResponse) {
+ this.httpResponse = httpResponse;
+ }
+
+ @Override
+ public ObjectSet<it.unimi.dsi.fastutil.objects.Object2ObjectMap.Entry<String, String>> object2ObjectEntrySet() {
+ return new AbstractObjectSet<Object2ObjectMap.Entry<String,String>>() {
+ private final Header[] header = httpResponse.getAllHeaders();
+
+ @Override
+ public ObjectIterator<it.unimi.dsi.fastutil.objects.Object2ObjectMap.Entry<String, String>> iterator() {
+ return new AbstractObjectIterator<Object2ObjectMap.Entry<String,String>>() {
+ private int i = 0;
+ @Override
+ public boolean hasNext() {
+ return i < header.length;
+ }
+
+ @Override
+ public it.unimi.dsi.fastutil.objects.Object2ObjectMap.Entry<String, String> next() {
+ if (! hasNext()) throw new NoSuchElementException();
+ return new BasicEntry<String,String>(header[i].getName(), header[i++].getValue());
+ }
+
+ };
+ }
+
+ @Override
+ public int size() {
+ return header.length;
+ }
+
+ };
+ }
+
+ @Override
+ public String get(Object key) {
+ final Header[] header = httpResponse.getHeaders(key.toString());
+ if (header == null || header.length == 0) return null;
+ if (header.length == 1) return header[0].getValue();
+ final StringBuilder stringBuilder = new StringBuilder();
+ for(int i = 0; i < header.length; i++) {
+ if (i != 0) stringBuilder.append(',');
+ stringBuilder.append(header[i].getValue());
+ }
+
+ return stringBuilder.toString();
+ }
+
+ @Override
+ public int size() {
+ return httpResponse.getAllHeaders().length;
+ }
+ }
+
+ /** The URL associated with {@link #httpResponse}. */
+ protected URI url;
+ /** The response wrapped by this {@link HttpComponentsHttpResponse}. */
+ protected HttpResponse httpResponse;
+ /** The header map wrapping {@link #httpResponse}'s headers. */
+ protected HttpResponseHeaderMap headerMap = new HttpResponseHeaderMap();
+ /** A cache for the {@linkplain HttpEntity#getContent() content} of the {@linkplain org.apache.http.HttpResponse#getEntity() entity}
+ * returned by {@link #httpResponse}. */
+ protected FastByteArrayOutputStream cachedContent = new FastByteArrayOutputStream();
+ /** Whether the {@linkplain HttpEntity#getContent() content} of the {@linkplain org.apache.http.HttpResponse#getEntity() entity}
+ * returned by {@link #httpResponse} has been cached in {@link #cachedContent}. */
+ protected boolean contentReady;
+
+ /** Creates a new instance.
+ *
+ * <p>Use {@link #set(URI, HttpResponse)} to wrap an Apache HTTP Components {@link HttpResponse}.
+ */
+ public HttpComponentsHttpResponse() {}
+
+ /** Creates a new instance wrapping a given Apache HTTP Components {@link HttpResponse}.
+ *
+ * @param url the URL for <code>httpResponse</code>.
+ * @param httpResponse the response to be wrapped.
+ */
+ public HttpComponentsHttpResponse(final URI url, final HttpResponse httpResponse) {
+ set(url, httpResponse);
+ }
+
+ /** Sets the response wrapped by this instance.
+ *
+ * @param url the URL for <code>httpResponse</code>.
+ * @param httpResponse the response to be wrapped.
+ */
+ public void set(final URI url, final HttpResponse httpResponse) {
+ this.url = url;
+ this.httpResponse = httpResponse;
+ headerMap.response(httpResponse);
+ contentReady = false;
+ cachedContent.reset();
+ }
+
+ /** Invokes {@link EntityUtils#consume(HttpEntity)} on the entity returned by the underlying
+ * Apache HTTP Components {@link HttpResponse}. */
+ public void consume() throws IOException {
+ EntityUtils.consume(httpResponse.getEntity());
+ }
+
+ @Override
+ public int status() {
+ return httpResponse.getStatusLine().getStatusCode();
+ }
+
+ @Override
+ public StatusLine statusLine() {
+ return httpResponse.getStatusLine();
+ }
+
+ @Override
+ public Map<String, String> headers() {
+ return headerMap;
+ }
+
+ @Override
+ public MeasurableInputStream contentAsStream() throws IOException {
+ final HttpEntity entity = httpResponse.getEntity();
+ if (entity == null) return null;
+ if (! contentReady) {
+ final InputStream instream = entity.getContent();
+ try {
+ contentReady = true;
+ ByteStreams.copy(instream, cachedContent);
+ } finally {
+ try { instream.close(); } catch (Exception ignore) {}
+ }
+ }
+
+ return new FastByteArrayInputStream(cachedContent.array, 0, cachedContent.length);
+ }
+
+ @Override
+ public URI uri() {
+ return url;
+ }
+
+ @Override
+ public boolean fromWarcRecord(WarcRecord record) throws IOException {
+ throw new UnsupportedOperationException();
+ }
+
+ /** Clears this {@link HttpComponentsHttpResponse}, in particular trimming the content cache. */
+ public void clear() {
+ httpResponse = null;
+ headerMap.response(null);
+ contentReady = false;
+ cachedContent.reset();
+ cachedContent.trim();
+ }
+}
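Two techniques in this class can be illustrated without Apache HTTP Components:

- caching a one-shot content stream so that it can be replayed any number of times;
- folding repeated header fields into a single comma-separated value, as `HttpResponseHeaderMap.get` does.

The sketch below uses only the JDK. `CachedContent` and `foldHeaderValues` are hypothetical names, and a plain `InputStream` stands in for `HttpEntity#getContent()`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

/** Hypothetical sketch: reads a one-shot stream once, then replays it from memory. */
final class CachedContent {
    private final InputStream source; // stands in for HttpEntity#getContent()
    private final ByteArrayOutputStream cache = new ByteArrayOutputStream();
    private boolean ready;

    CachedContent(InputStream source) {
        this.source = source;
    }

    InputStream contentAsStream() throws IOException {
        if (!ready) {
            try (InputStream in = source) { // close the source once it has been drained
                final byte[] buffer = new byte[8192];
                for (int n; (n = in.read(buffer)) != -1; ) cache.write(buffer, 0, n);
            }
            ready = true;
        }
        // Each call gets a fresh stream positioned at the start of the cached bytes.
        return new ByteArrayInputStream(cache.toByteArray());
    }

    /** Folds repeated header values into one comma-separated value; null if the header is absent. */
    static String foldHeaderValues(List<String> values) {
        return values.isEmpty() ? null : String.join(",", values);
    }
}
```

The original wraps the cache's backing array directly. Here, `toByteArray()` copies the bytes on every call, which keeps the sketch short at the cost of an extra allocation.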
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/util/HttpResponse.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/util/HttpResponse.java
new file mode 100644
index 0000000..b8dd492
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/util/HttpResponse.java
@@ -0,0 +1,80 @@
+package it.unimi.dsi.law.warc.util;
+
+import java.io.IOException;
+import java.nio.charset.Charset;
+import java.util.Map;
+
+import org.apache.http.HttpHeaders;
+import org.apache.http.StatusLine;
+
+import com.google.common.base.Charsets;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.io.MeasurableInputStream;
+
+// RELEASE-STATUS: DIST
+
+/** Provides high-level access to WARC records with <code>record-type</code> equal to
+ * <code>response</code> and <code>content-type</code> equal to <code>HTTP</code>
+ * (or <code>HTTPS</code>).
+ */
+public interface HttpResponse extends Response {
+
+ /** The WARC <code>anvl-field</code> name used to store the charset recognized during parsing. */
+ public static final String GUESSED_CHARSET_HEADER = "BUbiNG-guessed-charset";
+
+ /** The WARC <code>anvl-field</code> name used to store the digest. */
+ public static final String DIGEST_HEADER = "BUbiNG-content-digest";
+
+ /** The WARC <code>anvl-field</code> name used to store the duplicate status. */
+ public static final String ISDUPLICATE_HEADER = "BUbiNG-is-duplicate";
+
+ /** The {@link Charset} used to encode/decode the HTTP headers. */
+ public static final Charset HEADER_CHARSET = Charsets.ISO_8859_1;
+
+ /** Returns the response status.
+ *
+ * @return the status of this response.
+ */
+ public int status();
+
+ /** Returns the response status line.
+ *
+ * @return the status line of this response.
+ */
+ public StatusLine statusLine();
+
+ /** Returns the headers of this response.
+ *
+ * <p><strong>Warning</strong>: as of LAW 3.0, and contrary to previous behaviour,
+ * the map is case-sensitive. Use the predefined names in {@link HttpHeaders} or
+ * {@link com.google.common.net.HttpHeaders} to avoid typos and casing mistakes.
+ *
+ * @return the headers of this response.
+ */
+ public Map<String,String> headers();
+
+ /** Returns the content of this response as a stream.
+ *
+ * @return the content of this response as a stream.
+ * @throws IOException if an I/O error occurs.
+ */
+ public MeasurableInputStream contentAsStream() throws IOException;
+}
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/util/MetadataHttpResponse.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/util/MetadataHttpResponse.java
new file mode 100644
index 0000000..2271dd7
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/util/MetadataHttpResponse.java
@@ -0,0 +1,103 @@
+package it.unimi.dsi.law.warc.util;
+
+import java.net.URI;
+import java.util.Map;
+
+import org.apache.http.StatusLine;
+
+/*
+ * Copyright (C) 2012-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.objects.Object2ObjectLinkedOpenCustomHashMap;
+import it.unimi.dsi.fastutil.objects.Object2ObjectMap;
+
+//RELEASE-STATUS: DIST
+
+/** An abstract extension of {@link AbstractHttpResponse} that additionally provides support
+ * for getting and setting metadata (i.e., {@link #uri()}, {@link #statusLine()}, {@link #status()} and {@link #headers()}). */
+
+public abstract class MetadataHttpResponse extends AbstractHttpResponse {
+
+ /** A special map used for headers: keys are case-insensitive, and multiple puts are converted into comma-separated values. */
+ public static final class HeaderMap extends Object2ObjectLinkedOpenCustomHashMap<String, String> {
+ private static final long serialVersionUID = 1L;
+
+ public HeaderMap() {
+ super(Util.CASE_INSENSITIVE_STRING_HASH_STRATEGY);
+ }
+
+ @Override
+ public String put(String key, String value) {
+ if (!containsKey(key)) return super.put(key, value);
+ else return super.put(key, get(key) + "," + value);
+ }
+ }
+
+ /** The URI that is currently contained in this response. */
+ protected URI uri;
+ /** The status line of this response. */
+ protected StatusLine statusLine;
+ /** The header map. */
+ protected final Map<String,String> headerMap = new HeaderMap();
+
+ @Override
+ public URI uri() {
+ return uri;
+ }
+
+ @Override
+ public int status() {
+ return statusLine.getStatusCode();
+ }
+
+ @Override
+ public StatusLine statusLine() {
+ return statusLine;
+ }
+
+ @Override
+ public Map<String, String> headers() {
+ return headerMap;
+ }
+
+ /** Sets the URI.
+ *
+ * @param url the URI.
+ */
+ public void uri(URI url) {
+ this.uri = url;
+ }
+
+ /** Sets the status line.
+ *
+ * @param statusLine the status line.
+ */
+ public void statusLine(StatusLine statusLine) {
+ this.statusLine = statusLine;
+ }
+
+ /** Sets the headers.
+ *
+ * @param headerMap the new headers (may be {@code null},
+ * in which case the headers will be left empty).
+ */
+ public void headers(Object2ObjectMap<String, String> headerMap) {
+ this.headerMap.clear();
+ if (headerMap != null) this.headerMap.putAll(headerMap);
+ }
+}
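`HeaderMap` relies on a fastutil custom hash strategy to make keys case-insensitive. With the JDK alone, a similar effect can be sketched with a `TreeMap` ordered by `String.CASE_INSENSITIVE_ORDER`. Unlike the linked fastutil map, this sketch iterates in case-insensitive key order rather than insertion order. It keeps the casing of the first key inserted. The class name `CaseInsensitiveHeaderMap` is hypothetical.

```java
import java.util.TreeMap;

/** Hypothetical sketch: case-insensitive keys, repeated puts merged into comma-separated values. */
final class CaseInsensitiveHeaderMap extends TreeMap<String, String> {
    private static final long serialVersionUID = 1L;

    CaseInsensitiveHeaderMap() {
        super(String.CASE_INSENSITIVE_ORDER);
    }

    /** A repeated key (in any casing) appends the new value after a comma, as HeaderMap.put does. */
    @Override
    public String put(String key, String value) {
        if (!containsKey(key)) return super.put(key, value);
        return super.put(key, get(key) + "," + value); // TreeMap keeps the existing key object
    }
}
```

Comma-folding is not safe for every header. `Set-Cookie` values can themselves contain commas (for example, in `Expires` dates), so folded values may not split back unambiguously.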
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/util/MutableHttpResponse.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/util/MutableHttpResponse.java
new file mode 100644
index 0000000..624f9bb
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/util/MutableHttpResponse.java
@@ -0,0 +1,54 @@
+package it.unimi.dsi.law.warc.util;
+
+import java.io.IOException;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.io.MeasurableInputStream;
+import it.unimi.dsi.law.warc.io.WarcRecord;
+
+
+
+//RELEASE-STATUS: DIST
+
+/** A mutable extension of {@link MetadataHttpResponse} that provides
+ * support for {@linkplain #contentAsStream(MeasurableInputStream) setting the content stream}.
+ * Note that {@link #fromWarcRecord(WarcRecord)} is not implemented. */
+public class MutableHttpResponse extends MetadataHttpResponse {
+
+ /** The content of this response. */
+ private MeasurableInputStream contentAsStream;
+
+ /** Sets the content.
+ *
+ * @param contentAsStream the content.
+ */
+ public void contentAsStream(MeasurableInputStream contentAsStream) {
+ this.contentAsStream = contentAsStream;
+ }
+
+ public MeasurableInputStream contentAsStream() throws IOException {
+ if (contentAsStream == null) throw new IllegalStateException("The content stream has not been set yet");
+ return contentAsStream;
+ }
+
+ public boolean fromWarcRecord(WarcRecord wr) throws IOException {
+ throw new UnsupportedOperationException();
+ }
+}
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/util/RemappedStringMap.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/util/RemappedStringMap.java
new file mode 100644
index 0000000..4a5c22d
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/util/RemappedStringMap.java
@@ -0,0 +1,151 @@
+package it.unimi.dsi.law.warc.util;
+
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.Serializable;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.big.io.FileLinesCollection;
+import it.unimi.dsi.big.util.ShiftAddXorSignedStringMap;
+import it.unimi.dsi.big.util.StringMap;
+import it.unimi.dsi.bits.TransformationStrategies;
+import it.unimi.dsi.fastutil.bytes.ByteArrays;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.io.FastBufferedInputStream;
+import it.unimi.dsi.fastutil.objects.AbstractObject2LongFunction;
+import it.unimi.dsi.fastutil.objects.ObjectBigList;
+import it.unimi.dsi.sux4j.mph.MWHCFunction;
+
+// RELEASE-STATUS: DIST
+
+/** A {@link StringMap} that remaps values returned by another {@link StringMap}.
+ *
+ * <p>
+ * Instances of this class wrap a given string map (typically a minimal perfect hash)
+ * and a remapping array of integers. Queries to {@link #getLong(Object)} are
+ * answered by first querying the underlying string map.
+ * If the result is -1, it is returned; otherwise, it is used to index
+ * the remapping array, and the corresponding element is returned.
+ */
+
+public class RemappedStringMap extends AbstractObject2LongFunction<CharSequence> implements StringMap<CharSequence>, Serializable {
+
+ final public static String DEFAULT_BUFFER_SIZE = "64Ki";
+
+ private static final long serialVersionUID = 1L;
+ /** The underlying string map. */
+ private final StringMap<? extends CharSequence> stringMap;
+ /** The remapping array. */
+ private final int[] map;
+
+ /** Creates a new remapped minimal perfect hash.
+ *
+ * @param stringMap the underlying string map (typically a minimal perfect hash).
+ * @param map an array that will be used to remap the numbers returned by <code>stringMap</code>.
+ */
+
+ public RemappedStringMap(final StringMap<? extends CharSequence> stringMap, final int[] map) {
+ if (stringMap.size64() != map.length) throw new IllegalArgumentException("Minimal perfect hash size (" + stringMap.size64() + ") is not equal to map length (" + map.length + ")");
+ this.stringMap = stringMap;
+ this.map = map;
+ }
+
+ public long getLong(Object o) {
+ CharSequence term = (CharSequence)o;
+ final int x = (int)stringMap.getLong(term);
+ if (x == -1) return -1;
+ return map[x];
+ }
+
+ public long size64() {
+ return stringMap.size64();
+ }
+
+ public int size() {
+ return (int)Math.min(Integer.MAX_VALUE, size64());
+ }
+
+ public static void run(String duplicateURLs, String archetypeURLs, StringMap<? extends CharSequence> resolver, String remappedFilename, int bufferSize) throws IOException {
+
+ @SuppressWarnings("resource")
+ final FastBufferedInputStream arch = new FastBufferedInputStream(new FileInputStream(archetypeURLs), bufferSize);
+
+ // First we build a signed minimal perfect hash for duplicates.
+ FileLinesCollection flc = new FileLinesCollection(duplicateURLs, "ASCII");
+ final StringMap<CharSequence> duplicateMph = new ShiftAddXorSignedStringMap(flc.iterator(), new MWHCFunction.Builder<CharSequence>().keys(flc).transform(TransformationStrategies.utf16()).build());
+
+ // TODO: works only for fewer than Integer.MAX_VALUE duplicates.
+ if (duplicateMph.size64() > Integer.MAX_VALUE) throw new IndexOutOfBoundsException();
+
+ byte[] line = new byte[2048];
+ int[] map = new int[(int)duplicateMph.size64()];
+ int start, len;
+ ByteArrayCharSequence s = new ByteArrayCharSequence();
+
+ for(int n = 0; ; n++) {
+ start = 0;
+ while ((len = arch.readLine(line, start, line.length - start)) == line.length - start) {
+ start += len;
+ line = ByteArrays.grow(line, line.length + 1);
+ }
+
+ len = start + Math.max(len, 0);
+ if (len == 0) break;
+ // TODO: this is really inefficient (used to peek directly into the
+ map[n] = (int)resolver.getLong(s.wrap(line, 0, len));
+ if (map[n] == - 1) throw new IllegalArgumentException("URL " + new String(line, 0, len, "ASCII") + " cannot be resolved");
+ }
+
+ BinIO.storeObject(new RemappedStringMap(duplicateMph, map), remappedFilename);
+
+ arch.close();
+ }
+ public ObjectBigList<CharSequence> list() {
+ return null;
+ }
+
+ public boolean containsKey(final Object o) {
+ return stringMap.containsKey(o);
+ }
+
+ @SuppressWarnings("unchecked")
+ public static void main(String arg[]) throws Exception {
+ SimpleJSAP jsap = new SimpleJSAP(RemappedStringMap.class.getName(),
+ "Builds a remapped minimal perfect hash by reading two parallel files (duplicates and archetypes), and mapping each line of the first file to the number returned by a given minimal perfect hash on the corresponding line of the second file.",
+ new Parameter[] {
+ new FlaggedOption("bufferSize", JSAP.INTSIZE_PARSER, DEFAULT_BUFFER_SIZE, JSAP.NOT_REQUIRED, 'b', "buffer-size", "The size of an I/O buffer."),
+ new UnflaggedOption("duplicateURLs", JSAP.STRING_PARSER, JSAP.REQUIRED, "The duplicate file."),
+ new UnflaggedOption("archetypeURLs", JSAP.STRING_PARSER, JSAP.REQUIRED, "The archetype file."),
+ new UnflaggedOption("resolver", JSAP.STRING_PARSER, JSAP.REQUIRED, "The term map used to resolve the second field."),
+ new UnflaggedOption("remappedMph", JSAP.STRING_PARSER, JSAP.REQUIRED, "The resulting remapped minimal perfect hash."),
+ });
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ run(jsapResult.getString("duplicateURLs"), jsapResult.getString("archetypeURLs"), (StringMap<? extends CharSequence>)BinIO.loadObject(jsapResult.getString("resolver")), jsapResult.getString("remappedMph"), jsapResult.getInt("bufferSize"));
+ }
+}
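The remapping performed by `getLong` is a two-step lookup. First, the underlying map assigns each key an index in [0, n). Then an `int[]` of length n translates that index into the final value; -1 is passed through for unknown keys. The sketch below substitutes a `HashMap` for the signed minimal perfect hash, and `RemappedLookup` is a hypothetical name. Unlike a signed function, which rejects non-keys only with high probability, the `HashMap` answers membership exactly.

```java
import java.util.Map;

/** Hypothetical sketch of the two-step remapped lookup. */
final class RemappedLookup {
    private final Map<String, Integer> index; // key -> position in [0, map.length)
    private final int[] map;                  // position -> remapped value

    RemappedLookup(Map<String, Integer> index, int[] map) {
        if (index.size() != map.length)
            throw new IllegalArgumentException("Index size (" + index.size() + ") is not equal to map length (" + map.length + ")");
        this.index = index;
        this.map = map;
    }

    long getLong(String key) {
        final Integer x = index.get(key);
        if (x == null) return -1; // unknown key: report -1, as the original does
        return map[x];
    }
}
```

In `run`, `map[n]` is filled from the n-th line of the archetype file. This works because the function built over the duplicates maps the n-th duplicate URL to n, so each duplicate ends up pointing to its archetype's number under the resolver.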
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/util/Response.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/util/Response.java
new file mode 100644
index 0000000..8ddc754
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/util/Response.java
@@ -0,0 +1,47 @@
+package it.unimi.dsi.law.warc.util;
+
+import java.io.IOException;
+import java.net.URI;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.law.warc.io.WarcRecord;
+
+// RELEASE-STATUS: DIST
+
+/** Provides high-level access to WARC records with <code>record-type</code> equal to
+ * <code>response</code>.
+ */
+public interface Response {
+
+ /** Returns the URI associated with this response.
+ *
+ * @return the URI associated with this response.
+ */
+ URI uri();
+
+ /** Fills this response with the content of a {@link WarcRecord} (optional operation).
+ *
+ * @param record the record.
+ * @return true iff the <code>record-type</code> of the given record is <code>response</code>.
+ * @throws IOException if an I/O error occurs while reading the record.
+ */
+ public boolean fromWarcRecord(WarcRecord record) throws IOException;
+
+}
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/util/Util.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/util/Util.java
new file mode 100644
index 0000000..ac9da1c
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/util/Util.java
@@ -0,0 +1,552 @@
+package it.unimi.dsi.law.warc.util;
+
+import java.io.BufferedReader;
+import java.io.ByteArrayOutputStream;
+import java.io.File;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.io.OutputStream;
+import java.io.OutputStreamWriter;
+import java.io.Writer;
+import java.net.URL;
+import java.nio.charset.Charset;
+import java.nio.charset.CharsetEncoder;
+import java.nio.charset.CodingErrorAction;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+import java.util.StringTokenizer;
+
+import org.apache.http.StatusLine;
+import org.apache.http.message.BasicLineParser;
+import org.apache.http.message.LineParser;
+import org.apache.http.message.ParserCursor;
+import org.apache.http.util.CharArrayBuffer;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.Hash;
+import it.unimi.dsi.fastutil.io.MeasurableInputStream;
+import it.unimi.dsi.lang.MutableString;
+import it.unimi.dsi.law.bubing.util.BURL;
+import it.unimi.dsi.law.warc.io.WarcRecord.FormatException;
+import it.unimi.dsi.util.XorShift128PlusRandom;
+
+// RELEASE-STATUS: DIST
+
+/** Static utility methods. */
+
+public class Util {
+ private static final boolean ASSERTS = true;
+
+ private Util() {}
+
+ /** The strategy used to decide whether two header names are the same: we require that they are equal up to case. */
+ public static final Hash.Strategy<String> CASE_INSENSITIVE_STRING_HASH_STRATEGY = new Hash.Strategy<String>() {
+ public int hashCode(final String key) {
+ int h = 0xDEAFC1CC;
+ for(int i = key.length(); i-- != 0;) h ^= (h << 5) + Character.toLowerCase(key.charAt(i)) + (h >>> 2);
+ return h;
+ }
+
+ public boolean equals(final String key0, final String key1) {
+ return key0.equalsIgnoreCase(key1);
+ }
+ };
+
+ /** Returns the given ASCII string as a byte array; characters are filtered through the 1111111(=0x7F) mask.
+ *
+ * @param s a string.
+ * @return <code>s</code> as a byte array.
+ */
+ public static byte[] getASCIIBytes(String s) {
+ final byte[] result = new byte[s.length()];
+ if (ASSERTS) for (int i = result.length; i-- != 0;) assert s.charAt(i) < 0x80 : "Character at position " + i + " is " + (int)s.charAt(i) + " in \"" + s + "\"";
+ for (int i = result.length; i-- != 0;) result[i] = (byte)(s.charAt(i) & 0x7F);
+ return result;
+ }
+
+ /** Returns the given ASCII mutable string as a byte array; characters are filtered through the 1111111(=0x7F) mask.
+ *
+ * @param s a mutable string.
+ * @return <code>s</code> as a byte array.
+ */
+
+ public static byte[] getASCIIBytes(MutableString s) {
+ final byte[] result = new byte[s.length()];
+ final char[] a = s.array();
+ if (ASSERTS) for (int i = result.length; i-- != 0;) assert a[i] < 0x80 : "Character at position " + i + " is " + (int)a[i] + " in \"" + s + "\"";
+ for (int i = result.length; i-- != 0;) result[i] = (byte)(a[i] & 0x7F);
+ return result;
+ }
+
+ /** Returns the given byte array as an ASCII string. */
+ public static String getString(byte[] array) {
+ return getString(array, 0, array.length);
+ }
+
+ /** Returns the specified portion of a byte array as an ASCII string; bytes are filtered through the 0x7F mask. */
+ public static String getString(byte[] array, int offset, int length) {
+ if (ASSERTS) for (int j = length; j-- != 0;) assert array[offset + j] >= 0 : "Byte at position " + (offset + j) + " is " + array[offset + j];
+ int i = length;
+ final char charArray[] = new char[i];
+ while(i-- != 0) charArray[i] = (char)(array[offset + i] & 0x7F);
+ return new String(charArray);
+ }
+
+ /** Returns &lfloor; log<sub>10</sub>(<code>x</code>) &rfloor;.
+ *
+ * @param x an integer.
+ * @return &lfloor; log<sub>10</sub>(<code>x</code>) &rfloor;, or -1 if <code>x</code> is smaller than or equal to zero.
+ */
+
+ public static int log10(final int x) {
+ return (x < 100000 ?
+ (x < 100 ?
+ (x < 10 ?
+ (x < 1 ?
+ -1 /* 4 */
+ :
+ 0 /* 4 */
+ )
+ :
+ 1 /* 3 */
+ )
+ :
+ (x < 10000 ?
+ (x < 1000 ?
+ 2 /* 4 */
+ :
+ 3 /* 4 */
+ )
+ :
+ 4 /* 3 */
+ )
+ )
+ :
+ (x < 100000000 ?
+ (x < 10000000 ?
+ (x < 1000000 ?
+ 5 /* 4 */
+ :
+ 6 /* 4 */
+ )
+ :
+ 7 /* 3 */
+ )
+ :
+ (x < 1000000000 ?
+ 8 /* 3 */
+ :
+ 9 /* 3 */
+ )
+ )
+ );
+ }
+
+ /** Returns &lfloor; log<sub>10</sub>(<code>x</code>) &rfloor;.
+ *
+ * @param x an integer.
+ * @return &lfloor; log<sub>10</sub>(<code>x</code>) &rfloor;, or -1 if <code>x</code> is smaller than or equal to zero.
+ */
+
+ public static int log10(final long x) {
+ return (x < 1000000000 ?
+ (x < 10000 ?
+ (x < 100 ?
+ (x < 10 ?
+ (x < 1 ?
+ -1 /* 5 */
+ :
+ 0 /* 5 */
+ )
+ :
+ 1 /* 4 */
+ )
+ :
+ (x < 1000 ?
+ 2 /* 4 */
+ :
+ 3 /* 4 */
+ )
+ )
+ :
+ (x < 10000000 ?
+ (x < 1000000 ?
+ (x < 100000 ?
+ 4 /* 5 */
+ :
+ 5 /* 5 */
+ )
+ :
+ 6 /* 4 */
+ )
+ :
+ (x < 100000000 ?
+ 7 /* 4 */
+ :
+ 8 /* 4 */
+ )
+ )
+ )
+ :
+ (x < 100000000000000L ?
+ (x < 1000000000000L ?
+ (x < 100000000000L ?
+ (x < 10000000000L ?
+ 9 /* 5 */
+ :
+ 10 /* 5 */
+ )
+ :
+ 11 /* 4 */
+ )
+ :
+ (x < 10000000000000L ?
+ 12 /* 4 */
+ :
+ 13 /* 4 */
+ )
+ )
+ :
+ (x < 100000000000000000L ?
+ (x < 10000000000000000L ?
+ (x < 1000000000000000L ?
+ 14 /* 5 */
+ :
+ 15 /* 5 */
+ )
+ :
+ 16 /* 4 */
+ )
+ :
+ (x < 1000000000000000000L ?
+ 17 /* 4 */
+ :
+ 18 /* 4 */
+ )
+ )
+ )
+ );
+ }
+
+ /** Returns the number of decimal digits that are necessary to represent the argument.
+ *
+ * @param x a nonnegative integer.
+ * @return the number of decimal digits that are necessary to represent <code>x</code>.
+ */
+
+ public static int digits(final int x) {
+ if (ASSERTS) assert x >= 0 : x;
+ if (x == 0) return 1;
+ if (x > 1 << 30) return 10;
+ return log10(x) + 1;
+ }
+
+ /** Returns the number of decimal digits that are necessary to represent the argument.
+ *
+ * @param x a nonnegative long.
+ * @return the number of decimal digits that are necessary to represent <code>x</code>.
+ */
+ public static int digits(final long x) {
+ if (ASSERTS) assert x >= 0 : x;
+ if (x == 0) return 1;
+ if (x > 1L << 62) return 19;
+ return log10(x) + 1;
+ }
+
+ /** Parses the given string as a comma-separated list of items and returns the items
+ * as an array, possibly after resolving an indirection. More precisely, <code>s</code>
+ * is tokenized as a comma-separated list, and each item in the list is trimmed of all leading and trailing spaces. Then,
+ * if the resulting character sequence does not start with <code>@</code>, it is interpreted literally;
+ * otherwise, the <code>@</code> is stripped away and the remaining part is interpreted as a
+ * URL or as a filename (depending on whether it is a valid URL or not), and the corresponding URL or file is in turn
+ * read (as ISO-8859-1) and interpreted as a list of items, one per line; these items are returned (literally)
+ * after trimming all leading and trailing spaces. Lines that start with <code>#</code> are ignored.
+ *
+ * @param s the property to be parsed.
+ * @return the array of items (as explained above).
+ * @throws IOException if an exception is thrown while reading indirect items.
+ */
+ public static String[] parseCommaSeparatedProperty(final String s) throws IOException {
+ final StringTokenizer st = new StringTokenizer(s, ",");
+ final List<String> result = new ArrayList<String>();
+ String item;
+
+ while (st.hasMoreTokens()) {
+ item = st.nextToken().trim();
+ if (item.length() > 1 && item.charAt(0) == '@') {
+ final String urlOrFilename = item.substring(1);
+ URL url;
+ try {
+ url = new URL(urlOrFilename);
+ } catch (java.net.MalformedURLException notAUrl) {
+ // Not a valid URL: interpret it as a filename, as documented above.
+ url = new File(urlOrFilename).toURI().toURL();
+ }
+ try (BufferedReader br = new BufferedReader(new InputStreamReader(url.openStream(), "ISO-8859-1"))) {
+ while ((item = br.readLine()) != null) {
+ if (item.startsWith("#")) continue;
+ result.add(item.trim());
+ }
+ }
+ } else result.add(item);
+ }
+ return result.toArray(new String[0]);
+ }
+
+ /** Consumes a given number of bytes from a stream.
+ *
+ * @param in the stream.
+ * @param howMany the number of bytes to read; fewer bytes may actually be read if the end of the stream is reached.
+ * @throws IOException if an I/O error occurs.
+ */
+ public static void consume(InputStream in, long howMany) throws IOException {
+ byte[] b = new byte[1024]; // just to read a bunch at a time
+ long r;
+ while (howMany > 0 && (r = in.read(b, 0, (int)Math.min(howMany, 1024))) != -1) howMany -= r;
+ }
+
+ /** Consumes all the bytes of a stream.
+ *
+ * @param in the stream.
+ * @throws IOException if an I/O error occurs.
+ */
+ public static void consume(InputStream in) throws IOException {
+ consume(in, Long.MAX_VALUE);
+ }
+
+ /**
+ * Reads a line from an (unchunked) input stream and decodes it using the given charset.
+ * Reading stops when a <tt>"\n"</tt> terminator is encountered; the terminator, and a preceding
+ * <tt>"\r"</tt> if present, are not included in the result.
+ * If the stream ends before the line terminator is found,
+ * the last part of the line is still returned.
+ * If no input data is available, {@code null} is returned.
+ *
+ * @param inputStream the stream to read from.
+ * @param charset the charset used to decode the stream.
+ *
+ * @throws IOException if an I/O problem occurs
+ * @return the read line.
+ */
+ public static String readHeaderLine(InputStream inputStream, Charset charset) throws IOException {
+ ByteArrayOutputStream buf = new ByteArrayOutputStream();
+ int ch;
+ while ((ch = inputStream.read()) >= 0) {
+ buf.write(ch);
+ if (ch == '\n') break; // be tolerant (RFC-2616 Section 19.3)
+ }
+ if (buf.size() == 0) return null;
+ byte[] rawdata = buf.toByteArray();
+ // strip CR and LF from the end
+ int len = rawdata.length; // len > 0 since if buf was empty we already returned
+ if (rawdata[len - 1] == '\n') {
+ len--;
+ if (len > 0 && rawdata[len - 1] == '\r') len--;
+ }
+ return new String(rawdata, 0, len, charset);
+ }
+
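+ /** Reads a header line from the given stream and parses it as an HTTP status line.
+ *
+ * @param is the stream to read from.
+ * @param charset the charset used to decode the line.
+ * @return the parsed status line.
+ * @throws IOException if an I/O error occurs.
+ */
+ public static StatusLine readStatusLine(final MeasurableInputStream is, final Charset charset) throws IOException {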
+ CharArrayBuffer buf = new CharArrayBuffer(64);
+ buf.append(readHeaderLine(is, charset));
+ LineParser parser = new BasicLineParser();
+ return parser.parseStatusLine(buf, new ParserCursor(0, buf.length()));
+ }
+
+ /**
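+ * Parses headers from the given stream, up to the first empty (or whitespace-only) line or the end of the stream.
+ * Folded (continuation) lines are joined to the preceding header with a single space.
+ * Headers with the same name are not combined: a later header replaces an earlier one in <code>map</code>.
+ *
+ * @param is the stream to read headers from.
+ * @param map the map where the headers will be saved.
+ * @param charset the charset to use for reading the data.
+ *
+ * @throws IOException if an I/O error occurs while reading from the stream.
+ * @throws FormatException if a line is neither a continuation line nor contains a colon.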
+ */
+ public static void readANVLHeaders(final MeasurableInputStream is, Map<String,String> map, final Charset charset) throws IOException, FormatException {
+ String name = null;
+ StringBuffer value = null;
+ for (;;) {
+ String line = readHeaderLine(is, charset);
+ if (line == null || line.trim().length() < 1) break;
+ // Parse the header name and value
+ // Check for folded headers first
+ // Detect LWS-char see HTTP/1.0 or HTTP/1.1 Section 2.2
+ // discussion on folded headers
+ if (line.charAt(0) == ' ' || line.charAt(0) == '\t') {
+ // we have continuation folded header so append value
+ if (value != null) {
+ value.append(' ');
+ value.append(line.trim());
+ }
+ } else {
+ // make sure we save the previous name,value pair if present
+ if (name != null) map.put(name, value.toString());
+ // Otherwise we should have normal HTTP header line
+ // Parse the header name and value
+ int colon = line.indexOf(":");
+ if (colon < 0) throw new FormatException("Unable to parse header: " + line);
+ name = line.substring(0, colon).trim();
+ value = new StringBuffer(line.substring(colon + 1).trim());
+ }
+ }
+ // make sure we save the last name,value pair if present
+ if (name != null) map.put(name, value.toString());
+ }
+
+ /**
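+ * Writes a (name, value) map as an ANVL segment in a given stream, one <code>name: value</code> line
+ * per entry, each terminated by <code>CR LF</code>. Characters that cannot be encoded with the given
+ * charset are silently dropped. Note that <code>out</code> is closed by this method.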
+ *
+ * @param out the stream.
+ * @param map the map.
+ * @param charset the charset of the headers.
+ */
+ public static void writeANVLHeaders(final OutputStream out, final Map<String, String> map, final Charset charset) {
+ final CharsetEncoder encoder = charset.newEncoder();
+ encoder.onMalformedInput(CodingErrorAction.IGNORE);
+ encoder.onUnmappableCharacter(CodingErrorAction.IGNORE);
+ final Writer writer = new OutputStreamWriter(out, encoder);
+ try {
+ for(Map.Entry<String, String> e: map.entrySet()) {
+ writer.write(e.getKey());
+ writer.write(": ");
+ writer.write(e.getValue());
+ writer.write("\r\n");
+ }
+ writer.close();
+ }
+ catch(IOException cantHappen) {
+ throw new RuntimeException(cantHappen);
+ }
+ }
+
+ /** Returns a string containing the hexadecimal representation of a digest.
+ *
+ * @param a a digest, as a byte array.
+ * @return the lowercase hexadecimal representation of <code>a</code>, two digits per byte.
+ */
+ public static String toHexString(final byte[] a) {
+ MutableString result = new MutableString(a.length * 2);
+ for (int i = 0; i < a.length; i++)
+ result.append((a[i] >= 0 && a[i] < 16 ? "0" : "")).append(Integer.toHexString(a[i] & 0xFF));
+ return result.toString();
+ }
+
+ /** Returns the byte array represented by the given hexadecimal string (the inverse of {@link #toHexString(byte[])}).
+ *
+ * @param s a string of hexadecimal digits, two per byte.
+ * @return the byte array.
+ */
+ public static byte[] fromHexString(final String s) {
+ byte[] b = new byte[s.length() / 2];
+ for (int i = s.length() / 2; i-- != 0;)
+ b[i] = (byte)Integer.parseInt(s.substring(i * 2, i * 2 + 2), 16);
+ return b;
+ }
+
+ /** The random number generator used by {@link #createHierarchicalTempFile(File, int, String, String)}. */
+ private static final XorShift128PlusRandom RND = new XorShift128PlusRandom();
+
+ private static final Object CREATION_LOCK = new Object();
+
+ /**
+ * Creates a temporary file with a random hierarchical path.
+ *
+ * <p> A random hierarchical path of <var>n</var> path elements is a sequence of <var>n</var>
+ * directories of two hexadecimal digits each, followed by a filename created by {@link File#createTempFile(String, String, File)}.
+ *
+ * <p> This method creates an empty file having a random hierarchical path of the specified
+ * number of path elements under a given base directory, creating all needed directories along
+ * the hierarchical path (whereas the base directory is expected to already exist).
+ *
+ * @param baseDirectory the base directory (it must exist).
+ * @param pathElements the number of path elements (filename excluded), must be in [0,8]
+ * @param prefix will be passed to {@link File#createTempFile(String, String, File)}
+ * @param suffix will be passed to {@link File#createTempFile(String, String, File)}
+ * @return the temporary file.
+ * @throws IOException if a directory along the path or the file itself cannot be created.
+ */
+ public static File createHierarchicalTempFile(final File baseDirectory, final int pathElements, final String prefix, final String suffix) throws IOException {
+ if (! baseDirectory.isDirectory()) throw new IllegalArgumentException(baseDirectory + " is not a directory.");
+ if (pathElements < 0 || pathElements > 8) throw new IllegalArgumentException();
+
+ long x;
+ synchronized (RND) { x = RND.nextLong(); }
+ StringBuilder stringBuilder = new StringBuilder();
+ for(int i = 0; i < pathElements; i++) {
+ if (i != 0) stringBuilder.append(File.separatorChar);
+ stringBuilder.append(Long.toHexString(x & 0xF));
+ x >>= 4;
+ stringBuilder.append(Long.toHexString(x & 0xF));
+ x >>= 4;
+ }
+
+ File directory = baseDirectory;
+ if (pathElements > 0) {
+ directory = new File(baseDirectory, stringBuilder.toString());
+ synchronized (CREATION_LOCK) {
+ if ((directory.exists() && ! directory.isDirectory()) || (! directory.exists() && ! directory.mkdirs())) throw new IOException("Cannot create directory " + directory);
+ }
+ }
+
+ return File.createTempFile(prefix, suffix, directory);
+ }
+
+ private static final char[] RESERVED = new char[] { '[', ']', '"', '|', '{', '}', '^', '<', '>', '`' };
+
+ private static final String[] RESERVED_SUBST;
+ static {
+ RESERVED_SUBST = new String[RESERVED.length];
+ for(int i = RESERVED_SUBST.length; i-- != 0;) RESERVED_SUBST[i] = (RESERVED[i] < 16 ? "%0" : "%") + Integer.toHexString(RESERVED[i]);
+ }
+
+ /** Fixes a given URL so that it is {@link BURL}-parsable.
+ *
+ * @param url a URL, possibly with bad characters in its path.
+ */
+ public static void fixURL(final MutableString url) {
+ // If they used %2F for the slash, fix it.
+ if (url.startsWith("http:%2F%2F")) url.replace("%2F", "/");
+ if (url.startsWith("http:%2f%2f")) url.replace("%2f", "/");
+ // If they wrote http:/<host>, fix it.
+ final int prefix = "http:/".length();
+ if (url.startsWith("http:/") && url.length() > prefix && url.charAt(prefix) != '/') url.insert(prefix, '/');
+ // If the last character is a quote, eliminate it.
+ if (url.lastChar() == '"' || url.lastChar() == '\'') url.length(url.length() -1);
+ // Replace reserved characters (blindly)
+ url.replace(RESERVED, RESERVED_SUBST);
+
+ // Find percents not followed by two hexadecimal digits.
+ final char[] a = url.array();
+ final int l = url.length();
+ for(int i = l; i-- != 0;) {
+ if (a[i] == '%' && (i >= l - 2 || ! isHexDigit(a[i + 1]) || ! isHexDigit(a[i + 2]))) url.insert(i + 1, "25");
+ }
+ }
+
+ private static boolean isHexDigit(char c) {
+ return c >= '0' && c <= '9' || c >= 'A' && c <= 'F' || c >= 'a' && c <= 'f';
+ }
+
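+ /** Ensures that the given file is a directory, creating it (and any missing parent directories) if it does not exist.
+ *
+ * @param dir the directory.
+ * @return false if <code>dir</code> exists but is not a directory, or if it could not be created; true otherwise.
+ */
+ public static boolean ensureDirectory(File dir) {
+ if (dir.exists() && ! dir.isDirectory()) return false;
+ if (! dir.exists()) return dir.mkdirs() || dir.isDirectory(); // it may have been created concurrently
+ return true;
+ }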
+
+}
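Util.readANVLHeaders reads header lines until the first blank one. It joins folded continuation lines (those starting with a space or a tab) to the previous value with a single space, and a later header with the same name overwrites an earlier one. The sketch below restates that logic over an in-memory string so it can be run in isolation. FoldedHeaderSketch and parse are hypothetical names. The sketch splits on CR LF or LF instead of reading a MeasurableInputStream byte by byte, and it throws IllegalArgumentException where the original throws FormatException.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical restatement of the folding logic of Util.readANVLHeaders; not part of LAW. */
final class FoldedHeaderSketch {
    private FoldedHeaderSketch() {}

    static Map<String, String> parse(final String headers) {
        final Map<String, String> map = new LinkedHashMap<>();
        String name = null;
        StringBuilder value = null;
        for (final String line : headers.split("\r?\n", -1)) {
            if (line.trim().isEmpty()) break; // a blank line ends the header section
            if (line.charAt(0) == ' ' || line.charAt(0) == '\t') {
                // Folded continuation line: append it to the current value.
                if (value != null) value.append(' ').append(line.trim());
            } else {
                // A new header starts: save the previous pair first.
                if (name != null) map.put(name, value.toString());
                final int colon = line.indexOf(':');
                if (colon < 0) throw new IllegalArgumentException("Unable to parse header: " + line);
                name = line.substring(0, colon).trim();
                value = new StringBuilder(line.substring(colon + 1).trim());
            }
        }
        if (name != null) map.put(name, value.toString()); // save the last pair
        return map;
    }
}
```

For instance, the two lines "X-Note: first" and "  second" produce the single value "first second". Anything after the blank line (the body) is never looked at.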
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/util/WarcHttpResponse.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/util/WarcHttpResponse.java
new file mode 100644
index 0000000..ea1ae1a
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/warc/util/WarcHttpResponse.java
@@ -0,0 +1,86 @@
+package it.unimi.dsi.law.warc.util;
+
+import java.io.IOException;
+import java.util.Map;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.io.MeasurableInputStream;
+import it.unimi.dsi.law.warc.filters.IsHttpResponse;
+import it.unimi.dsi.law.warc.io.WarcRecord;
+import it.unimi.dsi.law.warc.io.WarcRecord.FormatException;
+
+// RELEASE-STATUS: DIST
+
+/** An {@link AbstractHttpResponse} implementation that reads the response
+ * content from a WARC record (via the {@link #fromWarcRecord(WarcRecord)} method).
+ *
+ */
+public class WarcHttpResponse extends MetadataHttpResponse implements DigestBasedDuplicateDetection {
+
+ private MeasurableInputStream block;
+ private boolean headersHaveBeenParsed;
+
+ private byte[] digest;
+ private boolean isDuplicate;
+
+ @Override
+ public Map<String, String> headers() {
+ ensureHeadersHaveBeenParsed();
+ return headerMap;
+ }
+
+ public MeasurableInputStream contentAsStream() {
+ ensureHeadersHaveBeenParsed();
+ return block;
+ }
+
+ public boolean fromWarcRecord(WarcRecord wr) throws IOException {
+ if (! IsHttpResponse.INSTANCE.apply(wr)) return false;
+ uri = wr.header.subjectUri;
+ final String digestAsString = wr.header.anvlFields.get(HttpResponse.DIGEST_HEADER);
+ digest = digestAsString != null ? Util.fromHexString(digestAsString) : null;
+ isDuplicate = Boolean.valueOf(wr.header.anvlFields.get(HttpResponse.ISDUPLICATE_HEADER)).booleanValue();
+ statusLine = Util.readStatusLine(wr.block, HttpResponse.HEADER_CHARSET);
+ block = wr.block;
+ headersHaveBeenParsed = false;
+ return true;
+ }
+
+ private void ensureHeadersHaveBeenParsed() {
+ if (block == null) throw new NullPointerException("Block not yet read");
+ if (headersHaveBeenParsed) return;
+ headerMap.clear();
+ try {
+ Util.readANVLHeaders(block, headerMap, HEADER_CHARSET);
+ } catch (IOException | FormatException e) {
+ throw new RuntimeException(e); // previously, the exception was swallowed by the caller
+ }
+ headersHaveBeenParsed = true;
+ }
+
+ public byte[] digest() {
+ return digest;
+ }
+
+ public boolean isDuplicate() {
+ return isDuplicate;
+ }
+
+}
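WarcHttpResponse defers header parsing. fromWarcRecord only reads the status line and keeps the block, and both headers() and contentAsStream() first call the private ensure method, so the stream handed out by contentAsStream() is always positioned after the headers. The sketch below reproduces that pattern over a byte array. LazyResponseSketch, load and readLine are hypothetical names. The sketch assumes the input starts directly with the header lines (the original consumes the status line in fromWarcRecord). It does not handle folded lines, it skips lines without a colon, and it throws IllegalStateException rather than NullPointerException when nothing has been loaded.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of the lazy header parsing in WarcHttpResponse; not the LAW API. */
final class LazyResponseSketch {
    private InputStream block;
    private boolean headersHaveBeenParsed;
    private final Map<String, String> headerMap = new LinkedHashMap<>();

    /** Plays the role of fromWarcRecord(): keeps the block but does not parse the headers yet. */
    void load(final byte[] rawBlock) {
        block = new ByteArrayInputStream(rawBlock);
        headersHaveBeenParsed = false;
    }

    Map<String, String> headers() {
        ensureHeadersHaveBeenParsed();
        return headerMap;
    }

    /** Parsing first guarantees that the returned stream is positioned at the body. */
    InputStream contentAsStream() {
        ensureHeadersHaveBeenParsed();
        return block;
    }

    private void ensureHeadersHaveBeenParsed() {
        if (block == null) throw new IllegalStateException("Block not yet read");
        if (headersHaveBeenParsed) return;
        headerMap.clear();
        try {
            String line;
            // Headers end at the first empty line; lines without a colon are skipped in this sketch.
            while ((line = readLine(block)) != null && !line.isEmpty()) {
                final int colon = line.indexOf(':');
                if (colon > 0) headerMap.put(line.substring(0, colon).trim(), line.substring(colon + 1).trim());
            }
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
        headersHaveBeenParsed = true;
    }

    /** Reads bytes up to "\n" as ISO-8859-1, dropping a trailing "\r"; returns null at end of stream. */
    private static String readLine(final InputStream in) throws IOException {
        final StringBuilder sb = new StringBuilder();
        boolean readAny = false;
        int c;
        while ((c = in.read()) != -1) {
            readAny = true;
            if (c == '\n') break;
            sb.append((char) c);
        }
        if (!readAny) return null;
        final int len = sb.length();
        if (len > 0 && sb.charAt(len - 1) == '\r') sb.setLength(len - 1);
        return sb.toString();
    }
}
```

Because the flag is reset on every load, the same object can be reused across records, which matches how fromWarcRecord resets headersHaveBeenParsed.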
diff --git a/third_party/law-2.5.1/src/it/unimi/dsi/law/webgraph/CompressedIntLabel.java b/third_party/law-2.5.1/src/it/unimi/dsi/law/webgraph/CompressedIntLabel.java
new file mode 100644
index 0000000..22499b9
--- /dev/null
+++ b/third_party/law-2.5.1/src/it/unimi/dsi/law/webgraph/CompressedIntLabel.java
@@ -0,0 +1,151 @@
+package it.unimi.dsi.law.webgraph;
+
+import java.io.File;
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.io.Serializable;
+
+/*
+ * Copyright (C) 2007-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.bits.LongArrayBitVector;
+import it.unimi.dsi.compression.Coder;
+import it.unimi.dsi.compression.Decoder;
+import it.unimi.dsi.fastutil.ints.Int2ObjectMap;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.longs.LongBigList;
+import it.unimi.dsi.io.InputBitStream;
+import it.unimi.dsi.io.OutputBitStream;
+import it.unimi.dsi.webgraph.labelling.AbstractIntLabel;
+
+//RELEASE-STATUS: DIST
+
+/** An integer label that uses a coder/decoder pair depending on the source node.
+ *
+ * <p>This is a kind of int label whose serialization ({@link #fromBitStream(InputBitStream, int)}
+ * and {@link #toBitStream(OutputBitStream, int)} methods) relies on a coder/decoder pair that may depend on the source node of the arc.
+ * Different constructors provide different ways to assign coders/decoders to source nodes: consult their documentation for more information.
+ *
+ * <p>More precisely, the public field {@link #nodeLabels} exposes a list of node labels, and the coder/decoder used for an arc
+ * is chosen according to the label of its source node.
+ */
+public class CompressedIntLabel extends AbstractIntLabel implements Serializable {
+ private static final long serialVersionUID = 1L;
+ public static final boolean DEBUG = false;
+ /** The node labels, indexed by node (one fixed-width label per node). */
+ public final LongBigList nodeLabels;
+ /** A map assigning the coder to be used for every given source. */
+ private final Int2ObjectMap<Coder> source2Coder;
+ /** A map assigning the decoder to be used for every given source. */
+ private final Int2ObjectMap<Decoder> source2Decoder;
+ /** If not {@code null}, this label was produced with a specific spec, that is contained here. */
+ private String labelSpec;
+
+ protected CompressedIntLabel(final String key, final int value, final String labelSpec, final LongBigList nodeLabels, final Int2ObjectMap<Coder> source2Coder, final Int2ObjectMap<Decoder> source2Decoder) {
+ super(key, value);
+ this.labelSpec = labelSpec;
+ this.nodeLabels = nodeLabels;
+ this.source2Coder = source2Coder;
+ this.source2Decoder = source2Decoder;
+ }
+
+ /** Creates a compressed integer label.
+ *
+ * @param key the key.
+ * @param value the value.
+ * @param nodeLabels the node labels.
+ * @param sourceLabel2Decoder a map assigning a decoder to every possible node label; note that the number of node labels is 2<sup><var>w</var></sup>, where <var>w</var> is the width in bits of a node label.
+ * @param sourceLabel2Coder an optional map assigning a coder to every possible node label, or {@code null}; note that the number of node labels is 2<sup><var>w</var></sup>, where <var>w</var> is the width in bits of a node label.
+ */
+ public CompressedIntLabel(final String key, final int value, final LongBigList nodeLabels, final Int2ObjectMap<Decoder> sourceLabel2Decoder, final Int2ObjectMap<Coder> sourceLabel2Coder) {
+ super(key, value);
+ this.nodeLabels = nodeLabels;
+ this.source2Coder = sourceLabel2Coder;
+ this.source2Decoder = sourceLabel2Decoder;
+ }
+
+ private static LongBigList loadLabels(final File nodeLabels, final int nodeWidth) throws IOException {
+ final InputBitStream ibs = new InputBitStream(nodeLabels);
+ final int n = (int)((nodeLabels.length() * Byte.SIZE) / nodeWidth);
+ final LongBigList l = LongArrayBitVector.getInstance((long)n * nodeWidth).asLongBigList(nodeWidth); // long product avoids int overflow
+ for(int i = 0; i < n; i++) l.add(ibs.readInt(nodeWidth));
+ ibs.close();
+ return l;
+ }
+
+ /** Creates a compressed integer label from a specification that includes decoders and coders.
+ *
+ * <p><strong>Warning</strong>: the entire node-label stream is loaded into memory by this method.
+ *
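+ * @param directory the directory (a {@link File}) against which the filenames below are resolved.
+ * @param key the key of this label.
+ * @param value the value of this label.
+ * @param labels the filename of the node-label file.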
+ * @param nodeWidth the width in bits of a node label.
+ * @param decoders the filename of a serialised {@link Int2ObjectMap} mapping node labels to decoders.
+ * @param coders the filename of a serialised {@link Int2ObjectMap} mapping node labels to coders.
+ */
+ @SuppressWarnings("unchecked")
+ public CompressedIntLabel(final Object directory, final String key, final String value, final String labels, final String nodeWidth, final String decoders, final String coders) throws NumberFormatException, FileNotFoundException, IOException, ClassNotFoundException {
+ this(key, Integer.parseInt(value), loadLabels(new File((File)directory, labels), Integer.parseInt(nodeWidth)),
+ (Int2ObjectMap<Decoder>)BinIO.loadObject(new File((File)directory, decoders)), (Int2ObjectMap<Coder>)BinIO.loadObject(new File((File)directory, coders)));
+ this.labelSpec = key + "," + value + "," + labels + "," + nodeWidth + "," + decoders +"," + coders;
+ }
+
+ /** Creates a compressed integer label from a specification that includes just decoders.
+ *
+ * <p><strong>Warning</strong>: the entire node-label stream is loaded into memory by this method.
+ *
+ * @param directory the directory (a {@link File}) against which the filenames below are resolved.
+ * @param key the key of this label.
+ * @param value the value of this label.
+ * @param labels the filename of the node-label file.
+ * @param nodeWidth the width in bits of a node label.
+ * @param decoders the filename of a serialised {@link Int2ObjectMap} mapping node labels to decoders.
+ */
+ @SuppressWarnings("unchecked")
+ public CompressedIntLabel(final Object directory, final String key, final String value, final String labels, final String nodeWidth, final String decoders) throws NumberFormatException, FileNotFoundException, IOException, ClassNotFoundException {
+ this(key, Integer.parseInt(value), loadLabels(new File((File)directory, labels), Integer.parseInt(nodeWidth)),
+ (Int2ObjectMap<Decoder>)BinIO.loadObject(new File((File)directory, decoders)), null);
+ this.labelSpec = key + "," + value + "," + labels + "," + nodeWidth + "," + decoders;
+ }
+
+ @Override
+ public CompressedIntLabel copy() {
+ return new CompressedIntLabel(key, value, labelSpec, nodeLabels, source2Coder, source2Decoder);
+ }
+
+ @Override
+ public int fromBitStream(final InputBitStream inputBitStream, final int source) throws IOException {
+ return value = source2Decoder.get((int)nodeLabels.getLong(source)).decode(inputBitStream);
+ }
+
+ @Override
+ public int toBitStream(final OutputBitStream outputBitStream, final int source) throws IOException {
+ if (source2Coder == null) throw new UnsupportedOperationException();
+ return source2Coder.get((int)nodeLabels.getLong(source)).encode(value, outputBitStream);
+ }
+
+ @Override
+ public int fixedWidth() {
+ return -1;
+ }
+
+ @Override
+ public String toSpec() {
+ return this.getClass().getName() + "(" + (labelSpec == null? key : labelSpec) + ")";
+ }
+}
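CompressedIntLabel keeps one label of nodeWidth bits per node: loadLabels views a LongArrayBitVector as a LongBigList of that width. In fromBitStream, the decoder is selected with source2Decoder.get((int)nodeLabels.getLong(source)). The following plain-Java sketch illustrates only the fixed-width packing idea. PackedNodeLabelsSketch is a hypothetical class, and its bit layout is not claimed to match the one used by dsiutils.

```java
/**
 * Hypothetical sketch: node labels of a fixed bit width packed into a long[],
 * in the spirit of the LongBigList view used by CompressedIntLabel. Not the dsiutils code.
 */
final class PackedNodeLabelsSketch {
    private final long[] bits;
    private final int width;
    private final long mask;

    PackedNodeLabelsSketch(final int n, final int width) {
        if (width < 1 || width > 63) throw new IllegalArgumentException("width must be in [1..63]");
        this.width = width;
        this.mask = (1L << width) - 1;
        this.bits = new long[(int) (((long) n * width + 63) / 64)]; // long product avoids overflow
    }

    /** Stores the low width bits of value as the label of node index. */
    void set(final int index, final long value) {
        final long pos = (long) index * width;
        final int word = (int) (pos >>> 6);
        final int bit = (int) (pos & 63);
        bits[word] = (bits[word] & ~(mask << bit)) | ((value & mask) << bit);
        if (bit + width > 64) { // the label straddles two words
            final int spill = bit + width - 64; // number of bits stored in the next word
            final long spillMask = (1L << spill) - 1;
            bits[word + 1] = (bits[word + 1] & ~spillMask) | ((value & mask) >>> (64 - bit));
        }
    }

    /** Returns the label of node index. */
    long get(final int index) {
        final long pos = (long) index * width;
        final int word = (int) (pos >>> 6);
        final int bit = (int) (pos & 63);
        long value = bits[word] >>> bit;
        if (bit + width > 64) value |= bits[word + 1] << (64 - bit);
        return value & mask;
    }
}
```

With such a structure, picking a decoder for an arc amounts to a lookup keyed by get(source) in a map from labels to decoders, which is the role source2Decoder plays in CompressedIntLabel.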
diff --git a/third_party/law-2.5.1/src/overview.html b/third_party/law-2.5.1/src/overview.html
new file mode 100644
index 0000000..43040e0
--- /dev/null
+++ b/third_party/law-2.5.1/src/overview.html
@@ -0,0 +1,74 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<!-- RELEASE-STATUS: DIST -->
+<html>
+ <head>
+ <title>LAW software</title>
+ </head>
+ <body>
+
+ <p>This collection contains software distributed by the
+ <a href="http://law.dsi.unimi.it/">Laboratory for Web Algorithmics</a> (<acronym title="Laboratorio di algoritmica del web">LAW</acronym>),
+ and it is usually linked to some publication. If you find our software useful
+ while working on a scientific publication,
+ please cite us properly, either using the publications quoted in the documentation,
+ or contacting us for suggestions.</p>
+
+ <p>We try to distribute everything under the
+ <a href="http://www.gnu.org/copyleft/gpl.html"><acronym title="GNU's not Unix">GNU</acronym>
+ General Public License</a> or the
+ <a href="http://www.gnu.org/copyleft/lesser.html"><acronym title="GNU's not
+ Unix">GNU</acronym> Lesser General Public License</a>.</p>
+
+ <h2>Highlights</h2>
+ <UL>
+ <LI>Statistical tools to compute efficiently {@linkplain
+ it.unimi.dsi.law.stat.KendallTau Kendall's &tau;} and the {@linkplain
+ it.unimi.dsi.law.stat.WeightedTau weighted &tau;}.
+ They include tools to {@linkplain it.unimi.dsi.law.util.Precision limit accurately
+ the precision} of the involved ranks, as the noise caused by approximation can significantly alter the computation of &tau;
+ (see &ldquo;<a href="http://drops.dagstuhl.de/opus/volltexte/2007/1072/pdf/07071.VignaSebastiano.Paper.1072.pdf">Traps and pitfalls of topic-biased PageRank</a>&rdquo;, by Paolo Boldi, Massimo Santini,
+ Roberto Posenato, and Sebastiano Vigna, in <i>WAW 2006. Fourth Workshop on Algorithms and Models for the Web-Graph</i>,
+ volume 4936 of Lecture Notes in Computer Science, pages 107&ndash;116, Springer-Verlag, 2008).</LI>
+
+ <LI>The largest publicly available set of classes and documentation
+ related to spectral ranking. It includes a detailed
+ explanation of theoretical formulations and of the algorithms actually
+ implementing them. In particular, {@link it.unimi.dsi.law.rank.PageRankParallelGaussSeidel} is our
+ best-of-breed implementation of PageRank,
+ whereas {@link it.unimi.dsi.law.rank.PageRankFromCoefficients}
+ makes it possible to compute PageRank and its derivatives for every value of
+ the damping factor using the precomputed coefficients of {@linkplain it.unimi.dsi.law.rank.PageRankPowerSeries PageRank's power series} (using the results described in &ldquo;PageRank:
+ Functional dependencies&rdquo;, by Paolo Boldi, Massimo Santini, and Sebastiano Vigna, <i>ACM Trans. Inf. Syst.</i>, 27(4):1&ndash;23, 2009).
+ You can also compute, for instance, the {@linkplain it.unimi.dsi.law.rank.DominantEigenvectorParallelPowerMethod dominant eigenvector}
+ and {@linkplain it.unimi.dsi.law.rank.KatzParallelGaussSeidel Katz's index}.
+ </LI>
+
+ <LI>A highly scalable implementation of the {@linkplain it.unimi.dsi.law.graph.LayeredLabelPropagation Layered Label-Propagation} algorithm.</LI>
+
+ <LI>{@link it.unimi.dsi.law.util.ConsistentHashFunction} implements
+ the consistent hash function used by <a href="http://law.dsi.unimi.it/ubicrawler/">UbiCrawler</a>.
+ </LI>
+ </UL>
+
+
+ <h2>Package Dependencies</h2>
+
+ <P>The LAW software requires Java &ge;6; it uses the <a href="http://dsiutils.dsi.unimi.it/">DSI utilities</a>,
+ <a href="http://webgraph.dsi.unimi.it/">WebGraph</a>, <a href="http://mg4j.dsi.unimi.it/">MG4J</a>,
+ and three packages providing high-performance containers and
+ algorithms, that is, <A HREF="http://fastutil.dsi.unimi.it/">fastutil</A> 6.4 or greater,
+ the <A HREF="http://www-itg.lbl.gov/~hoschek/colt/">COLT distribution</A>,
+ and <A HREF="http://sux4j.dsi.unimi.it/">Sux4J</a>. Moreover, it uses <A
+ HREF="http://www.martiansoftware.com/jsap/">JSAP</A> for command-line parsing.
+
+ The LAW software also uses a number of useful libraries from the
+ <a href="http://jakarta.apache.org/commons/">Jakarta commons project</a>,
+ including <a href="http://jakarta.apache.org/commons/collections/">collections</a>,
+ <a href="http://jakarta.apache.org/commons/lang/">lang</a>,
+ <a href="http://jakarta.apache.org/commons/configuration/">configuration</a> and
+ <a href="http://jakarta.apache.org/commons/io/">io</a>.
+ All logging is performed using <a href="http://logging.apache.org/log4j/">log4j</a>.
+ Compiling the LAW software requires <A HREF="https://javacc.dev.java.net/">javacc</A>.
+
+ </body>
+</html>
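The overview above mentions tools that compute Kendall's tau efficiently. As a point of reference only, the following hypothetical sketch (the class and method names are illustrative and are not LAW's API) computes Kendall's tau-b directly from its definition by examining all n(n-1)/2 pairs, which takes quadratic time. It does not reproduce the efficient algorithm used by LAW.

```java
/**
 * Hypothetical reference sketch: Kendall's tau-b computed directly from its
 * definition by examining all n(n-1)/2 pairs, i.e., in O(n^2) time.
 * This is NOT the efficient algorithm implemented in LAW.
 */
final class NaiveKendallTau {
    private NaiveKendallTau() {}

    /**
     * Returns tau-b between two score vectors; scores are compared with Double.compare.
     * The result is NaN when one of the vectors is constant, because tau-b is then undefined.
     */
    static double tauB(final double[] x, final double[] y) {
        if (x.length != y.length) throw new IllegalArgumentException("Vectors have different lengths");
        long concordant = 0, discordant = 0;
        long tiedOnlyInX = 0, tiedOnlyInY = 0; // pairs tied in both vectors are ignored
        for (int i = 0; i < x.length; i++) {
            for (int j = i + 1; j < x.length; j++) {
                final int sx = Double.compare(x[i], x[j]);
                final int sy = Double.compare(y[i], y[j]);
                if (sx == 0 && sy == 0) continue;
                if (sx == 0) tiedOnlyInX++;
                else if (sy == 0) tiedOnlyInY++;
                else if (sx == sy) concordant++;
                else discordant++;
            }
        }
        // Number of pairs not tied in x, and number of pairs not tied in y
        final double notTiedInX = concordant + discordant + tiedOnlyInY;
        final double notTiedInY = concordant + discordant + tiedOnlyInX;
        return (concordant - discordant) / Math.sqrt(notTiedInX * notTiedInY);
    }
}
```

Pairs tied in both vectors are skipped, and pairs tied in only one vector shrink the corresponding factor of the denominator; this is what distinguishes tau-b from the tie-unaware tau-a.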
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/TestUtil.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/TestUtil.java
new file mode 100644
index 0000000..f1354eb
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/TestUtil.java
@@ -0,0 +1,254 @@
+package it.unimi.dsi.law;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.OutputStream;
+import java.util.Random;
+
+import org.apache.commons.io.IOUtils;
+
+/*
+ * Copyright (C) 2008-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import it.unimi.dsi.fastutil.io.FastByteArrayInputStream;
+import it.unimi.dsi.fastutil.io.FastByteArrayOutputStream;
+import it.unimi.dsi.fastutil.io.MeasurableInputStream;
+import junit.framework.TestCase;
+
+
+
+//RELEASE-STATUS: DIST
+
+/** A static container of utility methods for test cases. */
+public final class TestUtil {
+
+ /** The property specifying the data directory. */
+ public static final String DATA_DIR = "it.unimi.dsi.law.data";
+ /** A threshold for equality testing. */
+ public static final double EQUALS_THRESHOLD = 1E-12;
+
+ /** Cannot be instantiated. */
+ private TestUtil() {}
+
+ /** Returns a derelativised version of the given filename.
+ *
+ * <P>The general data directory must be specified in the
+ * property <code>it.unimi.dsi.law.data</code>. The directory
+ * must contain a <code>test</code> subdirectory (in which
+ * <code>name</code> will be searched for).
+ *
+ * <P>If <code>name</code> starts with a slash, it will
+ * be derelativised w.r.t. the directory <code>test</code>
+ * in <code>it.unimi.dsi.law.data</code>. Otherwise,
+ * the package name of the provided class (with dots replaced by directory separator)
+ * will be used additionally. If <code>exactMatch</code> is false, the
+ * resulting filename is returned independently of whether the file exists or not.
+ * If <code>exactMatch</code> is true, and the file is not found,
+ * then further attempts are made to locate it one level up at a time, until it is found
+ * or there are no more levels.
+ *
+ * <P>This method will directly {@linkplain junit.framework.TestCase#fail(String) fail}
+ * the current test if the property is not defined, or if <code>exactMatch</code>
+ * is true and a relative <code>name</code> cannot be found anywhere in the hierarchy.
+ * It will return {@code null} if <code>it.unimi.dsi.law.data</code>
+ * is the empty string. Thus, you should start your tests as follows:
+ *
+ * <pre>
+ * String filename = TestUtil.getTestFile(...);
+ * if (filename == null) return;
+ * </pre>
+ *
+ * @param klass the class performing the derelativisation.
+ * @param name a filename (to be found in the test data directory).
+ * @param exactMatch if true, the file must exist; if it does not exist, the
+ * hierarchy is scanned up to look for the file, as explained above.
+ * @return the derelativised filename, or {@code null}
+ * if <code>it.unimi.dsi.law.data</code> is the empty string.
+ */
+ public static String getTestFile(final Class<?> klass, final String name, boolean exactMatch) {
+
+ final String dataDirName = System.getProperty(DATA_DIR);
+
+ if (dataDirName == null) TestCase.fail(DATA_DIR + " is not defined");
+ else if (dataDirName.length() == 0) return null;
+
+ File testDir = new File(dataDirName);
+ File result;
+
+ if (name.charAt(0) != '/') {
+ final String[] piece = klass.getName().split("\\.");
+ int numberOfPieces = piece.length - 1;
+
+ File actualDir;
+ String firstAttempt = null;
+ do {
+ actualDir = testDir;
+ // Note that we skip "test".
+ for(int i = 0; i < numberOfPieces; i++) actualDir = new File(actualDir, piece[i]);
+ result = new File(actualDir, name);
+ if (!exactMatch) return result.toString();
+ if (firstAttempt == null) firstAttempt = result.toString();
+ if (! result.exists() && numberOfPieces > 0) numberOfPieces--;
+ } while (! result.exists() && numberOfPieces > 0);
+ if (! result.exists()) TestCase.fail(firstAttempt + " does not exist (not even in the rest of the hierarchy up to " + actualDir + ")");
+ } else
+ result = new File(testDir, name);
+ return result.toString();
+
+ }
+
+ /** Returns a random integer vector of given size, using the provided {@link java.util.Random} object.
+ *
+ * @param n the vector size.
+ * @param random the random number generator to be used (its {@link Random#nextInt()} method will be called).
+ */
+ public static int[] randomIntVector(final int n, final Random random) {
+ int[] a = new int[n];
+ for (int i = 0; i < n; i++)
+ a[i] = random.nextInt();
+ return a;
+ }
+
+ /** Returns a random integer vector of given size whose values all lie in the range [<var>min</var>,<var>max</var>),
+ * generated by a {@link java.util.Random} initialized with the given seed ({@code max - min} must be a positive {@code int}).
+ *
+ * @param n the vector size.
+ * @param min the minimum of the range.
+ * @param max the maximum of the range.
+ * @param seed the seed to be used to create random numbers.
+ */
+ public static int[] randomIntVector(final int n, final int min, final int max, final int seed) {
+ class MyRandom extends Random {
+ private static final long serialVersionUID = 1L;
+ public MyRandom() {
+ super(seed);
+ }
+ public int nextInt() {
+ return min + super.nextInt(max - min);
+ }
+ }
+ return randomIntVector(n, new MyRandom());
+ }
+
+ /** Returns a random vector of given size, using the provided {@link java.util.Random} object.
+ *
+ * @param n the vector size.
+ * @param random the random number generator to be used (its {@link Random#nextDouble()} method will be called).
+ */
+ public static double[] randomDoubleVector(final int n, final Random random) {
+ double[] a = new double[n];
+ for (int i = 0; i < n; i++)
+ a[i] = random.nextDouble();
+ return a;
+ }
+
+ /** Returns a random double vector of given size whose values all lie in the range [<var>min</var>,<var>max</var>),
+ * generated by a {@link java.util.Random} initialized with the given seed.
+ *
+ * @param n the vector size.
+ * @param min the minimum of the range.
+ * @param max the maximum of the range.
+ * @param seed the seed to be used to create random numbers.
+ */
+ public static double[] randomDoubleVector(final int n, final double min, final double max, final int seed) {
+ class MyRandom extends Random {
+ private static final long serialVersionUID = 1L;
+ public MyRandom() {
+ super(seed);
+ }
+ public double nextDouble() {
+ return min + super.nextDouble() * (max - min);
+ }
+ }
+ return randomDoubleVector(n, new MyRandom());
+ }
+
+ /** Returns a random matrix with given size, using the provided {@link java.util.Random} object.
+ *
+ * @param rows the number of rows (first index).
+ * @param columns the number of columns (second index).
+ * @param random the random number generator to be used (its {@link Random#nextDouble()} method will be called).
+ */
+ public static double[][] randomDoubleMatrix(int rows, int columns, Random random) {
+ double[][] a = new double[rows][columns];
+ for (int r = 0; r < rows; r++)
+ for (int c = 0; c < columns; c++)
+ a[r][c] = random.nextDouble();
+ return a;
+ }
+
+ /** Returns a random matrix of given size whose values all lie in the range [<var>min</var>,<var>max</var>), generated by a {@link java.util.Random} initialized with the given seed.
+ *
+ * @param rows the number of rows (first index).
+ * @param columns the number of columns (second index).
+ * @param min the minimum of the range.
+ * @param max the maximum of the range.
+ * @param seed the seed to be used to create random numbers.
+ */
+ public static double[][] randomDoubleMatrix(final int rows, final int columns, final double min, final double max, final int seed) {
+ class MyRandom extends Random {
+ private static final long serialVersionUID = 1L;
+ public MyRandom() {
+ super(seed);
+ }
+ public double nextDouble() {
+ return min + super.nextDouble() * (max - min);
+ }
+ }
+ return randomDoubleMatrix(rows, columns, new MyRandom());
+ }
+
+ /** Returns the Euclidean (L<sub>2</sub>) norm of the componentwise difference between two vectors.
+ *
+ * @param v0 the first vector.
+ * @param v1 the second vector.
+ * @return the norm.
+ */
+ public static double normOfDifference(final double[] v0, final double[] v1) {
+ if (v0.length != v1.length) throw new IllegalArgumentException();
+ double s = 0.0;
+ for (int i = v0.length - 1; i >= 0; i--) {
+ double d = v0[i] - v1[i];
+ s += d * d;
+ }
+ return Math.sqrt(s);
+ }
+
+ /**
+ * Duplicates a given input {@link MeasurableInputStream} by copying it to
+ * a given {@link OutputStream} and also returning it as a
+ * {@link FastByteArrayInputStream} positioned at its beginning.
+ *
+ * @param in the input stream.
+ * @param out where to copy.
+ * @return a byte array buffered copy of the input stream.
+ * @throws IOException if an I/O error occurs while reading or writing.
+ */
+ public static MeasurableInputStream tee(MeasurableInputStream in, OutputStream out) throws IOException {
+ FastByteArrayOutputStream tmp = new FastByteArrayOutputStream();
+ IOUtils.copy(in, tmp);
+ FastByteArrayInputStream copy = new FastByteArrayInputStream(tmp.array, 0, tmp.length);
+ IOUtils.copy(copy, out);
+ copy.position(0);
+ return copy;
+ }
+
+
+}
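The seeded helpers in TestUtil rescale the uniform output of java.util.Random into [min, max) by overriding nextDouble() in a local subclass, so that the unseeded variant can be reused unchanged; normOfDifference is simply the Euclidean distance. The following minimal stand-alone sketch shows the same two ideas using a plain Random instead of a subclass; the class name SeededVectors is hypothetical.

```java
import java.util.Random;

/** Hypothetical stand-alone sketch mirroring two TestUtil helpers. */
final class SeededVectors {
    private SeededVectors() {}

    /**
     * Values in [min, max): nextDouble() is uniform in [0, 1), then scaled and shifted.
     * The same seed always yields the same vector.
     */
    static double[] randomDoubleVector(final int n, final double min, final double max, final long seed) {
        final Random random = new Random(seed);
        final double[] a = new double[n];
        for (int i = 0; i < n; i++) a[i] = min + random.nextDouble() * (max - min);
        return a;
    }

    /** Euclidean (L2) norm of v0 - v1. */
    static double normOfDifference(final double[] v0, final double[] v1) {
        if (v0.length != v1.length) throw new IllegalArgumentException("Vectors have different lengths");
        double s = 0;
        for (int i = 0; i < v0.length; i++) {
            final double d = v0[i] - v1[i];
            s += d * d;
        }
        return Math.sqrt(s);
    }
}
```

Fixing the seed makes a failing randomized test reproducible, which is the main reason the TestUtil variants take a seed rather than a Random.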
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/big/rank/PageRankParallelGaussSeidelTest.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/big/rank/PageRankParallelGaussSeidelTest.java
new file mode 100644
index 0000000..9f2edc8
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/big/rank/PageRankParallelGaussSeidelTest.java
@@ -0,0 +1,417 @@
+package it.unimi.dsi.law.big.rank;
+
+/*
+ * Copyright (C) 2006-2019 Paolo Boldi, Roberto Posenato, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static java.lang.Math.pow;
+import static org.junit.Assert.assertEquals;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.junit.BeforeClass;
+import org.junit.Ignore;
+import org.junit.Test;
+
+import it.unimi.dsi.big.webgraph.ImmutableGraph;
+import it.unimi.dsi.fastutil.doubles.DoubleBigArrayBigList;
+import it.unimi.dsi.fastutil.doubles.DoubleBigArrays;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.io.TextIO;
+import it.unimi.dsi.law.TestUtil;
+import it.unimi.dsi.law.rank.PowerSeries;
+import it.unimi.dsi.law.util.Norm;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+
+
+
+//RELEASE-STATUS: DIST
+
+public class PageRankParallelGaussSeidelTest {
+ final static String GRAPH_NAME = "test50-.6-7-3-2-10-graph";
+ static double[][] exactResult;
+ static double[][] preference;
+ static long n;
+ static String baseNameGraph;
+ static String baseNamePreference;
+ static ImmutableGraph g;
+
+ @BeforeClass
+ public static void setUp() throws Exception {
+ baseNameGraph = TestUtil.getTestFile(it.unimi.dsi.law.rank.PageRankParallelGaussSeidelTest.class, GRAPH_NAME, false);
+ baseNamePreference = baseNameGraph + "-preferenceVector";
+
+ g = ImmutableGraph.load(baseNameGraph + "T"); // PageRankParallelGaussSeidel expects the transposed graph
+ n = g.numNodes();
+ exactResult = DoubleBigArrays.newBigArray(n);
+ preference = DoubleBigArrays.newBigArray(n);
+ }
+
+ @Test
+ public void testRank() throws Exception {
+ System.out.println("rank without preference vector");
+ final PageRankParallelGaussSeidel pr = new PageRankParallelGaussSeidel(g);
+ pr.preference = null;
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ TextIO.loadDoubles(baseNamePreference + "-uniform-w.out", exactResult);
+ assertEquals("Too much different!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testRankWithUniformPreferenceVector() throws Exception {
+ System.out.println("rank with uniform preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-uniform.bin", preference);
+ final PageRankParallelGaussSeidel pr = new PageRankParallelGaussSeidel(g);
+ pr.preference = DoubleBigArrayBigList.wrap(preference);
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ TextIO.loadDoubles(baseNamePreference + "-uniform-w.out", exactResult);
+ assertEquals("Too much different!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testRankWithAlternatePreferenceVector() throws Exception {
+ System.out.println("rank with alternate preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-alternate.bin", preference);
+ final PageRankParallelGaussSeidel pr = new PageRankParallelGaussSeidel(g);
+ pr.preference = DoubleBigArrayBigList.wrap(preference);
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ TextIO.loadDoubles(baseNamePreference + "-alternate-w.out", exactResult);
+ System.out.println(Arrays.deepToString(exactResult));
+ System.out.println(Arrays.deepToString(pr.rank));
+ assertEquals("Too much different!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testRankWith1stHalfPreferenceVector() throws Exception {
+ System.out.println("rank with 1stHalf preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-1stHalf.bin", preference);
+ final PageRankParallelGaussSeidel pr = new PageRankParallelGaussSeidel(g);
+ pr.preference = DoubleBigArrayBigList.wrap(preference);
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ TextIO.loadDoubles(baseNamePreference + "-1stHalf-w.out", exactResult);
+ assertEquals("Too much different!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testRankWith2ndHalfPreferenceVector() throws Exception {
+ System.out.println("rank with 2ndHalf preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-2ndHalf.bin", preference);
+ final PageRankParallelGaussSeidel pr = new PageRankParallelGaussSeidel(g);
+ pr.preference = DoubleBigArrayBigList.wrap(preference);
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ TextIO.loadDoubles(baseNamePreference + "-2ndHalf-w.out", exactResult);
+ assertEquals("Too much different!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testStrongRankWithUniformPreferenceVector() throws Exception {
+ System.out.println("Strong rank with uniform preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-uniform.bin", preference);
+ final PageRankParallelGaussSeidel pr = new PageRankParallelGaussSeidel(g);
+ pr.preference = DoubleBigArrayBigList.wrap(preference);
+ pr.stronglyPreferential = true;
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ TextIO.loadDoubles(baseNamePreference + "-uniform-s.out", exactResult);
+ assertEquals("Too much different!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testStrongRankWithAlternatePreferenceVector() throws Exception {
+ System.out.println("Strong rank with alternate preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-alternate.bin", preference);
+ final PageRankParallelGaussSeidel pr = new PageRankParallelGaussSeidel(g);
+ pr.preference = DoubleBigArrayBigList.wrap(preference);
+ pr.stronglyPreferential = true;
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ TextIO.loadDoubles(baseNamePreference + "-alternate-s.out", exactResult);
+ assertEquals("Too much different!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testStrongRankWith1stHalfPreferenceVector() throws Exception {
+ System.out.println("Strong rank with 1stHalf preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-1stHalf.bin", preference);
+ final PageRankParallelGaussSeidel pr = new PageRankParallelGaussSeidel(g);
+ pr.preference = DoubleBigArrayBigList.wrap(preference);
+ pr.stronglyPreferential = true;
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ TextIO.loadDoubles(baseNamePreference + "-1stHalf-s.out", exactResult);
+ assertEquals("Too much different!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testStrongRankWith2ndHalfPreferenceVector() throws Exception {
+ System.out.println("Strong rank with 2ndHalf preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-2ndHalf.bin", preference);
+ final PageRankParallelGaussSeidel pr = new PageRankParallelGaussSeidel(g);
+ pr.preference = DoubleBigArrayBigList.wrap(preference);
+ pr.stronglyPreferential = true;
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ TextIO.loadDoubles(baseNamePreference + "-2ndHalf-s.out", exactResult);
+ assertEquals("Too much different!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testCliqueBibridgeCycle() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(final int p: new int[] { 10, 50, 100 }) {
+ for(final int k: new int[] { 10, 50, 100 }) {
+ final ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++)
+ if (i != j) mg.addArc(i, j);
+ // Note the transposition
+ for(int i = 0; i < p; i++) mg.addArc(k + (i + 1) % p, k + i);
+ mg.addArc(k - 1, k);
+ mg.addArc(k, k - 1);
+ final it.unimi.dsi.webgraph.ImmutableGraph g = mg.immutableView();
+
+ final PowerSeries w = new PowerSeries(g);
+ w.markovian = true;
+ w.alpha = .8;
+ w.stepUntil(PowerSeries.MAX_RATIO_STOPPING_CRITERION);
+
+ for(final double alpha: new double[] { .25, .50, .75 }) {
+ // Compute index
+ final PageRankParallelGaussSeidel pr = new PageRankParallelGaussSeidel(ImmutableGraph.wrap(g));
+ pr.alpha = alpha;
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ final double[] rank = pr.rank[0];
+ final double[] expected = new double[k + p];
+ final double r = rank[k - 1] * (k + p);
+
+ expected[k - 1] = r;
+ for(int i = k - 1; i-- != 0;) expected[i] = (k - 1) * (k - alpha * k + alpha * r) / (k * (k - 1 - alpha * (k - 2)));
+ expected[k] = 2 + 2 * (alpha * r - k) / (k * (2 - pow(alpha, p)));
+ for(int d = 1; d < p; d++) expected[k + d] = 1 + pow(alpha, d) * (alpha * r - k) / (k * (2 - pow(alpha, p)));
+ for(int i = expected.length; i-- != 0;) expected[i] /= k + p;
+
+ assertEquals(0, Norm.L_1.compute(expected, rank), threshold);
+
+ pr.normVector(DoubleBigArrays.wrap(w.previousRank), w.maxRatio);
+ pr.pseudoRank = true;
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+
+ for(int i = 0; i < rank.length; i++) assertEquals(expected[i], rank[i], threshold);
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueBackbridgeCycle() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(final int p: new int[] { 10, 50, 100 }) {
+ for(final int k: new int[] { 10, 50, 100 }) {
+ final ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++)
+ if (i != j) mg.addArc(i, j);
+ // Note the transposition
+ for(int i = 0; i < p; i++) mg.addArc(k + (i + 1) % p, k + i);
+ mg.addArc(k - 1, k);
+ final it.unimi.dsi.webgraph.ImmutableGraph g = mg.immutableView();
+
+ final PowerSeries w = new PowerSeries(g);
+ w.markovian = true;
+ w.alpha = .8;
+ w.stepUntil(PowerSeries.MAX_RATIO_STOPPING_CRITERION);
+
+ for(final double alpha: new double[] { .25, .50, .75 }) {
+ // Compute index
+ final PageRankParallelGaussSeidel pr = new PageRankParallelGaussSeidel(ImmutableGraph.wrap(g));
+ pr.alpha = alpha;
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ final double[] rank = pr.rank[0];
+ final double[] expected = new double[k + p];
+
+ for(int i = k - 1; i-- != 0;)
+ expected[i] = (2 * (k - 1) - 2 * (k - 2) * alpha - alpha * alpha) / (2 * (1 - alpha) * (k - 1 + alpha)) -
+ pow(alpha, p + 2) / (2 * (1 - alpha) * (k - 1 + alpha) * (2 - pow(alpha, p)));
+
+ expected[k - 1] = (2 * (k - 1) - (k - 3) * alpha - alpha * alpha * k) / (2 * (1 - alpha) * (k - 1 + alpha)) -
+ pow(alpha, p + 1) * (k - 1 - alpha * (k - 2)) / (2 * (1 - alpha) * (k - 1 + alpha) * (2 - pow(alpha, p)));
+ for(int d = 0; d < p; d++)
+ expected[k + d] = 1 - pow(alpha, d + (d == 0? p : 0)) / (2 - pow(alpha, p));
+
+ for(int i = expected.length; i-- != 0;) expected[i] /= k + p;
+
+ assertEquals(0, Norm.L_1.compute(expected, rank), threshold);
+
+ pr.normVector(DoubleBigArrays.wrap(w.previousRank), w.maxRatio);
+ pr.pseudoRank = true;
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+
+ for(int i = 0; i < rank.length; i++) assertEquals(expected[i], rank[i], threshold);
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueForwardbridgeCycle() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(final int p: new int[] { 10, 50, 100 }) {
+ for(final int k: new int[] { 10, 50, 100 }) {
+ final ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++)
+ if (i != j) mg.addArc(i, j);
+ // Note the transposition
+ for(int i = 0; i < p; i++) mg.addArc(k + (i + 1) % p, k + i);
+ mg.addArc(k, k - 1);
+ final it.unimi.dsi.webgraph.ImmutableGraph g = mg.immutableView();
+
+ final PowerSeries w = new PowerSeries(g);
+ w.markovian = true;
+ w.alpha = .8;
+ w.stepUntil(PowerSeries.MAX_RATIO_STOPPING_CRITERION);
+
+ for(final double alpha: new double[] { .25, .50, .75 }) {
+ // Compute index
+ final PageRankParallelGaussSeidel pr = new PageRankParallelGaussSeidel(ImmutableGraph.wrap(g));
+ pr.alpha = alpha;
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ final double[] rank = pr.rank[0];
+ final double[] expected = new double[k + p];
+ for(int i = k - 1; i-- != 0;)
+ expected[i] = (1 - alpha) * (alpha + k) * (k - 1) / ((k - alpha * alpha) * (k - 1) - alpha * k * (k - 2));
+
+ expected[k - 1] = k * (1 - alpha) * (k - 1 + alpha) / ((k - alpha * alpha) * (k - 1) - alpha * k * (k - 2));
+ for(int d = 0; d < p; d++)
+ expected[k + d] = 1 + (pow(alpha, d + 1) * (1 - alpha) * (k - 1 + alpha)) / ((1 - pow(alpha, p)) * ((k - alpha * alpha) * (k - 1) - alpha * k * (k - 2)));
+
+ for(int i = expected.length; i-- != 0;) expected[i] /= k + p;
+
+ assertEquals(0, Norm.L_1.compute(expected, rank), threshold);
+
+ pr.normVector(DoubleBigArrays.wrap(w.previousRank), w.maxRatio);
+ pr.pseudoRank = true;
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+
+ for(int i = 0; i < rank.length; i++) assertEquals(expected[i], rank[i], threshold);
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueNobridgeCycle() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(final int p: new int[] { 10, 50, 100 }) {
+ for(final int k: new int[] { 10, 50, 100 }) {
+ final ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++)
+ if (i != j) mg.addArc(i, j);
+ // Note the transposition
+ for(int i = 0; i < p; i++) mg.addArc(k + (i + 1) % p, k + i);
+ final it.unimi.dsi.webgraph.ImmutableGraph g = mg.immutableView();
+
+ final PowerSeries w = new PowerSeries(g);
+ w.markovian = true;
+ w.alpha = .8;
+ w.stepUntil(PowerSeries.MAX_RATIO_STOPPING_CRITERION);
+
+ for(final double alpha: new double[] { .25, .50, .75 }) {
+ // Compute index
+ final PageRankParallelGaussSeidel pr = new PageRankParallelGaussSeidel(ImmutableGraph.wrap(g));
+ pr.alpha = alpha;
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ final double[] rank = pr.rank[0];
+ final double[] expected = new double[k + p];
+ for(int i = k + p; i-- != 0;)
+ expected[i] = 1;
+ for(int i = expected.length; i-- != 0;) expected[i] /= k + p;
+
+ assertEquals(0, Norm.L_1.compute(expected, rank), threshold);
+
+ pr.normVector(DoubleBigArrays.wrap(w.previousRank), w.maxRatio);
+ pr.pseudoRank = true;
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+
+ for(int i = 0; i < rank.length; i++) assertEquals(expected[i], rank[i], threshold);
+ }
+ }
+ }
+ }
+ }
+
+
+ @Test
+ @Ignore("Needs lots of RAM")
+ public void testBig() throws Exception {
+ final long n = 1L << 31;
+ final double rank = 1. / n;
+
+ final ImmutableGraph g = new ImmutableGraph() {
+ @Override
+ public long numNodes() {
+ return n;
+ }
+
+ @Override
+ public boolean randomAccess() {
+ return true;
+ }
+
+ @Override
+ public long outdegree(final long x) {
+ return 1;
+ }
+
+ @Override
+ public long[][] successorBigArray(final long x) {
+ return new long[][] { { (x + 1) % n } };
+ }
+
+ @Override
+ public ImmutableGraph copy() {
+ return this;
+ }
+ };
+ final PageRankParallelGaussSeidel pr = new PageRankParallelGaussSeidel(g);
+ pr.preference = null;
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ for(long i = 0; i < n; i++) assertEquals(rank, DoubleBigArrays.get(pr.rank, i), threshold);
+ }
+ }
+
+}
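testCliqueNobridgeCycle above expects each of the k + p nodes to receive PageRank 1/(k + p). The reason is that each clique node receives k - 1 contributions of r/(k - 1) and each cycle node receives a single contribution r/1, so the uniform vector r = 1/(k + p) satisfies r = (1 - alpha)/(k + p) + alpha * r. The following self-contained sketch checks this with a plain sequential Gauss-Seidel iteration on the transposed graph. The class and method names are hypothetical, the sketch assumes no dangling nodes and a uniform preference vector, and it is not LAW's parallel implementation.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: sequential Gauss-Seidel PageRank with a uniform
 * preference vector, for graphs without dangling nodes. Not LAW's
 * (parallel) implementation.
 */
final class GaussSeidelPageRankSketch {
    private GaussSeidelPageRankSketch() {}

    /**
     * pred[i] lists the predecessors of node i (the transposed graph);
     * outdeg[j] is the outdegree of j in the original graph (assumed > 0).
     */
    static double[] pageRank(final int[][] pred, final int[] outdeg, final double alpha, final double tolerance) {
        final int n = pred.length;
        final double[] rank = new double[n]; // deliberately start from the zero vector
        double change;
        do {
            change = 0;
            for (int i = 0; i < n; i++) {
                double sum = 0;
                // Gauss-Seidel: nodes already visited in this sweep contribute their new values
                for (final int j : pred[i]) sum += rank[j] / outdeg[j];
                final double updated = (1 - alpha) / n + alpha * sum;
                change += Math.abs(updated - rank[i]); // L1 norm of this sweep's update
                rank[i] = updated;
            }
        } while (change > tolerance);
        return rank;
    }

    /** A k-clique plus a disjoint p-cycle, the graph of testCliqueNobridgeCycle. */
    static double[] cliqueNobridgeCycle(final int k, final int p, final double alpha) {
        final int n = k + p;
        final List<List<Integer>> pred = new ArrayList<>();
        for (int i = 0; i < n; i++) pred.add(new ArrayList<>());
        final int[] outdeg = new int[n];
        for (int i = 0; i < k; i++)
            for (int j = 0; j < k; j++)
                if (i != j) { pred.get(j).add(i); outdeg[i]++; } // arc i -> j
        for (int i = 0; i < p; i++) { // arc (k + i) -> (k + (i + 1) % p)
            final int from = k + i, to = k + (i + 1) % p;
            pred.get(to).add(from);
            outdeg[from]++;
        }
        final int[][] predArray = new int[n][];
        for (int i = 0; i < n; i++) predArray[i] = pred.get(i).stream().mapToInt(Integer::intValue).toArray();
        return pageRank(predArray, outdeg, alpha, 1E-12);
    }
}
```

Working on predecessor lists mirrors the test's use of the transposed graph: each update of node i only needs to read the current ranks of the nodes pointing to it.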
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/bubing/util/MockResponses.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/bubing/util/MockResponses.java
new file mode 100644
index 0000000..c1f2e8b
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/bubing/util/MockResponses.java
@@ -0,0 +1,122 @@
+package it.unimi.dsi.law.bubing.util;
+
+import java.io.IOException;
+import java.io.UnsupportedEncodingException;
+import java.net.URI;
+import java.util.Random;
+
+import org.apache.http.Header;
+import org.apache.http.ProtocolVersion;
+import org.apache.http.StatusLine;
+import org.apache.http.message.BasicStatusLine;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.io.FastByteArrayInputStream;
+import it.unimi.dsi.fastutil.io.MeasurableInputStream;
+import it.unimi.dsi.law.warc.io.WarcRecord;
+import it.unimi.dsi.law.warc.util.DigestBasedDuplicateDetection;
+import it.unimi.dsi.law.warc.util.MetadataHttpResponse;
+
+// RELEASE-STATUS: DIST
+
+/** A class with mock implementations of responses. */
+
+public class MockResponses {
+ /** A response whose content is generated at random. */
+ static public class MockRandomHttpResponse extends MetadataHttpResponse {
+ private final static int DEFAULT_MAX_CONTENT_LENGTH = 1024;
+ private static int calls = 0;
+ private final byte[] content;
+
+ public MockRandomHttpResponse(final Random random) {
+ this(random, DEFAULT_MAX_CONTENT_LENGTH);
+ }
+
+ @SuppressWarnings("deprecation")
+ public MockRandomHttpResponse(final Random random, int maxContentLength) {
+ uri = BURL.parse("http" + (random.nextBoolean() ? "s" : "") + "://this.is/n" + calls++ + "/test.html");
+ statusLine = new BasicStatusLine(new ProtocolVersion("HTTP", 1, 0), 200, "OK");
+ content = new byte[random.nextInt(maxContentLength) + 1];
+ random.nextBytes(content);
+ headerMap.clear();
+ headerMap.put("content-type", "text/html");
+ }
+
+ public MeasurableInputStream expectedContentAsStream() {
+ return new FastByteArrayInputStream(content);
+ }
+
+ public MeasurableInputStream contentAsStream() {
+ return new FastByteArrayInputStream(content);
+ }
+
+ public boolean fromWarcRecord(WarcRecord wr) throws IOException {
+ throw new UnsupportedOperationException();
+ }
+ }
+
+ /** An HTTP response with given status, possibly some given headers, and a given content, extracted from a string. */
+ public static class MockHttpResponseFromString extends MetadataHttpResponse implements DigestBasedDuplicateDetection {
+ private String content;
+ private boolean isDuplicate;
+ private byte[] digest;
+
+ public MockHttpResponseFromString(final StatusLine statusLine, final Header[] header, final URI uri, final String content) {
+ this.statusLine = statusLine;
+ this.uri = uri;
+ this.headerMap.clear();
+ headerMap.put("content-type", "text/html; charset=iso-8859-1");
+ for (Header h: header) headerMap.put(h.getName(), h.getValue());
+ this.content = content;
+ }
+
+ public MeasurableInputStream contentAsStream() {
+ try {
+ return new FastByteArrayInputStream(content.getBytes("ISO-8859-1"));
+ } catch (UnsupportedEncodingException e) {
+ e.printStackTrace();
+ throw new RuntimeException(e);
+ }
+ }
+
+ public boolean fromWarcRecord(WarcRecord wr) throws IOException {
+ throw new UnsupportedOperationException();
+ }
+
+ public void digest(byte[] digest) {
+ this.digest = digest;
+ }
+
+ public byte[] digest() {
+ return digest;
+ }
+
+ public void isDuplicate(boolean isDuplicate) {
+ this.isDuplicate = isDuplicate;
+ }
+
+ public boolean isDuplicate() {
+ return isDuplicate;
+ }
+ }
+
+
+
+}
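`MockHttpResponseFromString.contentAsStream()` looks up the charset by the name `"ISO-8859-1"`, so it has to catch `UnsupportedEncodingException` even though every Java platform is required to support that charset. The sketch below shows the same encoding step using `java.nio.charset.StandardCharsets` (Java 7+), which throws no checked exception. It makes two assumptions. First, the class and method names are hypothetical. Second, `java.io.ByteArrayInputStream` stands in for fastutil's `FastByteArrayInputStream` so that the example has no dependencies.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: a Latin-1 content stream without checked exceptions. */
public class Latin1ContentSketch {
    /**
     * Encodes content as ISO-8859-1 bytes and wraps them in a stream.
     * StandardCharsets.ISO_8859_1 always exists, so there is no
     * UnsupportedEncodingException to catch; unmappable characters are
     * replaced by the charset's default replacement byte.
     */
    public static ByteArrayInputStream contentAsStream(final String content) {
        return new ByteArrayInputStream(content.getBytes(StandardCharsets.ISO_8859_1));
    }

    public static void main(final String[] args) {
        final ByteArrayInputStream in = contentAsStream("caf\u00e9");
        System.out.println(in.available()); // 4: one byte per Latin-1 character
    }
}
```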
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/graph/BFSTest.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/graph/BFSTest.java
new file mode 100644
index 0000000..780bb86
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/graph/BFSTest.java
@@ -0,0 +1,55 @@
+package it.unimi.dsi.law.graph;
+
+/*
+ * Copyright (C) 2010-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertTrue;
+
+import java.util.Collections;
+import java.util.Random;
+
+import org.junit.Test;
+
+import it.unimi.dsi.Util;
+import it.unimi.dsi.fastutil.ints.IntArrayList;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.Transform;
+import it.unimi.dsi.webgraph.examples.ErdosRenyiGraph;
+
+
+
+//RELEASE-STATUS: DIST
+
+public class BFSTest {
+ @Test
+ public void testStartPerm() {
+ for (int i = 100; i <= 1000; i += 100) {
+ ImmutableGraph g = new ArrayListMutableGraph(Transform.symmetrize(new ErdosRenyiGraph(i, .02, 0, false))).immutableView();
+ final int[] startPerm = Util.identity(new int[g.numNodes()]);
+ Collections.shuffle(IntArrayList.wrap(startPerm), new Random(0));
+ ImmutableGraph mg = Transform.map(g, startPerm);
+
+
+ int[] perm0 = Util.invertPermutationInPlace(BFS.bfsperm(g, -1, startPerm));
+ int[] perm1 = Util.invertPermutationInPlace(BFS.bfsperm(mg, -1, Util.identity(i)));
+
+ assertTrue(Transform.map(g, perm0).equals(Transform.map(mg, perm1)));
+ }
+ }
+}
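`BFSTest` inverts the array returned by `BFS.bfsperm` with `Util.invertPermutationInPlace` before passing it to `Transform.map`. The sketch below is a hypothetical, dependency-free illustration of why an inversion is needed when a permutation is expressed as a visit order (`order[k]` is the k-th node reached): inverting that order yields a node-to-new-label map. It is not the LAW `BFS` class, and it does not model how that class uses its start permutation. Here, roots are tried in the order given by `startOrder`, and successors are scanned in adjacency-list order.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/** Hypothetical illustration of a BFS visit order; not the LAW BFS class. */
public class BfsOrderSketch {
    /**
     * Visits every node breadth-first. Roots are tried in the order given by startOrder
     * and successors are enqueued in adjacency-list order. order[k] is the k-th visited node.
     */
    public static int[] bfsOrder(final int[][] succ, final int[] startOrder) {
        final int n = succ.length;
        final boolean[] seen = new boolean[n];
        final int[] order = new int[n];
        final ArrayDeque<Integer> queue = new ArrayDeque<>();
        int k = 0;
        for (final int root : startOrder) {
            if (seen[root]) continue; // already reached from an earlier root
            seen[root] = true;
            queue.add(root);
            while (!queue.isEmpty()) {
                final int x = queue.poll();
                order[k++] = x;
                for (final int y : succ[x]) {
                    if (!seen[y]) {
                        seen[y] = true;
                        queue.add(y);
                    }
                }
            }
        }
        return order;
    }

    /** Returns the inverse permutation: inverse[perm[i]] = i, i.e., node -> position in the visit. */
    public static int[] invert(final int[] perm) {
        final int[] inverse = new int[perm.length];
        for (int i = 0; i < perm.length; i++) inverse[perm[i]] = i;
        return inverse;
    }

    public static void main(final String[] args) {
        final int[][] g = { { 2 }, {}, { 3, 1 }, {} }; // arcs 0->2, 2->3, 2->1
        final int[] order = bfsOrder(g, new int[] { 0, 1, 2, 3 });
        System.out.println(Arrays.toString(order)); // [0, 2, 3, 1]
        System.out.println(Arrays.toString(invert(order))); // [0, 3, 1, 2]
    }
}
```

In the usage example, node 1 is reached last, so the inverted array assigns it the last label (3).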
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/graph/DFSTest.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/graph/DFSTest.java
new file mode 100644
index 0000000..47758ff
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/graph/DFSTest.java
@@ -0,0 +1,55 @@
+package it.unimi.dsi.law.graph;
+
+/*
+ * Copyright (C) 2010-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertTrue;
+
+import java.util.Collections;
+import java.util.Random;
+
+import org.junit.Test;
+
+import it.unimi.dsi.Util;
+import it.unimi.dsi.fastutil.ints.IntArrayList;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.Transform;
+import it.unimi.dsi.webgraph.examples.ErdosRenyiGraph;
+
+
+
+//RELEASE-STATUS: DIST
+
+public class DFSTest {
+ @Test
+ public void testStartPerm() {
+ for (int i = 100; i <= 1000; i += 100) {
+ ImmutableGraph g = new ArrayListMutableGraph(Transform.symmetrize(new ErdosRenyiGraph(i, .02, 0, false))).immutableView();
+ final int[] startPerm = Util.identity(new int[g.numNodes()]);
+ Collections.shuffle(IntArrayList.wrap(startPerm), new Random(0));
+ ImmutableGraph mg = Transform.map(g, startPerm);
+
+
+ int[] perm0 = Util.invertPermutationInPlace(DFS.dfsperm(g, startPerm));
+ int[] perm1 = Util.invertPermutationInPlace(DFS.dfsperm(mg, Util.identity(i)));
+
+ assertTrue(Transform.map(g, perm0).equals(Transform.map(mg, perm1)));
+ }
+ }
+}
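Both `BFSTest` and `DFSTest` compare graphs renumbered through `Transform.map`. The sketch below shows that renumbering on plain adjacency lists. It assumes the convention that node `x` is renamed `perm[x]`, and the class and method are hypothetical rather than part of WebGraph. Sorting each successor list makes the result canonical, so two renumberings can be compared with `Arrays.deepEquals`, much as the tests compare `ImmutableGraph` instances with `equals()`. The example also shows that mapping by a permutation and then by its inverse restores the original graph.

```java
import java.util.Arrays;

/** Hypothetical illustration of graph renumbering; not the WebGraph Transform class. */
public class RenumberSketch {
    /** Renames node x as perm[x]; successor lists of the result are sorted, so the result is canonical. */
    public static int[][] map(final int[][] succ, final int[] perm) {
        final int n = succ.length;
        final int[][] result = new int[n][];
        for (int x = 0; x < n; x++) {
            final int[] s = new int[succ[x].length];
            for (int i = 0; i < s.length; i++) s[i] = perm[succ[x][i]];
            Arrays.sort(s);
            result[perm[x]] = s;
        }
        return result;
    }

    public static void main(final String[] args) {
        final int[][] g = { { 1, 2 }, { 2 }, { 0 } };
        final int[] perm = { 2, 0, 1 };
        final int[][] mapped = map(g, perm);
        System.out.println(Arrays.deepToString(mapped)); // [[1], [2], [0, 1]]
        // { 1, 2, 0 } is the inverse of perm, so mapping back restores g.
        System.out.println(Arrays.deepEquals(g, map(mapped, new int[] { 1, 2, 0 }))); // true
    }
}
```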
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/graph/LayeredLabelPropagationTest.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/graph/LayeredLabelPropagationTest.java
new file mode 100644
index 0000000..07b9d02
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/graph/LayeredLabelPropagationTest.java
@@ -0,0 +1,65 @@
+package it.unimi.dsi.law.graph;
+
+/*
+ * Copyright (C) 2010-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertTrue;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.junit.Test;
+
+import it.unimi.dsi.Util;
+import it.unimi.dsi.fastutil.ints.IntArrays;
+import it.unimi.dsi.util.XoRoShiRo128PlusRandom;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.Transform;
+import it.unimi.dsi.webgraph.examples.ErdosRenyiGraph;
+
+
+
+//RELEASE-STATUS: DIST
+
+public class LayeredLabelPropagationTest {
+
+ @Test
+ public void testStartPerm() throws IOException {
+
+ for(int i = 100; i <= 1000; i += 100) {
+ final XoRoShiRo128PlusRandom random = new XoRoShiRo128PlusRandom(0);
+ final ImmutableGraph g = new ArrayListMutableGraph(Transform.symmetrize(new ErdosRenyiGraph(i, .02, 0,false))).immutableView();
+ final int[] startPerm = IntArrays.shuffle(Util.identity(g.numNodes()), random);
+ final ImmutableGraph mg = new ArrayListMutableGraph(Transform.map(g, startPerm)).immutableView();
+
+ final LayeredLabelPropagation clustering0 = new LayeredLabelPropagation(g, startPerm, 1, 0, true);
+ final LayeredLabelPropagation clustering1 = new LayeredLabelPropagation(mg, Util.identity(g.numNodes()), 1, 0, true);
+ final LayeredLabelPropagation clustering2 = new LayeredLabelPropagation(mg, null, 1, 0, true);
+
+ final double[] gammas = { 1/16., 1/32., 1/64., 1/128., 1/256. };
+ final int[] perm0 = clustering0.computePermutation(gammas, null, Integer.MAX_VALUE);
+ final int[] perm1 = clustering1.computePermutation(gammas, null, Integer.MAX_VALUE);
+ final int[] perm2 = clustering2.computePermutation(gammas, null, Integer.MAX_VALUE);
+
+ assertTrue(Arrays.equals(perm1, perm2));
+
+ assertTrue( Transform.map(g, perm0).equals(Transform.map(mg, perm1)));
+ }
+ }
+}
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/graph/RemoveHubsTest.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/graph/RemoveHubsTest.java
new file mode 100644
index 0000000..a820cfd
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/graph/RemoveHubsTest.java
@@ -0,0 +1,138 @@
+package it.unimi.dsi.law.graph;
+
+/*
+ * Copyright (C) 2010-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+
+import java.io.IOException;
+
+import org.junit.Test;
+
+import it.unimi.dsi.io.FastBufferedReader;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.Transform;
+
+
+//RELEASE-STATUS: DIST
+
+public class RemoveHubsTest {
+
+ @Test
+ public void testTwoStar() throws IOException {
+ final int n = 100;
+ ArrayListMutableGraph g = new ArrayListMutableGraph();
+ g.addNodes(n);
+ for(int i = 0; i < n-2; i++) {
+ g.addArc(n-2, i);
+ g.addArc(i, n-1);
+ }
+ int[] perm = RemoveHubs.largestOutdegree(g.immutableView());
+ assertEquals(n-1, perm[0]);
+ assertEquals(n-2, perm[n-1]);
+
+ perm = RemoveHubs.largestIndegree(Transform.transpose(g.immutableView()));
+ assertEquals(n - 1, perm[n - 1]);
+
+ perm = RemoveHubs.pageRank(Transform.transpose(g.immutableView()));
+
+ assertEquals(n-2, perm[0]);
+ assertEquals(n-1, perm[n-1]);
+ }
+
+ @Test
+ public void testStar() throws IOException {
+ final int n = 100;
+ ArrayListMutableGraph g = new ArrayListMutableGraph();
+ g.addNodes(n);
+ for(int i = 0; i < n-1; i++) {
+ g.addArc(n-1, i);
+ g.addArc(i, n-1);
+ }
+
+ int[] perm = RemoveHubs.symPageRank(g.immutableView(), g.immutableView());
+ assertEquals(n-1, perm[n-1]);
+
+ char[] slash = new char[(n * (n+1)) / 2 + n];
+ int pos = 0;
+ for(int i = 0; i < n; i++) {
+ for(int j = 0; j < i+1; j++)
+ slash[pos++] = '/';
+ slash[pos++] = '\n';
+ }
+
+ perm = RemoveHubs.url(g.immutableView(), new FastBufferedReader(slash));
+ for(int i = 0; i < n; i++)
+ assertEquals(n-1-i, perm[i]);
+
+ double[] fraction = new double[1];
+ fraction[0] = 0.5;
+ int[] cut = RemoveHubs.store(g.immutableView(), g.immutableView(), fraction, perm, null, null);
+ assertEquals(n/2-1, cut[0]);
+ }
+
+ @Test
+ public void testUrl() throws IOException {
+ ArrayListMutableGraph g = new ArrayListMutableGraph();
+ g.addNodes(3);
+ StringBuilder bf = new StringBuilder();
+ bf.append("http://www.skwigly.co.uk/banner\n");
+ bf.append("http://www.skwigly.co.uk/\n");
+ bf.append("http://www.skwigly.co.uk/dir\n");
+ int[] perm = RemoveHubs.url(g.immutableView(), new FastBufferedReader(bf.toString().toCharArray()));
+ assertEquals(1, perm[2]);
+ }
+
+ @Test
+ public void testTwoClique() throws IOException {
+ final int n = 100;
+ ArrayListMutableGraph g = new ArrayListMutableGraph();
+ g.addNodes(2*n);
+ for(int i = 0; i < n; i++)
+ for(int j = i+1; j < n; j++) {
+ g.addArc(i, j);
+ g.addArc(j, i);
+ g.addArc(i+n, j+n);
+ g.addArc(j+n, i+n);
+ }
+
+ g.addArc(n-1, n);
+ g.addArc(n, n-1);
+
+ int[] perm = RemoveHubs.labelPropagation(Transform.symmetrize(g.immutableView()));
+
+ assertTrue(perm[2*n-1] == n || perm[2*n - 1] == n-1);
+ assertTrue(perm[2*n-2] == n || perm[2*n - 2] == n-1);
+ }
+
+ @Test
+ public void testRandom() {
+ final int n = 100000;
+ ArrayListMutableGraph g = new ArrayListMutableGraph();
+ g.addNodes(n);
+ int[] perm = RemoveHubs.random(g.immutableView());
+ double sum = 0;
+ for(int i = 0; i < n-1; i++)
+ if(perm[i] > perm[i + 1])
+ sum++;
+ sum /= n-1;
+ assertTrue(sum < 0.6 && sum > 0.4);
+ }
+
+}
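`testRandom` checks that `RemoveHubs.random` returns a permutation that looks uniformly random. It does this by counting descents, that is, positions `i` with `perm[i] > perm[i + 1]`. In a uniformly random permutation, each adjacent pair is a descent with probability 1/2. The expected fraction is therefore 1/2, and for `n = 100000` the observed fraction concentrates tightly around it, which makes the test's window (0.4, 0.6) very loose. The self-contained sketch below computes the statistic. It uses a Fisher-Yates shuffle as a stand-in for whatever `RemoveHubs.random` does internally (an assumption), and its class and method names are hypothetical.

```java
import java.util.Random;

/** Hypothetical illustration of the descent statistic used by testRandom. */
public class DescentFractionSketch {
    /** Returns a uniformly random permutation of {0, ..., n - 1} (Fisher-Yates shuffle). */
    public static int[] randomPermutation(final int n, final Random random) {
        final int[] perm = new int[n];
        for (int i = 0; i < n; i++) perm[i] = i;
        for (int i = n - 1; i > 0; i--) {
            final int j = random.nextInt(i + 1); // uniform in [0, i]
            final int t = perm[i];
            perm[i] = perm[j];
            perm[j] = t;
        }
        return perm;
    }

    /** Fraction of positions i with perm[i] > perm[i + 1]; its expected value is 1/2 for a random permutation. */
    public static double descentFraction(final int[] perm) {
        int descents = 0;
        for (int i = 0; i + 1 < perm.length; i++) if (perm[i] > perm[i + 1]) descents++;
        return (double) descents / (perm.length - 1);
    }

    public static void main(final String[] args) {
        final int[] perm = randomPermutation(100000, new Random(0));
        System.out.println(descentFraction(perm)); // close to 0.5
    }
}
```

The identity permutation has no descents (fraction 0) and the reversed identity has only descents (fraction 1). A fraction near 0.5 therefore rules out, for instance, returning the identity unchanged.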
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/DominantEigenvectorParallelPowerMethodTest.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/DominantEigenvectorParallelPowerMethodTest.java
new file mode 100644
index 0000000..9ed191d
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/DominantEigenvectorParallelPowerMethodTest.java
@@ -0,0 +1,270 @@
+package it.unimi.dsi.law.rank;
+
+/*
+ * Copyright (C) 2011-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static java.lang.Math.pow;
+import static org.junit.Assert.assertEquals;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.apache.commons.math.linear.Array2DRowRealMatrix;
+import org.apache.commons.math.linear.EigenDecomposition;
+import org.apache.commons.math.linear.EigenDecompositionImpl;
+import org.junit.Test;
+import org.slf4j.helpers.NOPLogger;
+
+import it.unimi.dsi.law.util.Norm;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.LazyIntIterator;
+import it.unimi.dsi.webgraph.NodeIterator;
+
+
+
+//RELEASE-STATUS: DIST
+
+public class DominantEigenvectorParallelPowerMethodTest {
+
+ @Test
+ public void testCycle() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(Norm norm : Norm.values()) {
+ for(int size: new int[] { 100, 1000, 10000 }) {
+ for(double shift: new double[] { 0, -.1, -1 }) {
+ final DominantEigenvectorParallelPowerMethod dominant = new DominantEigenvectorParallelPowerMethod(ArrayListMutableGraph.newDirectedCycle(size).immutableView(), NOPLogger.NOP_LOGGER);
+ dominant.norm = norm;
+ dominant.shift = shift;
+ dominant.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ assertEquals(1, dominant.lambda, threshold);
+ final double[] expected = new double[size];
+ Arrays.fill(expected, 1);
+ dominant.norm.normalize(expected, 1);
+ for(int i = dominant.graph.numNodes(); i-- != 0;) assertEquals(expected[i], dominant.rank[i], threshold);
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testClique() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(Norm norm : Norm.values()) {
+ for(int size: new int[] { 10, 100, 1000 }) {
+ for(double shift: new double[] { 0, -.1, -1 }) {
+ final DominantEigenvectorParallelPowerMethod dominant = new DominantEigenvectorParallelPowerMethod(ArrayListMutableGraph.newCompleteGraph(size, false).immutableView(), NOPLogger.NOP_LOGGER);
+ dominant.norm = norm;
+ dominant.shift = shift;
+ dominant.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ assertEquals(size - 1, dominant.lambda, threshold);
+ final double[] expected = new double[size];
+ Arrays.fill(expected, 1);
+ dominant.norm.normalize(expected, 1);
+ for(int i = dominant.graph.numNodes(); i-- != 0;) assertEquals(expected[i], dominant.rank[i], threshold);
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueBibridgeCycle() throws IOException {
+ for (double threshold = 1E-4; threshold > 1E-10; threshold /= 10) {
+ for(Norm norm : Norm.values()) {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ for(double shift: new double[] { 0, -.1, -1 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++) {
+ if (i != j) mg.addArc(i, j);
+ }
+ // Note the transposition
+ for(int i = 0; i < p; i++) mg.addArc(k + (i + 1) % p, k + i);
+ mg.addArc(k - 1, k);
+ mg.addArc(k, k - 1);
+ ImmutableGraph g = mg.immutableView();
+ Array2DRowRealMatrix m = new Array2DRowRealMatrix(p + k, p + k);
+ for(NodeIterator nodeIterator = g.nodeIterator(); nodeIterator.hasNext();) {
+ final int curr = nodeIterator.nextInt();
+ LazyIntIterator successors = nodeIterator.successors();
+ for(int s; (s = successors.nextInt()) != -1;) m.setEntry(curr, s, 1);
+ }
+
+ DominantEigenvectorParallelPowerMethod dominant = new DominantEigenvectorParallelPowerMethod(g, NOPLogger.NOP_LOGGER);
+ dominant.norm = norm;
+ dominant.shift = shift;
+ dominant.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ double[] rank = dominant.rank;
+ double lambda = dominant.lambda;
+
+ double ratio = rank[k] / (lambda + 1);
+ assertEquals(ratio * (1 + 1 / (lambda + 1 - k)), rank[k - 1], threshold);
+ for(int i = k - 1; i-- != 0;) assertEquals(ratio / (lambda + 1 - k), rank[i], threshold);
+ for(int d = 1; d < p - 1; d++) assertEquals(ratio * (lambda + 1) / pow(lambda, d), rank[k + d], threshold);
+
+
+ DominantEigenvectorParallelPowerMethod markovian = new DominantEigenvectorParallelPowerMethod(g, NOPLogger.NOP_LOGGER);
+ markovian.markovian = true;
+ markovian.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ rank = markovian.rank;
+ Norm.L_1.normalize(rank, 1);
+ assertEquals(1, markovian.lambda, threshold);
+ assertEquals(2. / (k * (k - 1) + p + 2), rank[k], threshold);
+ assertEquals(k / (k * (k - 1) + p + 2.), rank[k - 1], threshold);
+ for(int i = k - 1; i-- != 0;) assertEquals((k - 1) / (k * (k - 1) + p + 2.), rank[i], threshold);
+ for(int d = 1; d < p; d++) assertEquals(1 / (k * (k - 1) + p + 2.), rank[k + d], threshold);
+
+ // Test lambda on symmetric
+ for(int i = 0; i < p; i++) {
+ mg.addArc(k + i, k + (i + 1) % p);
+ m.setEntry(k + i, k + (i + 1) % p, 1);
+ }
+ EigenDecomposition s = new EigenDecompositionImpl(m, 0);
+ dominant = new DominantEigenvectorParallelPowerMethod(mg.immutableView(), NOPLogger.NOP_LOGGER);
+ dominant.norm = norm;
+ dominant.shift = shift;
+ dominant.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ assertEquals(s.getRealEigenvalue(0), dominant.lambda, threshold);
+ }
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueBackbridgeCycle() throws IOException {
+ for (double threshold = 1E-4; threshold > 1E-10; threshold /= 10) {
+ for(Norm norm : Norm.values()) {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ for(double shift: new double[] { 0, -.1, -1 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++) {
+ if (i != j) mg.addArc(i, j);
+ }
+ // Note the transposition
+ for(int i = 0; i < p; i++) mg.addArc(k + (i + 1) % p, k + i);
+ mg.addArc(k - 1, k);
+ ImmutableGraph g = mg.immutableView();
+ Array2DRowRealMatrix m = new Array2DRowRealMatrix(p + k, p + k);
+ for(NodeIterator nodeIterator = g.nodeIterator(); nodeIterator.hasNext();) {
+ final int curr = nodeIterator.nextInt();
+ LazyIntIterator successors = nodeIterator.successors();
+ for(int s; (s = successors.nextInt()) != -1;) m.setEntry(curr, s, 1);
+ }
+
+ DominantEigenvectorParallelPowerMethod dominant = new DominantEigenvectorParallelPowerMethod(g, NOPLogger.NOP_LOGGER);
+ dominant.norm = norm;
+ dominant.shift = shift;
+ dominant.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ double[] rank = dominant.rank;
+
+ double ratio = rank[0] * k * (k - 2) / ((k - 1) * (k - 1));
+
+ for(int i = k - 2; i-- != 0;) assertEquals(ratio * (k - 1) * (k - 1) / (k * (k - 2)) , rank[i], threshold);
+ for(int d = 0; d < p; d++) assertEquals(0, rank[k + d], threshold);
+
+
+ DominantEigenvectorParallelPowerMethod markovian = new DominantEigenvectorParallelPowerMethod(g, NOPLogger.NOP_LOGGER);
+ markovian.markovian = true;
+ markovian.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ rank = markovian.rank;
+ Norm.L_1.normalize(rank, 1);
+ assertEquals(1, markovian.lambda, threshold);
+ ratio = rank[0] * k * (k - 2) / ((k - 1) * (k - 1));
+
+ for(int i = k - 2; i-- != 0;) assertEquals(ratio * (k - 1) * (k - 1) / (k * (k - 2)) , rank[i], threshold);
+ for(int d = 0; d < p; d++) assertEquals(0, rank[k + d], threshold);
+ }
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueForwardbridgeCycle() throws IOException {
+ for (double threshold = 1E-4; threshold > 1E-10; threshold /= 10) {
+ for(Norm norm : Norm.values()) {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ for(double shift: new double[] { 0, -.1, -1 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++) {
+ if (i != j) mg.addArc(i, j);
+ }
+ // Note the transposition
+ for(int i = 0; i < p; i++) mg.addArc(k + (i + 1) % p, k + i);
+ mg.addArc(k, k - 1);
+ ImmutableGraph g = mg.immutableView();
+
+ DominantEigenvectorParallelPowerMethod dominant = new DominantEigenvectorParallelPowerMethod(g, NOPLogger.NOP_LOGGER);
+ dominant.norm = norm;
+ dominant.shift = shift;
+ dominant.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ double[] rank = dominant.rank;
+
+ double ratio = rank[0];
+ for(int i = k - 1; i-- != 0;) assertEquals(ratio * 1, rank[i], threshold);
+ for(int d = 0; d < p; d++) assertEquals(ratio * pow(k - 1 , p - d - 1) / (pow(k - 1, p) - 1), rank[k + d], threshold);
+ }
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueNobridgeCycle() throws IOException {
+ for (double threshold = 1E-4; threshold > 1E-10; threshold /= 10) {
+ for(Norm norm : Norm.values()) {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ for(double shift: new double[] { 0, -.1, -1 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++) {
+ if (i != j) mg.addArc(i, j);
+ }
+ // Note the transposition
+ for(int i = 0; i < p; i++) mg.addArc(k + (i + 1) % p, k + i);
+ ImmutableGraph g = mg.immutableView();
+
+ DominantEigenvectorParallelPowerMethod dominant = new DominantEigenvectorParallelPowerMethod(g, NOPLogger.NOP_LOGGER);
+ dominant.norm = norm;
+ dominant.shift = shift;
+ dominant.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ double[] rank = dominant.rank;
+
+ double ratio = rank[0];
+ for(int i = k - 1; i-- != 0;) assertEquals(ratio * 1, rank[i], threshold);
+ for(int d = 0; d < p; d++) assertEquals(ratio * 0, rank[k + d], threshold);
+
+ }
+ }
+ }
+ }
+ }
+ }
+}
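The expected values in `testCycle` and `testClique` follow from simple spectra:

- Every node of a directed cycle has exactly one predecessor, so the all-ones vector is an eigenvector with eigenvalue 1.
- Every node of a loopless complete graph on `n` nodes has `n - 1` predecessors, so the all-ones vector is an eigenvector with eigenvalue `n - 1`.

The sketch below is a plain power iteration, not the LAW `DominantEigenvectorParallelPowerMethod` class. It assumes that scores flow along arcs (`y[j]` is the sum of `x[i]` over arcs `i -> j`). After L1-normalizing the previous vector, it estimates lambda as the L1 norm of the new one, which is valid for nonnegative vectors. The `shift` and `Norm` options exercised by the tests are not modeled, and all names are hypothetical.

```java
/** Hypothetical illustration of power iteration; not the LAW implementation. */
public class PowerMethodSketch {
    /** Result of the iteration: eigenvalue estimate and L1-normalized vector. */
    public static final class Result {
        public final double lambda;
        public final double[] vector;

        Result(final double lambda, final double[] vector) {
            this.lambda = lambda;
            this.vector = vector;
        }
    }

    /** Each step sets y[j] = sum of x[i] over arcs i -> j, then L1-normalizes y. */
    public static Result dominant(final int[][] succ, final double[] start, final int steps) {
        final int n = succ.length;
        double[] x = start.clone();
        normalizeL1(x);
        double lambda = Double.NaN;
        for (int t = 0; t < steps; t++) {
            final double[] y = new double[n];
            for (int i = 0; i < n; i++) for (final int j : succ[i]) y[j] += x[i];
            // x has unit L1 norm, so the norm of y is the growth factor of this step.
            lambda = normalizeL1(y);
            x = y;
        }
        return new Result(lambda, x);
    }

    /** Divides v by its L1 norm in place and returns the norm. */
    static double normalizeL1(final double[] v) {
        double norm = 0;
        for (final double e : v) norm += Math.abs(e);
        for (int i = 0; i < v.length; i++) v[i] /= norm;
        return norm;
    }

    /** Loopless complete graph: every node points to every other node. */
    public static int[][] completeGraph(final int n) {
        final int[][] succ = new int[n][n - 1];
        for (int i = 0; i < n; i++) for (int j = 0, k = 0; j < n; j++) if (j != i) succ[i][k++] = j;
        return succ;
    }

    /** Directed cycle 0 -> 1 -> ... -> n - 1 -> 0. */
    public static int[][] directedCycle(final int n) {
        final int[][] succ = new int[n][];
        for (int i = 0; i < n; i++) succ[i] = new int[] { (i + 1) % n };
        return succ;
    }

    public static void main(final String[] args) {
        final Result clique = dominant(completeGraph(5), new double[] { 1, 2, 3, 4, 5 }, 100);
        System.out.println(clique.lambda); // close to 4 = n - 1
        final Result cycle = dominant(directedCycle(6), new double[] { 1, 1, 1, 1, 1, 1 }, 10);
        System.out.println(cycle.lambda); // close to 1
    }
}
```

The eigenvalues of a directed cycle are all the n-th roots of unity and all have modulus 1. On a cycle, power iteration started from a non-uniform vector therefore just rotates it without converging, which is why the cycle example starts from the uniform vector.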
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/KatzParallelGaussSeidelTest.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/KatzParallelGaussSeidelTest.java
new file mode 100644
index 0000000..d13202f
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/KatzParallelGaussSeidelTest.java
@@ -0,0 +1,278 @@
+package it.unimi.dsi.law.rank;
+
+/*
+ * Copyright (C) 2011-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static java.lang.Math.pow;
+import static org.junit.Assert.assertEquals;
+
+import java.io.IOException;
+
+import org.junit.Test;
+import org.slf4j.helpers.NOPLogger;
+
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+
+
+
+//RELEASE-STATUS: DIST
+
+public class KatzParallelGaussSeidelTest {
+
+ @Test
+ public void testCycle() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(double alpha: new double[] { .25, .50, .75 }) {
+ for(int size: new int[] { 100, 1000, 10000 }) {
+ final KatzParallelGaussSeidel katz = new KatzParallelGaussSeidel(ArrayListMutableGraph.newDirectedCycle(size).immutableView());
+ katz.alpha = alpha;
+ katz.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ for(int i = katz.graph.numNodes(); i-- != 0;) assertEquals(1 / (1 - katz.alpha), katz.rank[i], threshold);
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testClique() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(double alpha: new double[] { .25, .50, .75 }) {
+ for(int size: new int[] { 10, 100, 1000 }) {
+ final KatzParallelGaussSeidel katz = new KatzParallelGaussSeidel(ArrayListMutableGraph.newCompleteGraph(size, false).immutableView());
+ katz.alpha = alpha / (size - 1);
+ katz.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ for(int i = katz.graph.numNodes(); i-- != 0;) assertEquals(1 / (1 - katz.alpha * (size - 1)), katz.rank[i], threshold);
+
+ final KatzParallelGaussSeidel katz2 = new KatzParallelGaussSeidel(ArrayListMutableGraph.newCompleteGraph(size, true).immutableView());
+ katz2.alpha = alpha / size;
+ katz2.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ for(int i = katz2.graph.numNodes(); i-- != 0;) assertEquals(1 / (1 - katz2.alpha * size), katz2.rank[i], threshold);
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueBibridgeCycle() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++) {
+ if (i != j) mg.addArc(i, j);
+ }
+ // Note the transposition
+ for(int i = 0; i < p; i++) mg.addArc(k + (i + 1) % p, k + i);
+ mg.addArc(k - 1, k);
+ mg.addArc(k, k - 1);
+ ImmutableGraph g = mg.immutableView();
+
+ // Compute dominant eigenvalue
+ DominantEigenvectorParallelPowerMethod dominant = new DominantEigenvectorParallelPowerMethod(g, NOPLogger.NOP_LOGGER);
+ dominant.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ double lambda = dominant.lambda;
+
+ PowerSeries w = new PowerSeries(g);
+ w.alpha = .8 / lambda;
+ w.stepUntil(PowerSeries.MAX_RATIO_STOPPING_CRITERION);
+
+ for(double alpha: new double[] { .25, .50, .75 }) {
+ // Compute index
+ KatzParallelGaussSeidel katz = new KatzParallelGaussSeidel(g);
+ katz.alpha = alpha / lambda;
+ katz.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ double[] rank = katz.rank;
+ double[] expected = new double[rank.length];
+
+ double r = expected[k - 1] = rank[k - 1];
+ for(int i = k - 1; i-- != 0;) expected[i] = (1 + katz.alpha * r) / (1 - katz.alpha * (k - 2));
+ for(int d = 0; d < p; d++) expected[k + d] = 1 / (1 - katz.alpha) + r * pow(katz.alpha, d + 1) / (1 - pow(katz.alpha, p));
+
+ for(int i = 0; i < rank.length; i++) assertEquals(expected[i], rank[i], threshold);
+
+ katz.normVector(w.previousRank, w.maxRatio);
+ katz.alpha = alpha / lambda;
+ katz.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ rank = katz.rank;
+
+ for(int i = 0; i < rank.length; i++) assertEquals(expected[i], rank[i], threshold);
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueBackbridgeCycle() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++) {
+ if (i != j) mg.addArc(i, j);
+ }
+ // Note the transposition
+ for(int i = 0; i < p; i++) mg.addArc(k + (i + 1) % p, k + i);
+ mg.addArc(k - 1, k);
+ ImmutableGraph g = mg.immutableView();
+
+ // Compute dominant eigenvalue
+ DominantEigenvectorParallelPowerMethod dominant = new DominantEigenvectorParallelPowerMethod(g, NOPLogger.NOP_LOGGER);
+ dominant.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ double lambda = dominant.lambda;
+
+ PowerSeries w = new PowerSeries(g);
+ w.alpha = .8 / lambda;
+ w.stepUntil(PowerSeries.MAX_RATIO_STOPPING_CRITERION);
+
+ for(double alpha: new double[] { .25, .50, .75 }) {
+ // Compute index
+ KatzParallelGaussSeidel katz = new KatzParallelGaussSeidel(g);
+ katz.alpha = alpha / lambda;
+ katz.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ double[] rank = katz.rank;
+ double[] expected = new double[rank.length];
+
+ for (int i = k - 1; i-- != 0;)
+ expected[i] = -1 / ((1 - katz.alpha) * (katz.alpha * katz.alpha * (k - 1) + katz.alpha * (k - 2) - 1));
+ expected[k - 1] = (katz.alpha * katz.alpha * (k - 1) - katz.alpha - 1) / ((1 - katz.alpha) * (katz.alpha * katz.alpha * (k - 1) + katz.alpha * (k - 2) - 1));
+ for (int d = 0; d < p; d++)
+ expected[k + d] = 1 / (1 - katz.alpha);
+
+ for(int i = 0; i < rank.length; i++)
+ assertEquals(expected[i], rank[i], threshold);
+
+ katz.normVector(w.previousRank, w.maxRatio);
+ katz.alpha = alpha / lambda;
+ katz.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ rank = katz.rank;
+
+ for(int i = 0; i < rank.length; i++) assertEquals(expected[i], rank[i], threshold);
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueForwardbridgeCycle() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++) {
+ if (i != j) mg.addArc(i, j);
+ }
+ // Note the transposition
+ for(int i = 0; i < p; i++) mg.addArc(k + (i + 1) % p, k + i);
+ mg.addArc(k, k - 1);
+ ImmutableGraph g = mg.immutableView();
+
+ // Compute dominant eigenvalue
+ DominantEigenvectorParallelPowerMethod dominant = new DominantEigenvectorParallelPowerMethod(g, NOPLogger.NOP_LOGGER);
+ dominant.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ double lambda = dominant.lambda;
+
+ PowerSeries w = new PowerSeries(g);
+ w.alpha = .8 / lambda;
+ w.stepUntil(PowerSeries.MAX_RATIO_STOPPING_CRITERION);
+
+ for(double alpha: new double[] { .25, .50, .75 }) {
+ // Compute index
+ KatzParallelGaussSeidel katz = new KatzParallelGaussSeidel(g);
+ katz.alpha = alpha / lambda;
+ katz.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ double[] rank = katz.rank;
+ double[] expected = new double[rank.length];
+
+ for (int i = k; i-- != 0;)
+ expected[i] = 1 / (1 - (k - 1) * katz.alpha);
+ for (int d = 0; d < p; d++)
+ expected[k + d] = 1 / (1 - katz.alpha) + pow(katz.alpha, d + 1) / ((1 - (k - 1) * katz.alpha) * (1 - pow(katz.alpha, p)));
+
+ for(int i = 0; i < rank.length; i++)
+ assertEquals(expected[i], rank[i], threshold);
+
+ katz.normVector(w.previousRank, w.maxRatio);
+ katz.alpha = alpha / lambda;
+ katz.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ rank = katz.rank;
+
+ for(int i = 0; i < rank.length; i++) assertEquals(expected[i], rank[i], threshold);
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueNobridgeCycle() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++) {
+ if (i != j) mg.addArc(i, j);
+ }
+ // Note the transposition
+ for(int i = 0; i < p; i++) mg.addArc(k + (i + 1) % p, k + i);
+ ImmutableGraph g = mg.immutableView();
+
+ // Compute dominant eigenvalue
+ DominantEigenvectorParallelPowerMethod dominant = new DominantEigenvectorParallelPowerMethod(g, NOPLogger.NOP_LOGGER);
+ dominant.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ double lambda = dominant.lambda;
+
+ PowerSeries w = new PowerSeries(g);
+ w.alpha = .8 / lambda;
+ w.stepUntil(PowerSeries.MAX_RATIO_STOPPING_CRITERION);
+
+ for(double alpha: new double[] { .25, .50, .75 }) {
+ // Compute index
+ KatzParallelGaussSeidel katz = new KatzParallelGaussSeidel(g);
+ katz.alpha = alpha / lambda;
+ katz.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ double[] rank = katz.rank;
+ double[] expected = new double[rank.length];
+
+ for (int i = k; i-- != 0;)
+ expected[i] = 1 / (1 - (k - 1) * katz.alpha);
+ for (int d = 0; d < p; d++)
+ expected[k + d] = 1 / (1 - katz.alpha);
+
+ for(int i = 0; i < rank.length; i++)
+ assertEquals(expected[i], rank[i], threshold);
+
+ katz.normVector(w.previousRank, w.maxRatio);
+ katz.alpha = alpha / lambda;
+ katz.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ rank = katz.rank;
+
+ for(int i = 0; i < rank.length; i++) assertEquals(expected[i], rank[i], threshold);
+ }
+ }
+ }
+ }
+ }
+
+}
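The closed forms asserted in the Katz tests above (testCliqueNobridgeCycle and testCliqueForwardbridgeCycle) can be checked without LAW. This is a minimal sketch, not part of the library. The class `KatzClosedFormCheck` and its helpers are hypothetical, and the sketch assumes Katz's index is the row vector 1 Σ_i α^i A^i of the original graph. The tests hand the transposed graph to `KatzParallelGaussSeidel`, which is why they carry "Note the transposition" comments. The sketch instead keeps the original arcs, accumulates along in-arcs, and sums the series by plain Jacobi iteration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone check (not part of LAW): compares a Jacobi iteration
// for the Katz index with the closed forms used by the clique+cycle tests.
public class KatzClosedFormCheck {

    // Arcs of the original (untransposed) graph: a k-clique on 0..k-1, a directed
    // p-cycle k+i -> k+(i+1)%p and, optionally, the forward bridge k-1 -> k.
    static List<int[]> cliqueCycle(int k, int p, boolean forwardBridge) {
        List<int[]> arcs = new ArrayList<>();
        for (int i = 0; i < k; i++)
            for (int j = 0; j < k; j++)
                if (i != j) arcs.add(new int[] { i, j });
        for (int i = 0; i < p; i++) arcs.add(new int[] { k + i, k + (i + 1) % p });
        if (forwardBridge) arcs.add(new int[] { k - 1, k });
        return arcs;
    }

    // Jacobi iteration for x = 1 + alpha * A^T x, i.e. x[j] = 1 + alpha * (sum of x[i] over arcs i -> j).
    // Starting from zero, after t steps x holds the first t terms of the Katz series.
    static double[] katz(int n, List<int[]> arcs, double alpha, int iterations) {
        double[] x = new double[n];
        for (int t = 0; t < iterations; t++) {
            double[] next = new double[n];
            Arrays.fill(next, 1);
            for (int[] arc : arcs) next[arc[1]] += alpha * x[arc[0]];
            x = next;
        }
        return x;
    }

    // Closed forms taken from testCliqueNobridgeCycle / testCliqueForwardbridgeCycle.
    static double[] expected(int k, int p, double alpha, boolean forwardBridge) {
        double[] e = new double[k + p];
        double clique = 1 / (1 - (k - 1) * alpha);
        Arrays.fill(e, 0, k, clique);
        for (int d = 0; d < p; d++) {
            e[k + d] = 1 / (1 - alpha);
            if (forwardBridge) e[k + d] += Math.pow(alpha, d + 1) * clique / (1 - Math.pow(alpha, p));
        }
        return e;
    }

    public static void main(String[] args) {
        int k = 10, p = 10;
        // The dominant eigenvalue is lambda = k - 1 (from the clique); alpha = 0.5 / lambda, as in the tests.
        double alpha = 0.5 / (k - 1);
        for (boolean bridge : new boolean[] { false, true }) {
            double[] x = katz(k + p, cliqueCycle(k, p, bridge), alpha, 200);
            double[] e = expected(k, p, alpha, bridge);
            double maxError = 0;
            for (int i = 0; i < x.length; i++) maxError = Math.max(maxError, Math.abs(x[i] - e[i]));
            System.out.println((bridge ? "forward bridge" : "no bridge") + ": max error " + maxError);
        }
    }
}
```

Jacobi iteration is used only because it is the simplest correct scheme. LAW's Gauss-Seidel solver targets the same linear system and should reach the same fixed point.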
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/LeftSingularVectorParallelPowerMethodTest.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/LeftSingularVectorParallelPowerMethodTest.java
new file mode 100644
index 0000000..6a27247
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/LeftSingularVectorParallelPowerMethodTest.java
@@ -0,0 +1,418 @@
+package it.unimi.dsi.law.rank;
+
+/*
+ * Copyright (C) 2011-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertArrayEquals;
+import static org.junit.Assert.assertEquals;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.apache.commons.math.linear.Array2DRowRealMatrix;
+import org.apache.commons.math.linear.EigenDecomposition;
+import org.apache.commons.math.linear.EigenDecompositionImpl;
+import org.junit.Test;
+import org.slf4j.helpers.NOPLogger;
+
+import it.unimi.dsi.law.util.Norm;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.LazyIntIterator;
+import it.unimi.dsi.webgraph.NodeIterator;
+import it.unimi.dsi.webgraph.Transform;
+import it.unimi.dsi.webgraph.examples.ErdosRenyiGraph;
+
+//RELEASE-STATUS: DIST
+
+public class LeftSingularVectorParallelPowerMethodTest {
+
+ @Test
+ public void testArc() throws IOException {
+ final ImmutableGraph graph = new ArrayListMutableGraph(2, new int[][] { { 0, 1 } }).immutableView();
+ final ImmutableGraph transpose = new ArrayListMutableGraph(Transform.transpose(graph)).immutableView();
+
+ for(double shift: new double[] { 0, -.1, -1 }) {
+ final LeftSingularVectorParallelPowerMethod lsv = new LeftSingularVectorParallelPowerMethod(graph, transpose, NOPLogger.NOP_LOGGER);
+ lsv.norm = Norm.L_INFINITY;
+ lsv.shift = shift;
+ lsv.stepUntil(new SpectralRanking.NormStoppingCriterion(1E-5 / 100));
+
+ assertEquals(0, lsv.rank[0], 1E-5);
+ assertEquals(1, lsv.rank[1], 1E-5);
+ }
+ }
+
+ @Test
+ public void testSalsaConnectedM() throws IOException {
+ final ImmutableGraph graph = new ArrayListMutableGraph(6, new int[][] { { 0, 1 }, { 0, 2 }, { 0, 4 }, { 3, 1 }, { 3, 5 }, { 4, 0 }, { 4, 3 }, { 5, 3 }, { 5, 4 } }).immutableView();
+ final ImmutableGraph transpose = new ArrayListMutableGraph(Transform.transpose(graph)).immutableView();
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(Norm norm : Norm.values()) {
+ for(double shift: new double[] { 0, -.1, -1 }) {
+ final LeftSingularVectorParallelPowerMethod lsv = new LeftSingularVectorParallelPowerMethod(graph, transpose, NOPLogger.NOP_LOGGER);
+ lsv.norm = norm;
+ lsv.salsa = true;
+ lsv.shift = shift;
+ lsv.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 20));
+
+ double[] expected = { 1, 2, 1, 2, 2, 1 };
+ norm.normalize(expected, 1);
+ for (int i = 0; i < expected.length; i++) assertEquals(expected[i], lsv.rank[i], threshold);
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testSalsaNonconnectedM() throws IOException {
+ final ImmutableGraph graph = new ArrayListMutableGraph(6, new int[][] { { 0, 1 }, { 0, 2 }, { 0, 4 }, { 3, 1 }, { 3, 5 }, { 4, 0 }, { 4, 3 }, { 5, 3 } }).immutableView();
+ final ImmutableGraph transpose = new ArrayListMutableGraph(Transform.transpose(graph)).immutableView();
+
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(Norm norm : Norm.values()) {
+ for(double shift: new double[] { 0, -.1, -1 }) {
+ final LeftSingularVectorParallelPowerMethod lsv = new LeftSingularVectorParallelPowerMethod(graph, transpose, NOPLogger.NOP_LOGGER);
+ lsv.norm = norm;
+ lsv.salsa = true;
+ lsv.shift = shift;
+ lsv.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+
+ double[] indegree = { 1. / 3, 2. / 5, 1. / 5, 2. / 3, 1. / 5, 1. / 5 };
+ double[] ccSize = { 2, 4, 4, 2, 4, 4 };
+ double[] expected = new double[indegree.length];
+ for (int i = 0; i < indegree.length; i++) expected[i] = indegree[i] * ccSize[i] / 6.0;
+ lsv.norm.normalize(expected, 1);
+ for (int i = 0; i < indegree.length; i++) assertEquals(expected[i] , lsv.rank[i], threshold);
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testCycle() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(Norm norm : Norm.values()) {
+ for(int size: new int[] { 100, 1000, 10000 }) {
+ final ImmutableGraph bidirectionalCycle = ArrayListMutableGraph.newBidirectionalCycle(size).immutableView();
+ final LeftSingularVectorParallelPowerMethod lsv = new LeftSingularVectorParallelPowerMethod(bidirectionalCycle, bidirectionalCycle, NOPLogger.NOP_LOGGER);
+ lsv.norm = norm;
+ lsv.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ final double[] expected = new double[size];
+ Arrays.fill(expected, 1);
+ norm.normalize(expected, 1);
+ for(int i = lsv.graph.numNodes(); i-- != 0;) assertEquals(expected[i], lsv.rank[i], threshold);
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testClique() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(Norm norm : Norm.values()) {
+ for(int size: new int[] { 10, 100, 1000 }) {
+ final ImmutableGraph clique = ArrayListMutableGraph.newCompleteGraph(size, false).immutableView();
+ final LeftSingularVectorParallelPowerMethod lsv = new LeftSingularVectorParallelPowerMethod(clique, clique, NOPLogger.NOP_LOGGER);
+ lsv.norm = norm;
+ lsv.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ final double[] expected = new double[size];
+ Arrays.fill(expected, 1);
+ lsv.norm.normalize(expected, 1);
+ for(int i = lsv.graph.numNodes(); i-- != 0;) assertEquals(expected[i], lsv.rank[i], threshold);
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testRandomSymmetric() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(Norm norm : Norm.values()) {
+ for(int size: new int[] { 10, 100 }) {
+ for(double shift: new double[] { 0, -.1, -1 }) {
+ // TODO refactor when symmetrize will return a copiable graph
+ final ImmutableGraph graph = new ArrayListMutableGraph(Transform.symmetrize(new ErdosRenyiGraph(size, .3, 0, false))).immutableView();
+ final LeftSingularVectorParallelPowerMethod lsv = new LeftSingularVectorParallelPowerMethod(graph, graph, NOPLogger.NOP_LOGGER);
+ lsv.norm = norm;
+ lsv.shift = shift;
+ lsv.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+
+ Array2DRowRealMatrix m = new Array2DRowRealMatrix(graph.numNodes(), graph.numNodes());
+ for(NodeIterator nodeIterator = graph.nodeIterator(); nodeIterator.hasNext();) {
+ final int curr = nodeIterator.nextInt();
+ LazyIntIterator successors = nodeIterator.successors();
+ for(int s; (s = successors.nextInt()) != -1;) m.setEntry(curr, s, 1);
+ }
+ EigenDecomposition s = new EigenDecompositionImpl(m, 0);
+ double[] qrDecomp = s.getEigenvector(0).toArray();
+ qrDecomp = norm.normalize(qrDecomp, 1);
+ for (int i = 0; i < graph.numNodes(); i++) qrDecomp[i] = Math.abs(qrDecomp[i]);
+
+ for(int i = lsv.graph.numNodes(); i-- != 0;) assertEquals(qrDecomp[i], lsv.rank[i], threshold);
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testRandom() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(Norm norm : Norm.values()) {
+ for(int size: new int[] { 10, 100 }) {
+ for(double shift: new double[] { 0, -.1, -1 }) {
+ // TODO refactor when symmetrize will return a copiable graph
+ final ImmutableGraph graph = new ArrayListMutableGraph(new ErdosRenyiGraph(size, .3, 0, false)).immutableView();
+ final ImmutableGraph transpose = new ArrayListMutableGraph(Transform.transpose(graph)).immutableView();
+ final LeftSingularVectorParallelPowerMethod lsv = new LeftSingularVectorParallelPowerMethod(graph, transpose, NOPLogger.NOP_LOGGER);
+ lsv.norm = norm;
+ lsv.shift = shift;
+ lsv.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+
+ Array2DRowRealMatrix m = new Array2DRowRealMatrix(graph.numNodes(), graph.numNodes());
+ for(NodeIterator nodeIterator = graph.nodeIterator(); nodeIterator.hasNext();) {
+ final int curr = nodeIterator.nextInt();
+ LazyIntIterator successors = nodeIterator.successors();
+ for(int s; (s = successors.nextInt()) != -1;) m.setEntry(curr, s, 1);
+ }
+
+ EigenDecomposition s = new EigenDecompositionImpl(m.transpose().multiply(m), 0);
+ double[] qrDecomp = s.getEigenvector(0).toArray();
+ qrDecomp = norm.normalize(qrDecomp, 1);
+ for (int i = 0; i < graph.numNodes(); i++) qrDecomp[i] = Math.abs(qrDecomp[i]);
+
+ for(int i = lsv.graph.numNodes(); i-- != 0;) assertEquals(qrDecomp[i], lsv.rank[i], threshold);
+ }
+ }
+ }
+ }
+ }
+
+
+ @Test
+ public void testCliqueNobridgeCycle() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-10; threshold /= 10) {
+ for(Norm norm : Norm.values()) {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ for(double shift: new double[] { 0, -.1, -1 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++) {
+ if (i != j) mg.addArc(i, j);
+ }
+ for(int i = 0; i < p; i++) mg.addArc(k + i, k + (i + 1) % p);
+ ImmutableGraph g = mg.immutableView();
+ ImmutableGraph gt = Transform.transpose(g);
+
+ LeftSingularVectorParallelPowerMethod leftSingular = new LeftSingularVectorParallelPowerMethod(g, gt, NOPLogger.NOP_LOGGER);
+ leftSingular.norm = norm;
+ leftSingular.shift = shift;
+ leftSingular.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ double[] rank = leftSingular.rank;
+
+ double ratio = rank[0];
+ for(int i = k - 1; i-- != 0;) assertEquals(ratio * 1, rank[i], threshold);
+ for(int d = 0; d < p; d++) assertEquals(ratio * 0, rank[k + d], threshold);
+
+ leftSingular.salsa = true;
+ leftSingular.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ double[] expected = new double[rank.length];
+ Arrays.fill(expected, 1);
+ norm.normalize(expected, 1);
+ assertArrayEquals(expected, rank, threshold);
+ }
+ }
+ }
+ }
+ }
+ }
+
+
+ @Test
+ public void testCliqueForwardbridgeCycle() throws IOException {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++) {
+ if (i != j) mg.addArc(i, j);
+ }
+ for(int i = 0; i < p; i++) mg.addArc(k + i, k + (i + 1) % p);
+ mg.addArc(k - 1, k);
+ ImmutableGraph g = mg.immutableView();
+ ImmutableGraph gt = Transform.transpose(g);
+
+ Array2DRowRealMatrix m = new Array2DRowRealMatrix(g.numNodes(), g.numNodes());
+ for(NodeIterator nodeIterator = g.nodeIterator(); nodeIterator.hasNext();) {
+ final int curr = nodeIterator.nextInt();
+ LazyIntIterator successors = nodeIterator.successors();
+ for(int s; (s = successors.nextInt()) != -1;) m.setEntry(curr, s, 1);
+ }
+ double lambda = new EigenDecompositionImpl(m.multiply(m.transpose()), 0).getRealEigenvalue(0);
+
+ for (double threshold = 1E-1; threshold > 1E-10; threshold /= 10) {
+ for(Norm norm : Norm.values()) {
+ for(double shift: new double[] { 0, -.1, -1 }) {
+
+ LeftSingularVectorParallelPowerMethod leftSingular = new LeftSingularVectorParallelPowerMethod(g, gt, NOPLogger.NOP_LOGGER);
+ leftSingular.norm = norm;
+ leftSingular.shift = shift;
+ leftSingular.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ double[] rank = leftSingular.rank;
+
+ double[] expected = new double[rank.length];
+ for(int i = k - 1; i-- != 0;) expected[i] = lambda - k + 1;
+ expected[k - 1] = (k - 1) * (k - 2);
+ expected[k] = (k - 1) * (k - 1) + lambda * (lambda + 2 * k - 2 - k * k);
+ norm.normalize(expected, 1);
+
+ assertArrayEquals(expected, rank, threshold);
+
+ leftSingular.salsa = true;
+ leftSingular.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ for(int i = k; i-- != 0;) expected[i] = (k + 1) * (k - 1) / (2. + k * (k - 1));
+ expected[k] = (k + 1) * 2 / (2. + k * (k - 1));
+ for(int d = 1; d < p; d++) expected[k + d] = 1;
+
+ norm.normalize(expected, 1);
+ assertArrayEquals(expected, rank, threshold);
+ }
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueBackbridgeCycle() throws IOException {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++) {
+ if (i != j) mg.addArc(i, j);
+ }
+ for(int i = 0; i < p; i++) mg.addArc(k + i, k + (i + 1) % p);
+ mg.addArc(k, k - 1);
+ ImmutableGraph g = mg.immutableView();
+ ImmutableGraph gt = Transform.transpose(g);
+
+ Array2DRowRealMatrix m = new Array2DRowRealMatrix(g.numNodes(), g.numNodes());
+ for(NodeIterator nodeIterator = g.nodeIterator(); nodeIterator.hasNext();) {
+ final int curr = nodeIterator.nextInt();
+ LazyIntIterator successors = nodeIterator.successors();
+ for(int s; (s = successors.nextInt()) != -1;) m.setEntry(curr, s, 1);
+ }
+ double lambda = new EigenDecompositionImpl(m.multiply(m.transpose()), 0).getRealEigenvalue(0);
+
+ for (double threshold = 1E-1; threshold > 1E-10; threshold /= 10) {
+ for(Norm norm : Norm.values()) {
+ for(double shift: new double[] { 0, -.1, -1 }) {
+
+ LeftSingularVectorParallelPowerMethod leftSingular = new LeftSingularVectorParallelPowerMethod(g, gt, NOPLogger.NOP_LOGGER);
+ leftSingular.norm = norm;
+ leftSingular.shift = shift;
+ leftSingular.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ double[] rank = leftSingular.rank;
+
+ double[] expected = new double[rank.length];
+ for(int i = k - 1; i-- != 0;) expected[i] = lambda * lambda - lambda * (k + 1) + k - 1;
+ expected[k - 1] = (k - 1) * (k - 2) * (lambda - 1);
+ expected[k + 1] = (k - 1) * (k - 2);
+ norm.normalize(expected, 1);
+
+ assertArrayEquals(expected, rank, threshold);
+
+ leftSingular.salsa = true;
+ leftSingular.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ for(int i = k - 1; i-- != 0;) expected[i] = (k + 1) * (k - 1) / (2. + k * (k - 1));
+ expected[k - 1] = (k + 1) * k / (2. + k * (k - 1));
+ expected[k] = 1;
+ expected[k + 1] = (k + 1) / (2. + k * (k - 1));
+ for(int d = 2; d < p; d++) expected[k + d] = 1;
+
+ norm.normalize(expected, 1);
+ assertArrayEquals(expected, rank, threshold);
+ }
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueBibridgeCycle() throws IOException {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++) {
+ if (i != j) mg.addArc(i, j);
+ }
+ for(int i = 0; i < p; i++) mg.addArc(k + i, k + (i + 1) % p);
+ mg.addArc(k, k - 1);
+ mg.addArc(k - 1, k);
+ ImmutableGraph g = mg.immutableView();
+ ImmutableGraph gt = Transform.transpose(g);
+
+ Array2DRowRealMatrix m = new Array2DRowRealMatrix(g.numNodes(), g.numNodes());
+ for(NodeIterator nodeIterator = g.nodeIterator(); nodeIterator.hasNext();) {
+ final int curr = nodeIterator.nextInt();
+ LazyIntIterator successors = nodeIterator.successors();
+ for(int s; (s = successors.nextInt()) != -1;) m.setEntry(curr, s, 1);
+ }
+ double lambda = new EigenDecompositionImpl(m.multiply(m.transpose()), 0).getRealEigenvalue(0);
+
+ for (double threshold = 1E-1; threshold > 1E-10; threshold /= 10) {
+ for(Norm norm : Norm.values()) {
+ for(double shift: new double[] { 0, -.1, -1 }) {
+
+ LeftSingularVectorParallelPowerMethod leftSingular = new LeftSingularVectorParallelPowerMethod(g, gt, NOPLogger.NOP_LOGGER);
+ leftSingular.norm = norm;
+ leftSingular.shift = shift;
+ leftSingular.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ double[] rank = leftSingular.rank;
+
+ double[] expected = new double[rank.length];
+ for(int i = k - 1; i-- != 0;) expected[i] = lambda * lambda - lambda * (k + 1) + k - 1;
+ expected[k - 1] = (k - 1) * (k - 2) * (lambda - 1);
+ expected[k] = lambda * lambda * lambda - lambda * lambda * (k * k - 2 * k + 4) + lambda * (3 * k * k - 7 * k + 6) - (k - 1) * (k - 1);
+ expected[k + 1] = (k - 1) * (k - 2);
+ norm.normalize(expected, 1);
+
+ assertArrayEquals(expected, rank, threshold);
+
+ leftSingular.salsa = true;
+ leftSingular.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ for(int i = k - 1; i-- != 0;) expected[i] = (k + 2) * (k - 1) / (4. + k * (k - 1));
+ expected[k - 1] = (k + 2) * k / (4. + k * (k - 1));
+ expected[k] = (k + 2) * 2 / (4. + k * (k - 1));
+ expected[k + 1] = (k + 2) / (4. + k * (k - 1));
+ for(int d = 2; d < p; d++) expected[k + d] = 1;
+
+ norm.normalize(expected, 1);
+ assertArrayEquals(expected, rank, threshold);
+ }
+ }
+ }
+ }
+ }
+ }
+}
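testSalsaConnectedM expects SALSA authority scores proportional to in-degree ({1, 2, 1, 2, 2, 1}, then normalized). This is a minimal sketch that assumes the standard SALSA authority random walk. It is not LAW's implementation, and the class `SalsaAuthoritySketch` is hypothetical. From an authority j, the walk steps back along a uniformly chosen in-arc i -> j and then forward along a uniformly chosen out-arc of i. Normalized in-degree is a fixed point of this walk. When the authorities form a single connected component, as they do for this graph, that fixed point is the unique limit.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch (not LAW's API): SALSA authority scores
// obtained by power iteration on the authority Markov chain.
public class SalsaAuthoritySketch {

    static double[] salsaAuthority(int n, int[][] arcs, int iterations) {
        int[] indegree = new int[n], outdegree = new int[n];
        for (int[] arc : arcs) {
            outdegree[arc[0]]++;
            indegree[arc[1]]++;
        }
        // Start uniformly on the authorities (nodes with at least one in-arc).
        int authorities = 0;
        for (int j = 0; j < n; j++) if (indegree[j] > 0) authorities++;
        double[] x = new double[n];
        for (int j = 0; j < n; j++) if (indegree[j] > 0) x[j] = 1.0 / authorities;

        for (int t = 0; t < iterations; t++) {
            // Backward half-step: authority j spreads its mass evenly over its in-neighbours (hubs).
            double[] hub = new double[n];
            for (int[] arc : arcs) hub[arc[0]] += x[arc[1]] / indegree[arc[1]];
            // Forward half-step: each hub spreads its mass evenly over its successors.
            double[] next = new double[n];
            for (int[] arc : arcs) next[arc[1]] += hub[arc[0]] / outdegree[arc[0]];
            // The step preserves mass; renormalizing to unit L1 norm only absorbs rounding drift.
            double sum = 0;
            for (double v : next) sum += v;
            for (int j = 0; j < n; j++) next[j] /= sum;
            x = next;
        }
        return x;
    }

    public static void main(String[] args) {
        // Same arcs as testSalsaConnectedM; the result should approach {1, 2, 1, 2, 2, 1} / 9.
        int[][] arcs = { { 0, 1 }, { 0, 2 }, { 0, 4 }, { 3, 1 }, { 3, 5 }, { 4, 0 }, { 4, 3 }, { 5, 3 }, { 5, 4 } };
        System.out.println(Arrays.toString(salsaAuthority(6, arcs, 5000)));
    }
}
```

On the graph of testSalsaNonconnectedM the authorities split into two components, {0, 3} and {1, 2, 4, 5}. Each component keeps the share of the uniform starting vector it receives. This explains why that test expects in-degree divided by the component's arc count, weighted by component size / 6.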
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/PageRankGaussSeidelTest.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/PageRankGaussSeidelTest.java
new file mode 100644
index 0000000..992359c
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/PageRankGaussSeidelTest.java
@@ -0,0 +1,329 @@
+package it.unimi.dsi.law.rank;
+
+/*
+ * Copyright (C) 2006-2019 Paolo Boldi, Roberto Posenato, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static java.lang.Math.pow;
+import static org.junit.Assert.assertEquals;
+
+import java.io.IOException;
+
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import it.unimi.dsi.fastutil.doubles.DoubleArrayList;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.io.TextIO;
+import it.unimi.dsi.law.TestUtil;
+import it.unimi.dsi.law.util.Norm;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+
+
+
+//RELEASE-STATUS: DIST
+
+public class PageRankGaussSeidelTest {
+ final static String GRAPH_NAME = "test50-.6-7-3-2-10-graph";
+ static double[] exactResult;
+ static double[] preference;
+ static int n;
+ static String baseNameGraph;
+ static String baseNamePreference;
+ static ImmutableGraph g;
+
+ @BeforeClass
+ public static void setUp() throws Exception {
+ baseNameGraph = TestUtil.getTestFile(PageRankGaussSeidelTest.class, GRAPH_NAME, false);
+ baseNamePreference = baseNameGraph + "-preferenceVector";
+
+ g = ImmutableGraph.load(baseNameGraph + "T"); // I need the transposed graph!
+ n = g.numNodes();
+ exactResult = new double[n];
+ preference = new double[n];
+ }
+
+ @Test
+ public void testRank() throws Exception {
+ System.out.println("rank without preference vector");
+ PageRankGaussSeidel pr = new PageRankGaussSeidel(g);
+ pr.preference = null;
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ TextIO.loadDoubles(baseNamePreference + "-uniform-w.out", exactResult);
+ assertEquals("Rank differs too much from the exact result", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testRankWithUniformPreferenceVector() throws Exception {
+ System.out.println("rank with uniform preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-uniform.bin", preference);
+ PageRankGaussSeidel pr = new PageRankGaussSeidel(g);
+ pr.preference = DoubleArrayList.wrap(preference);
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ TextIO.loadDoubles(baseNamePreference + "-uniform-w.out", exactResult);
+ assertEquals("Rank differs too much from the exact result", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testRankWithAlternatePreferenceVector() throws Exception {
+ System.out.println("rank with alternate preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-alternate.bin", preference);
+ PageRankGaussSeidel pr = new PageRankGaussSeidel(g);
+ pr.preference = DoubleArrayList.wrap(preference);
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ TextIO.loadDoubles(baseNamePreference + "-alternate-w.out", exactResult);
+ assertEquals("Rank differs too much from the exact result", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testRankWith1stHalfPreferenceVector() throws Exception {
+ System.out.println("rank with 1stHalf preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-1stHalf.bin", preference);
+ PageRankGaussSeidel pr = new PageRankGaussSeidel(g);
+ pr.preference = DoubleArrayList.wrap(preference);
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ TextIO.loadDoubles(baseNamePreference + "-1stHalf-w.out", exactResult);
+ assertEquals("Rank differs too much from the exact result", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testRankWith2ndHalfPreferenceVector() throws Exception {
+ System.out.println("rank with 2ndHalf preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-2ndHalf.bin", preference);
+ PageRankGaussSeidel pr = new PageRankGaussSeidel(g);
+ pr.preference = DoubleArrayList.wrap(preference);
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ TextIO.loadDoubles(baseNamePreference + "-2ndHalf-w.out", exactResult);
+ assertEquals("Rank differs too much from the exact result", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testStrongRankWithUniformPreferenceVector() throws Exception {
+ System.out.println("Strong rank with uniform preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-uniform.bin", preference);
+ PageRankGaussSeidel pr = new PageRankGaussSeidel(g);
+ pr.preference = DoubleArrayList.wrap(preference);
+ pr.stronglyPreferential = true;
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ TextIO.loadDoubles(baseNamePreference + "-uniform-s.out", exactResult);
+ assertEquals("Rank differs too much from the exact result", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testStrongRankWithAlternatePreferenceVector() throws Exception {
+ System.out.println("Strong rank with alternate preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-alternate.bin", preference);
+ PageRankGaussSeidel pr = new PageRankGaussSeidel(g);
+ pr.preference = DoubleArrayList.wrap(preference);
+ pr.stronglyPreferential = true;
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ TextIO.loadDoubles(baseNamePreference + "-alternate-s.out", exactResult);
+ assertEquals("Rank differs too much from the exact result", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testStrongRankWith1stHalfPreferenceVector() throws Exception {
+ System.out.println("Strong rank with 1stHalf preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-1stHalf.bin", preference);
+ PageRankGaussSeidel pr = new PageRankGaussSeidel(g);
+ pr.preference = DoubleArrayList.wrap(preference);
+ pr.stronglyPreferential = true;
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ TextIO.loadDoubles(baseNamePreference + "-1stHalf-s.out", exactResult);
+ assertEquals("Rank differs too much from the exact result", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testStrongRankWith2ndHalfPreferenceVector() throws Exception {
+ System.out.println("Strong rank with 2ndHalf preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-2ndHalf.bin", preference);
+ PageRankGaussSeidel pr = new PageRankGaussSeidel(g);
+ pr.preference = DoubleArrayList.wrap(preference);
+ pr.stronglyPreferential = true;
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ TextIO.loadDoubles(baseNamePreference + "-2ndHalf-s.out", exactResult);
+ assertEquals("Rank differs too much from the exact result", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testCliqueBibridgeCycle() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++)
+ if (i != j) mg.addArc(i, j);
+ // Note the transposition
+ for(int i = 0; i < p; i++) mg.addArc(k + (i + 1) % p, k + i);
+ mg.addArc(k - 1, k);
+ mg.addArc(k, k - 1);
+ ImmutableGraph g = mg.immutableView();
+
+ for(double alpha: new double[] { .25, .50, .75 }) {
+ // Compute index
+ final PageRankGaussSeidel pr = new PageRankGaussSeidel(g);
+ pr.alpha = alpha;
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ final double[] rank = pr.rank;
+ final double[] expected = new double[k + p];
+ double r = rank[k - 1] * (k + p);
+
+ expected[k - 1] = r;
+ for(int i = k - 1; i-- != 0;) expected[i] = (k - 1) * (k - alpha * k + alpha * r) / (k * (k - 1 - alpha * (k - 2)));
+ expected[k] = 2 + 2 * (alpha * r - k) / (k * (2 - pow(alpha, p)));
+ for(int d = 1; d < p; d++) expected[k + d] = 1 + pow(alpha, d) * (alpha * r - k) / (k * (2 - pow(alpha, p)));
+ for(int i = expected.length; i-- != 0;) expected[i] /= k + p;
+
+ assertEquals(0, Norm.L_1.compute(expected, rank), threshold);
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueBackbridgeCycle() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++)
+ if (i != j) mg.addArc(i, j);
+ // Note the transposition
+ for(int i = 0; i < p; i++) mg.addArc(k + (i + 1) % p, k + i);
+ mg.addArc(k - 1, k);
+ ImmutableGraph g = mg.immutableView();
+
+ for(double alpha: new double[] { .25, .50, .75 }) {
+ // Compute index
+ final PageRankGaussSeidel pr = new PageRankGaussSeidel(g);
+ pr.alpha = alpha;
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ final double[] rank = pr.rank;
+ final double[] expected = new double[k + p];
+
+ for(int i = k - 1; i-- != 0;)
+ expected[i] = (2 * (k - 1) - 2 * (k - 2) * alpha - alpha * alpha) / (2 * (1 - alpha) * (k - 1 + alpha)) -
+ pow(alpha, p + 2) / (2 * (1 - alpha) * (k - 1 + alpha) * (2 - pow(alpha, p)));
+
+ expected[k - 1] = (2 * (k - 1) - (k - 3) * alpha - alpha * alpha * k) / (2 * (1 - alpha) * (k - 1 + alpha)) -
+ pow(alpha, p + 1) * (k - 1 - alpha * (k - 2)) / (2 * (1 - alpha) * (k - 1 + alpha) * (2 - pow(alpha, p)));
+ for(int d = 0; d < p; d++)
+ expected[k + d] = 1 - pow(alpha, d + (d == 0? p : 0)) / (2 - pow(alpha, p));
+
+ for(int i = expected.length; i-- != 0;) expected[i] /= k + p;
+
+ assertEquals(0, Norm.L_1.compute(expected, rank), threshold);
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueNobridgeCycle() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++)
+ if (i != j) mg.addArc(i, j);
+ // Note the transposition
+ for(int i = 0; i < p; i++) mg.addArc(k + (i + 1) % p, k + i);
+ ImmutableGraph g = mg.immutableView();
+
+ for(double alpha: new double[] { .25, .50, .75 }) {
+ // Compute index
+ final PageRankGaussSeidel pr = new PageRankGaussSeidel(g);
+ pr.alpha = alpha;
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ final double[] rank = pr.rank;
+ final double[] expected = new double[k + p];
+
+ for(int i = k + p; i-- != 0;)
+ expected[i] = 1;
+ for(int i = expected.length; i-- != 0;) expected[i] /= k + p;
+
+ assertEquals(0, Norm.L_1.compute(expected, rank), threshold);
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueForwardbridgeCycle() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++)
+ if (i != j) mg.addArc(i, j);
+ // Note the transposition
+ for(int i = 0; i < p; i++) mg.addArc(k + (i + 1) % p, k + i);
+ mg.addArc(k, k - 1);
+ ImmutableGraph g = mg.immutableView();
+
+ for(double alpha: new double[] { .25, .50, .75 }) {
+ // Compute index
+ final PageRankGaussSeidel pr = new PageRankGaussSeidel(g);
+ pr.alpha = alpha;
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ final double[] rank = pr.rank;
+ final double[] expected = new double[k + p];
+
+ for(int i = k - 1; i-- != 0;)
+ expected[i] = (1 - alpha) * (alpha + k) * (k - 1) / ((k - alpha * alpha) * (k - 1) - alpha * k * (k - 2));
+
+ expected[k - 1] = k * (1 - alpha) * (k - 1 + alpha) / ((k - alpha * alpha) * (k - 1) - alpha * k * (k - 2));
+ for(int d = 0; d < p; d++)
+ expected[k + d] = 1 + (pow(alpha, d + 1) * (1 - alpha) * (k - 1 + alpha)) / ((1 - pow(alpha, p)) * ((k - alpha * alpha) * (k - 1) - alpha * k * (k - 2)));
+
+ for(int i = expected.length; i-- != 0;) expected[i] /= k + p;
+
+ assertEquals(0, Norm.L_1.compute(expected, rank), threshold);
+ }
+ }
+ }
+ }
+ }
+
+
+}
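The clique/cycle tests above assert closed-form PageRank values. The class below is a minimal sketch that cross-checks those formulas without LAW or WebGraph. It builds the original, non-transposed graphs, reading the `// Note the transposition` comments as saying that the tests add reversed arcs. It then runs plain power iteration with a uniform preference vector and compares the result with the tests' closed forms in L1 norm. The class, enum and method names (`CliqueCycleCheck`, `Bridge`, `graph`, `pageRank`, `expected`, `l1`) are hypothetical and are not part of the library. The formulas are transcribed from the test methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical cross-check, independent of LAW and WebGraph: builds the original
// (non-transposed) clique/cycle graphs, runs plain power iteration with a uniform
// preference vector, and evaluates the closed forms asserted by the tests.
final class CliqueCycleCheck {
    enum Bridge { NONE, FORWARD, BACK }

    // Clique on 0..k-1 and cycle k -> k+1 -> ... -> k+p-1 -> k. FORWARD adds the arc
    // k-1 -> k and BACK adds k -> k-1; the tests add the reversed arcs to mg.
    static List<List<Integer>> graph(int k, int p, Bridge bridge) {
        List<List<Integer>> succ = new ArrayList<>();
        for (int i = 0; i < k + p; i++) succ.add(new ArrayList<>());
        for (int i = 0; i < k; i++)
            for (int j = 0; j < k; j++)
                if (i != j) succ.get(i).add(j);
        for (int i = 0; i < p; i++) succ.get(k + i).add(k + (i + 1) % p);
        if (bridge == Bridge.FORWARD) succ.get(k - 1).add(k);
        if (bridge == Bridge.BACK) succ.get(k).add(k - 1);
        return succ;
    }

    // Power iteration r <- (1 - alpha) / n + alpha * r P. No node is dangling, and the
    // L1 error contracts by a factor alpha per step, so a fixed step count suffices.
    static double[] pageRank(List<List<Integer>> succ, double alpha, int steps) {
        int n = succ.size();
        double[] r = new double[n];
        Arrays.fill(r, 1.0 / n);
        for (int t = 0; t < steps; t++) {
            double[] next = new double[n];
            Arrays.fill(next, (1 - alpha) / n);
            for (int i = 0; i < n; i++) {
                double share = alpha * r[i] / succ.get(i).size();
                for (int j : succ.get(i)) next[j] += share;
            }
            r = next;
        }
        return r;
    }

    // Expected ranks, transcribed from testCliqueNobridgeCycle,
    // testCliqueForwardbridgeCycle and testCliqueBackbridgeCycle.
    static double[] expected(int k, int p, double a, Bridge bridge) {
        double[] x = new double[k + p];
        if (bridge == Bridge.NONE) {
            Arrays.fill(x, 1); // clique and cycle are both doubly stochastic
        } else if (bridge == Bridge.FORWARD) {
            double d = (k - a * a) * (k - 1) - a * k * (k - 2);
            for (int i = 0; i < k - 1; i++) x[i] = (1 - a) * (a + k) * (k - 1) / d;
            x[k - 1] = k * (1 - a) * (k - 1 + a) / d;
            for (int j = 0; j < p; j++)
                x[k + j] = 1 + Math.pow(a, j + 1) * (1 - a) * (k - 1 + a) / ((1 - Math.pow(a, p)) * d);
        } else {
            double f = 2 * (1 - a) * (k - 1 + a), g = 2 - Math.pow(a, p);
            for (int i = 0; i < k - 1; i++)
                x[i] = (2 * (k - 1) - 2 * (k - 2) * a - a * a) / f - Math.pow(a, p + 2) / (f * g);
            x[k - 1] = (2 * (k - 1) - (k - 3) * a - a * a * k) / f
                    - Math.pow(a, p + 1) * (k - 1 - a * (k - 2)) / (f * g);
            for (int j = 0; j < p; j++) x[k + j] = 1 - Math.pow(a, j == 0 ? p : j) / g;
        }
        for (int i = 0; i < x.length; i++) x[i] /= k + p;
        return x;
    }

    // L1 distance, the quantity the tests bound with Norm.L_1.compute.
    static double l1(double[] u, double[] v) {
        double s = 0;
        for (int i = 0; i < u.length; i++) s += Math.abs(u[i] - v[i]);
        return s;
    }
}
```

The sketch uses a fixed step count to keep it short. The Gauss-Seidel tests instead stop with `SpectralRanking.NormStoppingCriterion(threshold / 10)` and then assert an L1 distance of at most `threshold`.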
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/PageRankParallelGaussSeidelTest.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/PageRankParallelGaussSeidelTest.java
new file mode 100644
index 0000000..dd7403c
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/PageRankParallelGaussSeidelTest.java
@@ -0,0 +1,373 @@
+package it.unimi.dsi.law.rank;
+
+/*
+ * Copyright (C) 2006-2019 Paolo Boldi, Roberto Posenato, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static java.lang.Math.pow;
+import static org.junit.Assert.assertEquals;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import it.unimi.dsi.fastutil.doubles.DoubleArrayList;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.io.TextIO;
+import it.unimi.dsi.law.TestUtil;
+import it.unimi.dsi.law.util.Norm;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+
+
+
+//RELEASE-STATUS: DIST
+
+public class PageRankParallelGaussSeidelTest {
+ final static String GRAPH_NAME = "test50-.6-7-3-2-10-graph";
+ static double[] exactResult;
+ static double[] preference;
+ static int n;
+ static String baseNameGraph;
+ static String baseNamePreference;
+ static ImmutableGraph g;
+
+ @BeforeClass
+ public static void setUp() throws Exception {
+ baseNameGraph = TestUtil.getTestFile(PageRankParallelGaussSeidelTest.class, GRAPH_NAME, false);
+ baseNamePreference = baseNameGraph + "-preferenceVector";
+
+ g = ImmutableGraph.load(baseNameGraph + "T"); // I need the transposed graph!
+ n = g.numNodes();
+ exactResult = new double[n];
+ preference = new double[n];
+ }
+
+ @Test
+ public void testRank() throws Exception {
+ System.out.println("rank without preference vector");
+ final PageRankParallelGaussSeidel pr = new PageRankParallelGaussSeidel(g);
+ pr.preference = null;
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ TextIO.loadDoubles(baseNamePreference + "-uniform-w.out", exactResult);
+ assertEquals("Too different from the exact result!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testRankWithUniformPreferenceVector() throws Exception {
+ System.out.println("rank with uniform preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-uniform.bin", preference);
+ final PageRankParallelGaussSeidel pr = new PageRankParallelGaussSeidel(g);
+ pr.preference = DoubleArrayList.wrap(preference);
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ TextIO.loadDoubles(baseNamePreference + "-uniform-w.out", exactResult);
+ assertEquals("Too different from the exact result!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testRankWithAlternatePreferenceVector() throws Exception {
+ System.out.println("rank with alternate preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-alternate.bin", preference);
+ final PageRankParallelGaussSeidel pr = new PageRankParallelGaussSeidel(g);
+ pr.preference = DoubleArrayList.wrap(preference);
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ TextIO.loadDoubles(baseNamePreference + "-alternate-w.out", exactResult);
+ System.out.println(Arrays.toString(exactResult));
+ System.out.println(Arrays.toString(pr.rank));
+ assertEquals("Too different from the exact result!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testRankWith1stHalfPreferenceVector() throws Exception {
+ System.out.println("rank with 1stHalf preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-1stHalf.bin", preference);
+ final PageRankParallelGaussSeidel pr = new PageRankParallelGaussSeidel(g);
+ pr.preference = DoubleArrayList.wrap(preference);
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ TextIO.loadDoubles(baseNamePreference + "-1stHalf-w.out", exactResult);
+ assertEquals("Too different from the exact result!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testRankWith2ndHalfPreferenceVector() throws Exception {
+ System.out.println("rank with 2ndHalf preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-2ndHalf.bin", preference);
+ final PageRankParallelGaussSeidel pr = new PageRankParallelGaussSeidel(g);
+ pr.preference = DoubleArrayList.wrap(preference);
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ TextIO.loadDoubles(baseNamePreference + "-2ndHalf-w.out", exactResult);
+ assertEquals("Too different from the exact result!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testStrongRankWithUniformPreferenceVector() throws Exception {
+ System.out.println("Strong rank with uniform preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-uniform.bin", preference);
+ final PageRankParallelGaussSeidel pr = new PageRankParallelGaussSeidel(g);
+ pr.preference = DoubleArrayList.wrap(preference);
+ pr.stronglyPreferential = true;
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ TextIO.loadDoubles(baseNamePreference + "-uniform-s.out", exactResult);
+ assertEquals("Too different from the exact result!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testStrongRankWithAlternatePreferenceVector() throws Exception {
+ System.out.println("Strong rank with alternate preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-alternate.bin", preference);
+ final PageRankParallelGaussSeidel pr = new PageRankParallelGaussSeidel(g);
+ pr.preference = DoubleArrayList.wrap(preference);
+ pr.stronglyPreferential = true;
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ TextIO.loadDoubles(baseNamePreference + "-alternate-s.out", exactResult);
+ assertEquals("Too different from the exact result!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testStrongRankWith1stHalfPreferenceVector() throws Exception {
+ System.out.println("Strong rank with 1stHalf preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-1stHalf.bin", preference);
+ final PageRankParallelGaussSeidel pr = new PageRankParallelGaussSeidel(g);
+ pr.preference = DoubleArrayList.wrap(preference);
+ pr.stronglyPreferential = true;
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ TextIO.loadDoubles(baseNamePreference + "-1stHalf-s.out", exactResult);
+ assertEquals("Too different from the exact result!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testStrongRankWith2ndHalfPreferenceVector() throws Exception {
+ System.out.println("Strong rank with 2ndHalf preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-2ndHalf.bin", preference);
+ final PageRankParallelGaussSeidel pr = new PageRankParallelGaussSeidel(g);
+ pr.preference = DoubleArrayList.wrap(preference);
+ pr.stronglyPreferential = true;
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ TextIO.loadDoubles(baseNamePreference + "-2ndHalf-s.out", exactResult);
+ assertEquals("Too different from the exact result!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testCliqueBibridgeCycle() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++)
+ if (i != j) mg.addArc(i, j);
+ // Note the transposition
+ for(int i = 0; i < p; i++) mg.addArc(k + (i + 1) % p, k + i);
+ mg.addArc(k - 1, k);
+ mg.addArc(k, k - 1);
+ ImmutableGraph g = mg.immutableView();
+
+ PowerSeries w = new PowerSeries(g);
+ w.markovian = true;
+ w.alpha = .8;
+ w.stepUntil(PowerSeries.MAX_RATIO_STOPPING_CRITERION);
+
+ for(double alpha: new double[] { .25, .50, .75 }) {
+ // Compute index
+ final PageRankParallelGaussSeidel pr = new PageRankParallelGaussSeidel(g);
+ pr.alpha = alpha;
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ final double[] rank = pr.rank;
+ final double[] expected = new double[k + p];
+ double r = rank[k - 1] * (k + p);
+
+ expected[k - 1] = r;
+ for(int i = k - 1; i-- != 0;) expected[i] = (k - 1) * (k - alpha * k + alpha * r) / (k * (k - 1 - alpha * (k - 2)));
+ expected[k] = 2 + 2 * (alpha * r - k) / (k * (2 - pow(alpha, p)));
+ for(int d = 1; d < p; d++) expected[k + d] = 1 + pow(alpha, d) * (alpha * r - k) / (k * (2 - pow(alpha, p)));
+ for(int i = expected.length; i-- != 0;) expected[i] /= k + p;
+
+ assertEquals(0, Norm.L_1.compute(expected, rank), threshold);
+
+ pr.normVector(w.previousRank, w.maxRatio);
+ pr.pseudoRank = true;
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+
+ for(int i = 0; i < rank.length; i++) assertEquals(expected[i], rank[i], threshold);
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueBackbridgeCycle() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++)
+ if (i != j) mg.addArc(i, j);
+ // Note the transposition
+ for(int i = 0; i < p; i++) mg.addArc(k + (i + 1) % p, k + i);
+ mg.addArc(k - 1, k);
+ ImmutableGraph g = mg.immutableView();
+
+ PowerSeries w = new PowerSeries(g);
+ w.markovian = true;
+ w.alpha = .8;
+ w.stepUntil(PowerSeries.MAX_RATIO_STOPPING_CRITERION);
+
+ for(double alpha: new double[] { .25, .50, .75 }) {
+ // Compute index
+ final PageRankParallelGaussSeidel pr = new PageRankParallelGaussSeidel(g);
+ pr.alpha = alpha;
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ final double[] rank = pr.rank;
+ final double[] expected = new double[k + p];
+
+ for(int i = k - 1; i-- != 0;)
+ expected[i] = (2 * (k - 1) - 2 * (k - 2) * alpha - alpha * alpha) / (2 * (1 - alpha) * (k - 1 + alpha)) -
+ pow(alpha, p + 2) / (2 * (1 - alpha) * (k - 1 + alpha) * (2 - pow(alpha, p)));
+
+ expected[k - 1] = (2 * (k - 1) - (k - 3) * alpha - alpha * alpha * k) / (2 * (1 - alpha) * (k - 1 + alpha)) -
+ pow(alpha, p + 1) * (k - 1 - alpha * (k - 2)) / (2 * (1 - alpha) * (k - 1 + alpha) * (2 - pow(alpha, p)));
+ for(int d = 0; d < p; d++)
+ expected[k + d] = 1 - pow(alpha, d + (d == 0? p : 0)) / (2 - pow(alpha, p));
+
+ for(int i = expected.length; i-- != 0;) expected[i] /= k + p;
+
+ assertEquals(0, Norm.L_1.compute(expected, rank), threshold);
+
+ pr.normVector(w.previousRank, w.maxRatio);
+ pr.pseudoRank = true;
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+
+ for(int i = 0; i < rank.length; i++) assertEquals(expected[i], rank[i], threshold);
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueForwardbridgeCycle() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++)
+ if (i != j) mg.addArc(i, j);
+ // Note the transposition
+ for(int i = 0; i < p; i++) mg.addArc(k + (i + 1) % p, k + i);
+ mg.addArc(k, k - 1);
+ ImmutableGraph g = mg.immutableView();
+
+ PowerSeries w = new PowerSeries(g);
+ w.markovian = true;
+ w.alpha = .8;
+ w.stepUntil(PowerSeries.MAX_RATIO_STOPPING_CRITERION);
+
+ for(double alpha: new double[] { .25, .50, .75 }) {
+ // Compute index
+ final PageRankParallelGaussSeidel pr = new PageRankParallelGaussSeidel(g);
+ pr.alpha = alpha;
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ final double[] rank = pr.rank;
+ final double[] expected = new double[k + p];
+ for(int i = k - 1; i-- != 0;)
+ expected[i] = (1 - alpha) * (alpha + k) * (k - 1) / ((k - alpha * alpha) * (k - 1) - alpha * k * (k - 2));
+
+ expected[k - 1] = k * (1 - alpha) * (k - 1 + alpha) / ((k - alpha * alpha) * (k - 1) - alpha * k * (k - 2));
+ for(int d = 0; d < p; d++)
+ expected[k + d] = 1 + (pow(alpha, d + 1) * (1 - alpha) * (k - 1 + alpha)) / ((1 - pow(alpha, p)) * ((k - alpha * alpha) * (k - 1) - alpha * k * (k - 2)));
+
+ for(int i = expected.length; i-- != 0;) expected[i] /= k + p;
+
+ assertEquals(0, Norm.L_1.compute(expected, rank), threshold);
+
+ pr.normVector(w.previousRank, w.maxRatio);
+ pr.pseudoRank = true;
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+
+ for(int i = 0; i < rank.length; i++) assertEquals(expected[i], rank[i], threshold);
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueNobridgeCycle() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++)
+ if (i != j) mg.addArc(i, j);
+ // Note the transposition
+ for(int i = 0; i < p; i++) mg.addArc(k + (i + 1) % p, k + i);
+ ImmutableGraph g = mg.immutableView();
+
+ PowerSeries w = new PowerSeries(g);
+ w.markovian = true;
+ w.alpha = .8;
+ w.stepUntil(PowerSeries.MAX_RATIO_STOPPING_CRITERION);
+
+ for(double alpha: new double[] { .25, .50, .75 }) {
+ // Compute index
+ final PageRankParallelGaussSeidel pr = new PageRankParallelGaussSeidel(g);
+ pr.alpha = alpha;
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ final double[] rank = pr.rank;
+ final double[] expected = new double[k + p];
+ for(int i = k + p; i-- != 0;)
+ expected[i] = 1;
+ for(int i = expected.length; i-- != 0;) expected[i] /= k + p;
+
+ assertEquals(0, Norm.L_1.compute(expected, rank), threshold);
+
+ pr.normVector(w.previousRank, w.maxRatio);
+ pr.pseudoRank = true;
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+
+ for(int i = 0; i < rank.length; i++) assertEquals(expected[i], rank[i], threshold);
+ }
+ }
+ }
+ }
+ }
+
+}
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/PageRankParallelPowerSeriesTest.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/PageRankParallelPowerSeriesTest.java
new file mode 100644
index 0000000..6d4b6ec
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/PageRankParallelPowerSeriesTest.java
@@ -0,0 +1,348 @@
+package it.unimi.dsi.law.rank;
+
+/*
+ * Copyright (C) 2006-2019 Paolo Boldi, Roberto Posenato, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static java.lang.Math.pow;
+import static org.junit.Assert.assertEquals;
+
+import java.io.IOException;
+
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import it.unimi.dsi.fastutil.doubles.DoubleArrayList;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.io.TextIO;
+import it.unimi.dsi.law.TestUtil;
+import it.unimi.dsi.law.util.Norm;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.Transform;
+
+
+
+//RELEASE-STATUS: DIST
+
+public class PageRankParallelPowerSeriesTest {
+ final static String GRAPH_NAME = "test50-.6-7-3-2-10-graph";
+
+ static double[] exactResult;
+
+ static double[] preference;
+
+ static int n;
+
+ static String baseNameGraph;
+
+ static String baseNamePreference;
+
+ static ImmutableGraph g;
+
+ @BeforeClass
+ public static void setUp() throws Exception {
+ baseNameGraph = TestUtil.getTestFile(PageRankParallelPowerSeriesTest.class, GRAPH_NAME, false);
+ baseNamePreference = baseNameGraph + "-preferenceVector";
+
+ g = Transform.transpose(ImmutableGraph.load(baseNameGraph));
+ n = g.numNodes();
+ exactResult = new double[n];
+ preference = new double[n];
+ }
+
+ @Test
+ public void testCycle() throws IOException {
+ PageRankParallelPowerSeries pr = new PageRankParallelPowerSeries(ArrayListMutableGraph.newDirectedCycle(10000).immutableView());
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ for (int i = pr.graph.numNodes(); i-- != 0;)
+ assertEquals(1. / pr.n, pr.rank[i], threshold); // expected value first, as JUnit requires
+ }
+ }
+
+ @Test
+ public void testRank() throws Exception {
+ System.out.println("rank without preference vector");
+ PageRankParallelPowerSeries pr = new PageRankParallelPowerSeries(g);
+ pr.preference = null;
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ TextIO.loadDoubles(baseNamePreference + "-uniform-w.out", exactResult);
+ assertEquals("Too different from the exact result!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testRankWithUniformPreferenceVector() throws Exception {
+ System.out.println("rank with uniform preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-uniform.bin", preference);
+ PageRankParallelPowerSeries pr = new PageRankParallelPowerSeries(g);
+ pr.preference = DoubleArrayList.wrap(preference);
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ TextIO.loadDoubles(baseNamePreference + "-uniform-w.out", exactResult);
+
+
+ assertEquals("Too different from the exact result!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testRankWithAlternatePreferenceVector() throws Exception {
+ System.out.println("rank with alternate preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-alternate.bin", preference);
+ PageRankParallelPowerSeries pr = new PageRankParallelPowerSeries(g);
+ pr.preference = DoubleArrayList.wrap(preference);
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ TextIO.loadDoubles(baseNamePreference + "-alternate-w.out", exactResult);
+ assertEquals("Too different from the exact result!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testRankWith1stHalfPreferenceVector() throws Exception {
+ System.out.println("rank with 1stHalf preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-1stHalf.bin", preference);
+ PageRankParallelPowerSeries pr = new PageRankParallelPowerSeries(g);
+ pr.preference = DoubleArrayList.wrap(preference);
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ TextIO.loadDoubles(baseNamePreference + "-1stHalf-w.out", exactResult);
+ assertEquals("Too different from the exact result!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testRankWith2ndHalfPreferenceVector() throws Exception {
+ System.out.println("rank with 2ndHalf preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-2ndHalf.bin", preference);
+ PageRankParallelPowerSeries pr = new PageRankParallelPowerSeries(g);
+ pr.preference = DoubleArrayList.wrap(preference);
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ TextIO.loadDoubles(baseNamePreference + "-2ndHalf-w.out", exactResult);
+ assertEquals("Too different from the exact result!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testStrongRankWithUniformPreferenceVector() throws Exception {
+ System.out.println("Strong rank with uniform preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-uniform.bin", preference);
+ PageRankParallelPowerSeries pr = new PageRankParallelPowerSeries(g);
+ pr.preference = DoubleArrayList.wrap(preference);
+ pr.stronglyPreferential = true;
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ TextIO.loadDoubles(baseNamePreference + "-uniform-s.out", exactResult);
+ assertEquals("Too different from the exact result!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testStrongRankWithAlternatePreferenceVector() throws Exception {
+ System.out.println("Strong rank with alternate preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-alternate.bin", preference);
+ PageRankParallelPowerSeries pr = new PageRankParallelPowerSeries(g);
+ pr.preference = DoubleArrayList.wrap(preference);
+ pr.stronglyPreferential = true;
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ TextIO.loadDoubles(baseNamePreference + "-alternate-s.out", exactResult);
+ assertEquals("Too different from the exact result!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testStrongRankWith1stHalfPreferenceVector() throws Exception {
+ System.out.println("Strong rank with 1stHalf preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-1stHalf.bin", preference);
+ PageRankParallelPowerSeries pr = new PageRankParallelPowerSeries(g);
+ pr.preference = DoubleArrayList.wrap(preference);
+ pr.stronglyPreferential = true;
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ TextIO.loadDoubles(baseNamePreference + "-1stHalf-s.out", exactResult);
+ assertEquals("Too different from the exact result!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testStrongRankWith2ndHalfPreferenceVector() throws Exception {
+ System.out.println("Strong rank with 2ndHalf preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-2ndHalf.bin", preference);
+ PageRankParallelPowerSeries pr = new PageRankParallelPowerSeries(g);
+ pr.preference = DoubleArrayList.wrap(preference);
+ pr.stronglyPreferential = true;
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ TextIO.loadDoubles(baseNamePreference + "-2ndHalf-s.out", exactResult);
+ assertEquals("Too different from the exact result!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+
+ @Test
+ public void testCliqueBibridgeCycle() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++)
+ if (i != j) mg.addArc(i, j);
+ // Note the transposition
+ for(int i = 0; i < p; i++) mg.addArc(k + (i + 1) % p, k + i);
+ mg.addArc(k - 1, k);
+ mg.addArc(k, k - 1);
+ ImmutableGraph g = mg.immutableView();
+
+ for(double alpha: new double[] { .25, .50, .75 }) {
+ // Compute index
+ final PageRankParallelPowerSeries pr = new PageRankParallelPowerSeries(g);
+ pr.alpha = alpha;
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ final double[] rank = pr.rank;
+ final double[] expected = new double[k + p];
+ double r = rank[k - 1] * (k + p);
+
+ expected[k - 1] = r;
+ for(int i = k - 1; i-- != 0;) expected[i] = (k - 1) * (k - alpha * k + alpha * r) / (k * (k - 1 - alpha * (k - 2)));
+ expected[k] = 2 + 2 * (alpha * r - k) / (k * (2 - pow(alpha, p)));
+ for(int d = 1; d < p; d++) expected[k + d] = 1 + pow(alpha, d) * (alpha * r - k) / (k * (2 - pow(alpha, p)));
+ for(int i = expected.length; i-- != 0;) expected[i] /= k + p;
+
+ assertEquals(0, Norm.L_1.compute(expected, rank), threshold);
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueBackbridgeCycle() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++)
+ if (i != j) mg.addArc(i, j);
+ // Note the transposition
+ for(int i = 0; i < p; i++) mg.addArc(k + (i + 1) % p, k + i);
+ mg.addArc(k - 1, k);
+ ImmutableGraph g = mg.immutableView();
+
+ for(double alpha: new double[] { .25, .50, .75 }) {
+ // Compute index
+ final PageRankParallelPowerSeries pr = new PageRankParallelPowerSeries(g);
+ pr.alpha = alpha;
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ final double[] rank = pr.rank;
+ final double[] expected = new double[k + p];
+ double r = rank[k - 1] * (k + p);
+
+ expected[k - 1] = r;
+ for(int i = k - 1; i-- != 0;)
+ expected[i] = (2 * (k - 1) - 2 * (k - 2) * alpha - alpha * alpha) / (2 * (1 - alpha) * (k - 1 + alpha)) -
+ pow(alpha, p + 2) / (2 * (1 - alpha) * (k - 1 + alpha) * (2 - pow(alpha, p)));
+
+ expected[k - 1] = (2 * (k - 1) - (k - 3) * alpha - alpha * alpha * k) / (2 * (1 - alpha) * (k - 1 + alpha)) -
+ pow(alpha, p + 1) * (k - 1 - alpha * (k - 2)) / (2 * (1 - alpha) * (k - 1 + alpha) * (2 - pow(alpha, p)));
+ for(int d = 0; d < p; d++)
+ expected[k + d] = 1 - pow(alpha, d + (d == 0? p : 0)) / (2 - pow(alpha, p));
+
+ for(int i = expected.length; i-- != 0;) expected[i] /= k + p;
+ assertEquals(0, Norm.L_1.compute(expected, rank), threshold);
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueForwardbridgeCycle() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++)
+ if (i != j) mg.addArc(i, j);
+ // Note the transposition
+ for(int i = 0; i < p; i++) mg.addArc(k + (i + 1) % p, k + i);
+ mg.addArc(k, k - 1);
+ ImmutableGraph g = mg.immutableView();
+
+ for(double alpha: new double[] { .25, .50, .75 }) {
+ // Compute index
+ final PageRankParallelPowerSeries pr = new PageRankParallelPowerSeries(g);
+ pr.alpha = alpha;
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ final double[] rank = pr.rank;
+ final double[] expected = new double[k + p];
+ for(int i = k - 1; i-- != 0;)
+ expected[i] = (1 - alpha) * (alpha + k) * (k - 1) / ((k - alpha * alpha) * (k - 1) - alpha * k * (k - 2));
+
+ expected[k - 1] = k * (1 - alpha) * (k - 1 + alpha) / ((k - alpha * alpha) * (k - 1) - alpha * k * (k - 2));
+ for(int d = 0; d < p; d++)
+ expected[k + d] = 1 + (pow(alpha, d + 1) * (1 - alpha) * (k - 1 + alpha)) / ((1 - pow(alpha, p)) * ((k - alpha * alpha) * (k - 1) - alpha * k * (k - 2)));
+
+ for(int i = expected.length; i-- != 0;) expected[i] /= k + p;
+
+ assertEquals(0, Norm.L_1.compute(expected, rank), threshold);
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueNobridgeCycle() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++)
+ if (i != j) mg.addArc(i, j);
+ // Note the transposition
+ for(int i = 0; i < p; i++) mg.addArc(k + (i + 1) % p, k + i);
+ ImmutableGraph g = mg.immutableView();
+
+ for(double alpha: new double[] { .25, .50, .75 }) {
+ // Compute index
+ final PageRankParallelPowerSeries pr = new PageRankParallelPowerSeries(g);
+ pr.alpha = alpha;
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ final double[] rank = pr.rank;
+ final double[] expected = new double[k + p];
+ for(int i = k + p; i-- != 0;)
+ expected[i] = 1;
+
+ for(int i = expected.length; i-- != 0;) expected[i] /= k + p;
+
+ assertEquals(0, Norm.L_1.compute(expected, rank), threshold);
+ }
+ }
+ }
+ }
+ }
+
+}
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/PageRankPowerSeriesTest.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/PageRankPowerSeriesTest.java
new file mode 100644
index 0000000..a882a5d
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/PageRankPowerSeriesTest.java
@@ -0,0 +1,329 @@
+package it.unimi.dsi.law.rank;
+
+/*
+ * Copyright (C) 2006-2019 Paolo Boldi, Roberto Posenato, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static java.lang.Math.pow;
+import static org.junit.Assert.assertEquals;
+
+import java.io.IOException;
+
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import it.unimi.dsi.fastutil.doubles.DoubleArrayList;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.io.TextIO;
+import it.unimi.dsi.law.TestUtil;
+import it.unimi.dsi.law.util.Norm;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+
+
+
+//RELEASE-STATUS: DIST
+
+public class PageRankPowerSeriesTest {
+ final static String GRAPH_NAME = "test50-.6-7-3-2-10-graph";
+
+ static double[] exactResult;
+
+ static double[] preference;
+
+ static int n;
+
+ static String baseNameGraph;
+
+ static String baseNamePreference;
+
+ static ImmutableGraph g;
+
+ @BeforeClass
+ public static void setUp() throws Exception {
+ baseNameGraph = TestUtil.getTestFile(PageRankPowerSeries.class, GRAPH_NAME, false);
+ baseNamePreference = baseNameGraph + "-preferenceVector";
+
+ g = ImmutableGraph.load(baseNameGraph);
+ n = g.numNodes();
+ exactResult = new double[n];
+ preference = new double[n];
+ }
+
+ @Test
+ public void testRank() throws Exception {
+ System.out.println("rank without preference vector");
+ PageRankPowerSeries pr = new PageRankPowerSeries(g);
+ pr.preference = null;
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ TextIO.loadDoubles(baseNamePreference + "-uniform-w.out", exactResult);
+ assertEquals("Too far from the expected result!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testRankWithUniformPreferenceVector() throws Exception {
+ System.out.println("rank with uniform preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-uniform.bin", preference);
+ PageRankPowerSeries pr = new PageRankPowerSeries(g);
+ pr.preference = DoubleArrayList.wrap(preference);
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ TextIO.loadDoubles(baseNamePreference + "-uniform-w.out", exactResult);
+ assertEquals("Too far from the expected result!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testRankWithAlternatePreferenceVector() throws Exception {
+ System.out.println("rank with alternate preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-alternate.bin", preference);
+ PageRankPowerSeries pr = new PageRankPowerSeries(g);
+ pr.preference = DoubleArrayList.wrap(preference);
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ TextIO.loadDoubles(baseNamePreference + "-alternate-w.out", exactResult);
+ assertEquals("Too far from the expected result!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testRankWith1stHalfPreferenceVector() throws Exception {
+ System.out.println("rank with 1stHalf preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-1stHalf.bin", preference);
+ PageRankPowerSeries pr = new PageRankPowerSeries(g);
+ pr.preference = DoubleArrayList.wrap(preference);
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ TextIO.loadDoubles(baseNamePreference + "-1stHalf-w.out", exactResult);
+ assertEquals("Too far from the expected result!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testRankWith2ndHalfPreferenceVector() throws Exception {
+ System.out.println("rank with 2ndHalf preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-2ndHalf.bin", preference);
+ PageRankPowerSeries pr = new PageRankPowerSeries(g);
+ pr.preference = DoubleArrayList.wrap(preference);
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ TextIO.loadDoubles(baseNamePreference + "-2ndHalf-w.out", exactResult);
+ assertEquals("Too far from the expected result!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testStrongRankWithUniformPreferenceVector() throws Exception {
+ System.out.println("Strong rank with uniform preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-uniform.bin", preference);
+ PageRankPowerSeries pr = new PageRankPowerSeries(g);
+ pr.preference = DoubleArrayList.wrap(preference);
+ pr.stronglyPreferential = true;
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ TextIO.loadDoubles(baseNamePreference + "-uniform-s.out", exactResult);
+ assertEquals("Too far from the expected result!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testStrongRankWithAlternatePreferenceVector() throws Exception {
+ System.out.println("Strong rank with alternate preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-alternate.bin", preference);
+ PageRankPowerSeries pr = new PageRankPowerSeries(g);
+ pr.preference = DoubleArrayList.wrap(preference);
+ pr.stronglyPreferential = true;
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ TextIO.loadDoubles(baseNamePreference + "-alternate-s.out", exactResult);
+ assertEquals("Too far from the expected result!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testStrongRankWith1stHalfPreferenceVector() throws Exception {
+ System.out.println("Strong rank with 1stHalf preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-1stHalf.bin", preference);
+ PageRankPowerSeries pr = new PageRankPowerSeries(g);
+ pr.preference = DoubleArrayList.wrap(preference);
+ pr.stronglyPreferential = true;
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ TextIO.loadDoubles(baseNamePreference + "-1stHalf-s.out", exactResult);
+ assertEquals("Too far from the expected result!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testStrongRankWith2ndHalfPreferenceVector() throws Exception {
+ System.out.println("Strong rank with 2ndHalf preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-2ndHalf.bin", preference);
+ PageRankPowerSeries pr = new PageRankPowerSeries(g);
+ pr.preference = DoubleArrayList.wrap(preference);
+ pr.stronglyPreferential = true;
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ TextIO.loadDoubles(baseNamePreference + "-2ndHalf-s.out", exactResult);
+ assertEquals("Too far from the expected result!", 0.0, Norm.L_1.compute(pr.rank, exactResult), threshold);
+ }
+ }
+
+ @Test
+ public void testCliqueBibridgeCycle() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++)
+ if (i != j) mg.addArc(i, j);
+ for(int i = 0; i < p; i++) mg.addArc(k + i, k + (i + 1) % p);
+ mg.addArc(k - 1, k);
+ mg.addArc(k, k - 1);
+ ImmutableGraph g = mg.immutableView();
+
+ for(double alpha: new double[] { .25, .50, .75 }) {
+ // Compute index
+ final PageRankPowerSeries pr = new PageRankPowerSeries(g);
+ pr.alpha = alpha;
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ final double[] rank = pr.rank;
+ final double[] expected = new double[k + p];
+ double r = rank[k - 1] * (k + p);
+
+ expected[k - 1] = r;
+ for(int i = k - 1; i-- != 0;) expected[i] = (k - 1) * (k - alpha * k + alpha * r) / (k * (k - 1 - alpha * (k - 2)));
+ expected[k] = 2 + 2 * (alpha * r - k) / (k * (2 - pow(alpha, p)));
+ for(int d = 1; d < p; d++) expected[k + d] = 1 + pow(alpha, d) * (alpha * r - k) / (k * (2 - pow(alpha, p)));
+ for(int i = expected.length; i-- != 0;) expected[i] /= k + p;
+
+ assertEquals(0, Norm.L_1.compute(expected, rank), threshold);
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueBackbridgeCycle() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++)
+ if (i != j) mg.addArc(i, j);
+ for(int i = 0; i < p; i++) mg.addArc(k + i, k + (i + 1) % p);
+ mg.addArc(k, k - 1);
+ ImmutableGraph g = mg.immutableView();
+
+ for(double alpha: new double[] { .25, .50, .75 }) {
+ // Compute index
+ final PageRankPowerSeries pr = new PageRankPowerSeries(g);
+ pr.alpha = alpha;
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ final double[] rank = pr.rank;
+ final double[] expected = new double[k + p];
+ double r = rank[k - 1] * (k + p);
+
+ expected[k - 1] = r;
+ for(int i = k - 1; i-- != 0;)
+ expected[i] = (2 * (k - 1) - 2 * (k - 2) * alpha - alpha * alpha) / (2 * (1 - alpha) * (k - 1 + alpha)) -
+ pow(alpha, p + 2) / (2 * (1 - alpha) * (k - 1 + alpha) * (2 - pow(alpha, p)));
+
+ expected[k - 1] = (2 * (k - 1) - (k - 3) * alpha - alpha * alpha * k) / (2 * (1 - alpha) * (k - 1 + alpha)) -
+ pow(alpha, p + 1) * (k - 1 - alpha * (k - 2)) / (2 * (1 - alpha) * (k - 1 + alpha) * (2 - pow(alpha, p)));
+ for(int d = 0; d < p; d++)
+ expected[k + d] = 1 - pow(alpha, d + (d == 0? p : 0)) / (2 - pow(alpha, p));
+
+ for(int i = expected.length; i-- != 0;) expected[i] /= k + p;
+ assertEquals(0, Norm.L_1.compute(expected, rank), threshold);
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueForwardbridgeCycle() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++)
+ if (i != j) mg.addArc(i, j);
+ for(int i = 0; i < p; i++) mg.addArc(k + i, k + (i + 1) % p);
+ mg.addArc(k - 1, k);
+ ImmutableGraph g = mg.immutableView();
+
+ for(double alpha: new double[] { .25, .50, .75 }) {
+ // Compute index
+ final PageRankPowerSeries pr = new PageRankPowerSeries(g);
+ pr.alpha = alpha;
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ final double[] rank = pr.rank;
+ final double[] expected = new double[k + p];
+
+ for(int i = k - 1; i-- != 0;)
+ expected[i] = (1 - alpha) * (alpha + k) * (k - 1) / ((k - alpha * alpha) * (k - 1) - alpha * k * (k - 2));
+
+ expected[k - 1] = k * (1 - alpha) * (k - 1 + alpha) / ((k - alpha * alpha) * (k - 1) - alpha * k * (k - 2));
+ for(int d = 0; d < p; d++)
+ expected[k + d] = 1 + (pow(alpha, d + 1) * (1 - alpha) * (k - 1 + alpha)) / ((1 - pow(alpha, p)) * ((k - alpha * alpha) * (k - 1) - alpha * k * (k - 2)));
+
+ for(int i = expected.length; i-- != 0;) expected[i] /= k + p;
+ assertEquals(0, Norm.L_1.compute(expected, rank), threshold);
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueNobridgeCycle() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++)
+ if (i != j) mg.addArc(i, j);
+ for(int i = 0; i < p; i++) mg.addArc(k + i, k + (i + 1) % p);
+ ImmutableGraph g = mg.immutableView();
+
+ for(double alpha: new double[] { .25, .50, .75 }) {
+ // Compute index
+ final PageRankPowerSeries pr = new PageRankPowerSeries(g);
+ pr.alpha = alpha;
+ pr.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold));
+ final double[] rank = pr.rank;
+ final double[] expected = new double[k + p];
+
+ for(int i = k + p; i-- != 0;)
+ expected[i] = 1;
+
+ for(int i = expected.length; i-- != 0;) expected[i] /= k + p;
+ assertEquals(0, Norm.L_1.compute(expected, rank), threshold);
+ }
+ }
+ }
+ }
+ }
+}
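The closed form asserted in testCliqueForwardbridgeCycle can be checked numerically. Write s_v for n times the rank of v, so that s_v = (1 - alpha) + alpha * (sum over arcs u -> v of s_u / outdeg(u)). The k - 1 ordinary clique nodes then share one value s_i. Node k - 1, which has out-degree k because of the bridge, gets (1 - alpha) + alpha * s_i. The cycle receives alpha * s_{k-1} / k through the bridge, and this surplus decays by a factor alpha per hop around the cycle. The hypothetical ForwardBridgeSketch class below is not LAW code. It runs plain push-style power iteration on the untransposed graph, as PageRankPowerSeries is given it in this test, and returns the same formula divided by k + p, so that the two can be compared.

```java
import java.util.Arrays;

// Hypothetical sketch (not LAW's PageRankPowerSeries): power iteration on the
// "k-clique + forward bridge k-1 -> k + directed p-cycle" graph, plus the test's closed form.
final class ForwardBridgeSketch {

    /** Successor lists: clique on 0..k-1, cycle k+i -> k+(i+1)%p, and the bridge k-1 -> k. */
    static int[][] graph(int k, int p) {
        int[][] succ = new int[k + p][];
        for (int i = 0; i < k; i++) {
            boolean bridge = i == k - 1;
            succ[i] = new int[bridge ? k : k - 1];
            int c = 0;
            for (int j = 0; j < k; j++) if (j != i) succ[i][c++] = j;
            if (bridge) succ[i][c] = k; // the forward bridge
        }
        for (int i = 0; i < p; i++) succ[k + i] = new int[] { k + (i + 1) % p };
        return succ;
    }

    /** Uniform-teleportation PageRank summing to 1; assumes no dangling nodes. */
    static double[] rank(int[][] succ, double alpha, double threshold) {
        int n = succ.length;
        double[] r = new double[n];
        Arrays.fill(r, 1.0 / n);
        double delta;
        do {
            double[] next = new double[n];
            Arrays.fill(next, (1 - alpha) / n);
            for (int u = 0; u < n; u++)
                for (int v : succ[u]) next[v] += alpha * r[u] / succ[u].length; // push along arcs
            delta = 0;
            for (int v = 0; v < n; v++) delta += Math.abs(next[v] - r[v]);
            r = next;
        } while (delta > threshold);
        return r;
    }

    /** The closed form used by testCliqueForwardbridgeCycle, divided by k + p. */
    static double[] expected(int k, int p, double alpha) {
        double den = (k - alpha * alpha) * (k - 1) - alpha * k * (k - 2);
        double[] e = new double[k + p];
        for (int i = 0; i < k - 1; i++) e[i] = (1 - alpha) * (alpha + k) * (k - 1) / den;
        e[k - 1] = k * (1 - alpha) * (k - 1 + alpha) / den;
        for (int d = 0; d < p; d++)
            e[k + d] = 1 + Math.pow(alpha, d + 1) * (1 - alpha) * (k - 1 + alpha) / ((1 - Math.pow(alpha, p)) * den);
        for (int i = 0; i < e.length; i++) e[i] /= k + p;
        return e;
    }
}
```

With k = 10 and p = 20, the L1 distance between rank(graph(10, 20), alpha, 1e-13) and expected(10, 20, alpha) should be negligible for each alpha the tests use.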
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/PageRankPushTest.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/PageRankPushTest.java
new file mode 100644
index 0000000..a7d9496
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/PageRankPushTest.java
@@ -0,0 +1,124 @@
+package it.unimi.dsi.law.rank;
+
+import java.io.File;
+
+import org.junit.AfterClass;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.slf4j.helpers.NOPLogger;
+
+/*
+ * Copyright (C) 2010-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.doubles.AbstractDoubleList;
+import it.unimi.dsi.fastutil.doubles.DoubleList;
+import it.unimi.dsi.law.TestUtil;
+import it.unimi.dsi.law.util.Norm;
+import it.unimi.dsi.webgraph.BVGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.Transform;
+
+
+// RELEASE-STATUS: DIST
+
+public class PageRankPushTest {
+ private final static String GRAPH_NAME = "test50-.6-7-3-2-10-graph";
+ private static String baseNameGraph;
+ private static ImmutableGraph graph;
+ private static File file;
+
+ @BeforeClass
+ public static void setUp() throws Exception {
+ baseNameGraph = TestUtil.getTestFile(PageRankPushTest.class, GRAPH_NAME, false);
+ file = File.createTempFile(PageRankPushTest.class.getSimpleName(), "graph");
+ BVGraph.store(Transform.filterArcs(Transform.symmetrize(ImmutableGraph.load(baseNameGraph)), Transform.NO_LOOPS, null), file.toString());
+ graph = ImmutableGraph.load(file.toString());
+ }
+
+ @AfterClass
+ public static void tearDown() throws Exception {
+ file.delete();
+ new File(file + BVGraph.PROPERTIES_EXTENSION).delete();
+ new File(file + BVGraph.GRAPH_EXTENSION).delete();
+ new File(file + BVGraph.OFFSETS_EXTENSION).delete();
+ }
+
+ @Test
+ public void testRank() throws Exception {
+ final ImmutableGraph transpose = Transform.transpose(graph);
+ final PageRankParallelGaussSeidel prPGS = new PageRankParallelGaussSeidel(transpose, NOPLogger.NOP_LOGGER);
+ final int n = graph.numNodes();
+
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 1000) {
+ for(boolean l1: new boolean[] { true, false }) {
+ for(boolean fifo: new boolean[] { true, false }) {
+ final PageRankPush pageRankPush = new PageRankPush(graph, NOPLogger.NOP_LOGGER, fifo);
+
+ for (int root = 0; root < n; root++) {
+ final int r = root;
+ final DoubleList preference = new AbstractDoubleList() {
+ @Override
+ public double getDouble(int u) {
+ return u == r ? 1 : 0;
+ }
+
+ @Override
+ public int size() {
+ return n;
+ }
+ };
+
+ // Compute norm vector for all alpha < .99
+ final PowerSeries w = new PowerSeries(transpose, NOPLogger.NOP_LOGGER);
+ w.markovian = true;
+ w.alpha = .999;
+ w.stepUntil(PowerSeries.MAX_RATIO_STOPPING_CRITERION);
+
+ prPGS.normVector(w.previousRank, w.maxRatio);
+
+ for(double alpha: new double[] { 0, .25, .5, .75, .99 }) {
+ System.err.println("root=" + root + ", threshold=" + threshold + ", alpha=" + alpha + ", fifo=" + fifo + ", l1=" + l1);
+
+ prPGS.preference = preference;
+ prPGS.pseudoRank = true;
+
+ pageRankPush.alpha = prPGS.alpha = alpha;
+ pageRankPush.root = root;
+ pageRankPush.threshold = threshold / 10;
+
+ if (l1) pageRankPush.stepUntil(new PageRankPush.L1NormStoppingCritertion());
+ else pageRankPush.stepUntil(new PageRankPush.EmptyQueueStoppingCritertion());
+
+ final double[] rank = new double[n];
+
+ for (int i = pageRankPush.node2Seen.size(); i-- != 0;) rank[pageRankPush.seen2Node[i]] = pageRankPush.rank[i] / (1 - pageRankPush.backToRoot);
+ prPGS.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / n));
+ Assert.assertEquals(0, Norm.L_1.compute(prPGS.rank, rank), 2 * threshold);
+
+ for (int i = pageRankPush.node2Seen.size(); i-- != 0;) rank[pageRankPush.seen2Node[i]] = pageRankPush.rank[i] / pageRankPush.pNorm;
+ Norm.L_1.normalize(prPGS.rank, 1);
+ Assert.assertEquals(0, Norm.L_1.compute(prPGS.rank, rank), 2 * threshold);
+
+ }
+ }
+ }
+ }
+ }
+ }
+}
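PageRankPush computes a PageRank personalized on root through local pushes, and the test above checks it against PageRankParallelGaussSeidel. The hypothetical PushSketch below is a rough illustration of the classic forward-push idea. It is not necessarily what PageRankPush does, and it does not reproduce PageRankPush's backToRoot or pNorm normalizations. The sketch keeps an estimate p and a residual r on an undirected graph. Pushing a node u moves (1 - alpha) * r[u] into p[u] and spreads alpha * r[u] evenly over u's neighbours, so sum(p) + sum(r) stays equal to 1. The FIFO queue is loosely analogous to the test's fifo = true setting, which is an assumption on our part.

```java
import java.util.ArrayDeque;

// Hypothetical sketch (not LAW's PageRankPush): forward push for PageRank personalized on `root`,
// with damping factor alpha, on an undirected graph given as symmetric adjacency lists.
final class PushSketch {

    /**
     * Returns {p, r}. Stops when every node u has r[u] < eps * deg(u).
     * Assumes every node has degree at least 1.
     */
    static double[][] push(int[][] adj, int root, double alpha, double eps) {
        int n = adj.length;
        double[] p = new double[n], r = new double[n];
        boolean[] queued = new boolean[n];
        ArrayDeque<Integer> queue = new ArrayDeque<>(); // FIFO processing order
        r[root] = 1;
        queue.add(root);
        queued[root] = true;
        while (!queue.isEmpty()) {
            int u = queue.poll();
            queued[u] = false;
            double ru = r[u];
            if (ru < eps * adj[u].length) continue; // only possible for the initial root
            p[u] += (1 - alpha) * ru;               // keep a (1 - alpha) share at u
            r[u] = 0;
            double share = alpha * ru / adj[u].length;
            for (int v : adj[u]) {                  // spread the rest over the neighbours
                r[v] += share;
                if (!queued[v] && r[v] >= eps * adj[v].length) {
                    queue.add(v);
                    queued[v] = true;
                }
            }
        }
        return new double[][] { p, r };
    }
}
```

The exact personalized vector equals p plus a nonnegative combination of the residuals, so the L1 distance between p and the exact vector is exactly sum(r). This makes an L1-based stopping rule cheap to evaluate.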
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/PageRankWithDerivativesDeepTest.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/PageRankWithDerivativesDeepTest.java
new file mode 100644
index 0000000..b146ee1
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/PageRankWithDerivativesDeepTest.java
@@ -0,0 +1,267 @@
+package it.unimi.dsi.law.rank;
+
+/*
+ * Copyright (C) 2006-2019 Paolo Boldi, Roberto Posenato, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static it.unimi.dsi.law.rank.SpectralRanking.DEFAULT_THRESHOLD;
+import static org.junit.Assert.assertEquals;
+
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import it.unimi.dsi.fastutil.doubles.DoubleArrayList;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.io.TextIO;
+import it.unimi.dsi.law.TestUtil;
+import it.unimi.dsi.law.rank.SpectralRanking.StoppingCriterion;
+import it.unimi.dsi.law.util.Norm;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+
+
+
+//RELEASE-STATUS: DIST
+
+public class PageRankWithDerivativesDeepTest {
+ final static String GRAPH_NAME = "test10-.7-2-2-2-5-graph";
+ // We lose several digits when passing to the derivatives.
+ final static double TEST_THRESHOLD = DEFAULT_THRESHOLD * 30;
+ static double[] exactResult;
+ static double[] preference;
+ static int n;
+ static String baseNameGraph;
+ static String baseNamePreference;
+ static ImmutableGraph g;
+ static StoppingCriterion stop;
+
+ @BeforeClass
+ public static void setUp() throws Exception {
+ baseNameGraph = TestUtil.getTestFile(PageRankWithDerivativesDeepTest.class, GRAPH_NAME, false);
+ baseNamePreference = baseNameGraph + "-preferenceVector";
+
+ g = ImmutableGraph.load(baseNameGraph);
+ n = g.numNodes();
+ exactResult = new double[n];
+ preference = new double[n];
+ stop = new SpectralRanking.NormStoppingCriterion(DEFAULT_THRESHOLD);
+ }
+
+ @Test
+ public void testRank() throws Exception {
+ System.out.println("rank without preference vector");
+ PageRankPowerSeries pr = new PageRankPowerSeries(g);
+ pr.order = new int[] { 1, 2 };
+ pr.preference = null;
+ pr.stepUntil(stop);
+
+ TextIO.loadDoubles(baseNamePreference + "-uniform-w.out", exactResult);
+ double result[] = pr.rank;
+ assertEquals("Too far from the expected result!", 0, Norm.L_1.compute(result, exactResult), DEFAULT_THRESHOLD);
+
+ TextIO.loadDoubles(baseNamePreference + "-uniform-wd1.out", exactResult);
+ result = pr.derivative[0];
+ assertEquals("Too far from the expected result!", 0, Norm.L_1.compute(result, exactResult), TEST_THRESHOLD);
+
+ TextIO.loadDoubles(baseNamePreference + "-uniform-wd2.out", exactResult);
+ result = pr.derivative[1];
+ assertEquals("Too far from the expected result!", 0, Norm.L_1.compute(result, exactResult), TEST_THRESHOLD);
+ }
+
+ @Test
+ public void testRankWithUniformPreferenceVector() throws Exception {
+ System.out.println("rank with uniform preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-uniform.bin", preference);
+ PageRankPowerSeries pr = new PageRankPowerSeries(g);
+ pr.order = new int[] { 1, 2 };
+ pr.preference = DoubleArrayList.wrap(preference);
+ for (int i = 0; i < n; i++)
+ assertEquals("Preference vector is not uniform!", 1.0 / n, preference[i], DEFAULT_THRESHOLD);
+
+ pr.stepUntil(stop);
+
+ TextIO.loadDoubles(baseNamePreference + "-uniform-w.out", exactResult);
+ double result[] = pr.rank;
+ assertEquals("Too far from the expected result!", 0, Norm.L_1.compute(result, exactResult), DEFAULT_THRESHOLD);
+
+ TextIO.loadDoubles(baseNamePreference + "-uniform-wd1.out", exactResult);
+ result = pr.derivative[0];
+ assertEquals("Too far from the expected result!", 0, Norm.L_1.compute(result, exactResult), TEST_THRESHOLD);
+
+ TextIO.loadDoubles(baseNamePreference + "-uniform-wd2.out", exactResult);
+ result = pr.derivative[1];
+ assertEquals("Too far from the expected result!", 0, Norm.L_1.compute(result, exactResult), TEST_THRESHOLD);
+ }
+
+ @Test
+ public void testRankWithAlternatePreferenceVector() throws Exception {
+ System.out.println("rank with alternate preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-alternate.bin", preference);
+ PageRankPowerSeries pr = new PageRankPowerSeries(g);
+ pr.order = new int[] { 1, 2 };
+ pr.preference = DoubleArrayList.wrap(preference);
+ pr.stepUntil(stop);
+
+ TextIO.loadDoubles(baseNamePreference + "-alternate-w.out", exactResult);
+ double result[] = pr.rank;
+ assertEquals("Too far from the expected result!", 0, Norm.L_1.compute(result, exactResult), DEFAULT_THRESHOLD);
+
+ TextIO.loadDoubles(baseNamePreference + "-alternate-wd1.out", exactResult);
+ result = pr.derivative[0];
+ assertEquals("Too far from the expected result!", 0, Norm.L_1.compute(result, exactResult), TEST_THRESHOLD);
+
+ TextIO.loadDoubles(baseNamePreference + "-alternate-wd2.out", exactResult);
+ result = pr.derivative[1];
+ assertEquals("Too far from the expected result!", 0, Norm.L_1.compute(result, exactResult), TEST_THRESHOLD);
+ }
+
+ @Test
+ public void testRankWith1stHalfPreferenceVector() throws Exception {
+ System.out.println("rank with 1stHalf preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-1stHalf.bin", preference);
+ PageRankPowerSeries pr = new PageRankPowerSeries(g);
+ pr.order = new int[] { 1, 2 };
+ pr.preference = DoubleArrayList.wrap(preference);
+ pr.stepUntil(stop);
+
+ TextIO.loadDoubles(baseNamePreference + "-1stHalf-w.out", exactResult);
+ double result[] = pr.rank;
+ assertEquals("Too far from the expected result!", 0, Norm.L_1.compute(result, exactResult), DEFAULT_THRESHOLD);
+
+ TextIO.loadDoubles(baseNamePreference + "-1stHalf-wd1.out", exactResult);
+ result = pr.derivative[0];
+ assertEquals("Too far from the expected result!", 0, Norm.L_1.compute(result, exactResult), TEST_THRESHOLD);
+
+ TextIO.loadDoubles(baseNamePreference + "-1stHalf-wd2.out", exactResult);
+ result = pr.derivative[1];
+ assertEquals("Too far from the expected result!", 0, Norm.L_1.compute(result, exactResult), TEST_THRESHOLD);
+ }
+
+ @Test
+ public void testRankWith2ndHalfPreferenceVector() throws Exception {
+ System.out.println("rank with 2ndHalf preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-2ndHalf.bin", preference);
+ PageRankPowerSeries pr = new PageRankPowerSeries(g);
+ pr.order = new int[] { 1, 2 };
+ pr.preference = DoubleArrayList.wrap(preference);
+ pr.stepUntil(stop);
+
+ TextIO.loadDoubles(baseNamePreference + "-2ndHalf-w.out", exactResult);
+ double result[] = pr.rank;
+ assertEquals("Too far from the expected result!", 0, Norm.L_1.compute(result, exactResult), DEFAULT_THRESHOLD);
+
+ TextIO.loadDoubles(baseNamePreference + "-2ndHalf-wd1.out", exactResult);
+ result = pr.derivative[0];
+ assertEquals("Too far from the expected result!", 0, Norm.L_1.compute(result, exactResult), TEST_THRESHOLD);
+
+ TextIO.loadDoubles(baseNamePreference + "-2ndHalf-wd2.out", exactResult);
+ result = pr.derivative[1];
+ assertEquals("Too far from the expected result!", 0, Norm.L_1.compute(result, exactResult), TEST_THRESHOLD);
+ }
+
+ @Test
+ public void testStrongRankWithUniformPreferenceVector() throws Exception {
+ System.out.println("Strong rank with uniform preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-uniform.bin", preference);
+ PageRankPowerSeries pr = new PageRankPowerSeries(g);
+ pr.order = new int[] { 1, 2 };
+ pr.preference = DoubleArrayList.wrap(preference);
+ pr.stronglyPreferential = true;
+ pr.stepUntil(stop);
+
+ TextIO.loadDoubles(baseNamePreference + "-uniform-s.out", exactResult);
+ double result[] = pr.rank;
+ assertEquals("Too far from the expected result!", 0, Norm.L_1.compute(result, exactResult), DEFAULT_THRESHOLD);
+
+ TextIO.loadDoubles(baseNamePreference + "-uniform-sd1.out", exactResult);
+ result = pr.derivative[0];
+ assertEquals("Too far from the expected result!", 0, Norm.L_1.compute(result, exactResult), TEST_THRESHOLD);
+
+ TextIO.loadDoubles(baseNamePreference + "-uniform-sd2.out", exactResult);
+ result = pr.derivative[1];
+ assertEquals("Too far from the expected result!", 0, Norm.L_1.compute(result, exactResult), TEST_THRESHOLD);
+ }
+
+ @Test
+ public void testStrongRankWithAlternatePreferenceVector() throws Exception {
+ System.out.println("Strong rank with alternate preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-alternate.bin", preference);
+ PageRankPowerSeries pr = new PageRankPowerSeries(g);
+ pr.order = new int[] { 1, 2 };
+ pr.preference = DoubleArrayList.wrap(preference);
+ pr.stronglyPreferential = true;
+ pr.stepUntil(stop);
+
+ TextIO.loadDoubles(baseNamePreference + "-alternate-s.out", exactResult);
+ double result[] = pr.rank;
+ assertEquals("Too far from the expected result!", 0, Norm.L_1.compute(result, exactResult), DEFAULT_THRESHOLD);
+
+ TextIO.loadDoubles(baseNamePreference + "-alternate-sd1.out", exactResult);
+ result = pr.derivative[0];
+ assertEquals("Too far from the expected result!", 0, Norm.L_1.compute(result, exactResult), TEST_THRESHOLD);
+
+ TextIO.loadDoubles(baseNamePreference + "-alternate-sd2.out", exactResult);
+ result = pr.derivative[1];
+ assertEquals("Too far from the expected result!", 0, Norm.L_1.compute(result, exactResult), TEST_THRESHOLD);
+ }
+
+ @Test
+ public void testStrongRankWith1stHalfPreferenceVector() throws Exception {
+ System.out.println("Strong rank with 1stHalf preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-1stHalf.bin", preference);
+ PageRankPowerSeries pr = new PageRankPowerSeries(g);
+ pr.order = new int[] { 1, 2 };
+ pr.preference = DoubleArrayList.wrap(preference);
+ pr.stronglyPreferential = true;
+ pr.stepUntil(stop);
+
+ TextIO.loadDoubles(baseNamePreference + "-1stHalf-s.out", exactResult);
+ double result[] = pr.rank;
+ assertEquals("Too far from the expected result!", 0, Norm.L_1.compute(result, exactResult), DEFAULT_THRESHOLD);
+
+ TextIO.loadDoubles(baseNamePreference + "-1stHalf-sd1.out", exactResult);
+ result = pr.derivative[0];
+ assertEquals("Too far from the expected result!", 0, Norm.L_1.compute(result, exactResult), TEST_THRESHOLD);
+
+ TextIO.loadDoubles(baseNamePreference + "-1stHalf-sd2.out", exactResult);
+ result = pr.derivative[1];
+ assertEquals("Too far from the expected result!", 0, Norm.L_1.compute(result, exactResult), TEST_THRESHOLD);
+ }
+
+
+ @Test
+ public void testStrongRankWith2ndHalfPreferenceVector() throws Exception {
+ System.out.println("Strong rank with 2ndHalf preference vector");
+ BinIO.loadDoubles(baseNamePreference + "-2ndHalf.bin", preference);
+ PageRankPowerSeries pr = new PageRankPowerSeries(g);
+ pr.order = new int[] { 1, 2 };
+ pr.preference = DoubleArrayList.wrap(preference);
+ pr.stronglyPreferential = true;
+ pr.stepUntil(stop);
+
+ TextIO.loadDoubles(baseNamePreference + "-2ndHalf-s.out", exactResult);
+ double result[] = pr.rank;
+ assertEquals("Too far from the expected result!", 0, Norm.L_1.compute(result, exactResult), DEFAULT_THRESHOLD);
+
+ TextIO.loadDoubles(baseNamePreference + "-2ndHalf-sd1.out", exactResult);
+ result = pr.derivative[0];
+ assertEquals("Too far from the expected result!", 0, Norm.L_1.compute(result, exactResult), TEST_THRESHOLD);
+
+ TextIO.loadDoubles(baseNamePreference + "-2ndHalf-sd2.out", exactResult);
+ result = pr.derivative[1];
+ assertEquals("Too far from the expected result!", 0, Norm.L_1.compute(result, exactResult), TEST_THRESHOLD);
+ }
+}
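The derivative tests set pr.order = new int[] { 1, 2 } and check pr.derivative[0] and pr.derivative[1]. Judging by the -wd1/-wd2 and -sd1/-sd2 file names, these hold the first and second derivatives of PageRank with respect to alpha. One way to see where such derivatives come from: with preference vector v and a row-stochastic matrix P, r(alpha) = (1 - alpha) * sum_t alpha^t v P^t. Differentiating term by term gives r'(alpha) = sum_t (t alpha^(t-1) (1 - alpha) - alpha^t) v P^t. The hypothetical DerivativeSketch below is not LAW code. It computes only the first derivative, with a uniform preference vector, and truncates both series after a fixed number of terms instead of using a convergence test.

```java
import java.util.Arrays;

// Hypothetical sketch (not LAW code): PageRank and its first derivative with respect to alpha,
// both computed as truncated power series in alpha.
final class DerivativeSketch {

    /** One step x -> x P, where P is the row-stochastic walk matrix of `succ` (no dangling nodes). */
    static double[] step(int[][] succ, double[] x) {
        double[] y = new double[x.length];
        for (int u = 0; u < x.length; u++)
            for (int v : succ[u]) y[v] += x[u] / succ[u].length;
        return y;
    }

    /** Returns {r(alpha), r'(alpha)} for a uniform preference vector, truncated after `terms` terms. */
    static double[][] rankAndDerivative(int[][] succ, double alpha, int terms) {
        int n = succ.length;
        double[] x = new double[n];
        Arrays.fill(x, 1.0 / n);          // x_0 = v
        double[] r = new double[n], d = new double[n];
        double alphaT = 1, alphaPrev = 0; // alpha^t and alpha^(t-1); the latter is multiplied by t = 0 at first
        for (int t = 0; t < terms; t++) {
            double rc = (1 - alpha) * alphaT;                 // coefficient of x_t in r
            double dc = t * alphaPrev * (1 - alpha) - alphaT; // derivative of rc with respect to alpha
            for (int i = 0; i < n; i++) {
                r[i] += rc * x[i];
                d[i] += dc * x[i];
            }
            x = step(succ, x);                                // x_{t+1} = x_t P
            alphaPrev = alphaT;
            alphaT *= alpha;
        }
        return new double[][] { r, d };
    }
}
```

A central finite difference (r(alpha + h) - r(alpha - h)) / (2h) is a convenient sanity check for the derivative. Because every r(alpha) sums to 1, the entries of r'(alpha) should sum to 0.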
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/PowerSeriesTest.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/PowerSeriesTest.java
new file mode 100644
index 0000000..fab73d1
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/PowerSeriesTest.java
@@ -0,0 +1,280 @@
+package it.unimi.dsi.law.rank;
+
+/*
+ * Copyright (C) 2011-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static java.lang.Math.pow;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+
+import java.io.IOException;
+
+import org.junit.Test;
+
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.LazyIntIterator;
+import it.unimi.dsi.webgraph.NodeIterator;
+
+
+
+//RELEASE-STATUS: DIST
+
+public class PowerSeriesTest {
+
+ @Test
+ public void testCycle() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for (int size : new int[] { 100, 1000, 10000 }) {
+ final PowerSeries katz = new PowerSeries(ArrayListMutableGraph.newDirectedCycle(size).immutableView());
+ katz.alpha = 1. / 2;
+ katz.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ for (int i = katz.graph.numNodes(); i-- != 0;)
+ assertEquals(1. / (1 - katz.alpha), katz.rank[i] / katz.scale, threshold);
+ }
+ }
+ }
+
+ @Test
+ public void testClique() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for (int size : new int[] { 10, 100, 1000 }) {
+ final PowerSeries katz = new PowerSeries(ArrayListMutableGraph.newCompleteGraph(size, false).immutableView());
+ katz.alpha = 1. / (2 * size);
+ katz.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 10));
+ for (int i = katz.graph.numNodes(); i-- != 0;)
+ assertEquals(1. / (1 - katz.alpha * (size - 1)), katz.rank[i] / katz.scale, threshold);
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueBibridgeCycle() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for (int p : new int[] { 10, 50, 100 }) {
+ for (int k : new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k, new int[][] {});
+ for (int i = 0; i < k; i++)
+ for (int j = 0; j < k; j++)
+ if (i != j) mg.addArc(i, j);
+ // Note the transposition
+ for (int i = 0; i < p; i++) mg.addArc(k + (i + 1) % p, k + i);
+ mg.addArc(k - 1, k);
+ mg.addArc(k, k - 1);
+ ImmutableGraph g = mg.immutableView();
+
+ PowerSeries katz = new PowerSeries(g);
+ final double alpha = 1. / (k + 1);
+ katz.alpha = alpha;
+
+ katz.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 100));
+ final double[] rank = katz.rank;
+ final double normalization = 1 / katz.scale;
+ for(int i = rank.length; i-- != 0;) katz.rank[i] *= normalization;
+ final double r = rank[k - 1];
+ for (int i = k - 1; i-- != 0;)
+ assertEquals((1 + alpha * r) / (1 - (k - 2) * alpha), rank[i], threshold * normalization);
+ assertEquals(1 / (1 - alpha) + alpha * r / (1 - Math.pow(alpha, p)), rank[k], threshold * normalization);
+ for (int d = 1; d < p; d++)
+ assertEquals(1 / (1 - alpha) + Math.pow(alpha, d + 1) * r / (1 - Math.pow(alpha, p)), rank[k + d], threshold * normalization);
+
+
+ katz.stepUntil(PowerSeries.MAX_RATIO_STOPPING_CRITERION);
+
+ for(NodeIterator nodeIterator = g.nodeIterator(); nodeIterator.hasNext();) {
+ final int curr = nodeIterator.nextInt();
+ double t = 0;
+ LazyIntIterator successors = nodeIterator.successors();
+ for(int s; (s = successors.nextInt()) != -1;) t += katz.previousRank[s];
+ assertTrue(t / katz.previousRank[curr] < 1 / katz.alpha);
+ }
+
+ katz.alpha = .5 / (k + 1);
+ katz.stepUntil(PowerSeries.MAX_RATIO_STOPPING_CRITERION);
+
+ for(NodeIterator nodeIterator = g.nodeIterator(); nodeIterator.hasNext();) {
+ final int curr = nodeIterator.nextInt();
+ double t = 0;
+ LazyIntIterator successors = nodeIterator.successors();
+ for(int s; (s = successors.nextInt()) != -1;) t += katz.previousRank[s];
+ assertTrue(t / katz.previousRank[curr] < 1 / katz.alpha);
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueBackbridgeCycle() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for (int p : new int[] { 10, 50, 100 }) {
+ for (int k : new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k, new int[][] {});
+ for (int i = 0; i < k; i++)
+ for (int j = 0; j < k; j++)
+ if (i != j) mg.addArc(i, j);
+ // Note the transposition
+ for (int i = 0; i < p; i++) mg.addArc(k + (i + 1) % p, k + i);
+ mg.addArc(k - 1, k);
+ ImmutableGraph g = mg.immutableView();
+
+ PowerSeries katz = new PowerSeries(g);
+ final double alpha = 1. / (k + 1);
+ katz.alpha = alpha;
+
+ katz.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 100));
+ final double[] rank = katz.rank;
+ final double normalization = 1 / katz.scale;
+ for(int i = rank.length; i-- != 0;) katz.rank[i] *= normalization;
+ for (int i = k - 1; i-- != 0;)
+ assertEquals(-1 / ((1 - alpha) * (alpha * alpha * (k - 1) + alpha * (k - 2) - 1)), rank[i], threshold * normalization);
+ assertEquals((alpha * alpha * (k - 1) - alpha - 1) / ((1 - alpha) * (alpha * alpha * (k - 1) + alpha * (k - 2) - 1)), rank[k - 1], threshold * normalization);
+ for (int d = 1; d < p; d++)
+ assertEquals(1 / (1 - alpha), rank[k + d], threshold * normalization);
+
+
+ katz.stepUntil(PowerSeries.MAX_RATIO_STOPPING_CRITERION);
+
+ for(NodeIterator nodeIterator = g.nodeIterator(); nodeIterator.hasNext();) {
+ final int curr = nodeIterator.nextInt();
+ double t = 0;
+ LazyIntIterator successors = nodeIterator.successors();
+ for(int s; (s = successors.nextInt()) != -1;) t += katz.previousRank[s];
+ assertTrue(t / katz.previousRank[curr] < 1 / katz.alpha);
+ }
+
+ katz.alpha = .5 / (k + 1);
+ katz.stepUntil(PowerSeries.MAX_RATIO_STOPPING_CRITERION);
+
+ for(NodeIterator nodeIterator = g.nodeIterator(); nodeIterator.hasNext();) {
+ final int curr = nodeIterator.nextInt();
+ double t = 0;
+ LazyIntIterator successors = nodeIterator.successors();
+ for(int s; (s = successors.nextInt()) != -1;) t += katz.previousRank[s];
+ assertTrue(t / katz.previousRank[curr] < 1 / katz.alpha);
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueForwardbridgeCycle() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for (int p : new int[] { 10, 50, 100 }) {
+ for (int k : new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k, new int[][] {});
+ for (int i = 0; i < k; i++)
+ for (int j = 0; j < k; j++)
+ if (i != j) mg.addArc(i, j);
+ // Note the transposition
+ for (int i = 0; i < p; i++) mg.addArc(k + (i + 1) % p, k + i);
+ mg.addArc(k, k - 1);
+ ImmutableGraph g = mg.immutableView();
+
+ PowerSeries katz = new PowerSeries(g);
+ final double alpha = 1. / (k + 1);
+ katz.alpha = alpha;
+
+ katz.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 100));
+ final double[] rank = katz.rank;
+ final double normalization = 1 / katz.scale;
+ for(int i = rank.length; i-- != 0;) katz.rank[i] *= normalization;
+ for (int i = k; i-- != 0;)
+ assertEquals(1 / (1 - (k - 1) * alpha), rank[i], threshold * normalization);
+ for (int d = 0; d < p; d++)
+ assertEquals(1 / (1 - alpha) + pow(alpha, d + 1) / ((1 - (k - 1) * alpha) * (1 - pow(alpha, p))), rank[k + d], threshold * normalization);
+
+ katz.stepUntil(PowerSeries.MAX_RATIO_STOPPING_CRITERION);
+
+ for(NodeIterator nodeIterator = g.nodeIterator(); nodeIterator.hasNext();) {
+ final int curr = nodeIterator.nextInt();
+ double t = 0;
+ LazyIntIterator successors = nodeIterator.successors();
+ for(int s; (s = successors.nextInt()) != -1;) t += katz.previousRank[s];
+ assertTrue(t / katz.previousRank[curr] < 1 / katz.alpha);
+ }
+
+ katz.alpha = .5 / (k + 1);
+ katz.stepUntil(PowerSeries.MAX_RATIO_STOPPING_CRITERION);
+
+ for(NodeIterator nodeIterator = g.nodeIterator(); nodeIterator.hasNext();) {
+ final int curr = nodeIterator.nextInt();
+ double t = 0;
+ LazyIntIterator successors = nodeIterator.successors();
+ for(int s; (s = successors.nextInt()) != -1;) t += katz.previousRank[s];
+ assertTrue(t / katz.previousRank[curr] < 1 / katz.alpha);
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueNobridgeCycle() throws IOException {
+ for (double threshold = 1E-1; threshold > 1E-12; threshold /= 10) {
+ for (int p : new int[] { 10, 50, 100 }) {
+ for (int k : new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k, new int[][] {});
+ for (int i = 0; i < k; i++)
+ for (int j = 0; j < k; j++)
+ if (i != j) mg.addArc(i, j);
+ // Note the transposition
+ for (int i = 0; i < p; i++) mg.addArc(k + (i + 1) % p, k + i);
+ ImmutableGraph g = mg.immutableView();
+
+ PowerSeries katz = new PowerSeries(g);
+ final double alpha = 1. / (k + 1);
+ katz.alpha = alpha;
+
+ katz.stepUntil(new SpectralRanking.NormStoppingCriterion(threshold / 100));
+ final double[] rank = katz.rank;
+ final double normalization = 1 / katz.scale;
+ for(int i = rank.length; i-- != 0;) katz.rank[i] *= normalization;
+ for (int i = k; i-- != 0;)
+ assertEquals(1 / (1 - (k - 1) * alpha), rank[i], threshold * normalization);
+ for (int d = 0; d < p; d++)
+ assertEquals(1 / (1 - alpha), rank[k + d], threshold * normalization);
+
+ katz.stepUntil(PowerSeries.MAX_RATIO_STOPPING_CRITERION);
+
+ for(NodeIterator nodeIterator = g.nodeIterator(); nodeIterator.hasNext();) {
+ final int curr = nodeIterator.nextInt();
+ double t = 0;
+ LazyIntIterator successors = nodeIterator.successors();
+ for(int s; (s = successors.nextInt()) != -1;) t += katz.previousRank[s];
+ assertTrue(t / katz.previousRank[curr] < 1 / katz.alpha);
+ }
+
+ katz.alpha = .5 / (k + 1);
+ katz.stepUntil(PowerSeries.MAX_RATIO_STOPPING_CRITERION);
+
+ for(NodeIterator nodeIterator = g.nodeIterator(); nodeIterator.hasNext();) {
+ final int curr = nodeIterator.nextInt();
+ double t = 0;
+ LazyIntIterator successors = nodeIterator.successors();
+ for(int s; (s = successors.nextInt()) != -1;) t += katz.previousRank[s];
+ assertTrue(t / katz.previousRank[curr] < 1 / katz.alpha);
+ }
+ }
+ }
+ }
+ }
+
+
+}
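The closed forms asserted in testCycle and testClique follow from expanding the power series term by term. On a directed cycle every node has exactly one predecessor, so the t-th term contributes alpha^t and the score is the geometric sum 1 / (1 - alpha). On a clique of n nodes every node has n - 1 predecessors, so the t-th term is (alpha (n - 1))^t and the score is 1 / (1 - alpha (n - 1)), provided alpha (n - 1) < 1. The following standalone sketch illustrates this under stated assumptions: the class `KatzSketch` and its helpers are hypothetical, it is not LAW's `PowerSeries` (which also handles scaling and stopping criteria), and it simply truncates the series after a fixed number of steps. Because cycles and cliques are regular, the arc-direction convention does not affect these two values.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not LAW's PowerSeries): Katz-style centrality as a
 * truncated power series r = sum_{t=0..steps} alpha^t (A^T)^t 1, where each
 * node collects the previous term from its predecessors.
 */
class KatzSketch {
    static double[] katz(int[][] succ, double alpha, int steps) {
        final int n = succ.length;
        double[] term = new double[n];
        Arrays.fill(term, 1.0); // t = 0: the all-ones vector
        final double[] rank = term.clone();
        for (int t = 1; t <= steps; t++) {
            final double[] next = new double[n];
            for (int x = 0; x < n; x++)
                for (int y : succ[x]) next[y] += alpha * term[x]; // arc x -> y carries x's term to y
            for (int x = 0; x < n; x++) rank[x] += next[x];
            term = next;
        }
        return rank;
    }

    /** Successor lists of the directed cycle 0 -> 1 -> ... -> n - 1 -> 0. */
    static int[][] directedCycle(int n) {
        final int[][] succ = new int[n][];
        for (int x = 0; x < n; x++) succ[x] = new int[] { (x + 1) % n };
        return succ;
    }

    /** Successor lists of the complete graph on n nodes, without loops. */
    static int[][] clique(int n) {
        final int[][] succ = new int[n][n - 1];
        for (int x = 0; x < n; x++)
            for (int y = 0, k = 0; y < n; y++)
                if (y != x) succ[x][k++] = y;
        return succ;
    }

    public static void main(String[] args) {
        final double[] c = katz(directedCycle(100), 0.5, 60);
        System.out.println(c[0] + " vs " + 1 / (1 - 0.5)); // close to 2
        final int n = 10;
        final double alpha = 1. / (2 * n); // same choice as testClique
        final double[] q = katz(clique(n), alpha, 60);
        System.out.println(q[0] + " vs " + 1 / (1 - alpha * (n - 1)));
    }
}
```

Sixty steps suffice for these parameters because the neglected tail is at most (alpha (n - 1))^61 / (1 - alpha (n - 1)), which is far below the 1E-12 threshold used by the tests.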
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/SalsaTest.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/SalsaTest.java
new file mode 100644
index 0000000..44fd800
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/SalsaTest.java
@@ -0,0 +1,224 @@
+package it.unimi.dsi.law.rank;
+
+/*
+ * Copyright (C) 2011-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertArrayEquals;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.junit.Test;
+import org.slf4j.helpers.NOPLogger;
+
+import it.unimi.dsi.law.util.Norm;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.Transform;
+import it.unimi.dsi.webgraph.examples.ErdosRenyiGraph;
+
+//RELEASE-STATUS: DIST
+
+public class SalsaTest {
+
+ @Test
+ public void testSalsaConnectedM() {
+ final ImmutableGraph graph = new ArrayListMutableGraph(6, new int[][] { { 0, 1 }, { 0, 2 }, { 0, 4 }, { 3, 1 }, { 3, 5 }, { 4, 0 }, { 4, 3 }, { 5, 3 }, { 5, 4 } }).immutableView();
+ double[] expected = { 1, 2, 1, 2, 2, 1 };
+ Norm.L_1.normalize(expected, 1);
+ assertArrayEquals(expected, Salsa.rank(graph, null), 1E-50);
+ }
+
+ @Test
+ public void testSalsaWithIsolated() {
+ final ImmutableGraph graph = new ArrayListMutableGraph(8, new int[][] { { 0, 1 }, { 0, 2 }, { 0, 4 }, { 3, 1 }, { 3, 5 }, { 4, 0 }, { 4, 3 }, { 5, 3 }, { 5, 4 } }).immutableView();
+ double[] expected = { 1 * (6./8) / 9, 2 * (6./8) / 9, 1 * (6./8) / 9, 2 * (6./8) / 9, 2 * (6./8) / 9, 1 * (6./8) / 9, 0, 0 };
+ assertArrayEquals(expected, Salsa.rank(graph, null), 1E-50);
+ }
+
+ @Test
+ public void testSalsaNonconnectedM() {
+ final ImmutableGraph graph = new ArrayListMutableGraph(6, new int[][] { { 0, 1 }, { 0, 2 }, { 0, 4 }, { 3, 1 }, { 3, 5 }, { 4, 0 }, { 4, 3 }, { 5, 3 } }).immutableView();
+
+ double[] indegree = { 1. / 3, 2. / 5, 1. / 5, 2. / 3, 1. / 5, 1. / 5 };
+ double[] ccSize = { 2, 4, 4, 2, 4, 4 };
+ double[] expected = new double[indegree.length];
+ for (int i = 0; i < indegree.length; i++) expected[i] = indegree[i] * ccSize[i] / 6.0;
+ assertArrayEquals(expected, Salsa.rank(graph, null), 1E-50);
+ }
+
+ @Test
+ public void testCycle() {
+ for(int size: new int[] { 100, 1000, 10000 }) {
+ final ImmutableGraph bidirectionalCycle = ArrayListMutableGraph.newBidirectionalCycle(size).immutableView();
+ final double[] expected = new double[size];
+ Arrays.fill(expected, 1);
+ Norm.L_1.normalize(expected, 1);
+ assertArrayEquals(expected, Salsa.rank(bidirectionalCycle, null), 1E-50);
+ }
+ }
+
+ @Test
+ public void testClique() {
+ for(int size: new int[] { 10, 100, 1000 }) {
+ final ImmutableGraph clique = ArrayListMutableGraph.newCompleteGraph(size, false).immutableView();
+ final double[] expected = new double[size];
+ Arrays.fill(expected, 1);
+ Norm.L_1.normalize(expected, 1);
+ assertArrayEquals(expected, Salsa.rank(clique, null), 1E-50);
+ }
+ }
+
+ @Test
+ public void testRandomSymmetric() throws IOException {
+ for(int size: new int[] { 10, 100 }) {
+ // TODO refactor when symmetrize will return a copyable graph
+ final ImmutableGraph graph = new ArrayListMutableGraph(Transform.symmetrize(new ErdosRenyiGraph(size, .3, 0, false))).immutableView();
+ final LeftSingularVectorParallelPowerMethod lsv = new LeftSingularVectorParallelPowerMethod(graph, graph, NOPLogger.NOP_LOGGER);
+ lsv.norm = Norm.L_1;
+ lsv.salsa = true;
+ lsv.stepUntil(new SpectralRanking.NormStoppingCriterion(1E-15));
+
+ assertArrayEquals(lsv.rank, Salsa.rank(graph, null), 1E-2);
+ }
+ }
+
+ @Test
+ public void testRandom() throws IOException {
+ for(double p: new double[] { 0.1, 0.3, 0.9 }) {
+ for(int size: new int[] { 10, 100 }) {
+ // TODO refactor when symmetrize will return a copyable graph
+ final ImmutableGraph graph = new ArrayListMutableGraph(new ErdosRenyiGraph(size, p, 0, false)).immutableView();
+ final ImmutableGraph transpose = Transform.transpose(graph);
+ final LeftSingularVectorParallelPowerMethod lsv = new LeftSingularVectorParallelPowerMethod(graph, transpose, NOPLogger.NOP_LOGGER);
+ lsv.norm = Norm.L_1;
+ lsv.salsa = true;
+ lsv.stepUntil(new SpectralRanking.NormStoppingCriterion(1E-15));
+
+ final double[] rank = Salsa.rank(graph, null);
+ Norm.L_1.normalize(rank, 1);
+ assertArrayEquals(lsv.rank, rank, 1E-2);
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueNobridgeCycle() {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++) {
+ if (i != j) mg.addArc(i, j);
+ }
+ for(int i = 0; i < p; i++) mg.addArc(k + i, k + (i + 1) % p);
+ ImmutableGraph g = mg.immutableView();
+
+ double[] rank = Salsa.rank(g, null);
+
+ double[] expected = new double[rank.length];
+ Arrays.fill(expected, 1);
+ Norm.L_1.normalize(expected, 1);
+ assertArrayEquals(expected, rank, 1E-10);
+ }
+ }
+ }
+
+
+ @Test
+ public void testCliqueForwardbridgeCycle() {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++) {
+ if (i != j) mg.addArc(i, j);
+ }
+ for(int i = 0; i < p; i++) mg.addArc(k + i, k + (i + 1) % p);
+ mg.addArc(k - 1, k);
+ ImmutableGraph g = mg.immutableView();
+
+ double[] rank = Salsa.rank(g, null);
+ double[] expected = new double[rank.length];
+ for(int i = k; i-- != 0;) expected[i] = (k + 1) * (k - 1) / (2. + k * (k - 1));
+ expected[k] = (k + 1) * 2 / (2. + k * (k - 1));
+ for(int d = 1; d < p; d++) expected[k + d] = 1;
+
+ Norm.L_1.normalize(expected, 1);
+ assertArrayEquals(expected, rank, 1E-10);
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueBackbridgeCycle() {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++) {
+ if (i != j) mg.addArc(i, j);
+ }
+ for(int i = 0; i < p; i++) mg.addArc(k + i, k + (i + 1) % p);
+ mg.addArc(k, k - 1);
+ ImmutableGraph g = mg.immutableView();
+
+ double[] rank = Salsa.rank(g, null);
+
+ double[] expected = new double[rank.length];
+ for(int i = k - 1; i-- != 0;) expected[i] = (k + 1) * (k - 1) / (2. + k * (k - 1));
+ expected[k - 1] = (k + 1) * k / (2. + k * (k - 1));
+ expected[k] = 1;
+ expected[k + 1] = (k + 1) / (2. + k * (k - 1));
+ for(int d = 2; d < p; d++) expected[k + d] = 1;
+
+ Norm.L_1.normalize(expected, 1);
+ assertArrayEquals(expected, rank, 1E-10);
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueBibridgeCycle() {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++) {
+ if (i != j) mg.addArc(i, j);
+ }
+ for(int i = 0; i < p; i++) mg.addArc(k + i, k + (i + 1) % p);
+ mg.addArc(k, k - 1);
+ mg.addArc(k - 1, k);
+ ImmutableGraph g = mg.immutableView();
+
+ double[] rank = Salsa.rank(g, null);
+
+ double[] expected = new double[rank.length];
+ for(int i = k - 1; i-- != 0;) expected[i] = (k + 2) * (k - 1) / (4. + k * (k - 1));
+ expected[k - 1] = (k + 2) * k / (4. + k * (k - 1));
+ expected[k] = (k + 2) * 2 / (4. + k * (k - 1));
+ expected[k + 1] = (k + 2) / (4. + k * (k - 1));
+ for(int d = 2; d < p; d++) expected[k + d] = 1;
+
+ Norm.L_1.normalize(expected, 1);
+ assertArrayEquals(expected, rank, 1E-10);
+ }
+ }
+ }
+}
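The expected vectors in SalsaTest appear to follow the Lempel and Moran characterization of SALSA authority scores. A node x with positive indegree scores indeg(x) / D(C) * |C| / n, where C is the component of x under co-citation (two nodes are related when some node points to both), D(C) is the total indegree of C, |C| counts its nodes, and n is the number of nodes in the graph; nodes without predecessors score 0. For instance, in testSalsaNonconnectedM the components are {0, 3} (total indegree 3) and {1, 2, 4, 5} (total indegree 5). The sketch below computes this formula with union-find. It is an illustration under that assumption: `SalsaSketch` and its methods are hypothetical and are not LAW's `Salsa.rank`.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not LAW's Salsa): SALSA authority scores via the
 * closed form indeg(x) / D(C) * |C| / n over co-citation components C.
 */
class SalsaSketch {
    /** Finds the representative of x, halving paths along the way. */
    static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    static void union(int[] parent, int a, int b) {
        parent[find(parent, a)] = find(parent, b);
    }

    /** Builds successor lists from an arc list of {source, target} pairs. */
    static int[][] fromArcs(int n, int[][] arcs) {
        final int[] outdegree = new int[n];
        for (int[] a : arcs) outdegree[a[0]]++;
        final int[][] succ = new int[n][];
        for (int x = 0; x < n; x++) succ[x] = new int[outdegree[x]];
        final int[] fill = new int[n];
        for (int[] a : arcs) succ[a[0]][fill[a[0]]++] = a[1];
        return succ;
    }

    static double[] rank(int[][] succ) {
        final int n = succ.length;
        final int[] parent = new int[n];
        for (int x = 0; x < n; x++) parent[x] = x;
        final int[] indegree = new int[n];
        for (int[] s : succ)
            for (int v : s) {
                indegree[v]++;
                // All successors of the same node are co-cited: merge them.
                union(parent, s[0], v);
            }
        final long[] componentIndegree = new long[n];
        final int[] componentSize = new int[n];
        for (int x = 0; x < n; x++)
            if (indegree[x] > 0) {
                final int root = find(parent, x);
                componentIndegree[root] += indegree[x];
                componentSize[root]++;
            }
        final double[] rank = new double[n]; // nodes without predecessors stay at 0
        for (int x = 0; x < n; x++)
            if (indegree[x] > 0) {
                final int root = find(parent, x);
                rank[x] = (double) indegree[x] / componentIndegree[root] * componentSize[root] / n;
            }
        return rank;
    }

    public static void main(String[] args) {
        // Same graph as testSalsaNonconnectedM.
        final int[][] g = fromArcs(6, new int[][] { { 0, 1 }, { 0, 2 }, { 0, 4 }, { 3, 1 }, { 3, 5 }, { 4, 0 }, { 4, 3 }, { 5, 3 } });
        System.out.println(Arrays.toString(rank(g)));
    }
}
```

Dividing by n rather than by the number of nodes with positive indegree is what makes the scores in testSalsaWithIsolated sum to 6/8 instead of 1.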
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/SalsinaTest.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/SalsinaTest.java
new file mode 100644
index 0000000..6eeb29e
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/SalsinaTest.java
@@ -0,0 +1,51 @@
+package it.unimi.dsi.law.rank;
+
+/*
+ * SalsinaTest.java
+ *
+ * Copyright (C) 2011-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertArrayEquals;
+
+import java.util.Arrays;
+
+import org.junit.Test;
+
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+
+//RELEASE-STATUS: DIST
+
+public class SalsinaTest {
+
+ @Test
+ public void testSalsinaConnected() {
+ final ImmutableGraph graph = new ArrayListMutableGraph(8, new int[][] { { 0, 1 }, { 2, 1 }, { 2, 3 }, { 4, 5 }, { 5, 6 } }).immutableView();
+ double[] expected = { 0, 2 * 4, 0, 1 * 4, 0, 1 * 3, 1 * 3, 0 };
+ double[] rank = Salsina.rank(graph, false, null);
+ System.err.println(Arrays.toString(rank));
+ assertArrayEquals(expected, rank, 1E-50);
+
+ // Markovian
+ expected = new double[] { 0, (3./2) * 4, 0, (1./2) * 4, 0, 1 * 3, 1 * 3, 0 };
+ rank = Salsina.rank(graph, true, null);
+ System.err.println(Arrays.toString(rank));
+ assertArrayEquals(expected, rank, 1E-50);
+ }
+
+}
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/WindegreeTest.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/WindegreeTest.java
new file mode 100644
index 0000000..1c40d32
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/rank/WindegreeTest.java
@@ -0,0 +1,50 @@
+package it.unimi.dsi.law.rank;
+
+/*
+ * Copyright (C) 2011-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertArrayEquals;
+
+import java.util.Arrays;
+
+import org.junit.Test;
+
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+
+//RELEASE-STATUS: DIST
+
+public class WindegreeTest {
+
+ @Test
+ public void test() {
+ final ImmutableGraph graph = new ArrayListMutableGraph(8, new int[][] { { 0, 1 }, { 2, 1 }, { 2, 3 }, { 4, 5 }, { 5, 6 } }).immutableView();
+ float[] coreachable = { 0, 2, 0, 1, 0, 1, 2, 0 };
+ double[] expected = { 0, 2 * 2, 0, 1 * 1, 0, 1 * 1, 1 * 2, 0 };
+ double[] rank = Windegree.rank(graph, coreachable, false, null);
+ System.err.println(Arrays.toString(rank));
+ assertArrayEquals(expected, rank, 1E-50);
+
+ // Markovian
+ expected = new double[] { 0, (1 + 1./2) * 2, 0, (1./2) * 1, 0, 1 * 1, 1 * 2, 0 };
+ rank = Windegree.rank(graph, coreachable, true, null);
+ System.err.println(Arrays.toString(rank));
+ assertArrayEquals(expected, rank, 1E-50);
+ }
+
+}
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/stat/AveragePrecisionCorrelationTest.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/stat/AveragePrecisionCorrelationTest.java
new file mode 100644
index 0000000..cb4bbc6
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/stat/AveragePrecisionCorrelationTest.java
@@ -0,0 +1,162 @@
+package it.unimi.dsi.law.stat;
+
+/*
+ * Copyright (C) 2011-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+
+import java.io.File;
+import java.io.IOException;
+
+import org.junit.Test;
+
+import it.unimi.dsi.Util;
+import it.unimi.dsi.fastutil.doubles.DoubleArrays;
+import it.unimi.dsi.fastutil.ints.IntArrays;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.io.TextIO;
+import it.unimi.dsi.util.XoRoShiRo128PlusRandom;
+
+//RELEASE-STATUS: DIST
+
+public class AveragePrecisionCorrelationTest {
+
+ private final static double[] ordered = { 0.0, 1.0, 2.0, 3.0, 4.0 };
+ private final static double[] reverse = { 4.0, 3.0, 2.0, 1.0, 0.0 };
+ private final static double[] reverseButOne = { 10.0, 9.0, 7.0, 8.0, 6.0 };
+
+ public double compute(double[] v0, double[] v1) {
+ final int length = v0.length;
+ final int[] perm = Util.identity(length);
+ DoubleArrays.radixSortIndirect(perm, v1, true);
+ IntArrays.reverse(perm);
+
+ double p = 0;
+ for(int i = 0; i < length; i++)
+ for(int j = i + 1; j < length; j++)
+ if (v0[perm[i]] > v0[perm[j]]) p += 1.0 / j;
+
+ p /= length - 1;
+ return 2 * p - 1;
+ }
+
+ @Test
+ public void testComputeOrdered() {
+ double expResult = this.compute(ordered, ordered);
+ double result = AveragePrecisionCorrelation.INSTANCE.compute(ordered, ordered);
+ assertEquals(expResult, result, 0.0);
+ }
+
+ @Test
+ public void testComputeWithReverse() {
+ assertEquals(-1, AveragePrecisionCorrelation.INSTANCE.compute(ordered, reverse), 1E-14);
+ assertEquals(-1, AveragePrecisionCorrelation.INSTANCE.compute(reverse, ordered), 1E-14);
+ }
+
+ @Test
+ public void testComputeWithReverseButOne() {
+ double expResult = this.compute(ordered, reverseButOne);
+ double result = AveragePrecisionCorrelation.INSTANCE.compute(ordered, reverseButOne);
+ assertEquals(expResult, result, 1E-14);
+ }
+
+ @Test
+ public void testRandom() {
+ final XoRoShiRo128PlusRandom random = new XoRoShiRo128PlusRandom(1);
+ final double[] d = new double[100];
+ for(int i = 100; i-- != 0;) d[i] = i;
+ final double[] e = d.clone();
+ DoubleArrays.shuffle(d, random);
+ DoubleArrays.shuffle(e, random);
+
+ double expResult = this.compute(d, e);
+ double result = AveragePrecisionCorrelation.INSTANCE.compute(d, e);
+ assertEquals(expResult, result, 1E-14);
+ }
+
+ @Test
+ public void test() {
+ double p[] = new double[1000];
+ for(int i = p.length; i-- != 0;) p[i] = Math.random();
+ double q[] = new double[p.length];
+ for(int i = q.length; i-- != 0;) q[i] = Math.random();
+ double expResult = this.compute(p, q);
+ double result = AveragePrecisionCorrelation.INSTANCE.compute(p, q);
+ assertEquals(expResult, result, 0.00000001);
+ }
+
+ @Test
+ public void testInputType() throws IOException {
+ File a = File.createTempFile(AveragePrecisionCorrelationTest.class.getSimpleName(), "a");
+ a.deleteOnExit();
+ File b = File.createTempFile(AveragePrecisionCorrelationTest.class.getSimpleName(), "b");
+ b.deleteOnExit();
+ BinIO.storeInts(new int[] { 0, 1, 2, 3, 4 }, a);
+ BinIO.storeInts(new int[] { 4, 3, 2, 1, 0 }, b);
+ assertEquals(-1, AveragePrecisionCorrelation.INSTANCE.computeInts(a.toString(), b.toString(), false), 1E-14);
+ assertEquals(-1, AveragePrecisionCorrelation.INSTANCE.compute(a.toString(), Integer.class, b.toString(), Integer.class, false), 1E-14);
+ BinIO.storeInts(new int[] { 0, 1, 2, 3, 4 }, b);
+ assertEquals(1, AveragePrecisionCorrelation.INSTANCE.computeInts(a.toString(), b.toString(), false), 1E-14);
+ assertEquals(1, AveragePrecisionCorrelation.INSTANCE.compute(a.toString(), Integer.class, b.toString(), Integer.class, false), 1E-14);
+
+ BinIO.storeLongs(new long[] { 0, 1, 2, 3, 4 }, a);
+ BinIO.storeLongs(new long[] { 4, 3, 2, 1, 0 }, b);
+ assertEquals(-1, AveragePrecisionCorrelation.INSTANCE.computeLongs(a.toString(), b.toString(), false), 1E-14);
+ assertEquals(-1, AveragePrecisionCorrelation.INSTANCE.compute(a.toString(), Long.class, b.toString(), Long.class, false), 1E-14);
+ BinIO.storeLongs(new long[] { 0, 1, 2, 3, 4 }, b);
+ assertEquals(1, AveragePrecisionCorrelation.INSTANCE.computeLongs(a.toString(), b.toString(), false), 1E-14);
+ assertEquals(1, AveragePrecisionCorrelation.INSTANCE.compute(a.toString(), Long.class, b.toString(), Long.class, false), 1E-14);
+
+ BinIO.storeFloats(new float[] { 0, 1, 2, 3, 4 }, a);
+ BinIO.storeFloats(new float[] { 4, 3, 2, 1, 0 }, b);
+ assertEquals(-1, AveragePrecisionCorrelation.INSTANCE.computeFloats(a.toString(), b.toString(), false), 1E-14);
+ assertEquals(-1, AveragePrecisionCorrelation.INSTANCE.compute(a.toString(), Float.class, b.toString(), Float.class, false), 1E-14);
+ BinIO.storeFloats(new float[] { 0, 1, 2, 3, 4 }, b);
+ assertEquals(1, AveragePrecisionCorrelation.INSTANCE.computeFloats(a.toString(), b.toString(), false), 1E-14);
+ assertEquals(1, AveragePrecisionCorrelation.INSTANCE.compute(a.toString(), Float.class, b.toString(), Float.class, false), 1E-14);
+
+ BinIO.storeDoubles(new double[] { 0, 1, 2, 3, 4 }, a);
+ BinIO.storeDoubles(new double[] { 4, 3, 2, 1, 0 }, b);
+ assertEquals(-1, AveragePrecisionCorrelation.INSTANCE.computeDoubles(a.toString(), b.toString(), false), 1E-14);
+ assertEquals(-1, AveragePrecisionCorrelation.INSTANCE.compute(a.toString(), Double.class, b.toString(), Double.class, false), 1E-14);
+ BinIO.storeDoubles(new double[] { 0, 1, 2, 3, 4 }, b);
+ assertEquals(1, AveragePrecisionCorrelation.INSTANCE.computeDoubles(a.toString(), b.toString(), false), 1E-14);
+ assertEquals(1, AveragePrecisionCorrelation.INSTANCE.compute(a.toString(), Double.class, b.toString(), Double.class, false), 1E-14);
+
+ BinIO.storeInts(new int[] { 0, 1, 2, 3, 4 }, a);
+ BinIO.storeDoubles(new double[] { 4, 3, 2, 1, 0 }, b);
+ assertEquals(-1, AveragePrecisionCorrelation.INSTANCE.compute(a.toString(), Integer.class, b.toString(), Double.class, false), 1E-14);
+ BinIO.storeDoubles(new double[] { 0, 1, 2, 3, 4 }, b);
+ assertEquals(1, AveragePrecisionCorrelation.INSTANCE.compute(a.toString(), Integer.class, b.toString(), Double.class, false), 1E-14);
+
+ BinIO.storeDoubles(new double[] { 0, 1, 2, 3, 4 }, a);
+ BinIO.storeLongs(new long[] { 4, 3, 2, 1, 0 }, b);
+ assertEquals(-1, AveragePrecisionCorrelation.INSTANCE.compute(a.toString(), Double.class, b.toString(), Long.class, false), 1E-14);
+ BinIO.storeLongs(new long[] { 0, 1, 2, 3, 4 }, b);
+ assertEquals(1, AveragePrecisionCorrelation.INSTANCE.compute(a.toString(), Double.class, b.toString(), Long.class, false), 1E-14);
+
+ TextIO.storeDoubles(new double[] { 0, 1, 2, 3, 4 }, a);
+ BinIO.storeLongs(new long[] { 4, 3, 2, 1, 0 }, b);
+ assertEquals(-1, AveragePrecisionCorrelation.INSTANCE.compute(a.toString(), String.class, b.toString(), Long.class, false), 1E-14);
+ BinIO.storeLongs(new long[] { 0, 1, 2, 3, 4 }, b);
+ assertEquals(1, AveragePrecisionCorrelation.INSTANCE.compute(a.toString(), String.class, b.toString(), Long.class, false), 1E-14);
+
+ a.delete();
+ b.delete();
+ }
+}
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/stat/KendallTauTest.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/stat/KendallTauTest.java
new file mode 100644
index 0000000..a4ffb82
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/stat/KendallTauTest.java
@@ -0,0 +1,203 @@
+package it.unimi.dsi.law.stat;
+
+/*
+ * Copyright (C) 2006-2019 Paolo Boldi, Roberto Posenato, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+
+import java.io.File;
+import java.io.IOException;
+
+import org.junit.Test;
+
+import it.unimi.dsi.fastutil.doubles.DoubleArrays;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.io.TextIO;
+import it.unimi.dsi.util.XoRoShiRo128PlusRandom;
+
+//RELEASE-STATUS: DIST
+
+public class KendallTauTest {
+ private final static double[] ordered = { 0.0, 1.0, 2.0, 3.0, 4.0 };
+ private final static double[] reverse = { 4.0, 3.0, 2.0, 1.0, 0.0 };
+ private final static double[] reverseButOne = { 10.0, 9.0, 7.0, 8.0, 6.0 };
+ private final static double[] allOnes = { 1.0, 1.0, 1.0 };
+ private final static double[] allZeroes = { 0.0, 0.0, 0.0 };
+
+ /** Computes Kendall's τ by brute-force enumeration of all pairs (<var>i</var>, <var>j</var>), <var>i</var>&lt;<var>j</var>.
+ *
+ * @param v0 the first score vector.
+ * @param v1 the second score vector.
+ * @return Kendall's τ.
+ */
+ public static double compute(final double[] v0, final double[] v1) {
+ long sum = 0, u = 0, v = 0, tot = (v0.length * (v0.length - 1L)) / 2;
+ for(int i = 0; i < v0.length; i++) {
+ for(int j = i + 1; j < v0.length; j++) {
+ if (v0[i] == v0[j]) u++;
+ if (v1[i] == v1[j]) v++;
+ if (v0[i] < v0[j] && v1[i] < v1[j] || v0[i] > v0[j] && v1[i] > v1[j]) sum++;
+ else if (v0[i] < v0[j] && v1[i] > v1[j] || v0[i] > v0[j] && v1[i] < v1[j]) sum--;
+ }
+ }
+ return Math.min(1, Math.max(-1, sum / (Math.sqrt((double) (tot - u) * (double) (tot - v)))));
+ }
+
+ @Test
+ public void testComputeOrdered() {
+ double expResult = compute(ordered, ordered); // (10.0 - 0.0) / 10.0;
+ double result = KendallTau.INSTANCE.compute(ordered, ordered);
+ assertEquals(expResult, result, 0.0);
+ }
+
+ @Test
+ public void testComputeWithReverse() {
+ double expResult = compute(ordered, reverse);//(0 - 10.0) / 10.0;
+ double result = KendallTau.INSTANCE.compute(ordered, reverse);
+ assertEquals(expResult, result, 0.0);
+ }
+
+ @Test
+ public void testComputeWithReverseButOne() {
+ double expResult = compute(ordered, reverseButOne);//(1.0 - 9.0) / 10.0;
+ double result = KendallTau.INSTANCE.compute(ordered, reverseButOne);
+ assertEquals(expResult, result, 0.0);
+ }
+
+ @Test
+ public void testRandom() {
+ XoRoShiRo128PlusRandom random = new XoRoShiRo128PlusRandom(0);
+ final double[] v0 = new double[1000];
+ final double[] v1 = new double[1000];
+ for(int i = v0.length; i-- != 0;) {
+ v0[i] = random.nextDouble();
+ v1[i] = random.nextDouble();
+ }
+ double expResult = compute(v0, v1);
+ double result = KendallTau.INSTANCE.compute(v0, v1);
+ assertEquals(expResult, result, 1E-9);
+ DoubleArrays.reverse(v0);
+ DoubleArrays.reverse(v1);
+ assertEquals(expResult, KendallTau.INSTANCE.compute(v0, v1), 1E-9);
+ }
+
+ @Test
+ public void testRandomWithTies() {
+ XoRoShiRo128PlusRandom random = new XoRoShiRo128PlusRandom(0);
+ final double[] v0 = new double[1000];
+ final double[] v1 = new double[1000];
+ for(int i = v0.length; i-- != 0;) {
+ v0[i] = random.nextInt(10);
+ v1[i] = random.nextInt(10);
+ }
+ double expResult = compute(v0, v1);
+ double result = KendallTau.INSTANCE.compute(v0, v1);
+ assertEquals(expResult, result, 1E-9);
+ DoubleArrays.reverse(v0);
+ DoubleArrays.reverse(v1);
+ assertEquals(expResult, KendallTau.INSTANCE.compute(v0, v1), 1E-9);
+ }
+
+
+ @Test
+ public void testAllTies() {
+ assertEquals(1.0, KendallTau.INSTANCE.compute(allOnes, allZeroes), 0.0);
+ }
+
+ @Test
+ public void testOneAllTies() {
+ assertTrue(Double.isNaN(KendallTau.INSTANCE.compute(allOnes, new double[] { 0.0, 1.0, 2.0 })));
+ }
+
+ @Test
+ public void testInputType() throws IOException {
+ File a = File.createTempFile(KendallTauTest.class.getSimpleName(), "a");
+ a.deleteOnExit();
+ File b = File.createTempFile(KendallTauTest.class.getSimpleName(), "b");
+ b.deleteOnExit();
+ BinIO.storeInts(new int[] { 0, 1, 2, 3, 4 }, a);
+ BinIO.storeInts(new int[] { 4, 3, 2, 1, 0 }, b);
+ assertEquals(-1, KendallTau.INSTANCE.computeInts(a.toString(), b.toString()), 0);
+ assertEquals(-1, KendallTau.INSTANCE.computeInts(a.toString(), b.toString(), true), 0);
+ assertEquals(-1, KendallTau.INSTANCE.compute(a.toString(), Integer.class, b.toString(), Integer.class, Integer.MAX_VALUE), 0);
+ assertEquals(-1, KendallTau.INSTANCE.compute(a.toString(), Integer.class, b.toString(), Integer.class, true, Integer.MAX_VALUE), 0);
+ BinIO.storeInts(new int[] { 0, 1, 2, 3, 4 }, b);
+ assertEquals(1, KendallTau.INSTANCE.computeInts(a.toString(), b.toString()), 0);
+ assertEquals(1, KendallTau.INSTANCE.computeInts(a.toString(), b.toString(), true), 0);
+ assertEquals(1, KendallTau.INSTANCE.compute(a.toString(), Integer.class, b.toString(), Integer.class, Integer.MAX_VALUE), 0);
+ assertEquals(1, KendallTau.INSTANCE.compute(a.toString(), Integer.class, b.toString(), Integer.class, true, Integer.MAX_VALUE), 0);
+
+ BinIO.storeLongs(new long[] { 0, 1, 2, 3, 4 }, a);
+ BinIO.storeLongs(new long[] { 4, 3, 2, 1, 0 }, b);
+ assertEquals(-1, KendallTau.INSTANCE.computeLongs(a.toString(), b.toString()), 0);
+ assertEquals(-1, KendallTau.INSTANCE.computeLongs(a.toString(), b.toString(), true), 0);
+ assertEquals(-1, KendallTau.INSTANCE.compute(a.toString(), Long.class, b.toString(), Long.class, Integer.MAX_VALUE), 0);
+ assertEquals(-1, KendallTau.INSTANCE.compute(a.toString(), Long.class, b.toString(), Long.class, true, Integer.MAX_VALUE), 0);
+ BinIO.storeLongs(new long[] { 0, 1, 2, 3, 4 }, b);
+ assertEquals(1, KendallTau.INSTANCE.computeLongs(a.toString(), b.toString()), 0);
+ assertEquals(1, KendallTau.INSTANCE.computeLongs(a.toString(), b.toString(), true), 0);
+ assertEquals(1, KendallTau.INSTANCE.compute(a.toString(), Long.class, b.toString(), Long.class, Integer.MAX_VALUE), 0);
+ assertEquals(1, KendallTau.INSTANCE.compute(a.toString(), Long.class, b.toString(), Long.class, true, Integer.MAX_VALUE), 0);
+
+ BinIO.storeFloats(new float[] { 0, 1, 2, 3, 4 }, a);
+ BinIO.storeFloats(new float[] { 4, 3, 2, 1, 0 }, b);
+ assertEquals(-1, KendallTau.INSTANCE.computeFloats(a.toString(), b.toString()), 0);
+ assertEquals(-1, KendallTau.INSTANCE.computeFloats(a.toString(), b.toString(), true), 0);
+ assertEquals(-1, KendallTau.INSTANCE.compute(a.toString(), Float.class, b.toString(), Float.class, Integer.MAX_VALUE), 0);
+ assertEquals(-1, KendallTau.INSTANCE.compute(a.toString(), Float.class, b.toString(), Float.class, true, Integer.MAX_VALUE), 0);
+ BinIO.storeFloats(new float[] { 0, 1, 2, 3, 4 }, b);
+ assertEquals(1, KendallTau.INSTANCE.computeFloats(a.toString(), b.toString()), 0);
+ assertEquals(1, KendallTau.INSTANCE.computeFloats(a.toString(), b.toString(), true), 0);
+ assertEquals(1, KendallTau.INSTANCE.compute(a.toString(), Float.class, b.toString(), Float.class, Integer.MAX_VALUE), 0);
+ assertEquals(1, KendallTau.INSTANCE.compute(a.toString(), Float.class, b.toString(), Float.class, true, Integer.MAX_VALUE), 0);
+
+ BinIO.storeDoubles(new double[] { 0, 1, 2, 3, 4 }, a);
+ BinIO.storeDoubles(new double[] { 4, 3, 2, 1, 0 }, b);
+ assertEquals(-1, KendallTau.INSTANCE.computeDoubles(a.toString(), b.toString()), 0);
+ assertEquals(-1, KendallTau.INSTANCE.computeDoubles(a.toString(), b.toString(), true), 0);
+ assertEquals(-1, KendallTau.INSTANCE.compute(a.toString(), Double.class, b.toString(), Double.class, Integer.MAX_VALUE), 0);
+ assertEquals(-1, KendallTau.INSTANCE.compute(a.toString(), Double.class, b.toString(), Double.class, true, Integer.MAX_VALUE), 0);
+ BinIO.storeDoubles(new double[] { 0, 1, 2, 3, 4 }, b);
+ assertEquals(1, KendallTau.INSTANCE.computeDoubles(a.toString(), b.toString(), true), 0);
+ assertEquals(1, KendallTau.INSTANCE.computeDoubles(a.toString(), b.toString()), 0);
+ assertEquals(1, KendallTau.INSTANCE.compute(a.toString(), Double.class, b.toString(), Double.class, Integer.MAX_VALUE), 0);
+ assertEquals(1, KendallTau.INSTANCE.compute(a.toString(), Double.class, b.toString(), Double.class, true, Integer.MAX_VALUE), 0);
+
+ BinIO.storeInts(new int[] { 0, 1, 2, 3, 4 }, a);
+ BinIO.storeDoubles(new double[] { 4, 3, 2, 1, 0 }, b);
+ assertEquals(-1, KendallTau.INSTANCE.compute(a.toString(), Integer.class, b.toString(), Double.class, Integer.MAX_VALUE), 0);
+ BinIO.storeDoubles(new double[] { 0, 1, 2, 3, 4 }, b);
+ assertEquals(1, KendallTau.INSTANCE.compute(a.toString(), Integer.class, b.toString(), Double.class, Integer.MAX_VALUE), 0);
+
+ BinIO.storeDoubles(new double[] { 0, 1, 2, 3, 4 }, a);
+ BinIO.storeLongs(new long[] { 4, 3, 2, 1, 0 }, b);
+ assertEquals(-1, KendallTau.INSTANCE.compute(a.toString(), Double.class, b.toString(), Long.class, Integer.MAX_VALUE), 0);
+ BinIO.storeLongs(new long[] { 0, 1, 2, 3, 4 }, b);
+ assertEquals(1, KendallTau.INSTANCE.compute(a.toString(), Double.class, b.toString(), Long.class, Integer.MAX_VALUE), 0);
+
+ TextIO.storeDoubles(new double[] { 0, 1, 2, 3, 4 }, a);
+ BinIO.storeLongs(new long[] { 4, 3, 2, 1, 0 }, b);
+ assertEquals(-1, KendallTau.INSTANCE.compute(a.toString(), String.class, b.toString(), Long.class, Integer.MAX_VALUE), 0);
+ BinIO.storeLongs(new long[] { 0, 1, 2, 3, 4 }, b);
+ assertEquals(1, KendallTau.INSTANCE.compute(a.toString(), String.class, b.toString(), Long.class, Integer.MAX_VALUE), 0);
+
+ a.delete();
+ b.delete();
+ }
+}
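The `compute` reference in `KendallTauTest` enumerates all pairs, so it is quadratic in the vector length. A well-known way to get the same tie-adjusted value (Kendall's tau-b) in O(n log n) time is Knight's algorithm. Sort the indices lexicographically by (x, y). Count n1 (pairs tied on x) and n3 (pairs tied on both). Count the discordant pairs as the inversions found while merge-sorting the y values taken in that order, then count n2 (pairs tied on y) in the now-sorted y values. The result is tau-b = (n0 - n1 - n2 + n3 - 2 * swaps) / sqrt((n0 - n1)(n0 - n2)), with n0 = n(n-1)/2. The class below, `KnightTauSketch`, is a hypothetical standalone sketch that depends only on the JDK. This diff does not show whether LAW's `KendallTau` is implemented this way.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical standalone sketch of Knight's O(n log n) tau-b; not LAW's KendallTau code. */
final class KnightTauSketch {
    private KnightTauSketch() {}

    /** Number of unordered pairs inside a run of r equal items. */
    private static long pairs(long r) {
        return r * (r - 1) / 2;
    }

    /** Pairs tied within maximal runs of equal values of an already sorted array. */
    private static long tiedPairs(double[] sorted) {
        long ties = 0, run = 1;
        for (int k = 1; k < sorted.length; k++) {
            if (sorted[k] == sorted[k - 1]) run++;
            else { ties += pairs(run); run = 1; }
        }
        return ties + pairs(run);
    }

    /** Merge-sorts a[from, to) in place and returns the number of strict inversions. */
    private static long sortCountingSwaps(double[] a, double[] tmp, int from, int to) {
        if (to - from < 2) return 0;
        final int mid = (from + to) >>> 1;
        long swaps = sortCountingSwaps(a, tmp, from, mid) + sortCountingSwaps(a, tmp, mid, to);
        int i = from, j = mid, k = from;
        while (i < mid && j < to) {
            if (a[i] <= a[j]) tmp[k++] = a[i++]; // equal values are not inversions
            else {
                swaps += mid - i; // a[j] overtakes every element still waiting on the left
                tmp[k++] = a[j++];
            }
        }
        while (i < mid) tmp[k++] = a[i++];
        while (j < to) tmp[k++] = a[j++];
        System.arraycopy(tmp, from, a, from, to - from);
        return swaps;
    }

    /** Kendall's tau-b of x and y (NaN if either vector is constant). */
    static double tauB(final double[] x, final double[] y) {
        final int n = x.length;
        final Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        // 1. Sort indices lexicographically by (x, y).
        Arrays.sort(idx, Comparator.<Integer>comparingDouble(i -> x[i]).thenComparingDouble(i -> y[i]));

        // 2. Pairs tied on x, and pairs tied on both x and y (both are contiguous runs now).
        long tiesX = 0, tiesXY = 0, runX = 1, runXY = 1;
        for (int k = 1; k < n; k++) {
            final int p = idx[k - 1], c = idx[k];
            if (x[p] == x[c]) {
                runX++;
                if (y[p] == y[c]) runXY++;
                else { tiesXY += pairs(runXY); runXY = 1; }
            } else {
                tiesX += pairs(runX);
                tiesXY += pairs(runXY);
                runX = runXY = 1;
            }
        }
        tiesX += pairs(runX);
        tiesXY += pairs(runXY);

        // 3. Discordant pairs are exactly the inversions of y read in (x, y) order.
        final double[] ys = new double[n];
        for (int k = 0; k < n; k++) ys[k] = y[idx[k]];
        final long swaps = sortCountingSwaps(ys, new double[n], 0, n);

        // 4. ys is now sorted, so pairs tied on y form adjacent runs.
        final long tiesY = tiedPairs(ys);

        final long n0 = (long) n * (n - 1) / 2;
        final long concMinusDisc = n0 - tiesX - tiesY + tiesXY - 2 * swaps;
        return concMinusDisc / Math.sqrt((double) (n0 - tiesX) * (n0 - tiesY));
    }
}
```

On the test vectors `ordered` and `reverseButOne` this returns -0.8, which matches the `(1.0 - 9.0) / 10.0` comment in `testComputeWithReverseButOne`. When one vector is constant, the denominator is zero and the result is NaN, which is what `testOneAllTies` expects from `KendallTau`.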
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/stat/WeightedTauTest.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/stat/WeightedTauTest.java
new file mode 100644
index 0000000..9fd2ebf
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/stat/WeightedTauTest.java
@@ -0,0 +1,292 @@
+package it.unimi.dsi.law.stat;
+
+/*
+ * Copyright (C) 2011-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static it.unimi.dsi.law.stat.WeightedTau.HYPERBOLIC_WEIGHER;
+import static it.unimi.dsi.law.stat.WeightedTau.LOGARITHMIC_WEIGHER;
+import static it.unimi.dsi.law.stat.WeightedTau.QUADRATIC_WEIGHER;
+import static org.junit.Assert.assertEquals;
+
+import java.io.File;
+import java.io.IOException;
+
+import org.junit.Test;
+
+import it.unimi.dsi.Util;
+import it.unimi.dsi.fastutil.doubles.DoubleArrays;
+import it.unimi.dsi.fastutil.ints.Int2DoubleFunction;
+import it.unimi.dsi.fastutil.ints.IntArrays;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.io.TextIO;
+import it.unimi.dsi.util.XoRoShiRo128PlusRandom;
+
+//RELEASE-STATUS: DIST
+
+public class WeightedTauTest {
+ private static final Int2DoubleFunction CONSTANT_WEIGHER = new WeightedTau.AbstractWeigher() {
+ private static final long serialVersionUID = 1L;
+ @Override
+ public double get(int key) {
+ return 1;
+ }
+ };
+ private static final Int2DoubleFunction[] WEIGHER = new Int2DoubleFunction[] { CONSTANT_WEIGHER, HYPERBOLIC_WEIGHER, LOGARITHMIC_WEIGHER, QUADRATIC_WEIGHER };
+ private final double[] ordered = { 0.0, 1.0, 2.0, 3.0, 4.0 };
+ private final double[] reverse = { 4.0, 3.0, 2.0, 1.0, 0.0 };
+ private final double[] reverseButOne = { 10.0, 9.0, 7.0, 8.0, 6.0 };
+ private final double[] allOnes = { 1.0, 1.0, 1.0 };
+ private final double[] allZeroes = { 0.0, 0.0, 0.0 };
+
+ public static double compute(Int2DoubleFunction weigher, boolean multiplicative, double[] v0, double[] v1, int rank[]) {
+ double concordances = 0, discordances = 0, u = 0, v = 0, tot = 0;
+ final int length = v0.length;
+
+ if (rank == null) {
+ rank = Util.identity(length);
+ DoubleArrays.radixSortIndirect(rank, v1, v0, true);
+ IntArrays.reverse(rank);
+ Util.invertPermutationInPlace(rank);
+ }
+
+ for (int i = 0; i < length; i++) {
+ for (int j = i + 1; j < length; j++) {
+ final double weight = multiplicative ? weigher.get(rank[i]) * weigher.get(rank[j]) : weigher.get(rank[i]) + weigher.get(rank[j]);
+ tot += weight;
+ if (v0[i] == v0[j]) u += weight;
+ if (v1[i] == v1[j]) v += weight;
+ if (v0[i] < v0[j] && v1[i] < v1[j] || v0[i] > v0[j] && v1[i] > v1[j]) concordances += weight;
+ else if (v0[i] < v0[j] && v1[i] > v1[j] || v0[i] > v0[j] && v1[i] < v1[j]) discordances += weight;
+ }
+ }
+
+ //System.err.println("u: " + u + " v: " + v + " tot:" + tot + " conc: " + concordances + " disc: " + discordances);
+ return (concordances - discordances) / (Math.sqrt((tot - u) * (tot - v)));
+ }
+
+ @Test
+ public void testComputeOrdered() {
+ for (boolean multiplicative: new boolean[] { false, true}) {
+ for (Int2DoubleFunction weigher : WEIGHER) {
+ WeightedTau weightedTau = new WeightedTau(weigher, multiplicative);
+ double expResult = compute(weigher, multiplicative, ordered, ordered, null); // (10.0 - 0.0) / 10.0;
+ double result = weightedTau.compute(ordered, ordered, null);
+ assertEquals(expResult, result, 1E-15);
+ }
+ }
+ }
+
+ @Test
+ public void testComputeWithReverse() {
+ for (boolean multiplicative: new boolean[] { false, true}) {
+ for (Int2DoubleFunction weigher : WEIGHER) {
+ WeightedTau weightedTau = new WeightedTau(weigher, multiplicative);
+ double expResult = compute(weigher, multiplicative, ordered, reverse, null);// (0 - 10.0) / 10.0;
+ double result = weightedTau.compute(ordered, reverse, null);
+ assertEquals(expResult, result, 1E-15);
+
+ expResult = compute(weigher, multiplicative, reverse, ordered, null);// (0 - 10.0) / 10.0;
+ result = weightedTau.compute(reverse, ordered, null);
+ assertEquals(expResult, result, 1E-15);
+ }
+ }
+ }
+
+ @Test
+ public void testComputeWithReverseButOne() {
+ for (boolean multiplicative: new boolean[] { false, true}) {
+ for (Int2DoubleFunction weigher : WEIGHER) {
+ WeightedTau weightedTau = new WeightedTau(weigher, multiplicative);
+ double expResult = compute(weigher, multiplicative, ordered, reverseButOne, null); // (1.0 - 9.0) / 10.0 for the constant weigher
+ double result = weightedTau.compute(ordered, reverseButOne, null);
+ assertEquals(expResult, result, 1E-15);
+
+ expResult = compute(weigher, multiplicative, reverseButOne, ordered, null); // (1.0 - 9.0) / 10.0 for the constant weigher
+ result = weightedTau.compute(reverseButOne, ordered, null);
+ assertEquals(expResult, result, 1E-15);
+ }
+ }
+ }
+
+ @Test
+ public void testTies() {
+ for (boolean multiplicative: new boolean[] { false, true}) {
+ for (Int2DoubleFunction weigher : WEIGHER) {
+ WeightedTau weightedTau = new WeightedTau(weigher, multiplicative);
+ final double[] v0 = { 0.1, 0.1, 0.2 };
+ final double[] v1 = { 0.4, 0.3, 0.3 };
+
+ double expResult = compute(weigher, multiplicative, v0, v1, null);
+ double result = weightedTau.compute(v0, v1, null);
+ assertEquals(expResult, result, 1E-15);
+
+ expResult = compute(weigher, multiplicative, v1, v0, null);
+ result = weightedTau.compute(v1, v0, null);
+ assertEquals(expResult, result, 1E-15);
+ }
+ }
+ }
+
+ @Test
+ public void testRandom() {
+ for (boolean multiplicative: new boolean[] { false, true}) {
+ for (Int2DoubleFunction weigher : WEIGHER) {
+ XoRoShiRo128PlusRandom random = new XoRoShiRo128PlusRandom(0);
+ WeightedTau weightedTau = new WeightedTau(weigher, multiplicative);
+ final double[] v0 = new double[1000];
+ final double[] v1 = new double[1000];
+
+ for (int i = v0.length; i-- != 0;) {
+ v0[i] = random.nextDouble();
+ v1[i] = random.nextDouble();
+ }
+ double expResult = compute(weigher, multiplicative, v0, v1, null);
+ double result = weightedTau.compute(v0, v1, null);
+ assertEquals(expResult, result, 1E-10);
+
+ expResult = compute(weigher, multiplicative, v1, v0, null);
+ result = weightedTau.compute(v1, v0, null);
+ assertEquals(expResult, result, 1E-10);
+ }
+ }
+ }
+
+ @Test
+ public void testRandomWithTies() {
+ for (boolean multiplicative: new boolean[] { false, true}) {
+ for (Int2DoubleFunction weigher : WEIGHER) {
+ WeightedTau weightedTau = new WeightedTau(weigher, multiplicative);
+ XoRoShiRo128PlusRandom random = new XoRoShiRo128PlusRandom(0);
+ final double[] v0 = new double[1000];
+ final double[] v1 = new double[1000];
+ for (int i = v0.length; i-- != 0;) {
+ v0[i] = random.nextInt(10);
+ v1[i] = random.nextInt(10);
+ }
+
+ double expResult = compute(weigher, multiplicative, v0, v1, null);
+ double result = weightedTau.compute(v0, v1, null);
+ assertEquals(expResult, result, 1E-10);
+
+ expResult = compute(weigher, multiplicative, v1, v0, null);
+ result = weightedTau.compute(v1, v0, null);
+ assertEquals(expResult, result, 1E-10);
+ }
+ }
+ }
+
+ @Test
+ public void testRandomWithTiesAndRank() {
+ for (boolean multiplicative: new boolean[] { false, true}) {
+ for (Int2DoubleFunction weigher : WEIGHER) {
+ XoRoShiRo128PlusRandom random = new XoRoShiRo128PlusRandom(0);
+ WeightedTau weightedTau = new WeightedTau(weigher, multiplicative);
+ final double[] v0 = new double[1000];
+ final double[] v1 = new double[1000];
+ for (int i = v0.length; i-- != 0;) {
+ v0[i] = random.nextInt(10);
+ v1[i] = random.nextInt(10);
+ }
+ final int[] rank = Util.identity(v0.length);
+ IntArrays.shuffle(rank, random);
+
+ double expResult = compute(weigher, multiplicative, v0, v1, rank);
+ double result = weightedTau.compute(v0, v1, rank);
+ assertEquals(expResult, result, 1E-10);
+
+ expResult = compute(weigher, multiplicative, v1, v0, rank);
+ result = weightedTau.compute(v1, v0, rank);
+ assertEquals(expResult, result, 1E-10);
+ }
+ }
+ }
+
+
+ @Test
+ public void testAllTies() {
+ for (Int2DoubleFunction weigher : WEIGHER) {
+ assertEquals(1.0, new WeightedTau(weigher).compute(allOnes, allZeroes, null), 1E-15);
+ }
+ }
+
+ @Test
+ public void testInputType() throws IOException {
+ File a = File.createTempFile(WeightedTauTest.class.getSimpleName(), "a");
+ a.deleteOnExit();
+ File b = File.createTempFile(WeightedTauTest.class.getSimpleName(), "b");
+ b.deleteOnExit();
+ for(boolean reverse: new boolean[] { true, false }) {
+ for (Int2DoubleFunction weigher : WEIGHER) {
+ WeightedTau weightedTau = new WeightedTau(weigher);
+ BinIO.storeInts(new int[] { 0, 1, 2, 3, 4 }, a);
+ BinIO.storeInts(new int[] { 4, 3, 2, 1, 0 }, b);
+ assertEquals(-1, weightedTau.computeInts(a.toString(), b.toString(), reverse), 1E-15);
+ // TODO: main test
+ assertEquals(-1, weightedTau.compute(a.toString(), Integer.class, b.toString(), Integer.class, reverse, Integer.MAX_VALUE), 1E-15);
+ BinIO.storeInts(new int[] { 0, 1, 2, 3, 4 }, b);
+ assertEquals(1, weightedTau.computeInts(a.toString(), b.toString(), reverse), 1E-15);
+ assertEquals(1, weightedTau.compute(a.toString(), Integer.class, b.toString(), Integer.class, reverse, Integer.MAX_VALUE), 1E-15);
+
+ BinIO.storeLongs(new long[] { 0, 1, 2, 3, 4 }, a);
+ BinIO.storeLongs(new long[] { 4, 3, 2, 1, 0 }, b);
+ assertEquals(-1, weightedTau.computeLongs(a.toString(), b.toString(), reverse), 1E-15);
+ assertEquals(-1, weightedTau.compute(a.toString(), Long.class, b.toString(), Long.class, reverse, Integer.MAX_VALUE), 1E-15);
+ BinIO.storeLongs(new long[] { 0, 1, 2, 3, 4 }, b);
+ assertEquals(1, weightedTau.computeLongs(a.toString(), b.toString(), reverse), 1E-15);
+ assertEquals(1, weightedTau.compute(a.toString(), Long.class, b.toString(), Long.class, reverse, Integer.MAX_VALUE), 1E-15);
+
+ BinIO.storeFloats(new float[] { 0, 1, 2, 3, 4 }, a);
+ BinIO.storeFloats(new float[] { 4, 3, 2, 1, 0 }, b);
+ assertEquals(-1, weightedTau.computeFloats(a.toString(), b.toString(), reverse), 1E-15);
+ assertEquals(-1, weightedTau.compute(a.toString(), Float.class, b.toString(), Float.class, reverse, Integer.MAX_VALUE), 1E-15);
+ BinIO.storeFloats(new float[] { 0, 1, 2, 3, 4 }, b);
+ assertEquals(1, weightedTau.computeFloats(a.toString(), b.toString(), reverse), 1E-15);
+ assertEquals(1, weightedTau.compute(a.toString(), Float.class, b.toString(), Float.class, reverse, Integer.MAX_VALUE), 1E-15);
+
+ BinIO.storeDoubles(new double[] { 0, 1, 2, 3, 4 }, a);
+ BinIO.storeDoubles(new double[] { 4, 3, 2, 1, 0 }, b);
+ assertEquals(-1, weightedTau.computeDoubles(a.toString(), b.toString(), reverse), 1E-15);
+ assertEquals(-1, weightedTau.compute(a.toString(), Double.class, b.toString(), Double.class, reverse, Integer.MAX_VALUE), 1E-15);
+ BinIO.storeDoubles(new double[] { 0, 1, 2, 3, 4 }, b);
+ assertEquals(1, weightedTau.computeDoubles(a.toString(), b.toString(), reverse), 1E-15);
+ assertEquals(1, weightedTau.compute(a.toString(), Double.class, b.toString(), Double.class, reverse, Integer.MAX_VALUE), 1E-15);
+
+ BinIO.storeInts(new int[] { 0, 1, 2, 3, 4 }, a);
+ BinIO.storeDoubles(new double[] { 4, 3, 2, 1, 0 }, b);
+ assertEquals(-1, weightedTau.compute(a.toString(), Integer.class, b.toString(), Double.class, reverse, Integer.MAX_VALUE), 1E-15);
+ BinIO.storeDoubles(new double[] { 0, 1, 2, 3, 4 }, b);
+ assertEquals(1, weightedTau.compute(a.toString(), Integer.class, b.toString(), Double.class, reverse, Integer.MAX_VALUE), 1E-15);
+
+ BinIO.storeDoubles(new double[] { 0, 1, 2, 3, 4 }, a);
+ BinIO.storeLongs(new long[] { 4, 3, 2, 1, 0 }, b);
+ assertEquals(-1, weightedTau.compute(a.toString(), Double.class, b.toString(), Long.class, reverse, Integer.MAX_VALUE), 1E-15);
+ BinIO.storeLongs(new long[] { 0, 1, 2, 3, 4 }, b);
+ assertEquals(1, weightedTau.compute(a.toString(), Double.class, b.toString(), Long.class, reverse, Integer.MAX_VALUE), 1E-15);
+
+ TextIO.storeDoubles(new double[] { 0, 1, 2, 3, 4 }, a);
+ BinIO.storeLongs(new long[] { 4, 3, 2, 1, 0 }, b);
+ assertEquals(-1, weightedTau.compute(a.toString(), String.class, b.toString(), Long.class, reverse, Integer.MAX_VALUE), 1E-15);
+ BinIO.storeLongs(new long[] { 0, 1, 2, 3, 4 }, b);
+ assertEquals(1, weightedTau.compute(a.toString(), String.class, b.toString(), Long.class, reverse, Integer.MAX_VALUE), 1E-15);
+
+ }
+ }
+ a.delete();
+ b.delete();
+ }
+}
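A small hand-checkable case helps show what the weigher changes. `WeightedTauTest.compute` ranks elements by decreasing `v1` (ties broken by `v0`). It gives each pair (i, j) the weight w(rank[i]) + w(rank[j]) in the additive case, and then applies the tau-b formula to weighted counts. The sketch below, `WeightedTauSketch`, is a hypothetical JDK-only restatement of that additive brute force. It assumes a hyperbolic weigher of the form w(r) = 1 / (r + 1). That form is an assumption here, so check `WeightedTau.HYPERBOLIC_WEIGHER` for the exact definition. For `ordered` versus `reverseButOne`, the only concordant pair is the one holding the values 7 and 8, which sit at ranks 3 and 2. With a constant weigher the result is the plain tau of -0.8. Under the assumed hyperbolic weigher, that pair weighs only 1/4 + 1/3 = 7/12, while the ten pairs weigh 4 * (1 + 1/2 + 1/3 + 1/4 + 1/5) = 137/15 in total. The result therefore drops to 35/274 - 1 = -239/274 (about -0.872), because the discordances among the top ranks now dominate.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.IntToDoubleFunction;

/** Hypothetical brute-force additive weighted tau; not LAW's WeightedTau implementation. */
final class WeightedTauSketch {
    private WeightedTauSketch() {}

    /** Assumed hyperbolic weigher: rank r (0 = most important) weighs 1 / (r + 1). */
    static final IntToDoubleFunction HYPERBOLIC = r -> 1.0 / (r + 1);

    static double weightedTau(final double[] v0, final double[] v1, final IntToDoubleFunction weigher) {
        final int n = v0.length;
        // Rank by decreasing v1, breaking ties by decreasing v0 (rank 0 = largest).
        final Integer[] byImportance = new Integer[n];
        for (int i = 0; i < n; i++) byImportance[i] = i;
        Arrays.sort(byImportance,
                Comparator.<Integer>comparingDouble(i -> v1[i]).thenComparingDouble(i -> v0[i]).reversed());
        final int[] rank = new int[n];
        for (int r = 0; r < n; r++) rank[byImportance[r]] = r;

        double conc = 0, disc = 0, tot = 0, u = 0, v = 0;
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                // Additive combination of the two elements' weights.
                final double w = weigher.applyAsDouble(rank[i]) + weigher.applyAsDouble(rank[j]);
                tot += w;
                if (v0[i] == v0[j]) u += w;
                if (v1[i] == v1[j]) v += w;
                final double s = Math.signum(v0[i] - v0[j]) * Math.signum(v1[i] - v1[j]);
                if (s > 0) conc += w;
                else if (s < 0) disc += w;
            }
        }
        return (conc - disc) / Math.sqrt((tot - u) * (tot - v));
    }
}
```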
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/util/ConsistentHashFunctionTest.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/util/ConsistentHashFunctionTest.java
new file mode 100644
index 0000000..8e180b5
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/util/ConsistentHashFunctionTest.java
@@ -0,0 +1,188 @@
+package it.unimi.dsi.law.util;
+
+/*
+ * Copyright (C) 2008-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+
+import java.util.Arrays;
+import java.util.Random;
+
+import org.junit.Test;
+
+import it.unimi.dsi.fastutil.objects.ObjectArrayList;
+
+
+
+//RELEASE-STATUS: DIST
+
+public class ConsistentHashFunctionTest {
+
+ @Test
+ public void testRemove() {
+ ConsistentHashFunction<String> chf = new ConsistentHashFunction<String>();
+ final String o0 = "0", o1 = "1", o2 = "2";
+ Random r = new Random(1);
+
+ chf.add(o0, 1);
+ chf.add(o1, 1);
+ assertFalse(chf.add(o1, 1)); // To increase coverage
+ chf.remove(o1);
+ for(int i = 0; i < 1000000; i++) assertEquals(o0, chf.hash(r.nextLong()));
+ chf.add(o1, 1);
+ chf.add(o2, 1);
+ chf.remove(o1);
+ chf.remove(o2);
+ for(int i = 0; i < 1000000; i++) assertEquals(o0, chf.hash(r.nextLong()));
+ chf.remove(o2); // To increase coverage
+
+ }
+
+ @Test
+ public void testAdd() {
+ ConsistentHashFunction<String> chf = new ConsistentHashFunction<String>();
+ final String o0 = "0", o1 = "1", o2 = "2";
+ Random r = new Random(1);
+
+ chf.add(o0, 1);
+ chf.add(o1, 1);
+ chf.add(o2, 2);
+
+ boolean found0 = false, found1 = false, found2 = false;
+
+ for(int i = 0; i < 200; i++) {
+ if (chf.hash(r.nextLong()) == o0) found0 = true;
+ if (chf.hash(r.nextLong()) == o1) found1 = true;
+ if (chf.hash(r.nextLong()) == o2) found2 = true;
+ }
+
+ assertTrue(found0);
+ assertTrue(found1);
+ assertTrue(found2);
+ }
+
+
+ @Test
+ public void testStress() {
+ final Random r = new Random(1);
+ int nBucket = 100;
+ ObjectArrayList<String> bucket = new ObjectArrayList<String>();
+ for (int i = 0; i < nBucket; i++) bucket.add(Integer.toString(i));
+
+ ConsistentHashFunction<String> chf = new ConsistentHashFunction<String>();
+
+ for (int i = 0; i < nBucket; i++) chf.add(bucket.get(i), 1);
+
+ for (int t = 0; t < 10; t++) {
+ for (int p = 0; p < 10; p++)
+ assertTrue(bucket.contains(chf.hash(r.nextLong())));
+
+ int removals = Math.min(r.nextInt(5), bucket.size() - 1);
+ for (int k = 0; k < removals; k++) {
+ //System.out.printf("Removing %d/%d\n", k, removals);
+ String x = bucket.remove(r.nextInt(bucket.size()));
+ chf.remove(x);
+ }
+ int additions = r.nextInt(5);
+ for (int k = 0; k < additions; k++) {
+ //System.out.printf("Adding %d/%d\n", k, additions);
+ String x = Integer.toString(new Object().hashCode());
+ bucket.add(x);
+ chf.add(x, 1);
+ }
+
+ if (bucket.size() == 0) {
+ System.out.println("Adding out of emergency");
+ String x = Integer.toString(new Object().hashCode());
+ bucket.add(x);
+ chf.add(x, 1);
+ }
+ assertEquals(bucket.size(), chf.buckets().size());
+ }
+ }
+
+ @Test
+ public void testSecondChance() {
+ final Random r = new Random(1);
+ int nBucket = 1 + r.nextInt(4);
+ ConsistentHashFunction<String> chf = new ConsistentHashFunction<String>();
+ for (int i = 0; i < nBucket; i++) chf.add(Integer.toString(i), 1);
+
+ for (int t = 0; t < 500; t++) {
+ long sample = r.nextLong();
+ Object[] chances = chf.hash(sample, Math.min(chf.buckets().size(), r.nextInt(3) + 2));
+ System.out.println("Chances for " + sample + " are " + Arrays.toString(chances) + " out of " + chf.buckets());
+ for (int i = 0; i < chances.length; i++) {
+ assertEquals(chf.hash(sample) + " != " + chances[i], chf.hash(sample), chances[i]);
+ chf.remove((String)chances[i]);
+ }
+ for (int i = chances.length - 1; i >= 0; i--) {
+ chf.add((String) chances[i], 1);
+ //assertEquals(chf.hash(sample), chances[i]);
+ }
+
+ assertTrue(sample + ": " + Arrays.toString(chances) + " != " + Arrays.toString(chf.hash(sample, chances.length)) + " (size=" + chf.buckets().size() + ")",
+ Arrays.equals(chances, chf.hash(sample, chances.length)));
+ }
+ }
+
+ @Test
+ public void testSpecial() {
+ ConsistentHashFunction<String> chf = new ConsistentHashFunction<String>();
+ long sample = -3599839008849623859L;
+ chf.add("0", 1);
+ chf.add("1", 1);
+ chf.add("2", 1);
+
+ Object[] r;
+ System.out.println(Arrays.toString(r = chf.hash(sample, 3)));
+ for (int i = 0; i < r.length; i++) {
+ assertEquals(chf.hash(sample) + " != " + r[i], chf.hash(sample), r[i]);
+ chf.remove((String)r[i]);
+ }
+ for (int i = r.length - 1; i >= 0; i--) chf.add((String) r[i], 1);
+
+ System.out.println(Arrays.toString(chf.hash(sample, 3)));
+ }
+
+ @Test
+ public void testConsistency() {
+ final Random r = new Random(1);
+ int nBucket = 1 + r.nextInt(4);
+ ObjectArrayList<String> bucket = new ObjectArrayList<String>();
+ for (int i = 0; i < nBucket; i++) bucket.add(Integer.toString(i));
+
+ ConsistentHashFunction<String> chf = new ConsistentHashFunction<String>();
+
+ for (int i = 0; i < nBucket; i++) chf.add(bucket.get(i), 1);
+
+ for (int t = 0; t < 500; t++) {
+ long sample = r.nextLong();
+ String a = chf.hash(sample);
+ String b = "foo";
+ chf.add(b, 1);
+ String c = chf.hash(sample);
+ assertTrue(c == a || c == b);
+ if (c == a) System.out.print("*"); else System.out.print("-");
+ chf.remove(b);
+ }
+ }
+
+}
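`testConsistency` checks the defining property of consistent hashing. After a new bucket is added, a key either stays with its old bucket or moves to the new one, and removing the bucket restores the old mapping. The sketch below is a minimal hash ring that has this property. Each bucket places a number of points on a circle of 64-bit values, proportional to its weight. A key goes to the first point at or after its own hashed position, wrapping around at the end. `RingSketch`, the 64 points per unit of weight and the SplitMix64 finalizer used as a mixer are all illustrative assumptions, not the internals of LAW's `ConsistentHashFunction`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical minimal consistent-hashing ring; not LAW's ConsistentHashFunction. */
final class RingSketch<T> {
    /** Hypothetical number of ring points per unit of weight. */
    private static final int POINTS_PER_WEIGHT = 64;

    private final TreeMap<Long, T> ring = new TreeMap<>();
    private final Map<T, Integer> weights = new HashMap<>();

    /** SplitMix64 finalizer; it is a bijection, so distinct inputs never collide. */
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    private static long point(Object bucket, int replica) {
        return mix(((long) bucket.hashCode() << 32) ^ replica);
    }

    /** Adds a bucket with the given weight; returns false if it was already present. */
    boolean add(T bucket, int weight) {
        if (weights.containsKey(bucket)) return false;
        weights.put(bucket, weight);
        for (int r = 0; r < weight * POINTS_PER_WEIGHT; r++) ring.putIfAbsent(point(bucket, r), bucket);
        return true;
    }

    /** Removes a bucket; returns false if it was not present. */
    boolean remove(T bucket) {
        final Integer weight = weights.remove(bucket);
        if (weight == null) return false;
        // Remove only points this bucket owns (equal hash codes could make points collide).
        for (int r = 0; r < weight * POINTS_PER_WEIGHT; r++) ring.remove(point(bucket, r), bucket);
        return true;
    }

    /** Returns the owner of the first ring point at or after the key's position (null if empty). */
    T hash(long key) {
        Map.Entry<Long, T> e = ring.ceilingEntry(mix(key));
        if (e == null) e = ring.firstEntry(); // wrap around the circle
        return e == null ? null : e.getValue();
    }
}
```

Adding a bucket only inserts points. The successor of a key's position can therefore change only to one of the new bucket's points, which is why a key never migrates from one old bucket to another.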
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/util/KahanSummationTest.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/util/KahanSummationTest.java
new file mode 100644
index 0000000..8849405
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/util/KahanSummationTest.java
@@ -0,0 +1,46 @@
+package it.unimi.dsi.law.util;
+
+/*
+ * Copyright (C) 2011-2019 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+
+import org.junit.Test;
+
+
+//RELEASE-STATUS: DIST
+
+public class KahanSummationTest {
+ @Test
+ public void testSum() {
+ KahanSummation sum = new KahanSummation();
+ sum.add(1);
+ sum.add(2);
+ sum.add(3);
+ assertEquals(6, sum.value(), 0);
+ }
+
+ @Test
+ public void testDifficult() {
+ KahanSummation sum = new KahanSummation();
+ sum.add(Double.MIN_NORMAL);
+ sum.add(Double.MIN_NORMAL);
+ sum.add(-Double.MIN_NORMAL);
+ assertEquals(Double.MIN_NORMAL, sum.value(), 0);
+ }
+}
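Plain floating-point addition also computes the sums in `testDifficult` exactly (2 * MIN_NORMAL - MIN_NORMAL is representable), so that test mainly checks that compensation does no harm. Naive summation fails visibly when you add 1e-16 to 1.0 a thousand times. The value 1e-16 is less than half an ulp of 1.0 (about 1.1e-16), so every naive addition rounds back to 1.0. Kahan's compensated summation keeps the lost low-order part in a correction term and feeds it into the next addition. The class below, `KahanSketch`, is a hypothetical textbook version that mirrors the `add`/`value` calls used in the test. This diff does not show how LAW's `KahanSummation` is written.

```java
/** Hypothetical textbook Kahan summation; not LAW's KahanSummation source. */
final class KahanSketch {
    private double sum; // running total
    private double c;   // running compensation: low-order bits lost so far (negated)

    void add(double x) {
        final double y = x - c;   // correct the new term by the previously lost part
        final double t = sum + y; // low-order bits of y may be lost here
        c = (t - sum) - y;        // recover what was lost (algebraically zero)
        sum = t;
    }

    double value() {
        return sum;
    }
}
```

In this example the naive sum stays at exactly 1.0, while the compensated sum ends very close to the exact value 1 + 1e-13.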
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/util/NormTest.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/util/NormTest.java
new file mode 100644
index 0000000..689ba4e
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/util/NormTest.java
@@ -0,0 +1,102 @@
+package it.unimi.dsi.law.util;
+
+/*
+ * Copyright (C) 2008-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+
+import org.junit.Test;
+
+
+
+//RELEASE-STATUS: DIST
+
+public class NormTest {
+
+ private final double[] a = { 0.0, -1.0/2.0, 0.0, 1.0/2.0 };
+ private final double[] b = { -1.0/3.0, 0.0, -1.0/3.0, 0.0 };
+
+ @Test
+ public void testL1ComputeABdouble() {
+ double expResult = 1.0 + 2.0 / 3.0;
+ double result = Norm.L_1.compute(a, b);
+ assertEquals(expResult, result, .0);
+ }
+
+ @Test
+ public void testL1ComputeAdouble() {
+ double expResult = 1.0;
+ double result = Norm.L_1.compute(a);
+ assertEquals(expResult, result, .0);
+ }
+
+ @Test
+ public void testL1Normalize() {
+ double[] v = { 1, 2, -3, 4.5, -5.5 };
+ double expResult = 1.0;
+ Norm.L_1.normalize(v, expResult);
+ double result = Norm.L_1.compute(v);
+ assertEquals(expResult, result, .0);
+ }
+
+ @Test
+ public void testL2ComputeABdouble() {
+ double expResult = Math.sqrt(13.0 / 18.0);
+ double result = Norm.L_2.compute(a, b);
+ assertEquals(expResult, result, .0);
+ }
+
+ @Test
+ public void testL2ComputeAdouble() {
+ double expResult = Math.sqrt(1.0 / 2.0);
+ double result = Norm.L_2.compute(a);
+ assertEquals(expResult, result, .0);
+ }
+
+ @Test
+ public void testL2Normalize() {
+ double[] v = { 1, 2, -3, 4.5, -5.5 };
+ double expResult = 2.0;
+ Norm.L_2.normalize(v, expResult);
+ double result = Norm.L_2.compute(v);
+ assertEquals(expResult, result, .0);
+ }
+
+ @Test
+ public void testLIComputeABdouble() {
+ double expResult = 1 / 2.;
+ double result = Norm.L_INFINITY.compute(a, b);
+ assertEquals(expResult, result, .0);
+ }
+
+ @Test
+ public void testLIComputeAdouble() {
+ double expResult = 1 / 2.;
+ double result = Norm.L_INFINITY.compute(a);
+ assertEquals(expResult, result, .0);
+ }
+
+ @Test
+ public void testLINormalize() {
+ double[] v = { 1, 2, -3, 4.5, -5.5 };
+ double expResult = 2.0;
+ Norm.L_INFINITY.normalize(v, expResult);
+ double result = Norm.L_INFINITY.compute(v);
+ assertEquals(expResult, result, .0);
+ }
+}
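The expected constants in `NormTest` match the norm of the entrywise difference a - b = (1/3, -1/2, 1/3, 1/2). The L1 norm is 1/3 + 1/2 + 1/3 + 1/2 = 1 + 2/3. The L2 norm is sqrt(1/9 + 1/4 + 1/9 + 1/4) = sqrt(13/18). The L-infinity norm is 1/2. For `a` alone they are 1, sqrt(1/2) and 1/2. The sketch below, `NormSketch`, is a hypothetical JDK-only recomputation of these values. It also includes an L2 rescaling in the spirit of `normalize`. It is not LAW's `Norm` enum.

```java
/** Hypothetical recomputation of the NormTest constants; not LAW's Norm enum. */
final class NormSketch {
    private NormSketch() {}

    /** Sum of absolute entrywise differences. */
    static double l1(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += Math.abs(a[i] - b[i]);
        return s;
    }

    /** Euclidean length of a - b. */
    static double l2(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    /** Largest absolute entrywise difference. */
    static double lInf(double[] a, double[] b) {
        double m = 0;
        for (int i = 0; i < a.length; i++) m = Math.max(m, Math.abs(a[i] - b[i]));
        return m;
    }

    /** Rescales v in place so that its L2 norm equals target (v must not be all zeros). */
    static void normalizeL2(double[] v, double target) {
        final double factor = target / l2(v, new double[v.length]);
        for (int i = 0; i < v.length; i++) v[i] *= factor;
    }
}
```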
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/util/PrecisionTest.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/util/PrecisionTest.java
new file mode 100644
index 0000000..00adc35
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/util/PrecisionTest.java
@@ -0,0 +1,125 @@
+package it.unimi.dsi.law.util;
+
+/*
+ * Copyright (C) 2008-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+
+import java.math.BigDecimal;
+import java.math.BigInteger;
+
+import org.junit.Test;
+
+import it.unimi.dsi.util.XoRoShiRo128PlusRandom;
+
+
+
+//RELEASE-STATUS: DIST
+
+public class PrecisionTest {
+
+
+ @Test
+ public void testValue() {
+ assertEquals(Math.PI, Precision.truncate(Math.PI, 10), Math.pow(2, -10));
+ assertEquals(Math.PI, Precision.truncate(Math.PI, 13), Math.pow(2, -13));
+ assertEquals(Math.PI, Precision.truncate(Math.PI, 17), Math.pow(2, -17));
+ }
+
+ @Test
+ public void testOnlyFractional() {
+ assertEquals(10, Precision.truncate(10, 0), 0);
+ assertEquals(10, Precision.truncate(10, 1), 0);
+ assertEquals(10, Precision.truncate(10.5, 0), 0);
+ assertEquals(10.5, Precision.truncate(10.5, 1), 0);
+ }
+
+ @Test
+ public void testBin() {
+ double[] v = new double[] { 0.5, 1.5 };
+ Precision.truncate(v, 2);
+ assertEquals(0.5, v[0], Math.pow(2, -2));
+ assertEquals(1.5, v[1], Math.pow(2, -2));
+ v[0] = 0.25;
+ v[1] = 1.25;
+ Precision.truncate(v, 1);
+ assertEquals(0.25, v[0], Math.pow(2, -1));
+ assertEquals(1.25, v[1], Math.pow(2, -1));
+ }
+
+
+ private static final BigDecimal TWO = new BigDecimal(2);
+
+ public static double mockTruncate(double value, int numberOfSignificantDigits) {
+ final BigDecimal powerOfTwo = numberOfSignificantDigits < 0 ?
+ BigDecimal.ONE.divide(TWO.pow(- numberOfSignificantDigits)) : TWO.pow(numberOfSignificantDigits);
+ final BigDecimal valueAsBigDecimal = new BigDecimal(value);
+ final BigInteger x = valueAsBigDecimal.multiply(powerOfTwo).toBigInteger();
+ final BigDecimal result = new BigDecimal(x).divide(powerOfTwo);
+ return result.doubleValue();
+ }
+
+ @Test
+ public void testTruncate() {
+ final XoRoShiRo128PlusRandom r = new XoRoShiRo128PlusRandom(0);
+ for (int i = 0; i < 1000; i++) {
+ final double d = Math.atan(r.nextDouble());
+ final int k = r.nextInt(1024);
+ assertEquals(mockTruncate(d, k), Precision.truncate(d, k), 0);
+ }
+
+ for (int i = 0; i < 1000; i++) {
+ final double d = - Math.atan(r.nextDouble());
+ final int k = r.nextInt(1024);
+ assertEquals(mockTruncate(d, k), Precision.truncate(d, k), 0);
+ }
+ }
+
+ @Test
+ public void testNegative() {
+ final XoRoShiRo128PlusRandom r = new XoRoShiRo128PlusRandom(0);
+ for (int i = 0; i < 1000; i++) {
+ final int d = r.nextInt(10000);
+ final int k = r.nextInt(20);
+ final double t = Precision.truncate(d, -k);
+ assertEquals(Math.floor(t), t, 0);
+ assertEquals(d & (-1 << k), (int)t);
+ }
+
+ for (int i = 0; i < 1000; i++) {
+ final int d = - r.nextInt(10000);
+ final int k = r.nextInt(20);
+ final double t = Precision.truncate(d, -k);
+ assertEquals(Math.floor(t), t, 0);
+ assertEquals(mockTruncate(d, -k), t, 0);
+ }
+ }
+
+ @Test
+ public void testUnchanged() {
+ final XoRoShiRo128PlusRandom r = new XoRoShiRo128PlusRandom(0);
+ for (int i = 0; i < 1000; i++) {
+ final double d = 100 * Math.atan(r.nextDouble());
+ assertEquals(d, Precision.truncate(d, Integer.MAX_VALUE), 0);
+ }
+ for (int i = 0; i < 1000; i++) {
+ final double d = -100 * Math.atan(r.nextDouble());
+ assertEquals(d, Precision.truncate(d, Integer.MAX_VALUE), 0);
+ }
+ }
+}
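`mockTruncate` above defines the expected behaviour with exact `BigDecimal` arithmetic. It multiplies by 2^k, drops the fractional part (toward zero), and divides by 2^k again. The same semantics can be sketched with plain `double` operations. `TruncateSketch` is hypothetical, is meant only to illustrate the semantics, and is not LAW's `Precision.truncate`.

```java
/**
 * Hypothetical sketch, not LAW's Precision.truncate: keeps k binary digits after
 * the binary point (k may be negative), discarding the rest toward zero, i.e.
 * a double-only analogue of mockTruncate above.
 */
public class TruncateSketch {

    public static double truncate(double value, int k) {
        // A finite double whose exponent is e is an integer multiple of 2^(e - 52).
        // If 2^-k is not larger than that, no bits can be dropped. This branch
        // also returns NaN, infinities and huge k (e.g. Integer.MAX_VALUE) as is.
        final int e = Math.getExponent(value);
        if (k >= 52 - e) return value;
        // Otherwise |value * 2^k| < 2^52, so scaling by 2^k cannot overflow.
        final double scaled = Math.scalb(value, k);
        // Truncate toward zero, like BigDecimal.toBigInteger() in mockTruncate.
        final double truncated = scaled < 0 ? Math.ceil(scaled) : Math.floor(scaled);
        return Math.scalb(truncated, -k);
    }
}
```

The early return covers the `Integer.MAX_VALUE` case from `testUnchanged`. A negative `k` clears low-order integer bits, as `testNegative` expects. For example, truncating 13 with `k = -2` gives 12.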
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/filters/TestFilters.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/filters/TestFilters.java
new file mode 100644
index 0000000..f26831d
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/filters/TestFilters.java
@@ -0,0 +1,172 @@
+package it.unimi.dsi.law.warc.filters;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+
+import java.net.URI;
+
+import org.junit.Test;
+
+import it.unimi.dsi.law.bubing.util.BURL;
+import it.unimi.dsi.law.warc.filters.parser.FilterParser;
+import it.unimi.dsi.law.warc.filters.parser.ParseException;
+
+
+
+//RELEASE-STATUS: DIST
+
+
+/** A class to test {@link AbstractFilter}. */
+
+@SuppressWarnings("deprecation")
+public class TestFilters {
+ public static class StartsWithStringFilter extends AbstractFilter<String> {
+ private String prefix;
+ public StartsWithStringFilter(String prefix) { this.prefix = prefix; }
+ public boolean apply(String x) { return x.startsWith(prefix); }
+ public static StartsWithStringFilter valueOf(String args) {
+ return new StartsWithStringFilter(args);
+ }
+ public String toString() {
+ return toString(prefix);
+ }
+ }
+
+ public static class EndsWithStringFilter extends AbstractFilter<String> {
+ private String suffix;
+ public EndsWithStringFilter(String suffix) { this.suffix = suffix; }
+ public boolean apply(String x) { return x.endsWith(suffix); }
+ public static EndsWithStringFilter valueOf(String args) {
+ return new EndsWithStringFilter(args);
+ }
+ public String toString() {
+ return toString(suffix);
+ }
+ }
+
+ public static class LongerThanStringFilter extends AbstractFilter<String> {
+ private int bound;
+ public LongerThanStringFilter(int bound) { this.bound = bound; }
+ public boolean apply(String x) { return x.length() > bound; }
+ public static LongerThanStringFilter valueOf(String args) {
+ return new LongerThanStringFilter(Integer.parseInt(args));
+ }
+ public String toString() {
+ return toString(String.valueOf(bound));
+ }
+ }
+
+
+ @Test
+ public void testBooleanComposition() {
+ AbstractFilter<String> iniziaConA = new StartsWithStringFilter("a");
+ AbstractFilter<String> finisceConA = new EndsWithStringFilter("a");
+ AbstractFilter<String> finisceConB = new EndsWithStringFilter("b");
+ AbstractFilter<String> lungaPiuDi5 = new LongerThanStringFilter(5);
+
+ Filter<String> composto = Filters.and(iniziaConA, Filters.or(finisceConA, finisceConB), Filters.not(lungaPiuDi5));
+
+ assertTrue(composto.apply("ab"));
+ assertTrue(composto.apply("addb"));
+ assertTrue(composto.apply("adda"));
+ assertFalse(composto.apply("dda"));
+ assertFalse(composto.apply("adddddda"));
+ assertFalse(composto.apply("ad"));
+ }
+
+ @Test
+ public void testParsingTrue() throws ParseException {
+ FilterParser<String> filterParser = new FilterParser<String>(String.class);
+ assertTrue(filterParser.parse("TRUE").apply(new String()));
+ }
+
+ @Test
+ public void testParsing() throws ParseException {
+ FilterParser<String> filterParser = new FilterParser<String>(String.class);
+ Filter<String> composto = filterParser.parse(
+ "it.unimi.dsi.law.warc.filters.TestFilters$StartsWithStringFilter(a)" +
+ " and " +
+ "it.unimi.dsi.law.warc.filters.TestFilters$EndsWithStringFilter(a) " +
+ " or " +
+ "it.unimi.dsi.law.warc.filters.TestFilters$EndsWithStringFilter(b)"
+ );
+ System.out.println("TESTING: " + composto);
+ assertTrue(composto.apply("aa"));
+ assertTrue(composto.apply("bb"));
+ assertFalse(composto.apply("dda"));
+ assertFalse(composto.apply("add"));
+ }
+
+ @Test
+ public void testURLParsing() throws ParseException {
+ FilterParser<URI> filterParser = new FilterParser<URI>(URI.class);
+
+ Filter<URI> filter = filterParser.parse("HostEquals(www.dsi.unimi.it) or (HostEndsWith(.it) and not URLMatchesRegex(.*vigna.*))");
+ System.out.println("TESTING: " + filter);
+ assertTrue(filter.apply(BURL.parse("http://www.dsi.unimi.it/index.php")));
+ assertTrue(filter.apply(BURL.parse("http://www.foo.it/index.php")));
+ assertFalse(filter.apply(BURL.parse("http://www.vigna.foo.it/index.php")));
+ assertFalse(filter.apply(BURL.parse("http://www.foo.com/index.php")));
+
+ filter = filterParser.parse("PathEndsWithOneOf(html,htm,php) and not PathEndsWithOneOf(mahtml)");
+ System.out.println("TESTING: " + filter);
+ assertTrue(filter.apply(BURL.parse("http://www.dsi.unimi.it/index.php")));
+ assertTrue(filter.apply(BURL.parse("http://www.foo.it/index.html")));
+ assertFalse(filter.apply(BURL.parse("http://www.foo.it/index.mahtml")));
+ assertTrue(filter.apply(BURL.parse("http://www.vigna.foo.it/index.PHP?sadmdsak")));
+ assertFalse(filter.apply(BURL.parse("http://www.foo.com/a/b/c/index.jpg")));
+ }
+
+ @Test
+ public void testDuplicateSegments() throws ParseException {
+ FilterParser<URI> filterParser = new FilterParser<URI>(URI.class);
+ Filter<URI> filter = filterParser.parse("DuplicateSegmentsLessThan(3)");
+ System.out.println("TESTING: " + filter);
+ assertFalse(filter.apply(BURL.parse("http://example.com/a/a/a/a/a")));
+ assertFalse(filter.apply(BURL.parse("http://example.com/b/a/b/a/b/a/-")));
+ assertFalse(filter.apply(BURL.parse("http://example.com/a/b/a/a/a")));
+ assertTrue(filter.apply(BURL.parse("http://example.com/bbb/bbba/f/e")));
+ assertFalse(filter.apply(BURL.parse("http://example.com/l/lc/i/c/l/lc/p/i/c/l/lc/p/l/lc/i/c/l/lc/p/i/c/l/lc/p/i/c/l/lc/p/")));
+ assertTrue(filter.apply(BURL.parse("http://example.com/b/d/b/c/b/e")));
+ assertFalse(filter.apply(BURL.parse("http://example.com/b/b/b")));
+ assertFalse(filter.apply(BURL.parse("http://example.com/b/a/b/a/b/a/")));
+ assertFalse(filter.apply(BURL.parse("http://example.com/b/a/b/a/b/a/-")));
+ assertFalse(filter.apply(BURL.parse("http://example.com/foo/bar/foo/bar/foo/bar")));
+ assertTrue(filter.apply(BURL.parse("http://example.com/b/a/b/a/b/c/b/a/")));
+ assertTrue(filter.apply(BURL.parse("http://example.com/b/a/b/a/b/b/a/")));
+ assertTrue(filter.apply(BURL.parse("http://example.com/b/b")));
+ assertTrue(filter.apply(BURL.parse("http://a")));
+ assertTrue(filter.apply(BURL.parse("http://example.com/")));
+ assertTrue(filter.apply(BURL.parse("http://example.com/b")));
+ assertTrue(filter.apply(BURL.parse("http://example.com/b/")));
+ assertTrue(filter.apply(BURL.parse("http://example.com/b/b")));
+ assertTrue(filter.apply(BURL.parse("http://example.com/b/b/")));
+ assertFalse(filter.apply(BURL.parse("http://example.com/a/b/b/b")));
+ assertFalse(filter.apply(BURL.parse("http://example.com/a/b/a/c/a/c/a/c")));
+ assertFalse(filter.apply(BURL.parse("http://example.com/b/b/b/a")));
+ assertFalse(filter.apply(BURL.parse("http://example.com/b/a/d/b/a/d/b/a/d")));
+ assertFalse(filter.apply(BURL.parse("http://example.com/b/a/d/b/a/d/b/a/d/z")));
+ assertTrue(filter.apply(BURL.parse("http://example.com/b/b/a/b/b/a/b/a")));
+ assertFalse(filter.apply(BURL.parse("http://example.com/a/b/b/b")));
+ assertFalse(filter.apply(BURL.parse("http://example.com/c/b/b/b")));
+ }
+
+}
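`testBooleanComposition` and `testDuplicateSegments` pin down two behaviours:

* and/or/not composition of filters.
* Rejection of URLs whose path repeats a run of segments three or more times in a row, such as `/foo/bar/foo/bar/foo/bar`.

The sketch below reproduces both, using `java.util.function.Predicate` and a brute-force segment scan. It is a hypothetical illustration inferred from the test expectations. It is not LAW's `Filters` API or its `DuplicateSegmentsLessThan` implementation.

```java
import java.util.function.Predicate;

/** Hypothetical sketches of behaviours exercised by TestFilters; not LAW's Filter API. */
public class FilterSketches {

    /** The and/or/not composition used in testBooleanComposition. */
    public static Predicate<String> composite() {
        Predicate<String> startsWithA = s -> s.startsWith("a");
        Predicate<String> endsWithA = s -> s.endsWith("a");
        Predicate<String> endsWithB = s -> s.endsWith("b");
        Predicate<String> longerThan5 = s -> s.length() > 5;
        return startsWithA.and(endsWithA.or(endsWithB)).and(longerThan5.negate());
    }

    /**
     * Returns false if some run of p >= 1 consecutive path segments is repeated
     * n or more times back to back (e.g. "a/b" in /a/b/a/b/a/b), true otherwise.
     * Brute force, inferred from the test cases; not LAW's algorithm.
     */
    public static boolean duplicateSegmentsLessThan(String path, int n) {
        String trimmed = path.startsWith("/") ? path.substring(1) : path;
        // split drops trailing empty strings, so "/b/b/" gives [b, b]
        String[] s = trimmed.isEmpty() ? new String[0] : trimmed.split("/");
        for (int p = 1; p * n <= s.length; p++) {         // candidate period
            for (int i = 0; i + p * n <= s.length; i++) { // candidate start
                int reps = 1;
                for (int j = i + p; j + p <= s.length && sameBlock(s, i, j, p); j += p) reps++;
                if (reps >= n) return false;
            }
        }
        return true;
    }

    /** Whether s[i..i+p) equals s[j..j+p). */
    private static boolean sameBlock(String[] s, int i, int j, int p) {
        for (int t = 0; t < p; t++) if (!s[i + t].equals(s[j + t])) return false;
        return true;
    }
}
```

The scan tries every period and start position, so it is roughly cubic in the number of segments in the worst case. That is fine for short paths but may be slow for very long ones.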
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/io/DebugInputStream.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/io/DebugInputStream.java
new file mode 100644
index 0000000..e17f2a4
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/io/DebugInputStream.java
@@ -0,0 +1,86 @@
+package it.unimi.dsi.law.warc.io;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.util.Arrays;
+
+import org.apache.commons.lang.ArrayUtils;
+
+
+//RELEASE-STATUS: DIST
+
+
+/** An input stream that prints out some details for every call, used for debugging purposes. */
+
+@SuppressWarnings("boxing")
+public class DebugInputStream extends InputStream {
+
+ private final String name;
+ private final InputStream is;
+
+ public DebugInputStream(String name, InputStream is) {
+ this.name = name;
+ this.is = is;
+ }
+
+ public int available() throws IOException {
+ final int available = is.available();
+ System.err.printf("%s: available() -> %d%n", name, available);
+ return available;
+ }
+
+ public void close() throws IOException {
+ is.close();
+ }
+
+ public void mark(int readlimit) {
+ System.err.printf("%s: mark(%d)%n", name, readlimit);
+ is.mark(readlimit);
+ }
+
+ public boolean markSupported() {
+ return is.markSupported();
+ }
+
+ public int read() throws IOException {
+ final int read = is.read();
+ System.err.printf("%s: read() -> %d%n", name, read);
+ return read;
+ }
+
+ public int read(byte[] b, int off, int len) throws IOException {
+ final int read = is.read(b, off, len);
+ System.err.printf("%s: read(-, %d, %d) -> %d, %s%n", name, off, len, read, Arrays.toString(ArrayUtils.subarray(b, off, off + Math.max(read, 0))));
+ return read;
+ }
+
+ public void reset() throws IOException {
+ System.err.println(name + ": reset()");
+ is.reset();
+ }
+
+ public long skip(long n) throws IOException {
+ final long skip = is.skip(n);
+ System.err.printf("%s: skip(%d) -> %d%n", name, n, skip);
+ return skip;
+ }
+
+}
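`DebugInputStream` forwards every call to the wrapped stream and prints a line about it. Below is a sketch of the same decorator idea with two differences. It builds on `java.io.FilterInputStream`, which already forwards `available`, `mark`, `reset`, `markSupported` and `close`. It also records calls in a list so they can be asserted on rather than printed. `RecordingInputStream` is hypothetical.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical variant of DebugInputStream: instead of printing to System.err,
 * it records one line per read/skip call so the log can be inspected later.
 */
public class RecordingInputStream extends FilterInputStream {
    public final List<String> log = new ArrayList<>();

    public RecordingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        final int r = super.read();
        log.add("read() -> " + r);
        return r;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        // FilterInputStream.read(byte[]) also ends up here, with off = 0.
        final int r = super.read(b, off, len);
        log.add("read(-, " + off + ", " + len + ") -> " + r);
        return r;
    }

    @Override
    public long skip(long n) throws IOException {
        final long s = super.skip(n);
        log.add("skip(" + n + ") -> " + s);
        return s;
    }
}
```

For instance, take a `ByteArrayInputStream` over `{ 7, 8, 9 }` and call `read()`, `read(buf, 1, 3)` and `skip(5)` on the wrapper. The log then contains `read() -> 7`, `read(-, 1, 3) -> 2` and `skip(5) -> 0`.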
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/io/TestBoundedCountingInputStream.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/io/TestBoundedCountingInputStream.java
new file mode 100644
index 0000000..3e860ea
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/io/TestBoundedCountingInputStream.java
@@ -0,0 +1,147 @@
+package it.unimi.dsi.law.warc.io;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Random;
+
+import org.apache.poi.util.IOUtils;
+import org.junit.Test;
+
+import it.unimi.dsi.fastutil.io.FastByteArrayInputStream;
+
+
+
+//RELEASE-STATUS: DIST
+
+
+/** A class to test {@link BoundedCountingInputStream}. */
+
+public class TestBoundedCountingInputStream {
+
+ private static final Random r = new Random(0);
+
+ private static List<byte[]> byteArrays;
+ static {
+ byteArrays = new ArrayList<byte[]>();
+ byte[] b;
+ // Generate random byte buffers from 1 byte up to 64 KiB; the larger ones are added in decreasing size so that sizes are not monotonically increasing.
+ for (int k = 0; k < 10; k++) {
+ b = new byte[1 << k];
+ r.nextBytes(b);
+ byteArrays.add(b);
+ }
+ for (int k = 16; k >= 10; k--) {
+ b = new byte[1 << k];
+ r.nextBytes(b);
+ byteArrays.add(b);
+ }
+ byteArrays.add(new byte[] {});
+ byteArrays.add("This is a short\nnon empty and purely ASCII\nbyte sequence".getBytes());
+ }
+
+ @Test
+ public void testSequentialRead() throws IOException {
+ for (byte[] byteArray: byteArrays) {
+ //System.out.println("TESTING SEQUENTIAL READ FOR SIZE " + byteArray.length);
+ BoundedCountingInputStream is = new BoundedCountingInputStream(new FastByteArrayInputStream(byteArray), Long.MAX_VALUE); // Long.MAX_VALUE: effectively unbounded
+ FastByteArrayInputStream bs = new FastByteArrayInputStream(byteArray);
+ int bbs;
+ while ((bbs = bs.read()) != -1)
+ assertEquals(bbs, is.read());
+ assertEquals(is.read(), -1);
+ is.close();
+ bs.close();
+ assertEquals(byteArray.length, is.position());
+ }
+ }
+
+ @Test
+ public void testBoundedSequentialRead() throws IOException {
+ for (byte[] byteArray: byteArrays) {
+ //System.out.println("TESTING SEQUENTIAL READ FOR SIZE " + byteArray.length);
+ long bound = r.nextInt(1 + (int)(byteArray.length * 1.5));
+ BoundedCountingInputStream is = new BoundedCountingInputStream(new FastByteArrayInputStream(byteArray), bound);
+ FastByteArrayInputStream bs = new FastByteArrayInputStream(byteArray);
+ int bbs;
+ long i = bound;
+ while ((i > 0) && (bbs = bs.read()) != -1) {
+ assertEquals(bbs, is.read());
+ i--;
+ }
+ assertEquals(is.read(), -1);
+ is.close();
+ bs.close();
+ assertEquals(Math.min(bound, byteArray.length), is.position());
+ }
+ }
+
+ @Test
+ public void testReadBulk() throws IOException {
+ for (byte[] byteArray: byteArrays) {
+ //System.out.println("TESTING READ BULK FOR SIZE " + byteArray.length);
+ BoundedCountingInputStream is = new BoundedCountingInputStream(new FastByteArrayInputStream(byteArray), Long.MAX_VALUE);
+ FastByteArrayInputStream bs = new FastByteArrayInputStream(byteArray);
+ // Decide how many reads
+ int reads = r.nextInt(5);
+ for (int t = 0; t < reads; t++) {
+ byte[] bis = new byte[r.nextInt(1 + byteArray.length * 3 / 2)];
+ byte[] bbs = new byte[bis.length];
+ int offset = bis.length < 2 ? 0 : r.nextInt(bis.length / 2);
+ int length = bis.length - offset == 0? 0 : r.nextInt(bis.length - offset);
+ int res1 = IOUtils.readFully(is, bis, offset, length);
+ int res2 = IOUtils.readFully(bs, bbs, offset, length);
+ assertEquals(res1, res2);
+ for (int i = 0; i < Math.max(res1, 0); i++) {
+ assertEquals(bis[offset + i], bbs[offset + i]);
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testBoundedReadBulk() throws IOException {
+ for (byte[] byteArray: byteArrays) {
+ long bound = r.nextInt(1 + (int)(byteArray.length * 1.5));
+ //System.out.println("TESTING READ BULK FOR SIZE " + byteArray.length);
+ BoundedCountingInputStream is = new BoundedCountingInputStream(new FastByteArrayInputStream(byteArray), bound);
+ FastByteArrayInputStream bs = new FastByteArrayInputStream(byteArray);
+ // Decide how many reads
+ int reads = r.nextInt(5);
+ for (int t = 0; t < reads; t++) {
+ byte[] bis = new byte[r.nextInt(1 + byteArray.length * 3 / 2)];
+ byte[] bbs = new byte[bis.length];
+ int offset = bis.length < 2 ? 0 : r.nextInt(bis.length / 2);
+ int length = bis.length - offset == 0? 0 : r.nextInt(bis.length - offset);
+ length = (int)Math.min(length, bound - is.position());
+ int res1 = IOUtils.readFully(is, bis, offset, length);
+ int res2 = IOUtils.readFully(bs, bbs, offset, length);
+ assertEquals(res1, res2);
+ for (int i = 0; i < Math.max(res1, 0); i++) {
+ assertEquals(bis[offset + i], bbs[offset + i]);
+ }
+ }
+ }
+ }
+
+}
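The tests above compare `BoundedCountingInputStream` against a plain stream. Reads must stop after `bound` bytes, and `position()` must end up equal to `min(bound, length)`. Below is a minimal sketch of a stream with that contract, assuming it only needs to cap and count bytes. `BoundedCounter` is hypothetical and is not LAW's class.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical sketch, not LAW's BoundedCountingInputStream: returns at most
 * `bound` bytes from the wrapped stream and counts how many it has returned.
 */
public class BoundedCounter extends FilterInputStream {
    private final long bound;
    private long position;

    public BoundedCounter(InputStream in, long bound) {
        super(in);
        this.bound = bound;
    }

    /** Number of bytes read or skipped so far. */
    public long position() {
        return position;
    }

    @Override
    public int read() throws IOException {
        if (position >= bound) return -1; // bound reached: behave like end of stream
        final int b = super.read();
        if (b != -1) position++;
        return b;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (position >= bound) return -1;
        // Never ask the wrapped stream for bytes beyond the bound.
        final int n = super.read(b, off, (int) Math.min(len, bound - position));
        if (n > 0) position += n;
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        if (n <= 0) return 0;
        final long s = super.skip(Math.min(n, bound - position));
        position += s;
        return s;
    }

    @Override
    public int available() throws IOException {
        return (int) Math.min(super.available(), bound - position);
    }

    @Override
    public boolean markSupported() {
        return false; // mark/reset would desynchronize the position counter
    }

    @Override
    public synchronized void reset() throws IOException {
        throw new IOException("mark/reset not supported");
    }
}
```

The sketch clamps `len` to `bound - position` before delegating rather than trimming the result afterwards. As a result, the wrapped stream is never asked for bytes beyond the bound.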
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/io/TestGZWarcRecord.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/io/TestGZWarcRecord.java
new file mode 100644
index 0000000..ae59615
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/io/TestGZWarcRecord.java
@@ -0,0 +1,298 @@
+package it.unimi.dsi.law.warc.io;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.FileNotFoundException;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Random;
+
+import org.apache.commons.io.IOUtils;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+import it.unimi.dsi.fastutil.io.FastBufferedInputStream;
+import it.unimi.dsi.fastutil.io.FastBufferedOutputStream;
+import it.unimi.dsi.fastutil.io.FastByteArrayInputStream;
+import it.unimi.dsi.fastutil.io.FastByteArrayOutputStream;
+import it.unimi.dsi.fastutil.io.MeasurableInputStream;
+import it.unimi.dsi.law.bubing.util.MockResponses.MockRandomHttpResponse;
+import it.unimi.dsi.law.warc.io.WarcRecord.FormatException;
+import it.unimi.dsi.law.warc.util.AbstractHttpResponse;
+import it.unimi.dsi.law.warc.util.WarcHttpResponse;
+import it.unimi.dsi.logging.ProgressLogger;
+
+
+//RELEASE-STATUS: DIST
+
+
+/** A class to test {@link GZWarcRecord}. */
+
+public class TestGZWarcRecord {
+ private final static Logger LOGGER = LoggerFactory.getLogger(TestGZWarcRecord.class);
+
+ public static final boolean DEBUG = false;
+ private final static Random RND = new Random(0);
+
+ static public class MockGZWarcRecord extends GZWarcRecord {
+ private final static int DEFAULT_MAX_BLOCK_LENGTH = 1024;
+ private static int calls = 0;
+ private final byte[] blockBytes;
+ public MockGZWarcRecord() {
+ TestWarcRecord.rndFill(header, calls++);
+ blockBytes = TestWarcRecord.rndBlock(DEFAULT_MAX_BLOCK_LENGTH);
+ block = new FastByteArrayInputStream(blockBytes);
+ }
+ public MeasurableInputStream expectedBlock() {
+ return new FastByteArrayInputStream(blockBytes);
+ }
+ public MeasurableInputStream actualBlock() {
+ return block; // this will be overwritten by read
+ }
+ }
+
+ final static int NUM_WRS_TESTS = 100;
+
+ @Test
+ public void testRecord() throws IOException, FormatException {
+
+ System.err.print("Test on mock random gzip records: ");
+
+ /* A place to remember the written records. */
+
+ final ArrayList<MockGZWarcRecord> writtenRecords = new ArrayList<MockGZWarcRecord>();
+
+ /* write and remember */
+
+ final FastByteArrayOutputStream out = new FastByteArrayOutputStream();
+ for (int i = 0; i < NUM_WRS_TESTS; i++) {
+ final MockGZWarcRecord writtenRecord = new MockGZWarcRecord();
+ writtenRecord.write(out);
+ writtenRecords.add(writtenRecord);
+ System.err.print("w");
+ }
+ out.close();
+
+ System.err.print("/");
+
+ /* a place to read */
+
+ final MockGZWarcRecord readRecord = new MockGZWarcRecord();
+
+ /* read and compare */
+
+ FastBufferedInputStream in = new FastBufferedInputStream(new FastByteArrayInputStream(out.array, 0, out.length));
+ for (int i = 0; i < NUM_WRS_TESTS; i++) {
+
+ final MockGZWarcRecord writtenRecord = writtenRecords.get(i);
+
+ if (RND.nextBoolean()) { // read
+
+ readRecord.read(in);
+
+ if (DEBUG) System.err.println("\n" + readRecord.header + "\n" + writtenRecord.header + "\n" + readRecord.gzheader + "\n" + writtenRecord.gzheader);
+
+ assertEquals(readRecord.header, writtenRecord.header);
+
+ if (RND.nextBoolean()) { // partial read
+
+ final MeasurableInputStream block = readRecord.actualBlock();
+ readRecord.actualBlock().read(new byte[RND.nextInt((int)(block.length() / 2)) + 1]);
+
+ System.err.print("r");
+
+ } else { // consume block
+
+ assertTrue(IOUtils.contentEquals(writtenRecord.expectedBlock(), readRecord.actualBlock()));
+ System.err.print("R");
+
+ }
+
+ if (RND.nextBoolean()) { // checkCRC
+
+ readRecord.checkCRC(in);
+ System.err.print("c");
+
+ } else System.err.print(".");
+
+ } else { // skip
+
+ long length = readRecord.skip(in);
+ assertEquals(writtenRecord.gzheader.compressedSkipLength, length);
+ System.err.print("s");
+
+ }
+
+ }
+ in.close();
+
+ System.err.println(" done.");
+ }
+
+ @Test
+ public void testResponse() throws IOException, FormatException {
+
+ System.err.print("Test on mock random gzip responses: ");
+
+ /* A place to remember written HttpResponses and WarcRecords. */
+
+ final ArrayList<MockRandomHttpResponse> writtenResponses = new ArrayList<MockRandomHttpResponse>();
+ final ArrayList<GZWarcRecord> writtenGZRecords = new ArrayList<GZWarcRecord>();
+
+ /* write and remember */
+
+ final FastByteArrayOutputStream out = new FastByteArrayOutputStream();
+
+ for (int i = 0; i < NUM_WRS_TESTS; i++) {
+ final MockRandomHttpResponse writtenResponse = new MockRandomHttpResponse(RND);
+ final GZWarcRecord writtenRecord = new GZWarcRecord();
+ writtenResponses.add(writtenResponse);
+ writtenResponse.toWarcRecord(writtenRecord);
+ if (RND.nextBoolean()) {
+ writtenRecord.header.anvlFields.clear();
+ writtenRecord.header.anvlFields.put("anvl-test-key", "anvl-test-value");
+ }
+
+ writtenRecord.write(out);
+ writtenGZRecords.add(writtenRecord);
+ System.err.print("w");
+ }
+ out.close();
+
+ System.err.print("/");
+
+ /* a place to read */
+
+ WarcHttpResponse readResponse = new WarcHttpResponse();
+ GZWarcRecord readRecord = new GZWarcRecord();
+
+ /* read and compare */
+
+ FastBufferedInputStream in = new FastBufferedInputStream(new FastByteArrayInputStream(out.array, 0, out.length));
+ for (int i = 0; i < NUM_WRS_TESTS; i++) {
+
+ final GZWarcRecord writtenGZRercord = writtenGZRecords.get(i);
+
+ if (RND.nextBoolean()) { // read
+
+ readRecord.read(in);
+ assertEquals(writtenGZRercord.header, readRecord.header);
+ assertEquals(writtenGZRercord.gzheader, readRecord.gzheader);
+
+ if (RND.nextBoolean()) { // don't consume block
+
+ System.err.print("r");
+
+ } else { // consume block
+
+ final MockRandomHttpResponse writtenResponse = writtenResponses.get(i);
+ readResponse.fromWarcRecord(readRecord);
+ assertTrue(IOUtils.contentEquals(writtenResponse.expectedContentAsStream(), readResponse.contentAsStream()));
+
+ System.err.print("R");
+
+ }
+
+ } else { // skip
+
+ long length = readRecord.skip(in);
+ assertEquals(writtenGZRercord.gzheader.compressedSkipLength, length);
+ System.err.print("s");
+
+ }
+
+ }
+ in.close();
+
+ System.err.println(" done.");
+ }
+
+ final static int IO_BUFFER_SIZE = 64 * 1024;
+ public static void main(String[] arg) throws FileNotFoundException, IOException, FormatException, JSAPException {
+ SimpleJSAP jsap = new SimpleJSAP(TestGZWarcRecord.class.getName(), "GZWarcRecord performance test.",
+ new Parameter[] {
+ new UnflaggedOption("numPages", JSAP.INTEGER_PARSER, "10000", JSAP.REQUIRED, false, "The number of pages to write."),
+ new UnflaggedOption("maxPageSize", JSAP.INTEGER_PARSER, "1024", JSAP.REQUIRED, false, "The maximum size of page content."),
+ });
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ int numPages = jsapResult.getInt("numPages");
+ int maxPageSize = jsapResult.getInt("maxPageSize");
+
+ File tmp = File.createTempFile("warctest-", ".gz");
+ FastBufferedOutputStream out = new FastBufferedOutputStream(new FileOutputStream(tmp), IO_BUFFER_SIZE);
+
+ GZWarcRecord gzwr = new GZWarcRecord();
+
+ ProgressLogger pl = new ProgressLogger(LOGGER, "responses");
+ pl.start("Generating/Writing gzip mock responses (in '" + tmp + "')...");
+ for (int i = 0; i < numPages; i++) {
+ AbstractHttpResponse r = new MockRandomHttpResponse(RND, maxPageSize);
+ r.toWarcRecord(gzwr);
+ gzwr.write(out);
+ pl.update();
+ }
+ pl.done();
+ out.close();
+
+ FastBufferedInputStream in = new FastBufferedInputStream(new FileInputStream(tmp), IO_BUFFER_SIZE);
+ WarcHttpResponse whr = new WarcHttpResponse();
+ pl.start("Reading gzip responses...");
+ gzwr.resetRead();
+ for (int i = 0; i < numPages; i++) {
+ gzwr.read(in);
+ whr.fromWarcRecord(gzwr);
+ pl.update();
+ }
+ pl.done();
+ in.close();
+
+ pl.itemsName = "records";
+ in = new FastBufferedInputStream(new FileInputStream(tmp), IO_BUFFER_SIZE);
+ pl.start("Skipping gzip records...");
+ gzwr.resetRead();
+ for (int i = 0; i < numPages; i++) {
+ gzwr.skip(in);
+ pl.update();
+ }
+ pl.done();
+ in.close();
+
+ }
+
+
+
+}
+
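In `testRecord` and `testResponse`, skipping a record is expected to return `gzheader.compressedSkipLength`. This suggests that each record is stored as a separate gzip member whose compressed length is recorded, so a reader can jump over a record without inflating it. The sketch below illustrates that layout with plain `java.util.zip`. `GzipMembers` is hypothetical and does not reproduce the GZWarcRecord header format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/**
 * Hypothetical sketch of a "one gzip member per record" file; the compressed
 * length of each member is kept on the side instead of in a gzip header field.
 */
public class GzipMembers {
    public final ByteArrayOutputStream file = new ByteArrayOutputStream();
    public final List<Integer> compressedLengths = new ArrayList<>();

    /** Appends one record as a self-contained gzip member. */
    public void append(String record) throws IOException {
        final ByteArrayOutputStream member = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(member)) {
            gz.write(record.getBytes(StandardCharsets.UTF_8));
        } // closing writes the gzip trailer
        compressedLengths.add(member.size());
        member.writeTo(file);
    }

    /** Skips the first `index` members by compressed length, then inflates one. */
    public String read(int index) throws IOException {
        final byte[] all = file.toByteArray();
        int offset = 0;
        for (int i = 0; i < index; i++) offset += compressedLengths.get(i); // no inflation
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(
                new ByteArrayInputStream(all, offset, compressedLengths.get(index)))) {
            final byte[] buf = new byte[4096];
            for (int n; (n = gz.read(buf)) != -1; ) out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Jumping to record `index` only sums the stored lengths; no preceding member is decompressed. This is the property the `s` (skip) branches of the tests rely on.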
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/io/TestInspectableBufferedInputStream.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/io/TestInspectableBufferedInputStream.java
new file mode 100644
index 0000000..ade0626
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/io/TestInspectableBufferedInputStream.java
@@ -0,0 +1,327 @@
+package it.unimi.dsi.law.warc.io;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+
+import java.io.ByteArrayInputStream;
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Random;
+
+import org.apache.poi.util.IOUtils;
+import org.junit.Test;
+
+import it.unimi.dsi.fastutil.io.FastByteArrayInputStream;
+
+
+
+//RELEASE-STATUS: DIST
+
+
+/** A class to test {@link InspectableBufferedInputStream}. */
+
+@SuppressWarnings("boxing")
+public class TestInspectableBufferedInputStream {
+
+ private static final Random r = new Random(0);
+
+ /** A byte array input stream that will return its data in small chunks,
+ * even if it could actually return more data.
+ */
+
+ public static class BastardByteArrayInputStream extends FastByteArrayInputStream {
+ private final static int BASTARD_LIMIT = 42;
+
+ public BastardByteArrayInputStream(byte[] array) {
+ super(array);
+ }
+
+ @Override
+ public int read(byte[] buffer, int offset, int length) {
+ return super.read(buffer, offset, length < BASTARD_LIMIT ? length : BASTARD_LIMIT);
+ }
+
+ }
+
+ public static List<byte[]> byteArrays;
+ static {
+ byteArrays = new ArrayList<byte[]>();
+ byte[] b;
+ // Generate random byte buffers from 1 byte up to 64 KiB; the larger ones are added in decreasing size so that sizes are not monotonically increasing.
+ for (int k = 0; k < 10; k++) {
+ b = new byte[1 << k];
+ r.nextBytes(b);
+ byteArrays.add(b);
+ }
+ for (int k = 16; k >= 10; k--) {
+ b = new byte[1 << k];
+ r.nextBytes(b);
+ byteArrays.add(b);
+ }
+ byteArrays.add(new byte[] {});
+ byteArrays.add("This is a short\nnon empty and purely ASCII\nbyte sequence".getBytes());
+ }
+
+
+ @Test
+ public void testSequentialRead() throws IOException {
+ InspectableBufferedInputStream is = new InspectableBufferedInputStream(1024); // Use 1KiB buffer
+ for (byte[] byteArray: byteArrays) {
+ System.out.printf("TESTING SEQUENTIAL READ FOR SIZE %d\n", byteArray.length);
+ is.connect(new BastardByteArrayInputStream(byteArray));
+ @SuppressWarnings("resource")
+ BastardByteArrayInputStream bs = new BastardByteArrayInputStream(byteArray);
+ int bbs;
+ while ((bbs = bs.read()) != -1)
+ assertEquals(bbs, is.read());
+ assertEquals(-1, is.read());
+ is.close();
+ }
+ }
+
+ @Test
+ public void testReadFullyTotal() throws IOException {
+ InspectableBufferedInputStream is = new InspectableBufferedInputStream(1024); // Use 1KiB buffer
+ for (byte[] byteArray: byteArrays) {
+ System.out.printf("TESTING FULL READ FOR SIZE %d\n", byteArray.length);
+ is.connect(new BastardByteArrayInputStream(byteArray));
+ is.fillAndRewind();
+ @SuppressWarnings("resource")
+ BastardByteArrayInputStream bs = new BastardByteArrayInputStream(byteArray);
+ int bbs;
+ while ((bbs = bs.read()) != -1)
+ assertEquals(bbs, is.read());
+ assertEquals(-1, is.read());
+ assertEquals(byteArray.length, is.readBytes());
+ is.close();
+ }
+ }
+
+ @Test
+ public void testReadFullyPartial() throws IOException {
+ InspectableBufferedInputStream is = new InspectableBufferedInputStream(1024); // Use 1KiB buffer
+ for (byte[] byteArray: byteArrays) {
+ is.truncate(0);
+ if (byteArray.length == 0) continue;
+ int readJust = r.nextInt(byteArray.length);
+ System.out.printf("TESTING FULL READ FOR SIZE %d, LIMITED AT %d\n", byteArray.length, readJust);
+ is.connect(new BastardByteArrayInputStream(byteArray));
+ is.fill(readJust);
+ is.rewind();
+ @SuppressWarnings("resource")
+ BastardByteArrayInputStream bs = new BastardByteArrayInputStream(byteArray);
+ int bbs;
+ int read = 0;
+ while (read < readJust && (bbs = bs.read()) != -1) {
+ assertEquals(bbs, is.read());
+ read++;
+ }
+ assertTrue(is.overflowLength() <= readJust);
+ is.close();
+ }
+ }
+
+ @Test
+ public void testMultipleSequentialRead() throws IOException {
+ InspectableBufferedInputStream is = new InspectableBufferedInputStream(1024); // Use 1KiB buffer
+ for (byte[] byteArray: byteArrays) {
+ System.out.printf("TESTING MULTIPLE SEQUENTIAL READ FOR SIZE %d (>=total,<=partial): ", byteArray.length);
+ is.connect(new BastardByteArrayInputStream(byteArray));
+
+ // Number of passes over the stream (each one starts with a rewind)
+ int k = r.nextInt(10);
+ for (int s = 0; s < k; s++) {
+ is.rewind();
+ @SuppressWarnings("resource")
+ BastardByteArrayInputStream bs = new BastardByteArrayInputStream(byteArray);
+ int bbs;
+ if (r.nextDouble() < .5) {
+ System.out.print(">");
+ // Read to EOF
+ while ((bbs = bs.read()) != -1)
+ assertEquals(bbs, is.read());
+ assertEquals(-1, is.read());
+ } else {
+ // Read only partially
+ int howManyBytes = r.nextInt(byteArray.length + 1);
+ System.out.print("<");
+ for (int t = 0; t < howManyBytes; t++)
+ assertEquals(bs.read(), is.read());
+ }
+ }
+ System.out.println();
+ is.close();
+ }
+ }
+
+ @Test
+ public void testTruncate() throws IOException {
+ // Creates a temporary directory
+ File tempDir = File.createTempFile("mydir", null);
+ tempDir.delete();
+ tempDir.mkdir();
+ //tempDir.deleteOnExit();
+ InspectableBufferedInputStream is = new InspectableBufferedInputStream(1024, tempDir); // Use 1KiB buffer
+ for (byte[] byteArray: byteArrays) {
+ System.out.printf("TESTING truncate() FOR SIZE %d (+=truncation): ", byteArray.length);
+ // Reads is 100 bytes at a time up to its end, comparing each chunk with the reference stream bs
+ is.connect(new BastardByteArrayInputStream(byteArray));
+ @SuppressWarnings("resource")
+ BastardByteArrayInputStream bs = new BastardByteArrayInputStream(byteArray);
+ byte[] bis = new byte[100];
+ byte[] bbs = new byte[100];
+ int readFromIs, readFromBis;
+ while ((readFromIs = is.read(bis)) > 0) {
+ int from = 0, howManyLeft = readFromIs;
+ do {
+ readFromBis = bs.read(bbs, from, howManyLeft);
+ from += readFromBis;
+ howManyLeft -= readFromBis;
+ } while (howManyLeft > 0);
+ for (int i = 0; i < readFromIs; i++)
+ assertEquals(bbs[i], bis[i]);
+ }
+ is.close();
+ //if (1 > 0) return;
+ File t = is.overflowFile;
+ if (r.nextDouble() < .9) {
+ long newSize = r.nextInt(1 + (int)(t.length() * 2));
+ System.out.print("+");
+ is.truncate(newSize);
+ assertEquals(1, tempDir.listFiles().length); // There should be exactly one temporary file here
+ System.out.println("t.length()=" + t.length() + ", newSize=" + newSize);
+ assertTrue(t.length() <= newSize);
+ System.out.printf("(%d<=%d)", t.length(), newSize);
+ }
+ System.out.println();
+ }
+ }
+
+ @Test
+ public void testDirectInspection() throws IOException {
+ InspectableBufferedInputStream is = new InspectableBufferedInputStream(1024); // Use 1KiB buffer
+ for (byte[] byteArray: byteArrays) {
+ System.out.printf("TESTING DIRECT INSPECTION FOR SIZE %d\n", byteArray.length);
+ is.connect(new BastardByteArrayInputStream(byteArray));
+ // Read some random number of bytes
+ int toBeRead = r.nextInt(byteArray.length + 1), read, from;
+ byte[] bis = new byte[toBeRead];
+ from = 0;
+ do {
+ read = is.read(bis, from, toBeRead);
+ if (read < 0) break;
+ from += read;
+ toBeRead -= read;
+ } while (toBeRead > 0);
+ // Test that the bytes read so far (from of them) match the source array
+ for (int c = 0; c < from; c++) assertEquals(byteArray[c], bis[c]);
+ // Now test the bytes that can be inspected directly in the buffer
+ for (int c = 0; c < is.inspectable; c++) assertEquals(byteArray[c], is.buffer[c]);
+ }
+
+ is.close();
+ }
+
+ @Test
+ public void testReadBulk() throws IOException {
+ InspectableBufferedInputStream is = new InspectableBufferedInputStream(1024); // Use 1KiB buffer
+ for (byte[] byteArray: byteArrays) {
+ System.out.printf("TESTING READ BULK FOR SIZE %d (+=read,R=rewind): ", byteArray.length);
+ is.connect(new BastardByteArrayInputStream(byteArray));
+ FastByteArrayInputStream bs = new FastByteArrayInputStream(byteArray);
+ // Decide how many reads
+ int reads = r.nextInt(5);
+ for (int t = 0; t < reads; t++) {
+ System.out.print("+");
+ byte[] bis = new byte[r.nextInt(1 + byteArray.length * 3 / 2)];
+ byte[] bbs = new byte[bis.length];
+ int offset = bis.length < 2? 0 : r.nextInt(bis.length / 2);
+ int length = bis.length - offset == 0? 0 : r.nextInt(bis.length - offset);
+ int res1 = IOUtils.readFully(is, bis, offset, length);
+ int res2 = IOUtils.readFully(bs, bbs, offset, length);
+ if (r.nextDouble() < .1) {
+ // In about 10% of the attempts, we rewind
+ System.out.print("R");
+ is.rewind();
+ bs = new BastardByteArrayInputStream(byteArray);
+ }
+ assertEquals(res1, res2);
+ for (int i = 0; i < Math.max(res1, 0); i++) {
+ assertEquals(bis[offset + i], bbs[offset + i]);
+ }
+ }
+ System.out.println();
+ }
+ }
+
+ @Test
+ public void testLength() throws IOException {
+ InspectableBufferedInputStream is = new InspectableBufferedInputStream(1024); // Use 1KiB buffer
+ for (byte[] byteArray: byteArrays) {
+ if (byteArray.length == 0) continue; // skip size 0
+ System.out.printf("TESTING LENGTH FOR SIZE %d", byteArray.length);
+ is.connect(new BastardByteArrayInputStream(byteArray));
+ // Choose whether to perform some read or not
+ if (r.nextBoolean())
+ // Choose whether to read it completely or not
+ if (r.nextBoolean()) {
+ is.fill(Long.MAX_VALUE);
+ System.out.printf(", filled\n");
+ }
+ else {
+ int toBeRead = r.nextInt(byteArray.length * 2 + 1), i;
+ for (i = 0; i < toBeRead; i++)
+ if (is.read() < 0) break;
+ System.out.printf(", read %d bytes%s\n", i, i < toBeRead? "" : " (up to eof)");
+ }
+ else System.out.print(", no read\n");
+ assertEquals(byteArray.length, is.length());
+ }
+ is.close();
+ }
+
+ @Test
+ public void testFillAndRewind() throws IOException {
+ InspectableBufferedInputStream is = new InspectableBufferedInputStream(1000); // Use a 1000-byte buffer
+ is.connect(new ByteArrayInputStream(new byte[3000]));
+ is.fill(2000);
+ is.rewind();
+ final long length = is.length();
+ while(is.read() != -1);
+ assertEquals(length, is.length());
+ is.close();
+ }
+
+ @Test
+ public void testLengthDoesNotAlterState() throws IOException {
+ InspectableBufferedInputStream is = new InspectableBufferedInputStream(1000); // Use a 1000-byte buffer
+ is.connect(new ByteArrayInputStream(new byte[3000]));
+ is.fill(2000);
+ is.rewind();
+ is.length();
+ assertEquals(0, is.position());
+ is.close();
+ }
+
+
+}
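`BastardByteArrayInputStream` caps every bulk `read` at 42 bytes, which is why the comparison loop in `testTruncate` keeps calling `read` until the requested count is reached, and why the bulk tests go through a "read fully" helper. The sketch below reproduces that pattern using only `java.io`. The names `ChunkedInputStream`, `ReadFullySketch` and `readFully` are hypothetical and are not part of the library. This `readFully` returns 0, not -1, when the stream is already exhausted. Check the Apache POI `IOUtils.readFully` documentation for the exact convention the tests above rely on.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stream returning at most {@code limit} bytes per bulk read, like BastardByteArrayInputStream. */
class ChunkedInputStream extends ByteArrayInputStream {
    private final int limit;

    ChunkedInputStream(byte[] data, int limit) {
        super(data);
        this.limit = limit;
    }

    @Override
    public synchronized int read(byte[] b, int off, int len) {
        // Deliberately returns a short read even when more data is available.
        return super.read(b, off, Math.min(len, limit));
    }
}

final class ReadFullySketch {
    private ReadFullySketch() {}

    /**
     * Reads up to len bytes into b starting at off, looping over short reads.
     * Returns the number of bytes read, which is less than len only at end of stream.
     */
    static int readFully(InputStream in, byte[] b, int off, int len) throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(b, off + total, len - total);
            if (n < 0) break; // end of stream
            total += n;
        }
        return total;
    }
}
```

A single `read` on `ChunkedInputStream` returns at most `limit` bytes, while `readFully` keeps going until it has the requested length or hits the end of the data.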
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/io/TestMeasurableSequenceInputStream.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/io/TestMeasurableSequenceInputStream.java
new file mode 100644
index 0000000..68d035a
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/io/TestMeasurableSequenceInputStream.java
@@ -0,0 +1,153 @@
+package it.unimi.dsi.law.warc.io;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Random;
+
+import org.apache.poi.util.IOUtils;
+import org.junit.Test;
+
+import it.unimi.dsi.fastutil.io.FastByteArrayInputStream;
+
+
+
+//RELEASE-STATUS: DIST
+
+
+/** A class to test {@link MeasurableSequenceInputStream}. */
+
+public class TestMeasurableSequenceInputStream {
+
+ private static final Random r = new Random(0);
+
+ private static List<byte[]> byteArrays;
+ static {
+ byteArrays = new ArrayList<byte[]>();
+ byte[] b;
+ // Generates byte buffers from 1 byte up to 64KiB; the larger ones are added in decreasing order, so that sizes are not monotonically increasing...
+ for (int k = 0; k < 10; k++) {
+ b = new byte[1 << k];
+ r.nextBytes(b);
+ byteArrays.add(b);
+ }
+ for (int k = 16; k >= 10; k--) {
+ b = new byte[1 << k];
+ r.nextBytes(b);
+ byteArrays.add(b);
+ }
+ byteArrays.add(new byte[] {});
+ byteArrays.add("This is a short\nnon empty and purely ASCII\nbyte sequence".getBytes());
+ }
+
+ @Test
+ public void testSequentialRead() throws IOException {
+ for (byte[] byteArray: byteArrays) {
+ System.out.println("TESTING SEQUENTIAL, ONE INPUTSTREAM, READ FOR SIZE " + byteArray.length);
+ MeasurableSequenceInputStream is = new MeasurableSequenceInputStream(new FastByteArrayInputStream(byteArray));
+ FastByteArrayInputStream bs = new FastByteArrayInputStream(byteArray);
+ int bbs;
+ while ((bbs = bs.read()) != -1) {
+ assertEquals(bbs, is.read());
+ assertEquals(bs.position(), is.position());
+ }
+ assertEquals(-1, is.read());
+ is.close();
+ bs.close();
+ assertEquals(byteArray.length, is.position());
+ }
+ }
+
+ @Test
+ public void testSequentialSequenceRead() throws IOException {
+ for (byte[] byteArray: byteArrays) {
+ System.out.println("TESTING SEQUENTIAL, TWO INPUTSTREAMS, READ FOR SIZE " + byteArray.length);
+ MeasurableSequenceInputStream is = new MeasurableSequenceInputStream(null, new FastByteArrayInputStream(byteArray), null, new FastByteArrayInputStream(byteArray), null);
+ byte[] doubleByteArray = new byte[2 * byteArray.length];
+ System.arraycopy(byteArray, 0, doubleByteArray, 0, byteArray.length);
+ System.arraycopy(byteArray, 0, doubleByteArray, byteArray.length, byteArray.length);
+ FastByteArrayInputStream bs = new FastByteArrayInputStream(doubleByteArray);
+ int bbs;
+ while ((bbs = bs.read()) != -1) {
+ assertEquals(bbs, is.read());
+ assertEquals(bs.position(), is.position());
+ }
+ assertEquals(-1, is.read());
+ is.close();
+ bs.close();
+ assertEquals(doubleByteArray.length, is.position());
+ }
+ }
+
+ @Test
+ public void testReadBulk() throws IOException {
+ for (byte[] byteArray: byteArrays) {
+ System.out.println("TESTING READ BULK, ONE INPUTSTREAM, FOR SIZE " + byteArray.length);
+ MeasurableSequenceInputStream is = new MeasurableSequenceInputStream(new FastByteArrayInputStream(byteArray));
+ FastByteArrayInputStream bs = new FastByteArrayInputStream(byteArray);
+ // Decide how many reads
+ int reads = r.nextInt(5);
+ for (int t = 0; t < reads; t++) {
+ byte[] bis = new byte[r.nextInt(1 + byteArray.length * 3 / 2)];
+ byte[] bbs = new byte[bis.length];
+ int offset = bis.length < 2 ? 0 : r.nextInt(bis.length / 2);
+ int length = bis.length - offset == 0? 0 : r.nextInt(bis.length - offset);
+ int res1 = IOUtils.readFully(is, bis, offset, length);
+ int res2 = IOUtils.readFully(bs, bbs, offset, length);
+ assertEquals(res1, res2);
+ assertEquals(bs.position(), is.position());
+ for (int i = 0; i < Math.max(res1, 0); i++) {
+ assertEquals(bis[offset + i], bbs[offset + i]);
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testReadSequentialBulk() throws IOException {
+ for (byte[] byteArray: byteArrays) {
+ System.out.println("TESTING READ BULK, TWO INPUTSTREAMS, FOR SIZE " + byteArray.length);
+ MeasurableSequenceInputStream is = new MeasurableSequenceInputStream(null, new FastByteArrayInputStream(byteArray), null, new FastByteArrayInputStream(byteArray), null);
+ byte[] doubleByteArray = new byte[2 * byteArray.length];
+ System.arraycopy(byteArray, 0, doubleByteArray, 0, byteArray.length);
+ System.arraycopy(byteArray, 0, doubleByteArray, byteArray.length, byteArray.length);
+ FastByteArrayInputStream bs = new FastByteArrayInputStream(doubleByteArray);
+ // Decide how many reads
+ int reads = r.nextInt(5);
+ for (int t = 0; t < reads; t++) {
+ byte[] bis = new byte[r.nextInt(1 + doubleByteArray.length * 3 / 2)];
+ byte[] bbs = new byte[bis.length];
+ int offset = bis.length < 2 ? 0 : r.nextInt(bis.length / 2);
+ int length = bis.length - offset == 0? 0 : r.nextInt(bis.length - offset);
+ int res1 = IOUtils.readFully(is, bis, offset, length);
+ int res2 = IOUtils.readFully(bs, bbs, offset, length);
+ assertEquals(res1, res2);
+ assertEquals(bs.position(), is.position());
+ for (int i = 0; i < Math.max(res1, 0); i++) {
+ assertEquals(bis[offset + i], bbs[offset + i]);
+ }
+ }
+ }
+ }
+
+}
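The two-stream tests build a `MeasurableSequenceInputStream` with `null` entries interleaved between the real streams, and they expect `position()` to count bytes across the whole concatenation. The sketch below models only that observable behaviour with plain `java.io` classes. It assumes `null` entries are simply skipped. The class name `PositionedSequenceStream` is hypothetical, and the library's own class offers more than is modelled here (for example `length()`).

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical concatenation of streams that ignores null entries and tracks the number of bytes returned. */
class PositionedSequenceStream extends InputStream {
    private final List<InputStream> streams = new ArrayList<>();
    private int current = 0;
    private long position = 0;

    PositionedSequenceStream(InputStream... in) {
        for (InputStream s : in) {
            if (s != null) streams.add(s); // null entries are ignored
        }
    }

    /** Number of bytes returned so far across all underlying streams. */
    long position() {
        return position;
    }

    @Override
    public int read() throws IOException {
        while (current < streams.size()) {
            int b = streams.get(current).read();
            if (b != -1) {
                position++;
                return b;
            }
            current++; // this stream is exhausted: move on to the next one
        }
        return -1;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) return 0;
        while (current < streams.size()) {
            int n = streams.get(current).read(b, off, len);
            if (n < 0) {
                current++; // exhausted: try the next stream
                continue;
            }
            position += n;
            return n; // a bulk read never spans two underlying streams
        }
        return -1;
    }

    @Override
    public void close() throws IOException {
        for (InputStream s : streams) s.close();
    }
}
```

Because a bulk read stops at the boundary between two streams, callers that need an exact byte count must loop, which is what the `readFully` calls in `testReadSequentialBulk` do.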
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/io/TestWarcRecord.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/io/TestWarcRecord.java
new file mode 100644
index 0000000..d150858
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/io/TestWarcRecord.java
@@ -0,0 +1,347 @@
+package it.unimi.dsi.law.warc.io;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+
+import java.io.ByteArrayOutputStream;
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.FileNotFoundException;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Date;
+import java.util.Random;
+import java.util.UUID;
+
+import org.apache.commons.io.IOUtils;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+import it.unimi.dsi.fastutil.io.FastBufferedInputStream;
+import it.unimi.dsi.fastutil.io.FastBufferedOutputStream;
+import it.unimi.dsi.fastutil.io.FastByteArrayInputStream;
+import it.unimi.dsi.fastutil.io.FastByteArrayOutputStream;
+import it.unimi.dsi.fastutil.io.MeasurableInputStream;
+import it.unimi.dsi.law.bubing.util.BURL;
+import it.unimi.dsi.law.bubing.util.MockResponses.MockRandomHttpResponse;
+import it.unimi.dsi.law.warc.io.WarcRecord.ContentType;
+import it.unimi.dsi.law.warc.io.WarcRecord.FormatException;
+import it.unimi.dsi.law.warc.io.WarcRecord.RecordType;
+import it.unimi.dsi.law.warc.util.AbstractHttpResponse;
+import it.unimi.dsi.law.warc.util.WarcHttpResponse;
+import it.unimi.dsi.logging.ProgressLogger;
+
+
+//RELEASE-STATUS: DIST
+
+
+/** A class to test {@link WarcRecord}. */
+
+@SuppressWarnings("deprecation")
+public class TestWarcRecord {
+ private final static Logger LOGGER = LoggerFactory.getLogger(TestWarcRecord.class);
+
+ public static final boolean DEBUG = false;
+ private final static Random RND = new Random(0);
+
+ public final static <E extends Enum<E>> E rndEnum(E[] values) {
+ return values[RND.nextInt(values.length)];
+ }
+
+ public final static void rndFill(WarcRecord.Header header, int calls) {
+ header.dataLength = -1;
+ header.recordType = rndEnum(RecordType.values());
+ header.subjectUri = BURL.parse("http" + (RND.nextBoolean() ? "s" : "") + "://this.is/n" + calls + "/test.html");
+ header.creationDate = new Date();
+ header.contentType = rndEnum(ContentType.values());
+ header.recordId = UUID.randomUUID();
+ if (RND.nextBoolean()) {
+ header.anvlFields.clear();
+ header.anvlFields.put("anvl-test-key", "anvl-test-value");
+ }
+ }
+
+ public final static byte[] rndBlock(int maxLen) {
+ byte[] blockBytes = new byte[RND.nextInt(maxLen) + 1];
+ RND.nextBytes(blockBytes);
+ return blockBytes;
+ }
+
+ static public class MockWarcRecord extends WarcRecord {
+ private final static int DEFAULT_MAX_BLOCK_LENGTH = 1024;
+ private static int calls = 0;
+ private final byte[] blockBytes;
+ public MockWarcRecord() {
+ rndFill(header, calls++);
+ blockBytes = rndBlock(DEFAULT_MAX_BLOCK_LENGTH);
+ block = new FastByteArrayInputStream(blockBytes);
+ }
+ public MeasurableInputStream expectedBlock() {
+ return new FastByteArrayInputStream(blockBytes);
+ }
+ public MeasurableInputStream actualBlock() {
+ return block; // this will be overwritten by read
+ }
+ }
+
+ final static int NUM_SIZE_TESTS = 100;
+
+ @Test
+ public void testSize() throws IOException {
+ ByteArrayOutputStream out = new ByteArrayOutputStream();
+ WarcRecord wr = new WarcRecord();
+
+ System.err.print("Size test on mock random responses: ");
+ for (int rep = 0; rep < NUM_SIZE_TESTS; rep++) {
+ out.reset();
+ AbstractHttpResponse r = new MockRandomHttpResponse(RND);
+ r.toWarcRecord(wr);
+ if (RND.nextBoolean()) {
+ wr.header.anvlFields.clear();
+ wr.header.anvlFields.put("anvl-test-key", "anvl-test-value");
+ }
+ wr.write(out);
+ byte[] written = out.toByteArray();
+ int i = WarcRecord.WARC_ID.length + 1, size = 0;
+ while (written[i] != ' ') {
+ size = 10 * size + (written[i] - '0');
+ i++;
+ }
+ if (DEBUG) System.out.println(out.toString() + "\nActual size: " + out.size() + "\nParsed size: " + size);
+ assertEquals(out.size(), size);
+ System.err.print(".");
+ }
+ System.err.println(" done.");
+ }
+
+ final static int NUM_WRS_TESTS = 100;
+
+ @Test
+ public void testRecord() throws IOException, FormatException {
+
+ System.err.print("Test on mock random records: ");
+
+ /* a place to remember written records. */
+
+ final ArrayList<MockWarcRecord> writtenRecords = new ArrayList<MockWarcRecord>();
+
+ /* write and remember */
+
+ final FastByteArrayOutputStream out = new FastByteArrayOutputStream();
+ for (int i = 0; i < NUM_WRS_TESTS; i++) {
+ final MockWarcRecord writtenRecord = new MockWarcRecord();
+ writtenRecord.write(out);
+ writtenRecords.add(writtenRecord);
+ System.err.print("w");
+ }
+ out.close();
+
+ System.err.print("/");
+
+ /* a place to read */
+
+ final MockWarcRecord readRecord = new MockWarcRecord();
+
+ /* read and compare */
+
+ FastBufferedInputStream in = new FastBufferedInputStream(new FastByteArrayInputStream(out.array, 0, out.length));
+ for (int i = 0; i < NUM_WRS_TESTS; i++) {
+
+ final MockWarcRecord writtenRecord = writtenRecords.get(i);
+
+ if (RND.nextBoolean()) { // read
+
+ readRecord.read(in);
+
+ if (DEBUG) System.err.println("\n" + readRecord.header + "\n" + writtenRecord.header);
+
+ assertEquals(writtenRecord.header, readRecord.header);
+
+ if (RND.nextBoolean()) { // don't consume block
+
+ final MeasurableInputStream block = readRecord.actualBlock();
+ block.read(new byte[RND.nextInt((int)(block.length() / 2) + 1) + 1]); // nextInt(0) would throw for a 1-byte block
+
+ System.err.print("r");
+
+ } else { // consume block
+
+ assertTrue(IOUtils.contentEquals(writtenRecord.expectedBlock(), readRecord.actualBlock()));
+ System.err.print("R");
+
+ }
+
+ } else { // skip
+
+ long length = readRecord.skip(in);
+ assertEquals(writtenRecord.header.dataLength, length);
+
+ System.err.print("s");
+
+ }
+
+ }
+ in.close();
+
+ System.err.println(" done.");
+ }
+
+ @Test
+ public void testResponse() throws IOException, FormatException {
+
+ System.err.print("Test on mock random responses: ");
+
+ /* a place to remember written HttpResponses and WarcRecords. */
+
+ final ArrayList<MockRandomHttpResponse> writtenResponses = new ArrayList<MockRandomHttpResponse>();
+ final ArrayList<WarcRecord> writtenRecords = new ArrayList<WarcRecord>();
+
+ /* write and remember */
+
+ final FastByteArrayOutputStream out = new FastByteArrayOutputStream();
+
+ for (int i = 0; i < NUM_WRS_TESTS; i++) {
+ final MockRandomHttpResponse writtenResponse = new MockRandomHttpResponse(RND);
+ final WarcRecord writtenRecord = new WarcRecord();
+ writtenResponses.add(writtenResponse);
+ writtenResponse.toWarcRecord(writtenRecord);
+ if (RND.nextBoolean()) {
+ writtenRecord.header.anvlFields.clear();
+ writtenRecord.header.anvlFields.put("anvl-test-key", "anvl-test-value");
+ }
+ writtenRecord.write(out);
+ writtenRecords.add(writtenRecord);
+ System.err.print("w");
+ }
+ out.close();
+
+ System.err.print("/");
+
+ /* a place to read */
+
+ WarcHttpResponse readResponse = new WarcHttpResponse();
+ WarcRecord readRecord = new WarcRecord();
+
+ /* read and compare */
+
+ FastBufferedInputStream in = new FastBufferedInputStream(new FastByteArrayInputStream(out.array, 0, out.length));
+ for (int i = 0; i < NUM_WRS_TESTS; i++) {
+
+ final WarcRecord writtenRecord = writtenRecords.get(i);
+
+ if (RND.nextBoolean()) { // read
+
+ readRecord.read(in);
+ assertEquals(writtenRecord.header, readRecord.header);
+
+ if (RND.nextBoolean()) { // don't consume block
+
+ System.err.print("r");
+
+ } else { // consume block
+
+ final MockRandomHttpResponse writtenResponse = writtenResponses.get(i);
+ readResponse.fromWarcRecord(readRecord);
+ assertTrue(IOUtils.contentEquals(writtenResponse.expectedContentAsStream(), readResponse.contentAsStream()));
+
+ System.err.print("R");
+
+ }
+
+ } else { // skip
+
+ long length = readRecord.skip(in);
+ assertEquals(writtenRecord.header.dataLength, length);
+ System.err.print("s");
+
+ }
+
+ }
+ in.close();
+
+ System.err.println(" done.");
+ }
+
+ final static int IO_BUFFER_SIZE = 64 * 1024;
+ public static void main(String[] arg) throws FileNotFoundException, IOException, FormatException, JSAPException {
+ SimpleJSAP jsap = new SimpleJSAP(TestWarcRecord.class.getName(), "WarcRecord performance test.",
+ new Parameter[] {
+ new UnflaggedOption("numPages", JSAP.INTEGER_PARSER, "10000", JSAP.REQUIRED, false, "The number of pages to write."),
+ new UnflaggedOption("maxPageSize", JSAP.INTEGER_PARSER, "1024", JSAP.REQUIRED, false, "The maximum size of page content."),
+ });
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ int numPages = jsapResult.getInt("numPages");
+ int maxPageSize = jsapResult.getInt("maxPageSize");
+
+ File tmp = File.createTempFile("warctest-", null);
+ FastBufferedOutputStream out = new FastBufferedOutputStream(new FileOutputStream(tmp), IO_BUFFER_SIZE);
+
+ WarcRecord wr = new WarcRecord();
+
+ ProgressLogger pl = new ProgressLogger(LOGGER, "responses");
+ pl.start("Generating/Writing mock responses...");
+ for (int i = 0; i < numPages; i++) {
+ AbstractHttpResponse r = new MockRandomHttpResponse(RND, maxPageSize);
+ r.toWarcRecord(wr);
+ wr.write(out);
+ pl.update();
+ }
+ pl.done();
+ out.close();
+
+ FastBufferedInputStream in = new FastBufferedInputStream(new FileInputStream(tmp), IO_BUFFER_SIZE);
+ WarcHttpResponse whr = new WarcHttpResponse();
+ pl.start("Reading responses...");
+ wr.resetRead();
+ for (int i = 0; i < numPages; i++) {
+ wr.read(in);
+ whr.fromWarcRecord(wr);
+ IOUtils.toByteArray(whr.contentAsStream());
+ pl.update();
+ }
+ pl.done();
+ in.close();
+
+ pl.itemsName = "records";
+ in = new FastBufferedInputStream(new FileInputStream(tmp), IO_BUFFER_SIZE);
+ pl.start("Skipping records...");
+ wr.resetRead();
+ for (int i = 0; i < numPages; i++) {
+ wr.skip(in);
+ pl.update();
+ }
+ pl.done();
+ in.close();
+ }
+
+}
+
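`testSize` recovers the record size written in the header by scanning the ASCII digits that follow `WarcRecord.WARC_ID` and one separator byte, and it stops at the next space. The sketch below isolates that digit-accumulation loop. It assumes an ASCII header in which the size field ends with a space, and it takes the start offset as a parameter instead of deriving it from `WARC_ID`. `SizeFieldSketch` and `parseSizeField` are hypothetical names. The header strings in the test are made up and are not real WARC records. Unlike the loop in `testSize`, the sketch rejects non-digit bytes and unterminated fields. It does not check for overflow.

```java
final class SizeFieldSketch {
    private SizeFieldSketch() {}

    /**
     * Parses the decimal number that starts at {@code start} and ends at the next space,
     * accumulating digits the same way testSize does. Throws IllegalArgumentException on a
     * non-digit byte or when no terminating space is found. No overflow check is performed.
     */
    static int parseSizeField(byte[] header, int start) {
        int size = 0;
        int i = start;
        while (i < header.length && header[i] != ' ') {
            int digit = header[i] - '0';
            if (digit < 0 || digit > 9) throw new IllegalArgumentException("Not a digit at offset " + i);
            size = 10 * size + digit;
            i++;
        }
        if (i == header.length) throw new IllegalArgumentException("Unterminated size field");
        return size;
    }
}
```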
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/io/WarcParallelOutputStreamTest.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/io/WarcParallelOutputStreamTest.java
new file mode 100644
index 0000000..d02ce83
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/io/WarcParallelOutputStreamTest.java
@@ -0,0 +1,108 @@
+package it.unimi.dsi.law.warc.io;
+
+/*
+ * Copyright (C) 2012-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+
+import java.io.IOException;
+import java.net.URI;
+
+import org.apache.commons.io.IOUtils;
+import org.apache.http.Header;
+import org.apache.http.ProtocolVersion;
+import org.apache.http.message.BasicStatusLine;
+import org.junit.Test;
+
+import it.unimi.dsi.fastutil.io.FastBufferedInputStream;
+import it.unimi.dsi.fastutil.io.FastBufferedOutputStream;
+import it.unimi.dsi.fastutil.io.FastByteArrayInputStream;
+import it.unimi.dsi.fastutil.io.FastByteArrayOutputStream;
+import it.unimi.dsi.io.NullOutputStream;
+import it.unimi.dsi.law.bubing.util.MockResponses.MockHttpResponseFromString;
+import it.unimi.dsi.law.warc.io.TestGZWarcRecord.MockGZWarcRecord;
+import it.unimi.dsi.law.warc.io.TestWarcRecord.MockWarcRecord;
+import it.unimi.dsi.law.warc.io.WarcRecord.FormatException;
+
+//RELEASE-STATUS: DIST
+
+public class WarcParallelOutputStreamTest {
+ public static final boolean DEBUG = false;
+
+ @Test
+ public void testRecord() throws IOException, FormatException, InterruptedException {
+
+
+ for(boolean gzip: new boolean[] { true, false }) {
+ final FastByteArrayOutputStream out = new FastByteArrayOutputStream();
+ final WarcParallelOutputStream warcParallelOutputStream = new WarcParallelOutputStream(out, gzip);
+ final Thread thread[] = new Thread[4];
+
+ for(int i = 0; i < thread.length; i++)
+ (thread[i] = new Thread(Integer.toString(i)) {
+ public void run() {
+ final int index = Integer.parseInt(getName());
+ for (int i = index * (1000 / thread.length); i < (index + 1) * (1000 / thread.length); i++) {
+ try {
+ final WarcRecord warcRecord = warcParallelOutputStream.acquire();
+ MockHttpResponseFromString response = new MockHttpResponseFromString(new BasicStatusLine(new ProtocolVersion("HTTP", 1, 1), 200, "OK"), new Header[0], URI.create("http://example.com/" + (1000 + i)), "X");
+ response.toWarcRecord(warcRecord);
+ warcParallelOutputStream.release(warcRecord);
+ } catch(Exception e) {}
+ }
+ }
+ }).start();
+
+
+ for(Thread t: thread) t.join();
+ warcParallelOutputStream.close();
+ out.close();
+
+ final WarcRecord warcRecord = new WarcRecord();
+ MockHttpResponseFromString response = new MockHttpResponseFromString(new BasicStatusLine(new ProtocolVersion("HTTP", 1, 1), 200, "OK"), new Header[0], URI.create("http://example.com/" + 1000), "X");
+ response.toWarcRecord(warcRecord);
+ warcRecord.write(new FastBufferedOutputStream(NullOutputStream.getInstance()));
+
+ FastBufferedInputStream in = new FastBufferedInputStream(new FastByteArrayInputStream(out.array, 0, out.length));
+ final boolean found[] = new boolean[1000];
+ if (gzip) {
+ final MockGZWarcRecord readRecord = new MockGZWarcRecord();
+ for (int i = 0; i < 1000; i++) {
+ readRecord.read(in);
+ assertEquals(warcRecord.header.dataLength, readRecord.header.dataLength);
+ found[Integer.parseInt(readRecord.header.subjectUri.getPath().substring(1)) - 1000] = true;
+ IOUtils.contentEquals(warcRecord.block, readRecord.actualBlock());
+ readRecord.checkCRC(in);
+ }
+ }
+ else {
+ final MockWarcRecord readRecord = new MockWarcRecord();
+ for (int i = 0; i < 1000; i++) {
+ readRecord.read(in);
+ assertEquals(warcRecord.header.dataLength, readRecord.header.dataLength);
+ found[Integer.parseInt(readRecord.header.subjectUri.getPath().substring(1)) - 1000] = true;
+ IOUtils.contentEquals(warcRecord.block, readRecord.actualBlock());
+ }
+ }
+ in.close();
+
+ for(int i = 1000; i-- != 0;) assertTrue(Integer.toString(i), found[i]);
+ }
+ }
+}
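In `WarcParallelOutputStreamTest.testRecord`, each of the four writer threads handles the index range `[index * (1000 / thread.length), (index + 1) * (1000 / thread.length))`. The final loop over `found` then checks that every index reached the output. The sketch below reproduces that partition-and-verify pattern with a thread-safe set standing in for the WARC output. `PartitionSketch` and `run` are hypothetical names. This range formula covers every index only when the item count is divisible by the thread count, as 1000 and 4 are in the test. With 1001 items and 4 threads, index 1000 is never produced.

```java
import java.util.BitSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class PartitionSketch {
    private PartitionSketch() {}

    /**
     * Starts numThreads workers, each recording the indices of its slice
     * [index * slice, (index + 1) * slice), waits for all of them, and returns the indices seen.
     */
    static BitSet run(int numItems, int numThreads) throws InterruptedException {
        Set<Integer> seen = ConcurrentHashMap.newKeySet(); // stands in for the shared output stream
        Thread[] threads = new Thread[numThreads];
        int slice = numItems / numThreads; // integer division: any remainder is silently dropped
        for (int t = 0; t < numThreads; t++) {
            final int index = t;
            threads[t] = new Thread(() -> {
                for (int i = index * slice; i < (index + 1) * slice; i++) seen.add(i);
            });
            threads[t].start();
        }
        for (Thread thread : threads) thread.join();
        BitSet found = new BitSet(numItems);
        for (int i : seen) found.set(i);
        return found;
    }
}
```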
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/parser/TestDigester.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/parser/TestDigester.java
new file mode 100644
index 0000000..d92b221
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/parser/TestDigester.java
@@ -0,0 +1,303 @@
+package it.unimi.dsi.law.warc.parser;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+
+import java.io.IOException;
+import java.net.URI;
+import java.security.MessageDigest;
+import java.security.NoSuchAlgorithmException;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Random;
+
+import org.apache.http.ProtocolVersion;
+import org.apache.http.StatusLine;
+import org.apache.http.message.BasicStatusLine;
+import org.junit.Test;
+
+import it.unimi.dsi.fastutil.io.FastBufferedInputStream;
+import it.unimi.dsi.fastutil.io.FastByteArrayInputStream;
+import it.unimi.dsi.fastutil.io.MeasurableInputStream;
+import it.unimi.dsi.law.bubing.util.BURL;
+import it.unimi.dsi.law.warc.io.WarcRecord;
+import it.unimi.dsi.law.warc.util.AbstractHttpResponse;
+
+
+
+//RELEASE-STATUS: DIST
+
+@SuppressWarnings("deprecation")
+public class TestDigester {
+
+ public static class FakeHttpResponse extends AbstractHttpResponse {
+ MeasurableInputStream in;
+ URI uri;
+ final StatusLine STATUS_LINE = new BasicStatusLine(new ProtocolVersion("HTTP", 1, 0), 200, "OK");
+ protected FakeHttpResponse(URI uri, MeasurableInputStream in) {
+ this.uri = uri;
+ this.in = in;
+ }
+ public int status() { return 200; }
+ public StatusLine statusLine() { return STATUS_LINE; }
+ public Map<String, String> headers() { return new HashMap<String,String>(); }
+ public MeasurableInputStream contentAsStream() throws IOException { return in; }
+ public URI uri() { return uri; }
+ public boolean fromWarcRecord(WarcRecord wr) throws IOException { throw new UnsupportedOperationException(); }
+
+ }
+
+ public final static String document1 =
+ "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Strict//EN\" \"http://www.w3.org/TR/REC-html40/strict.dtd\">\n" +
+ "\n" +
+ "<html>\n" +
+ "<head>\n" +
+ "<style type=\"text/css\">\n" +
+ "@import \"/css/content.php\";\n" +
+ "@import \"/css/layout.php\";\n" +
+ "</style>" +
+ "<title id=\"mamma\" special-type=\"li turchi\">Sebastiano Vigna</title>\n" +
+ "</HEAD>\n" +
+ "<boDY>\n" +
+ "<div id=header>:::Sebastiano Vigna</div>" +
+ "<div id=left>\n" +
+ "<ul id=\"left-nav\">" +
+ "<br>Bye bye baby\n" +
+ "<img SRc=\"but I'm ignoring this one\"> and not this one\n" +
+ "\n\n even whitespace counts \n\n" +
+ "<frame SRC=\"http://www.GOOGLE.com/\">The frame source counts</frame>\n" +
+ "<iframe SRC=\"http://www.GOOGLE.com/\">And so does the iframe source</iframe>\n" +
+ "</body>\n" +
+ "</html>";
+
+ public final static String document2Like1 =
+ "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Strict//EN\" \"http://www.w3.org/TR/REC-html40/strict.dtd\">\n" +
+ "\n" +
+ "<html>\n" +
+ "<head>\n" +
+ "<style type=\"text/css\">\n" +
+ "@import \"/css/kxxx.php\";\n" + // Change, not relevant
+ "@import \"/css/layout.php\";\n" +
+ "</style>" +
+ "<tiTLE id=\"mummu\" special-type=\"liturchi\">Sebastiano Vigna</title>\n" + // Change, not relevant
+ "</HEAD>\n" +
+ "<boDY>\n" +
+ "<div id=header>:::Sebastiano Vigna</div>" +
+ "<div id=left>\n" +
+ "<ul id=\"left-nav\">" +
+ "<br>Bye bye baby\n" +
+ "<img SRc=\"but I'm ignoring xxxxediqne\"> and not this one\n" + // Change, not relevant
+ "\n\n even whitespace counts \n\n" +
+ "<frame SRC=\"http://www.GOOGLE.com/\">The frame source counts</frame>\n" +
+ "<iframe SRC=\"http://www.GOOGLE.com/\">And so does the iframe source</iframe>\n" +
+ "</body>\n" +
+ "</html>";
+
+ public final static String document3Unlike1 =
+ "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Strict//EN\" \"http://www.w3.org/TR/REC-html40/strict.dtd\">\n" +
+ "\n" +
+ "<html>\n" +
+ "<head>\n" +
+ "<style type=\"text/css\">\n" +
+ "@import \"/css/content.php\";\n" +
+ "@import \"/css/layout.php\";\n" +
+ "</style>" +
+ "<title id=\"mamma\" special-type=\"li turchi\">Sebastiano Vigna</title>\n" +
+ "</HEAD>\n" +
+ "<boDY>\n" +
+ "<div id=header>:::Sebastiano Vigna</div>" +
+ "<div id=left>\n" +
+ "<ul id=\"left-nav\">" +
+ "<br>Bye THIS IS A DIFFERENCE IN THE TEXT bye baby\n" +
+ "<img SRc=\"but I'm ignoring this one\"> and not this one\n" +
+ "\n\n even whitespace counts \n\n" +
+ "<frame SRC=\"http://www.GOOGLE.com/\">The frame source counts</frame>\n" +
+ "<iframe SRC=\"http://www.GOOGLE.com/\">And so does the iframe source</iframe>\n" +
+ "</body>\n" +
+ "</html>";
+
+ public final static String document4Unlike1 =
+ "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Strict//EN\" \"http://www.w3.org/TR/REC-html40/strict.dtd\">\n" +
+ "\n" +
+ "<html>\n" +
+ "<head>\n" +
+ "<style type=\"text/css\">\n" +
+ "@import \"/css/content.php\";\n" +
+ "@import \"/css/layout.php\";\n" +
+ "</style>" +
+ "<title id=\"mamma\" special-type=\"li turchi\">Sebastiano Vigna</title>\n" +
+ "</HEAD>\n" +
+ "<boDY>\n" +
+ "<div id=header>:::Sebastiano Vigna</div>" +
+ "<div id=left>\n" +
+ "<ul id=\"left-nav\">" +
+ "<br>Bye bye baby\n " + //A SMALL DIFFERENCE: just a whitespace
+ "<img SRc=\"but I'm ignoring this one\"> and not this one\n" +
+ "\n\n even whitespace counts \n\n" +
+ "<frame SRC=\"http://www.GOOGLE.com/\">The frame source counts</frame>\n" +
+ "<iframe SRC=\"http://www.GOOGLE.com/\">And so does the iframe source</iframe>\n" +
+ "</body>\n" +
+ "</html>";
+
+ public final static String document5Unlike1 =
+ "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Strict//EN\" \"http://www.w3.org/TR/REC-html40/strict.dtd\">\n" +
+ "\n" +
+ "<html>\n" +
+ "<head>\n" +
+ "<style type=\"text/css\">\n" +
+ "@import \"/css/content.php\";\n" +
+ "@import \"/css/layout.php\";\n" +
+ "</style>" +
+ "<title id=\"mamma\" special-type=\"li turchi\">Sebastiano Vigna</title>\n" +
+ "</HEAD>\n" +
+ "<boDY>\n" +
+ "<div id=header>:::Sebastiano Vigna</div>" +
+ "<div id=left>\n" +
+ "<ul id=\"left-nav\">" +
+ "<br>Bye bye baby\n" +
+ "<img SRc=\"but I'm ignoring this one\"> and not this one\n" +
+ "\n\n even whitespace counts \n\n" +
+ "<frame SRC=\"a/aFrameSource\">The frame source counts</frame>\n" + // A difference in the source should count!
+ "<iframe SRC=\"http://www.GOOGLE.com/\">And so does the iframe source</iframe>\n" +
+ "</body>\n" +
+ "</html>";
+
+ public final static String document6Like5 = // Should be the same as document5Unlike1, if URL of the latter is xxx/a and of this is xxx
+ "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Strict//EN\" \"http://www.w3.org/TR/REC-html40/strict.dtd\">\n" +
+ "\n" +
+ "<html>\n" +
+ "<head>\n" +
+ "<style type=\"text/css\">\n" +
+ "@import \"/css/content.php\";\n" +
+ "@import \"/css/layout.php\";\n" +
+ "</style>" +
+ "<title id=\"mamma\" special-type=\"li turchi\">Sebastiano Vigna</title>\n" +
+ "</HEAD>\n" +
+ "<boDY>\n" +
+ "<div id=header>:::Sebastiano Vigna</div>" +
+ "<div id=left>\n" +
+ "<ul id=\"left-nav\">" +
+ "<br>Bye bye baby\n" +
+ "<img SRc=\"but I'm ignoring this one\"> and not this one\n" +
+ "\n\n even whitespace counts \n\n" +
+ "<frame SRC=\"aFrameSource\">The frame source counts</frame>\n" + // A difference in the source should count!
+ "<iframe SRC=\"http://www.GOOGLE.com/\">And so does the iframe source</iframe>\n" +
+ "</body>\n" +
+ "</html>";
+
+ public final static String document7prefix =
+ "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Strict//EN\" \"http://www.w3.org/TR/REC-html40/strict.dtd\">\n" +
+ "\n" +
+ "<html>\n" +
+ "<head>\n" +
+ "<style type=\"text/css\">\n" +
+ "@import \"/css/content.php\";\n" +
+ "@import \"/css/layout.php\";\n" +
+ "</style>" +
+ "<title id=\"mamma\" special-type=\"li turchi\">Sebastiano Vigna</title>\n" +
+ "</HEAD>\n" +
+ "<boDY>\n" +
+ "<div id=header>:::Sebastiano Vigna</div>" +
+ "<div id=left>\n";
+
+ public final static String document7suffix =
+ "<ul id=\"left-nav\">" +
+ "<br>Bye bye baby\n" +
+ "<img SRc=\"but I'm ignoring this one\"> and not this one\n" +
+ "\n\n even whitespace counts \n\n" +
+ "<frame SRC=\"http://www.GOOGLE.com/\">The frame source counts</frame>\n" +
+ "<iframe SRC=\"http://www.GOOGLE.com/\">And so does the iframe source</iframe>\n" +
+ "</body>\n" +
+ "</html>";
+
+
+ private static String[] allDocs = { document1, document2Like1, document3Unlike1, document4Unlike1, document5Unlike1, document6Like5 };
+ private static String[] allURLs = { "http://vigna.dsi.unimi.it/xxx/yyy/a.html", "http://vigna.dsi.unimi.it/", "http://vigna.dsi.unimi.it/bbb", "http://vigna.dsi.unimi.it/bbb.php", "http://vigna.dsi.unimi.it/a", "http://vigna.dsi.unimi.it/" };
+
+ @Test
+ public void testDocument1() throws NoSuchAlgorithmException, IOException {
+ HTMLParser parser = new HTMLParser(MessageDigest.getInstance("MD5"));
+
+ byte[][] allDigests = new byte[allDocs.length][];
+
+ for (int i = 0; i < allDocs.length; i++) {
+ allDigests[i] = parser.parse(new FakeHttpResponse(BURL.parse(allURLs[i]), new FastBufferedInputStream(new FastByteArrayInputStream(allDocs[i].getBytes()))), Parser.NULL_LINK_RECEIVER);
+ }
+ assertTrue(Arrays.equals(allDigests[0], allDigests[1]));
+ assertFalse(Arrays.equals(allDigests[0], allDigests[2]));
+ assertFalse(Arrays.equals(allDigests[0], allDigests[3]));
+ assertFalse(Arrays.equals(allDigests[0], allDigests[4]));
+ /* FIXME: the next assertion currently fails because the Digester no longer resolves relative
+ * frame/iframe SRC attributes against the page URL; uncomment it once that feature is re-implemented.
+ */
+ //assertTrue(Arrays.equals(allDigests[4], allDigests[5]));
+ }
+
+ public void assertSameDigest(String a, String b) throws NoSuchAlgorithmException, IOException {
+ assertDigest(BURL.parse("http://a"), a, BURL.parse("http://a"), b, true);
+ }
+
+ public void assertDifferentDigest(String a, String b) throws NoSuchAlgorithmException, IOException {
+ assertDigest(BURL.parse("http://a"), a, BURL.parse("http://a"), b, false);
+ }
+
+ public void assertDigest(URI prefixa, String a, URI prefixb, String b, boolean equal) throws NoSuchAlgorithmException, IOException {
+ HTMLParser parser = new HTMLParser(MessageDigest.getInstance("MD5"));
+ final byte[] digest0 = parser.parse(new FakeHttpResponse(prefixa, new FastBufferedInputStream(new FastByteArrayInputStream(a.getBytes()))), Parser.NULL_LINK_RECEIVER);
+ final byte[] digest1 = parser.parse(new FakeHttpResponse(prefixb, new FastBufferedInputStream(new FastByteArrayInputStream(b.getBytes()))), Parser.NULL_LINK_RECEIVER);
+ assertEquals(Boolean.valueOf(Arrays.equals(digest0, digest1)), Boolean.valueOf(equal));
+ }
+
+ @Test
+ public void testDifferent() throws NoSuchAlgorithmException, IOException {
+ assertDifferentDigest("a", "b");
+ assertDifferentDigest("<a>", "<i>");
+ assertDifferentDigest("<foo>", "</foo>");
+ assertDifferentDigest("<frame src=a>", "<frame src=b>");
+ assertDifferentDigest("<iframe src=a>", "<iframe src=b>");
+ assertDigest(BURL.parse("http://a"), "x", BURL.parse("http://b"), "x", false);
+ }
+
+ @Test
+ public void testSame() throws NoSuchAlgorithmException, IOException {
+ assertSameDigest("<a b>", "<a c>");
+ assertSameDigest("<foo>", "<bar>");
+ assertSameDigest("<foo >", "<foo >");
+ assertSameDigest("<img src=a>", "<img src=b>");
+ assertSameDigest("<i>ciao mamma</i>", "<I>ciao mamma</I>");
+ assertDigest(BURL.parse("http://a"), "x", BURL.parse("http://a"), "x", true);
+ }
+
+ @Test
+ public void testLongDocument() throws NoSuchAlgorithmException, IOException {
+ Random r = new Random(0);
+ StringBuilder sb = new StringBuilder();
+ for (int i = 0; i < HTMLParser.CHAR_BUFFER_SIZE * (2 + r.nextInt(3)); i++) sb.append((char)(64 + r.nextInt(61)));
+ final String document7 = document7prefix + sb.toString() + document7suffix;
+ assertSameDigest(document7, document7);
+ sb.setCharAt(sb.length() / 2, (char)(sb.charAt(sb.length() / 2) + 1));
+ assertDifferentDigest(document7, document7prefix + sb.toString() + document7suffix);
+ }
+
+
+}
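The TestDigester cases above pin down which edits should and should not change a page digest: attributes and the case of tag names are ignored, text (whitespace included) counts, the `src` of `frame` and `iframe` counts, and so does the page URI. The following is a toy, regex-based sketch of a digest with those properties. It is not how LAW's `HTMLParser` computes its digest, the class `ToyPageDigest` and its methods are hypothetical, and it does not reproduce every case (for instance, `testSame` expects `<foo>` and `<bar>` to digest identically, which this sketch does not).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy page digest: a hypothetical illustration, not the LAW HTMLParser algorithm. */
final class ToyPageDigest {
    // Opening or closing tag: group 1 = "/" or "", group 2 = tag name, group 3 = attributes.
    private static final Pattern TAG = Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9-]*)([^>]*)>");
    // src attribute with a double-quoted, single-quoted or unquoted value.
    private static final Pattern SRC =
            Pattern.compile("(?i)\\bsrc\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s>]+))");

    private ToyPageDigest() {}

    /** Lower-cases tag names and drops every attribute except frame/iframe src; text is kept verbatim. */
    static String normalize(String html) {
        Matcher tag = TAG.matcher(html);
        StringBuilder out = new StringBuilder(html.length());
        int last = 0;
        while (tag.find()) {
            out.append(html, last, tag.start()); // text between tags, whitespace included, counts
            String name = tag.group(2).toLowerCase(Locale.ROOT);
            out.append('<').append(tag.group(1)).append(name);
            if (name.equals("frame") || name.equals("iframe")) {
                Matcher src = SRC.matcher(tag.group(3));
                if (src.find()) {
                    String value = src.group(1) != null ? src.group(1)
                            : src.group(2) != null ? src.group(2) : src.group(3);
                    out.append(" src=").append(value);
                }
            }
            out.append('>');
            last = tag.end();
        }
        out.append(html, last, html.length());
        return out.toString();
    }

    /** MD5 of the page URI followed by the normalized content. */
    static byte[] digest(String uri, String html) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            md5.update(uri.getBytes(StandardCharsets.UTF_8));
            md5.update((byte) 0); // separator so URI and content cannot run together
            md5.update(normalize(html).getBytes(StandardCharsets.UTF_8));
            return md5.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    static boolean sameDigest(String uriA, String a, String uriB, String b) {
        return Arrays.equals(digest(uriA, a), digest(uriB, b));
    }

    public static void main(String[] args) {
        System.out.println(sameDigest("http://a", "<img src=a>", "http://a", "<img src=b>"));     // true
        System.out.println(sameDigest("http://a", "<frame src=a>", "http://a", "<frame src=b>")); // false
    }
}
```

Feeding the URI into the digest is what makes identical content fetched from two different URLs digest differently, as the last assertion of `testDifferent` requires.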
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/parser/TestParserUtil.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/parser/TestParserUtil.java
new file mode 100644
index 0000000..169973e
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/parser/TestParserUtil.java
@@ -0,0 +1,191 @@
+package it.unimi.dsi.law.warc.parser;
+
+/*
+ * Copyright (C) 2004-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This library is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU Lesser General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+
+import org.junit.Test;
+
+
+
+//RELEASE-STATUS: DIST
+
+public class TestParserUtil {
+
+ private static final String documentNoMeta =
+ "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Strict//EN\" \"http://www.w3.org/TR/REC-html40/strict.dtd\">\n" +
+ "\n" +
+ "<html>\n" +
+ "<head>\n" +
+ "<style type=\"text/css\">\n" +
+ "@import \"/css/content.php\";\n" +
+ "@import \"/css/layout.php\";\n" +
+ "</style>" +
+ "<title id=\"mamma\" special-type=\"li turchi\">Sebastiano Vigna</title>\n" +
+ "</HEAD>\n" +
+ "<boDY>\n" +
+ "<div id=header>:::Sebastiano Vigna</div>" +
+ "<div id=left>\n" +
+ "<ul id=\"left-nav\">" +
+ "<br>Bye bye baby\n" +
+ "<img SRc=\"but I'm ignoring this one\"> and not this one\n" +
+ "\n\n even whitespace counts \n\n" +
+ "<frame SRC=\"http://www.GOOGLE.com/\">The frame source counts</frame>\n" +
+ "<iframe SRC=\"http://www.GOOGLE.com/\">And so does the iframe source</iframe>\n" +
+ "</body>\n" +
+ "</html>";
+
+ private static final String documentMetaNeverClosed =
+ "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Strict//EN\" \"http://www.w3.org/TR/REC-html40/strict.dtd\">\n" +
+ "\n" +
+ "<html>\n" +
+ "<head>\n" +
+ "<style type=\"text/css\">\n" +
+ "@import \"/css/content.php\";\n" +
+ "@import \"/css/layout.php\";\n" +
+ "</style>" +
+ "<META" +
+ "<title id=\"mamma\" special-type=\"li turchi\">Sebastiano Vigna</title>\n" +
+ "</HEAD>\n" +
+ "<boDY>\n" +
+ "<div id=header>:::Sebastiano Vigna</div>" +
+ "<div id=left>\n" +
+ "<ul id=\"left-nav\">" +
+ "<br>Bye bye baby\n" +
+ "<img SRc=\"but I'm ignoring this one\"> and not this one\n" +
+ "\n\n even whitespace counts \n\n" +
+ "<frame SRC=\"http://www.GOOGLE.com/\">The frame source counts</frame>\n" +
+ "<iframe SRC=\"http://www.GOOGLE.com/\">And so does the iframe source</iframe>\n" +
+ "</body>\n" +
+ "</html>";
+
+ private static final String documentMetaNeverClosed2 =
+ "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Strict//EN\" \"http://www.w3.org/TR/REC-html40/strict.dtd\">\n" +
+ "\n" +
+ "<html>\n" +
+ "<head>\n" +
+ "<style type=\"text/css\">\n" +
+ "@import \"/css/content.php\";\n" +
+ "@import \"/css/layout.php\";\n" +
+ "</style>" +
+ "<META http-equiv =cacca" +
+ "<title id=\"mamma\" special-type=\"li turchi\">Sebastiano Vigna</title>\n" +
+ "</HEAD>\n" +
+ "<boDY>\n" +
+ "<div id=header>:::Sebastiano Vigna</div>" +
+ "<div id=left>\n" +
+ "<ul id=\"left-nav\">" +
+ "<br>Bye bye baby\n" +
+ "<img SRc=\"but I'm ignoring this one\"> and not this one\n" +
+ "\n\n even whitespace counts \n\n" +
+ "<frame SRC=\"http://www.GOOGLE.com/\">The frame source counts</frame>\n" +
+ "<iframe SRC=\"http://www.GOOGLE.com/\">And so does the iframe source</iframe>\n" +
+ "</body>\n" +
+ "</html>";
+
+ private static final String documentMetaIsutf_8 =
+ "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Strict//EN\" \"http://www.w3.org/TR/REC-html40/strict.dtd\">\n" +
+ "\n" +
+ "<html>\n" +
+ "<head>\n" +
+ "<style type=\"text/css\">\n" +
+ "@import \"/css/content.php\";\n" +
+ "@import \"/css/layout.php\";\n" +
+ "</style>" +
+ "<meta http-equiv=\"Content-Type\" content=\"text/html;charset=utf-8\" >" +
+ "<title id=\"mamma\" special-type=\"li turchi\">Sebastiano Vigna</title>\n" +
+ "</HEAD>\n" +
+ "<boDY>\n" +
+ "<div id=header>:::Sebastiano Vigna</div>" +
+ "<div id=left>\n" +
+ "<ul id=\"left-nav\">" +
+ "<br>Bye bye baby\n" +
+ "<img SRc=\"but I'm ignoring this one\"> and not this one\n" +
+ "\n\n even whitespace counts \n\n" +
+ "<frame SRC=\"http://www.GOOGLE.com/\">The frame source counts</frame>\n" +
+ "<iframe SRC=\"http://www.GOOGLE.com/\">And so does the iframe source</iframe>\n" +
+ "</body>\n" +
+ "</html>";
+
+ private static final String documentMetaIsutf_8ButNotClosed =
+ "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Strict//EN\" \"http://www.w3.org/TR/REC-html40/strict.dtd\">\n" +
+ "\n" +
+ "<html>\n" +
+ "<head>\n" +
+ "<style type=\"text/css\">\n" +
+ "@import \"/css/content.php\";\n" +
+ "@import \"/css/layout.php\";\n" +
+ "</style>" +
+ "<meta http-equiv=\"Content-Type\" content=\"text/html;charset=utf-8" +
+ "<title id=\"mamma\" special-type=\"li turchi\">Sebastiano Vigna</title>\n" +
+ "</HEAD>\n" +
+ "<boDY>\n" +
+ "<div id=header>:::Sebastiano Vigna</div>" +
+ "<div id=left>\n" +
+ "<ul id=\"left-nav\">" +
+ "<br>Bye bye baby\n" +
+ "<img SRc=\"but I'm ignoring this one\"> and not this one\n" +
+ "\n\n even whitespace counts \n\n" +
+ "<frame SRC=\"http://www.GOOGLE.com/\">The frame source counts</frame>\n" +
+ "<iframe SRC=\"http://www.GOOGLE.com/\">And so does the iframe source</iframe>\n" +
+ "</body>\n" +
+ "</html>";
+
+ private static final String documentMetaIsutf_8dAndSomething =
+ "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01 Strict//EN\" \"http://www.w3.org/TR/REC-html40/strict.dtd\">\n" +
+ "\n" +
+ "<html>\n" +
+ "<head>\n" +
+ "<style type=\"text/css\">\n" +
+ "@import \"/css/content.php\";\n" +
+ "@import \"/css/layout.php\";\n" +
+ "</style>" +
+ "<meta http-equiv=\"Content-Type\" content=\"text/html;charset=utf-8d etc\">" +
+ " and something\n maybe on a new line...<META http-equiv =\"content-type\" " +
+ "<title id=\"mamma\" special-type=\"li turchi\">Sebastiano Vigna</title>\n" +
+ "</HEAD>\n" +
+ "<boDY>\n" +
+ "<div id=header>:::Sebastiano Vigna</div>" +
+ "<div id=left>\n" +
+ "<ul id=\"left-nav\">" +
+ "<br>Bye bye baby\n" +
+ "<img SRc=\"but I'm ignoring this one\"> and not this one\n" +
+ "\n\n even whitespace counts \n\n" +
+ "<frame SRC=\"http://www.GOOGLE.com/\">The frame source counts</frame>\n" +
+ "<iframe SRC=\"http://www.GOOGLE.com/\">And so does the iframe source</iframe>\n" +
+ "</body>\n" +
+ "</html>";
+
+ @Test
+ public void testGetCharsetName() {
+ byte[] b;
+ b = documentNoMeta.getBytes();
+ assertEquals(null, HTMLParser.getCharsetName(b, b.length));
+ b = documentMetaNeverClosed.getBytes();
+ assertEquals(null, HTMLParser.getCharsetName(b, b.length));
+ b = documentMetaNeverClosed2.getBytes();
+ assertEquals(null, HTMLParser.getCharsetName(b, b.length));
+ b = documentMetaIsutf_8.getBytes();
+ assertEquals("utf-8", HTMLParser.getCharsetName(b, b.length));
+ b = documentMetaIsutf_8ButNotClosed.getBytes();
+ assertEquals("utf-8", HTMLParser.getCharsetName(b, b.length));
+ b = documentMetaIsutf_8dAndSomething.getBytes();
+ assertEquals("utf-8d", HTMLParser.getCharsetName(b, b.length));
+ }
+}
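TestParserUtil expects `HTMLParser.getCharsetName(byte[], int)` to return the charset named in a `<meta>` tag, to tolerate a missing closing quote or bracket, to stop the name at the first character that cannot belong to it, and to return `null` when no charset is declared. Below is a minimal regex sketch of that contract. `ToyCharsetSniffer` is hypothetical and is not the LAW implementation, although it should reproduce the six expectations of `testGetCharsetName`. Decoding the prefix as ISO-8859-1 maps every byte to exactly one `char`, so it can be searched before the real encoding is known.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of meta-tag charset sniffing; not the LAW HTMLParser code. */
final class ToyCharsetSniffer {
    // "<meta", then anything except '<' (so an unterminated tag cannot swallow the next one),
    // then "charset=", an optional opening quote, and the charset name itself.
    private static final Pattern META_CHARSET =
            Pattern.compile("(?i)<meta[^<]*?charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)");

    private ToyCharsetSniffer() {}

    /** Returns the charset declared in the first len bytes of b, or null if none is found. */
    static String sniff(byte[] b, int len) {
        // ISO-8859-1 maps each byte to one char, so no byte is lost or rejected.
        String prefix = new String(b, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(prefix);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        byte[] page = "<head><meta http-equiv=\"Content-Type\" content=\"text/html;charset=utf-8\" >"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(sniff(page, page.length)); // utf-8
    }
}
```

The `len` parameter lets a caller sniff only the bytes it has buffered so far, which mirrors the `(b, b.length)` calls in the test.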
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/util/ByteArrayCharSequenceTest.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/util/ByteArrayCharSequenceTest.java
new file mode 100644
index 0000000..3a14bf6
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/util/ByteArrayCharSequenceTest.java
@@ -0,0 +1,37 @@
+package it.unimi.dsi.law.warc.util;
+
+/*
+ * Copyright (C) 2012-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+
+import org.junit.Test;
+
+//RELEASE-STATUS: DIST
+
+public class ByteArrayCharSequenceTest {
+
+ @Test
+ public void test() {
+ assertEquals(new ByteArrayCharSequence(new byte[] { 48, 49 }).toString(), "01");
+ assertEquals(new ByteArrayCharSequence(new byte[] { 48, 49, 50, 51 }, 1, 2).toString(), "12");
+
+ assertEquals(new ByteArrayCharSequence(new byte[] { 48, 49 }).hashCode(), "01".hashCode());
+ assertEquals(new ByteArrayCharSequence(new byte[] { 48, 49, 50, 51 }, 1, 2).hashCode(), "12".hashCode());
+ }
+}
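ByteArrayCharSequenceTest checks that a view over a byte array prints like the corresponding `String` and has the same `hashCode()`; the three-argument calls suggest an (array, offset, length) constructor. Below is a minimal sketch of such a view, assuming (the test only uses ASCII digits) that each byte is one ISO-8859-1 character. `ToyByteCharSequence` is a hypothetical name, not the LAW source.

```java
/** Hypothetical sketch of a byte-backed CharSequence; not the LAW ByteArrayCharSequence source. */
final class ToyByteCharSequence implements CharSequence {
    private final byte[] bytes;
    private final int offset;
    private final int length;

    ToyByteCharSequence(byte[] bytes) {
        this(bytes, 0, bytes.length);
    }

    /** Views length bytes of the array starting at offset, without copying them. */
    ToyByteCharSequence(byte[] bytes, int offset, int length) {
        if (offset < 0 || length < 0 || offset > bytes.length - length)
            throw new IndexOutOfBoundsException("offset=" + offset + ", length=" + length);
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
    }

    @Override public int length() { return length; }

    /** Each byte is read as an unsigned ISO-8859-1 code unit (0..255). */
    @Override public char charAt(int index) {
        if (index < 0 || index >= length) throw new IndexOutOfBoundsException("index=" + index);
        return (char) (bytes[offset + index] & 0xFF);
    }

    @Override public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end)
            throw new IndexOutOfBoundsException("start=" + start + ", end=" + end);
        return new ToyByteCharSequence(bytes, offset + start, end - start);
    }

    @Override public String toString() {
        char[] chars = new char[length];
        for (int i = 0; i < length; i++) chars[i] = charAt(i);
        return new String(chars);
    }

    /** Same recurrence as String.hashCode(): h = 31 * h + c over all characters. */
    @Override public int hashCode() {
        int h = 0;
        for (int i = 0; i < length; i++) h = 31 * h + charAt(i);
        return h;
    }

    /** Content equality with other ToyByteCharSequence instances only, so equals() stays symmetric. */
    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ToyByteCharSequence)) return false;
        ToyByteCharSequence other = (ToyByteCharSequence) o;
        if (other.length != length) return false;
        for (int i = 0; i < length; i++) if (charAt(i) != other.charAt(i)) return false;
        return true;
    }
}
```

Equal hash codes alone do not make such a view interchangeable with `String` as a map key: `String.equals()` only accepts strings, so a lookup would still need a `String` or a map that compares by content.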
diff --git a/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/util/MetadataHttpResponseTest.java b/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/util/MetadataHttpResponseTest.java
new file mode 100644
index 0000000..cae6759
--- /dev/null
+++ b/third_party/law-2.5.1/test/it/unimi/dsi/law/warc/util/MetadataHttpResponseTest.java
@@ -0,0 +1,44 @@
+package it.unimi.dsi.law.warc.util;
+
+/*
+ * Copyright (C) 2012-2019 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+
+import org.junit.Test;
+
+//RELEASE-STATUS: DIST
+
+public class MetadataHttpResponseTest {
+
+ @Test
+ public void testHeaderMap() {
+ MetadataHttpResponse.HeaderMap headerMap = new MetadataHttpResponse.HeaderMap();
+
+ headerMap.put("header0", "value0a");
+ headerMap.put("header0", "value0b");
+ headerMap.put("header1", "value1");
+ headerMap.put("header2", "value2");
+
+ assertEquals("value0a,value0b", headerMap.get("header0"));
+ assertEquals("value1", headerMap.get("header1"));
+ assertEquals("value2", headerMap.get("header2"));
+ assertEquals(null, headerMap.get("doesNotExist"));
+ assertEquals(3, headerMap.size());
+ }
+}
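MetadataHttpResponseTest shows that putting the same header twice into `MetadataHttpResponse.HeaderMap` leaves a single entry whose value joins the two values with a comma. That matches the HTTP rule (RFC 7230, section 3.2.2) that repeated fields may be combined into one comma-separated field. Below is a minimal sketch of that behaviour. `ToyHeaderMap` and its `add` method are hypothetical, and the case-insensitive lookup is an assumption (HTTP field names are case-insensitive; the test only uses lower-case names).

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a header map that folds repeated fields; not the LAW HeaderMap source. */
final class ToyHeaderMap {
    // Case-insensitive keys: an assumption, since the LAW test only uses lower-case names.
    private final Map<String, String> fields = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    /** Adds a field; a repeated name is appended to the earlier value(s) after a comma. */
    void add(String name, String value) {
        fields.merge(name, value, (earlier, later) -> earlier + "," + later);
    }

    /** Returns the (possibly combined) value, or null if the field is absent. */
    String get(String name) {
        return fields.get(name);
    }

    /** Number of distinct field names. */
    int size() {
        return fields.size();
    }

    public static void main(String[] args) {
        ToyHeaderMap h = new ToyHeaderMap();
        h.add("header0", "value0a");
        h.add("header0", "value0b");
        h.add("header1", "value1");
        System.out.println(h.get("header0")); // value0a,value0b
        System.out.println(h.size());         // 2
    }
}
```

Folding is not safe for every field: RFC 7230 singles out `Set-Cookie`, whose values can themselves contain commas, so a real implementation may need to keep such fields separate.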
diff --git a/third_party/webgraph-3.6.1/CHANGES b/third_party/webgraph-3.6.1/CHANGES
new file mode 100644
index 0000000..4b36892
--- /dev/null
+++ b/third_party/webgraph-3.6.1/CHANGES
@@ -0,0 +1,593 @@
+3.6.1
+
+- Removed spurious debug print.
+
+3.6.0
+
+- Java 8-only.
+
+- Fixed obscure bug in ShiftByOneArcListASCIIGraph: if the arc list
+ was specified on the command line (no -1 option) and more than
+ one core was available, the graph would not have been shifted.
+ Thanks to Luca Prigioniero for reporting this bug.
+
+3.5.3
+
+- Fixed lack of implementation of loadMapped() in ImmutableSubgraph.
+ Thanks to Massimo Santini and Pierlauro Sciarelli for reporting this bug.
+
+3.5.2
+
+- Removed last dependency from COLT. Unfortunately, ErdosRenyiGraph now
+ uses a different binomial distribution generator, so generated graphs
+ will be different, even using the same seed.
+
+- New implementations by Michele Borassi of the SumSweep algorithm for
+ diameter, radius and eccentricities of directed and undirected graphs.
+
+3.5.1
+
+- New TopKGeometricCentrality class by Michele Borassi.
+
+3.5.0
+
+- New mechanism for parallel compression based on the notion of "copiable
+ iterators". Implemented in BVGraph and all derivative classes (e.g.,
+ transposed graphs).
+
+- HyperBall now accepts weights on the nodes.
+
+3.4.3
+
+- The family of loadSequential() methods has been deprecated, and
+ replaced in code by loadOffline() or loadMapped().
+
+3.4.2
+
+- Fixed dependencies.
+
+3.4.1
+
+- Significantly improved performance of HyperBall on graphs with a highly
+ skewed (e.g., heavy-tailed) outdegree distribution (e.g., transposed web
+ graphs).
+
+- Fixed wrong estimation of memory used.
+
+- Now ConnectedComponents writes results using "wcc" instead of "scc".
+
+3.4.0
+
+- Fixed problem with obsolete BitSet used to store buckets in Stats.
+
+- New parallel classes to compute geometric centralities and betweenness.
+
+3.3.3
+
+- Reverted to fastutil's quicksort calls in case of array fragments. Java
+ 7's Arrays.sort() has a memory bug that was killing the performance of a
+ number of methods.
+
+3.3.2
+
+- We now distribute SpeedTest, hoping to improve the quality of benchmarks
+ in the literature.
+
+3.3.1
+
+- Adapted to new DSI utilities.
+
+3.3.0
+
+- HyperBall sports a new adaptive decomposition scheme that
+ is based on the number of arcs to be scanned, rather than
+ on the number of nodes.
+
+- Fixed bug in the computation of the buckets. If you have used the new
+ iterative implementation of Tarjan's algorithm
+ (StronglyConnectedComponents) to compute buckets please recompute them.
+
+3.2.1
+
+- New iterative implementation of Tarjan's algorithm.
+
+- HyperBall can now compute Nieminen's centrality.
+
+3.2.0
+
+- New selectable upper bound for EFGraph makes it possible to build
+ "fake" graphs in which successors are greater than or equal to
+ the number of nodes (this was already possible with BVGraph). Useful
+ for incremental graph construction.
+
+- New IncrementalImmutableSequentialGraph adapter, which provides an
+ inversion of control for storing graphs: you supply, one at a time,
+ the successor list of each node.
+
+3.1.0
+
+- New HyperBall implementation of the HyperANF idea. It computes several
+ kinds of geometric centrality and, once in systolic local mode, uses time
+ proportional to the number of edges causing a modification, so that in
+ practice the expected run time matches the theoretical bound O(m log n).
+
+- Now terminal nodes have closeness equal to zero (terminal nodes already
+ had Lin's centrality equal to one).
+
+- The DecimalFormat object used to print data now has a fixed US locale.
+
+- New EFGraph implementation (backported from the big version) using the
+ Elias-Fano representation of monotone sequences. Compression is not so
+ good, but successor enumeration is blazingly fast and the implementation
+ returns a skippable iterator which provides constant-time search of
+ nodes by lower bound.
+
+- Both BVGraph and EFGraph have outdegree caching and exact unwrapping
+ of successorArray(). This should bring performance improvements.
+
+3.0.9
+
+- We switched to SLF4J for logging.
+
+3.0.8
+
+- WebGraph now includes an adapter for the Jung graph analysis
+ framework. The main method of the class can write immutable graphs
+ into Pajek format.
+
+3.0.7
+
+- Now ScatteredArcsASCIIGraph accepts a translation function from
+ node identifiers to node numbers.
+
+3.0.5
+
+- New ImmutableGraph.outdegrees() method that exposes the degrees of a graph
+ as an IntIterator.
+
+- RandomGraph removed.
+
+- ErdosRenyiGraph and almost all transformed graphs now support copy().
+
+- New Transform.NodeClassFilter.
+
+3.0.2
+
+- A new class performs parallel breadth-first visits. NeighbourhoodFunction
+ now uses it.
+
+- A new class DoubleSweepFringeDiameter computes heuristically the
+ diameter of symmetric graphs using parallel breadth-first visits.
+
+- A new class ConnectedComponents computes the connected components of
+ symmetric graphs using parallel breadth-first visits.
+
+- Fixed in ScatteredArcsASCIIGraph a bug (inherited from fastutil) that
+ was generating spurious nodes for graphs with more than 100 million
+ nodes.
+
+- Moved unit tests to JUnit 4.
+
+- Fixed bug in mapOffline() that was causing some spurious zero-degree
+ nodes to be part of the graph if a tail of nodes of the original graph
+ was erased.
+
+- Fixed rare race condition in HyperANF external executions using fewer
+ than 64 registers.
+
+- New method to compute the median of all distances.
+
+- HyperApproximateNeighbourhoodFunction now handles graphs that do not
+ implement numArcs().
+
+- Major revamp of ImmutableSubgraph, which now uses an additional integer
+ per supergraph node, but is much faster.
+
+- New subclass of ImmutableSubgraph that automatically builds a subgraph
+ formed by nodes with outdegree in a specified range.
+
+- ErdosRenyiGraph is now an ImmutableGraph.
+
+- New class for checks.
+
+- HyperApproximateNeighbourhoodFunction can now be reused by calling
+ init(seed).
+
+3.0.1
+
+- We now try to adapt classes from the big version if possible.
+
+- Offset deltas can now be long (in case you have really crazy graphs).
+
+3.0
+
+- WARNING: This release has minor binary incompatibilities with previous
+ releases, mainly due to the move from the interface
+ it.unimi.dsi.util.LongBigList to the now standard
+ it.unimi.dsi.fastutil.longs.LongBigList. It is part of a parallel release
+ of fastutil, the DSI Utilities, Sux4J, MG4J, WebGraph, etc. that were
+ all modified to fit the new interface, and that prepare the way for our
+ "big" versions, that is, supporting >2^31 entries in arrays (simulated),
+ elements in lists, terms, documents, nodes, etc. Please read our (short)
+ "Moving Java to Big Data" document (JavaBig.pdf) for details.
+
+- We now require Java 6.
+
+- New mapOffline method for mapping large graphs.
+
+- All offline transformation methods now compress their batches; the
+ resulting batch size is comparable to the size of the BVGraph
+ representation with copying disabled.
+
+- Now we never resort to the ImmutableGraph implementation of
+ NodeIterator when iterating over an ImmutableSubgraph if
+ the supergraph does not implement random access. We used to
+ do it when the graph was very sparse, but not checking for
+ random access was not a good idea.
+
+- The computation of the ratio w.r.t. the information-theoretical
+ lower bound (associated to the key "compratio" in the property file of a
+ graph) was wrong and has been fixed.
+
+- A number of classes deal with exact and approximate computation of
+ the neighbourhood function of a graph, its distance density function,
+ and derived values. See our paper about HyperANF (this stuff was actually
+ introduced in 2.4.5, but we forgot to mention it).
+
+- Fixed an occasional infinite loop in ErdosRenyiGraph.
+
+- New class for reading scattered arc lists (ids need to be contiguous).
+
+- BVGraph.store() now sets up the node iterator before starting the
+ progress logger. This should provide more sensible estimates of time to
+ completion in case of offline methods.
+
+- BVGraph node iterators now have a finalize() method that will close the
+ underlying bit stream (and thus possibly the underlying file handle)
+ when the iterator is no longer used.
+
+- HyperApproximateNeighbourhoodFunction would not work with an offline
+ graph even with a single thread (thanks to Lars Backstrom for reporting
+ this bug).
+
+- HyperApproximateNeighbourhoodFunction now supports 16 (-l4) or 32 (-l5)
+ registers per counter.
+
+- Fixed long-standing synchronization bug in
+ HyperApproximateNeighbourhoodFunction.
+
+- New static method NeighbourhoodFunction.harmonicDiameter().
+
+2.4.5
+
+- WebGraph is now distributed under the GNU General Public License 3.
+
+- WARNING: A small modification to the coding makes it possible to
+ compress graphs with more than 1B nodes (always up to 2B nodes). This,
+ however, means that such graphs will not be readable by previous
+ versions, which will crash. We felt this was not such a big issue, as
+ such graphs were not previously compressible at all, so the version
+ number has not been bumped.
+
+- StronglyConnectedComponents no longer uses a separate thread to
+ set the stack size. The process was not guaranteed (by contract)
+ to set the stack size at all. The computation now runs in the main
+ thread, and we suggest using suitable JVM options to set a large
+ stack size.
+
+- BVGraph now computes a wealth of statistical data related to the
+ behaviour of the compression algorithm.
+
+- A number of classes deal with estimating efficiently the neighbourhood
+ function of a graph, the effective diameter, and the spid
+ (shortest-paths index of dispersion).
+
+- A caching mechanism has been put in place to make loading offsets
+ orders of magnitude faster. You can generate a cached, serialised
+ EliasFanoMonotoneLongBigList with the option -L of BVGraph, and then it
+ will be loaded instead of scanning the offsets file.
+
+- Fixed bug in the definition of in/out trees in ArrayListMutableGraph.
+
+- Now Stats computes loops.
+
+- BitStreamArcLabelledGraph was not supporting offset steps any longer,
+ but constructors and static methods still made it possible to pass
+ an offset step. This has been fixed.
+
+- Some residual documentation about offset steps has been removed.
+
+- A new cutoff option makes it possible to eliminate from a graph
+ generated by a map operation on the command line (see Transform) all
+ nodes whose index is too large. This is useful in conjunction with maps
+ that quotient a graph (e.g., to get just large strongly connected
+ components).
+
+2.4.4
+
+- The empty Formatter constructor was causing problems on localised systems.
+ Now we use Locale.ROOT.
+
+- Offset steps larger than 1 (offsetStep > 1) no longer exist. They are not
+ necessary with the new Elias-Fano offset list.
+
+- Speed improvements in random access to a BVGraph.
+
+- Fixed semantics of ImmutableGraph.successorArray(): implementations are
+ now forced to return a new array at each call. All implementations in
+ WebGraph are now compliant.
+
+- Now nodeIterator(int) in ImmutableSequentialGraph is implemented so that
+ it calls nodeIterator() and then skips to the desired node.
+
+- Fixed bad bug in UnionImmutableGraph: the node for which the cache was
+ active was not set by successors().
+
+- We now output some basic, exponentially binned stats for the distribution
+ of successor gaps and residual gaps. From these data we also compute an
+ approximation of the average gap for successors and for residuals.
+
+- We now record how much space is used by every component of the compression
+ algorithm.
+
+- Following some research, the default minimum interval length in BVGraph
+ is now 4.
+
+2.4.3
+
+- Fixed ArrayListMutableGraph.addNodes() (thanks to Erik Lumer for
+ finding and fixing this bug).
+
+- New options to shift the output of ASCII graphs.
+
+- RemappedImmutableGraph.successorArray(x) was providing the same array on every
+ call, thus making the inherited successors(x) method unusable to scan in
+ parallel different lists. Fixed (now it returns a copy of the array, instead).
+
+- New random transformation that randomly permutes a graph.
+
+2.4.2
+
+- Transform was not derelativising underlying-graph filenames.
+
+- New classes to support flexible filtering of arc-labelled
+ graphs. See the new action "larcfilter" of Transform and
+ the interface LabelledArcFilter.
+
+- StronglyConnectedComponents now uses a filter for labelled arcs, in case
+ you want to compute components of a subgraph.
+
+- Fixed old bug in StronglyConnectedComponents: the renumber
+ option was not working.
+
+- New Transform.compose() transformation that composes graphs
+ (i.e., it computes the graph represented by the product
+ of the Boolean matrices representing two graphs). You can
+ even compose labelled graphs by providing a semiring to
+ compose labels.
+
+- Now label files can be longer than 2GiB.
+
+2.4.1
+
+- Fixed stupid null-pointer bug in BitStreamImmutableArcLabelledGraph.
+
+2.4
+
+- WARNING: There are now more general relabelling strategies, but older
+ code must be slightly refactored.
+
+- Now BitStreamArcLabelledImmutableGraph supports contextual labels.
+ They accept an additional directory as context, to resolve relative
+ names.
+
+2.3
+
+- Fixed bug in BitStreamArcLabelledImmutableGraph: labels longer
+ than 2Gi would have caused overflows.
+
+- The new pointer loading system has been extended to arc-labelled graphs,
+ too.
+
+2.2
+
+- New pointer loading system based on succinct representations. Now on
+ typical web graphs pointers occupy 8-9 [sic] bits per element, thus
+ almost halving the memory footprint. The performance drop is about
+ 10-15% (measured in ns/link on an Opteron) for reference chains of length
+ 3 (and it decreases for shorter chains).
+
+- New greyPerm transform to just get the permutation.
+
+- ArcLabelledImmutableGraph now strengthens the implementation of
+ nodeIterator() based on the random-access methods.
+
+- Fixed lack of checks in integer key labels.
+
+- New defensive check in BVGraph against badly implemented ImmutableGraphs.
+
+2.1
+
+- WARNING: Refactored to be based on dsiutils and Sux4J. This will cause
+ some incompatibilities, in particular with loggers.
+
+- Moved DocumentSequenceImmutableGraph to LAW, to avoid dependency
+ on MG4J and vice versa.
+
+2.0
+
+- WARNING: WebGraph 2+ is not fully compatible with previous versions, and
+ requires some minor refactoring: due to the new lazy architecture, the
+ semantics of successors() has radically changed; in particular, a
+ LazyIntIterator is returned instead of an IntIterator. Please refer to
+ the ImmutableGraph documentation.
+
+- New customised class parser that will prepend it.unimi.dsi.webgraph.
+ and it.unimi.dsi.webgraph.labelling. to classes specified on
+ the command line (at last!).
+
+- New ArcListASCIIGraph that specifies one arc per line and guesses
+ the number of nodes. A special implementation can be used when
+ nodes are numbered from one.
+
+- New --spec switch that makes it possible to specify graphs as
+ class names with arguments. Most useful to turn MG4J's document
+ sequences into graphs using a VirtualDocumentResolver.
+
+- Slightly relaxed contract for numNodes() (to make ArcListASCIIGraph
+ conforming).
+
+- New classes for union and transposition of labelled graphs. Transform
+ has been adapted to automatically use BitStreamArcLabelledImmutableGraph
+ to save arc-labelled graphs, but the class is settable.
+
+- Arc-labelled graphs must expose a prototype of their labels.
+
+- New suggested store() methods for arc-labelled graphs.
+
+- New Stats class for computing basic statistical data.
+
+- Very, very, old bug in BVGraph has been fixed. nodeIterator(from)
+ with from>1 was not working properly. Thanks to Francesco Zumpano
+ and Pierluigi Origlia for finding this bug.
+
+- New example class to interface your data with arc-labelled graph classes.
+
+- Integer labels have a public value field.
+
+- Load methods of BVGraph now look for an offsetstep property to set
+ the offset step externally.
+
+- New extension for label offsets (.labeloffsets) and new property
+ key for the underlying graph (underlyinggraph). Watch out!
+
+- New relabelling wrapper to change the labels of a graph.
+
+- New class implementing a variant of the Tarjan algorithm.
+
+- All standard extensions and property keys are now defined by string constants.
+
+- New algo package. We start with strongly connected components.
+
+1.7
+
+- Brand-new ArcLabelledGraph.
+
+- Deprecated classes and methods have been removed.
+
+- Revamped OutdegreeStats class.
+
+- New loadOnce() method for loading graphs on-the-fly. Very useful for
+ generating an ASCIIGraph to standard output and compressing it without
+ actually storing it.
+
+- New randomAccess() method that tells you whether a
+ graph supports random access.
+
+- A number of new packages containing unit tests.
+
+- Fixed bug in ImmutableSubgraph: the property subgraphnodes
+ was not actually read.
+
+- Implemented a workaround for bug #6478546 (you can't do read() on large
+ arrays when you have a lot of heap--bizarre, isn't it?).
+
+1.6
+
+- Most load() static methods now override the return type and
+ declare the actual returned type, usually more specific (e.g.,
+ BVGraph.load() returns a BVGraph).
+
+- Graphs can now be transposed with an offline method. It is
+ slower than the in-memory method, but it can transpose arbitrarily
+ large graphs.
+
+1.5
+
+- IMPORTANT: WebGraph now requires Java 5.
+
+- New ArrayListMutableGraph class that makes it easy to create graphs
+ dynamically, and then exposes them as an ImmutableGraph.
+
+- New documentation and example on how to import your data in
+ WebGraph.
+
+- All code moved from ProgressMeter to ProgressLogger. All old
+ methods are deprecated.
+
+- Command line parsing entirely handled by JSAP.
+
+- The default maximum reference count for BVGraph is now 3.
+
+- ASCIIGraph has been revamped to be usable to convert offline
+ large graphs.
+
+- The basename property was never used, and it is no longer saved.
+
+1.4.1
+
+- New method writeOffsets() and corresponding -O option in BVGraph
+ which writes the offsets of a graph computing them from the graph
+ representation (.graph file). This makes it possible to distribute
+ just the .graph and the .properties files.
+
+- ImmutableSubgraph has changed incompatibly, with (hopefully) more sensible
+ method names.
+
+1.4
+
+- Now various classes use the ImmutableGraph reflection methods.
+
+- New ImmutableSubgraph class for storing and manipulating subgraphs
+ holding just a reference to the node subset.
+
+- New Transform static container with common constructions, and
+ computation of Gray code ordering.
+
+- Fixed missing error message when randomly accessing successors
+ in a sequentially loaded BVGraph.
+
+1.2.4
+
+- The graph class name is now obtained using getName(), and
+ kluges have been put in place so that old graphs work, too.
+
+- New explicit convention for storing the graph class name in a property file.
+
+- New static methods in ImmutableGraph that load a graph using reflection
+ and the convention above.
+
+- Fixed missing check for null pm.
+
+- Fixed lack of loadOffline() method in BVGraph (causing infinite recursion).
+
+1.2.2
+
+- Aligned usage of iterators with fastutil 3.1.
+
+1.2.1
+
+- Fixed a stupid bug (in one case we forgot to reallocate a new
+ FastMultiByteArrayInputStream).
+
+- Fixed another stupid bug (using a standard, memory-stored
+ graph would not have worked!).
+
+1.2
+
+- BVGraph now supports graphs larger than 2 GiB (in fact, up to 256 PiB)
+ using (transparently) FastMultiByteArrayInputStream.
+
+1.1
+
+- The return type of the load method has been changed to ImmutableGraph,
+ so as to make it possible to override it in subclasses. This might require some
+ additional type casting in existing code.
+
+1.0r2
+
+- Updated to new fastutil class set.
+
+1.0
+
+- First public release.
diff --git a/third_party/webgraph-3.6.1/COPYING b/third_party/webgraph-3.6.1/COPYING
new file mode 100644
index 0000000..94a9ed0
--- /dev/null
+++ b/third_party/webgraph-3.6.1/COPYING
@@ -0,0 +1,674 @@
+ GNU GENERAL PUBLIC LICENSE
+ Version 3, 29 June 2007
+
+ Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+ Preamble
+
+ The GNU General Public License is a free, copyleft license for
+software and other kinds of works.
+
+ The licenses for most software and other practical works are designed
+to take away your freedom to share and change the works. By contrast,
+the GNU General Public License is intended to guarantee your freedom to
+share and change all versions of a program--to make sure it remains free
+software for all its users. We, the Free Software Foundation, use the
+GNU General Public License for most of our software; it applies also to
+any other work released this way by its authors. You can apply it to
+your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+price. Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+them if you wish), that you receive source code or can get it if you
+want it, that you can change the software or use pieces of it in new
+free programs, and that you know you can do these things.
+
+ To protect your rights, we need to prevent others from denying you
+these rights or asking you to surrender the rights. Therefore, you have
+certain responsibilities if you distribute copies of the software, or if
+you modify it: responsibilities to respect the freedom of others.
+
+ For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must pass on to the recipients the same
+freedoms that you received. You must make sure that they, too, receive
+or can get the source code. And you must show them these terms so they
+know their rights.
+
+ Developers that use the GNU GPL protect your rights with two steps:
+(1) assert copyright on the software, and (2) offer you this License
+giving you legal permission to copy, distribute and/or modify it.
+
+ For the developers' and authors' protection, the GPL clearly explains
+that there is no warranty for this free software. For both users' and
+authors' sake, the GPL requires that modified versions be marked as
+changed, so that their problems will not be attributed erroneously to
+authors of previous versions.
+
+ Some devices are designed to deny users access to install or run
+modified versions of the software inside them, although the manufacturer
+can do so. This is fundamentally incompatible with the aim of
+protecting users' freedom to change the software. The systematic
+pattern of such abuse occurs in the area of products for individuals to
+use, which is precisely where it is most unacceptable. Therefore, we
+have designed this version of the GPL to prohibit the practice for those
+products. If such problems arise substantially in other domains, we
+stand ready to extend this provision to those domains in future versions
+of the GPL, as needed to protect the freedom of users.
+
+ Finally, every program is threatened constantly by software patents.
+States should not allow patents to restrict development and use of
+software on general-purpose computers, but in those that do, we wish to
+avoid the special danger that patents applied to a free program could
+make it effectively proprietary. To prevent this, the GPL assures that
+patents cannot be used to render the program non-free.
+
+ The precise terms and conditions for copying, distribution and
+modification follow.
+
+ TERMS AND CONDITIONS
+
+ 0. Definitions.
+
+ "This License" refers to version 3 of the GNU General Public License.
+
+ "Copyright" also means copyright-like laws that apply to other kinds of
+works, such as semiconductor masks.
+
+ "The Program" refers to any copyrightable work licensed under this
+License. Each licensee is addressed as "you". "Licensees" and
+"recipients" may be individuals or organizations.
+
+ To "modify" a work means to copy from or adapt all or part of the work
+in a fashion requiring copyright permission, other than the making of an
+exact copy. The resulting work is called a "modified version" of the
+earlier work or a work "based on" the earlier work.
+
+ A "covered work" means either the unmodified Program or a work based
+on the Program.
+
+ To "propagate" a work means to do anything with it that, without
+permission, would make you directly or secondarily liable for
+infringement under applicable copyright law, except executing it on a
+computer or modifying a private copy. Propagation includes copying,
+distribution (with or without modification), making available to the
+public, and in some countries other activities as well.
+
+ To "convey" a work means any kind of propagation that enables other
+parties to make or receive copies. Mere interaction with a user through
+a computer network, with no transfer of a copy, is not conveying.
+
+ An interactive user interface displays "Appropriate Legal Notices"
+to the extent that it includes a convenient and prominently visible
+feature that (1) displays an appropriate copyright notice, and (2)
+tells the user that there is no warranty for the work (except to the
+extent that warranties are provided), that licensees may convey the
+work under this License, and how to view a copy of this License. If
+the interface presents a list of user commands or options, such as a
+menu, a prominent item in the list meets this criterion.
+
+ 1. Source Code.
+
+ The "source code" for a work means the preferred form of the work
+for making modifications to it. "Object code" means any non-source
+form of a work.
+
+ A "Standard Interface" means an interface that either is an official
+standard defined by a recognized standards body, or, in the case of
+interfaces specified for a particular programming language, one that
+is widely used among developers working in that language.
+
+ The "System Libraries" of an executable work include anything, other
+than the work as a whole, that (a) is included in the normal form of
+packaging a Major Component, but which is not part of that Major
+Component, and (b) serves only to enable use of the work with that
+Major Component, or to implement a Standard Interface for which an
+implementation is available to the public in source code form. A
+"Major Component", in this context, means a major essential component
+(kernel, window system, and so on) of the specific operating system
+(if any) on which the executable work runs, or a compiler used to
+produce the work, or an object code interpreter used to run it.
+
+ The "Corresponding Source" for a work in object code form means all
+the source code needed to generate, install, and (for an executable
+work) run the object code and to modify the work, including scripts to
+control those activities. However, it does not include the work's
+System Libraries, or general-purpose tools or generally available free
+programs which are used unmodified in performing those activities but
+which are not part of the work. For example, Corresponding Source
+includes interface definition files associated with source files for
+the work, and the source code for shared libraries and dynamically
+linked subprograms that the work is specifically designed to require,
+such as by intimate data communication or control flow between those
+subprograms and other parts of the work.
+
+ The Corresponding Source need not include anything that users
+can regenerate automatically from other parts of the Corresponding
+Source.
+
+ The Corresponding Source for a work in source code form is that
+same work.
+
+ 2. Basic Permissions.
+
+ All rights granted under this License are granted for the term of
+copyright on the Program, and are irrevocable provided the stated
+conditions are met. This License explicitly affirms your unlimited
+permission to run the unmodified Program. The output from running a
+covered work is covered by this License only if the output, given its
+content, constitutes a covered work. This License acknowledges your
+rights of fair use or other equivalent, as provided by copyright law.
+
+ You may make, run and propagate covered works that you do not
+convey, without conditions so long as your license otherwise remains
+in force. You may convey covered works to others for the sole purpose
+of having them make modifications exclusively for you, or provide you
+with facilities for running those works, provided that you comply with
+the terms of this License in conveying all material for which you do
+not control copyright. Those thus making or running the covered works
+for you must do so exclusively on your behalf, under your direction
+and control, on terms that prohibit them from making any copies of
+your copyrighted material outside their relationship with you.
+
+ Conveying under any other circumstances is permitted solely under
+the conditions stated below. Sublicensing is not allowed; section 10
+makes it unnecessary.
+
+ 3. Protecting Users' Legal Rights From Anti-Circumvention Law.
+
+ No covered work shall be deemed part of an effective technological
+measure under any applicable law fulfilling obligations under article
+11 of the WIPO copyright treaty adopted on 20 December 1996, or
+similar laws prohibiting or restricting circumvention of such
+measures.
+
+ When you convey a covered work, you waive any legal power to forbid
+circumvention of technological measures to the extent such circumvention
+is effected by exercising rights under this License with respect to
+the covered work, and you disclaim any intention to limit operation or
+modification of the work as a means of enforcing, against the work's
+users, your or third parties' legal rights to forbid circumvention of
+technological measures.
+
+ 4. Conveying Verbatim Copies.
+
+ You may convey verbatim copies of the Program's source code as you
+receive it, in any medium, provided that you conspicuously and
+appropriately publish on each copy an appropriate copyright notice;
+keep intact all notices stating that this License and any
+non-permissive terms added in accord with section 7 apply to the code;
+keep intact all notices of the absence of any warranty; and give all
+recipients a copy of this License along with the Program.
+
+ You may charge any price or no price for each copy that you convey,
+and you may offer support or warranty protection for a fee.
+
+ 5. Conveying Modified Source Versions.
+
+ You may convey a work based on the Program, or the modifications to
+produce it from the Program, in the form of source code under the
+terms of section 4, provided that you also meet all of these conditions:
+
+ a) The work must carry prominent notices stating that you modified
+ it, and giving a relevant date.
+
+ b) The work must carry prominent notices stating that it is
+ released under this License and any conditions added under section
+ 7. This requirement modifies the requirement in section 4 to
+ "keep intact all notices".
+
+ c) You must license the entire work, as a whole, under this
+ License to anyone who comes into possession of a copy. This
+ License will therefore apply, along with any applicable section 7
+ additional terms, to the whole of the work, and all its parts,
+ regardless of how they are packaged. This License gives no
+ permission to license the work in any other way, but it does not
+ invalidate such permission if you have separately received it.
+
+ d) If the work has interactive user interfaces, each must display
+ Appropriate Legal Notices; however, if the Program has interactive
+ interfaces that do not display Appropriate Legal Notices, your
+ work need not make them do so.
+
+ A compilation of a covered work with other separate and independent
+works, which are not by their nature extensions of the covered work,
+and which are not combined with it such as to form a larger program,
+in or on a volume of a storage or distribution medium, is called an
+"aggregate" if the compilation and its resulting copyright are not
+used to limit the access or legal rights of the compilation's users
+beyond what the individual works permit. Inclusion of a covered work
+in an aggregate does not cause this License to apply to the other
+parts of the aggregate.
+
+ 6. Conveying Non-Source Forms.
+
+ You may convey a covered work in object code form under the terms
+of sections 4 and 5, provided that you also convey the
+machine-readable Corresponding Source under the terms of this License,
+in one of these ways:
+
+ a) Convey the object code in, or embodied in, a physical product
+ (including a physical distribution medium), accompanied by the
+ Corresponding Source fixed on a durable physical medium
+ customarily used for software interchange.
+
+ b) Convey the object code in, or embodied in, a physical product
+ (including a physical distribution medium), accompanied by a
+ written offer, valid for at least three years and valid for as
+ long as you offer spare parts or customer support for that product
+ model, to give anyone who possesses the object code either (1) a
+ copy of the Corresponding Source for all the software in the
+ product that is covered by this License, on a durable physical
+ medium customarily used for software interchange, for a price no
+ more than your reasonable cost of physically performing this
+ conveying of source, or (2) access to copy the
+ Corresponding Source from a network server at no charge.
+
+ c) Convey individual copies of the object code with a copy of the
+ written offer to provide the Corresponding Source. This
+ alternative is allowed only occasionally and noncommercially, and
+ only if you received the object code with such an offer, in accord
+ with subsection 6b.
+
+ d) Convey the object code by offering access from a designated
+ place (gratis or for a charge), and offer equivalent access to the
+ Corresponding Source in the same way through the same place at no
+ further charge. You need not require recipients to copy the
+ Corresponding Source along with the object code. If the place to
+ copy the object code is a network server, the Corresponding Source
+ may be on a different server (operated by you or a third party)
+ that supports equivalent copying facilities, provided you maintain
+ clear directions next to the object code saying where to find the
+ Corresponding Source. Regardless of what server hosts the
+ Corresponding Source, you remain obligated to ensure that it is
+ available for as long as needed to satisfy these requirements.
+
+ e) Convey the object code using peer-to-peer transmission, provided
+ you inform other peers where the object code and Corresponding
+ Source of the work are being offered to the general public at no
+ charge under subsection 6d.
+
+ A separable portion of the object code, whose source code is excluded
+from the Corresponding Source as a System Library, need not be
+included in conveying the object code work.
+
+ A "User Product" is either (1) a "consumer product", which means any
+tangible personal property which is normally used for personal, family,
+or household purposes, or (2) anything designed or sold for incorporation
+into a dwelling. In determining whether a product is a consumer product,
+doubtful cases shall be resolved in favor of coverage. For a particular
+product received by a particular user, "normally used" refers to a
+typical or common use of that class of product, regardless of the status
+of the particular user or of the way in which the particular user
+actually uses, or expects or is expected to use, the product. A product
+is a consumer product regardless of whether the product has substantial
+commercial, industrial or non-consumer uses, unless such uses represent
+the only significant mode of use of the product.
+
+ "Installation Information" for a User Product means any methods,
+procedures, authorization keys, or other information required to install
+and execute modified versions of a covered work in that User Product from
+a modified version of its Corresponding Source. The information must
+suffice to ensure that the continued functioning of the modified object
+code is in no case prevented or interfered with solely because
+modification has been made.
+
+ If you convey an object code work under this section in, or with, or
+specifically for use in, a User Product, and the conveying occurs as
+part of a transaction in which the right of possession and use of the
+User Product is transferred to the recipient in perpetuity or for a
+fixed term (regardless of how the transaction is characterized), the
+Corresponding Source conveyed under this section must be accompanied
+by the Installation Information. But this requirement does not apply
+if neither you nor any third party retains the ability to install
+modified object code on the User Product (for example, the work has
+been installed in ROM).
+
+ The requirement to provide Installation Information does not include a
+requirement to continue to provide support service, warranty, or updates
+for a work that has been modified or installed by the recipient, or for
+the User Product in which it has been modified or installed. Access to a
+network may be denied when the modification itself materially and
+adversely affects the operation of the network or violates the rules and
+protocols for communication across the network.
+
+ Corresponding Source conveyed, and Installation Information provided,
+in accord with this section must be in a format that is publicly
+documented (and with an implementation available to the public in
+source code form), and must require no special password or key for
+unpacking, reading or copying.
+
+ 7. Additional Terms.
+
+ "Additional permissions" are terms that supplement the terms of this
+License by making exceptions from one or more of its conditions.
+Additional permissions that are applicable to the entire Program shall
+be treated as though they were included in this License, to the extent
+that they are valid under applicable law. If additional permissions
+apply only to part of the Program, that part may be used separately
+under those permissions, but the entire Program remains governed by
+this License without regard to the additional permissions.
+
+ When you convey a copy of a covered work, you may at your option
+remove any additional permissions from that copy, or from any part of
+it. (Additional permissions may be written to require their own
+removal in certain cases when you modify the work.) You may place
+additional permissions on material, added by you to a covered work,
+for which you have or can give appropriate copyright permission.
+
+ Notwithstanding any other provision of this License, for material you
+add to a covered work, you may (if authorized by the copyright holders of
+that material) supplement the terms of this License with terms:
+
+ a) Disclaiming warranty or limiting liability differently from the
+ terms of sections 15 and 16 of this License; or
+
+ b) Requiring preservation of specified reasonable legal notices or
+ author attributions in that material or in the Appropriate Legal
+ Notices displayed by works containing it; or
+
+ c) Prohibiting misrepresentation of the origin of that material, or
+ requiring that modified versions of such material be marked in
+ reasonable ways as different from the original version; or
+
+ d) Limiting the use for publicity purposes of names of licensors or
+ authors of the material; or
+
+ e) Declining to grant rights under trademark law for use of some
+ trade names, trademarks, or service marks; or
+
+ f) Requiring indemnification of licensors and authors of that
+ material by anyone who conveys the material (or modified versions of
+ it) with contractual assumptions of liability to the recipient, for
+ any liability that these contractual assumptions directly impose on
+ those licensors and authors.
+
+ All other non-permissive additional terms are considered "further
+restrictions" within the meaning of section 10. If the Program as you
+received it, or any part of it, contains a notice stating that it is
+governed by this License along with a term that is a further
+restriction, you may remove that term. If a license document contains
+a further restriction but permits relicensing or conveying under this
+License, you may add to a covered work material governed by the terms
+of that license document, provided that the further restriction does
+not survive such relicensing or conveying.
+
+ If you add terms to a covered work in accord with this section, you
+must place, in the relevant source files, a statement of the
+additional terms that apply to those files, or a notice indicating
+where to find the applicable terms.
+
+ Additional terms, permissive or non-permissive, may be stated in the
+form of a separately written license, or stated as exceptions;
+the above requirements apply either way.
+
+ 8. Termination.
+
+ You may not propagate or modify a covered work except as expressly
+provided under this License. Any attempt otherwise to propagate or
+modify it is void, and will automatically terminate your rights under
+this License (including any patent licenses granted under the third
+paragraph of section 11).
+
+ However, if you cease all violation of this License, then your
+license from a particular copyright holder is reinstated (a)
+provisionally, unless and until the copyright holder explicitly and
+finally terminates your license, and (b) permanently, if the copyright
+holder fails to notify you of the violation by some reasonable means
+prior to 60 days after the cessation.
+
+ Moreover, your license from a particular copyright holder is
+reinstated permanently if the copyright holder notifies you of the
+violation by some reasonable means, this is the first time you have
+received notice of violation of this License (for any work) from that
+copyright holder, and you cure the violation prior to 30 days after
+your receipt of the notice.
+
+ Termination of your rights under this section does not terminate the
+licenses of parties who have received copies or rights from you under
+this License. If your rights have been terminated and not permanently
+reinstated, you do not qualify to receive new licenses for the same
+material under section 10.
+
+ 9. Acceptance Not Required for Having Copies.
+
+ You are not required to accept this License in order to receive or
+run a copy of the Program. Ancillary propagation of a covered work
+occurring solely as a consequence of using peer-to-peer transmission
+to receive a copy likewise does not require acceptance. However,
+nothing other than this License grants you permission to propagate or
+modify any covered work. These actions infringe copyright if you do
+not accept this License. Therefore, by modifying or propagating a
+covered work, you indicate your acceptance of this License to do so.
+
+ 10. Automatic Licensing of Downstream Recipients.
+
+ Each time you convey a covered work, the recipient automatically
+receives a license from the original licensors, to run, modify and
+propagate that work, subject to this License. You are not responsible
+for enforcing compliance by third parties with this License.
+
+ An "entity transaction" is a transaction transferring control of an
+organization, or substantially all assets of one, or subdividing an
+organization, or merging organizations. If propagation of a covered
+work results from an entity transaction, each party to that
+transaction who receives a copy of the work also receives whatever
+licenses to the work the party's predecessor in interest had or could
+give under the previous paragraph, plus a right to possession of the
+Corresponding Source of the work from the predecessor in interest, if
+the predecessor has it or can get it with reasonable efforts.
+
+ You may not impose any further restrictions on the exercise of the
+rights granted or affirmed under this License. For example, you may
+not impose a license fee, royalty, or other charge for exercise of
+rights granted under this License, and you may not initiate litigation
+(including a cross-claim or counterclaim in a lawsuit) alleging that
+any patent claim is infringed by making, using, selling, offering for
+sale, or importing the Program or any portion of it.
+
+ 11. Patents.
+
+ A "contributor" is a copyright holder who authorizes use under this
+License of the Program or a work on which the Program is based. The
+work thus licensed is called the contributor's "contributor version".
+
+ A contributor's "essential patent claims" are all patent claims
+owned or controlled by the contributor, whether already acquired or
+hereafter acquired, that would be infringed by some manner, permitted
+by this License, of making, using, or selling its contributor version,
+but do not include claims that would be infringed only as a
+consequence of further modification of the contributor version. For
+purposes of this definition, "control" includes the right to grant
+patent sublicenses in a manner consistent with the requirements of
+this License.
+
+ Each contributor grants you a non-exclusive, worldwide, royalty-free
+patent license under the contributor's essential patent claims, to
+make, use, sell, offer for sale, import and otherwise run, modify and
+propagate the contents of its contributor version.
+
+ In the following three paragraphs, a "patent license" is any express
+agreement or commitment, however denominated, not to enforce a patent
+(such as an express permission to practice a patent or covenant not to
+sue for patent infringement). To "grant" such a patent license to a
+party means to make such an agreement or commitment not to enforce a
+patent against the party.
+
+ If you convey a covered work, knowingly relying on a patent license,
+and the Corresponding Source of the work is not available for anyone
+to copy, free of charge and under the terms of this License, through a
+publicly available network server or other readily accessible means,
+then you must either (1) cause the Corresponding Source to be so
+available, or (2) arrange to deprive yourself of the benefit of the
+patent license for this particular work, or (3) arrange, in a manner
+consistent with the requirements of this License, to extend the patent
+license to downstream recipients. "Knowingly relying" means you have
+actual knowledge that, but for the patent license, your conveying the
+covered work in a country, or your recipient's use of the covered work
+in a country, would infringe one or more identifiable patents in that
+country that you have reason to believe are valid.
+
+ If, pursuant to or in connection with a single transaction or
+arrangement, you convey, or propagate by procuring conveyance of, a
+covered work, and grant a patent license to some of the parties
+receiving the covered work authorizing them to use, propagate, modify
+or convey a specific copy of the covered work, then the patent license
+you grant is automatically extended to all recipients of the covered
+work and works based on it.
+
+ A patent license is "discriminatory" if it does not include within
+the scope of its coverage, prohibits the exercise of, or is
+conditioned on the non-exercise of one or more of the rights that are
+specifically granted under this License. You may not convey a covered
+work if you are a party to an arrangement with a third party that is
+in the business of distributing software, under which you make payment
+to the third party based on the extent of your activity of conveying
+the work, and under which the third party grants, to any of the
+parties who would receive the covered work from you, a discriminatory
+patent license (a) in connection with copies of the covered work
+conveyed by you (or copies made from those copies), or (b) primarily
+for and in connection with specific products or compilations that
+contain the covered work, unless you entered into that arrangement,
+or that patent license was granted, prior to 28 March 2007.
+
+ Nothing in this License shall be construed as excluding or limiting
+any implied license or other defenses to infringement that may
+otherwise be available to you under applicable patent law.
+
+ 12. No Surrender of Others' Freedom.
+
+ If conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License. If you cannot convey a
+covered work so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you may
+not convey it at all. For example, if you agree to terms that obligate you
+to collect a royalty for further conveying from those to whom you convey
+the Program, the only way you could satisfy both those terms and this
+License would be to refrain entirely from conveying the Program.
+
+ 13. Use with the GNU Affero General Public License.
+
+ Notwithstanding any other provision of this License, you have
+permission to link or combine any covered work with a work licensed
+under version 3 of the GNU Affero General Public License into a single
+combined work, and to convey the resulting work. The terms of this
+License will continue to apply to the part which is the covered work,
+but the special requirements of the GNU Affero General Public License,
+section 13, concerning interaction through a network will apply to the
+combination as such.
+
+ 14. Revised Versions of this License.
+
+ The Free Software Foundation may publish revised and/or new versions of
+the GNU General Public License from time to time. Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+ Each version is given a distinguishing version number. If the
+Program specifies that a certain numbered version of the GNU General
+Public License "or any later version" applies to it, you have the
+option of following the terms and conditions either of that numbered
+version or of any later version published by the Free Software
+Foundation. If the Program does not specify a version number of the
+GNU General Public License, you may choose any version ever published
+by the Free Software Foundation.
+
+ If the Program specifies that a proxy can decide which future
+versions of the GNU General Public License can be used, that proxy's
+public statement of acceptance of a version permanently authorizes you
+to choose that version for the Program.
+
+ Later license versions may give you additional or different
+permissions. However, no additional obligations are imposed on any
+author or copyright holder as a result of your choosing to follow a
+later version.
+
+ 15. Disclaimer of Warranty.
+
+ THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
+APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
+HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
+OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
+THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
+IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
+ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
+
+ 16. Limitation of Liability.
+
+ IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
+THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
+GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
+USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
+DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
+PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
+EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
+SUCH DAMAGES.
+
+ 17. Interpretation of Sections 15 and 16.
+
+ If the disclaimer of warranty and limitation of liability provided
+above cannot be given local legal effect according to their terms,
+reviewing courts shall apply local law that most closely approximates
+an absolute waiver of all civil liability in connection with the
+Program, unless a warranty or assumption of liability accompanies a
+copy of the Program in return for a fee.
+
+ END OF TERMS AND CONDITIONS
+
+ How to Apply These Terms to Your New Programs
+
+ If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+ To do so, attach the following notices to the program. It is safest
+to attach them to the start of each source file to most effectively
+state the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+ <one line to give the program's name and a brief idea of what it does.>
+ Copyright (C) <year> <name of author>
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+Also add information on how to contact you by electronic and paper mail.
+
+ If the program does terminal interaction, make it output a short
+notice like this when it starts in an interactive mode:
+
+ <program> Copyright (C) <year> <name of author>
+ This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+ This is free software, and you are welcome to redistribute it
+ under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License. Of course, your program's commands
+might be different; for a GUI interface, you would use an "about box".
+
+ You should also get your employer (if you work as a programmer) or school,
+if any, to sign a "copyright disclaimer" for the program, if necessary.
+For more information on this, and how to apply and follow the GNU GPL, see
+<http://www.gnu.org/licenses/>.
+
+ The GNU General Public License does not permit incorporating your program
+into proprietary programs. If your program is a subroutine library, you
+may consider it more useful to permit linking proprietary applications with
+the library. If this is what you want to do, use the GNU Lesser General
+Public License instead of this License. But first, please read
+<http://www.gnu.org/philosophy/why-not-lgpl.html>.
diff --git a/third_party/webgraph-3.6.1/JavaBig.pdf b/third_party/webgraph-3.6.1/JavaBig.pdf
new file mode 100644
index 0000000..514ffc8
Binary files /dev/null and b/third_party/webgraph-3.6.1/JavaBig.pdf differ
diff --git a/third_party/webgraph-3.6.1/build.properties b/third_party/webgraph-3.6.1/build.properties
new file mode 100644
index 0000000..b505ca3
--- /dev/null
+++ b/third_party/webgraph-3.6.1/build.properties
@@ -0,0 +1,38 @@
+version=3.6.1
+
+build.sysclasspath=last
+
+jar.base=/usr/share/java
+javadoc.base=/usr/share/javadoc
+
+dist=dist
+src=src
+test=test
+slow=slow
+reports=reports
+coverage=coverage
+checkstyle=checkstyle
+docs=docs
+build=build
+instrumented=instr
+
+# Whenever it is necessary to add a new jar to the project, the following
+# data must be updated:
+
+# 1) the list of local javadocs
+# 2) the list of remote javadocs
+# 3) the list of javadocs referenced by the javadoc target
+# 4) the list of jars in the findbugs target
+
+j2se.apiurl=http://docs.oracle.com/javase/8/docs/api/
+fastutil.apiurl=http://fastutil.di.unimi.it/docs/
+dsiutils.apiurl=http://dsiutils.di.unimi.it/docs/
+sux4j.apiurl=http://sux4j.di.unimi.it/docs/
+jsap.apiurl=http://www.martiansoftware.com/jsap/doc/javadoc/
+junit.apiurl=http://junit.sourceforge.net/javadoc_40/
+slf4j.apiurl=http://www.slf4j.org/apidocs/
+commons-configuration.apiurl=http://commons.apache.org/configuration/apidocs/
+commons-io.apiurl=http://commons.apache.org/proper/commons-io/javadocs/api-release/
+commons-lang.apiurl=http://commons.apache.org/proper/commons-lang/javadocs/api-release/
+commons-collections.apiurl=http://commons.apache.org/proper/commons-collections/javadocs/api-release/
+guava.apiurl=http://google.github.io/guava/releases/22.0/api/docs/
diff --git a/third_party/webgraph-3.6.1/build.xml b/third_party/webgraph-3.6.1/build.xml
new file mode 100644
index 0000000..b57dfcd
--- /dev/null
+++ b/third_party/webgraph-3.6.1/build.xml
@@ -0,0 +1,306 @@
+<project name="webgraph" default="jar" basedir="." xmlns:ivy="antlib:org.apache.ivy.ant" xmlns:artifact="antlib:org.apache.maven.artifact.ant">
+
+ <property name="build.sysclasspath" value="ignore"/>
+ <property name="jars.dir" value="${basedir}/jars"/>
+ <property file="build.properties"/>
+
+ <property environment="env"/>
+
+ <property name="ivy.pom.version" value="${version}" />
+ <condition property="ivy.settings.file" value="${env.LOCAL_IVY_SETTINGS}"><isset property="env.LOCAL_IVY_SETTINGS"/></condition>
+
+ <taskdef resource="org/apache/ivy/ant/antlib.xml" uri="antlib:org.apache.ivy.ant"/>
+
+ <target name="ivy-setupjars" description="Downloads dependencies with ivy and generate report">
+ <ivy:retrieve symlink="true" sync="true" pattern="${jars.dir}/[conf]/[artifact].[ext]"/>
+ <ivy:report todir="${dist}/ivy-report"/>
+ </target>
+
+ <target name="ivy-clean" description="Cleans ivy cache, jars dir and ivy installation">
+ <delete dir="${jars.dir}"/>
+ </target>
+
+ <target name="ivy-pom" description="Creates POM">
+ <ivy:resolve/>
+ <ivy:deliver deliverpattern="${dist}/ivy.xml" pubrevision="${version}" status="release"/>
+ <ivy:makepom ivyfile="${dist}/ivy.xml" templatefile="pom-model.xml" pomfile="pom.xml">
+ <dependency group="ch.qos.logback" artifact="logback-classic.jar" optional="true"/>
+ </ivy:makepom>
+ </target>
+
+ <path id="compile.classpath">
+ <fileset dir="${jars.dir}/compile"/>
+ </path>
+ <path id="test.classpath">
+ <fileset dir="${jars.dir}/test"/>
+ </path>
+ <path id="project.classpath">
+ <fileset dir="${jars.dir}/runtime"/>
+ </path>
+
+ <!-- ************************************** WARNING: MAVEN SH*T ************************************** -->
+
+ <!-- define Maven coordinates -->
+ <property name="groupId" value="it.unimi.dsi" />
+ <property name="artifactId" value="webgraph" />
+ <property name="version" value="${version}" />
+
+ <!-- define artifacts' name, which follows the convention of Maven -->
+ <property name="maven-jar" value="${dist}/lib/${artifactId}-${version}.jar" />
+ <property name="maven-javadoc-jar" value="${dist}/lib/${artifactId}-${version}-javadoc.jar" />
+ <property name="maven-sources-jar" value="${dist}/lib/${artifactId}-${version}-sources.jar" />
+
+ <!-- define Maven snapshots and staging repository ids and URLs -->
+ <property name="maven-snapshots-repository-id" value="sonatype-nexus-snapshots" />
+ <property name="maven-snapshots-repository-url" value="https://oss.sonatype.org/content/repositories/snapshots/" />
+ <property name="maven-staging-repository-id" value="sonatype-nexus-staging" />
+ <property name="maven-staging-repository-url" value="https://oss.sonatype.org/service/local/staging/deploy/maven2/" />
+
+ <target name="dist" depends="compile,javadoc" description="generate the distribution">
+
+ <!-- build the main artifact -->
+ <jar jarfile="${maven-jar}" basedir="${build}" />
+
+ <!-- build the javadoc artifact (from symbolic link created in init) -->
+ <jar jarfile="${maven-javadoc-jar}">
+ <fileset dir="${dist}/javadoc" />
+ </jar>
+
+ <!-- build the sources artifact -->
+ <jar jarfile="${maven-sources-jar}">
+ <fileset dir="." includes="CHANGES,COPYING,build.xml,build.properties,ivy.xml,${src}/**/*.java,${src}/**/*.html,${test}/**/*.java,${slow}/**/*.java"/>
+ </jar>
+ </target>
+
+ <target name="deploy" depends="dist,ivy-pom" description="deploy snapshot version to Maven snapshot repository">
+ <artifact:mvn>
+ <arg value="org.apache.maven.plugins:maven-deploy-plugin:2.6:deploy-file" />
+ <arg value="-Durl=${maven-snapshots-repository-url}" />
+ <arg value="-DrepositoryId=${maven-snapshots-repository-id}" />
+ <arg value="-DpomFile=pom.xml" />
+ <arg value="-Dfile=${maven-jar}" />
+ </artifact:mvn>
+ </target>
+
+ <target name="stage" depends="dist,ivy-pom" description="deploy release version to Maven staging repository">
+ <!-- sign and deploy the main artifact -->
+ <artifact:mvn>
+ <arg value="org.apache.maven.plugins:maven-gpg-plugin:1.3:sign-and-deploy-file" />
+ <arg value="-Durl=${maven-staging-repository-url}" />
+ <arg value="-DrepositoryId=${maven-staging-repository-id}" />
+ <arg value="-DpomFile=pom.xml" />
+ <arg value="-Dfile=${maven-jar}" />
+ <arg value="-Pgpg" />
+ </artifact:mvn>
+
+ <!-- sign and deploy the sources artifact -->
+ <artifact:mvn>
+ <arg value="org.apache.maven.plugins:maven-gpg-plugin:1.3:sign-and-deploy-file" />
+ <arg value="-Durl=${maven-staging-repository-url}" />
+ <arg value="-DrepositoryId=${maven-staging-repository-id}" />
+ <arg value="-DpomFile=pom.xml" />
+ <arg value="-Dfile=${maven-sources-jar}" />
+ <arg value="-Dclassifier=sources" />
+ <arg value="-Pgpg" />
+ </artifact:mvn>
+
+ <!-- sign and deploy the javadoc artifact -->
+ <artifact:mvn>
+ <arg value="org.apache.maven.plugins:maven-gpg-plugin:1.3:sign-and-deploy-file" />
+ <arg value="-Durl=${maven-staging-repository-url}" />
+ <arg value="-DrepositoryId=${maven-staging-repository-id}" />
+ <arg value="-DpomFile=pom.xml" />
+ <arg value="-Dfile=${maven-javadoc-jar}" />
+ <arg value="-Dclassifier=javadoc" />
+ <arg value="-Pgpg" />
+ </artifact:mvn>
+ </target>
+
+ <!-- ************************************** END OF MAVEN SH*T ************************************** -->
+
+ <property name="subdir" value=""/>
+
+ <!-- ************ SOURCE ********************* -->
+ <target name="init">
+ <available property="ivy.set.up" file="${jars.dir}"/>
+ <fail message="It appears that Ivy has not been set up properly. Please run &quot;ant ivy-setupjars&quot; and try again." unless="ivy.set.up"/>
+ <mkdir dir="${dist}"/>
+ <mkdir dir="${docs}"/>
+ <mkdir dir="${build}"/>
+ <mkdir dir="${reports}"/>
+ <mkdir dir="${coverage}"/>
+ <mkdir dir="${instrumented}"/>
+ <symlink link="${dist}/javadoc" resource="../${docs}" overwrite="true"/>
+ </target>
+
+ <target name="compile" depends="init" description="Compile sources (without tests)">
+ <javac srcdir="${src}" debug="on" optimize="on" destdir="${build}" encoding="UTF-8" source="1.8" target="1.8" classpathref="compile.classpath">
+ <compilerarg value="-Xlint:all"/>
+ </javac>
+ </target>
+
+ <target name="compile-tests" depends="init" description="Compile sources (with tests)">
+ <javac srcdir="${src}:${test}:${slow}" debug="on" optimize="on" destdir="${build}" encoding="UTF-8" source="1.8" target="1.8" classpathref="test.classpath">
+ <compilerarg value="-Xlint:all"/>
+ </javac>
+ </target>
+
+ <target name="jar" depends="compile" description="Creates jar (without tests)">
+ <jar jarfile="webgraph-${version}.jar">
+ <fileset dir="${build}"/>
+ </jar>
+ </target>
+
+ <target name="jar-tests" depends="compile-tests" description="Creates jar (with tests)">
+ <jar jarfile="webgraph-${version}.jar">
+ <fileset dir="${build}"/>
+ </jar>
+ </target>
+
+ <!-- ************ JAVADOC ********************* -->
+ <target name="javadoc" description="Generates documentation">
+ <delete dir="${docs}"/>
+ <mkdir dir="${docs}"/>
+ <javadoc destdir="${docs}"
+ encoding="UTF-8"
+ sourcepath="${src}"
+ packagenames="it.unimi.dsi.webgraph.*"
+ private="off"
+ overview="${src}/overview.html"
+ source="1.8"
+ windowtitle="WebGraph ${version}"
+ classpathref="compile.classpath">
+ <link href="${j2se.apiurl}"/>
+ <link href="${fastutil.apiurl}"/>
+ <link href="${dsiutils.apiurl}"/>
+ <link href="${sux4j.apiurl}"/>
+ <link href="${slf4j.apiurl}"/>
+ <link href="${jsap.apiurl}"/>
+ <link href="${junit.apiurl}"/>
+ <link href="${commons-io.apiurl}"/>
+ <link href="${commons-lang.apiurl}"/>
+ <link href="${commons-configuration.apiurl}"/>
+ <link href="${commons-collections.apiurl}"/>
+ </javadoc>
+ </target>
+
+ <target name="junit" depends="instrument" description="Runs JUnit tests">
+
+ <junit printsummary="yes" fork="yes" haltonfailure="off" haltonerror="off">
+ <classpath>
+ <path refid="test.classpath" />
+ <pathelement location="${instrumented}/classes"/>
+ <pathelement location="${build}"/>
+ <pathelement location="${src}"/>
+ <pathelement location="${test}"/>
+ <pathelement location="${slow}"/>
+ </classpath>
+
+ <assertions><enable/></assertions>
+
+ <jvmarg value="-Demma.coverage.out.file=${coverage}/coverage.emma" />
+ <jvmarg value="-Demma.coverage.out.merge=true" />
+
+ <jvmarg value="-Xmx1G" />
+ <jvmarg value="-Xss1G" />
+
+ <formatter type="xml"/>
+ <formatter type="plain"/>
+
+ <batchtest fork="yes" todir="${reports}">
+ <fileset dir="${instrumented}/classes">
+ <include name="it/unimi/dsi/webgraph/**/*Test.class"/>
+ <exclude name="it/unimi/dsi/webgraph/**/*SlowTest.class"/>
+ <exclude name="it/unimi/dsi/webgraph/test/*"/>
+ </fileset>
+ </batchtest>
+ </junit>
+
+ <junitreport todir="reports">
+ <fileset dir="reports">
+ <include name="TEST-*.xml"/>
+ </fileset>
+ <report todir="reports/html"/>
+ </junitreport>
+
+ <emma>
+ <report sourcepath="${src}" >
+ <fileset file="${coverage}/*a"/>
+ <html outfile="coverage.html" />
+ <xml outfile="${coverage}/coverage.xml" />
+ </report>
+ </emma>
+ </target>
+
+ <target name="junit-slow" depends="instrument" description="Runs slow JUnit tests">
+
+ <junit printsummary="yes" fork="yes" haltonfailure="off" haltonerror="off">
+ <classpath>
+ <path refid="test.classpath" />
+ <pathelement location="${instrumented}/classes"/>
+ <pathelement location="${build}"/>
+ <pathelement location="${src}"/>
+ <pathelement location="${test}"/>
+ <pathelement location="${slow}"/>
+ </classpath>
+
+ <jvmarg value="-Demma.coverage.out.file=${coverage}/coverage.emma" />
+ <jvmarg value="-Demma.coverage.out.merge=true" />
+ <jvmarg value="-Xmx1G" />
+
+ <formatter type="xml"/>
+ <formatter type="plain"/>
+
+ <batchtest fork="yes" todir="${reports}">
+ <fileset dir="${instrumented}/classes">
+ <include name="it/unimi/dsi/webgraph/**/*SlowTest.class"/>
+ </fileset>
+ </batchtest>
+ </junit>
+
+ <junitreport todir="reports">
+ <fileset dir="reports">
+ <include name="TEST-*.xml"/>
+ </fileset>
+ <report todir="reports/html"/>
+ </junitreport>
+
+ <emma>
+ <report sourcepath="${src}" >
+ <fileset file="${coverage}/*a"/>
+ <html outfile="coverage.html" />
+ <xml outfile="${coverage}/coverage.xml" />
+ </report>
+ </emma>
+ </target>
+
+
+ <target name="instrument" depends="compile-tests" description="Generate instrumented classes">
+ <taskdef resource="emma_ant.properties" classpathref="test.classpath"/>
+
+ <emma>
+ <instr mode="fullcopy"
+ outdir="${instrumented}"
+ merge="no"
+ metadatafile="${coverage}/metadata.emma"
+ instrpath="${build}"
+ >
+ <filter excludes="*Test*"/>
+ </instr>
+ </emma>
+ </target>
+
+ <!-- ************ CLEAN ********************* -->
+ <target name="clean">
+ <delete dir="${dist}"/>
+ <delete dir="${build}"/>
+ <delete dir="${reports}"/>
+ <delete dir="${coverage}"/>
+ <delete dir="${instrumented}"/>
+ <delete dir="${docs}"/>
+ <delete>
+ <fileset dir="." includes="*.jar"/>
+ </delete>
+ </target>
+</project>
+
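In build.xml, `build.sysclasspath` is set to `ignore` before `<property file="build.properties"/>` loads the value `last` for the same key. Ant properties cannot be changed once they are set. If neither value is passed on the command line, the effective value should therefore be `ignore`. For the same reason, `<property name="version" value="${version}"/>` leaves `version` unchanged.

The following is a minimal, hypothetical model of just those two rules: the first definition wins, and `${name}` references are expanded from properties that are already defined. It is not Ant's implementation. The class name `AntPropertySketch` is invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical model of Ant property semantics (not Ant's own code). */
public class AntPropertySketch {
    private final Map<String, String> props = new LinkedHashMap<>();
    private static final Pattern REF = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Expands ${name} from already-defined properties; unknown references are kept verbatim. */
    String expand(String value) {
        Matcher m = REF.matcher(value);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String v = props.get(m.group(1));
            m.appendReplacement(sb, Matcher.quoteReplacement(v != null ? v : m.group()));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    /** Like <property name=... value=...>: a property that is already set is never overwritten. */
    void define(String name, String value) {
        if (!props.containsKey(name)) props.put(name, expand(value));
    }

    String get(String name) {
        return props.get(name);
    }

    public static void main(String[] args) {
        AntPropertySketch p = new AntPropertySketch();
        p.define("build.sysclasspath", "ignore"); // set in build.xml first
        p.define("build.sysclasspath", "last");   // later, from build.properties: ignored
        p.define("version", "3.6.1");
        p.define("version", "${version}");        // no-op, as in build.xml
        p.define("dist", "dist");
        p.define("artifactId", "webgraph");
        p.define("maven-jar", "${dist}/lib/${artifactId}-${version}.jar");
        System.out.println(p.get("build.sysclasspath")); // ignore
        System.out.println(p.get("maven-jar"));          // dist/lib/webgraph-3.6.1.jar
    }
}
```

Values supplied with `-D` on the command line take precedence over both definitions in real Ant. The sketch does not model that.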
diff --git a/third_party/webgraph-3.6.1/ivy.xml b/third_party/webgraph-3.6.1/ivy.xml
new file mode 100644
index 0000000..16d226b
--- /dev/null
+++ b/third_party/webgraph-3.6.1/ivy.xml
@@ -0,0 +1,27 @@
+<?xml version="1.0" encoding="ISO-8859-1"?>
+<ivy-module version="2.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://ant.apache.org/ivy/schemas/ivy.xsd">
+ <info organisation="it.unimi.dsi" module="webgraph"/>
+
+ <configurations defaultconf="compile" defaultconfmapping="*->default">
+ <conf name="compile"/>
+ <conf name="runtime" extends="compile"/>
+ <conf name="test" extends="runtime"/>
+ </configurations>
+
+ <dependencies>
+
+ <dependency org="it.unimi.dsi" name="fastutil" rev="latest.release" />
+ <dependency org="it.unimi.dsi" name="sux4j" rev="latest.release" />
+ <dependency org="it.unimi.dsi" name="dsiutils" rev="latest.release" />
+ <dependency org="net.sf.jung" name="jung-api" rev="latest.release"/>
+ <dependency org="net.sf.jung" name="jung-io" rev="latest.release"/>
+ <dependency org="com.martiansoftware" name="jsap" rev="latest.release"/>
+ <dependency org="junit" name="junit" rev="latest.release" conf="test"/>
+ <dependency org="emma" name="emma" rev="latest.release" conf="test"/>
+ <dependency org="emma" name="emma_ant" rev="latest.release" conf="test"/>
+
+ <dependency org="ch.qos.logback" name="logback-classic" rev="latest.release" conf="runtime"/>
+ <dependency org="commons-configuration" name="commons-configuration" rev="latest.release"/>
+ <dependency org="org.apache.commons" name="commons-lang3" rev="latest.release"/>
+ </dependencies>
+</ivy-module>
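The ivy.xml above declares three configurations: `runtime` extends `compile`, and `test` extends `runtime`. The `ivy-setupjars` target retrieves them into `${jars.dir}/[conf]/`, and build.xml builds one classpath per directory. A configuration therefore sees its own dependencies plus every inherited one.

The sketch below models that inheritance as a plain Java illustration, not as Ivy's API. It treats dependencies without a `conf` attribute as belonging to `compile`, which is how we read `defaultconf="compile"` here. It ignores transitive dependencies and the `*->default` mapping. `IvyConfSketch` and `visible` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the configuration inheritance declared in ivy.xml (not Ivy's code). */
public class IvyConfSketch {
    // conf name -> the conf it extends (null for the root)
    static final Map<String, String> EXTENDS = new HashMap<>();
    // conf name -> dependencies declared directly in that conf
    static final Map<String, List<String>> DIRECT = new LinkedHashMap<>();

    static {
        EXTENDS.put("compile", null);
        EXTENDS.put("runtime", "compile");
        EXTENDS.put("test", "runtime");
        // Assumption: dependencies without a conf attribute land in defaultconf="compile".
        DIRECT.put("compile", Arrays.asList("fastutil", "sux4j", "dsiutils", "jung-api", "jung-io",
                "jsap", "commons-configuration", "commons-lang3"));
        DIRECT.put("runtime", Arrays.asList("logback-classic"));
        DIRECT.put("test", Arrays.asList("junit", "emma", "emma_ant"));
    }

    /** Returns the direct dependencies visible in conf, inherited ones first (no transitive ones). */
    static List<String> visible(String conf) {
        Deque<String> chain = new ArrayDeque<>();
        // Walk up the extends chain; push() puts the root conf at the head of the deque.
        for (String c = conf; c != null; c = EXTENDS.get(c)) chain.push(c);
        List<String> result = new ArrayList<>();
        for (String c : chain) result.addAll(DIRECT.getOrDefault(c, Collections.emptyList()));
        return result;
    }

    public static void main(String[] args) {
        for (String conf : Arrays.asList("compile", "runtime", "test"))
            System.out.println(conf + " -> " + visible(conf));
    }
}
```

In this model, logback-classic shows up only from `runtime` onward, and JUnit and EMMA only in `test`. This matches how build.xml uses `test.classpath` for the JUnit and instrumentation targets.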
diff --git a/third_party/webgraph-3.6.1/pom-model.xml b/third_party/webgraph-3.6.1/pom-model.xml
new file mode 100644
index 0000000..ac4a94b
--- /dev/null
+++ b/third_party/webgraph-3.6.1/pom-model.xml
@@ -0,0 +1,36 @@
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
+ <modelVersion>4.0.0</modelVersion>
+ <groupId>it.unimi.dsi</groupId>
+ <artifactId>${ivy.pom.artifactId}</artifactId>
+ <packaging>jar</packaging>
+ <name>WebGraph</name>
+ <version>${ivy.pom.version}</version>
+ <description>WebGraph is a framework to study the web graph. It provides simple ways to manage very large graphs, exploiting modern compression techniques.</description>
+ <url>http://webgraph.dsi.unimi.it/</url>
+ <licenses>
+ <license>
+ <name>GNU General Public License Version 3+</name>
+ <url>http://www.gnu.org/licenses/gpl.html</url>
+ <distribution>repo</distribution>
+ </license>
+ </licenses>
+ <scm>
+ <connection>scm:git://github.com/vigna/WebGraph.git</connection>
+ <url>https://github.com/vigna/WebGraph</url>
+ </scm>
+ <developers>
+
+ <developer>
+ <id>boldi</id>
+ <name>Paolo Boldi</name>
+ <email>boldi@dsi.unimi.it</email>
+ </developer>
+
+ <developer>
+ <id>vigna</id>
+ <name>Sebastiano Vigna</name>
+ <email>vigna@dsi.unimi.it</email>
+ </developer>
+
+ </developers>
+</project>
diff --git a/third_party/webgraph-3.6.1/slow/it/unimi/dsi/webgraph/ConnectedComponentsSlowTest.java b/third_party/webgraph-3.6.1/slow/it/unimi/dsi/webgraph/ConnectedComponentsSlowTest.java
new file mode 100644
index 0000000..1532526
--- /dev/null
+++ b/third_party/webgraph-3.6.1/slow/it/unimi/dsi/webgraph/ConnectedComponentsSlowTest.java
@@ -0,0 +1,36 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2011-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import it.unimi.dsi.webgraph.algo.ConnectedComponentsTest;
+
+import java.io.IOException;
+
+import org.junit.Test;
+
+
+public class ConnectedComponentsSlowTest extends WebGraphTestCase {
+ @Test
+ public void testLarge() throws IOException {
+ String path = getGraphPath("cnr-2000");
+ ImmutableGraph g = Transform.symmetrize(ImmutableGraph.load(path));
+ ConnectedComponentsTest.sameComponents(g);
+ }
+}
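ConnectedComponentsSlowTest symmetrizes cnr-2000 with `Transform.symmetrize` before testing connected components. Adding the reverse of every arc turns the directed graph into an undirected one, where connected components are well defined.

The following standalone sketch shows the same two steps on plain successor arrays: symmetrization, then union-find labelling. It does not use WebGraph, and it is not how WebGraph computes components. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Standalone illustration (not WebGraph code): symmetrize, then label connected components. */
public class SymmetrizeComponentsSketch {

    /** Returns the successor lists of the symmetrized graph: arc x->y implies y->x. */
    static int[][] symmetrize(int[][] succ) {
        int n = succ.length;
        List<Set<Integer>> sym = new ArrayList<>();
        for (int i = 0; i < n; i++) sym.add(new TreeSet<>());
        for (int x = 0; x < n; x++)
            for (int y : succ[x]) {
                sym.get(x).add(y);
                sym.get(y).add(x);
            }
        int[][] result = new int[n][];
        for (int x = 0; x < n; x++)
            result[x] = sym.get(x).stream().mapToInt(Integer::intValue).toArray();
        return result;
    }

    private static int find(int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]]; // path halving
            x = parent[x];
        }
        return x;
    }

    /** Labels each node with the smallest node index in its component. */
    static int[] components(int[][] undirected) {
        int n = undirected.length;
        int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        for (int x = 0; x < n; x++)
            for (int y : undirected[x]) {
                int rx = find(parent, x), ry = find(parent, y);
                // The smaller root wins, so every root is the minimum of its set.
                if (rx < ry) parent[ry] = rx;
                else if (ry < rx) parent[rx] = ry;
            }
        int[] label = new int[n];
        for (int x = 0; x < n; x++) label[x] = find(parent, x);
        return label;
    }

    public static void main(String[] args) {
        // Arcs 0->1, 2->1, 3->4: after symmetrization {0, 1, 2} and {3, 4} are components.
        int[][] g = { {1}, {}, {1}, {4}, {} };
        System.out.println(Arrays.toString(components(symmetrize(g)))); // [0, 0, 0, 3, 3]
    }
}
```

Without the symmetrization step, node 0 would not reach node 2 in the example graph. The undirected view is what groups them together.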
diff --git a/third_party/webgraph-3.6.1/slow/it/unimi/dsi/webgraph/StronglyConnectedComponentsSlowTest.java b/third_party/webgraph-3.6.1/slow/it/unimi/dsi/webgraph/StronglyConnectedComponentsSlowTest.java
new file mode 100644
index 0000000..53efd39
--- /dev/null
+++ b/third_party/webgraph-3.6.1/slow/it/unimi/dsi/webgraph/StronglyConnectedComponentsSlowTest.java
@@ -0,0 +1,27 @@
+package it.unimi.dsi.webgraph;
+
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.webgraph.algo.StronglyConnectedComponentsTarjan;
+import it.unimi.dsi.webgraph.algo.StronglyConnectedComponents;
+import it.unimi.dsi.webgraph.algo.StronglyConnectedComponentsTest;
+import it.unimi.dsi.logging.ProgressLogger;
+
+import java.io.IOException;
+
+import org.junit.Test;
+
+public class StronglyConnectedComponentsSlowTest extends WebGraphTestCase {
+
+ @Test
+ public void testLarge() throws IOException {
+ String path = getGraphPath("cnr-2000");
+ ImmutableGraph g = ImmutableGraph.load(path);
+ final StronglyConnectedComponentsTarjan componentsRecursive = StronglyConnectedComponentsTarjan.compute(g, true, new ProgressLogger());
+ final StronglyConnectedComponents componentsIterative = StronglyConnectedComponents.compute(g, true, new ProgressLogger());
+ assertEquals(componentsRecursive.numberOfComponents, componentsIterative.numberOfComponents);
+ StronglyConnectedComponentsTest.sameComponents(g.numNodes(), componentsRecursive, componentsIterative);
+ deleteGraph(path);
+ }
+
+}
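StronglyConnectedComponentsSlowTest checks that StronglyConnectedComponentsTarjan (whose result the test names componentsRecursive) and StronglyConnectedComponents (named componentsIterative) find the same components on cnr-2000. An iterative formulation matters because a recursive depth-first visit of a graph with hundreds of thousands of nodes can become deep enough to overflow a default-sized thread stack. The sketch below shows one way to run Tarjan's algorithm with an explicit call stack. It is hypothetical (IterativeTarjanSketch is not a WebGraph class) and makes no claim about how WebGraph implements either variant.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Hypothetical sketch: Tarjan's SCC algorithm with an explicit call stack instead of recursion. */
public class IterativeTarjanSketch {
    /** succ[x] lists the successors of x; returns comp[x], a component index. */
    static int[] components(final int[][] succ) {
        final int n = succ.length;
        final int[] index = new int[n], low = new int[n], comp = new int[n];
        final int[] nextArc = new int[n]; // per node: next successor to examine (the saved loop counter)
        Arrays.fill(index, -1);
        final boolean[] onStack = new boolean[n];
        final Deque<Integer> tarjanStack = new ArrayDeque<>(); // visited nodes not yet in a component
        final Deque<Integer> callStack = new ArrayDeque<>(); // simulated recursion
        int time = 0, numComps = 0;
        for (int root = 0; root < n; root++) {
            if (index[root] != -1) continue;
            index[root] = low[root] = time++;
            tarjanStack.push(root);
            onStack[root] = true;
            callStack.push(root);
            while (!callStack.isEmpty()) {
                final int x = callStack.peek();
                if (nextArc[x] < succ[x].length) {
                    final int y = succ[x][nextArc[x]++];
                    if (index[y] == -1) { // tree arc: "recurse" into y
                        index[y] = low[y] = time++;
                        tarjanStack.push(y);
                        onStack[y] = true;
                        callStack.push(y);
                    } else if (onStack[y]) low[x] = Math.min(low[x], index[y]);
                } else { // all arcs of x examined: "return" from x
                    callStack.pop();
                    if (low[x] == index[x]) { // x is the root of a component: pop its members
                        int y;
                        do {
                            y = tarjanStack.pop();
                            onStack[y] = false;
                            comp[y] = numComps;
                        } while (y != x);
                        numComps++;
                    }
                    if (!callStack.isEmpty()) { // propagate low to the caller
                        final int caller = callStack.peek();
                        low[caller] = Math.min(low[caller], low[x]);
                    }
                }
            }
        }
        return comp;
    }

    static int numberOfComponents(final int[][] succ) {
        return Arrays.stream(components(succ)).max().orElse(-1) + 1;
    }

    public static void main(String[] args) {
        // Arcs 0->1, 1->2, 2->1: components {1, 2} and {0}.
        System.out.println(numberOfComponents(new int[][] { {1}, {2}, {1} })); // prints 2
    }
}
```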
diff --git a/third_party/webgraph-3.6.1/slow/it/unimi/dsi/webgraph/algo/EstimateEffectiveDiameterSlowTest.java b/third_party/webgraph-3.6.1/slow/it/unimi/dsi/webgraph/algo/EstimateEffectiveDiameterSlowTest.java
new file mode 100644
index 0000000..c575c96
--- /dev/null
+++ b/third_party/webgraph-3.6.1/slow/it/unimi/dsi/webgraph/algo/EstimateEffectiveDiameterSlowTest.java
@@ -0,0 +1,43 @@
+package it.unimi.dsi.webgraph.algo;
+
+/*
+ * Copyright (C) 2010-2017 Paolo Boldi & Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.WebGraphTestCase;
+
+import java.io.IOException;
+
+import org.junit.Test;
+
+
+public class EstimateEffectiveDiameterSlowTest extends WebGraphTestCase {
+
+ @Test
+ public void testLarge() throws IOException {
+ String path = getGraphPath("cnr-2000");
+ ImmutableGraph g = ImmutableGraph.load(path);
+ final HyperBall hyperBall = new HyperBall(g, 8, 0);
+ hyperBall.run(Integer.MAX_VALUE, -1);
+ assertEquals(NeighbourhoodFunction.effectiveDiameter(.9, HyperBallSlowTest.cnr2000NF), NeighbourhoodFunction.effectiveDiameter(.9, hyperBall.neighbourhoodFunction.toDoubleArray()), 1);
+ hyperBall.close();
+ deleteGraph(path);
+ }
+}
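EstimateEffectiveDiameterSlowTest compares two effective diameters at alpha = 0.9. One is computed from the exact neighbourhood function HyperBallSlowTest.cnr2000NF, and the other from the function estimated by HyperBall. The test allows them to differ by at most 1. A common definition of the effective diameter is the smallest t at which the neighbourhood function reaches a fraction alpha of its final value, with linear interpolation between t &minus; 1 and t. The sketch below implements that definition. Whether NeighbourhoodFunction.effectiveDiameter interpolates in exactly this way is an assumption, and the class name is hypothetical.

```java
/** Hypothetical sketch of an interpolated effective diameter. */
public class EffectiveDiameterSketch {
    /**
     * nf[t] counts the pairs of nodes at distance at most t (nondecreasing in t).
     * Returns the interpolated smallest t with nf[t] >= alpha * nf[last], for 0 < alpha <= 1.
     */
    static double effectiveDiameter(final double alpha, final double[] nf) {
        final double threshold = alpha * nf[nf.length - 1];
        int t = 0;
        while (nf[t] < threshold) t++; // stops at the last index at the latest, since alpha <= 1
        if (t == 0) return 0;
        // Here nf[t - 1] < threshold <= nf[t], so the denominator is positive.
        return (t - 1) + (threshold - nf[t - 1]) / (nf[t] - nf[t - 1]);
    }

    public static void main(String[] args) {
        // Neighbourhood function of the arcs 0->1, 1->2, 2->1: {3, 6, 7}.
        // Threshold 0.9 * 7 = 6.3 falls between t = 1 and t = 2, so the result is about 1.3.
        System.out.println(effectiveDiameter(.9, new double[] { 3, 6, 7 }));
    }
}
```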
diff --git a/third_party/webgraph-3.6.1/slow/it/unimi/dsi/webgraph/algo/HyperBallSlowTest.java b/third_party/webgraph-3.6.1/slow/it/unimi/dsi/webgraph/algo/HyperBallSlowTest.java
new file mode 100644
index 0000000..2941eee
--- /dev/null
+++ b/third_party/webgraph-3.6.1/slow/it/unimi/dsi/webgraph/algo/HyperBallSlowTest.java
@@ -0,0 +1,75 @@
+package it.unimi.dsi.webgraph.algo;
+
+
+/*
+ * Copyright (C) 2010-2017 Paolo Boldi & Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import static org.junit.Assert.assertTrue;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.Transform;
+import it.unimi.dsi.webgraph.WebGraphTestCase;
+
+import java.io.IOException;
+
+import org.junit.Test;
+
+
+public class HyperBallSlowTest extends WebGraphTestCase {
+
+ /** The true (i.e., exactly computed by {@link NeighbourhoodFunction}) neighbourhood function of <code>cnr-2000</code>. */
+ public final static double[] cnr2000NF = { 325557.0, 3454267.0, 3.4531824E7, 1.5878699E8, 6.83926525E8, 1.190460703E9, 1.604430414E9, 2.35307782E9, 2.997067429E9, 3.968809803E9, 5.058079643E9,
+ 6.421976049E9, 8.284517654E9, 1.0243847731E10, 1.2607757915E10, 1.5228803201E10, 1.7747396141E10, 1.9909476778E10, 2.221766255E10, 2.4379845882E10, 2.6311779701E10, 2.8107451664E10,
+ 2.9665243165E10, 3.0951071763E10, 3.218581841E10, 3.3215135972E10, 3.4149034335E10, 3.4932882223E10, 3.5364851538E10, 3.5931189753E10, 3.6281498738E10, 3.6560429256E10, 3.6817190941E10,
+ 3.6998241145E10, 3.7125032189E10, 3.7214125718E10, 3.7278637339E10, 3.7317211025E10, 3.7344441435E10, 3.7363743739E10, 3.7376116159E10, 3.7386091516E10, 3.7393988067E10, 3.7401055259E10,
+ 3.740755634E10, 3.7413358276E10, 3.7418706947E10, 3.7423579858E10, 3.7427946736E10, 3.7431862349E10, 3.7435354797E10, 3.7438438086E10, 3.7441057447E10, 3.7443233065E10, 3.7445170896E10,
+ 3.7446818612E10, 3.7448244469E10, 3.7449425939E10, 3.745045924E10, 3.7451366966E10, 3.7452151719E10, 3.7452841271E10, 3.7453422635E10, 3.7453918161E10, 3.7454357668E10, 3.7454740726E10,
+ 3.7455030057E10, 3.745523956E10, 3.7455417775E10, 3.7455555869E10, 3.7455655899E10, 3.7455728404E10, 3.7455776324E10, 3.7455807203E10, 3.7455827683E10, 3.7455839892E10, 3.7455845502E10,
+ 3.7455848208E10, 3.7455850151E10, 3.745585096E10, 3.7455851388E10, 3.7455851633E10, 3.7455851773E10, 3.7455851833E10, 3.7455851843E10 };
+
+ @Test
+ public void testLarge() throws IOException {
+ String path = getGraphPath("cnr-2000");
+ ImmutableGraph g = ImmutableGraph.load(path);
+ int correct[] = new int[cnr2000NF.length];
+ final int limit = cnr2000NF.length;
+ for(int log2m: new int[] { 4, 7 }) {
+ final double rsd = HyperBall.relativeStandardDeviation(log2m);
+ for(int attempt = 0; attempt < 10; attempt++) {
+ HyperBall hyperBall = new HyperBall(g, attempt % 3 == 0 ? Transform.transpose(g) : null, log2m, null, 0, 0, 0, attempt % 2 != 0, false, false, null, attempt);
+ SequentialHyperBall sequentialHyperBall = new SequentialHyperBall(g, log2m, null, attempt);
+ hyperBall.init();
+ sequentialHyperBall.init();
+ for(int i = 1; i < limit; i++) {
+ System.err.println("log2m: " + log2m + " attempt: " + attempt + " round: " + i);
+ hyperBall.iterate();
+ final double current = hyperBall.neighbourhoodFunction.getDouble(hyperBall.neighbourhoodFunction.size() - 1);
+ final double sequentialCurrent = sequentialHyperBall.iterate();
+ HyperBallTest.assertState(g.numNodes(), log2m, sequentialHyperBall.registers(), hyperBall.registers());
+ HyperBallTest.assertRelativeError(sequentialCurrent, current, HyperBallTest.THRESHOLD);
+ if (Math.abs(cnr2000NF[i] - current) <= cnr2000NF[i] * 2 * rsd) correct[i]++;
+ }
+ hyperBall.close();
+ sequentialHyperBall.close();
+ }
+ for(int i = 1; i < limit; i++) assertTrue(correct[i] + " < " + 9, correct[i] >= 9);
+ }
+ deleteGraph(path);
+ }
+
+}
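The cnr2000NF array above is described as the exact neighbourhood function of cnr-2000. Entry t counts the ordered pairs of nodes (x, y) such that y is reachable from x in at most t steps. Its first entry, 325557, equals the node count in cnr-2000.properties, because every node is at distance 0 from itself. HyperBall estimates this function with probabilistic counters. The test counts an attempt as correct for round i when the estimate is within twice the relative standard deviation of the exact value, and requires at least 9 correct attempts out of 10. For a small graph, the exact function can be computed with one breadth-first visit per node, as in the hypothetical sketch below. This is not necessarily how NeighbourhoodFunction computes it, and the approach does not scale to graphs of this size.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/** Hypothetical sketch: exact neighbourhood function via one breadth-first visit per node. */
public class NeighbourhoodFunctionSketch {
    /** succ[x] lists the successors of x; returns nf with nf[t] = #pairs (x, y) with d(x, y) <= t. */
    static double[] neighbourhoodFunction(final int[][] succ) {
        final int n = succ.length;
        final long[] atDistance = new long[Math.max(n, 1)]; // distances are at most n - 1
        final int[] dist = new int[n];
        final ArrayDeque<Integer> queue = new ArrayDeque<>();
        int maxDistance = 0;
        for (int source = 0; source < n; source++) {
            Arrays.fill(dist, -1);
            dist[source] = 0;
            queue.add(source);
            while (!queue.isEmpty()) {
                final int x = queue.poll();
                atDistance[dist[x]]++; // the pair (source, x) is at distance exactly dist[x]
                maxDistance = Math.max(maxDistance, dist[x]);
                for (final int y : succ[x]) {
                    if (dist[y] == -1) {
                        dist[y] = dist[x] + 1;
                        queue.add(y);
                    }
                }
            }
        }
        // Accumulate: pairs at distance at most t.
        final double[] nf = new double[maxDistance + 1];
        long cumulative = 0;
        for (int t = 0; t <= maxDistance; t++) {
            cumulative += atDistance[t];
            nf[t] = cumulative;
        }
        return nf;
    }

    public static void main(String[] args) {
        // Arcs 0->1, 1->2, 2->1 (the example from the ArcListASCIIGraph Javadoc).
        System.out.println(Arrays.toString(neighbourhoodFunction(new int[][] { {1}, {2}, {1} }))); // prints [3.0, 6.0, 7.0]
    }
}
```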
diff --git a/third_party/webgraph-3.6.1/slow/it/unimi/dsi/webgraph/cnr-2000.graph b/third_party/webgraph-3.6.1/slow/it/unimi/dsi/webgraph/cnr-2000.graph
new file mode 100644
index 0000000..94cd2ac
Binary files /dev/null and b/third_party/webgraph-3.6.1/slow/it/unimi/dsi/webgraph/cnr-2000.graph differ
diff --git a/third_party/webgraph-3.6.1/slow/it/unimi/dsi/webgraph/cnr-2000.graph-txt.gz b/third_party/webgraph-3.6.1/slow/it/unimi/dsi/webgraph/cnr-2000.graph-txt.gz
new file mode 100644
index 0000000..09dcae2
Binary files /dev/null and b/third_party/webgraph-3.6.1/slow/it/unimi/dsi/webgraph/cnr-2000.graph-txt.gz differ
diff --git a/third_party/webgraph-3.6.1/slow/it/unimi/dsi/webgraph/cnr-2000.offsets b/third_party/webgraph-3.6.1/slow/it/unimi/dsi/webgraph/cnr-2000.offsets
new file mode 100644
index 0000000..55f5ca0
Binary files /dev/null and b/third_party/webgraph-3.6.1/slow/it/unimi/dsi/webgraph/cnr-2000.offsets differ
diff --git a/third_party/webgraph-3.6.1/slow/it/unimi/dsi/webgraph/cnr-2000.properties b/third_party/webgraph-3.6.1/slow/it/unimi/dsi/webgraph/cnr-2000.properties
new file mode 100644
index 0000000..73f0f31
--- /dev/null
+++ b/third_party/webgraph-3.6.1/slow/it/unimi/dsi/webgraph/cnr-2000.properties
@@ -0,0 +1,15 @@
+#BVGraph properties
+#Mon Apr 03 14:58:33 CEST 2006
+bitspernode=35.15
+arcs=3216152
+nodes=325557
+graphclass=it.unimi.dsi.webgraph.BVGraph
+maxrefcount=3
+windowsize=7
+minintervallength=3
+bitsperlink=3.56
+avgdist=1.74
+compressionflags=
+version=0
+avgref=1.38
+zetak=3
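The values in cnr-2000.properties are mutually consistent if bitsperlink is read as bitspernode divided by the average outdegree arcs/nodes. That reading is our interpretation of the field names; the file itself does not document it. With 3216152 arcs over 325557 nodes, the average outdegree is about 9.88, and 35.15 / 9.88 rounds to the stored 3.56. The arithmetic is:

```java
import java.util.Locale;

/** Hypothetical check: bitsperlink should be about bitspernode / (arcs / nodes). */
public class PropertiesCheck {
    static double bitsPerLink(final double bitsPerNode, final long arcs, final long nodes) {
        final double averageOutdegree = (double) arcs / nodes; // 3216152 / 325557, about 9.88
        return bitsPerNode / averageOutdegree;
    }

    public static void main(String[] args) {
        // Locale.ROOT guarantees a '.' decimal separator.
        System.out.println(String.format(Locale.ROOT, "%.2f", bitsPerLink(35.15, 3216152L, 325557L))); // prints 3.56
    }
}
```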
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/ASCIIGraph.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/ASCIIGraph.java
new file mode 100644
index 0000000..b2f1fa2
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/ASCIIGraph.java
@@ -0,0 +1,321 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2003-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.ints.IntArrayList;
+import it.unimi.dsi.fastutil.io.FastBufferedOutputStream;
+import it.unimi.dsi.io.FastBufferedReader;
+import it.unimi.dsi.lang.MutableString;
+import it.unimi.dsi.lang.ObjectParser;
+import it.unimi.dsi.logging.ProgressLogger;
+
+import java.io.BufferedReader;
+import java.io.FileOutputStream;
+import java.io.FileReader;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.io.PrintStream;
+import java.io.StreamTokenizer;
+import java.lang.reflect.InvocationTargetException;
+import java.util.Arrays;
+import java.util.NoSuchElementException;
+import java.util.concurrent.TimeUnit;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Charsets;
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+
+/** An {@link ImmutableGraph} that corresponds to graphs stored in a human-readable
+ * ASCII format where each line contains the list of successors of a given node.
+ *
+ * <p>The file format is as follows: the graph is stored in a file named <code><var>basename</var>.graph-txt</code>.
+ * The first line contains the number of nodes, <var>n</var>. Then, <var>n</var> lines follow, the <var>i</var>-th
+ * line containing the successors of node <var>i</var> in increasing order
+ * (nodes are numbered from 0 to <var>n</var>&minus;1).
+ * Successors are separated by a single space.
+ *
+ * <P>Unlike other classes, the load methods of this class <strong>do not always return instances of this class</strong>.
+ * In particular, {@link #loadOffline(CharSequence)} and {@link #loadOnce(InputStream)} <em>will</em> return an instance of this class for
+ * offline (respectively, read-once) access. The instance will not provide random access, but sequential access will be backed by
+ * the original text file (respectively, input stream), and only one array of successors will be loaded in core memory at any time.
+ *
+ * <p>The {@link #load(CharSequence)} method, on the other hand, will return an instance of
+ * {@link it.unimi.dsi.webgraph.ArrayListMutableGraph} built by copying an offline instance of this class.
+ *
+ * <h2>Using {@link ASCIIGraph} to convert your data</h2>
+ *
+ * <p>A simple (albeit rather inefficient) way to import data into WebGraph is using ASCII graphs. Suppose you
+ * create the following file, named <code>example.graph-txt</code>:
+ * <pre>
+ * 2
+ * 1
+ * 0 1
+ * </pre>
+ * Then, the command
+ * <pre>
+ * java it.unimi.dsi.webgraph.BVGraph -g ASCIIGraph example bvexample
+ * </pre>
+ * will produce a compressed graph in {@link it.unimi.dsi.webgraph.BVGraph} format
+ * with basename <code>bvexample</code>. Even more convenient is the {@link #loadOnce(InputStream)}
+ * method, which reads an ASCII graph from an input stream and exposes it for a single traversal. It
+ * can be used, for instance, with the main method of {@link it.unimi.dsi.webgraph.BVGraph} to
+ * store in compressed form, on the fly, an ASCII graph generated by some other means. The previous
+ * example could then be rewritten as
+ * <pre>
+ * java it.unimi.dsi.webgraph.BVGraph -1 -g ASCIIGraph dummy bvexample &lt;example.graph-txt
+ * </pre>
+ */
+
+
+public class ASCIIGraph extends ImmutableSequentialGraph {
+ /** The standard extension of an ASCII graph. */
+ private static final String ASCII_GRAPH_EXTENSION = ".graph-txt";
+
+ private static final Logger LOGGER = LoggerFactory.getLogger(ASCIIGraph.class);
+
+ /** Number of nodes. */
+ private final int n;
+ /** The file containing the graph, or <code>null</code> for a read-once ASCII graph. */
+ private final CharSequence graphFile;
+ /** A fast buffered reader containing the description of an ASCII graph (except for the number of nodes) for a read-once ASCII graph; <code>null</code>, otherwise. */
+ private final FastBufferedReader fbr;
+
+ protected ASCIIGraph(final CharSequence graphFile) throws NumberFormatException, IOException {
+ this.graphFile = graphFile;
+
+ final BufferedReader bufferedReader = new BufferedReader(new FileReader(graphFile.toString() + ASCII_GRAPH_EXTENSION));
+ n = Integer.parseInt(bufferedReader.readLine());
+ bufferedReader.close();
+ fbr = null;
+ if (n < 0) throw new IllegalArgumentException("Number of nodes must be nonnegative");
+ }
+
+ /** Creates a read-once ASCII graph. Instances created using this constructor can
+ * only be accessed using a single call to {@link #nodeIterator(int)}.
+ *
+ * @param is an input stream containing an ASCII graph.
+ */
+
+ public ASCIIGraph(final InputStream is) throws NumberFormatException, IOException {
+ graphFile = null;
+ fbr = new FastBufferedReader(new InputStreamReader(is, "ASCII"));
+ n = Integer.parseInt(fbr.readLine(new MutableString()).toString());
+ if (n < 0) throw new IllegalArgumentException("Number of nodes must be nonnegative");
+ }
+
+ @Override
+ public int numNodes() {
+ return n;
+ }
+
+ @Override
+ public NodeIterator nodeIterator(final int from) {
+ if (from < 0 || from > n) throw new IllegalArgumentException();
+ try {
+ final FastBufferedReader fbr = this.fbr != null ? this.fbr : new FastBufferedReader(new FileReader(graphFile + ASCII_GRAPH_EXTENSION));
+ final MutableString s = new MutableString();
+ // We skip up to from, but we skip the first line only if this is not a read-once scan (in that case the constructor has read the first line).
+ for (int i = from + (this.fbr != null ? 0 : 1); i-- != 0;)
+ fbr.readLine(s);
+
+ final StreamTokenizer st = new StreamTokenizer(fbr);
+ st.eolIsSignificant(true);
+ st.parseNumbers();
+
+ return new NodeIterator() {
+ int i = from;
+
+ IntArrayList successors = new IntArrayList();
+
+ @Override
+ public boolean hasNext() {
+ return i < n;
+ }
+
+ @Override
+ public int[] successorArray() {
+ return successors.elements();
+ }
+
+ @Override
+ public int nextInt() {
+ if (! hasNext()) throw new NoSuchElementException();
+ successors.clear();
+ int tokenType, dep;
+
+ try {
+ do {
+ tokenType = st.nextToken();
+ if (tokenType == StreamTokenizer.TT_NUMBER) {
+ successors.add(dep = (int)st.nval);
+ if (dep < 0 || dep >= n)
+ throw new IOException("The value " + dep + " is not a node index at line " + st.lineno());
+ }
+ else if (tokenType != StreamTokenizer.TT_EOL) {
+ throw new IOException("Unexpected token " + st.toString());
+ }
+ } while (tokenType != StreamTokenizer.TT_EOL);
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+
+ return i++;
+ }
+
+ @Override
+ public int outdegree() {
+ return successors.size();
+ }
+
+ @Override
+ public NodeIterator copy(final int upperBound) {
+ throw new UnsupportedOperationException();
+ }
+
+ };
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+ @Override
+ public NodeIterator[] splitNodeIterators(final int howMany) {
+ NodeIterator[] result = new NodeIterator[howMany];
+ result[0] = nodeIterator();
+ Arrays.fill(result, 1, result.length, NodeIterator.EMPTY);
+ return result;
+ }
+
+
+ @Deprecated
+ public static ImmutableGraph loadSequential(CharSequence basename) throws IOException {
+ return loadOffline(basename);
+ }
+
+ @Deprecated
+ public static ASCIIGraph loadSequential(CharSequence basename, ProgressLogger unused) throws IOException {
+ return loadOffline(basename, unused);
+ }
+
+ public static ASCIIGraph loadOffline(CharSequence basename) throws IOException {
+ return loadOffline(basename, (ProgressLogger)null);
+ }
+
+ public static ASCIIGraph loadOffline(CharSequence basename, ProgressLogger unused) throws IOException {
+ return new ASCIIGraph(basename);
+ }
+
+ public static ASCIIGraph loadMapped(CharSequence basename) throws IOException {
+ return loadOffline(basename);
+ }
+
+ public static ASCIIGraph loadMapped(CharSequence basename, ProgressLogger unused) throws IOException {
+ return loadOffline(basename);
+ }
+
+ public static ASCIIGraph loadOnce(final InputStream is) throws IOException {
+ return new ASCIIGraph(is);
+ }
+
+ public static ImmutableGraph load(CharSequence basename) throws IOException {
+ return load(basename, (ProgressLogger)null);
+ }
+
+ public static ImmutableGraph load(CharSequence basename, ProgressLogger unused) throws IOException {
+ return new ArrayListMutableGraph(loadOffline(basename)).immutableView();
+ }
+
+ public static void store(ImmutableGraph graph, CharSequence basename, @SuppressWarnings("unused") ProgressLogger unused) throws IOException {
+ store(graph, basename);
+ }
+
+ public static void store(ImmutableGraph graph, CharSequence basename) throws IOException {
+ store(graph, 0, basename);
+ }
+
+ public static void store(ImmutableGraph graph, final int shift, CharSequence basename) throws IOException {
+ final PrintStream ps = new PrintStream(new FastBufferedOutputStream(new FileOutputStream(basename + ASCII_GRAPH_EXTENSION)), false, Charsets.US_ASCII.toString());
+ int n = graph.numNodes();
+ LazyIntIterator successors;
+
+ ps.println(n);
+ for (NodeIterator nodeIterator = graph.nodeIterator(); nodeIterator.hasNext();) {
+ nodeIterator.nextInt();
+ int d = nodeIterator.outdegree();
+ successors = nodeIterator.successors();
+ while (d-- != 0) ps.print((successors.nextInt() + shift) + " ");
+ ps.println();
+ }
+ ps.close();
+ }
+
+ public static void main(String args[]) throws IllegalArgumentException, SecurityException, IllegalAccessException, InvocationTargetException, NoSuchMethodException, IOException, JSAPException, ClassNotFoundException, InstantiationException {
+ String sourceBasename, destBasename;
+ Class<?> graphClass;
+
+ SimpleJSAP jsap = new SimpleJSAP(ASCIIGraph.class.getName(), "Reads a graph with a given basename, or a given spec, and writes it out in ASCII format with another basename",
+ new Parameter[] {
+ new FlaggedOption("graphClass", GraphClassParser.getParser(), null, JSAP.NOT_REQUIRED, 'g', "graph-class", "Forces a Java class for the source graph"),
+ new FlaggedOption("shift", JSAP.INTEGER_PARSER, null, JSAP.NOT_REQUIRED, 'S', "shift", "A shift that will be added to each node index."),
+ new Switch("spec", 's', "spec", "The source is not a basename but rather a spec of the form ImmutableGraphClass(arg,arg,...)."),
+ new FlaggedOption("logInterval", JSAP.LONG_PARSER, Long.toString(ProgressLogger.DEFAULT_LOG_INTERVAL), JSAP.NOT_REQUIRED, 'l', "log-interval", "The minimum time interval between activity logs in milliseconds."),
+ new UnflaggedOption("sourceBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the source graph, or a source spec if --spec was given."),
+ new UnflaggedOption("destBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the destination graph"),
+ }
+ );
+
+ JSAPResult jsapResult = jsap.parse(args);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ graphClass = jsapResult.getClass("graphClass");
+ sourceBasename = jsapResult.getString("sourceBasename");
+ destBasename = jsapResult.getString("destBasename");
+ final boolean spec = jsapResult.getBoolean("spec");
+
+ final ProgressLogger pl = new ProgressLogger(LOGGER, jsapResult.getLong("logInterval"), TimeUnit.MILLISECONDS);
+
+ if (graphClass != null && spec) {
+ System.err.println("Options --graph-class and --spec are incompatible");
+ return;
+ }
+
+ ImmutableGraph graph;
+ if (!spec)
+ graph = graphClass != null
+ ? (ImmutableGraph)graphClass.getMethod("loadOffline", CharSequence.class, ProgressLogger.class).invoke(null, sourceBasename, pl)
+ : ImmutableGraph.loadOffline(sourceBasename, pl);
+ else
+ graph = ObjectParser.fromSpec(sourceBasename, ImmutableGraph.class, GraphClassParser.PACKAGE);
+ if (jsapResult.userSpecified("shift")) ASCIIGraph.store(graph, jsapResult.getInt("shift"), destBasename);
+ else ASCIIGraph.store(graph, destBasename);
+ }
+}
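The ASCIIGraph Javadoc describes the <code>.graph-txt</code> format. The first line holds the number of nodes n, and n lines follow, the i-th listing the successors of node i separated by spaces. The dependency-free sketch below reads and writes that format and rejects out-of-range node indices, as nextInt() above does. GraphTxtSketch and its methods are hypothetical; within WebGraph you would use ASCIIGraph.loadOffline and ASCIIGraph.store instead. ASCIIGraph.store writes a space after every successor, including the last one, and this sketch omits that trailing space. Both forms read back identically, because the reader splits on whitespace.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.StringJoiner;

/** Hypothetical sketch of the .graph-txt format: node count, then one line of successors per node. */
public class GraphTxtSketch {
    static String write(final int[][] succ) {
        final StringBuilder sb = new StringBuilder().append(succ.length).append('\n');
        for (final int[] s : succ) {
            final StringJoiner line = new StringJoiner(" "); // single space between successors
            for (final int y : s) line.add(Integer.toString(y));
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    static int[][] read(final String text) throws IOException {
        final BufferedReader reader = new BufferedReader(new StringReader(text));
        final String header = reader.readLine();
        if (header == null) throw new IOException("Missing number of nodes");
        final int n = Integer.parseInt(header.trim());
        if (n < 0) throw new IOException("Number of nodes must be nonnegative");
        final int[][] succ = new int[n][];
        for (int x = 0; x < n; x++) {
            final String line = reader.readLine();
            if (line == null) throw new IOException("Missing line for node " + x);
            final String trimmed = line.trim();
            succ[x] = trimmed.isEmpty() ? new int[0]
                    : Arrays.stream(trimmed.split("\\s+")).mapToInt(Integer::parseInt).toArray();
            for (final int y : succ[x])
                if (y < 0 || y >= n) throw new IOException("The value " + y + " is not a node index");
        }
        return succ;
    }

    public static void main(String[] args) throws IOException {
        final String text = "2\n1\n0 1\n"; // the example.graph-txt file from the Javadoc
        System.out.print(write(read(text))); // round trip prints the same three lines
    }
}
```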
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/AbstractLazyIntIterator.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/AbstractLazyIntIterator.java
new file mode 100644
index 0000000..09bb852
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/AbstractLazyIntIterator.java
@@ -0,0 +1,33 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+/** An abstract implementation of a lazy integer iterator, implementing {@link #skip(int)}
+ * by repeated calls to {@link LazyIntIterator#nextInt() nextInt()}. */
+
+public abstract class AbstractLazyIntIterator implements LazyIntIterator {
+
+ @Override
+ public int skip(final int n) {
+ int i;
+ for(i = 0; i < n && nextInt() != -1; i++);
+ return i;
+ }
+
+}
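AbstractLazyIntIterator implements skip(int) by calling nextInt() until either n elements have been consumed or a return value of -1 signals exhaustion. It returns the number of elements actually skipped. The self-contained sketch below reproduces that contract with a hypothetical LazyInts interface, which stands in for WebGraph's LazyIntIterator so that the example runs without the library. The order of the loop condition matters: testing i &lt; n first guarantees that no element beyond the n-th is consumed.

```java
/** Hypothetical, self-contained mirror of the skip(int) contract of AbstractLazyIntIterator. */
public class SkipSketch {
    /** Minimal stand-in for LazyIntIterator: nextInt() returns -1 when exhausted. */
    interface LazyInts {
        int nextInt();

        /** Skips up to n elements; returns how many were actually skipped. */
        default int skip(final int n) {
            int i;
            for (i = 0; i < n && nextInt() != -1; i++); // i < n is tested first, so no extra element is consumed
            return i;
        }
    }

    /** A lazy iterator over a fixed array of nonnegative values. */
    static LazyInts of(final int... values) {
        return new LazyInts() {
            private int pos = 0;

            @Override
            public int nextInt() {
                return pos < values.length ? values[pos++] : -1;
            }
        };
    }

    public static void main(String[] args) {
        final LazyInts it = of(3, 5, 8);
        System.out.println(it.skip(2)); // prints 2: 3 and 5 are skipped
        System.out.println(it.nextInt()); // prints 8
        System.out.println(it.skip(10)); // prints 0: nothing left to skip
    }
}
```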
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/ArcListASCIIGraph.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/ArcListASCIIGraph.java
new file mode 100644
index 0000000..735db3f
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/ArcListASCIIGraph.java
@@ -0,0 +1,355 @@
+package it.unimi.dsi.webgraph;
+
+import java.io.FileInputStream;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.io.PrintStream;
+import java.io.StreamTokenizer;
+import java.lang.reflect.InvocationTargetException;
+import java.util.Arrays;
+import java.util.NoSuchElementException;
+import java.util.concurrent.TimeUnit;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Charsets;
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.ints.IntArrayList;
+import it.unimi.dsi.fastutil.ints.IntArrays;
+import it.unimi.dsi.fastutil.io.FastBufferedInputStream;
+import it.unimi.dsi.fastutil.io.FastBufferedOutputStream;
+import it.unimi.dsi.io.FastBufferedReader;
+import it.unimi.dsi.logging.ProgressLogger;
+
+
+/** An {@link ImmutableGraph} that corresponds to graphs stored in a human-readable
+ * ASCII format where each line contains an arc.
+ *
+ * <p>The file format is very simple: each line contains an arc specified as two nodes
+ * separated by whitespace (but we suggest exactly one TAB character). Sources must be in increasing
+ * order, but targets can be in any order. The {@linkplain #ArcListASCIIGraph(InputStream, int) constructor}
+ * provides an additional parameter, called <em>shift</em>, which will be added to
+ * all node indices. The default is 0, but for lists that number nodes starting from 1
+ * it can be set to -1. Alternatively, the class {@link ShiftedByOneArcListASCIIGraph} can be used in place
+ * of this class to set the shift to -1 without specifying additional parameters.
+ *
+ * <P>Unlike other classes, the load methods of this class <strong>do not always return instances of this class</strong>.
+ * In particular, {@link #loadOnce(InputStream)} <em>will</em> return an instance of this class for
+ * read-once access. The instance will not provide offline or random access, but read-once access will be backed by
+ * the original input stream and only the successors of a single node will be loaded in core memory at any time.
+ *
+ * <p>The {@link #load(CharSequence)} method, on the other hand, will return an instance of
+ * {@link it.unimi.dsi.webgraph.ArrayListMutableGraph} built by copying a read-once instance of this class.
+ *
+ * <h2>Using {@link ArcListASCIIGraph} to convert your data</h2>
+ *
+ * <p>A simple (albeit rather inefficient) way to import data into WebGraph is using ASCII graphs specified by arc lists. Suppose you
+ * create the following file, named <code>example.arcs</code>:
+ * <pre>
+ * 0 1
+ * 1 2
+ * 2 1
+ * </pre>
+ * Then, the command
+ * <pre>
+ * java it.unimi.dsi.webgraph.BVGraph -g ArcListASCIIGraph example.arcs bvexample
+ * </pre>
+ * will produce a compressed graph in {@link it.unimi.dsi.webgraph.BVGraph} format
+ * with basename <code>bvexample</code>. Even more convenient, and much
+ * more efficient, is the {@link #loadOnce(InputStream)}
+ * method, which reads an arc-list ASCII graph from an input stream and exposes it for a single traversal. It
+ * can be used, for instance, with the main method of {@link it.unimi.dsi.webgraph.BVGraph} to
+ * store in compressed form, on the fly, an arc-list ASCII graph generated by some other means. The previous
+ * example could then be rewritten as
+ * <pre>
+ * java it.unimi.dsi.webgraph.BVGraph -1 -g ArcListASCIIGraph dummy bvexample &lt;example.arcs
+ * </pre>
+ *
+ */
+
+
+public class ArcListASCIIGraph extends ImmutableSequentialGraph {
+ private final static boolean DEBUG = false;
+ private static final Logger LOGGER = LoggerFactory.getLogger(ArcListASCIIGraph.class);
+
+ /** Number of nodes. */
+ private int n;
+ /** A fast buffered reader containing the arc list of this read-once ASCII graph. */
+ private final FastBufferedReader fbr;
+ /** The shift. All node numbers will be shifted by this value. */
+ private final int shift;
+
+ /** Creates a read-once arc-list ASCII graph. Instances created using this constructor can
+ * only be accessed using a single call to {@link #nodeIterator(int)}.
+ *
+ * @param is an input stream containing an arc-list ASCII graph.
+ * @param shift a shift that will be added to each node index.
+ */
+
+ public ArcListASCIIGraph(final InputStream is, final int shift) throws NumberFormatException, IOException {
+ this.shift = shift;
+ fbr = new FastBufferedReader(new InputStreamReader(is, "ASCII"));
+ n = -1;
+ }
+
+ @Override
+ public int numNodes() {
+ if (n == -1) throw new UnsupportedOperationException("The number of nodes is unknown (you need to complete a traversal)");
+ return n;
+ }
+
+ @Override
+ public NodeIterator nodeIterator(final int from) {
+ if (from < 0) throw new IllegalArgumentException();
+ try {
+ final StreamTokenizer st = new StreamTokenizer(fbr);
+ st.eolIsSignificant(true);
+ st.parseNumbers();
+
+ return new NodeIterator() {
+ /** The maximum node index we ever saw. */
+ int maxNodeSeen;
+ int following = -1;
+ int curr = -1;
+ boolean eof;
+
+ IntArrayList successors = new IntArrayList();
+
+ {
+ fillNextLine();
+ // ALERT: WRONG! This skips from lines, but does not skip up to node from!
+ for(int i = 0; i < from; i++) nextInt();
+ }
+
+
+ private void ensureNumberToken() {
+ if (st.ttype != StreamTokenizer.TT_NUMBER || st.nval != (int)st.nval) throw new IllegalArgumentException("Expected integer, found " + st.toString());
+ if ((int)st.nval + shift < 0) throw new IllegalArgumentException("Integer plus shift is negative: " + st.toString());
+ }
+
+ private void fillNextLine() throws IOException {
+ if (eof) return;
+ if (DEBUG) System.err.println("Filling next line (curr = " + curr + ", following = " + following +")");
+ successors.clear();
+ if (following == -1) {
+ while(st.nextToken() == StreamTokenizer.TT_EOL); // Skip empty lines
+ ensureNumberToken();
+ }
+ if (following > (int)st.nval + shift) throw new IllegalArgumentException("Source nodes must be sorted");
+ following = (int)st.nval + shift;
+ if (following > maxNodeSeen) maxNodeSeen = following;
+
+ if (DEBUG) System.err.println("New following node: " + following);
+ st.nextToken();
+ ensureNumberToken();
+ int successor = (int)st.nval + shift;
+ if (DEBUG) System.err.println("Adding successor " + successor);
+ successors.add(successor);
+ if (successor > maxNodeSeen) maxNodeSeen = successor;
+ st.nextToken();
+
+ for(;;) {
+ final int nextToken = st.nextToken();
+ if (nextToken == StreamTokenizer.TT_EOF) {
+ eof = true;
+ n = maxNodeSeen + 1;
+ break;
+ }
+ if (nextToken == StreamTokenizer.TT_EOL) continue; // Skip empty lines
+ ensureNumberToken();
+ if ((int)st.nval + shift != following) {
+ if (following > (int)st.nval + shift) throw new IllegalArgumentException("Source nodes must be sorted");
+ if (DEBUG) System.err.println("New source (" + (int)st.nval + "), breaking the loop...");
+ if ((int)st.nval + shift > maxNodeSeen) maxNodeSeen = (int)st.nval + shift;
+ break;
+ }
+ st.nextToken();
+ ensureNumberToken();
+ successor = (int)st.nval + shift;
+ if (DEBUG) System.err.println("Adding successor " + successor);
+ successors.add(successor);
+ if (successor > maxNodeSeen) maxNodeSeen = successor;
+ st.nextToken();
+ }
+
+ IntArrays.quickSort(successors.elements(), 0, successors.size());
+ }
+
+ @Override
+ public boolean hasNext() {
+ return curr < maxNodeSeen;
+ }
+
+ @Override
+ public int[] successorArray() {
+ if (curr == -1) throw new IllegalStateException();
+ return curr == following ? successors.elements() : IntArrays.EMPTY_ARRAY;
+ }
+
+ @Override
+ public final int nextInt() {
+ if (! hasNext()) throw new NoSuchElementException();
+ if (++curr > following) try {
+ fillNextLine();
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ return curr;
+ }
+
+ @Override
+ public int outdegree() {
+ if (curr == -1) throw new IllegalStateException();
+ return curr == following ? successors.size() : 0;
+ }
+
+ @Override
+ public NodeIterator copy(final int upperBound) {
+ throw new UnsupportedOperationException();
+ }
+
+ };
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+ @Override
+ public NodeIterator[] splitNodeIterators(final int howMany) {
+ NodeIterator[] result = new NodeIterator[howMany];
+ result[0] = nodeIterator();
+ Arrays.fill(result, 1, result.length, NodeIterator.EMPTY);
+ return result;
+ }
+
+ @Deprecated
+ public static ImmutableGraph loadSequential(CharSequence basename) throws IOException {
+ return load(basename);
+ }
+
+ @Deprecated
+ public static ImmutableGraph loadSequential(CharSequence basename, ProgressLogger unused) throws IOException {
+ return load(basename);
+ }
+
+ public static ImmutableGraph loadOffline(CharSequence basename) throws IOException {
+ return load(basename);
+ }
+
+ public static ImmutableGraph loadOffline(CharSequence basename, ProgressLogger unused) throws IOException {
+ return load(basename);
+ }
+
+ public static ImmutableGraph loadMapped(CharSequence basename) throws IOException {
+ return load(basename);
+ }
+
+ public static ImmutableGraph loadMapped(CharSequence basename, ProgressLogger unused) throws IOException {
+ return load(basename);
+ }
+
+ public static ArcListASCIIGraph loadOnce(final InputStream is) throws IOException {
+ return new ArcListASCIIGraph(is, 0);
+ }
+
+ public static ArcListASCIIGraph loadOnce(final InputStream is, final int shift) throws IOException {
+ return new ArcListASCIIGraph(is, shift);
+ }
+
+ public static ImmutableGraph load(CharSequence basename) throws IOException {
+ return load(basename, null);
+ }
+
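+ public static ImmutableGraph load(CharSequence basename, ProgressLogger unused) throws IOException {
+ // Close the input file once the arc list has been copied into memory
+ try (final FastBufferedInputStream fbis = new FastBufferedInputStream(new FileInputStream(basename.toString()))) {
+ return new ArrayListMutableGraph(loadOnce(fbis)).immutableView();
+ }
+ }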
+
+ public static void store(ImmutableGraph graph, CharSequence basename, @SuppressWarnings("unused") ProgressLogger unused) throws IOException {
+ store(graph, basename);
+ }
+
+
+ public static void store(final ImmutableGraph graph, final CharSequence basename) throws IOException {
+ store(graph, basename, 0);
+ }
+
+ /** Stores an arc-list ASCII graph with a given shift.
+ *
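+ * <p>For instance, assuming <code>graph</code> is an {@link ImmutableGraph}, the following sketch (the file name is
+ * hypothetical) writes each node <var>x</var> as <var>x</var>&nbsp;+&nbsp;1, so that an arc from 0 to 1 becomes a line
+ * containing 1, a tab and 2. The file is then read back using the opposite shift:
+ * <pre>
+ * ArcListASCIIGraph.store(graph, "graph-1based.txt", 1);
+ * ArcListASCIIGraph copy = ArcListASCIIGraph.loadOnce(new FileInputStream("graph-1based.txt"), -1);
+ * </pre>
+ *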
+ * @param graph a graph to be stored.
+ * @param basename the name of the output file.
+ * @param shift a shift that will be added to each node; note that this is the <em>opposite</em> of the shift that will
+ * have to be used to load the generated file.
+ */
+
+ public static void store(final ImmutableGraph graph, final CharSequence basename, final int shift) throws IOException {
+ final PrintStream ps = new PrintStream(new FastBufferedOutputStream(new FileOutputStream(basename.toString())), false, Charsets.US_ASCII.toString());
+ int d, s;
+ int[] successor;
+ for (NodeIterator nodeIterator = graph.nodeIterator(); nodeIterator.hasNext();) {
+ s = nodeIterator.nextInt();
+ d = nodeIterator.outdegree();
+ successor = nodeIterator.successorArray();
+ for(int i = 0; i < d; i++) ps.println((s + shift) + "\t" + (successor[i] + shift));
+ }
+ ps.close();
+ }
+
+ public static void main(String args[]) throws IllegalArgumentException, SecurityException, IllegalAccessException, InvocationTargetException, NoSuchMethodException, IOException, JSAPException {
+ String sourceBasename, destBasename;
+ Class<?> graphClass;
+
+ SimpleJSAP jsap = new SimpleJSAP(ArcListASCIIGraph.class.getName(), "Reads a graph with a given basename and writes it out in ASCII format with another basename",
+ new Parameter[] {
+ new FlaggedOption("graphClass", GraphClassParser.getParser(), null, JSAP.NOT_REQUIRED, 'g', "graph-class", "Forces a Java class for the source graph"),
+ new FlaggedOption("shift", JSAP.INTEGER_PARSER, null, JSAP.NOT_REQUIRED, 'S', "shift", "A shift that will be added to each node index."),
+ new FlaggedOption("logInterval", JSAP.LONG_PARSER, Long.toString(ProgressLogger.DEFAULT_LOG_INTERVAL), JSAP.NOT_REQUIRED, 'l', "log-interval", "The minimum time interval between activity logs in milliseconds."),
+ new UnflaggedOption("sourceBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the source graph"),
+ new UnflaggedOption("destBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the destination graph"),
+ }
+ );
+
+ JSAPResult jsapResult = jsap.parse(args);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ graphClass = jsapResult.getClass("graphClass");
+ sourceBasename = jsapResult.getString("sourceBasename");
+ destBasename = jsapResult.getString("destBasename");
+
+ final ProgressLogger pl = new ProgressLogger(LOGGER, jsapResult.getLong("logInterval"), TimeUnit.MILLISECONDS);
+
+ final ImmutableGraph graph = graphClass != null
+ ? (ImmutableGraph)graphClass.getMethod("loadOffline", CharSequence.class, ProgressLogger.class).invoke(null, sourceBasename, pl)
+ : ImmutableGraph.loadOffline(sourceBasename, pl);
+ if (jsapResult.userSpecified("shift")) ArcListASCIIGraph.store(graph, destBasename, jsapResult.getInt("shift"));
+ else ArcListASCIIGraph.store(graph, destBasename);
+ }
+}
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/ArrayListMutableGraph.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/ArrayListMutableGraph.java
new file mode 100644
index 0000000..581350f
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/ArrayListMutableGraph.java
@@ -0,0 +1,408 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2006-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.ints.IntArrayList;
+import it.unimi.dsi.fastutil.ints.IntArrays;
+import it.unimi.dsi.fastutil.ints.IntIterator;
+import it.unimi.dsi.fastutil.objects.ObjectArrays;
+import it.unimi.dsi.lang.MutableString;
+
+import java.util.ConcurrentModificationException;
+
+/** A very simple mutable graph class based on {@link it.unimi.dsi.fastutil.ints.IntArrayList}s.
+ *
+ * <p>When creating examples for test cases or everyday usage, this class offers practical constructors.
+ * For instance, a 3-cycle is easily built as
+ * <pre>
+ * new ArrayListMutableGraph(3, new int[][] { { 0, 1 }, { 1, 2 }, { 2, 0 } })
+ * </pre>
+ *
+ * <p>Moreover, methods like {@link #addNodes(int)} and {@link #addArc(int, int)} allow one to change
+ * the graph structure after construction, and several static factory methods provide ready-made
+ * common graphs (see, e.g., {@link #newCompleteBinaryIntree(int)}).
+ *
+ * <p>A mutable graph is <em>not</em> an {@link it.unimi.dsi.webgraph.ImmutableGraph}. However,
+ * it is possible to obtain an {@linkplain #immutableView() immutable view} of a mutable graph.
+ * The view is valid until the exposed mutable graph is modified. A modification counter is used
+ * to cause a <em>fail-fast</em> behaviour in case the immutable view is used after modifications.
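+ * For instance, in the following sketch the first view stops working as soon as an arc is added:
+ * <pre>
+ * ArrayListMutableGraph g = ArrayListMutableGraph.newDirectedCycle(3);
+ * ImmutableGraph view = g.immutableView();
+ * g.addArc(0, 2);              // increments the modification count
+ * view.numNodes();             // would now throw a ConcurrentModificationException
+ * g.immutableView().numArcs(); // a fresh view works again and returns 4
+ * </pre>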
+ *
+ * <p><strong>Warning</strong>: obtaining a {@link it.unimi.dsi.webgraph.NodeIterator} and using it
+ * while modifying the graph will lead to unpredictable results.
+ */
+
+public class ArrayListMutableGraph {
+ /** Current number of nodes. */
+ protected int n;
+ /** Current number of arcs. */
+ protected long m;
+ /** Current list of successor lists. The backing array might be longer than {@link #n}. */
+ protected IntArrayList successors[];
+
+ private final static IntArrayList[] EMPTY_INTARRAYLIST_ARRAY = {};
+
+ /** Guarantees that a node index is valid.
+ *
+ * @param x a node index.
+ */
+ protected void ensureNode(final int x) {
+ if (x < 0) throw new IllegalArgumentException("Illegal node index " + x);
+ if (x >= n) throw new IllegalArgumentException("Node index " + x + " is larger than graph order (" + n + ")");
+ }
+
+ /** Creates a new empty mutable graph. */
+ public ArrayListMutableGraph() {
+ successors = EMPTY_INTARRAYLIST_ARRAY;
+ }
+
+ /** Creates a new disconnected mutable graph with specified number of nodes.
+ * @param numNodes the number of nodes in the graph.
+ */
+ public ArrayListMutableGraph(final int numNodes) {
+ n = numNodes;
+ successors = new IntArrayList[n];
+ for(int i = n; i-- != 0;) successors[i] = new IntArrayList();
+ }
+
+ /** Creates a new mutable graph using a given number of nodes and a given list of arcs.
+ *
+ * @param numNodes the number of nodes in the graph.
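+ * @param arc an array of arrays of length 2, specifying the arcs; an {@link IllegalArgumentException} is thrown if an arc
+ * does not have length 2 or has an endpoint outside the node range, but duplicate arcs are not detected.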
+ */
+ public ArrayListMutableGraph(final int numNodes, final int[][] arc) {
+ this(numNodes);
+ m = arc.length;
+ // Sanitize
+ for(int i = arc.length; i-- != 0;) {
+ if (arc[i].length != 2) throw new IllegalArgumentException("The arc of index " + i + " has length " + arc[i].length);
+ if (arc[i][0] < 0 || arc[i][1] < 0 || arc[i][0] >= numNodes || arc[i][1] >= numNodes) throw new IllegalArgumentException("The arc of index " + i + " (" + arc[i][0] + ", " + arc[i][1] + ") is illegal");
+ }
+ for(int i = 0; i < arc.length; i++) successors[arc[i][0]].add(arc[i][1]);
+ }
+
+ /** Creates a new mutable graph copying a given immutable graph.
+ *
+ * <p>This constructor does not invoke {@link ImmutableGraph#numNodes()}; it just creates a {@link NodeIterator} and exhausts it.
+ *
+ * @param g an immutable graph.
+ */
+ public ArrayListMutableGraph(final ImmutableGraph g) {
+ this();
+ int d, s = -1;
+ long numArcs = 0;
+ for(NodeIterator nodeIterator = g.nodeIterator(); nodeIterator.hasNext();) {
+ s = nodeIterator.nextInt();
+ d = nodeIterator.outdegree();
+ numArcs += d;
+ successors = ObjectArrays.grow(successors, s + 1);
+ successors[s] = new IntArrayList(nodeIterator.successorArray(), 0, d);
+ }
+ n = s + 1;
+ m = numArcs;
+ }
+
+ /** Creates a new mutable graph using a given number of nodes and a given arc filter.
+ *
+ * @param numNodes the number of nodes in the graph.
+ * @param arcFilter an arc filter which will specify which arcs go into the graph.
+ */
+ public ArrayListMutableGraph(final int numNodes, final Transform.ArcFilter arcFilter) {
+ this(numNodes);
+ for(int i = n; i-- != 0;) {
+ for(int j = 0; j < n; j++)
+ if (arcFilter.accept(i, j)) {
+ successors[i].add(j);
+ m++;
+ }
+ }
+ }
+
+
+ /** Returns a new mutable graph containing a directed cycle.
+ *
+ * @param numNodes the number of nodes in the cycle.
+ */
+ public static ArrayListMutableGraph newDirectedCycle(final int numNodes) {
+ return new ArrayListMutableGraph(numNodes, new Transform.ArcFilter() {
+ @Override
+ public boolean accept(final int i, final int j) {
+ return (i + 1) % numNodes == j;
+ }
+ });
+ }
+
+ /** Returns a new mutable graph containing a bidirectional cycle.
+ *
+ * @param numNodes the number of nodes in the cycle.
+ */
+ public static ArrayListMutableGraph newBidirectionalCycle(final int numNodes) {
+ return new ArrayListMutableGraph(numNodes, new Transform.ArcFilter() {
+ @Override
+ public boolean accept(final int i, final int j) {
+ return (i + 1) % numNodes == j || (j + 1) % numNodes == i;
+ }
+ });
+ }
+
+ /** Returns a new mutable graph containing a complete graph.
+ *
+ * @param numNodes the number of nodes in the graph.
+ * @param loops true if you want loops, too.
+ */
+ public static ArrayListMutableGraph newCompleteGraph(final int numNodes, final boolean loops) {
+ return new ArrayListMutableGraph(numNodes, new Transform.ArcFilter() {
+ @Override
+ public boolean accept(final int i, final int j) {
+ return i != j || loops;
+ }
+ });
+ }
+
+ /** Returns a new mutable graph containing a complete binary in-tree of given height.
+ *
+ * <strong>Warning</strong>: starting from version 1.7, the spurious loop
+ * at the root has been removed.
+ *
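+ * <p>Nodes are numbered as in a binary heap, and every node <var>i</var>&nbsp;&gt;&nbsp;0 has a single arc towards its
+ * parent (<var>i</var>&nbsp;&minus;&nbsp;1)&nbsp;/&nbsp;2. For instance, <code>newCompleteBinaryIntree(1)</code> has nodes 0, 1 and 2
+ * and arcs 1&nbsp;&rarr;&nbsp;0 and 2&nbsp;&rarr;&nbsp;0. Since in Java <code>(0 - 1) / 2</code> evaluates to 0, the root
+ * would otherwise be its own parent: this is why the arc filter excludes loops.
+ *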
+ * @param height the height of the tree (0 for the root only).
+ */
+ public static ArrayListMutableGraph newCompleteBinaryIntree(final int height) {
+ return new ArrayListMutableGraph((1 << (height + 1)) - 1, new Transform.ArcFilter() {
+ @Override
+ public boolean accept(final int i, final int j) {
+ return i != j && (i - 1) / 2 == j;
+ }
+ });
+ }
+
+ /** Returns a new mutable graph containing a complete binary out-tree of given height.
+ *
+ * <strong>Warning</strong>: starting from version 1.7, the spurious loop
+ * at the root has been removed.
+ *
+ * @param height the height of the tree (0 for the root only).
+ */
+ public static ArrayListMutableGraph newCompleteBinaryOuttree(final int height) {
+ return new ArrayListMutableGraph((1 << (height + 1)) - 1, new Transform.ArcFilter() {
+ @Override
+ public boolean accept(final int i, final int j) {
+ return i != j && (j - 1) / 2 == i;
+ }
+ });
+ }
+
+ private static class ImmutableView extends ImmutableGraph {
+ /** Cached number of nodes. */
+ private final int n;
+ /** Cached number of arcs. */
+ private final long m;
+ /** Cached successors. */
+ private final IntArrayList[] successors;
+ /** A reference to the mutable graph we expose. */
+ private final ArrayListMutableGraph g;
+
+ public ImmutableView(final ArrayListMutableGraph g) {
+ this.g = g;
+ this.n = g.n;
+ this.m = g.m;
+ this.successors = g.successors;
+ }
+ @Override
+ public ImmutableView copy() { return this; }
+ private void ensureUnmodified() { if (g.modificationCount != g.lastModificationCount) throw new ConcurrentModificationException(); }
+ @Override
+ public int numNodes() { ensureUnmodified(); return n; }
+ @Override
+ public int outdegree(final int x) { ensureUnmodified(); return successors[x].size(); }
+ @Override
+ public long numArcs() { ensureUnmodified(); return m; }
+ @Override
+ public boolean randomAccess() { return true; }
+ @Override
+ public int[] successorArray(final int x) { ensureUnmodified(); return successors[x].toIntArray(); }
+ @Override
+ public LazyIntIterator successors(final int x) { ensureUnmodified(); return LazyIntIterators.lazy(successors[x].iterator()); }
+ }
+
+ /** A cached copy of the immutable view, if it has ever been requested. */
+ protected ImmutableView immutableView;
+ /** The current modification count. */
+ protected int modificationCount = 0;
+ /** The modification count at the last call to {@link #immutableView()}. */
+ protected int lastModificationCount = -1;
+
+ /** Returns an immutable view of this mutable graph.
+ *
+ * <P>The view can be used until this mutable graph is modified. Any attempt to use
+ * the view after modifying this mutable graph will cause a {@link ConcurrentModificationException}.
+ * After modification, a new call to this method will return a new immutable view.
+ *
+ * @return an immutable view of this mutable graph.
+ */
+ public ImmutableGraph immutableView() {
+ if (modificationCount != lastModificationCount) {
+ for(int i = n; i-- != 0;) IntArrays.quickSort(successors[i].elements(), 0, successors[i].size());
+ immutableView = new ImmutableView(this);
+ }
+ lastModificationCount = modificationCount;
+ return immutableView;
+ }
+
+ public int numNodes() {
+ return n;
+ }
+
+ public int outdegree(final int x) {
+ ensureNode(x);
+ return successors[x].size();
+ }
+
+ public long numArcs() {
+ return m;
+ }
+
+ public int[] successorArray(final int x) {
+ ensureNode(x);
+ return successors[x].toIntArray();
+ }
+
+ public IntIterator successors(final int x) {
+ ensureNode(x);
+ return successors[x].iterator();
+ }
+
+ /** Adds the given number of nodes, numbering them from {@link #numNodes()} onwards. The new nodes have no successors.
+ *
+ * @param numNewNodes the number of new nodes.
+ */
+ public void addNodes(final int numNewNodes) {
+ if (numNewNodes != 0) {
+ modificationCount++;
+ final int newN = n + numNewNodes;
+ successors = ObjectArrays.ensureCapacity(successors, newN, n);
+ while(n < newN) successors[n++] = new IntArrayList();
+ }
+ }
+
+ /** Removes the given node. All arcs incident on the node are removed, too.
+ *
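+ * <p>Nodes following <var>x</var> are renumbered, so that each node <var>y</var>&nbsp;&gt;&nbsp;<var>x</var> becomes
+ * <var>y</var>&nbsp;&minus;&nbsp;1. For instance, removing node 1 from the directed 3-cycle
+ * 0&nbsp;&rarr;&nbsp;1&nbsp;&rarr;&nbsp;2&nbsp;&rarr;&nbsp;0 leaves a graph with two nodes and the single arc
+ * 1&nbsp;&rarr;&nbsp;0 (formerly 2&nbsp;&rarr;&nbsp;0).
+ *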
+ * @param x the node to be removed.
+ */
+ public void removeNode(final int x) {
+ ensureNode(x);
+ modificationCount++;
+ m -= successors[x].size(); // Arcs leaving x disappear together with its successor list
+ System.arraycopy(successors, x + 1, successors, x, --n - x);
+ int t;
+ for(int i = n; i-- != 0;)
+ for(int j = successors[i].size(); j-- != 0;) {
+ t = successors[i].getInt(j);
+ if (t == x) {
+ successors[i].removeInt(j);
+ m--; // An arc entering x is removed, too
+ }
+ else if (t > x) successors[i].set(j, t - 1);
+ }
+ }
+
+ /** Adds the given arc.
+ *
+ * @param x the start of the arc.
+ * @param y the end of the arc.
+ */
+ public void addArc(final int x, final int y) {
+ ensureNode(x);
+ ensureNode(y);
+ if (successors[x].indexOf(y) != -1) throw new IllegalArgumentException("Node " + y + " is already a successor of node " + x);
+ modificationCount++;
+ successors[x].add(y);
+ m++;
+ }
+
+ /** Removes the given arc.
+ *
+ * @param x the start of the arc.
+ * @param y the end of the arc.
+ */
+ public void removeArc(final int x, final int y) {
+ ensureNode(x);
+ ensureNode(y);
+ final int pos = successors[x].indexOf(y);
+ if (pos == -1) throw new IllegalArgumentException("Node " + y + " is not a successor of node " + x);
+ modificationCount++;
+ successors[x].removeInt(pos);
+ m--;
+ }
+
+ /** Compares this mutable graph to another object.
+ *
+ * @return true iff the given object is a mutable graph with the same number of nodes, and
+ * the successor list of every node of this graph is equal to the successor list of the corresponding node of <code>o</code>.
+ */
+
+ @Override
+ public boolean equals(final Object o) {
+ if (! (o instanceof ArrayListMutableGraph)) return false;
+ final ArrayListMutableGraph g = (ArrayListMutableGraph) o;
+ int n = numNodes();
+ if (n != g.numNodes()) return false;
+ int[] s, t;
+ int d;
+ while(n-- != 0) {
+ if ((d = outdegree(n)) != g.outdegree(n)) return false;
+ s = successorArray(n);
+ t = g.successorArray(n);
+ while(d-- != 0) if (s[d] != t[d]) return false;
+ }
+
+ return true;
+ }
+
+ /** Returns a hash code for this mutable graph.
+ *
+ * @return a hash code for this mutable graph.
+ */
+
+ @Override
+ public int hashCode() {
+ int n = numNodes(), h = -1;
+ int[] s;
+ int d;
+ for(int i = 0; i < n; i++) {
+ h = h * 31 + i;
+ s = successorArray(i);
+ d = outdegree(i);
+ while(d-- != 0) h = h * 31 + s[d];
+ }
+
+ return h;
+ }
+
+ @Override
+ public String toString() {
+ MutableString ms = new MutableString();
+ IntIterator ii;
+
+ ms.append("Nodes: " + numNodes() + "\nArcs: " + numArcs() + "\n");
+ for (int i = 0; i < numNodes(); i++) {
+ ms.append("Successors of " + i + " (degree " + outdegree(i) + "):");
+ ii = successors(i);
+ while (ii.hasNext())
+ ms.append(" " + ii.nextInt());
+ ms.append("\n");
+ }
+ return ms.toString();
+ }
+
+}
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/BVGraph.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/BVGraph.java
new file mode 100644
index 0000000..395ef9d
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/BVGraph.java
@@ -0,0 +1,2440 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2003-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.Util;
+import it.unimi.dsi.bits.Fast;
+import it.unimi.dsi.fastutil.ints.IntArrayList;
+import it.unimi.dsi.fastutil.ints.IntArrays;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.io.FastMultiByteArrayInputStream;
+import it.unimi.dsi.fastutil.longs.LongIterator;
+import it.unimi.dsi.fastutil.longs.LongBigList;
+import it.unimi.dsi.io.ByteBufferInputStream;
+import it.unimi.dsi.io.InputBitStream;
+import it.unimi.dsi.io.NullOutputStream;
+import it.unimi.dsi.io.OutputBitStream;
+import it.unimi.dsi.lang.MutableString;
+import it.unimi.dsi.lang.ObjectParser;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.sux4j.util.EliasFanoMonotoneLongBigList;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.FileNotFoundException;
+import java.io.FileOutputStream;
+import java.io.FileWriter;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.PrintWriter;
+import java.lang.reflect.InvocationTargetException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+import java.math.RoundingMode;
+import java.nio.channels.FileChannel.MapMode;
+import java.text.DecimalFormat;
+import java.text.NumberFormat;
+import java.util.Arrays;
+import java.util.Formatter;
+import java.util.Locale;
+import java.util.NoSuchElementException;
+import java.util.Properties;
+import java.util.concurrent.Callable;
+import java.util.concurrent.ExecutorCompletionService;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.TimeUnit;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Throwables;
+import com.google.common.util.concurrent.ThreadFactoryBuilder;
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+
+/** An immutable graph represented using the techniques described in
+ * &ldquo;<a href="http://vigna.dsi.unimi.it/papers.php#BoVWFI">The WebGraph Framework I: Compression Techniques</a>&rdquo;, by Paolo Boldi and
+ * Sebastiano Vigna, in <i>Proc&#46; of the Thirteenth World&ndash;Wide Web
+ * Conference</i>, pages 595&ndash;601, 2004, ACM Press.
+ *
+ * <P>This class provides a flexible and configurable way to store and
+ * access web graphs in a compressed form. Its main method can load an
+ * {@link ImmutableGraph} and compress it. The resulting compressed {@link
+ * BVGraph} is described by a <em>graph file</em> (with extension
+ * <code>.graph</code>), an <em>offset file</em> (with extension
+ * <code>.offsets</code>) and a <em>property file</em> (with extension
+ * <code>.properties</code>). The latter, not surprisingly, is a Java property file.
+ * Optionally, an <em>offset big-list file</em> (with extension
+ * <code>.obl</code>) can be created to load graphs faster.
+ *
+ * <p>As a rule of thumb, random access is faster using {@link #successors(int)}, whereas
+ * when iterating with a {@link NodeIterator} it is better to use {@link NodeIterator#successorArray()}.
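+ *
+ * <p>Both access patterns can be sketched as follows, assuming <code>graph</code> is a {@link BVGraph} loaded for
+ * random access, <code>x</code> is a node, and <code>visit()</code> is a hypothetical callback:
+ * <pre>
+ * // Random access: LazyIntIterator.nextInt() returns -1 when the successors are exhausted
+ * LazyIntIterator it = graph.successors(x);
+ * for (int s; (s = it.nextInt()) != -1;) visit(x, s);
+ *
+ * // Sequential access: the array returned by successorArray() may be longer than the outdegree
+ * NodeIterator nodeIterator = graph.nodeIterator();
+ * while (nodeIterator.hasNext()) {
+ *     int y = nodeIterator.nextInt();
+ *     int d = nodeIterator.outdegree();
+ *     int[] succ = nodeIterator.successorArray();
+ *     for (int i = 0; i &lt; d; i++) visit(y, succ[i]);
+ * }
+ * </pre>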
+ *
+ * <h2>Parallel compression</h2>
+ *
+ * <p>Starting with version 3.5.0, this class uses {@link ImmutableGraph#splitNodeIterators(int)} to compress
+ * a graph using parallel compression threads. The number of parallel threads can be set at construction time, or
+ * using the property {@value it.unimi.dsi.webgraph.ImmutableGraph#NUMBER_OF_THREADS_PROPERTY} from the command line; this approach
+ * is useful with classes such as {@link Transform}.
+ *
+ * <p>Parallel compression requires the creation of possibly large temporary files. It might be necessary
+ * to set the property {@code java.io.tmpdir} to a suitable directory if you experience disk-full errors during compression.
+ *
+ * <h2>The Graph File</h2>
+ *
+ * <P>This class stores a graph as a <a href="http://dsiutils.dsi.unimi.it/docs/it/unimi/dsi/io/InputBitStream.html">bit stream</a>. The bit stream format
+ * depends on a number of parameters and encodings that can be mixed
+ * orthogonally. The parameters are:
+ *
+ * <ul>
+ *
+ * <li>the <em>window size</em>, a nonnegative integer;
+ * <li>the <em>maximum reference count</em>, a positive integer (it is meaningful only when the window is nonzero);
+ * <li>the <em>minimum interval length</em>, an integer larger than or equal to two, or 0, which is interpreted as infinity.
+ *
+ * </ul>
+ *
+ * <H3>Successor Lists</H3>
+ *
+ * <P>The graph file is a sequence of successor lists, one for each node.
+ * The list of node <var>x</var> can be thought of as a sequence of natural numbers (even though, as we will
+ * explain later, this sequence is further coded suitably as a sequence of bits):
+ * <OL STYLE="list-style-type: lower-alpha">
+ * <LI>The outdegree of the node; if it is zero, the list ends here.
+ * <LI>If the window size is not zero, the <em>reference part</em>, that is:
+ * <OL><LI>a nonnegative integer, the <em>reference</em>, which never exceeds the window size; if the reference
+ * is <var>r</var>, the list of successors will be specified as a modified version of the list of successors
+ * of <var>x</var>&minus;<var>r</var>; if <var>r</var> is 0, then the list of successors will be specified
+ * explicitly;
+ * <LI>if <var>r</var> is nonzero:
+ * <OL STYLE="list-style-type: lower-roman">
+ * <LI>a natural number <var>b</var>, the <em>block count</em>;
+ * <LI>a sequence of <var>b</var> natural numbers <var>B</var><sub>1</sub>, &hellip;, <var>B</var><sub>b</sub>, called the <em>copy-block list</em>; only the
+ * first number can be zero.
+ * </OL>
+ *
+ * </OL>
+ * <LI>Then comes the <em>extra part</em>, specifying some more entries that the list of successors contains (or all of them, if
+ * <var>r</var> is zero), that is:
+ * <OL>
+ * <LI>If the minimum interval length is finite,
+ * <OL STYLE="list-style-type: lower-roman">
+ * <LI>an integer <var>i</var>, the <em>interval count</em>;
+ * <LI>a sequence of <var>i</var> pairs, whose first component is the left extreme of an interval,
+ * and whose second component is the length of the interval (i.e., the number of integers contained in the interval).
+ * </OL>
+ * <li>Finally, the list of <em>residuals</em>, which contains all successors not specified by previous methods.
+ * </OL>
+ * </OL>
+ *
+ * <P>The above data should be interpreted as follows:
+ * <ul>
+ * <li>The reference part, if present (i.e., if both the window size and the reference are strictly positive), specifies
+ * that part of the list of successors of node <var>x</var>&minus;<var>r</var> should be copied; the successors of
+ * node <var>x</var>&minus;<var>r</var> that should be copied are described in the copy-block list; more precisely, one should copy
+ * the first <var>B</var><sub>1</sub> entries of this list, discard the next <var>B</var><sub>2</sub>, copy
+ * the next <var>B</var><sub>3</sub> etc. (the last remaining elements of the list of successors will be copied if <var>b</var> is
+ * even, and discarded if <var>b</var> is odd).
+ * <li>The extra part specifies additional successors (or all of them, if the reference part is absent); the extra part is not present
+ * if the number of successors that are to be copied according to the reference part already coincides with the outdegree of <var>x</var>;
+ * the successors listed in the extra part are given in two forms:
+ * <ul>
+ * <li>some of them are specified as belonging to (integer) intervals, if the minimum interval length is finite;
+ * the interval count indicates how many intervals,
+ * and the intervals themselves are listed as pairs (left extreme, length);
+ * <li>the residuals are the remaining "scattered" successors.
+ * </ul>
+ * </ul>
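+ *
+ * <p>For instance, suppose that the successors of node <var>x</var>&minus;<var>r</var> are 3, 5, 7, 9 and 11, and that
+ * the copy-block list is 2, 1, 1 (so <var>b</var>&nbsp;=&nbsp;3). We copy the first two entries (3 and 5), discard the next
+ * one (7) and copy the following one (9); since <var>b</var> is odd, the remaining entry (11) is discarded. The reference
+ * part thus contributes the successors 3, 5 and 9, and the extra part must provide the other outdegree&nbsp;&minus;&nbsp;3 successors.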
+ *
+ *
+ * <H3>How Successor Lists Are Coded</H3>
+ *
+ * <P>As we said before, the list of integers corresponding to each successor list should be coded into a sequence of bits.
+ * This is (ideally) done in two phases: we first modify the sequence in a suitable manner (as explained below) so as to obtain
+ * another sequence of integers (some of them might be negative). Then each single integer is coded, using a coding that can
+ * be specified as an option; the integers that may be negative are first turned into natural numbers using {@link Fast#int2nat(int)}.
+ *
+ * <OL>
+ * <LI>The outdegree of the node is left unchanged, as well as the reference and the block count;
+ * <LI>all blocks are decremented by 1, except for the first one;
+ * <LI>the interval count is left unchanged;
+ * <LI>all interval lengths are decremented by the minimum interval length;
+ * <LI>the first left extreme is expressed as its difference from <var>x</var> (it will be negative if the first extreme is
+ * less than <var>x</var>); the remaining left extremes are expressed as their distance from the previous right extreme
+ * plus 2 (e.g., if the interval is [5..11] and the previous one was [1..3], then the left extreme 5 is expressed as
+ * 5-(3+2)=5-5=0);
+ * <LI>the first residual is expressed as its difference from <var>x</var> (it will be negative if the first residual is
+ * less than <var>x</var>); the remaining residuals are expressed as decremented differences from the previous residual.
+ * </OL>
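+ *
+ * <p>As a worked example, consider node <var>x</var>&nbsp;=&nbsp;15 with a minimum interval length of 3 (a value chosen
+ * just for illustration), intervals [1..3] and [5..11], and residuals 13, 16 and 20. Before instantaneous coding,
+ * the extra part becomes
+ * <pre>
+ * interval count:  2
+ * first interval:  left extreme int2nat(1 - 15) = int2nat(-14) = 27, length 3 - 3 = 0
+ * second interval: left extreme 5 - (3 + 2) = 0, length 7 - 3 = 4
+ * residuals:       int2nat(13 - 15) = int2nat(-2) = 3, then 16 - 13 - 1 = 2, then 20 - 16 - 1 = 3
+ * </pre>
+ * where {@link Fast#int2nat(int)} maps 0, &minus;1, 1, &minus;2, 2, &hellip; to 0, 1, 2, 3, 4, &hellip;. Similarly,
+ * a copy-block list 2, 1, 1 would be coded as 2, 0, 0.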
+ *
+ * <H2>The Offset File</H2>
+ *
+ * <P>Since the graph is stored as a bit stream, we must have some way to know where each successor list starts.
+ * This information is stored in the offset file, which contains the bit offset of each successor list (in particular,
+ * the offset of the first successor list will be zero). As a convenience, the offset file contains an additional
+ * offset pointing just after the last successor list (providing, as a side-effect, the actual bit length of the graph file).
+ * Each offset (except for the first one) is stored as a suitably coded difference from the previous offset.
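+ *
+ * <p>For instance, if a graph has three nodes whose successor lists occupy 12, 7 and 9 bits, the offset file describes
+ * the four offsets 0, 12, 19 and 28: after the first one, they are stored as the differences 12, 7 and 9, and the last
+ * offset, 28, is the bit length of the graph file.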
+ *
+ * <p>The list of offsets can be additionally stored as a serialised {@link EliasFanoMonotoneLongBigList}
+ * using a suitable command-line option. If the serialised big list is detected, it is loaded instead of parsing the offset list.
+ *
+ * <H2>The Property File</H2>
+ *
+ * <P>This file contains self-explanatory entries that are necessary to correctly decode the graph and offset files, and
+ * moreover give some statistical information about the compressed graph (e.g., the number of bits per link).
+ * <dl>
+ * <dt><code>nodes</code>
+ * <dd>the number of nodes of the graph.
+ * <dt><code>arcs</code>
+ * <dd>the number of arcs of the graph.
+ * <dt><code>version</code>
+ * <dd>a version number.
+ * <dt><code>graphclass</code>
+ * <dd>the name of the class that should load this graph ({@link ImmutableGraph} convention).
+ * <dt><code>bitsperlink</code>
+ * <dd>the number of bits per link (overall graph size in bits divided by the number of arcs).
+ * <dt><code>bitspernode</code>
+ * <dd>the number of bits per node (overall graph size in bits divided by the number of nodes).
+ * <dt><code>compratio</code>
+ * <dd>the ratio between the graph size and the information-theoretical lower bound (the binary logarithm of the number of subsets of size <code>arcs</code> out of a universe of <code>nodes</code><sup>2</sup> elements).
+ * <dt><code>compressionflags</code>
+ * <dd>flags specifying the codes used for the components of the compression algorithm.
+ * <dt><code>zetak</code>
+ * <dd>if &zeta; codes are selected for residuals, the parameter <var>k</var>.
+ * <dt><code>windowsize</code>
+ * <dd>the window size.
+ * <dt><code>maxref</code>
+ * <dd>the maximum reference count.
+ * <dt><code>minintervallength</code>
+ * <dd>the minimum length of an interval.
+ * <dt><code>avgdist</code>
+ * <dd>the average distance of a reference.
+ * <dt><code>avgref</code>
+ * <dd>the average length of reference chains.
+ * <dt><code>bitsfor*</code>
+ * <dd>the number of bits used by a specific component of the algorithm (the sum is the number of bits used to store the graph).
+ * <dt><code>avgbitsfor*</code>
+ * <dd>the number of bits used by a specific component of the algorithm, divided by the number of nodes (the sum is the number of bits per node).
+ * <dt><code>*arcs</code>
+ * <dd>the number of arcs stored by each component of the algorithm (the sum is the number of arcs).
+ * <dt><code>*expstats</code>
+ * <dd>frequencies of the floor of the logarithm of successor gaps and residual gaps, separated by a comma; the statistics include the gap between each node
+ * and its first successor, after it has been passed through {@link Fast#int2nat(int)}, but discard zeroes (which happen in
+ * very rare circumstances, and should be considered immaterial).
+ * <dt><code>*avg[log]gap</code>
+ * <dd>the average of the gaps (or of their logarithm) of successors and residuals: note that these data are computed from the exponential statistics above, and
+ * are thus necessarily approximate.
+ * </dl>
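+ *
+ * <p>As a worked illustration of some of the statistics above (all figures are hypothetical), consider a graph with 4 nodes and 5 arcs
+ * whose compressed representation takes 30 bits overall. Then <code>bitsperlink</code> is 30 / 5 = 6 and <code>bitspernode</code> is
+ * 30 / 4 = 7.5. The information-theoretical lower bound is the binary logarithm of the binomial coefficient of 16 = 4<sup>2</sup>
+ * over 5, that is, log<sub>2</sub> 4368 &asymp; 12.09 bits, so <code>compratio</code> is about 30 / 12.09 &asymp; 2.48.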
+ *
+ * <H2>How The Graph File Is Loaded Into Memory</H2>
+ *
+ * <P>The natural way of using a graph file is to load it into a byte array and
+ * then index its bits using the appropriate offset. This class will use a byte
+ * array for graphs smaller than {@link Integer#MAX_VALUE} bytes,
+ * and a {@link it.unimi.dsi.fastutil.io.FastMultiByteArrayInputStream}
+ * otherwise. In the latter case, expect a significant slowdown, because
+ * an {@link it.unimi.dsi.io.InputBitStream} can wrap a byte array
+ * directly, which is faster.
+ *
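+ * <p>A minimal usage sketch follows. It assumes a compressed graph stored under the hypothetical basename <code>example</code>,
+ * that is, files <code>example.graph</code>, <code>example.offsets</code> and <code>example.properties</code>:
+ * <pre>
+ * // load() reads the graph file into memory and builds the offset list.
+ * BVGraph g = BVGraph.load("example");
+ * // Random access is now available: outdegree() positions a private bit stream using the offsets.
+ * int d = g.outdegree(0);
+ * </pre>
+ *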
+ * <P>Offsets are loaded using an {@link EliasFanoMonotoneLongBigList},
+ * which occupies far less space than the graph itself (unless
+ * your graph is pathologically sparse). Of course, accessing the list
+ * is more costly than accessing an array of longs.
+ *
+ * <p>Note that by default the {@link EliasFanoMonotoneLongBigList} instance is
+ * created from scratch using the file of offsets. This is a long and tedious
+ * process, in particular with large graphs. The main method of this class
+ * has an option that will generate such a list once and for all and serialise it in a file with
+ * extension <code>.obl</code>. The list will be quickly deserialised
+ * if its modification date is later than that of the offset file.
+ *
+ * <H2>Not Loading the Graph File at All</H2>
+ *
+ * <P>For some applications (such as transposing a graph) it is not necessary to load the graph
+ * file into memory. Since this class is able to enumerate the links of a graph without using random
+ * access, it is possible to load no information into memory at all, and to obtain iterators that
+ * read directly from the graph file. To obtain this effect, you must call {@link #loadOffline(CharSequence)}.
+ *
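+ * <p>For example, a sequential scan of an offline graph might look as follows. This is a sketch, and <code>example</code> is a
+ * hypothetical basename:
+ * <pre>
+ * BVGraph g = BVGraph.loadOffline("example");
+ * NodeIterator it = g.nodeIterator();
+ * while (it.hasNext()) {
+ *     int x = it.nextInt();
+ *     int d = it.outdegree();
+ *     int[] succ = it.successorArray(); // only the first d entries are meaningful
+ *     // process the successors of x here
+ * }
+ * </pre>
+ *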
+ * <H2>Memory&ndash;Mapping a Graph</H2>
+ *
+ * <p>Another interesting alternative is memory mapping. When using {@link BVGraph#loadMapped(CharSequence)},
+ * the graph will be mapped into memory, and the offsets loaded. The graph will provide random access and behave
+ * as if it were loaded into memory, but access will of course be slower.
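+ *
+ * <p>The following sketches random access on a memory-mapped graph. Again, <code>example</code> is a hypothetical basename,
+ * and node 42 is assumed to exist. Note that a {@link LazyIntIterator} returns -1 when the successors are exhausted:
+ * <pre>
+ * BVGraph g = BVGraph.loadMapped("example");
+ * LazyIntIterator s = g.successors(42);
+ * for (int y; (y = s.nextInt()) != -1;) {
+ *     // process the arc from 42 to y
+ * }
+ * </pre>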
+ */
+
+@SuppressWarnings("resource")
+public class BVGraph extends ImmutableGraph implements CompressionFlags {
+ private static final Logger LOGGER = LoggerFactory.getLogger(BVGraph.class);
+ /** The offset step parameter corresponding to sequential load. */
+ public static final int SEQUENTIAL = 0;
+ /** The offset step parameter corresponding to offline load. */
+ public static final int OFFLINE = -1;
+
+ /** The standard extension for the graph bit stream. */
+ public static final String GRAPH_EXTENSION = ".graph";
+ /** The standard extension for the graph-offsets bit stream. */
+ public static final String OFFSETS_EXTENSION = ".offsets";
+ /** The standard extension for the cached {@link LongBigList} containing the graph offsets. */
+ public static final String OFFSETS_BIG_LIST_EXTENSION = ".obl";
+ /** The standard extension for the stream of node outdegrees. */
+ public static final String OUTDEGREES_EXTENSION = ".outdegrees";
+ /** The buffer size we use for most operations. */
+ private static final int STD_BUFFER_SIZE = 1024 * 1024;
+ /** The buffer size we use when writing from multiple threads. */
+ private static final int MULTITHREAD_BUFFER_SIZE = 16 * 1024 * 1024;
+
+ /** This number classifies the present graph format. When new features require introducing binary incompatibilities,
+ this number is bumped so as to ensure that old classes do not try to read graphs they cannot understand. */
+ public final static int BVGRAPH_VERSION = 0;
+
+ /** The initial length of an array that will contain a successor list. */
+ protected static final int INITIAL_SUCCESSOR_LIST_LENGTH = 1024;
+
+ /** A special value for {@link #minIntervalLength} interpreted as meaning that the minimum interval length is infinity. */
+ public static final int NO_INTERVALS = 0;
+
+ /** The basename of the graph. This may be <code>null</code>, but trying to load the graph with an offset
+ * step of -1 will cause an exception. */
+ protected CharSequence basename;
+
+ /** The number of nodes of the graph. */
+ protected int n;
+
+ /** The number of arcs of the graph. */
+ protected long m;
+
+ /** When {@link #offsetType} is not -1, whether this graph is directly loaded into
+ * {@link #graphMemory}, or rather wrapped in a {@link it.unimi.dsi.fastutil.io.FastMultiByteArrayInputStream}
+ * specified by {@link #graphStream}. */
+ protected boolean isMemory;
+
+ /** When {@link #offsetType} is not -1, whether this graph is directly loaded into
+ * {@link #graphMemory}, or rather memory-mapped. */
+ protected boolean isMapped;
+
+ /** The byte array storing the compressed graph, if {@link #isMemory} is true and {@link #offsetType} is not -1.
+ *
+ * <P>This variable is loaded with a copy of the graph file, or with
+ * a rearrangement of the latter, depending on whether {@link #offsetType} is smaller than or equal to one. If
+ * {@link #offsetType} is -1, this variable is <code>null</code>, and node iterators are generated by opening
+ * streams directly on the graph file. */
+ protected byte graphMemory[];
+
+ /** The multi-byte array input stream storing the compressed graph, if {@link #isMemory} is false, {@link #isMapped} is false and {@link #offsetType} is not -1.
+ *
+ * <P>It is loaded with a copy of the graph file. If
+ * {@link #offsetType} is -1, this variable is <code>null</code>, and node iterators are generated by opening
+ * streams directly on the graph file. */
+ protected FastMultiByteArrayInputStream graphStream;
+
+ /** The memory-mapped input stream storing the compressed graph, if {@link #isMapped} is true.
+ *
+ * <P>It provides access to a memory mapping of the graph file. If
+ * {@link #offsetType} is -1, this variable is <code>null</code>, and node iterators are generated by opening
+ * streams directly on the graph file. */
+ protected ByteBufferInputStream mappedGraphStream;
+
+ /** This variable is <code>null</code> iff {@link #offsetType} is zero or less
+ * (implying that offsets have not been loaded). Otherwise, it is an
+ * Elias&ndash;Fano monotone list containing the starting bit position
+ * of the successor list of each node in the graph bit stream. */
+ protected LongBigList offsets;
+
+ /** The offset type: 2 is memory-mapping, 1 is normal random-access loading, 0 means that we do not want to load offsets at all, -1 that
+ * we do not even want to load the graph file. */
+ protected int offsetType;
+
+ /** If not {@link Integer#MIN_VALUE}, the node whose outdegree is cached in {@link #cachedOutdegree}. */
+ protected int cachedNode = Integer.MIN_VALUE;
+ /** If {@link #cachedNode} is not {@link Integer#MIN_VALUE}, its cached outdegree. */
+ protected int cachedOutdegree;
+ /** If {@link #cachedNode} is not {@link Integer#MIN_VALUE}, the position immediately after the coding of the outdegree of {@link #cachedNode}. */
+ protected long cachedPointer;
+
+ /** The maximum reference count. */
+ protected int maxRefCount = DEFAULT_MAX_REF_COUNT;
+
+ /** Default backward reference maximum length. */
+ public final static int DEFAULT_MAX_REF_COUNT = 3;
+
+ /** The window size. Zero means no references. */
+ protected int windowSize = DEFAULT_WINDOW_SIZE;
+
+ /** Default window size. */
+ public final static int DEFAULT_WINDOW_SIZE = 7;
+
+ /** The minimum interval length. */
+ protected int minIntervalLength = DEFAULT_MIN_INTERVAL_LENGTH;
+
+ /** Default minimum interval length. */
+ public final static int DEFAULT_MIN_INTERVAL_LENGTH = 4;
+
+ /** The value of <var>k</var> for &zeta;<sub><var>k</var></sub> coding (for residuals). */
+ protected int zetaK = DEFAULT_ZETA_K;
+
+ /** Default value of <var>k</var>. */
+ public final static int DEFAULT_ZETA_K = 3;
+
+ /** Flag: write outdegrees using &gamma; coding (default). */
+ public static final int OUTDEGREES_GAMMA = GAMMA;
+
+ /** Flag: write outdegrees using &delta; coding. */
+ public static final int OUTDEGREES_DELTA = DELTA;
+
+ /** Flag: write copy-block lists using &gamma; coding (default). */
+ public static final int BLOCKS_GAMMA = GAMMA << 4;
+
+ /** Flag: write copy-block lists using &delta; coding. */
+ public static final int BLOCKS_DELTA = DELTA << 4;
+
+ /** Flag: write residuals using &gamma; coding. */
+ public static final int RESIDUALS_GAMMA = GAMMA << 8;
+
+ /** Flag: write residuals using &zeta;<sub><var>k</var></sub> coding (default). */
+ public static final int RESIDUALS_ZETA = ZETA << 8;
+
+ /** Flag: write residuals using &delta; coding. */
+ public static final int RESIDUALS_DELTA = DELTA << 8;
+
+ /** Flag: write residuals using variable-length nibble coding. */
+ public static final int RESIDUALS_NIBBLE = NIBBLE << 8;
+
+ /** Flag: write residuals using Golomb coding. */
+ public static final int RESIDUALS_GOLOMB = GOLOMB << 8;
+
+ /** Flag: write references using &gamma; coding. */
+ public static final int REFERENCES_GAMMA = GAMMA << 12;
+
+ /** Flag: write references using &delta; coding. */
+ public static final int REFERENCES_DELTA = DELTA << 12;
+
+ /** Flag: write references using unary coding (default). */
+ public static final int REFERENCES_UNARY = UNARY << 12;
+
+ /** Flag: write block counts using &gamma; coding (default). */
+ public static final int BLOCK_COUNT_GAMMA = GAMMA << 16;
+
+ /** Flag: write block counts using &delta; coding. */
+ public static final int BLOCK_COUNT_DELTA = DELTA << 16;
+
+ /** Flag: write block counts using unary coding. */
+ public static final int BLOCK_COUNT_UNARY = UNARY << 16;
+
+ /** Flag: write offsets using &gamma; coding (default). */
+ public static final int OFFSETS_GAMMA = GAMMA << 20;
+
+ /** Flag: write offsets using &delta; coding. */
+ public static final int OFFSETS_DELTA = DELTA << 20;
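+
+ /* A sketch of how the flags above combine (illustrative only; the local variables below are hypothetical and
+ not used by this class).
+ Each component of the compression algorithm owns a 4-bit field: outdegrees at bit 0, copy blocks at bit 4,
+ residuals at bit 8, references at bit 12, block counts at bit 16 and offsets at bit 20. Codings are therefore
+ selected by OR-ing in one flag per component, as in
+ int flags = OUTDEGREES_DELTA | RESIDUALS_GAMMA | REFERENCES_GAMMA;
+ The coding selected for a component can be recovered by shifting and masking, as in
+ int residualField = (flags >>> 8) & 0xF; // GAMMA for the flags above
+ This assumes, as the 4-bit shifts suggest, that every coding constant of CompressionFlags fits in four bits. */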
+
+ /** The coding for outdegrees. By default, we use &gamma; coding. */
+ protected int outdegreeCoding = GAMMA;
+
+ /** The coding for copy-block lists. By default, we use &gamma; coding. */
+ protected int blockCoding = GAMMA;
+
+ /** The coding for residuals. By default, we use &zeta; coding. */
+ protected int residualCoding = ZETA;
+
+ /** The coding for references. By default, we use unary coding. */
+ protected int referenceCoding = UNARY;
+
+ /** The coding for block counts. By default, we use &gamma; coding. */
+ protected int blockCountCoding = GAMMA;
+
+ /** The coding for offsets. By default, we use &gamma; coding. */
+ protected int offsetCoding = GAMMA;
+
+ /** The compression flags used. */
+ private int flags = 0;
+
+ private final static boolean STATS = false;
+ @SuppressWarnings("unused")
+ private final static boolean DEBUG = false;
+ private final static boolean ASSERTS = false;
+
+ private PrintWriter offsetStats, outdegreeStats, blockCountStats, blockStats, intervalCountStats, referenceStats, leftStats, lenStats, residualStats, residualCountStats;
+
+ @Override
+ public BVGraph copy() {
+ final BVGraph result = new BVGraph();
+ result.basename = basename;
+ result.n = n;
+ result.m = m;
+ result.isMemory = isMemory;
+ result.isMapped = isMapped;
+ result.graphMemory = graphMemory;
+ result.graphStream = graphStream != null ? new FastMultiByteArrayInputStream(graphStream) : null;
+ result.mappedGraphStream = mappedGraphStream != null ? mappedGraphStream.copy() : null;
+ result.offsets = offsets;
+ result.maxRefCount = maxRefCount;
+ result.windowSize = windowSize;
+ result.minIntervalLength = minIntervalLength;
+ result.offsetType = offsetType;
+ result.zetaK = zetaK;
+ result.outdegreeCoding = outdegreeCoding;
+ result.blockCoding = blockCoding;
+ result.residualCoding = residualCoding;
+ result.referenceCoding = referenceCoding;
+ result.blockCountCoding = blockCountCoding;
+ result.offsetCoding = offsetCoding;
+ result.flags = flags;
+ result.outdegreeIbs = offsetType <= 0 ? null : isMemory ? new InputBitStream(graphMemory): new InputBitStream(isMapped ? mappedGraphStream.copy() : new FastMultiByteArrayInputStream(graphStream), 0);
+ return result;
+ }
+
+ protected BVGraph() {}
+
+ @Override
+ public int numNodes() {
+ return n;
+ }
+
+ @Override
+ public long numArcs() {
+ return m;
+ }
+
+ @Override
+ public boolean randomAccess() {
+ return offsets != null;
+ }
+
+ @Override
+ public boolean hasCopiableIterators() {
+ return true;
+ }
+
+ @Override
+ public CharSequence basename() {
+ return basename;
+ }
+
+ /** Returns the maximum reference count of this graph.
+ *
+ * @return the maximum reference count.
+ */
+ public int maxRefCount() {
+ return maxRefCount;
+ }
+
+ /** Returns the window size of this graph.
+ *
+ * @return the window size.
+ */
+ public int windowSize() {
+ return windowSize;
+ }
+
+ /* This family of protected methods is used throughout the class to read data
+ from the graph file following the codings indicated by the compression
+ flags. */
+
+ /** Reads an offset difference from the given stream.
+ *
+ * @param ibs an offset-file input bit stream.
+ * @return the next offset difference.
+ */
+ protected final long readOffset(final InputBitStream ibs) throws IOException {
+ switch(offsetCoding) {
+ case GAMMA: return ibs.readLongGamma();
+ case DELTA: return ibs.readLongDelta();
+ default: throw new UnsupportedOperationException("The required offset coding (" + offsetCoding + ") is not supported.");
+ }
+ }
+
+ /** Writes an offset difference to the given stream.
+ *
+ * @param obs an offset-file output bit stream.
+ * @param x an offset difference to be stored in the stream.
+ * @return the number of bits written.
+ */
+ protected final int writeOffset(final OutputBitStream obs, final long x) throws IOException {
+ switch(offsetCoding) {
+ case GAMMA: return obs.writeLongGamma(x);
+ case DELTA: return obs.writeLongDelta(x);
+ default: throw new UnsupportedOperationException("The required offset coding (" + offsetCoding + ") is not supported.");
+ }
+ }
+
+ /** Reads an outdegree from the given stream.
+ *
+ * @param ibs a graph-file input bit stream.
+ * @return the next outdegree.
+ */
+ protected final int readOutdegree(final InputBitStream ibs) throws IOException {
+ switch(outdegreeCoding) {
+ case GAMMA: return ibs.readGamma();
+ case DELTA: return ibs.readDelta();
+ default: throw new UnsupportedOperationException("The required outdegree coding (" + outdegreeCoding + ") is not supported.");
+ }
+ }
+
+ /** Reads an outdegree from the given stream at a given offset.
+ *
+ * @param ibs a graph-file input bit stream.
+ * @param offset the offset at which the stream must be positioned.
+ * @return the next outdegree.
+ */
+ protected final int readOutdegree(final InputBitStream ibs, final long offset) throws IOException {
+ ibs.position(offset);
+ return readOutdegree(ibs);
+ }
+
+ /** Writes an outdegree to the given stream.
+ *
+ * @param obs a graph-file output bit stream.
+ * @param d an outdegree to be stored in the stream.
+ * @return the number of bits written.
+ */
+ protected final int writeOutdegree(final OutputBitStream obs, final int d) throws IOException {
+ switch(outdegreeCoding) {
+ case GAMMA: return obs.writeGamma(d);
+ case DELTA: return obs.writeDelta(d);
+ default: throw new UnsupportedOperationException("The required outdegree coding (" + outdegreeCoding + ") is not supported.");
+ }
+ }
+
+ /** Reads a reference from the given stream.
+ *
+ * @param ibs a graph-file input bit stream.
+ * @return the next reference.
+ */
+ protected final int readReference(final InputBitStream ibs) throws IOException {
+ final int ref;
+
+ switch(referenceCoding) {
+ case UNARY: ref = ibs.readUnary(); break;
+ case GAMMA: ref = ibs.readGamma(); break;
+ case DELTA: ref = ibs.readDelta(); break;
+ default: throw new UnsupportedOperationException("The required reference coding (" + referenceCoding + ") is not supported.");
+ }
+ if (ref > windowSize) throw new IllegalStateException("The required reference (" + ref + ") is incompatible with the window size (" + windowSize + ")");
+ return ref;
+ }
+
+ /** Writes a reference to the given stream.
+ *
+ * @param obs a graph-file output bit stream.
+ * @param ref the reference.
+ * @return the number of bits written.
+ */
+ protected final int writeReference(final OutputBitStream obs, final int ref) throws IOException {
+
+ if (ref > windowSize) throw new IllegalStateException("The required reference (" + ref + ") is incompatible with the window size (" + windowSize + ")");
+ switch(referenceCoding) {
+ case UNARY: return obs.writeUnary(ref);
+ case GAMMA: return obs.writeGamma(ref);
+ case DELTA: return obs.writeDelta(ref);
+ default: throw new UnsupportedOperationException("The required reference coding (" + referenceCoding + ") is not supported.");
+ }
+ }
+
+
+ /** Reads a block count from the given stream.
+ *
+ * @param ibs a graph-file input bit stream.
+ * @return the next block count.
+ */
+ protected final int readBlockCount(final InputBitStream ibs) throws IOException {
+ switch(blockCountCoding) {
+ case UNARY: return ibs.readUnary();
+ case GAMMA: return ibs.readGamma();
+ case DELTA: return ibs.readDelta();
+ default: throw new UnsupportedOperationException("The required block count coding (" + blockCountCoding + ") is not supported.");
+ }
+ }
+
+ /** Writes a block count to the given stream.
+ *
+ * @param obs a graph-file output bit stream.
+ * @param count the block count.
+ * @return the number of written bits.
+ */
+ protected final int writeBlockCount(final OutputBitStream obs, final int count) throws IOException {
+ switch(blockCountCoding) {
+ case UNARY: return obs.writeUnary(count);
+ case GAMMA: return obs.writeGamma(count);
+ case DELTA: return obs.writeDelta(count);
+ default: throw new UnsupportedOperationException("The required block count coding (" + blockCountCoding + ") is not supported.");
+ }
+ }
+
+
+ /** Reads a block from the given stream.
+ *
+ * @param ibs a graph-file input bit stream.
+ * @return the next block.
+ */
+ protected final int readBlock(final InputBitStream ibs) throws IOException {
+ switch(blockCoding) {
+ case UNARY: return ibs.readUnary();
+ case GAMMA: return ibs.readGamma();
+ case DELTA: return ibs.readDelta();
+ default: throw new UnsupportedOperationException("The required block coding (" + blockCoding + ") is not supported.");
+ }
+ }
+
+ /** Writes a block to the given stream.
+ *
+ * @param obs a graph-file output bit stream.
+ * @param block the block.
+ * @return the number of written bits.
+ */
+ protected final int writeBlock(final OutputBitStream obs, final int block) throws IOException {
+ switch(blockCoding) {
+ case UNARY: return obs.writeUnary(block);
+ case GAMMA: return obs.writeGamma(block);
+ case DELTA: return obs.writeDelta(block);
+ default: throw new UnsupportedOperationException("The required block coding (" + blockCoding + ") is not supported.");
+ }
+ }
+
+ /** Reads a residual from the given stream.
+ *
+ * @param ibs a graph-file input bit stream.
+ * @return the next residual.
+ */
+ protected final int readResidual(final InputBitStream ibs) throws IOException {
+ switch(residualCoding) {
+ case GAMMA: return ibs.readGamma();
+ case ZETA: return ibs.readZeta(zetaK);
+ case DELTA: return ibs.readDelta();
+ case GOLOMB: return ibs.readGolomb(zetaK);
+ case NIBBLE: return ibs.readNibble();
+ default: throw new UnsupportedOperationException("The required residuals coding (" + residualCoding + ") is not supported.");
+ }
+ }
+
+ /** Reads a long residual from the given stream.
+ *
+ * @param ibs a graph-file input bit stream.
+ * @return the next residual.
+ */
+ protected final long readLongResidual(final InputBitStream ibs) throws IOException {
+ switch(residualCoding) {
+ case GAMMA: return ibs.readLongGamma();
+ case ZETA: return ibs.readLongZeta(zetaK);
+ case DELTA: return ibs.readLongDelta();
+ case GOLOMB: return ibs.readLongGolomb(zetaK);
+ case NIBBLE: return ibs.readLongNibble();
+ default: throw new UnsupportedOperationException("The required residuals coding (" + residualCoding + ") is not supported.");
+ }
+ }
+
+ /** Writes a residual to the given stream.
+ *
+ * @param obs a graph-file output bit stream.
+ * @param residual the residual.
+ * @return the number of written bits.
+ */
+ protected final int writeResidual(final OutputBitStream obs, final int residual) throws IOException {
+ switch(residualCoding) {
+ case GAMMA: return obs.writeGamma(residual);
+ case ZETA: return obs.writeZeta(residual, zetaK);
+ case DELTA: return obs.writeDelta(residual);
+ case GOLOMB: return obs.writeGolomb(residual, zetaK);
+ case NIBBLE: return obs.writeNibble(residual);
+ default: throw new UnsupportedOperationException("The required residuals coding (" + residualCoding + ") is not supported.");
+ }
+ }
+
+ /** Writes a long residual to the given stream.
+ *
+ * @param obs a graph-file output bit stream.
+ * @param residual the residual.
+ * @return the number of written bits.
+ */
+ protected final int writeResidual(final OutputBitStream obs, final long residual) throws IOException {
+ switch(residualCoding) {
+ case GAMMA: return obs.writeLongGamma(residual);
+ case ZETA: return obs.writeLongZeta(residual, zetaK);
+ case DELTA: return obs.writeLongDelta(residual);
+ case GOLOMB: return (int)obs.writeLongGolomb(residual, zetaK);
+ case NIBBLE: return obs.writeLongNibble(residual);
+ default: throw new UnsupportedOperationException("The required residuals coding (" + residualCoding + ") is not supported.");
+ }
+ }
+
+ /** A bit stream wrapping {@link #graphMemory}, {@link #graphStream} or {@link #mappedGraphStream}, used <em>only</em> by {@link #outdegree(int)} and {@link #outdegreeInternal(int)}. */
+ private InputBitStream outdegreeIbs;
+
+ /* The code of the following two methods must be kept in sync. */
+
+ @Override
+ public int outdegree(final int x) throws IllegalStateException {
+ if (x == cachedNode) return cachedOutdegree;
+ if (x < 0 || x >= n) throw new IllegalArgumentException("Node index out of range: " + x);
+
+ /* Computing the outdegree is a most basic operation. Thus, it must always be
+ possible to compute the outdegree of a node independently of any other state
+ in a BVGraph. To this end, we have a special-purpose input bit stream that
+ is used just to read outdegrees. */
+
+ try {
+ // Without offsets, we just give up.
+ if (offsetType <= 0) throw new IllegalStateException("You cannot compute the outdegree of a random node without offsets");
+ // We just position and read.
+ outdegreeIbs.position(offsets.getLong(cachedNode = x));
+ cachedOutdegree = readOutdegree(outdegreeIbs);
+ cachedPointer = outdegreeIbs.position();
+ return cachedOutdegree;
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+ private int outdegreeInternal(final int x) throws IOException {
+ if (x == cachedNode) return cachedOutdegree;
+ // We just position and read.
+ outdegreeIbs.position(offsets.getLong(cachedNode = x));
+ cachedOutdegree = readOutdegree(outdegreeIbs);
+ cachedPointer = outdegreeIbs.position();
+ return cachedOutdegree;
+ }
+
+
+ /** Returns an iterator over the successors of a given node.
+ *
+ * @param x a node.
+ * @return an iterator over the successors of the node.
+ */
+ @Override
+ public LazyIntIterator successors(final int x) {
+ // We just call successors(int, InputBitStream, int[][], int[]) with
+ // a newly created input bit stream and null elsewhere.
+ if (x < 0 || x >= n) throw new IllegalArgumentException("Node index out of range: " + x);
+ if (offsetType <= 0) throw new UnsupportedOperationException("Random access to successor lists is not possible with sequential or offline graphs");
+ final InputBitStream ibs = isMemory ? new InputBitStream(graphMemory) : new InputBitStream(isMapped ? mappedGraphStream.copy() : new FastMultiByteArrayInputStream(graphStream), 0);
+ return successors(x, ibs, null, null);
+ }
+
+
+
+ /** An iterator returning the offsets. */
+ private final static class OffsetsLongIterator implements LongIterator {
+ private final InputBitStream offsetIbs;
+ private final int n;
+ private long off;
+ private int i;
+ private final BVGraph g;
+
+ private OffsetsLongIterator(final BVGraph g, final InputBitStream offsetIbs) {
+ this.offsetIbs = offsetIbs;
+ this.g = g;
+ this.n = g.numNodes();
+ }
+
+ @Override
+ public boolean hasNext() {
+ return i <= n;
+ }
+
+ @Override
+ public long nextLong() {
+ i++;
+ try {
+ return off = g.readOffset(offsetIbs) + off;
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ }
+ }
+
+
+ /** An iterator returning the residuals of a node. */
+ private final static class ResidualIntIterator extends AbstractLazyIntIterator {
+ /** The graph associated to this iterator. */
+ private final BVGraph g;
+ /** The input bit stream from which residuals will be read. */
+ private final InputBitStream ibs;
+ /** The last residual returned. */
+ private int next;
+ /** The number of remaining residuals. */
+ private int remaining;
+
+ private ResidualIntIterator(final BVGraph g, final InputBitStream ibs, final int residualCount, final int x) {
+ this.g = g;
+ this.remaining = residualCount;
+ this.ibs = ibs;
+ try {
+ this.next = (int)(x + Fast.nat2int(g.readLongResidual(ibs)));
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+ @Override
+ public int nextInt() {
+ if (remaining == 0) return -1;
+ try {
+ final int result = next;
+ if (--remaining != 0) next += g.readResidual(ibs) + 1;
+ return result;
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+ @Override
+ public int skip(int n) {
+ if (n >= remaining) {
+ n = remaining;
+ remaining = 0;
+ return n;
+ }
+ try {
+ for(int i = n; i-- != 0;) next += g.readResidual(ibs) + 1;
+ remaining -= n;
+ return n;
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+ }
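+
+ /* A worked example of the residual coding decoded by ResidualIntIterator (values chosen for illustration).
+ Take node x = 10 with residuals 3, 7 and 8. The first residual is stored as Fast.int2nat(3 - 10) = 13;
+ the signed-to-natural mapping is needed because this difference can be negative. Each following residual
+ is stored as its gap from the previous one minus one: 7 - 3 - 1 = 3 and 8 - 7 - 1 = 0.
+ Decoding reverses this: 10 + Fast.nat2int(13) = 3, then 3 + 3 + 1 = 7, then 7 + 0 + 1 = 8. */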
+
+
+
+ /** Given an {@link InputBitStream} wrapping a graph file, returns an iterator over the
+ * successors of a given node <code>x</code>.
+ *
+ * <P>This method can be used in two different ways:
+ * <OL><LI>by providing a node and an input bit stream wrapping a graph file, it is possible
+ * to access the successor list of the node (provided that offsets have been loaded);
+ * <LI>by providing additional data, which essentially are used to keep some state
+ * about the graph, it is possible to perform an efficient sequential visit of all
+ * successor lists (even when no offsets were loaded).
+ * </OL>
+ *
+ * <P>This method may modify the offset and the outdegree caches if <code>window</code> is <code>null</code>.
+ *
+ * @param x a node.
+ * @param ibs an input bit stream wrapping a graph file. After this method returns, the state of <code>ibs</code> is undefined:
+ * however, after the iterator returned is exhausted, <code>ibs</code> will be positioned just after the successor list of <code>x</code>.
+ * @param window either <code>null</code>, or a two-dimensional array with the following meaning: <code>window[(x-i) mod (windowSize+1)]</code>
+ * contains, for all <code>i</code> between 1 and {@link #windowSize} (both inclusive), the list of successors
+ * of node <code>x</code>&minus;<code>i</code>. If <code>window</code> is not <code>null</code> then <code>ibs</code>
+ * must be positioned before the successor list of <code>x</code>. This parameter will not be modified.
+ * @param outd if <code>window</code> is not <code>null</code>, this is an array with {@link #windowSize} + 1 elements;
+ * <code>outd[(x-i) mod (windowSize+1)]</code> contains the outdegree of node <code>x</code>
+ * &minus;<code>i</code> for <code>i</code> greater than 0; at the end, this will be true also for <code>i</code> equal to 0.
+ * @return an iterator over the successors of <code>x</code>.
+ * @throws IllegalStateException if <code>window</code> is <code>null</code> and {@link #offsetType} is 0.
+ *
+ */
+ protected LazyIntIterator successors(final int x, final InputBitStream ibs, final int window[][], final int outd[]) throws IllegalStateException {
+ final int ref, refIndex;
+ int i, extraCount, blockCount = 0;
+ int[] block = null, left = null, len = null;
+
+ if (x < 0 || x >= n) throw new IllegalArgumentException("Node index out of range: " + x);
+
+ try {
+ final int d;
+ final int cyclicBufferSize = windowSize + 1;
+ //long nextOffset = -1;
+
+ if (window == null) {
+ d = outdegreeInternal(x);
+ ibs.position(cachedPointer);
+ }
+ else d = outd[x % cyclicBufferSize] = readOutdegree(ibs);
+
+ if (d == 0) return LazyIntIterators.EMPTY_ITERATOR;
+
+ // We read the reference only if the actual window size is larger than one (i.e., the one specified by the user is larger than 0).
+ if (windowSize > 0) ref = readReference(ibs);
+ else ref = -1;
+
+ refIndex = (x - ref + cyclicBufferSize) % cyclicBufferSize; // The index in window[] of the node we are referring to (it makes sense only if ref>0).
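+ // For example, with the default window size 7 the cyclic buffer has 8 slots: for x = 20 and ref = 3,
+ // refIndex = (20 - 3 + 8) % 8 = 1, which is the slot where node 17 = 20 - 3 was stored (17 % 8 = 1).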
+
+ if (ref > 0) { // This catches both no references at all and no reference specifically for this node.
+ if ((blockCount = readBlockCount(ibs)) != 0) block = new int[blockCount];
+
+ int copied = 0, total = 0; // The number of successors copied, and the total number of successors specified in some copy block.
+ for(i = 0; i < blockCount; i++) {
+ block[i] = readBlock(ibs) + (i == 0 ? 0 : 1);
+ total += block[i];
+ if (i % 2 == 0) copied += block[i];
+ }
+ // If the block count is even, we must compute the number of successors copied implicitly.
+ //if (window == null) nextOffset = offsets.getLong(x - ref);
+ if (blockCount % 2 == 0) copied += (window != null ? outd[refIndex] : outdegreeInternal(x - ref)) - total;
+ extraCount = d - copied;
+ }
+ else extraCount = d;
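+ // Copy blocks alternate between "copy" (even indices) and "skip" (odd indices). All blocks but the first
+ // are stored decremented by one, so only the first block can have length zero. For example, suppose the
+ // referenced node has outdegree 10 and the stored block values are 2 and 2. The blocks are then 2 (copy)
+ // and 3 (skip). Since the count is even, the remaining 10 - 5 = 5 successors are copied implicitly,
+ // so copied = 2 + 5 = 7.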
+
+ int intervalCount = 0; // Number of intervals
+
+ if (extraCount > 0) {
+
+ // Prepare to read intervals, if any
+ if (minIntervalLength != NO_INTERVALS && (intervalCount = ibs.readGamma()) != 0) {
+
+ int prev = 0; // Holds the last integer in the last interval.
+ left = new int[intervalCount];
+ len = new int[intervalCount];
+
+ // Now we read intervals
+ left[0] = prev = (int)(Fast.nat2int(ibs.readLongGamma()) + x);
+ len[0] = ibs.readGamma() + minIntervalLength;
+
+ prev += len[0];
+ extraCount -= len[0];
+
+ for (i = 1; i < intervalCount; i++) {
+ left[i] = prev = ibs.readGamma() + prev + 1;
+ len[i] = ibs.readGamma() + minIntervalLength;
+ prev += len[i];
+ extraCount -= len[i];
+ }
+ }
+ }
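+ // A worked example of the interval coding just decoded (illustrative values): for x = 10 and
+ // minIntervalLength = 4, the intervals [15..19] and [22..25] are stored as follows. First comes
+ // intervalCount = 2. The first interval is stored as Fast.int2nat(15 - 10) = 10 and 5 - 4 = 1. The second
+ // is stored as 22 - 20 - 1 = 1 (20 being the first integer after the previous interval) and 4 - 4 = 0.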
+
+ final int residualCount = extraCount; // The number of successors left after copy blocks and intervals.
+
+ final LazyIntIterator residualIterator = residualCount == 0 ? null : new ResidualIntIterator(this, ibs, residualCount, x);
+
+ // The extra part is made by the contribution of intervals, if any, and by the residuals iterator.
+ final LazyIntIterator extraIterator = intervalCount == 0
+ ? residualIterator
+ : (residualCount == 0
+ ? (LazyIntIterator)new IntIntervalSequenceIterator(left, len)
+ : (LazyIntIterator)new MergedIntIterator(new IntIntervalSequenceIterator(left, len), residualIterator)
+ );
+
+ final LazyIntIterator blockIterator = ref <= 0
+ ? null
+ : new MaskedIntIterator(
+ // ...block for masking copy and...
+ block,
+ // ...the reference list (either computed recursively or stored in window)...
+ window != null
+ ? LazyIntIterators.wrap(window[refIndex], outd[refIndex])
+ :
+ // This is the recursive lazy part of the construction.
+ successors(x - ref, isMemory ? new InputBitStream(graphMemory) : new InputBitStream(isMapped ? mappedGraphStream.copy() : new FastMultiByteArrayInputStream(graphStream), 0), null, null)
+ );
+
+ if (ref <= 0) return extraIterator;
+ else return extraIterator == null
+ ? blockIterator
+ : (LazyIntIterator)new MergedIntIterator(blockIterator, extraIterator);
+
+ }
+ catch (IOException e) {
+ LOGGER.error("Exception while accessing node " + x + ", stream position " + ibs.position(), e);
+ throw new RuntimeException(e);
+ }
+ }
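+
+ /* Worked example (editorial sketch, traced by hand through the decoding logic above; all values are hypothetical).
+ *
+ * Copy blocks: suppose the reference list is {1, 3, 5, 7, 9} and the decoded block lengths are {1, 2}
+ * (blockCount = 2). Even-indexed blocks copy and odd-indexed blocks skip, so we copy 1 element (1) and
+ * skip 2 (3, 5). Since blockCount is even, an implicit final copy block covers the rest (7, 9):
+ * copied = 1 + (5 - 3) = 3, and the copied successors are {1, 7, 9}.
+ *
+ * Intervals: for x = 10 and minIntervalLength = 3, suppose the values read from the stream are 4, 1, 3 and 0.
+ * They decode as left[0] = Fast.nat2int(4) + 10 = 2 + 10 = 12, len[0] = 1 + 3 = 4, prev = 16;
+ * then left[1] = 3 + 16 + 1 = 20, len[1] = 0 + 3 = 3. This gives the intervals [12..15] and [20..22],
+ * and extraCount decreases by 4 + 3 = 7.
+ */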
+
+
+ private class BVGraphNodeIterator extends NodeIterator {
+ @SuppressWarnings("hiding")
+ final private int n = numNodes();
+ /** Our bit stream. */
+ final InputBitStream ibs;
+ /** We keep the size of the cyclic buffer (the window size + 1) in a field. */
+ final private int cyclicBufferSize = windowSize + 1;
+ /** At any time, window will be ready to be passed to {@link BVGraph#successors(int, InputBitStream, int[][], int[])} */
+ final private int window[][] = new int[cyclicBufferSize][INITIAL_SUCCESSOR_LIST_LENGTH];
+ /** At any time, outd will be ready to be passed to {@link BVGraph#successors(int, InputBitStream, int[][], int[])} */
+ final private int outd[] = new int[cyclicBufferSize];
+ /** The index of the node from which we started iterating. */
+ final private int from;
+ /** The index of the node just before the next one. */
+ private int curr;
+ /** No node &ge; this will be returned. */
+ private final int upperBound;
+
+ /** Creates a new iterator starting from a node, with some pre-initialized state.
+ *
+ * @param from the node to start from.
+ * @param upperBound no node &ge; this will be returned.
+ * @param streamPosition the position where the stream should be put to start reading.
+ * @param window the pre-initialized value for the {@link #window} attribute, or <code>null</code>.
+ * @param outd the pre-initialized value for the {@link #outd} attribute, or <code>null</code>.
+ * @throws IOException if an I/O error occurs while creating or positioning the bit stream.
+ */
+ private BVGraphNodeIterator(final int from, final int upperBound, final long streamPosition, final int[][] window, final int[] outd) throws IOException {
+ if (from < 0 || from > n) throw new IllegalArgumentException("Node index out of range: " + from);
+ this.from = from;
+ ibs = createInputBitStream();
+ ibs.position(streamPosition);
+ if (window != null) {
+ for (int i = 0; i < window.length; i++) System.arraycopy(window[i], 0, this.window[i] = IntArrays.grow(this.window[i], outd[i], 0), 0, outd[i]);
+ System.arraycopy(outd, 0, this.outd, 0, outd.length);
+ }
+ else if (from != 0) {
+ if (offsetType <= 0) throw new IllegalStateException("You cannot iterate from a chosen node without offsets");
+
+ int pos;
+ for(int i = 1; i < Math.min(from + 1, cyclicBufferSize); i++) {
+ pos = (from - i + cyclicBufferSize) % cyclicBufferSize;
+ this.outd[pos] = BVGraph.this.outdegreeInternal(from - i);
+ System.arraycopy(BVGraph.this.successorArray(from - i), 0, this.window[pos] = IntArrays.grow(this.window[pos], this.outd[pos], 0), 0, this.outd[pos]);
+ }
+ this.ibs.position(offsets.getLong(from)); // We must fix the bit stream position so that we are *before* the outdegree.
+ }
+ curr = from - 1;
+ this.upperBound = upperBound;
+ }
+
+ /** Creates a brand new iterator starting from a node.
+ *
+ * @param from the node to start from.
+ * @throws IOException if an I/O error occurs while creating the bit stream.
+ */
+ private BVGraphNodeIterator(final int from) throws IOException {
+ this(from, Integer.MAX_VALUE, 0, null, null);
+ }
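+
+ /* Worked example (editorial sketch derived from the constructor above; the numbers are hypothetical).
+ * With windowSize = 3, the cyclic buffer has cyclicBufferSize = 4 slots, and node v lives in slot v % 4.
+ * When starting from from = 10 with offsets available, the constructor fills slots
+ * (10 - i + 4) % 4 for i = 1, 2, 3. That is, node 9 goes to slot 1, node 8 to slot 0 and node 7 to slot 3.
+ * The first call to nextInt() then decodes node 10 into slot 2 and finds its possible reference
+ * nodes already in the window.
+ */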
+
+
+ /** At each call, we build the successor iterator (by calling {@link BVGraph#successors(int, InputBitStream, int[][], int[])})
+ * and iterate over it completely, filling the appropriate entry in <code>window</code>. */
+ @Override
+ public int nextInt() {
+ if (! hasNext()) throw new NoSuchElementException();
+
+ final int currIndex = ++curr % cyclicBufferSize;
+ final LazyIntIterator i = BVGraph.this.successors(curr, ibs, window, outd);
+
+ final int d = outd[currIndex];
+ if (window[currIndex].length < d) window[currIndex] = new int[d];
+ final int[] w = window[currIndex];
+ for(int j = 0; j < d; j++) w[j] = i.nextInt();
+
+ return curr;
+ }
+
+ @Override
+ public boolean hasNext() {
+ return curr < Math.min(n - 1, upperBound - 1);
+ }
+
+ @Override
+ public LazyIntIterator successors() {
+ if (curr == from - 1) throw new IllegalStateException();
+
+ final int currIndex = curr % cyclicBufferSize;
+ return LazyIntIterators.wrap(window[currIndex], outd[currIndex]);
+ }
+
+ @Override
+ public int[] successorArray() {
+ if (curr == from - 1) throw new IllegalStateException();
+
+ return window[curr % cyclicBufferSize];
+ }
+
+ @Override
+ public int outdegree() {
+ if (curr == from - 1) throw new IllegalStateException();
+ return outd[curr % cyclicBufferSize];
+ }
+
+ @Override
+ protected void finalize() throws Throwable {
+ try {
+ ibs.close();
+ }
+ finally {
+ super.finalize();
+ }
+ }
+
+ @Override
+ public NodeIterator copy(final int upperBound) {
+ try {
+ return new BVGraphNodeIterator(curr + 1, upperBound, ibs.position(), window, outd);
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+ /** Creates a new bit stream to read the graph from, based on the data of the host instance.
+ *
+ * @return a newly-created bit stream to read the graph from.
+ * @throws FileNotFoundException if the graph is read from a file, but the file does not exist.
+ */
+ private InputBitStream createInputBitStream() throws FileNotFoundException {
+ if (offsetType == -1)
+ return new InputBitStream(new FileInputStream(basename + GRAPH_EXTENSION), STD_BUFFER_SIZE);
+ else
+ if (isMemory)
+ return new InputBitStream(graphMemory);
+ else
+ if (isMapped)
+ return new InputBitStream(mappedGraphStream.copy());
+ else
+ return new InputBitStream(new FastMultiByteArrayInputStream(graphStream), 0);
+ }
+
+
+ };
+
+
+ /** This method returns a node iterator for scanning the graph sequentially, starting from the given node.
+ * It keeps track of a sliding window of {@link #windowSize()} previous successor lists
+ * to speed up the iteration of graphs with significant referentiation.
+ *
+ * @param from the node from which the iterator will iterate.
+ * @return a {@link NodeIterator} for accessing nodes and successors sequentially.
+ */
+
+ @Override
+ public NodeIterator nodeIterator(final int from) {
+ try {
+ return new BVGraphNodeIterator(from);
+ } catch (FileNotFoundException e) {
+ throw new IllegalStateException("The graph file \"" + basename + GRAPH_EXTENSION + "\" cannot be found");
+ } catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+
+ /* The following private methods handle the flag mask. They are the only methods which replicate
+ * the shifting logic specified in the flag-mask definition.
+ */
+
+ /** Sets the {@link #flags} attribute to the given value, and appropriately updates the
+ * individual coding attributes (<code>g&hellip;Coding</code>).
+ *
+ * <P>If a certain bit slot within <code>flags</code> is not specified (i.e., it is 0), the corresponding
+ * coding variable is left unchanged, on the assumption that it holds the default value (this condition
+ * is not checked).
+ *
+ * @param flags a mask of flags as specified by the constants of this class.
+ */
+ private void setFlags(final int flags) {
+ this.flags = flags;
+ if ((flags & 0xF) != 0) outdegreeCoding = flags & 0xF;
+ if (((flags >>> 4) & 0xF) != 0) blockCoding = (flags >>> 4) & 0xF;
+ if (((flags >>> 8) & 0xF) != 0) residualCoding = (flags >>> 8) & 0xF;
+ if (((flags >>> 12) & 0xF) != 0) referenceCoding = (flags >>> 12) & 0xF;
+ if (((flags >>> 16) & 0xF) != 0) blockCountCoding = (flags >>> 16) & 0xF;
+ if (((flags >>> 20) & 0xF) != 0) offsetCoding = (flags >>> 20) & 0xF;
+ }
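+
+ /* Worked example (editorial sketch; the mask value is hypothetical). Each coding occupies a 4-bit slot:
+ * bits 0-3 for outdegrees, 4-7 for blocks, 8-11 for residuals, 12-15 for references, 16-19 for the block count
+ * and 20-23 for offsets. For flags = 0x602, setFlags() sets outdegreeCoding = 0x602 & 0xF = 2 and
+ * residualCoding = (0x602 >>> 8) & 0xF = 6. Because their slots are 0, blockCoding, referenceCoding,
+ * blockCountCoding and offsetCoding keep their current (default) values.
+ */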
+
+ /** Produces a string representing the values coded in the given flag mask.
+ *
+ * @param flags a flag mask.
+ * @return a string representing the flag mask.
+ */
+ private static MutableString flags2String(final int flags) {
+ MutableString s = new MutableString();
+
+ if ((flags & 0xF) != 0) s.append(" | ").append("OUTDEGREES_").append(CompressionFlags.CODING_NAME[flags & 0xF]);
+ if (((flags >>> 4) & 0xF) != 0) s.append(" | ").append("BLOCKS_").append(CompressionFlags.CODING_NAME[(flags >>> 4) & 0xF]);
+ if (((flags >>> 8) & 0xF) != 0) s.append(" | ").append("RESIDUALS_").append(CompressionFlags.CODING_NAME[(flags >>> 8) & 0xF]);
+ if (((flags >>> 12) & 0xF) != 0) s.append(" | ").append("REFERENCES_").append(CompressionFlags.CODING_NAME[(flags >>> 12) & 0xF]);
+ if (((flags >>> 16) & 0xF) != 0) s.append(" | ").append("BLOCK_COUNT_").append(CompressionFlags.CODING_NAME[(flags >>> 16) & 0xF]);
+ if (((flags >>> 20) & 0xF) != 0) s.append(" | ").append("OFFSETS_").append(CompressionFlags.CODING_NAME[(flags >>> 20) & 0xF]);
+
+ if (s.length() != 0) s.delete(0, 3);
+ return s;
+ }
+
+ /** Produces a flag mask corresponding to a given string.
+ *
+ * @param flagString a flag string.
+ * @return the flag mask.
+ * @throws IOException if the flag string is malformed.
+ */
+ private static int string2Flags(final String flagString) throws IOException {
+ int flags = 0;
+
+ if (flagString != null && flagString.length() != 0) {
+ String f[] = flagString.split("\\|");
+ for(int i = 0; i < f.length; i++) {
+ try {
+ flags |= BVGraph.class.getField(f[i].trim()).getInt(BVGraph.class);
+ }
+ catch (Exception notFound) {
+ throw new IOException("Compression flag " + f[i] + " unknown.");
+ }
+ }
+ }
+ return flags;
+ }
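+
+ /* Worked example (editorial sketch). This assumes that the names produced by flags2String() are public static int
+ * fields reachable through BVGraph.class.getField() and hold the corresponding slot-shifted values, which is what
+ * string2Flags() relies on.
+ * For flags = 0x602, flags2String() returns "OUTDEGREES_" + CODING_NAME[2] + " | RESIDUALS_" + CODING_NAME[6]
+ * (the leading " | " is removed by s.delete(0, 3)). string2Flags() splits that string on '|', trims each
+ * token and ORs together the values of the corresponding fields. Under the assumption above, it therefore
+ * yields 0x602 again.
+ */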
+
+
+ /** Creates a new {@link BVGraph} by loading a compressed graph file from disk to memory.
+ *
+ * @param basename the basename of the graph.
+ * @param offsetType the desired offset type (2 is memory-mapping, 1 is normal random-access loading, 0 means that we do not want to load offsets at all, -1 that
+ * we do not even want to load the graph file).
+ * @param pl a progress logger used while loading the graph, or <code>null</code>.
+ * @return a {@link BVGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ */
+ public static BVGraph load(final CharSequence basename, final int offsetType, final ProgressLogger pl) throws IOException {
+ return new BVGraph().loadInternal(basename, offsetType, pl);
+ }
+
+
+
+ /** Creates a new {@link BVGraph} by loading a compressed graph file from disk to memory, with no progress logger.
+ *
+ * @param basename the basename of the graph.
+ * @param offsetType the desired offset type (2 is memory-mapping, 1 is normal random-access loading, 0 means that we do not want to load offsets at all, -1 that
+ * we do not even want to load the graph file).
+ * @return a {@link BVGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ */
+ public static BVGraph load(CharSequence basename, int offsetType) throws IOException {
+ return BVGraph.load(basename, offsetType, null);
+ }
+
+ /** Creates a new {@link BVGraph} by loading a compressed graph file from disk to memory, with no progress logger and
+ * all offsets.
+ *
+ * @param basename the basename of the graph.
+ * @return a {@link BVGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ */
+ public static BVGraph load(CharSequence basename) throws IOException {
+ return BVGraph.load(basename, 1);
+ }
+
+ /** Creates a new {@link BVGraph} by loading a compressed graph file from disk to memory, with
+ * all offsets.
+ *
+ * @param basename the basename of the graph.
+ * @param pl a progress logger used while loading the graph, or <code>null</code>.
+ * @return a {@link BVGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ */
+ public static BVGraph load(CharSequence basename, ProgressLogger pl) throws IOException {
+ return BVGraph.load(basename, 1, pl);
+ }
+
+ /** Creates a new {@link BVGraph} by memory-mapping a graph file.
+ *
+ * @param basename the basename of the graph.
+ * @param pl a progress logger used while loading the offsets, or <code>null</code>.
+ * @return a {@link BVGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while memory-mapping the graph or reading the offsets.
+ */
+ public static BVGraph loadMapped(CharSequence basename, ProgressLogger pl) throws IOException {
+ return BVGraph.load(basename, 2, pl);
+ }
+
+ /** Creates a new {@link BVGraph} by memory-mapping a graph file.
+ *
+ * @param basename the basename of the graph.
+ * @return a {@link BVGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while memory-mapping the graph or reading the offsets.
+ */
+ public static BVGraph loadMapped(CharSequence basename) throws IOException {
+ return BVGraph.loadMapped(basename, null);
+ }
+
+
+ /** Creates a new {@link BVGraph} by loading a compressed graph file from disk to memory, without offsets.
+ *
+ * @param basename the basename of the graph.
+ * @param pl a progress logger used while loading the graph, or <code>null</code>.
+ * @return a {@link BVGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ * @deprecated Use {@link #loadOffline(CharSequence, ProgressLogger)} or {@link #loadMapped(CharSequence, ProgressLogger)} instead.
+ */
+ @Deprecated
+ public static BVGraph loadSequential(CharSequence basename, ProgressLogger pl) throws IOException {
+ return BVGraph.load(basename, 0, pl);
+ }
+
+
+ /** Creates a new {@link BVGraph} by loading a compressed graph file from disk to memory, with no progress logger and
+ * without offsets.
+ *
+ * @param basename the basename of the graph.
+ * @return a {@link BVGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ * @deprecated Use {@link #loadOffline(CharSequence)} or {@link #loadMapped(CharSequence)} instead.
+ */
+ @Deprecated
+ public static BVGraph loadSequential(CharSequence basename) throws IOException {
+ return BVGraph.loadSequential(basename, null);
+ }
+
+
+
+ /** Creates a new {@link BVGraph} by loading just the metadata of a compressed graph file.
+ *
+ * @param basename the basename of the graph.
+ * @param pl a progress logger, or <code>null</code>.
+ * @return a {@link BVGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the metadata.
+ */
+ public static BVGraph loadOffline(CharSequence basename, ProgressLogger pl) throws IOException {
+ return BVGraph.load(basename, -1, pl);
+ }
+
+
+
+ /** Creates a new {@link BVGraph} by loading just the metadata of a compressed graph file.
+ *
+ * @param basename the basename of the graph.
+ * @return a {@link BVGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the metadata.
+ */
+ public static BVGraph loadOffline(CharSequence basename) throws IOException {
+ return BVGraph.loadOffline(basename, (ProgressLogger)null);
+ }
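+
+ /* Summary (editorial sketch derived from the loaders above and from loadInternal() below):
+ * load(basename) and load(basename, pl) use offsetType 1: the graph is read into memory and offsets are loaded (random access).
+ * loadMapped(...) uses offsetType 2: the graph is memory-mapped and offsets are loaded (random access).
+ * loadSequential(...) (deprecated) uses offsetType 0: the graph is read into memory without offsets (sequential access only).
+ * loadOffline(...) uses offsetType -1: only the properties file is read, and node iterators open the graph file on demand.
+ */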
+
+
+
+ /** Loads a compressed graph file from disk into this graph. Note that this method should
+ * be called <em>only</em> on a newly created graph.
+ *
+ * @param basename the basename of the graph.
+ * @param offsetType the desired offset type (2 is memory-mapping, 1 is normal random-access loading, 0 means that we do not want to load offsets at all, -1 that
+ * we do not even want to load the graph file).
+ * @param pl a progress logger used while loading the graph, or <code>null</code>.
+ * @return this graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ */
+ protected BVGraph loadInternal(final CharSequence basename, int offsetType, final ProgressLogger pl) throws IOException {
+
+ // First of all, we read the property file to get the relevant data.
+ final FileInputStream propertyFile = new FileInputStream(basename + PROPERTIES_EXTENSION);
+ final Properties properties = new Properties();
+ properties.load(propertyFile);
+ propertyFile.close();
+
+ this.offsetType = offsetType;
+ this.basename = new MutableString(basename);
+
+ // Soft check: we also accept graphs stored using the big version (it.unimi.dsi.big.webgraph).
+ if (! getClass().getName().equals(properties.getProperty(ImmutableGraph.GRAPHCLASS_PROPERTY_KEY).replace("it.unimi.dsi.big.webgraph", "it.unimi.dsi.webgraph")))
+ throw new IOException("This class (" + this.getClass().getName() + ") cannot load a graph stored using class \"" + properties.getProperty(ImmutableGraph.GRAPHCLASS_PROPERTY_KEY) + "\"");
+
+ // We parse the properties and perform some consistency checks and assignments.
+ setFlags(string2Flags(properties.getProperty("compressionflags")));
+ if (properties.getProperty("version") == null) throw new IOException("Missing format version information");
+ else if (Integer.parseInt(properties.getProperty("version")) > BVGRAPH_VERSION) throw new IOException("This graph uses format " + properties.getProperty("version") + ", but this class can understand only graphs up to format " + BVGRAPH_VERSION);
+
+ final long nodes = Long.parseLong(properties.getProperty("nodes"));
+ if (nodes > Integer.MAX_VALUE) throw new IllegalArgumentException("The standard version of WebGraph cannot handle graphs with " + nodes + " (>=2^31) nodes");
+ n = (int)nodes;
+ m = Long.parseLong(properties.getProperty("arcs"));
+ windowSize = Integer.parseInt(properties.getProperty("windowsize"));
+ maxRefCount = Integer.parseInt(properties.getProperty("maxrefcount"));
+ minIntervalLength = Integer.parseInt(properties.getProperty("minintervallength"));
+ if (properties.getProperty("zetak") != null) zetaK = Integer.parseInt(properties.getProperty("zetak"));
+
+ if (offsetType < -1 || offsetType > 2) throw new IllegalArgumentException("Illegal offset type " + offsetType);
+ final InputBitStream offsetIbs = offsetType > 0 ? new InputBitStream(new FileInputStream(basename + OFFSETS_EXTENSION), STD_BUFFER_SIZE) : null;
+
+ if (offsetType >= 0) {
+ final FileInputStream fis = new FileInputStream(basename + GRAPH_EXTENSION);
+
+ if (offsetType == 2) {
+ mappedGraphStream = ByteBufferInputStream.map(fis.getChannel(), MapMode.READ_ONLY);
+ isMapped = true;
+ }
+ else {
+ // read the whole graph into memory
+ if (pl != null) {
+ pl.itemsName = "bytes";
+ pl.start("Loading graph...");
+ }
+
+ if (fis.getChannel().size() <= Integer.MAX_VALUE) {
+ graphMemory = new byte[(int) fis.getChannel().size()];
+ BinIO.loadBytes(fis, graphMemory);
+ fis.close();
+ isMemory = true;
+ }
+ else graphStream = new FastMultiByteArrayInputStream(fis, fis.getChannel().size());
+
+ if (pl != null) {
+ pl.count = isMemory ? graphMemory.length : graphStream.length;
+ pl.done();
+ }
+ }
+ }
+
+ if (offsetType == 1 || offsetType == 2) {
+ // read offsets, if required
+
+ if (pl != null) {
+ pl.itemsName = "deltas";
+ pl.start("Loading offsets...");
+ }
+
+ // We try to load a cached big list.
+ final File offsetsBigListFile = new File(basename + OFFSETS_BIG_LIST_EXTENSION);
+ if (offsetsBigListFile.exists()) {
+ if (new File(basename + OFFSETS_EXTENSION).lastModified() > offsetsBigListFile.lastModified()) LOGGER.warn("A cached long big list of offsets was found, but the corresponding offsets file has a later modification time");
+ else try {
+ offsets = (LongBigList)BinIO.loadObject(offsetsBigListFile);
+ }
+ catch (ClassNotFoundException e) {
+ LOGGER.warn("A cached long big list of offsets was found, but its class is unknown", e);
+ }
+ }
+ if (offsets == null) offsets = new EliasFanoMonotoneLongBigList(n + 1, (isMapped ? mappedGraphStream.length() : isMemory ? graphMemory.length : graphStream.length) * Byte.SIZE + 1, new OffsetsLongIterator(this, offsetIbs));
+
+ if (pl != null) {
+ pl.count = n + 1;
+ pl.done();
+ if (offsets instanceof EliasFanoMonotoneLongBigList) pl.logger().info("Pointer bits per node: " + Util.format(((EliasFanoMonotoneLongBigList)offsets).numBits() / (n + 1.0)));
+ }
+ }
+
+ if (offsetIbs != null) offsetIbs.close();
+
+ // We finally create the outdegreeIbs, if the graph file has been loaded or mapped.
+ if (offsetType >= 0) outdegreeIbs = isMemory ? new InputBitStream(graphMemory): new InputBitStream(isMapped ? mappedGraphStream.copy() : new FastMultiByteArrayInputStream(graphStream), 0);
+
+ return this;
+ }
+
+
+
+ /** This method tries to express an increasing sequence of natural numbers <code>x</code> as a union of an increasing
+ * sequence of intervals and an increasing sequence of residual elements. More precisely, this intervalization works
+ * as follows: first, one looks at <code>x</code> as a sequence of intervals (i.e., maximal sequences of consecutive
+ * elements); those intervals whose length is &ge; <code>minInterval</code> are stored in the lists <code>left</code>
+ * (the list of left extremes) and <code>len</code> (the list of lengths; the length of an integer interval is the
+ * number of integers in that interval). The remaining integers, called <em>residuals</em>, are stored in the
+ * <code>residuals</code> list.
+ *
+ * <P>Note that the previous content of <code>left</code>, <code>len</code> and <code>residuals</code> is lost.
+ *
+ * @param x the list to be intervalized (an increasing list of natural numbers).
+ * @param minInterval the least length that a maximal sequence of consecutive elements must have in order for it to
+ * be considered as an interval.
+ * @param left the resulting list of left extremes of the intervals.
+ * @param len the resulting list of interval lengths.
+ * @param residuals the resulting list of residuals.
+ * @return the number of intervals.
+ */
+ protected static int intervalize(final IntArrayList x, final int minInterval, final IntArrayList left, final IntArrayList len, final IntArrayList residuals) {
+ int nInterval = 0;
+ int vl = x.size();
+ int v[] = x.elements();
+ int i, j;
+
+ left.clear(); len.clear(); residuals.clear();
+ for(i = 0; i < vl; i++) {
+ j = 0;
+ if (i < vl - 1 && v[i] + 1 == v[i + 1]) {
+ do j++; while(i + j < vl - 1 && v[i + j] + 1 == v[i + j + 1]);
+ j++;
+ // Now j is the number of integers in the interval.
+ if (j >= minInterval) {
+ left.add(v[i]);
+ len.add(j);
+ nInterval++;
+ i += j - 1;
+ }
+ }
+ if (j < minInterval) residuals.add(v[i]);
+ }
+ return nInterval;
+ }
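+
+ /* Worked example (editorial sketch, traced by hand through intervalize() above; the input is hypothetical).
+ * For x = {3, 4, 5, 9, 11, 12} and minInterval = 2, the maximal runs are 3..5 (3 integers), 9 (1 integer)
+ * and 11..12 (2 integers). The method therefore returns 2, with left = {3, 11}, len = {3, 2} and residuals = {9}.
+ * With minInterval = 4, no run is long enough: the method returns 0 and residuals = {3, 4, 5, 9, 11, 12}.
+ */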
+
+
+
+
+
+ /** Writes the given graph using a given base name.
+ *
+ * @param graph a graph to be compressed.
+ * @param basename a base name.
+ * @param windowSize the window size (-1 for the default value).
+ * @param maxRefCount the maximum reference count (-1 for the default value).
+ * @param minIntervalLength the minimum interval length (-1 for the default value, {@link #NO_INTERVALS} to disable).
+ * @param zetaK the parameter used for residual &zeta;-coding, if used (-1 for the default value).
+ * @param flags the flag mask.
+ * @param numberOfThreads the number of threads to use; if 0 or negative, it will be replaced by {@link Runtime#availableProcessors()}. Note that if
+ * {@link ImmutableGraph#numNodes()} is not implemented by {@code graph}, the number of threads will be automatically set to one, possibly logging a warning.
+ * @param pl a progress logger to log the state of compression, or <code>null</code> if no logging is required.
+ * @throws IOException if some exception is raised while writing the graph.
+ */
+ public static void store(ImmutableGraph graph, CharSequence basename, int windowSize, int maxRefCount, int minIntervalLength,
+ int zetaK, int flags, final int numberOfThreads, ProgressLogger pl) throws IOException {
+ BVGraph g = new BVGraph();
+ if (windowSize != -1) g.windowSize = windowSize;
+ if (maxRefCount != -1) g.maxRefCount = maxRefCount;
+ if (minIntervalLength != -1) g.minIntervalLength = minIntervalLength;
+ if (zetaK != -1) g.zetaK = zetaK;
+ g.setFlags(flags);
+ g.storeInternal(graph, basename, numberOfThreads, pl);
+ }
+
+ /** Writes the given graph using a given base name.
+ *
+ * @param graph a graph to be compressed.
+ * @param basename a base name.
+ * @param windowSize the window size (-1 for the default value).
+ * @param maxRefCount the maximum reference count (-1 for the default value).
+ * @param minIntervalLength the minimum interval length (-1 for the default value, {@link #NO_INTERVALS} to disable).
+ * @param zetaK the parameter used for residual &zeta;-coding, if used (-1 for the default value).
+ * @param flags the flag mask.
+ * @param pl a progress logger to log the state of compression, or <code>null</code> if no logging is required.
+ * @throws IOException if some exception is raised while writing the graph.
+ */
+ public static void store(ImmutableGraph graph, CharSequence basename, int windowSize, int maxRefCount, int minIntervalLength,
+ int zetaK, int flags, ProgressLogger pl) throws IOException {
+ BVGraph.store(graph, basename, windowSize, maxRefCount, minIntervalLength, zetaK, flags, 0, pl);
+ }
+
+ /** Writes the given graph using a given base name, without any progress logger.
+ *
+ * @param graph a graph to be compressed.
+ * @param basename a base name.
+ * @param windowSize the window size (-1 for the default value).
+ * @param maxRefCount the maximum reference count (-1 for the default value).
+ * @param minIntervalLength the minimum interval length (-1 for the default value, {@link #NO_INTERVALS} to disable).
+ * @param zetaK the parameter used for residual &zeta;-coding, if used (-1 for the default value).
+ * @param flags the flag mask.
+ * @throws IOException if some exception is raised while writing the graph.
+ */
+ public static void store(ImmutableGraph graph, CharSequence basename, int windowSize, int maxRefCount, int minIntervalLength,
+ int zetaK, int flags) throws IOException {
+ BVGraph.store(graph, basename, windowSize, maxRefCount, minIntervalLength, zetaK, flags, 0, (ProgressLogger)null);
+ }
+
+ /** Writes the given graph using a given base name, without any progress logger.
+ *
+ * @param graph a graph to be compressed.
+ * @param basename a base name.
+ * @param windowSize the window size (-1 for the default value).
+ * @param maxRefCount the maximum reference count (-1 for the default value).
+ * @param minIntervalLength the minimum interval length (-1 for the default value, {@link #NO_INTERVALS} to disable).
+ * @param zetaK the parameter used for residual &zeta;-coding, if used (-1 for the default value).
+ * @param flags the flag mask.
+ * @param numberOfThreads the number of threads to use; if 0 or negative, it will be replaced by {@link Runtime#availableProcessors()}. Note that if
+ * {@link ImmutableGraph#numNodes()} is not implemented by {@code graph}, the number of threads will be automatically set to one, possibly logging a warning.
+ * @throws IOException if some exception is raised while writing the graph.
+ */
+ public static void store(ImmutableGraph graph, CharSequence basename, int windowSize, int maxRefCount, int minIntervalLength,
+ int zetaK, int flags, int numberOfThreads) throws IOException {
+ BVGraph.store(graph, basename, windowSize, maxRefCount, minIntervalLength, zetaK, flags, numberOfThreads, (ProgressLogger)null);
+ }
+
+ /** Writes the given graph using a given base name, with all
+ * parameters set to their default values.
+ *
+ * @param graph a graph to be compressed.
+ * @param basename a base name.
+ * @param pl a progress logger to log the state of compression, or <code>null</code> if no logging is required.
+ * @throws IOException if some exception is raised while writing the graph.
+ */
+ public static void store(ImmutableGraph graph, CharSequence basename, ProgressLogger pl) throws IOException {
+ BVGraph.store(graph, basename, -1, -1, -1, -1, 0, 0, pl);
+ }
+
+ /** Writes the given graph using a given base name, with all
+ * parameters set to their default values.
+ *
+ * @param graph a graph to be compressed.
+ * @param basename a base name.
+ * @param numberOfThreads the number of threads to use; if 0 or negative, it will be replaced by {@link Runtime#availableProcessors()}. Note that if
+ * {@link ImmutableGraph#numNodes()} is not implemented by {@code graph}, the number of threads will be automatically set to one, possibly logging a warning.
+ * @param pl a progress logger to log the state of compression, or <code>null</code> if no logging is required.
+ * @throws IOException if some exception is raised while writing the graph.
+ */
+ public static void store(ImmutableGraph graph, CharSequence basename, final int numberOfThreads, ProgressLogger pl) throws IOException {
+ BVGraph.store(graph, basename, -1, -1, -1, -1, 0, numberOfThreads, pl);
+ }
+
+
+ /** Writes the given graph using a given base name, without any progress logger and with all
+ * parameters set to their default values.
+ *
+ * @param graph a graph to be compressed.
+ * @param basename a base name.
+ * @throws IOException if some exception is raised while writing the graph.
+ */
+ public static void store(ImmutableGraph graph, CharSequence basename) throws IOException {
+ BVGraph.store(graph, basename, (ProgressLogger)null);
+ }
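+
+ /* Summary (editorial sketch derived from the overloads above): store(graph, basename) is equivalent to
+ * store(graph, basename, -1, -1, -1, -1, 0, 0, null). This means the default window size, maximum reference count,
+ * minimum interval length and zeta parameter; a flag mask of 0, which leaves all codings at their defaults because
+ * setFlags() changes nothing for empty slots; one thread per available processor; and no progress logger.
+ */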
+
+
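+ /** Updates a list of exponential bins using the gaps of a given list of strictly increasing integers. The first element
+ * is binned by its gap from <code>currNode</code>, mapped to a natural number by {@link Fast#int2nat(long)}.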
+ * @param currNode the current node.
+ * @param list a strictly increasing list of integers.
+ * @param length the number of valid elements in <code>list</code>.
+ * @param bin the bins.
+ */
+ protected static void updateBins(final int currNode, final int[] list, final int length, final long[] bin) {
+ for(int i = length - 1; i-- != 0;) bin[Fast.mostSignificantBit(list[i + 1] - list[i])]++;
+ final int msb = Fast.mostSignificantBit(Fast.int2nat((long)list[0] - currNode));
+ if (msb >= 0) bin[msb]++;
+ }
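+
+ /* Worked example (editorial sketch; the values are hypothetical). For currNode = 4 and list = {6, 7, 15}
+ * (length = 3), the loop visits the gaps 15 - 7 = 8 and 7 - 6 = 1, incrementing bin[3] and bin[0].
+ * The first element is then binned through Fast.int2nat(6 - 4) = 4, incrementing bin[2].
+ * The msb >= 0 test skips the case list[0] == currNode, for which int2nat() yields 0 and
+ * Fast.mostSignificantBit() returns -1.
+ */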
+
+ @SuppressWarnings("unused")
+ private final class CompressionThread implements Callable<Void> {
+ public final CharSequence threadBasename;
+ private final ProgressLogger pl;
+ public final NodeIterator nodeIterator;
+
+ /** Statistics for the gap width of successor lists (exponentially binned). */
+ public long[] successorGapStats;
+ /** Statistics for the gap width of residuals (exponentially binned). */
+ public long[] residualGapStats;
+ /** Bits used for outdegrees. */
+ public long bitsForOutdegrees;
+ /** Bits used to write backward references. */
+ public long bitsForReferences;
+ /** Bits used to write inclusion-exclusion blocks. */
+ public long bitsForBlocks;
+ /** Bits used to write residuals. */
+ public long bitsForResiduals;
+ /** Bits used to write intervals. */
+ public long bitsForIntervals;
+ /** The number of arcs copied. */
+ public long copiedArcs;
+ /** The number of arcs that have been intervalised. */
+ public long intervalisedArcs;
+ /** The number of arcs that are represented explicitly. */
+ public long residualArcs;
+
+ public long totRef = 0, totDist = 0, totLinks = 0;
+ private final int index;
+ private final int numNodes;
+
+
+ private CompressionThread(final int index, final int numNodes, final NodeIterator nodeIterator, final CharSequence basename, final int bufferSize, final ProgressLogger pl) {
+ this.index = index;
+ this.numNodes = numNodes;
+ this.nodeIterator = nodeIterator;
+ this.bufferSize = bufferSize;
+ this.threadBasename = basename;
+ this.pl = pl;
+ }
+
+ /** Scratch variables used by the {@link #diffComp(OutputBitStream, int, int, int[], int, int[], int, boolean)} method. */
+ private final IntArrayList extras = new IntArrayList(), blocks = new IntArrayList(), residuals = new IntArrayList(),
+ left = new IntArrayList(), len = new IntArrayList();
+ private OutputBitStream graphObs;
+ public long graphWrittenBits;
+ public long offsetsWrittenBits;
+ private final int bufferSize;
+
+ /** Differentially compresses the given list. This method is given a node (with index <code>currNode</code>), called the
+ * current node, with its successor list (contained in the array <code>currList[0..currLen-1]</code>), and another node
+ * (with index <code>currNode</code>&minus;<code>ref</code>), called the reference node, with its successor list (contained in the array
+ * <code>refList[0..refLen-1]</code>). This method produces, onto the given output bit stream, the compressed successor
+ * list of the current node using the reference node given (except for the outdegree); the number of bits written is returned.
+ *
+ * Note that <code>ref</code> may be zero, in which case no differential compression is made.
+ *
+ * @param obs an output bit stream where the compressed data will be stored.
+ * @param currNode the index of the node this list of outlinks refers to.
+ * @param ref the distance from the reference list.
+ * @param refList the reference list.
+ * @param refLen the length of the reference list.
+ * @param currList the current list.
+ * @param currLen the current list length.
+ * @param forReal if true, we are really writing data (i.e., <code>obs</code> is not just a bit count stream).
+ * @return the number of bits written.
+ */
+ private int diffComp(final OutputBitStream obs, final int currNode, final int ref, final int refList[], int refLen, final int currList[], final int currLen, final boolean forReal) throws IOException {
+ // Bits already written onto the output bit stream
+ final long writtenBitsAtStart = obs.writtenBits();
+
+ // We build the list of blocks copied and skipped (alternately) from the reference list.
+ int i, j = 0, k = 0, prev = 0, currBlockLen = 0, t;
+ boolean copying = true;
+
+ if (ref == 0) refLen = 0; // This guarantees that we will not try to differentially compress when ref == 0.
+
+ extras.clear();
+ blocks.clear();
+
+ // j is the index of the next successor of the current node we must examine
+ // k is the index of the next successor of the reference node we must examine
+ // copying is true iff we are producing a copy block (instead of an ignore block)
+ // currBlockLen is the number of entries (in the reference list) we have already copied/ignored (in the current block)
+ while(j < currLen && k < refLen) {
+ if (copying) { // First case: we are currently copying entries from the reference list
+ if (currList[j] > refList[k]) {
+ /* If while copying we trespass the current element of the reference list,
+ we must stop copying. */
+ blocks.add(currBlockLen);
+ copying = false;
+ currBlockLen = 0;
+ }
+ else if (currList[j] < refList[k]) {
+ /* If while copying we find a non-matching element of the reference list which
+ is larger than us, we can just add the current element to the extra list
+ and move on. j gets increased. */
+ extras.add(currList[j++]);
+ }
+ else { // currList[j] == refList[k]
+ /* If the current elements of the two lists are equal, we just increase the block length.
+ Both j and k get increased. */
+ j++;
+ k++;
+ currBlockLen++;
+ if (forReal) copiedArcs++;
+ }
+ }
+ else { // Second case: we are currently skipping entries from the reference list
+ if (currList[j] < refList[k]) {
+ /* If we did not trespass the current element of the reference list, we just
+ add the current element to the extra list and move on. j gets increased. */
+ extras.add(currList[j++]);
+ }
+ else if (currList[j] > refList[k]) {
+ /* If we trespassed the current element of the reference list, we
+ increase the block length. k gets increased. */
+ k++;
+ currBlockLen++;
+ }
+ else { // currList[j] == refList[k]
+ /* If we found a match we flush the current block and start a new copying phase. */
+ blocks.add(currBlockLen);
+ copying = true;
+ currBlockLen = 0;
+ }
+ }
+ }
+
+ /* We do not record the last block. The only case when we have to enqueue the last block's length
+ * is when we were copying and we did not copy up to the end of the reference list.
+ */
+ if (copying && k < refLen) blocks.add(currBlockLen);
+
+ // If there are still missing elements, we add them to the extra list.
+ while(j < currLen) extras.add(currList[j++]);
+
+ // We store locally the resulting arrays for faster access.
+ final int block[] = blocks.elements(), blockCount = blocks.size(), extraCount = extras.size();
+
+ // If we have a nontrivial reference window we write the reference to the reference list.
+ if (windowSize > 0) {
+ t = writeReference(obs, ref);
+ if (forReal) bitsForReferences += t;
+ }
+
+ if (STATS) if (forReal) referenceStats.println(ref);
+
+ // Then, if the reference is not void, we write the number of blocks in the copy list.
+ if (ref != 0) {
+ t = writeBlockCount(obs, blockCount);
+ if (forReal) bitsForBlocks += t;
+
+ if (STATS) if (forReal) blockCountStats.println(blockCount);
+
+ // Then, we write the copy list; all lengths except the first one are decremented.
+ if (blockCount > 0) {
+ t = writeBlock(obs, block[0]);
+ if (forReal) bitsForBlocks += t;
+ for(i = 1; i < blockCount; i++) {
+ t = writeBlock(obs, block[i] - 1);
+ if (forReal) bitsForBlocks += t;
+ }
+
+ if (STATS) if (forReal) {
+ blockStats.println(block[0]);
+ for(i = 1; i < blockCount; i++) blockStats.println(block[i] - 1);
+ }
+ }
+ }
+
+ // Finally, we write the extra list.
+ if (extraCount > 0) {
+
+ final int residual[], residualCount;
+
+ if (minIntervalLength != NO_INTERVALS) {
+ // If we are to produce intervals, we first compute them.
+ final int intervalCount = intervalize(extras, minIntervalLength, left, len, residuals);
+
+ // We write the number of intervals.
+ t = obs.writeGamma(intervalCount);
+ if (forReal) bitsForIntervals += t;
+
+ if (STATS) if (forReal) intervalCountStats.println(intervalCount);
+
+ int currIntLen;
+
+ // We write out the intervals.
+ for(i = 0; i < intervalCount; i++) {
+ if (i == 0) t = obs.writeLongGamma(Fast.int2nat((long)(prev = left.getInt(i)) - currNode));
+ else t = obs.writeGamma(left.getInt(i) - prev - 1);
+ if (forReal) bitsForIntervals += t;
+ currIntLen = len.getInt(i);
+ prev = left.getInt(i) + currIntLen;
+ if (forReal) intervalisedArcs += currIntLen;
+ t = obs.writeGamma(currIntLen - minIntervalLength);
+ if (forReal) bitsForIntervals += t;
+ }
+
+ if (STATS) if (forReal) for(i = 0; i < intervalCount; i++) {
+ if (i == 0) leftStats.println(Fast.int2nat((long)(prev = left.getInt(i)) - currNode));
+ else leftStats.println(left.getInt(i) - prev - 1);
+ prev = left.getInt(i) + len.getInt(i);
+ lenStats.println(len.getInt(i) - minIntervalLength);
+ }
+
+
+ residual = residuals.elements();
+ residualCount = residuals.size();
+ }
+ else {
+ residual = extras.elements();
+ residualCount = extras.size();
+ }
+
+ if (STATS) if (forReal) residualCountStats.println(residualCount);
+
+ // Now we write out the residuals, if any
+ if (residualCount != 0) {
+ if (forReal) {
+ residualArcs += residualCount;
+ updateBins(currNode, residual, residualCount, residualGapStats);
+ }
+ t = writeResidual(obs, Fast.int2nat((long)(prev = residual[0]) - currNode));
+ if (forReal) bitsForResiduals += t;
+ for(i = 1; i < residualCount; i++) {
+ if (residual[i] == prev) throw new IllegalArgumentException("Repeated successor " + prev + " in successor list of node " + currNode);
+ t = writeResidual(obs, residual[i] - prev - 1);
+ if (forReal) bitsForResiduals += t;
+ prev = residual[i];
+ }
+
+ if (STATS) if (forReal) {
+ residualStats.println(Fast.int2nat((long)(prev = residual[0]) - currNode));
+ for(i = 1; i < residualCount; i++) {
+ residualStats.println(residual[i] - prev - 1);
+ prev = residual[i];
+ }
+ }
+ }
+
+ }
+
+ return (int)(obs.writtenBits() - writtenBitsAtStart);
+ }
+
+ @Override
+ public Void call() throws Exception {
+ if (nodeIterator == null) return null;
+ // Used for differential compression
+ final OutputBitStream bitCount = new OutputBitStream(NullOutputStream.getInstance(), 0);
+ int outd, currIndex, j, bestIndex, cand;
+ long best, t, bitOffset = 0;
+
+ graphObs = new OutputBitStream(new FileOutputStream(threadBasename + BVGraph.GRAPH_EXTENSION), bufferSize);
+ OutputBitStream offsetObs = new OutputBitStream(new FileOutputStream(threadBasename + BVGraph.OFFSETS_EXTENSION), bufferSize);
+
+ if (STATS) {
+ offsetStats = new PrintWriter(new FileWriter(threadBasename + ".offsetStats"));
+ referenceStats = new PrintWriter(new FileWriter(threadBasename + ".referenceStats"));
+ outdegreeStats = new PrintWriter(new FileWriter(threadBasename + ".outdegreeStats"));
+ blockCountStats = new PrintWriter(new FileWriter(threadBasename + ".blockCountStats"));
+ blockStats = new PrintWriter(new FileWriter(threadBasename + ".blockStats"));
+ intervalCountStats = new PrintWriter(new FileWriter(threadBasename + ".intervalCountStats"));
+ leftStats = new PrintWriter(new FileWriter(threadBasename + ".leftStats"));
+ lenStats = new PrintWriter(new FileWriter(threadBasename + ".lenStats"));
+ residualCountStats = new PrintWriter(new FileWriter(threadBasename + ".residualCountStats"));
+ residualStats = new PrintWriter(new FileWriter(threadBasename + ".residualStats"));
+ }
+
+ final int cyclicBufferSize = windowSize + 1;
+ // Cyclic array of previous lists.
+ int list[][] = new int[cyclicBufferSize][INITIAL_SUCCESSOR_LIST_LENGTH];
+ // For each list, its length.
+ int listLen[] = new int[cyclicBufferSize];
+ // For each list, the depth of its references.
+ int refCount[] = new int[cyclicBufferSize];
+ successorGapStats = new long[32];
+ residualGapStats = new long[32];
+
+ nodeIterator.hasNext();
+
+ if (pl != null && index == 0) { // Only the first thread starts the logger
+ pl.itemsName = "nodes";
+ try {
+ pl.expectedUpdates = numNodes;
+ }
+ catch(UnsupportedOperationException ignore) {}
+ pl.start("Storing...");
+ }
+
+ // We iterate over the nodes of the graph
+ int updates = 0;
+ while(nodeIterator.hasNext()) {
+ // currNode is the currently examined node, of outdegree outd, with index currIndex (within the cyclic array)
+ final int currNode = nodeIterator.nextInt();
+ outd = nodeIterator.outdegree();// get the number of successors of currNode
+ currIndex = currNode % cyclicBufferSize;
+
+ // We write the current offset to the offset stream
+ writeOffset(offsetObs, graphObs.writtenBits() - bitOffset);
+
+ if (STATS) offsetStats.println(graphObs.writtenBits() - bitOffset);
+
+ bitOffset = graphObs.writtenBits();
+
+ // We write the node outdegree
+ bitsForOutdegrees += writeOutdegree(graphObs, outd);
+
+ if (STATS) outdegreeStats.println(outd);
+
+ if (outd > list[currIndex].length) list[currIndex] = IntArrays.ensureCapacity(list[currIndex], outd);
+
+ // The successor list we are going to compress and write out
+ System.arraycopy(nodeIterator.successorArray(), 0, list[currIndex], 0, outd);
+ listLen[currIndex] = outd;
+
+ if (outd > 0) {
+ updateBins(currNode, list[currIndex], outd, successorGapStats);
+ try {
+ // Now we check the best candidate for compression.
+ best = Integer.MAX_VALUE;
+ bestIndex = -1;
+
+ refCount[currIndex] = -1;
+
+ for(j = 0; j < cyclicBufferSize; j++) {
+ cand = (currNode - j + cyclicBufferSize) % cyclicBufferSize;
+ if (refCount[cand] < maxRefCount && listLen[cand] != 0
+ && (t = diffComp(bitCount, currNode, j, list[cand], listLen[cand], list[currIndex], listLen[currIndex], false)) < best) {
+ best = t;
+ bestIndex = cand;
+ }
+ }
+
+ if (ASSERTS) assert bestIndex >= 0;
+ refCount[currIndex] = refCount[bestIndex] + 1;
+ diffComp(graphObs, currNode, (currNode - bestIndex + cyclicBufferSize) % cyclicBufferSize, list[bestIndex],
+ listLen[bestIndex], list[currIndex], listLen[currIndex], true);
+
+ totLinks += outd;
+ totRef += refCount[currIndex];
+ totDist += (currNode - bestIndex + cyclicBufferSize) % cyclicBufferSize;
+ } catch (RuntimeException e) {
+ LOGGER.debug("An exception occurred while storing node " + currNode + " with outlinks " + Arrays.toString(Arrays.copyOfRange(nodeIterator.successorArray(), 0, nodeIterator.outdegree())));
+ throw e;
+ }
+ }
+
+
+ if (pl != null && ((currNode + 1) & ((1 << 20) - 1)) == 0) pl.logger().info(new Formatter(Locale.ROOT).format(
+ "bits/link: %.3f; bits/node: %.3f; avgref: %.3f; avgdist: %.3f.",
+ Double.valueOf((double)graphObs.writtenBits() / (totLinks != 0 ? totLinks : 1)),
+ Double.valueOf((double)graphObs.writtenBits() / currNode),
+ Double.valueOf((double)totRef / currNode),
+ Double.valueOf((double)totDist / currNode)
+ ).toString()
+ );
+
+ if (pl != null && (++updates & 0xFFFF) == 0) {
+ synchronized (pl) { pl.update(updates); }
+ updates = 0;
+ }
+ }
+
+ if (pl != null) synchronized (pl) { pl.update(updates); }
+
+ // TODO: find a way to reintroduce this check
+ //if (currNode != end && ! (end == Integer.MAX_VALUE && currNode == graph.numNodes())) throw new IllegalStateException("The graph claimed to have " + end + " nodes, but the node iterator returned " + currNode);
+
+ // We write the final offset to the offset stream.
+ writeOffset(offsetObs, graphObs.writtenBits() - bitOffset);
+
+ graphWrittenBits = graphObs.writtenBits();
+ offsetsWrittenBits = offsetObs.writtenBits();
+ graphObs.close();
+ offsetObs.close();
+
+ return null;
+ }
+ }
+
+ private static final long aggregateLong(CompressionThread[] compressionThread, final String fieldName) {
+ long v = 0;
+ for(CompressionThread t: compressionThread)
+ try {
+ if (t.nodeIterator == null) continue;
+ v += CompressionThread.class.getField(fieldName).getLong(t);
+ }
+ catch (Exception e) {
+ throw new RuntimeException(e.getMessage(), e);
+ }
+ return v;
+ }
+
+ private static final long[] aggregateStats(CompressionThread[] compressionThread, final String fieldName) {
+ final long[] stats = new long[32];
+
+ for(CompressionThread t: compressionThread)
+ try {
+ if (t.nodeIterator == null) continue;
+ final long[] s = (long[])CompressionThread.class.getField(fieldName).get(t);
+ for(int i = stats.length; i-- != 0;) stats[i] += s[i];
+ }
+ catch (Exception e) {
+ throw new RuntimeException(e.getMessage(), e);
+ }
+ return stats;
+ }
+
+ /** Writes the given graph <code>graph</code> using a given base name and the compression parameters and flags
+ * of this graph object. Note that this object is relevant only as far as parameters and flags are concerned; its
+ * content is irrelevant.
+ *
+ * @param graph a graph to be compressed.
+ * @param basename a base name.
+ * @param numberOfThreads the number of threads to use; if 0 or negative, it will be replaced by {@link Runtime#availableProcessors()}. Note that if
+ * {@link ImmutableGraph#numNodes()} is not implemented, the number of threads will be automatically set to one, possibly logging a warning.
+ * @param pl a progress logger to measure the state of compression, or <code>null</code> if no logging is required.
+ * @throws IOException if some exception is raised while writing the graph.
+ */
+ private void storeInternal(final ImmutableGraph graph, final CharSequence basename, int numberOfThreads, final ProgressLogger pl) throws IOException {
+ int n;
+ if (numberOfThreads <= 0) numberOfThreads = Runtime.getRuntime().availableProcessors();
+ // Set the number of threads to that given by the specific property, if defined on the command line
+ numberOfThreads = Integer.parseInt(System.getProperty(NUMBER_OF_THREADS_PROPERTY, Integer.toString(numberOfThreads)));
+ try {
+ n = graph.numNodes();
+ }
+ catch (Exception e) {
+ n = -1;
+ }
+
+ if (numberOfThreads > 1 && (n == -1 || ! graph.hasCopiableIterators())) {
+ if (n == -1) LOGGER.warn("Number of nodes not available: using just one thread");
+ else LOGGER.warn("The source graph does not provide copiable iterators: using just one thread");
+ numberOfThreads = 1;
+ }
+
+ final ExecutorService executorService = Executors.newFixedThreadPool(numberOfThreads, new ThreadFactoryBuilder().setNameFormat("ProcessingThread-%d").build());
+ final ExecutorCompletionService<Void> executorCompletionService = new ExecutorCompletionService<>(executorService);
+
+ final CompressionThread[] compressionThread = new CompressionThread[numberOfThreads];
+
+ if (numberOfThreads == 1) {
+ executorCompletionService.submit(compressionThread[0] = new CompressionThread(0, n, graph.nodeIterator(), basename, STD_BUFFER_SIZE, pl));
+ }
+ else {
+ NodeIterator[] splitNodeIterators = graph.splitNodeIterators(numberOfThreads);
+ for(int i = numberOfThreads; i-- != 0;) {
+ File tempFile = File.createTempFile(BVGraph.class.getSimpleName(), "-tmp.graph");
+ tempFile.deleteOnExit();
+ executorCompletionService.submit(compressionThread[i] = new CompressionThread(i, n, splitNodeIterators[i], tempFile.toString(), MULTITHREAD_BUFFER_SIZE, pl));
+ }
+ }
+
+ Throwable problem = null;
+ for(int i = numberOfThreads; i-- != 0;)
+ try {
+ executorCompletionService.take().get();
+ }
+ catch(Exception e) {
+ problem = e.getCause(); // We keep only the last one. They will be logged anyway.
+ }
+
+ executorService.shutdown();
+ if (problem != null) {
+ Throwables.throwIfUnchecked(problem);
+ throw new RuntimeException(problem);
+ }
+
+ if (pl != null) pl.done();
+
+
+ if (numberOfThreads > 1) {
+ if (pl != null) pl.logger().info("Copying streams...");
+ OutputBitStream graphObs = new OutputBitStream(basename + GRAPH_EXTENSION, STD_BUFFER_SIZE);
+ OutputBitStream offsetsObs = new OutputBitStream(basename + OFFSETS_EXTENSION, STD_BUFFER_SIZE);
+ writeOffset(offsetsObs, 0);
+ for(CompressionThread t : compressionThread) {
+ if (t.nodeIterator == null) continue;
+ final File graphFile = new File(t.threadBasename.toString() + GRAPH_EXTENSION);
+ final File offsetFile = new File(t.threadBasename.toString() + OFFSETS_EXTENSION);
+ final InputBitStream graphIbs = new InputBitStream(graphFile);
+ final InputBitStream offsetsIbs = new InputBitStream(offsetFile);
+ readOffset(offsetsIbs); // Discard first zero
+ copy(graphIbs, graphObs, t.graphWrittenBits);
+ copy(offsetsIbs, offsetsObs, t.offsetsWrittenBits - offsetsIbs.position());
+ graphIbs.close();
+ offsetsIbs.close();
+ graphFile.delete();
+ offsetFile.delete();
+ }
+
+ graphObs.close();
+ offsetsObs.close();
+ if (pl != null) pl.logger().info("Copy completed.");
+ }
+
+ final DecimalFormat format = ((DecimalFormat)NumberFormat.getInstance(Locale.US));
+ format.applyPattern("0.###");
+
+ // Finally, we save all data related to this graph in a property file.
+ final Properties properties = new Properties();
+ n = graph.numNodes(); // At this point this *must* work (see ArcListASCIIGraph)
+ properties.setProperty("nodes", String.valueOf(n));
+ final long totLinks = aggregateLong(compressionThread, "totLinks");
+ properties.setProperty("arcs", String.valueOf(totLinks));
+ properties.setProperty("windowsize", String.valueOf(windowSize));
+ properties.setProperty("maxrefcount", String.valueOf(maxRefCount));
+ properties.setProperty("minintervallength", String.valueOf(minIntervalLength));
+ if (residualCoding == ZETA) properties.setProperty("zetak", String.valueOf(zetaK));
+ properties.setProperty("compressionflags", flags2String(flags).toString());
+ properties.setProperty("avgref", format.format((double)aggregateLong(compressionThread, "totRef") / n));
+ properties.setProperty("avgdist", format.format((double) aggregateLong(compressionThread, "totDist") / n));
+ properties.setProperty("copiedarcs", String.valueOf(aggregateLong(compressionThread, "copiedArcs")));
+ properties.setProperty("intervalisedarcs", String.valueOf(aggregateLong(compressionThread, "intervalisedArcs")));
+ properties.setProperty("residualarcs", String.valueOf(aggregateLong(compressionThread, "residualArcs")));
+ final long writtenBits = aggregateLong(compressionThread, "graphWrittenBits");
+ properties.setProperty("bitsperlink", format.format((double)writtenBits / totLinks));
+ properties.setProperty("compratio", format.format(writtenBits * Math.log(2) / (stirling((double)n * n) - stirling(totLinks) - stirling((double)n * n - totLinks))));
+ properties.setProperty("bitspernode", format.format((double)writtenBits / n));
+ properties.setProperty("avgbitsforoutdegrees", format.format((double)aggregateLong(compressionThread, "bitsForOutdegrees") / n));
+ properties.setProperty("avgbitsforreferences", format.format((double)aggregateLong(compressionThread, "bitsForReferences") / n));
+ properties.setProperty("avgbitsforblocks", format.format((double)aggregateLong(compressionThread, "bitsForBlocks") / n));
+ properties.setProperty("avgbitsforresiduals", format.format((double)aggregateLong(compressionThread, "bitsForResiduals") / n));
+ properties.setProperty("avgbitsforintervals", format.format((double)aggregateLong(compressionThread, "bitsForIntervals") / n));
+ properties.setProperty("bitsforoutdegrees", Long.toString(aggregateLong(compressionThread, "bitsForOutdegrees")));
+ properties.setProperty("bitsforreferences", Long.toString(aggregateLong(compressionThread, "bitsForReferences")));
+ properties.setProperty("bitsforblocks", Long.toString(aggregateLong(compressionThread, "bitsForBlocks")));
+ properties.setProperty("bitsforresiduals", Long.toString(aggregateLong(compressionThread, "bitsForResiduals")));
+ properties.setProperty("bitsforintervals", Long.toString(aggregateLong(compressionThread, "bitsForIntervals")));
+ properties.setProperty(ImmutableGraph.GRAPHCLASS_PROPERTY_KEY, this.getClass().getName());
+ properties.setProperty("version", String.valueOf(BVGRAPH_VERSION));
+ final FileOutputStream propertyFile = new FileOutputStream(basename + PROPERTIES_EXTENSION);
+ // Binned data
+ int l;
+ final long[] successorGapStats = aggregateStats(compressionThread, "successorGapStats");
+ for(l = successorGapStats.length; l-- != 0;) if (successorGapStats[l] != 0) break;
+ StringBuilder s = new StringBuilder();
+ BigInteger totGap = BigInteger.ZERO;
+ double totLogGap = 0;
+ long numGaps = 0;
+
+ long g = 1;
+ for(int i = 0; i <= l; i++) {
+ if (i != 0) s.append(',');
+ s.append(successorGapStats[i]);
+ numGaps += successorGapStats[i];
+ totGap = totGap.add(BigInteger.valueOf(g * 2 + g - 1).multiply(BigInteger.valueOf(successorGapStats[i])));
+ totLogGap += (Fast.log2(g * 2 + g + 1) - 1) * successorGapStats[i];
+ g *= 2;
+ }
+
+ properties.setProperty("successorexpstats", s.toString());
+ properties.setProperty("successoravggap", numGaps == 0 ? "0" : new BigDecimal(totGap).divide(BigDecimal.valueOf(numGaps * 2), 3, RoundingMode.HALF_EVEN).toString());
+ properties.setProperty("successoravgloggap", numGaps == 0 ? "0" : Double.toString(totLogGap / numGaps));
+
+ s.setLength(0);
+
+ final long[] residualGapStats = aggregateStats(compressionThread, "residualGapStats");
+ for(l = residualGapStats.length; l-- != 0;) if (residualGapStats[l] != 0) break;
+ g = 1;
+ numGaps = 0;
+ totLogGap = 0;
+ totGap = BigInteger.ZERO;
+ for(int i = 0; i <= l; i++) {
+ if (i != 0) s.append(',');
+ s.append(residualGapStats[i]);
+ totGap = totGap.add(BigInteger.valueOf(g * 2 + g - 1).multiply(BigInteger.valueOf(residualGapStats[i])));
+ totLogGap += (Fast.log2(g * 2 + g + 1) - 1) * residualGapStats[i];
+ numGaps += residualGapStats[i];
+ g *= 2;
+ }
+
+ properties.setProperty("residualexpstats", s.toString());
+ properties.setProperty("residualavggap", numGaps == 0 ? "0" : new BigDecimal(totGap).divide(BigDecimal.valueOf(numGaps * 2), 3, RoundingMode.HALF_EVEN).toString());
+ properties.setProperty("residualavgloggap", numGaps == 0 ? "0" : Double.toString(totLogGap / numGaps));
+
+ properties.store(propertyFile, "BVGraph properties");
+
+ propertyFile.close();
+
+ if (STATS) {
+ offsetStats.close();
+ referenceStats.close();
+ outdegreeStats.close();
+ blockCountStats.close();
+ blockStats.close();
+ intervalCountStats.close();
+ leftStats.close();
+ lenStats.close();
+ residualCountStats.close();
+ residualStats.close();
+ }
+ }
+
+ // TODO: remove when the new DSI utilities are out
+ private final static void copy(final InputBitStream ibs, final OutputBitStream obs, long length) throws IOException {
+ final byte[] buffer = new byte[64 * 1024];
+ while(length > 0) {
+ final int toRead = (int)Math.min(length, buffer.length * Byte.SIZE);
+ ibs.read(buffer, toRead);
+ obs.write(buffer, 0, toRead);
+ length -= toRead;
+ }
+ }
+
+ private static double stirling(double n) {
+ return n * Math.log(n) - n + (1./2) * Math.log(2 * Math.PI * n) ;
+ }
+
+ /** Writes the offsets of this graph to the given bit stream.
+ * @param obs the output bit stream to which offsets will be written.
+ * @param pl a progress logger, or <code>null</code>.
+ */
+ public void writeOffsets(final OutputBitStream obs, final ProgressLogger pl) throws IOException {
+ final BVGraphNodeIterator nodeIterator = (BVGraphNodeIterator) nodeIterator(0);
+ int n = numNodes();
+ long lastOffset = 0;
+ while(n-- != 0) {
+ // We fetch the current position of the underlying input bit stream, which is at the start of the next node.
+ writeOffset(obs, nodeIterator.ibs.readBits() - lastOffset);
+ lastOffset = nodeIterator.ibs.readBits();
+ nodeIterator.nextInt();
+ nodeIterator.outdegree();
+ nodeIterator.successorArray();
+ if (pl != null) pl.update();
+ }
+ writeOffset(obs, nodeIterator.ibs.readBits() - lastOffset);
+ }
+
+
+ /** Reads an immutable graph and stores it as a {@link BVGraph}. */
+ public static void main(String args[]) throws SecurityException, IllegalAccessException, InvocationTargetException, NoSuchMethodException, IOException, JSAPException, ClassNotFoundException, InstantiationException {
+ String source, dest;
+ Class<?> graphClass;
+ int flags = 0;
+
+ SimpleJSAP jsap = new SimpleJSAP(BVGraph.class.getName(), "Differentially compresses a graph. Source and destination are basenames from which suitable filenames will be stemmed; alternatively, if the suitable option was specified, source is a spec (see below). For more information about the compression techniques, see the Javadoc documentation.",
+ new Parameter[] {
+ new FlaggedOption("comp", JSAP.STRING_PARSER, null, JSAP.NOT_REQUIRED, 'c', "comp", "A compression flag (may be specified several times).").setAllowMultipleDeclarations(true),
+ new FlaggedOption("windowSize", JSAP.INTEGER_PARSER, String.valueOf(DEFAULT_WINDOW_SIZE), JSAP.NOT_REQUIRED, 'w', "window-size", "Reference window size (0 to disable)."),
+ new FlaggedOption("maxRefCount", JSAP.INTEGER_PARSER, String.valueOf(DEFAULT_MAX_REF_COUNT), JSAP.NOT_REQUIRED, 'm', "max-ref-count", "Maximum number of backward references (-1 for ∞)."),
+ new FlaggedOption("minIntervalLength", JSAP.INTEGER_PARSER, String.valueOf(DEFAULT_MIN_INTERVAL_LENGTH), JSAP.NOT_REQUIRED, 'i', "min-interval-length", "Minimum length of an interval (0 to disable)."),
+ new FlaggedOption("zetaK", JSAP.INTEGER_PARSER, String.valueOf(DEFAULT_ZETA_K), JSAP.NOT_REQUIRED, 'k', "zeta-k", "The k parameter for zeta-k codes."),
+ new FlaggedOption("graphClass", GraphClassParser.getParser(), null, JSAP.NOT_REQUIRED, 'g', "graph-class", "Forces a Java class for the source graph."),
+ new Switch("spec", 's', "spec", "The source is not a basename but rather a specification of the form <ImmutableGraphImplementation>(arg,arg,...)."),
+ new FlaggedOption("threads", JSAP.INTSIZE_PARSER, Integer.toString(Runtime.getRuntime().availableProcessors()), JSAP.NOT_REQUIRED, 't', "threads", "The number of threads."),
+ new FlaggedOption("logInterval", JSAP.LONG_PARSER, Long.toString(ProgressLogger.DEFAULT_LOG_INTERVAL), JSAP.NOT_REQUIRED, 'l', "log-interval", "The minimum time interval between activity logs in milliseconds."),
+ new Switch("offline", 'o', "offline", "No-op for backward compatibility."),
+ new Switch("once", '1', "once", "Use the read-once load method to read a graph from standard input."),
+ new Switch("offsets", 'O', "offsets", "Generates offsets for the source graph."),
+ new Switch("list", 'L', "list", "Precomputes an Elias-Fano list of offsets for the source graph."),
+ new Switch("degrees", 'd', "degrees", "Stores the outdegrees of all nodes using &gamma; coding."),
+ new UnflaggedOption("sourceBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the source graph, or a source spec if --spec was given; it is immaterial when --once is specified."),
+ new UnflaggedOption("destBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, JSAP.NOT_GREEDY, "The basename of the destination graph; if omitted, no recompression is performed. This is useful in conjunction with --offsets and --list."),
+ }
+ );
+
+ JSAPResult jsapResult = jsap.parse(args);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ for(String compressionFlag: jsapResult.getStringArray("comp")) {
+ try {
+ flags |= BVGraph.class.getField(compressionFlag).getInt(BVGraph.class);
+ }
+ catch (Exception notFound) {
+ throw new JSAPException("Compression method " + compressionFlag + " unknown.");
+ }
+ }
+
+ final int windowSize = jsapResult.getInt("windowSize");
+ final int zetaK = jsapResult.getInt("zetaK");
+ int maxRefCount = jsapResult.getInt("maxRefCount");
+ if (maxRefCount == -1) maxRefCount = Integer.MAX_VALUE;
+ final int minIntervalLength = jsapResult.getInt("minIntervalLength");
+ final boolean once = jsapResult.getBoolean("once");
+ final boolean spec = jsapResult.getBoolean("spec");
+ final boolean writeOffsets = jsapResult.getBoolean("offsets");
+ final boolean list = jsapResult.getBoolean("list");
+ final boolean degrees = jsapResult.getBoolean("degrees");
+ final int numberOfThreads = jsapResult.getInt("threads");
+ graphClass = jsapResult.getClass("graphClass");
+ source = jsapResult.getString("sourceBasename");
+ dest = jsapResult.getString("destBasename");
+
+ final ImmutableGraph graph;
+ final ProgressLogger pl = new ProgressLogger(LOGGER, jsapResult.getLong("logInterval"), TimeUnit.MILLISECONDS);
+
+ if (graphClass != null) {
+ if (spec) {
+ System.err.println("Options --graph-class and --spec are incompatible");
+ System.exit(1);
+ }
+ if (once) graph = (ImmutableGraph)graphClass.getMethod(LoadMethod.ONCE.toMethod(), InputStream.class).invoke(null, System.in);
+ else graph = (ImmutableGraph)graphClass.getMethod(numberOfThreads == 1 ? LoadMethod.OFFLINE.toMethod() : LoadMethod.MAPPED.toMethod(), CharSequence.class).invoke(null, source);
+ }
+ else {
+ if (!spec) graph = once ? ImmutableGraph.loadOnce(System.in) : numberOfThreads == 1 || dest == null ? ImmutableGraph.loadOffline(source, pl) : ImmutableGraph.loadMapped(source, pl);
+ else graph = ObjectParser.fromSpec(source, ImmutableGraph.class, GraphClassParser.PACKAGE);
+ }
+
+ if (dest != null) {
+ if (writeOffsets || list || degrees) throw new IllegalArgumentException("You cannot specify a destination graph with these options");
+ BVGraph.store(graph, dest, windowSize, maxRefCount, minIntervalLength, zetaK, flags, numberOfThreads, pl);
+ }
+ else {
+ if (! (graph instanceof BVGraph)) throw new IllegalArgumentException("The source graph is not a BVGraph");
+ final BVGraph bvGraph = (BVGraph)graph;
+ if (writeOffsets) {
+ final OutputBitStream offsets = new OutputBitStream(graph.basename() + OFFSETS_EXTENSION, 64 * 1024);
+ pl.expectedUpdates = graph.numNodes();
+ pl.start("Writing offsets...");
+ ((BVGraph)graph).writeOffsets(offsets, pl);
+ offsets.close();
+ pl.count = graph.numNodes();
+ pl.done();
+ }
+ if (list) {
+ final InputBitStream offsets = new InputBitStream(graph.basename() + OFFSETS_EXTENSION);
+ BinIO.storeObject(new EliasFanoMonotoneLongBigList(graph.numNodes() + 1, new File(graph.basename() + GRAPH_EXTENSION).length() * Byte.SIZE + 1, new OffsetsLongIterator(bvGraph, offsets)), graph.basename() + OFFSETS_BIG_LIST_EXTENSION);
+ offsets.close();
+ }
+ if (degrees) {
+ final OutputBitStream outdegrees = new OutputBitStream(graph.basename() + OUTDEGREES_EXTENSION, 64 * 1024);
+ NodeIterator nodeIterator = graph.nodeIterator();
+ for(int i = graph.numNodes(); i-- != 0;) {
+ nodeIterator.nextInt();
+ outdegrees.writeGamma(nodeIterator.outdegree());
+ }
+
+ outdegrees.close();
+ }
+ }
+ }
+}
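The copy/skip loop in `diffComp` is the core of the reference-based compression. It produces alternating block lengths, starting with a copy block, plus a list of "extra" successors. The standalone sketch below isolates that step so it can be checked by hand. It is not WebGraph API: `CopyBlocksSketch`, `Split`, `split` and `rebuild` are hypothetical names, and the sketch omits the instantaneous codes, intervalisation and bit-level output.

The decoder half is a simplified assumption derived from the encoder loop. After the recorded blocks, the unrecorded remainder of the reference list is copied if and only if an even number of blocks was recorded.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical standalone sketch of the copy-block step performed by diffComp (not WebGraph API). */
public final class CopyBlocksSketch {

    /** Alternating copy/skip block lengths (first block is a copy block) and leftover successors. */
    public static final class Split {
        public final List<Integer> blocks = new ArrayList<>();
        public final List<Integer> extras = new ArrayList<>();
    }

    /** Mirrors the block/extra loop of diffComp on sorted, duplicate-free successor lists. */
    public static Split split(int[] curr, int[] ref) {
        Split s = new Split();
        int j = 0, k = 0, blockLen = 0;
        boolean copying = true;
        while (j < curr.length && k < ref.length) {
            if (copying) {
                if (curr[j] > ref[k]) { s.blocks.add(blockLen); copying = false; blockLen = 0; } // close copy block
                else if (curr[j] < ref[k]) s.extras.add(curr[j++]);                           // not in reference
                else { j++; k++; blockLen++; }                                               // match: extend copy
            } else {
                if (curr[j] < ref[k]) s.extras.add(curr[j++]);                               // not in reference
                else if (curr[j] > ref[k]) { k++; blockLen++; }                              // extend skip block
                else { s.blocks.add(blockLen); copying = true; blockLen = 0; }               // match: close skip block
            }
        }
        // The last block is recorded only if we were copying and did not reach the end of the reference.
        if (copying && k < ref.length) s.blocks.add(blockLen);
        while (j < curr.length) s.extras.add(curr[j++]);
        return s;
    }

    /** Simplified decoder: replays the blocks over ref, then merges the copied entries with the extras. */
    public static int[] rebuild(int[] ref, List<Integer> blocks, List<Integer> extras) {
        List<Integer> copied = new ArrayList<>();
        int p = 0;
        boolean copy = true;
        for (int len : blocks) {
            if (copy) for (int i = 0; i < len; i++) copied.add(ref[p + i]);
            p += len;
            copy = !copy;
        }
        // Implicit last block: copied iff an even number of blocks was recorded.
        if (copy) for (int i = p; i < ref.length; i++) copied.add(ref[i]);

        // Both lists are sorted and disjoint, so a plain merge restores the successor list.
        int[] out = new int[copied.size() + extras.size()];
        int a = 0, b = 0, o = 0;
        while (a < copied.size() && b < extras.size())
            out[o++] = copied.get(a) < extras.get(b) ? copied.get(a++) : extras.get(b++);
        while (a < copied.size()) out[o++] = copied.get(a++);
        while (b < extras.size()) out[o++] = extras.get(b++);
        return out;
    }
}
```

For example, take the current list {1, 2, 4, 5, 9} and the reference list {1, 2, 3, 5, 8}. The split yields blocks [2, 1, 1] (copy 1 and 2, skip 3, copy 5) and extras [4, 9]. The trailing 8 is dropped by the implicit skip block. Recording block lengths rather than individual matches is what makes long shared runs cheap.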
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/BuildHostMap.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/BuildHostMap.java
new file mode 100644
index 0000000..2bd562d
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/BuildHostMap.java
@@ -0,0 +1,126 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2008-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import it.unimi.dsi.fastutil.ints.IntArrays;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.io.FastBufferedOutputStream;
+import it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap;
+import it.unimi.dsi.logging.ProgressLogger;
+
+import java.io.BufferedReader;
+import java.io.DataOutputStream;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.io.PrintStream;
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.util.regex.Pattern;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Charsets;
+import com.google.common.net.InternetDomainName;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/** A class computing host-related data given a list of URLs (usually, the URLs of the nodes of a web graph).
+ * All processing is performed by the static utility method {@link #run(BufferedReader, PrintStream, DataOutputStream, DataOutputStream, boolean, ProgressLogger)}.
+ *
+ * <p><strong>Warning:</strong> this class provides a main method that saves the host list to standard output, but it
+ * does some logging, too, so be careful not to log to standard output.
+ *
+ * @author Sebastiano Vigna
+ */
+public class BuildHostMap {
+ private final static Logger LOGGER = LoggerFactory.getLogger(BuildHostMap.class);
+
+ public static final Pattern DOTTED_ADDRESS = Pattern.compile("(([0-9A-Fa-f]+[:])*[0-9A-Fa-f]+)|((((0x[0-9A-Fa-f]+)|([0-9]+))\\.)*((0x[0-9A-Fa-f]+)|([0-9]+)))");
+
+ /** This method reads URLs and writes hosts (or, possibly, top private domains), together with a map
+ * from URLs to hosts and a host count.
+ *
+ * @param br the buffered reader returning the list of URLs.
+ * @param hosts the print stream where hosts will be printed.
+ * @param mapDos the data output stream where the map from URLs to hosts will be written (one integer per URL).
+ * @param countDos the data output stream where the host counts will be written (one integer per host).
+ * @param topPrivateDomain if true, we use {@link InternetDomainName#topPrivateDomain()} to map to top private domains, rather than hosts.
+ * @param pl a progress logger, or {@code null}.
+ */
+ public static void run(final BufferedReader br, final PrintStream hosts, final DataOutputStream mapDos, final DataOutputStream countDos, final boolean topPrivateDomain, ProgressLogger pl) throws IOException, URISyntaxException {
+ Object2IntOpenHashMap<String> map = new Object2IntOpenHashMap<>();
+ int[] count = new int[1024];
+ map.defaultReturnValue(-1);
+ int hostIndex = -1;
+
+ if (pl != null) pl.start("Reading URLs...");
+ for(String s, name; (s = br.readLine()) != null;) {
+ final URI uri = new URI(s);
+ name = uri.getHost();
+ if (name == null) throw new IllegalArgumentException();
+ if (topPrivateDomain) {
+ if (! DOTTED_ADDRESS.matcher(name).matches()) {
+ final InternetDomainName idn = InternetDomainName.from(name);
+ if (idn.isUnderPublicSuffix()) name = idn.topPrivateDomain().toString();
+ }
+ }
+
+ if ((hostIndex = map.getInt(name)) == -1) {
+ hosts.println(name);
+ map.put(name, hostIndex = map.size());
+ }
+ mapDos.writeInt(hostIndex);
+ count = IntArrays.grow(count, hostIndex + 1);
+ count[hostIndex]++;
+ if (pl != null) pl.lightUpdate();
+ }
+
+ BinIO.storeInts(count, 0, map.size(), countDos);
+ if (pl != null) pl.done();
+ }
+
+
+ public static void main(String[] arg) throws IOException, JSAPException, URISyntaxException {
+
+ final SimpleJSAP jsap = new SimpleJSAP(BuildHostMap.class.getName(), "Reads a list of URLs from standard input, computes the host map and counts and saves the host list to standard output.",
+ new Parameter[] {
+ new Switch("topPrivateDomain", 't', "top-private-domain", "Use top private domains instead of hosts."),
+ new UnflaggedOption("map", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The filename where the host map will be stored as a list of integers in DataOutput format."),
+ new UnflaggedOption("counts", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The filename where the host count will be stored as a list of integers in DataOutput format.")
+ }
+ );
+
+ final JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+ final BufferedReader fbr = new BufferedReader(new InputStreamReader(System.in, Charsets.ISO_8859_1));
+ final DataOutputStream mapDos = new DataOutputStream(new FastBufferedOutputStream(new FileOutputStream(jsapResult.getString("map"))));
+ final DataOutputStream countDos = new DataOutputStream(new FastBufferedOutputStream(new FileOutputStream(jsapResult.getString("counts"))));
+ run(fbr, System.out, mapDos, countDos, jsapResult.getBoolean("topPrivateDomain"), new ProgressLogger(LOGGER));
+ mapDos.close();
+ countDos.close();
+ }
+}
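
The core of `BuildHostMap.run()` assigns each distinct host the next free index the first time it is seen, emits that index once per URL, and counts how many URLs map to each host. The dependency-free Java sketch below reproduces only that bookkeeping. The class `HostMapSketch` and its fields are hypothetical; unlike the real method, it keeps everything in memory instead of writing `DataOutput` streams, and it omits the top-private-domain option.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, in-memory sketch of the host-map bookkeeping in BuildHostMap.run(). */
final class HostMapSketch {
    /** Distinct hosts, in order of first appearance. */
    final List<String> hosts = new ArrayList<>();
    /** For each input URL, the index of its host in hosts. */
    final List<Integer> urlToHost = new ArrayList<>();
    /** For each host index, the number of URLs mapped to it. */
    final List<Integer> counts = new ArrayList<>();

    static HostMapSketch build(List<String> urls) {
        final HostMapSketch result = new HostMapSketch();
        final Map<String, Integer> index = new HashMap<>();
        for (String url : urls) {
            // URI.create() throws an unchecked IllegalArgumentException on malformed input.
            final String host = URI.create(url).getHost();
            if (host == null) throw new IllegalArgumentException("No host in " + url);
            Integer h = index.get(host);
            if (h == null) {
                // First occurrence: take the next free index (BuildHostMap uses map.size()).
                h = index.size();
                index.put(host, h);
                result.hosts.add(host);
                result.counts.add(0);
            }
            result.urlToHost.add(h);
            result.counts.set(h, result.counts.get(h) + 1);
        }
        return result;
    }
}
```

For the URLs `http://a.example.org/x`, `http://b.example.org/` and `http://a.example.org/y`, this yields the hosts `[a.example.org, b.example.org]`, the map `[0, 1, 0]` and the counts `[2, 1]`.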
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/Check.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/Check.java
new file mode 100644
index 0000000..da6b0f5
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/Check.java
@@ -0,0 +1,171 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2011-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static it.unimi.dsi.webgraph.Transform.ensureNumArgs;
+import static it.unimi.dsi.webgraph.Transform.load;
+import it.unimi.dsi.logging.ProgressLogger;
+
+import java.io.File;
+import java.io.IOException;
+import java.lang.reflect.InvocationTargetException;
+import java.util.concurrent.TimeUnit;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/** Static methods that check properties of immutable graphs. */
+
+public class Check {
+
+ private static final Logger LOGGER = LoggerFactory.getLogger(Check.class);
+
+ private Check() {}
+
+ /** Check whether a graph is symmetric using {@link Transform#transpose(ImmutableGraph, ProgressLogger)}.
+ *
+ * @param graph a graph.
+ * @return whether <code>graph</code> is symmetric.
+ */
+ public static boolean symmetry(final ImmutableGraph graph) {
+ return symmetry(graph, null);
+ }
+
+ /** Check whether a graph is symmetric using {@link Transform#transpose(ImmutableGraph, ProgressLogger)}.
+ *
+ * @param graph a graph.
+ * @param pl passed to {@link Transform#transpose(ImmutableGraph, ProgressLogger)}.
+ * @return whether <code>graph</code> is symmetric.
+ */
+ public static boolean symmetry(final ImmutableGraph graph, final ProgressLogger pl) {
+ return graph.equals(Transform.transpose(graph, pl));
+ }
+
+ /** Check whether a graph is symmetric using {@link Transform#transposeOffline(ImmutableGraph, int, File, ProgressLogger)}.
+ *
+ * @param graph a graph.
+ * @param batchSize passed to {@link Transform#transposeOffline(ImmutableGraph, int, File, ProgressLogger)}.
+ * @return whether <code>graph</code> is symmetric.
+ */
+ public static boolean symmetryOffline(final ImmutableGraph graph, final int batchSize) throws IOException {
+ return symmetryOffline(graph, batchSize, null);
+ }
+
+ /** Check whether a graph is symmetric using {@link Transform#transposeOffline(ImmutableGraph, int, File, ProgressLogger)}.
+ *
+ * @param graph a graph.
+ * @param batchSize passed to {@link Transform#transposeOffline(ImmutableGraph, int, File, ProgressLogger)}.
+ * @param tempDir passed to {@link Transform#transposeOffline(ImmutableGraph, int, File, ProgressLogger)}.
+ * @return whether <code>graph</code> is symmetric.
+ */
+ public static boolean symmetryOffline(final ImmutableGraph graph, final int batchSize, final File tempDir) throws IOException {
+ return symmetryOffline(graph, batchSize, tempDir, null);
+ }
+
+ /** Check whether a graph is symmetric using {@link Transform#transposeOffline(ImmutableGraph, int, File, ProgressLogger)}.
+ *
+ * @param graph a graph.
+ * @param batchSize passed to {@link Transform#transposeOffline(ImmutableGraph, int, File, ProgressLogger)}.
+ * @param tempDir passed to {@link Transform#transposeOffline(ImmutableGraph, int, File, ProgressLogger)}.
+ * @param pl passed to {@link Transform#transposeOffline(ImmutableGraph, int, File, ProgressLogger)}.
+ * @return whether <code>graph</code> is symmetric.
+ */
+ public static boolean symmetryOffline(final ImmutableGraph graph, final int batchSize, final File tempDir, final ProgressLogger pl) throws IOException {
+ return graph.equals(Transform.transposeOffline(graph, batchSize, tempDir, pl));
+ }
+
+
+ public static void main(String args[]) throws IOException, IllegalArgumentException, SecurityException, IllegalAccessException, InvocationTargetException, NoSuchMethodException, JSAPException {
+ Class<?> graphClass = null;
+ boolean offline = false;
+
+ SimpleJSAP jsap = new SimpleJSAP(Check.class.getName(),
+ "Checks properties of a graph. All checks require, after the name,\n" +
+ "some parameters specified below:\n" +
+ "\n" +
+ "symmetry sourceBasename\n" +
+ "symmetryOffline sourceBasename [batchSize] [tempDir]\n" +
+ "\n" +
+ "Please consult the Javadoc documentation for more information on each check.",
+ new Parameter[] {
+ new FlaggedOption("graphClass", GraphClassParser.getParser(), JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'g', "graph-class", "Forces a Java class to load the graph."),
+ new FlaggedOption("logInterval", JSAP.LONG_PARSER, Long.toString(ProgressLogger.DEFAULT_LOG_INTERVAL), JSAP.NOT_REQUIRED, 'l', "log-interval", "The minimum time interval between activity logs in milliseconds."),
+ new Switch("offline", 'o', "offline", "Use the offline load method to reduce memory consumption."),
+ new Switch("sequential", 'S', "sequential", "Equivalent to offline."),
+ new UnflaggedOption("check", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The check."),
+ new UnflaggedOption("param", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.GREEDY, "The remaining parameters."),
+ }
+ );
+
+ JSAPResult jsapResult = jsap.parse(args);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ graphClass = jsapResult.getClass("graphClass");
+ offline = jsapResult.getBoolean("offline") || jsapResult.getBoolean("sequential");
+ String check = jsapResult.getString("check");
+ String[] param = jsapResult.getStringArray("param");
+
+ String source[] = null;
+ int batchSize = 1000000;
+ File tempDir = null;
+
+ if (! ensureNumArgs(param, -1)) return;
+
+ if (check.equals("symmetry")) {
+ if (! ensureNumArgs(param, 1)) return;
+ source = new String[] { param[0] };
+ }
+ else if (check.equals("symmetryOffline")) {
+ source = new String[] { param[0] };
+ if (param.length >= 2) {
+ batchSize = ((Integer)JSAP.INTSIZE_PARSER.parse(param[1])).intValue();
+ if (param.length == 3) tempDir = new File(param[2]);
+ else if (! ensureNumArgs(param, 2)) return;
+ }
+ else if (! ensureNumArgs(param, 1)) return;
+ }
+ else {
+ System.err.println("Unknown check: " + check);
+ return;
+ }
+
+ final ProgressLogger pl = new ProgressLogger(LOGGER, jsapResult.getLong("logInterval"), TimeUnit.MILLISECONDS);
+ final ImmutableGraph[] graph = new ImmutableGraph[source.length];
+
+ for (int i = 0; i < source.length; i++)
+ if (source[i] == null) graph[i] = null;
+ else graph[i] = load(graphClass, source[i], offline, pl);
+
+ if (check.equals("symmetry")) {
+ System.out.println(symmetry(graph[0], pl));
+ }
+ else if (check.equals("symmetryOffline")) {
+ System.out.println(symmetryOffline(graph[0], batchSize, tempDir, pl));
+ }
+ }
+}
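
`Check.symmetry()` relies on a simple fact: a directed graph is symmetric exactly when it equals its transpose. The following is a minimal sketch of that test on in-memory adjacency arrays. `SymmetrySketch` is a hypothetical name, and the sketch assumes each successor list is sorted and free of duplicates, so that array equality coincides with graph equality.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: a directed graph is symmetric iff it equals its transpose. */
final class SymmetrySketch {
    /** Transposes a graph given as sorted successor arrays (node x -> successors[x]). */
    static int[][] transpose(int[][] successors) {
        final int n = successors.length;
        final List<List<Integer>> t = new ArrayList<>();
        for (int i = 0; i < n; i++) t.add(new ArrayList<>());
        // Visiting sources in increasing order keeps every transposed list sorted.
        for (int x = 0; x < n; x++)
            for (int y : successors[x]) t.get(y).add(x);
        final int[][] result = new int[n][];
        for (int i = 0; i < n; i++) result[i] = t.get(i).stream().mapToInt(Integer::intValue).toArray();
        return result;
    }

    /** Same idea as Check.symmetry(): compare the graph with its transpose. */
    static boolean isSymmetric(int[][] successors) {
        return Arrays.deepEquals(successors, transpose(successors));
    }
}
```

Building the transpose in memory corresponds to the `symmetry()` check; `symmetryOffline()` instead delegates to `Transform.transposeOffline()`, which takes a batch size and an optional temporary directory.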
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/CompressionFlags.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/CompressionFlags.java
new file mode 100644
index 0000000..1ed2527
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/CompressionFlags.java
@@ -0,0 +1,50 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2006-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+/** This interface provides constants to be used as compression flags. */
+
+
+public interface CompressionFlags {
+
+ /** &delta; coding (see {@link it.unimi.dsi.io.OutputBitStream#writeDelta(int)}). */
+ public static final int DELTA = 1;
+
+ /** &gamma; coding (see {@link it.unimi.dsi.io.OutputBitStream#writeGamma(int)}). */
+ public static final int GAMMA = 2;
+
+ /** Golomb coding (see {@link it.unimi.dsi.io.OutputBitStream#writeGolomb(int,int)}). */
+ public static final int GOLOMB = 3;
+
+ /** Skewed Golomb coding (see {@link it.unimi.dsi.io.OutputBitStream#writeSkewedGolomb(int,int)}). */
+ public static final int SKEWED_GOLOMB = 4;
+
+ /** Unary coding (see {@link it.unimi.dsi.io.OutputBitStream#writeUnary(int)}). */
+ public static final int UNARY = 5;
+
+ /** &zeta;<sub><var>k</var></sub> coding (see {@link it.unimi.dsi.io.OutputBitStream#writeZeta(int,int)}). */
+ public static final int ZETA = 6;
+
+ /** Variable-length nibble coding (see {@link it.unimi.dsi.io.OutputBitStream#writeNibble(int)}). */
+ public static final int NIBBLE = 7;
+
+ public static final String[] CODING_NAME = { "DEFAULT", "DELTA", "GAMMA", "GOLOMB", "SKEWED_GOLOMB", "UNARY", "ZETA", "NIBBLE" };
+
+}
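
The `GAMMA` flag selects Elias γ coding. As an illustration only, the sketch below renders the textbook γ code of a positive integer x as a bit string: ⌊log₂ x⌋ in unary (zeros terminated by a one), followed by the ⌊log₂ x⌋ low-order bits of x. Because γ cannot code zero, the `writeGamma()` method of `EFGraph.LongWordOutputBitStream` in this diff codes x + 1 instead, and `gammaOfNatural()` mimics that shift. The class `GammaSketch` is hypothetical, and the physical bit order inside an actual stream depends on the stream implementation.

```java
/** Hypothetical helper: textbook Elias gamma code, rendered as a string of '0'/'1'. */
final class GammaSketch {
    /** Gamma code of a positive integer. */
    static String gammaOfPositive(long x) {
        if (x <= 0) throw new IllegalArgumentException("gamma codes positive integers only: " + x);
        final int msb = 63 - Long.numberOfLeadingZeros(x); // floor(log2 x)
        final StringBuilder sb = new StringBuilder();
        for (int i = 0; i < msb; i++) sb.append('0');    // msb in unary: zeros...
        sb.append('1');                                  // ...terminated by a one
        for (int i = msb - 1; i >= 0; i--)               // then the msb low-order bits,
            sb.append((x >>> i & 1) == 0 ? '0' : '1');   // most significant first
        return sb.toString();
    }

    /** Codes a natural number (x >= 0) as the gamma code of x + 1. */
    static String gammaOfNatural(long x) {
        if (x < 0) throw new IllegalArgumentException("negative: " + x);
        return gammaOfPositive(x + 1);
    }
}
```

For example, 5 (binary `101`) codes as `00101`: two zeros and a one for ⌊log₂ 5⌋ = 2, then the two low-order bits `01`, for a total of 2 · 2 + 1 = 5 bits.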
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/DegreeRangeImmutableSubgraph.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/DegreeRangeImmutableSubgraph.java
new file mode 100644
index 0000000..d0d624f
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/DegreeRangeImmutableSubgraph.java
@@ -0,0 +1,80 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2011-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.ints.IntArrayList;
+import it.unimi.dsi.lang.ObjectParser;
+
+import java.io.IOException;
+
+/** A subclass of {@link ImmutableSubgraph} exposing the subgraph formed by nodes whose outdegree is in a given range.
+ *
+ * <p>Note that the {@linkplain #DegreeRangeImmutableSubgraph(String, String, String, String) string-based constructors} can be
+ * used with an {@link ObjectParser} to specify a graph on the command line.
+ */
+
+public class DegreeRangeImmutableSubgraph extends ImmutableSubgraph {
+ protected static int[] createMap(final ImmutableGraph graph, final int minDegree, final int maxDegree) {
+ final IntArrayList map = new IntArrayList();
+ final int n = graph.numNodes();
+ final NodeIterator nodeIterator = graph.nodeIterator();
+ for(int i = 0; i < n; i++) {
+ nodeIterator.nextInt();
+ final int d = nodeIterator.outdegree();
+ if (d >= minDegree && d < maxDegree) map.add(i);
+ }
+ return map.toIntArray();
+ }
+
+ /** Create a subgraph formed by the nodes with outdegree in a specified range.
+ *
+ * @param graph the supergraph.
+ * @param minDegree the minimum outdegree (inclusive).
+ * @param maxDegree the maximum outdegree (exclusive).
+ */
+ public DegreeRangeImmutableSubgraph(final ImmutableGraph graph, final int minDegree, final int maxDegree) {
+ super(graph, createMap(graph, minDegree, maxDegree));
+ }
+
+ /** Create a subgraph formed by the nodes with outdegree in a specified range.
+ *
+ * <p>This is a string-based constructor that can be used with an {@link ObjectParser}.
+ *
+ * @param graph the supergraph.
+ * @param minDegree the minimum outdegree (inclusive).
+ * @param maxDegree the maximum outdegree (exclusive).
+ */
+ public DegreeRangeImmutableSubgraph(final String graph, final String minDegree, final String maxDegree) throws IOException {
+ this(graph, minDegree, maxDegree, "false");
+ }
+
+ /** Create a subgraph formed by the nodes with outdegree in a specified range.
+ *
+ * <p>This is a string-based constructor that can be used with an {@link ObjectParser}.
+ *
+ * @param graph the supergraph.
+ * @param minDegree the minimum outdegree (inclusive).
+ * @param maxDegree the maximum outdegree (exclusive).
+ * @param mapped if true, the supergraph will be loaded with {@link ImmutableGraph#loadMapped(CharSequence, it.unimi.dsi.logging.ProgressLogger)} instead
+ * of {@link ImmutableGraph#load(CharSequence, it.unimi.dsi.logging.ProgressLogger)}.
+ */
+ public DegreeRangeImmutableSubgraph(final String graph, final String minDegree, final String maxDegree, final String mapped) throws IOException {
+ this(Boolean.parseBoolean(mapped) ? ImmutableGraph.loadMapped(graph) : ImmutableGraph.load(graph), Integer.parseInt(minDegree), Integer.parseInt(maxDegree));
+ }
+}
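
`createMap()` keeps exactly the nodes whose outdegree d satisfies minDegree ≤ d < maxDegree, in increasing node order, and the resulting array is handed to the `ImmutableSubgraph` constructor. The following is a minimal sketch of the same selection over a plain array of outdegrees. The class and method names are hypothetical, and no WebGraph types are used.

```java
import java.util.stream.IntStream;

/** Hypothetical sketch of DegreeRangeImmutableSubgraph.createMap() on a plain outdegree array. */
final class DegreeRangeSketch {
    /** Returns, in increasing order, the nodes i with minDegree <= outdegree[i] < maxDegree. */
    static int[] nodesInDegreeRange(int[] outdegree, int minDegree, int maxDegree) {
        return IntStream.range(0, outdegree.length)
            .filter(i -> outdegree[i] >= minDegree && outdegree[i] < maxDegree)
            .toArray();
    }
}
```

With outdegrees `{0, 3, 1, 5, 2}` and the range [1, 3), only nodes 2 and 4 are kept. Node 1, whose outdegree is 3, is excluded because the upper bound is exclusive.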
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/EFGraph.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/EFGraph.java
new file mode 100644
index 0000000..d888796
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/EFGraph.java
@@ -0,0 +1,1306 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2013-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static it.unimi.dsi.bits.Fast.MSBS_STEP_8;
+import static it.unimi.dsi.bits.Fast.ONES_STEP_4;
+import static it.unimi.dsi.bits.Fast.ONES_STEP_8;
+
+import java.io.Closeable;
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.RandomAccessFile;
+import java.lang.reflect.InvocationTargetException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.LongBuffer;
+import java.nio.channels.FileChannel;
+import java.nio.channels.ReadableByteChannel;
+import java.nio.channels.WritableByteChannel;
+import java.text.DecimalFormat;
+import java.util.NoSuchElementException;
+import java.util.Properties;
+import java.util.concurrent.TimeUnit;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+import it.unimi.dsi.Util;
+import it.unimi.dsi.bits.Fast;
+import it.unimi.dsi.bits.LongArrayBitVector;
+import it.unimi.dsi.fastutil.BigArrays;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.longs.LongBigArrayBigList;
+import it.unimi.dsi.fastutil.longs.LongBigArrays;
+import it.unimi.dsi.fastutil.longs.LongBigList;
+import it.unimi.dsi.fastutil.longs.LongIterator;
+import it.unimi.dsi.io.InputBitStream;
+import it.unimi.dsi.io.OutputBitStream;
+import it.unimi.dsi.lang.ObjectParser;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.sux4j.util.EliasFanoMonotoneLongBigList;
+import it.unimi.dsi.util.ByteBufferLongBigList;
+
+/** An immutable graph based on the Elias&ndash;Fano representation of monotone sequences.
+ *
+ * @author Sebastiano Vigna
+ */
+
+public class EFGraph extends ImmutableGraph {
+ private static final Logger LOGGER = LoggerFactory.getLogger(EFGraph.class);
+
+ /** The standard extension for the graph longword bit stream. */
+ public static final String GRAPH_EXTENSION = ".graph";
+ /** The standard extension for the graph-offsets bit stream. */
+ public static final String OFFSETS_EXTENSION = ".offsets";
+ /** The standard extension for the cached {@link LongBigList} containing the graph offsets. */
+ public static final String OFFSETS_BIG_LIST_EXTENSION = ".obl";
+ /** The default size of the bit cache. */
+ public final static int DEFAULT_CACHE_SIZE = 16 * 1024 * 1024;
+ /** This number classifies the present graph format. When new features require introducing binary
+ * incompatibilities, this number is bumped so as to ensure that old classes do not try to read
+ * graphs they cannot understand. */
+ public final static int EFGRAPH_VERSION = 0;
+ /** The default base-two logarithm of the quantum. */
+ public static final int DEFAULT_LOG_2_QUANTUM = 8;
+
+ /** The number of nodes of the graph. */
+ protected final int n;
+ /** The upper bound used during the graph construction (greater than or equal to {@link #n}). */
+ protected final int upperBound;
+ /** The number of arcs of the graph. */
+ protected final long m;
+ /** The list containing the graph. */
+ protected final LongBigList graph;
+ /** An Elias&ndash;Fano monotone list containing the pointers of the bit streams stored in
+ * {@link #graph}. */
+ protected final LongBigList offsets;
+ /** The basename of this graph (or possibly <code>null</code>). */
+ protected final CharSequence basename;
+ /** A longword bit reader used to read outdegrees. */
+ protected final LongWordBitReader outdegreeLongWordBitReader;
+ /** The base-two logarithm of the indexing quantum. */
+ protected final int log2Quantum;
+ /** If not {@link Integer#MIN_VALUE}, the node whose degree is cached in {@link #cachedOutdegree}. */
+ protected int cachedNode;
+ /** If {@link #cachedNode} is not {@link Integer#MIN_VALUE}, its cached outdegree. */
+ protected int cachedOutdegree;
+ /** If {@link #cachedNode} is not {@link Integer#MIN_VALUE}, the position immediately after the
+ * coding of the outdegree of {@link #cachedNode}. */
+ protected long cachedPointer;
+
+ protected EFGraph(final CharSequence basename, final int n, final long m, final int upperBound, final int log2Quantum, LongBigList graph, LongBigList offsets) {
+ this.basename = basename;
+ this.n = n;
+ this.m = m;
+ this.upperBound = upperBound;
+ this.log2Quantum = log2Quantum;
+ this.graph = graph;
+ this.offsets = offsets;
+ outdegreeLongWordBitReader = new LongWordBitReader(graph, 0);
+ cachedNode = Integer.MIN_VALUE;
+ }
+
+ @Override
+ public CharSequence basename() {
+ return basename;
+ }
+
+ /** Returns the number of lower bits for the Elias&ndash;Fano encoding of a list of given length
+ * and upper bound.
+ *
+ * @param length the number of elements of the list.
+ * @param upperBound an upper bound for the elements of the list.
+ * @return the number of lower bits for the Elias&ndash;Fano encoding of a list with the specified
+ * parameters. */
+ public static int lowerBits(final long length, final long upperBound) {
+ return length == 0 ? 0 : Math.max(0, Fast.mostSignificantBit(upperBound / length));
+ }
+
+ /** Returns the size in bits of forward or skip pointers to the Elias&ndash;Fano encoding of a
+ * list of given length and upper bound.
+ *
+ * @param length the number of elements of the list.
+ * @param upperBound an upper bound for the elements of the list.
+ * @return the size in bits of forward or skip pointers to the Elias&ndash;Fano encoding of a list
+ * with the specified parameters. */
+ public static int pointerSize(final long length, final long upperBound) {
+ return Math.max(0, Fast.ceilLog2(length + (upperBound >>> lowerBits(length, upperBound))));
+ }
+
+ /** Returns the number of forward or skip pointers to the Elias&ndash;Fano encoding of a list of
+ * given length and upper bound.
+ *
+ * @param length the number of elements of the list.
+ * @param upperBound an upper bound for the elements of the list.
+ * @param log2Quantum the logarithm of the quantum size.
+ * @return an upper bound on the number of skip pointers or the (exact) number of forward
+ * pointers. */
+ public static long numberOfPointers(final long length, final long upperBound, final int log2Quantum) {
+ if (length == 0) return 0;
+ return (upperBound >>> lowerBits(length, upperBound)) >>> log2Quantum;
+ }
+
+ protected final static class LongWordCache implements Closeable {
+ /** The spill file. */
+ private final File spillFile;
+ /** A channel opened on {@link #spillFile}. */
+ private final FileChannel spillChannel;
+ /** A cache for longwords. Will be spilled to {@link #spillChannel} in case more than
+ * {@link #cacheLength} bits are added. */
+ private final ByteBuffer cache;
+ /** The current bit buffer. */
+ private long buffer;
+ /** The current number of free bits in {@link #buffer}. */
+ private int free;
+ /** The length of the cache, in bits. */
+ private final long cacheLength;
+ /** The number of bits currently stored. */
+ private long length;
+ /** Whether {@link #spillChannel} should be repositioned at 0 before usage. */
+ private boolean spillMustBeRewind;
+
+ @SuppressWarnings("resource")
+ public LongWordCache(final int cacheSize, final String suffix) throws IOException {
+ spillFile = File.createTempFile(EFGraph.class.getName(), suffix);
+ spillFile.deleteOnExit();
+ spillChannel = new RandomAccessFile(spillFile, "rw").getChannel();
+ cache = ByteBuffer.allocateDirect(cacheSize).order(ByteOrder.nativeOrder());
+ cacheLength = cacheSize * 8L;
+ free = Long.SIZE;
+ }
+
+ private void flushBuffer() throws IOException {
+ cache.putLong(buffer);
+ if (!cache.hasRemaining()) {
+ if (spillMustBeRewind) {
+ spillMustBeRewind = false;
+ spillChannel.position(0);
+ }
+ cache.flip();
+ spillChannel.write(cache);
+ cache.clear();
+ }
+ }
+
+ public int append(final long value, final int width) throws IOException {
+ assert width == Long.SIZE || (-1L << width & value) == 0;
+ buffer |= value << (Long.SIZE - free);
+ length += width;
+
+ if (width < free) free -= width;
+ else {
+ flushBuffer();
+
+ if (width == free) {
+ buffer = 0;
+ free = Long.SIZE;
+ }
+ else {
+ // free < Long.SIZE
+ buffer = value >>> free;
+ free = Long.SIZE - width + free; // width > free
+ }
+ }
+ return width;
+ }
+
+ public void clear() {
+ length = buffer = 0;
+ free = Long.SIZE;
+ cache.clear();
+ spillMustBeRewind = true;
+ }
+
+ @Override
+ public void close() throws IOException {
+ spillChannel.close();
+ spillFile.delete();
+ }
+
+ public long length() {
+ return length;
+ }
+
+ public void writeUnary(int l) throws IOException {
+ if (l >= free) {
+ // Phase 1: align
+ l -= free;
+ length += free;
+ flushBuffer();
+
+ // Phase 2: jump over longwords
+ buffer = 0;
+ free = Long.SIZE;
+ while (l >= Long.SIZE) {
+ flushBuffer();
+ l -= Long.SIZE;
+ length += Long.SIZE;
+ }
+ }
+
+ append(1L << l, l + 1);
+ }
+
+ public long readLong() throws IOException {
+ if (!cache.hasRemaining()) {
+ cache.clear();
+ spillChannel.read(cache);
+ cache.flip();
+ }
+ return cache.getLong();
+ }
+
+ public void rewind() throws IOException {
+ if (free != Long.SIZE) cache.putLong(buffer);
+
+ if (length > cacheLength) {
+ cache.flip();
+ spillChannel.write(cache);
+ spillChannel.position(0);
+ cache.clear();
+ spillChannel.read(cache);
+ cache.flip();
+ }
+ else cache.rewind();
+ }
+ }
+
+ public final static class LongWordOutputBitStream {
+ private static final int BUFFER_SIZE = 64 * 1024;
+
+ /** The 64-bit buffer, whose upper {@link #free} bits do not contain data. */
+ private long buffer;
+ /** The Java nio buffer used to write with prescribed endianness. */
+ private final ByteBuffer byteBuffer;
+ /** The number of upper free bits in {@link #buffer} (strictly positive). */
+ private int free;
+ /** The output channel. */
+ private final WritableByteChannel writableByteChannel;
+
+ public LongWordOutputBitStream(final WritableByteChannel writableByteChannel, final ByteOrder byteOrder) {
+ this.writableByteChannel = writableByteChannel;
+ byteBuffer = ByteBuffer.allocateDirect(BUFFER_SIZE).order(byteOrder);
+ free = Long.SIZE;
+ }
+
+ public int append(final long value, final int width) throws IOException {
+ assert width == Long.SIZE || (-1L << width & value) == 0;
+ buffer |= value << (Long.SIZE - free);
+
+ if (width < free) free -= width;
+ else {
+ byteBuffer.putLong(buffer); // filled
+ if (!byteBuffer.hasRemaining()) {
+ byteBuffer.flip();
+ writableByteChannel.write(byteBuffer);
+ byteBuffer.clear();
+ }
+
+ if (width == free) {
+ buffer = 0;
+ free = Long.SIZE;
+ }
+ else {
+ // free < Long.SIZE
+ buffer = value >>> free;
+ free = Long.SIZE - width + free; // width > free
+ }
+ }
+ return width;
+ }
+
+ public long append(final long[] value, final long length) throws IOException {
+ long l = length;
+ for (int i = 0; l > 0; i++) {
+ final int width = (int)Math.min(l, Long.SIZE);
+ append(value[i], width);
+ l -= width;
+ }
+
+ return length;
+ }
+
+ public long append(final LongBigList value, final long length) throws IOException {
+ long l = length;
+ for (long i = 0; l > 0; i++) {
+ final int width = (int)Math.min(l, Long.SIZE);
+ append(value.getLong(i), width);
+ l -= width;
+ }
+
+ return length;
+ }
+
+ public long append(final LongArrayBitVector bv) throws IOException {
+ return append(bv.bits(), bv.length());
+ }
+
+ public long append(final LongWordCache cache) throws IOException {
+ long l = cache.length();
+ cache.rewind();
+ while (l > 0) {
+ final int width = (int)Math.min(l, Long.SIZE);
+ append(cache.readLong(), width);
+ l -= width;
+ }
+
+ return cache.length();
+ }
+
+ public int align() throws IOException {
+ if (free != Long.SIZE) {
+ byteBuffer.putLong(buffer); // partially filled
+ if (!byteBuffer.hasRemaining()) {
+ byteBuffer.flip();
+ writableByteChannel.write(byteBuffer);
+ byteBuffer.clear();
+ }
+
+ final int result = free;
+ buffer = 0;
+ free = Long.SIZE;
+ return result;
+ }
+
+ return 0;
+ }
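+ // For instance (hand-computed): after a single append(0b101, 3) on a fresh stream, free is
+ // 61, so align() stores the partially filled word in the output buffer and returns 61, the
+ // number of padding bits; on an already aligned stream it stores nothing and returns 0.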
+
+ public int writeNonZeroGamma(long value) throws IOException {
+ if (value <= 0) throw new IllegalArgumentException("The argument " + value + " is not strictly positive.");
+ final int msb = Fast.mostSignificantBit(value);
+ final long unary = 1L << msb;
+ append(unary, msb + 1);
+ append(value ^ unary, msb);
+ return 2 * msb + 1;
+ }
+
+ public int writeGamma(long value) throws IOException {
+ if (value < 0) throw new IllegalArgumentException("The argument " + value + " is negative.");
+ return writeNonZeroGamma(value + 1);
+ }
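+ // Worked example (hand-computed): writeGamma(4) calls writeNonZeroGamma(5). Since
+ // Fast.mostSignificantBit(5) == 2, it appends the unary part 1L << 2 on 3 bits (emitted
+ // LSB-first as 0, 0, 1) and then 5 ^ 4 == 1 on 2 bits (emitted as 1, 0), for a total of
+ // 2 * 2 + 1 == 5 bits: 0 0 1 1 0.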
+
+ public void close() throws IOException {
+ byteBuffer.putLong(buffer);
+ byteBuffer.flip();
+ writableByteChannel.write(byteBuffer);
+ writableByteChannel.close();
+ }
+ }
+
+ protected final static class Accumulator implements Closeable {
+ /** The minimum size in bytes of a {@link LongWordCache}. */
+ private static final int MIN_CACHE_SIZE = 16;
+ /** The accumulator for skip pointers (to zeros or ones). */
+ private final LongWordCache successors;
+ /** The accumulator for upper bits. */
+ private final LongWordCache upperBits;
+ /** The accumulator for low bits. */
+ private final LongWordCache lowerBits;
+ /** The number of lower bits. */
+ private int l;
+ /** A mask extracting the {@link #l} lower bits. */
+ private long lowerBitsMask;
+ /** The number of elements that will be added to this list. */
+ private long length;
+ /** The current length of the list. */
+ private long currentLength;
+ /** The current prefix sum. */
+ private long currentPrefixSum;
+ /** An upper bound to the sum of all values that will be added to the list (decremented by
+ * {@link #length} if the <code>strict</code> argument of {@link #init} is true). */
+ private long correctedUpperBound;
+ /** The logarithm of the indexing quantum. */
+ private int log2Quantum;
+ /** The indexing quantum. */
+ private long quantum;
+ /** The size in bits of a skip pointer, as computed by {@code EFGraph.pointerSize()}. */
+ private int pointerSize;
+ /** The last position where a one was set. */
+ private long lastOnePosition;
+ /** The expected number of pointers. */
+ private long expectedNumberOfPointers;
+ /** The number of bits used for the upper-bits array. */
+ public long bitsForUpperBits;
+ /** The number of bits used for the lower-bits array. */
+ public long bitsForLowerBits;
+ /** The number of bits used for forward/skip pointers. */
+ public long bitsForPointers;
+
+ public Accumulator(int bufferSize, int log2Quantum) throws IOException {
+ // A reasonable heuristic for allocating space.
+ bufferSize = bufferSize & -bufferSize; // Keep only the lowest set bit, so the size is a power of 2.
+ /* Very approximately, half of the cache for lower, half for upper, and a small fraction
+ * (8/quantum) for pointers. This will generate a much larger cache than expected if
+ * quantum is very small. */
+ successors = new LongWordCache(Math.max(MIN_CACHE_SIZE, bufferSize >>> Math.max(3, log2Quantum - 3)), "pointers");
+ lowerBits = new LongWordCache(Math.max(MIN_CACHE_SIZE, bufferSize / 2), "lower");
+ upperBits = new LongWordCache(Math.max(MIN_CACHE_SIZE, bufferSize / 2), "upper");
+ }
+
+ public int lowerBits() {
+ return l;
+ }
+
+ public int pointerSize() {
+ return pointerSize;
+ }
+
+ public long numberOfPointers() {
+ return expectedNumberOfPointers;
+ }
+
+ public void init(final long length, final long upperBound, final boolean strict, final boolean indexZeroes, final int log2Quantum) {
+ this.log2Quantum = log2Quantum;
+ this.length = length;
+ quantum = 1L << log2Quantum;
+ successors.clear();
+ lowerBits.clear();
+ upperBits.clear();
+ correctedUpperBound = upperBound - (strict ? length : 0);
+ final long correctedLength = length + (!strict && indexZeroes ? 1 : 0); // The length, including the final terminator
+ if (correctedUpperBound < 0) throw new IllegalArgumentException();
+
+ currentPrefixSum = 0;
+ currentLength = 0;
+ lastOnePosition = -1;
+
+ l = EFGraph.lowerBits(correctedLength, upperBound);
+
+
+ lowerBitsMask = (1L << l) - 1;
+
+ pointerSize = EFGraph.pointerSize(correctedLength, upperBound);
+ expectedNumberOfPointers = EFGraph.numberOfPointers(correctedLength, upperBound, log2Quantum);
+ // System.err.println("l = " + l + " numberOfPointers = " + expectedNumberOfPointers +
+ // " pointerSize = " + pointerSize);
+ }
+
+ public void add(final long x) throws IOException {
+ if (currentLength != 0 && x == 0) throw new IllegalArgumentException();
+ // System.err.println("add(" + x + "), l = " + l + ", length = " + length);
+ currentPrefixSum += x;
+ if (currentPrefixSum > correctedUpperBound) throw new IllegalArgumentException("Too large prefix sum: " + currentPrefixSum + " > " + correctedUpperBound);
+ if (l != 0) lowerBits.append(currentPrefixSum & lowerBitsMask, l);
+ final long onePosition = (currentPrefixSum >>> l) + currentLength;
+
+ upperBits.writeUnary((int)(onePosition - lastOnePosition - 1));
+
+ long zeroesBefore = lastOnePosition - currentLength + 1;
+ for (long position = lastOnePosition + (zeroesBefore & -1L << log2Quantum) + quantum - zeroesBefore; position < onePosition; position += quantum, zeroesBefore += quantum)
+ successors.append(position + 1, pointerSize);
+
+ lastOnePosition = onePosition;
+ currentLength++;
+ }
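+ // Worked example (hand-computed; the value of l is an assumption, since the real one is
+ // chosen by EFGraph.lowerBits(); skip pointers are omitted). Suppose l == 2 and
+ // upperBound == 16, and the successors 3, 5, 9 are added as the gaps 3, 2, 4, followed by
+ // the terminator gap 7 added by dump():
+ // prefix sum 3: lower bits 11, onePosition = (3 >>> 2) + 0 = 0, writeUnary(0)
+ // prefix sum 5: lower bits 01, onePosition = (5 >>> 2) + 1 = 2, writeUnary(1)
+ // prefix sum 9: lower bits 01, onePosition = (9 >>> 2) + 2 = 4, writeUnary(1)
+ // prefix sum 16: lower bits 00, onePosition = (16 >>> 2) + 3 = 7, writeUnary(2)
+ // so the upper-bits array has ones at positions 0, 2, 4 and 7: 1 0 1 0 1 0 0 1.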
+
+ public long dump(final LongWordOutputBitStream lwobs) throws IOException {
+ if (currentLength != length) throw new IllegalStateException();
+ // Add a final fictitious element that brings the prefix sum to the (corrected) upper bound; it acts as a terminator.
+ add(correctedUpperBound - currentPrefixSum);
+ assert pointerSize == 0 || successors.length() / pointerSize == expectedNumberOfPointers : "Expected " + expectedNumberOfPointers + " pointers, found " + successors.length() / pointerSize;
+ // System.err.println("pointerSize :" + pointerSize);
+ bitsForPointers = lwobs.append(successors);
+ // System.err.println("pointers: " + bitsForPointers);
+ bitsForLowerBits = lwobs.append(lowerBits);
+ // System.err.println("lower: " + bitsForLowerBits);
+ bitsForUpperBits = lwobs.append(upperBits);
+ // System.err.println("upper: " + bitsForUpperBits);
+ return bitsForLowerBits + bitsForUpperBits + bitsForPointers;
+ }
+
+ @Override
+ public void close() throws IOException {
+ successors.close();
+ upperBits.close();
+ lowerBits.close();
+ }
+ }
+
+ /** Creates a new {@link EFGraph} by loading a compressed graph file from disk to memory, with no
+ * progress logger and all offsets.
+ *
+ * @param basename the basename of the graph.
+ * @return an {@link EFGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph. */
+ public static EFGraph load(CharSequence basename) throws IOException {
+ return loadInternal(basename, false, null);
+ }
+
+ /** Creates a new {@link EFGraph} by loading a compressed graph file from disk to memory, with
+ * all offsets.
+ *
+ * @param basename the basename of the graph.
+ * @param pl a progress logger used while loading the graph, or <code>null</code>.
+ * @return an {@link EFGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph. */
+ public static EFGraph load(CharSequence basename, ProgressLogger pl) throws IOException {
+ return loadInternal(basename, false, pl);
+ }
+
+ /** Creates a new {@link EFGraph} by memory-mapping a graph file.
+ *
+ * @param basename the basename of the graph.
+ * @return an {@link EFGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while memory-mapping the graph or reading the
+ * offsets. */
+ public static EFGraph loadMapped(CharSequence basename) throws IOException {
+ return loadInternal(basename, true, null);
+ }
+
+ /** Creates a new {@link EFGraph} by memory-mapping a graph file.
+ *
+ * @param basename the basename of the graph.
+ * @param pl a progress logger used while loading the offsets, or <code>null</code>.
+ * @return an {@link EFGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while memory-mapping the graph or reading the
+ * offsets. */
+ public static EFGraph loadMapped(CharSequence basename, ProgressLogger pl) throws IOException {
+ return loadInternal(basename, true, pl);
+ }
+
+ /** Creates a new {@link EFGraph} by loading a compressed graph file from disk to memory, with
+ * all offsets; this deprecated method simply delegates to {@link #load(CharSequence, ProgressLogger)}.
+ *
+ * @param basename the basename of the graph.
+ * @param pl a progress logger used while loading the graph, or <code>null</code>.
+ * @return an {@link EFGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ * @deprecated Use {@link #loadOffline(CharSequence, ProgressLogger)} or {@link #loadMapped(CharSequence, ProgressLogger)} instead.
+ */
+ @Deprecated
+ public static EFGraph loadSequential(CharSequence basename, ProgressLogger pl) throws IOException {
+ return EFGraph.load(basename, pl);
+ }
+
+
+ /** Creates a new {@link EFGraph} by loading a compressed graph file from disk to memory, with no
+ * progress logger and with all offsets; this deprecated method simply delegates to {@link #load(CharSequence)}.
+ *
+ * @param basename the basename of the graph.
+ * @return an {@link EFGraph} containing the specified graph.
+ * @deprecated Use {@link #loadOffline(CharSequence)} or {@link #loadMapped(CharSequence)} instead.
+ */
+ @Deprecated
+ public static EFGraph loadSequential(CharSequence basename) throws IOException {
+ return EFGraph.load(basename);
+ }
+
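+ /** Creates a new {@link EFGraph} for offline access. This class has no separate offline
+ * representation: the method simply delegates to {@link #loadMapped(CharSequence, ProgressLogger)},
+ * so the graph is memory-mapped and the offsets are loaded.
+ *
+ * @param basename the basename of the graph.
+ * @param pl a progress logger used while loading the offsets, or <code>null</code>.
+ * @return an {@link EFGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while memory-mapping the graph or reading the
+ * offsets. */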
+ public static EFGraph loadOffline(CharSequence basename, ProgressLogger pl) throws IOException {
+ return EFGraph.loadMapped(basename, pl);
+ }
+
+
+
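+ /** Creates a new {@link EFGraph} for offline access. This class has no separate offline
+ * representation: the method is equivalent to {@link #loadMapped(CharSequence)}, so the graph
+ * is memory-mapped and the offsets are loaded.
+ *
+ * @param basename the basename of the graph.
+ * @return an {@link EFGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while memory-mapping the graph or reading the
+ * offsets. */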
+ public static EFGraph loadOffline(CharSequence basename) throws IOException {
+ return EFGraph.loadMapped(basename, null);
+ }
+
+ /** An iterator returning the offsets. */
+ private final static class OffsetsLongIterator implements LongIterator {
+ private final InputBitStream offsetIbs;
+ private final long n;
+ private long offset;
+ private long i;
+
+ private OffsetsLongIterator(final InputBitStream offsetIbs, final long n) {
+ this.offsetIbs = offsetIbs;
+ this.n = n;
+ }
+
+ @Override
+ public boolean hasNext() {
+ return i <= n;
+ }
+
+ @Override
+ public long nextLong() {
+ if (!hasNext()) throw new NoSuchElementException();
+ i++;
+ try {
+ return offset += offsetIbs.readLongDelta();
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ }
+ }
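+ // OffsetsLongIterator turns the Elias delta-coded gaps of the offsets file back into absolute
+ // bit offsets by accumulating them: for instance (hypothetical data), the gaps 0, 12, 7 yield
+ // the offsets 0, 12, 19. It returns n + 1 values, as store() writes an initial 0 followed by
+ // one gap per node.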
+
+
+ /** Convenience method for loading a big list of binary longs with specified endianness into a
+ * {@linkplain LongBigArrays long big array}.
+ *
+ * @param filename the file containing the longs.
+ * @param byteOrder the endianness of the longs.
+ * @return a big list of longs containing the longs in <code>filename</code>.
+ * @throws IOException if an I/O exception occurs while reading <code>filename</code>. */
+ public static LongBigArrayBigList loadLongBigList(final CharSequence filename, final ByteOrder byteOrder) throws IOException {
+ final long length = new File(filename.toString()).length() / (Long.SIZE / Byte.SIZE);
+ @SuppressWarnings("resource")
+ final ReadableByteChannel channel = new FileInputStream(filename.toString()).getChannel();
+ final ByteBuffer byteBuffer = ByteBuffer.allocateDirect(64 * 1024).order(byteOrder);
+ final LongBuffer longBuffer = byteBuffer.asLongBuffer();
+ final long[][] array = LongBigArrays.newBigArray(length);
+
+ long pos = 0;
+ while (channel.read(byteBuffer) > 0) {
+ byteBuffer.flip();
+ final int remainingLongs = byteBuffer.remaining() / (Long.SIZE / Byte.SIZE);
+ longBuffer.clear();
+ longBuffer.limit(remainingLongs);
+ longBuffer.get(array[BigArrays.segment(pos)], BigArrays.displacement(pos), remainingLongs);
+ pos += remainingLongs;
+ byteBuffer.clear();
+ }
+
+ channel.close();
+ return LongBigArrayBigList.wrap(array);
+
+ }
+
+
+ /** Loads a compressed graph from disk, returning a new {@link EFGraph}.
+ *
+ * @param basename the basename of the graph.
+ * @param mapped whether to memory-map the graph file instead of loading it into memory.
+ * @param pl a progress logger used while loading the graph and the offsets, or <code>null</code>.
+ * @return a new {@link EFGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph or its metadata. */
+ protected static EFGraph loadInternal(final CharSequence basename, final boolean mapped, final ProgressLogger pl) throws IOException {
+ // First of all, we read the property file to get the relevant data.
+ final FileInputStream propertyFile = new FileInputStream(basename + PROPERTIES_EXTENSION);
+ final Properties properties = new Properties();
+ properties.load(propertyFile);
+ propertyFile.close();
+
+ // Soft check--we accept big stuff, too.
+ if (!EFGraph.class.getName().equals(properties.getProperty(ImmutableGraph.GRAPHCLASS_PROPERTY_KEY).replace("it.unimi.dsi.big.webgraph", "it.unimi.dsi.webgraph"))) throw new IOException(
+ "This class (" + EFGraph.class.getName() + ") cannot load a graph stored using class \"" + properties.getProperty(ImmutableGraph.GRAPHCLASS_PROPERTY_KEY) + "\"");
+
+ if (properties.getProperty("version") == null) throw new IOException("Missing format version information");
+ else if (Integer.parseInt(properties.getProperty("version")) > EFGRAPH_VERSION) throw new IOException("This graph uses format " + properties.getProperty("version")
+ + ", but this class can understand only graphs up to format " + EFGRAPH_VERSION);
+ final long nodes = Long.parseLong(properties.getProperty("nodes"));
+ if (nodes > Integer.MAX_VALUE) throw new IllegalArgumentException("The standard version of WebGraph cannot handle graphs with " + nodes + " (>= 2^31) nodes");
+ final int n = (int)nodes;
+ final long m = Long.parseLong(properties.getProperty("arcs"));
+ final int upperBound = properties.containsKey("upperbound") ? Integer.parseInt(properties.getProperty("upperbound")) : n;
+ final long quantum = Long.parseLong(properties.getProperty("quantum"));
+ final int log2Quantum = Fast.mostSignificantBit(quantum);
+ if (1L << log2Quantum != quantum) throw new IllegalArgumentException("Illegal quantum (must be a power of 2): " + quantum);
+
+ final ByteOrder byteOrder;
+ if (properties.get("byteorder").equals(ByteOrder.BIG_ENDIAN.toString())) byteOrder = ByteOrder.BIG_ENDIAN;
+ else if (properties.get("byteorder").equals(ByteOrder.LITTLE_ENDIAN.toString())) byteOrder = ByteOrder.LITTLE_ENDIAN;
+ else throw new IllegalArgumentException("Unknown byte order " + properties.get("byteorder"));
+
+ final FileInputStream graphIs = new FileInputStream(basename + GRAPH_EXTENSION);
+ final LongBigList graph;
+ if (mapped) graph = ByteBufferLongBigList.map(graphIs.getChannel(), byteOrder);
+ else {
+ if (pl != null) {
+ pl.itemsName = "bytes";
+ pl.start("Loading graph...");
+ }
+
+ graph = loadLongBigList(basename + GRAPH_EXTENSION, byteOrder);
+
+ if (pl != null) {
+ pl.count = graph.size64() * (Long.SIZE / Byte.SIZE);
+ pl.done();
+ }
+
+ graphIs.close();
+ }
+
+ if (pl != null) {
+ pl.itemsName = "deltas";
+ pl.start("Loading offsets...");
+ }
+
+ // We try to load a cached big list.
+ final File offsetsBigListFile = new File(basename + OFFSETS_BIG_LIST_EXTENSION);
+ LongBigList offsets = null;
+
+ if (offsetsBigListFile.exists()) {
+ if (new File(basename + OFFSETS_EXTENSION).lastModified() > offsetsBigListFile.lastModified()) LOGGER
+ .warn("A cached long big list of offsets was found, but the corresponding offsets file has a later modification time");
+ else try {
+ offsets = (LongBigList)BinIO.loadObject(offsetsBigListFile);
+ }
+ catch (ClassNotFoundException e) {
+ LOGGER.warn("A cached long big list of offsets was found, but its class is unknown", e);
+ }
+ }
+
+ if (offsets == null) {
+ final InputBitStream offsetIbs = new InputBitStream(basename + OFFSETS_EXTENSION);
+ offsets = new EliasFanoMonotoneLongBigList(n + 1, graph.size64() * Long.SIZE + 1, new OffsetsLongIterator(offsetIbs, n));
+ offsetIbs.close();
+ }
+
+ if (pl != null) {
+ pl.count = n + 1;
+ pl.done();
+ if (offsets instanceof EliasFanoMonotoneLongBigList) pl.logger().info("Pointer bits per node: " + Util.format(((EliasFanoMonotoneLongBigList)offsets).numBits() / (n + 1.0)));
+ }
+
+ return new EFGraph(basename, n, m, upperBound, log2Quantum, graph, offsets);
+ }
+
+
+ public static void store(ImmutableGraph graph, final int upperBound, final CharSequence basename, final ProgressLogger pl) throws IOException {
+ store(graph, upperBound, basename, DEFAULT_LOG_2_QUANTUM, DEFAULT_CACHE_SIZE, ByteOrder.nativeOrder(), pl);
+ }
+
+ public static void store(ImmutableGraph graph, final CharSequence basename, final ProgressLogger pl) throws IOException {
+ store(graph, basename, DEFAULT_LOG_2_QUANTUM, DEFAULT_CACHE_SIZE, ByteOrder.nativeOrder(), pl);
+ }
+
+ public static void store(ImmutableGraph graph, final CharSequence basename) throws IOException {
+ store(graph, basename, null);
+ }
+
+ private static double stirling(double n) {
+ return n * Math.log(n) - n + (1. / 2) * Math.log(2 * Math.PI * n);
+ }
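+ // stirling(n) is Stirling's approximation of ln(n!): n ln n - n + (1/2) ln(2 pi n). For
+ // example (hand-computed), stirling(10) is about 15.0961, while ln(10!) = ln(3628800) is
+ // about 15.1044. Note that stirling(0) is NaN, since 0 * Math.log(0) evaluates to NaN.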
+
+ public static void store(ImmutableGraph graph, final CharSequence basename, final int log2Quantum, final int cacheSize, final ByteOrder byteOrder, final ProgressLogger pl) throws IOException {
+ store(graph, graph.numNodes(), basename, log2Quantum, cacheSize, byteOrder, pl);
+ }
+
+ public static void store(ImmutableGraph graph, final int upperBound, final CharSequence basename, final int log2Quantum, final int cacheSize, final ByteOrder byteOrder, final ProgressLogger pl)
+ throws IOException {
+ if (log2Quantum < 0) throw new IllegalArgumentException(Integer.toString(log2Quantum));
+
+ final Accumulator successorsAccumulator = new Accumulator(cacheSize, log2Quantum);
+ final FileOutputStream graphOs = new FileOutputStream(basename + GRAPH_EXTENSION);
+ final FileChannel graphChannel = graphOs.getChannel();
+ final LongWordOutputBitStream graphStream = new LongWordOutputBitStream(graphChannel, byteOrder);
+ final OutputBitStream offsets = new OutputBitStream(basename + OFFSETS_EXTENSION);
+
+ long numberOfArcs = 0;
+ long bitsForOutdegrees = 0;
+ long bitsForSuccessors = 0;
+ offsets.writeLongDelta(0);
+
+ if (pl != null) {
+ pl.itemsName = "nodes";
+ try {
+ pl.expectedUpdates = graph.numNodes();
+ }
+ catch (UnsupportedOperationException ignore) {}
+ pl.start("Storing...");
+ }
+
+ for (NodeIterator nodeIterator = graph.nodeIterator(); nodeIterator.hasNext();) {
+ nodeIterator.nextInt();
+ final long outdegree = nodeIterator.outdegree();
+ numberOfArcs += outdegree;
+ long lastSuccessor = 0;
+ final int outdegreeBits = graphStream.writeGamma(outdegree);
+ bitsForOutdegrees += outdegreeBits;
+ successorsAccumulator.init(outdegree, upperBound, false, true, log2Quantum);
+ final LazyIntIterator successors = nodeIterator.successors();
+ for (long successor; (successor = successors.nextInt()) != -1;) {
+ successorsAccumulator.add(successor - lastSuccessor);
+ lastSuccessor = successor;
+ }
+
+ final long successorsBits = successorsAccumulator.dump(graphStream);
+ bitsForSuccessors += successorsBits;
+ offsets.writeLongDelta(outdegreeBits + successorsBits);
+
+ if (pl != null) pl.lightUpdate();
+ }
+
+ successorsAccumulator.close();
+ graphStream.close();
+ graphOs.close();
+ offsets.close();
+
+ final long n = graph.numNodes();
+
+ if (pl != null) {
+ pl.done();
+ if (pl.count != n) throw new IllegalStateException("The graph claimed to have " + graph.numNodes() + " nodes, but the node iterator returned " + pl.count);
+ }
+
+ final DecimalFormat format = new java.text.DecimalFormat("0.###");
+ final long writtenBits = new File(basename + GRAPH_EXTENSION).length() * 8;
+
+ final Properties properties = new Properties();
+ properties.setProperty("nodes", String.valueOf(n));
+ properties.setProperty("arcs", String.valueOf(numberOfArcs));
+ if (upperBound != n) properties.setProperty("upperbound", String.valueOf(upperBound));
+ properties.setProperty("quantum", String.valueOf(1L << log2Quantum));
+ properties.setProperty("byteorder", byteOrder.toString());
+ properties.setProperty("bitsperlink", format.format((double)writtenBits / numberOfArcs));
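+ // compratio compares the bits written with the information-theoretic lower bound
+ // log2 C(n^2, m) for a graph with n nodes and m arcs: ln C(N, m) is approximated as
+ // stirling(N) - stirling(m) - stirling(N - m) with N = n^2, and multiplying writtenBits by
+ // Math.log(2) converts bits to nats so that numerator and denominator share a unit.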
+ properties.setProperty("compratio", format.format(writtenBits * Math.log(2) / (stirling((double)n * n) - stirling(numberOfArcs) - stirling((double)n * n - numberOfArcs))));
+ properties.setProperty("bitspernode", format.format((double)writtenBits / n));
+ properties.setProperty("avgbitsforoutdegrees", format.format((double)bitsForOutdegrees / n));
+ properties.setProperty("bitsforoutdegrees", Long.toString(bitsForOutdegrees));
+ properties.setProperty("bitsforsuccessors", Long.toString(bitsForSuccessors));
+ properties.setProperty(ImmutableGraph.GRAPHCLASS_PROPERTY_KEY, EFGraph.class.getName());
+ properties.setProperty("version", String.valueOf(EFGRAPH_VERSION));
+ final FileOutputStream propertyFile = new FileOutputStream(basename + PROPERTIES_EXTENSION);
+ properties.store(propertyFile, "EFGraph properties");
+ propertyFile.close();
+ }
+
+
+ protected final static class LongWordBitReader {
+
+ private static final boolean DEBUG = false;
+
+ /** The underlying list. */
+ private final LongBigList list;
+ /** The extraction width for {@link #extract()} and {@link #extract(long)}. */
+ private final int l;
+ /** {@link Long#SIZE} minus {@link #l}, cached. */
+ private final int longSizeMinusl;
+ /** The extraction mask for {@link #l} bits. */
+ private final long mask;
+
+ /** The 64-bit buffer, whose lower {@link #filled} bits contain data. */
+ private long buffer;
+ /** The number of lower used bits in {@link #buffer}. */
+ private int filled;
+ /** The current position in the list. */
+ private long curr;
+
+ public LongWordBitReader(final LongBigList list, final int l) {
+ assert l < Long.SIZE;
+ this.list = list;
+ this.l = l;
+ this.longSizeMinusl = Long.SIZE - l;
+ mask = (1L << l) - 1;
+ curr = -1;
+ }
+
+ public LongWordBitReader position(final long position) {
+ if (DEBUG) System.err.println(this + ".position(" + position + ") [buffer = " + Long.toBinaryString(buffer) + ", filled = " + filled + "]");
+
+ buffer = list.getLong(curr = position / Long.SIZE);
+ final int bitPosition = (int)(position % Long.SIZE);
+ buffer >>>= bitPosition;
+ filled = Long.SIZE - bitPosition;
+
+ if (DEBUG) System.err.println(this + ".position() filled: " + filled + " buffer: " + Long.toBinaryString(buffer));
+ return this;
+ }
+
+ public long position() {
+ return curr * Long.SIZE + Long.SIZE - filled;
+ }
+
+ private long extractInternal(final int width) {
+ if (DEBUG) System.err.println(this + ".extract(" + width + ") [buffer = " + Long.toBinaryString(buffer) + ", filled = " + filled + "]");
+
+ if (width <= filled) {
+ long result = buffer & (1L << width) - 1;
+ filled -= width;
+ buffer >>>= width;
+ return result;
+ }
+ else {
+ long result = buffer;
+ buffer = list.getLong(++curr);
+
+ final int remainder = width - filled;
+ // Note that this WON'T WORK if remainder == Long.SIZE, but that's not going to
+ // happen.
+ result |= (buffer & (1L << remainder) - 1) << filled;
+ buffer >>>= remainder;
+ filled = Long.SIZE - remainder;
+ return result;
+ }
+ }
+
+ public long extract() {
+ if (DEBUG) System.err.println(this + ".extract() " + l + " bits [buffer = " + Long.toBinaryString(buffer) + ", filled = " + filled + "]");
+
+ if (l <= filled) {
+ final long result = buffer & mask;
+ filled -= l;
+ buffer >>>= l;
+ return result;
+ }
+ else {
+ long result = buffer;
+ buffer = list.getLong(++curr);
+ result |= buffer << filled & mask;
+ // The shift is safe: l - filled < Long.SIZE because l < Long.SIZE.
+ buffer >>>= l - filled;
+ filled += longSizeMinusl;
+ return result;
+ }
+ }
+
+ public long extract(long position) {
+ if (DEBUG) System.err.println(this + ".extract(" + position + ") [l=" + l + "]");
+
+ final int bitPosition = (int)(position % Long.SIZE);
+ final int totalOffset = bitPosition + l;
+ final long result = list.getLong(curr = position / Long.SIZE) >>> bitPosition;
+
+ if (totalOffset <= Long.SIZE) {
+ buffer = result >>> l;
+ filled = Long.SIZE - totalOffset;
+ return result & mask;
+ }
+
+ final long t = list.getLong(++curr);
+
+ buffer = t >>> totalOffset;
+ filled = 2 * Long.SIZE - totalOffset;
+
+ return result | t << -bitPosition & mask;
+ }
+
+ public int readUnary() {
+ if (DEBUG) System.err.println(this + ".readUnary() [buffer = " + Long.toBinaryString(buffer) + ", filled = " + filled + "]");
+
+ int accumulated = 0;
+
+ for (;;) {
+ if (buffer != 0) {
+ final int msb = Long.numberOfTrailingZeros(buffer);
+ filled -= msb + 1;
+ /* msb + 1 can be Long.SIZE, so we must break down the shift. */
+ buffer >>>= msb;
+ buffer >>>= 1;
+ if (DEBUG) System.err.println(this + ".readUnary() => " + (msb + accumulated));
+ return msb + accumulated;
+ }
+ accumulated += filled;
+ buffer = list.getLong(++curr);
+ filled = Long.SIZE;
+ }
+
+ }
+
+ public long readNonZeroGamma() {
+ final int msb = readUnary();
+ return extractInternal(msb) | (1L << msb);
+ }
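+ // Worked example (hand-computed): on the bits 0 0 1 1 0 (read LSB-first), readUnary()
+ // returns 2, extractInternal(2) returns 0b01 == 1, and the result is 1 | 1L << 2 == 5;
+ // readGamma() would thus return 4.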
+
+ public long readGamma() {
+ return readNonZeroGamma() - 1;
+ }
+ }
+
+
+ @Override
+ public int numNodes() {
+ return n;
+ }
+
+ @Override
+ public long numArcs() {
+ return m;
+ }
+
+ @Override
+ public boolean randomAccess() {
+ return true;
+ }
+
+ @Override
+ public boolean hasCopiableIterators() {
+ return true;
+ }
+
+ @Override
+ public int outdegree(int x) {
+ if (x == cachedNode) return cachedOutdegree;
+ cachedOutdegree = (int)outdegreeLongWordBitReader.position(offsets.getLong(cachedNode = x)).readGamma();
+ cachedPointer = outdegreeLongWordBitReader.position();
+ return cachedOutdegree;
+ }
+
+
+ protected final static class EliasFanoSuccessorReader extends AbstractLazyIntIterator implements LazyIntSkippableIterator {
+ private final static int SKIPPING_THRESHOLD = 8;
+ /** The number of nodes in the graph. */
+ private final long n;
+ /** The upper bound used at construction time. */
+ private final int upperBound;
+ /** The underlying list. */
+ protected final LongBigList graph;
+ /** The longword bit reader for pointers. */
+ protected final LongWordBitReader skipPointers;
+ /** The starting position of the pointers. */
+ protected final long skipPointersStart;
+ /** The starting position of the upper bits. */
+ protected final long upperBitsStart;
+ /** The longword bit reader for the lower bits. */
+ private final LongWordBitReader lowerBits;
+ /** The starting position of the lower bits. */
+ private final long lowerBitsStart;
+ /** The logarithm of the quantum, cached from the graph. */
+ protected final int log2Quantum;
+ /** The quantum, cached from the graph. */
+ protected final int quantum;
+ /** The size of a pointer. */
+ protected final int pointerSize;
+ /** The outdegree. */
+ protected final long outdegree;
+ /** The 64-bit window. */
+ protected long window;
+ /** The current word position in the list of upper bits. */
+ protected long curr;
+ /** The index of the current prefix sum. */
+ public long currentIndex;
+ /** The number of lower bits. */
+ private final int l;
+ /** The last value returned by {@link #nextInt()}, {@link Integer#MIN_VALUE} if the list has
+ * never been accessed, or {@link LazyIntSkippableIterator#END_OF_LIST} if the list has been
+ * exhausted. */
+ private int last;
+
+ public EliasFanoSuccessorReader(final long n, final int upperBound, final LongBigList graph, final long outdegree, final long skipPointersStart, final int log2Quantum) {
+ this.n = n;
+ this.upperBound = upperBound;
+ this.graph = graph;
+ this.log2Quantum = log2Quantum;
+ this.quantum = 1 << log2Quantum;
+ this.outdegree = outdegree;
+ this.skipPointersStart = skipPointersStart;
+
+ l = lowerBits(outdegree + 1, upperBound);
+ final long numberOfPointers = numberOfPointers(outdegree + 1, upperBound, log2Quantum);
+ pointerSize = pointerSize(outdegree + 1, upperBound);
+
+ lowerBitsStart = skipPointersStart + pointerSize * numberOfPointers;
+ upperBitsStart = lowerBitsStart + l * (outdegree + 1);
+
+ skipPointers = numberOfPointers == 0 ? null : new LongWordBitReader(graph, pointerSize);
+ (lowerBits = new LongWordBitReader(graph, l)).position(lowerBitsStart);
+ position(upperBitsStart);
+ last = Integer.MIN_VALUE;
+ }
+
+ private void position(final long position) {
+ window = graph.getLong(curr = position / Long.SIZE) & -1L << (int)(position);
+ }
+
+ private long getNextUpperBits() {
+ while (window == 0)
+ window = graph.getLong(++curr);
+ final long upperBits = curr * Long.SIZE + Long.numberOfTrailingZeros(window) - currentIndex++ - upperBitsStart;
+ window &= window - 1;
+ return upperBits;
+ }
+
+ @Override
+ public int nextInt() {
+ if (currentIndex >= outdegree) {
+ last = END_OF_LIST;
+ return -1;
+ }
+ return last = (int)(getNextUpperBits() << l | lowerBits.extract());
+ }
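+ // Decoding mirrors Accumulator.add(): the i-th one of the upper-bits array, found at
+ // position p (relative to upperBitsStart), yields the upper bits p - i (getNextUpperBits()
+ // subtracts currentIndex), which are then combined with the next l lower bits. For instance
+ // (hand-computed, assuming l == 2), ones at positions 0, 2, 4 with lower bits 11, 01, 01
+ // decode to 3, 5, 9.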
+
+ @Override
+ public int skipTo(final int lowerBound) {
+ if (lowerBound <= last) return last;
+ final long zeroesToSkip = lowerBound >>> l;
+ // Masking with -1 >>> 1 turns last == Integer.MIN_VALUE (nothing returned yet) into 0.
+ long delta = zeroesToSkip - ((last & (-1 >>> 1)) >>> l);
+ assert delta >= 0;
+
+ if (delta < SKIPPING_THRESHOLD) {
+ do
+ nextInt();
+ while (last < lowerBound);
+ return last == n ? last = END_OF_LIST : last;
+ }
+
+ if (delta > quantum) {
+ final long block = zeroesToSkip >>> log2Quantum;
+ assert block > 0;
+ assert block <= numberOfPointers(outdegree + 1, upperBound, log2Quantum);
+ final long blockZeroes = block << log2Quantum;
+ final long skip = skipPointers.extract(skipPointersStart + (block - 1) * pointerSize);
+ assert skip != 0;
+ position(upperBitsStart + skip);
+ currentIndex = skip - blockZeroes;
+ delta = zeroesToSkip - curr * Long.SIZE + currentIndex + upperBitsStart;
+ }
+
+ assert delta >= 0 : delta;
+
+ for (int bitCount; (bitCount = Long.bitCount(~window)) < delta;) {
+ window = graph.getLong(++curr);
+ delta -= bitCount;
+ currentIndex += Long.SIZE - bitCount;
+ }
+
+ /* Note that for delta == 1 the following code is a NOP, but the test for zero is so much
+ * faster that it is not worth replacing it with a test for > 1. Predecrementing won't work,
+ * as delta might be zero. */
+ if (delta-- != 0) {
+ // Phase 1: sums by byte
+ final long word = ~window;
+ assert delta < Long.bitCount(word) : delta + " >= " + Long.bitCount(word);
+ long byteSums = word - ((word & 0xa * ONES_STEP_4) >>> 1);
+ byteSums = (byteSums & 3 * ONES_STEP_4) + ((byteSums >>> 2) & 3 * ONES_STEP_4);
+ byteSums = (byteSums + (byteSums >>> 4)) & 0x0f * ONES_STEP_8;
+ byteSums *= ONES_STEP_8;
+
+ // Phase 2: compare each byte sum with delta to obtain the relevant byte
+ final long rankStep8 = delta * ONES_STEP_8;
+ final long byteOffset = (((((rankStep8 | MSBS_STEP_8) - byteSums) & MSBS_STEP_8) >>> 7) * ONES_STEP_8 >>> 53) & ~0x7;
+
+ final int byteRank = (int)(delta - (((byteSums << 8) >>> byteOffset) & 0xFF));
+
+ final int select = (int)(byteOffset + Fast.selectInByte[(int)(word >>> byteOffset & 0xFF) | byteRank << 8]);
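+ // Worked example (hand-computed, assuming ONES_STEP_8 == 0x0101010101010101L and
+ // MSBS_STEP_8 == 0x8080808080808080L, as the names suggest): if word == 0x0301L (ones at
+ // bits 0, 8 and 9) and delta == 1, Phase 1 leaves the prefix counts 1, 3, 3, ..., 3 in the
+ // bytes of byteSums; only byte 0 has a count <= delta, so byteOffset == 8, byteRank == 0,
+ // and select == 8, the position of the second one of word (a zero of window).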
+
+ // We cancel up to, but not including, the target one.
+ window &= -1L << select;
+ currentIndex += select - delta;
+ }
+
+ final long lower = lowerBits.extract(lowerBitsStart + l * currentIndex);
+ last = (int)(getNextUpperBits() << l | lower);
+
+ for (;;) {
+ if (last >= lowerBound) return last == n ? last = END_OF_LIST : last;
+ nextInt();
+ }
+ }
+
+ @Override
+ public String toString() {
+ return this.getClass().getSimpleName() + '@' + Integer.toHexString(System.identityHashCode(this));
+ }
+ }
+
+ @Override
+ public LazyIntSkippableIterator successors(final int x) {
+ return new EliasFanoSuccessorReader(n, upperBound, graph, outdegree(x), cachedPointer, log2Quantum);
+ }
+
+ @Override
+ public EFGraph copy() {
+ return new EFGraph(basename, n, m, upperBound, log2Quantum, graph instanceof ByteBufferLongBigList ? ((ByteBufferLongBigList)graph).copy() : graph, offsets);
+ }
+
+ public static void main(String args[]) throws SecurityException, IllegalAccessException, InvocationTargetException, NoSuchMethodException, IOException, JSAPException, ClassNotFoundException,
+ InstantiationException {
+ String source, dest;
+ Class<?> graphClass;
+
+ final SimpleJSAP jsap = new SimpleJSAP(
+ EFGraph.class.getName(),
+ "Compresses a graph using the Elias-Fano representation. Source and destination are basenames from which suitable filenames will be stemmed; alternatively, if the suitable option was specified, source is a spec (see below). For more information about the compression techniques, see the Javadoc documentation.",
+ new Parameter[] {
+ new FlaggedOption("graphClass", GraphClassParser.getParser(), null, JSAP.NOT_REQUIRED, 'g', "graph-class", "Forces a Java class for the source graph."),
+ new Switch("spec", 's', "spec", "The source is not a basename but rather a specification of the form <ImmutableGraphImplementation>(arg,arg,...)."),
+ new FlaggedOption("logInterval", JSAP.LONG_PARSER, Long.toString(ProgressLogger.DEFAULT_LOG_INTERVAL), JSAP.NOT_REQUIRED, 'l', "log-interval",
+ "The minimum time interval between activity logs in milliseconds."),
+ new FlaggedOption("log2Quantum", JSAP.INTEGER_PARSER, Integer.toString(DEFAULT_LOG_2_QUANTUM), JSAP.NOT_REQUIRED, 'q', "log2-quantum",
+ "The base-two logarithm of the indexing quantum."),
+ new Switch("offline", 'o', "offline", "No-op for backward compatibility."),
+ new Switch("once", '1', "once", "Use the read-once load method to read a graph from standard input."),
+ new Switch("list", 'L', "list", "Precomputes an Elias-Fano list of offsets for the source graph."),
+ new Switch("fixedWidthList", 'F', "fixed-width-list", "Precomputes a list of fixed-width offsets for the source graph."),
+ new UnflaggedOption("sourceBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY,
+ "The basename of the source graph, or a source spec if --spec was given; it is immaterial when --once is specified."),
+ new UnflaggedOption("destBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, JSAP.NOT_GREEDY,
+ "The basename of the destination graph; if omitted, no recompression is performed. This is useful in conjunction with --list and --fixed-width-list."),
+ }
+ );
+
+ final JSAPResult jsapResult = jsap.parse(args);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final boolean once = jsapResult.getBoolean("once");
+ final boolean spec = jsapResult.getBoolean("spec");
+ final boolean list = jsapResult.getBoolean("list");
+ final boolean fixedWidthList = jsapResult.getBoolean("fixedWidthList");
+ final int log2Quantum = jsapResult.getInt("log2Quantum");
+ graphClass = jsapResult.getClass("graphClass");
+ source = jsapResult.getString("sourceBasename");
+ dest = jsapResult.getString("destBasename");
+
+ final ImmutableGraph graph;
+ final ProgressLogger pl = new ProgressLogger(LOGGER, jsapResult.getLong("logInterval"), TimeUnit.MILLISECONDS);
+
+ if (graphClass != null) {
+ if (spec) {
+ System.err.println("Options --graph-class and --spec are incompatible");
+ System.exit(1);
+ }
+ if (once) graph = (ImmutableGraph)graphClass.getMethod(LoadMethod.ONCE.toMethod(), InputStream.class).invoke(null, System.in);
+ else graph = (ImmutableGraph)graphClass.getMethod(LoadMethod.OFFLINE.toMethod(), CharSequence.class).invoke(null, source);
+ }
+ else {
+ if (!spec) graph = once ? ImmutableGraph.loadOnce(System.in) : ImmutableGraph.loadOffline(source, pl);
+ else graph = ObjectParser.fromSpec(source, ImmutableGraph.class, GraphClassParser.PACKAGE);
+ }
+
+ if (dest != null) {
+ if (list || fixedWidthList) throw new IllegalArgumentException("You cannot specify a destination graph with --list or --fixed-width-list");
+ EFGraph.store(graph, dest, log2Quantum, DEFAULT_CACHE_SIZE, ByteOrder.nativeOrder(), pl);
+ }
+ else {
+ if (!(graph instanceof EFGraph)) throw new IllegalArgumentException("The source graph is not an EFGraph");
+ final InputBitStream offsets = new InputBitStream(graph.basename() + OFFSETS_EXTENSION);
+ final long sizeInBits = new File(graph.basename() + GRAPH_EXTENSION).length() * Byte.SIZE + 1;
+ final OffsetsLongIterator offsetsIterator = new OffsetsLongIterator(offsets, graph.numNodes());
+ if (list) {
+ BinIO.storeObject(new EliasFanoMonotoneLongBigList(graph.numNodes() + 1, sizeInBits, offsetsIterator), graph.basename() + OFFSETS_BIG_LIST_EXTENSION);
+ }
+ else if (fixedWidthList) {
+ final LongBigList t = LongArrayBitVector.getInstance().asLongBigList(Fast.length(sizeInBits));
+ while (offsetsIterator.hasNext())
+ t.add(offsetsIterator.nextLong());
+ BinIO.storeObject(t, graph.basename() + OFFSETS_BIG_LIST_EXTENSION);
+ }
+ offsets.close();
+ }
+ }
+}
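The successor reader's skipping code above locates the target bit inside a 64-bit word with a broadword select: Phase 1 computes per-byte popcounts and turns them into prefix sums with a single multiplication, Phase 2 compares every prefix sum with the rank in parallel to find the right byte, and a byte-level lookup table (`Fast.selectInByte`) finishes the job. The following is a minimal, self-contained sketch of the same technique, cross-checked against a naive bit-clearing loop. It assumes the usual values of the constants (`ONES_STEP_4 = 0x1111...`, `ONES_STEP_8 = 0x0101...`, `MSBS_STEP_8 = 0x8080...`), which are not shown in this excerpt, and it builds its own byte table in place of `Fast.selectInByte`; the class and method names are hypothetical.

```java
import java.util.Random;

/** Hypothetical sketch of broadword select-in-word, modelled on the Phase 1/Phase 2 code above. */
public final class BroadwordSelect {
    // Assumed values of the constants used by the reader (not shown in the excerpt).
    private static final long ONES_STEP_4 = 0x1111111111111111L;
    private static final long ONES_STEP_8 = 0x0101010101010101L;
    private static final long MSBS_STEP_8 = 0x80L * ONES_STEP_8;

    // SELECT_IN_BYTE[b | r << 8] = position of the r-th (0-based) set bit of byte b.
    private static final byte[] SELECT_IN_BYTE = new byte[8 * 256];
    static {
        for (int b = 0; b < 256; b++)
            for (int pos = 0, r = 0; pos < 8; pos++)
                if ((b >>> pos & 1) != 0) SELECT_IN_BYTE[b | r++ << 8] = (byte)pos;
    }

    private BroadwordSelect() {}

    /** Position of the rank-th (0-based) set bit of word; requires 0 <= rank < Long.bitCount(word). */
    public static int select(final long word, final long rank) {
        // Phase 1: popcount of each byte, then prefix sums (byte i counts the ones in bytes 0..i).
        long byteSums = word - ((word & 0xa * ONES_STEP_4) >>> 1);
        byteSums = (byteSums & 3 * ONES_STEP_4) + ((byteSums >>> 2) & 3 * ONES_STEP_4);
        byteSums = (byteSums + (byteSums >>> 4)) & 0x0f * ONES_STEP_8;
        byteSums *= ONES_STEP_8;
        // Phase 2: count the bytes whose prefix sum is <= rank; eight times that count is the
        // bit offset of the byte containing the target bit.
        final long rankStep8 = rank * ONES_STEP_8;
        final long byteOffset = (((((rankStep8 | MSBS_STEP_8) - byteSums) & MSBS_STEP_8) >>> 7) * ONES_STEP_8 >>> 53) & ~0x7;
        // Rank of the target bit inside its byte: subtract the ones in all preceding bytes.
        final int byteRank = (int)(rank - (((byteSums << 8) >>> byteOffset) & 0xFF));
        return (int)(byteOffset + SELECT_IN_BYTE[(int)(word >>> byteOffset & 0xFF) | byteRank << 8]);
    }

    /** Naive reference: clear the lowest set bit rank times, then take the lowest remaining one. */
    public static int selectNaive(long word, final long rank) {
        for (long i = 0; i < rank; i++) word &= word - 1;
        return Long.numberOfTrailingZeros(word);
    }

    public static void main(final String[] args) {
        final Random random = new Random(0);
        for (int t = 0; t < 100_000; t++) {
            final long word = random.nextLong();
            if (word == 0) continue;
            final int rank = random.nextInt(Long.bitCount(word));
            if (select(word, rank) != selectNaive(word, rank)) throw new AssertionError(word + " " + rank);
        }
        System.out.println(select(0b1011_0100L, 2)); // set bits at 2, 4, 5, 7: prints 5
    }
}
```

In the reader the argument is `~window` and the rank is the predecremented `delta`, so the selected bit is a zero of the current window. The naive loop is only meant as a test oracle: it costs one iteration per skipped one, while the broadword version runs in a constant number of operations.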
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/GraphClassParser.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/GraphClassParser.java
new file mode 100644
index 0000000..9de6a76
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/GraphClassParser.java
@@ -0,0 +1,77 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import com.martiansoftware.jsap.ParseException;
+import com.martiansoftware.jsap.stringparsers.ClassStringParser;
+
+/** A small wrapper around JSAP's standard {@link ClassStringParser}. It
+ * tries to prefix the package names in {@link #PACKAGE} to the provided
+ * class name, making the specification of graph classes on the command line much easier. */
+
+public class GraphClassParser extends ClassStringParser {
+ /** The packages that will be prepended to each graph class. */
+ public final static String[] PACKAGE = { "it.unimi.dsi.webgraph", "it.unimi.dsi.webgraph.labelling" };
+
+ private final static GraphClassParser INSTANCE = new GraphClassParser();
+
+ @SuppressWarnings("deprecation")
+ protected GraphClassParser() {}
+
+ public static ClassStringParser getParser() {
+ return INSTANCE;
+ }
+
+ /** Parses the given class name, first trying to prepend each of the package names found in {@link #PACKAGE}.
+ * @param className the name of a class, possibly without package specification.
+ */
+ @Override
+ public Object parse(String className) throws ParseException {
+ for(String p: PACKAGE) {
+ try {
+ return super.parse(p + "." + className);
+ }
+ catch(Exception notFound) {}
+ }
+ return super.parse(className);
+ }
+
+ /** @deprecated Use {@link it.unimi.dsi.lang.ObjectParser#fromSpec(String, Class, String[], String[])}. */
+ @Deprecated
+ public static ImmutableGraph getGraphFromSpec(String spec) throws ParseException {
+ int parPos = spec.indexOf('(');
+ if (parPos < 0 || spec.charAt(spec.length() - 1) != ')') throw new ParseException("Wrong parenthesis in " + spec);
+ String className = spec.substring(0, parPos);
+
+ Class<?> c;
+ try {
+ c = (Class<?>)INSTANCE.parse(className);
+ if (!ImmutableGraph.class.isAssignableFrom(c)) throw new ParseException(className + " is not a valid ImmutableGraph class");
+
+ return (ImmutableGraph)c.getConstructor(String[].class).newInstance((Object)spec.substring(parPos + 1, spec.length() - 1).split(","));
+ }
+ catch (ParseException e) {
+ throw e;
+ }
+ catch (Exception e) {
+ throw new ParseException("Parse exception in spec " + spec, e);
+ }
+ }
+
+}
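`GraphClassParser.parse()` resolves a short class name by trying each package in `PACKAGE` as a prefix before falling back to the name as given, and the deprecated `getGraphFromSpec()` splits a spec of the form `Name(arg,arg,...)` at the first parenthesis. The sketch below reproduces both steps with plain `Class.forName`, without JSAP, to make the lookup order explicit. `PrefixedClassResolver`, `resolve` and `splitSpec` are hypothetical names; unlike the original, which swallows any `Exception`, this sketch treats only `ClassNotFoundException` as a miss.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical, JSAP-free sketch of GraphClassParser's lookup order and of spec splitting. */
public final class PrefixedClassResolver {
    private final List<String> prefixes;

    public PrefixedClassResolver(final String... prefixes) {
        this.prefixes = Arrays.asList(prefixes.clone());
    }

    /** Tries every prefix in order, then the name unchanged. */
    public Class<?> resolve(final String className) {
        for (final String p : prefixes) {
            try {
                return Class.forName(p + "." + className);
            } catch (final ClassNotFoundException notFound) {
                // Not in this package: try the next prefix.
            }
        }
        try {
            return Class.forName(className);
        } catch (final ClassNotFoundException e) {
            throw new IllegalArgumentException("Class not found: " + className, e);
        }
    }

    /** Splits "Name(arg,arg,...)" into { Name, arg, arg, ... }, as getGraphFromSpec() does. */
    public static String[] splitSpec(final String spec) {
        final int parPos = spec.indexOf('(');
        if (parPos < 0 || spec.charAt(spec.length() - 1) != ')')
            throw new IllegalArgumentException("Wrong parenthesis in " + spec);
        final String[] args = spec.substring(parPos + 1, spec.length() - 1).split(",");
        final String[] result = new String[args.length + 1];
        result[0] = spec.substring(0, parPos);
        System.arraycopy(args, 0, result, 1, args.length);
        return result;
    }

    public static void main(final String[] args) {
        final PrefixedClassResolver r = new PrefixedClassResolver("java.util", "java.lang");
        System.out.println(r.resolve("ArrayList").getName());        // java.util.ArrayList
        System.out.println(r.resolve("String").getName());           // java.lang.String
        System.out.println(r.resolve("java.io.File").getName());     // java.io.File
        System.out.println(Arrays.toString(splitSpec("Graph(a,b)"))); // [Graph, a, b]
    }
}
```

Because prefixes are tried first, a short name that exists in more than one listed package resolves to the first package in the list; a fully qualified name normally still works, since every prefixed attempt fails and the final call uses the name unchanged.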
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/ImmutableGraph.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/ImmutableGraph.java
new file mode 100644
index 0000000..fab9728
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/ImmutableGraph.java
@@ -0,0 +1,746 @@
+package it.unimi.dsi.webgraph;
+
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.lang.reflect.InvocationTargetException;
+import java.util.Arrays;
+import java.util.NoSuchElementException;
+import java.util.Properties;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import it.unimi.dsi.fastutil.ints.IntIterator;
+import it.unimi.dsi.lang.FlyweightPrototype;
+import it.unimi.dsi.logging.ProgressLogger;
+
+/** A simple abstract class representing an immutable graph.
+ *
+ * <P>Subclasses of this class are used to create and access <em>immutable graphs</em>, that is,
+ * graphs that are computed once and for all, stored conveniently, and then accessed repeatedly.
+ * Moreover, immutable graphs are usually very large&mdash;so large that two such graphs may not
+ * fit into main memory at the same time (the typical example being a sizable portion of the web).
+ *
+ * <P>A subclass of this class must implement methods to obtain the {@linkplain
+ * #numNodes() number of nodes}, the {@linkplain #outdegree(int) outdegree of a
+ * node} and the successors of a node (either {@link #successors(int)}
+ * or {@link #successorArray(int)}). Additionally, it may provide methods to
+ * obtain the {@linkplain #numArcs() number of arcs}, and a {@linkplain #basename() basename}.
+ *
+ * <P>This class provides {@link #equals(Object)} and {@link #hashCode()} methods that consider
+ * two graphs equal if they have the same size and all their successor lists are equal.
+ *
+ * <H2>Iterating on successors</H2>
+ *
+ * <p>Starting with WebGraph 2.0, the iterator architecture is <em>fully lazy</em>&mdash;you have no
+ * <code>hasNext()</code> method. Rather, the {@link LazyIntIterator} returned by {@link #successors(int)}
+ * will return -1 when no more successors are available. The idiomatic forms for enumerating successors
+ * <i>via</i> iterators are
+ * <pre>
+ * LazyIntIterator successors = g.successors(x);
+ * int d = g.outdegree(x);
+ * while(d-- != 0) doSomething(successors.nextInt());
+ * </pre>
+ * and
+ * <pre>
+ * LazyIntIterator successors = g.successors(x);
+ * int t;
+ * while((t = successors.nextInt()) != -1) doSomething(t);
+ * </pre>
+ *
+ * <p>The alternative method {@link #successorArray(int)} provides an array containing the successors
+ * <em>and possibly more elements</em>. Use {@link #outdegree(int)} to know how many elements are valid.
+ * The efficiency of {@link #successors(int)} and {@link #successorArray(int)} may vary depending on the
+ * implementation.
+ *
+ * <H2>Iterating on a graph in parallel</H2>
+ *
+ * <p>You can scan a graph sequentially using {@linkplain NodeIterator node iterators}. Starting with version 3.5.0,
+ * implementations of this class may return true on {@link #hasCopiableIterators()}, which means that
+ * node iterators implement the optional {@link NodeIterator#copy(int) copy(int)} method. Using {@link NodeIterator#copy(int) copy(int)},
+ * the method {@link #splitNodeIterators(int)} of this class is able to provide separate, thread-safe iterators on different segments
+ * of contiguous nodes of the graph. The class {@link BVGraph}, for example, uses this interface to provide
+ * parallel compression. We suggest that all classes providing parallel iteration read the system property
+ * {@value #NUMBER_OF_THREADS_PROPERTY} to override the number of parallel threads.
+ *
+ *
+ * <H2>Building an immutable graph</H2>
+ *
+ * <P>Due to their large size, immutable
+ * graphs have a peculiar serialisation scheme. Every subclass of this class
+ * <strong>must</strong> implement a number of static methods that create an immutable
+ * graph, given a string (usually a basename for a set of files) and, optionally, a {@link it.unimi.dsi.logging.ProgressLogger}.
+ * The signatures that <strong>must</strong> be implemented are
+ * <UL>
+ * <LI><code>ImmutableGraph load(CharSequence, ProgressLogger)</code>;
+ * <LI><code>ImmutableGraph load(CharSequence)</code>;
+ * <LI><code>ImmutableGraph loadOffline(CharSequence, ProgressLogger)</code>;
+ * <LI><code>ImmutableGraph loadOffline(CharSequence)</code>;
+ * <LI><code>ImmutableGraph loadOnce(InputStream)</code>.
+ * </UL>
+ *
+ * <p>Additionally, the following signatures <strong>can</strong> be implemented:
+ * <UL>
+ * <LI><code>ImmutableGraph loadMapped(CharSequence, ProgressLogger)</code>;
+ * <LI><code>ImmutableGraph loadMapped(CharSequence)</code>.
+ * </UL>
+ *
+ * <p>The special semantics associated with <code>loadOffline()</code>
+ * is that the immutable graph should be set up, and possibly some metadata could be read from disk, but no
+ * actual data is loaded into memory; the class should guarantee that offline sequential access (i.e., by means
+ * of {@link #nodeIterator(int)}) is still possible. In other words, in most cases {@link #nodeIterator(int)} will have to be
+ * overridden by the subclasses to behave properly even in an offline setting (see {@link #nodeIterator()}).
+ * The special semantics associated with <code>loadOnce()</code> is that the graph can be traversed
+ * <em>just once</em> using a call to {@link #nodeIterator()}. The special semantics associated with <code>loadMapped()</code>
+ * is that metadata could be read from disk, but the graph will be accessed by memory mapping; the class
+ * should guarantee that random access is possible.
+ *
+ * <P>Note that a simple class may just implement all special forms of graph loading by delegating to the standard
+ * load method (see, e.g., {@link it.unimi.dsi.webgraph.ASCIIGraph}).
+ * Specific implementations of {@link ImmutableGraph} may also decide to expose internal load methods
+ * to make it easier to write load methods for subclasses
+ * (see, e.g., {@link it.unimi.dsi.webgraph.BVGraph#loadInternal(CharSequence, int, ProgressLogger) loadInternal()}).
+ *
+ * <P>Analogously, a subclass of this class <strong>may</strong> also implement
+ * <UL>
+ * <LI><code>store(ImmutableGraph, CharSequence, ProgressLogger)</code>;
+ * <LI><code>store(ImmutableGraph, CharSequence)</code>.
+ * </UL>
+ *
+ * These methods must store in compressed form a given immutable graph, using the default values
+ * for compression parameters, etc. It is likely, however, that more
+ * <code>store</code> methods are available, as parameters vary wildly
+ * from subclass to subclass. The method {@link #store(Class, ImmutableGraph, CharSequence, ProgressLogger)}
+ * invokes by reflection the methods above on the provided class.
+ *
+ * <P>The standard method to build a new immutable graph is creating a (possibly anonymous) class
+ * that extends this class, and saving it using a concrete subclass (e.g., {@link it.unimi.dsi.webgraph.BVGraph}). See
+ * the source of {@link it.unimi.dsi.webgraph.Transform} for several examples.
+ *
+ * <H2>Properties Conventions</H2>
+ *
+ * <P>To provide a simple way to load an immutable graph without knowing in advance its class,
+ * the following convention may be followed: a graph with basename <var><code>name</code></var> may feature
+ * a Java property file <code><var>name</var>.properties</code> with a property <code>graphclass</code>
+ * containing the actual class of the graph. In this case, you can use the implementation of the load/store
+ * methods contained in this class, similarly to the standard Java serialisation scheme. {@link BVGraph}, for instance,
+ * follows this convention, but {@link ASCIIGraph} does not.
+ *
+ * <P>The reason why this convention is not enforced is that it is sometimes useful to write lightweight classes,
+ * mostly for debugging purposes, whose graph representation is entirely contained in a single file (e.g., {@link ASCIIGraph}),
+ * so that {@link #loadOnce(InputStream)} can be easily implemented.
+ *
+ * <H2>Facilities for loading an immutable graph</H2>
+ *
+ * <P>{@link ImmutableGraph} provides ready-made implementations of the load methods that work as follows: they
+ * open a property file with the given basename and look for the <code>graphclass</code> property; then, they simply
+ * delegate the actual load to the specified graph class by reflection.
+ *
+ * <h2>Thread-safety and flyweight copies</h2>
+ *
+ * <p>Implementations of this class need not be thread-safe. However, they implement the
+ * {@link FlyweightPrototype} pattern: the {@link #copy()} method is
+ * thread-safe and will return a lightweight copy of the graph&mdash;usually, all immutable
+ * data will be shared between copies. Concurrent access to different copies is safe.
+ *
+ * <p>Note that by contract {@link #copy()} is guaranteed to work only if {@link #randomAccess()}
+ * returns true.
+ */
+
+
+public abstract class ImmutableGraph implements FlyweightPrototype<ImmutableGraph> {
+ private final static Logger LOGGER = LoggerFactory.getLogger(ImmutableGraph.class);
+
+ public static final String GRAPHCLASS_PROPERTY_KEY = "graphclass";
+ /** The standard extension of property files. */
+ public static final String PROPERTIES_EXTENSION = ".properties";
+ /** The property used to set the number of parallel compression threads. */
+ public static final String NUMBER_OF_THREADS_PROPERTY = "it.unimi.dsi.webgraph.threads";
+
+ private final static class ImmutableGraphNodeIterator extends NodeIterator {
+ private final ImmutableGraph graph;
+ private final int from;
+ private final int to;
+ private int curr;
+
+ private ImmutableGraphNodeIterator(final ImmutableGraph graph, final int from, final int to) {
+ this.graph = graph;
+ this.from = from;
+ curr = from - 1;
+ this.to = Math.min(graph.numNodes(), to);
+ }
+
+ @Override
+ public int nextInt() {
+ if (! hasNext()) throw new java.util.NoSuchElementException();
+ return ++curr;
+ }
+
+ @Override
+ public boolean hasNext() {
+ return curr < to - 1;
+ }
+
+ @Override
+ public LazyIntIterator successors() {
+ if (curr == from - 1) throw new IllegalStateException();
+ return graph.successors(curr);
+ }
+
+ @Override
+ public int outdegree() {
+ if (curr == from - 1) throw new IllegalStateException();
+ return graph.outdegree(curr);
+ }
+
+ @Override
+ public NodeIterator copy(final int upperBound) {
+ return new ImmutableGraphNodeIterator(graph.copy(), curr + 1, upperBound);
+ }
+ }
+
+ /** A list of the methods that can be used to load a graph. They are used
+ * by {@link ImmutableGraph} and other classes to represent standard
+ * (i.e., random access), sequential (deprecated), offline, read-once and memory-mapped graph loading. */
+
+ public static enum LoadMethod {
+ STANDARD,
+ @Deprecated
+ SEQUENTIAL,
+ OFFLINE,
+ ONCE,
+ MAPPED;
+
+ public String toMethod() {
+ switch(this) {
+ case STANDARD: return "load";
+ case SEQUENTIAL: return "loadSequential";
+ case OFFLINE: return "loadOffline";
+ case ONCE: return "loadOnce";
+ case MAPPED: return "loadMapped";
+ default: throw new AssertionError();
+ }
+ }
+ };
+
+ /** Returns the number of nodes of this graph.
+ *
+ * <p>Although this method is not optional, implementations are allowed to throw
+ * an {@link UnsupportedOperationException} if this graph has never been entirely
+ * traversed using a {@link #nodeIterator() node iterator}. This apparently bizarre
+ * behaviour is necessary to support implementations such as {@link ArcListASCIIGraph}, which
+ * do not know the actual number of nodes until a traversal has been completed.
+ *
+ * @return the number of nodes.
+ */
+ public abstract int numNodes();
+
+ /** Returns the number of arcs of this graph (optional operation).
+ *
+ * @return the number of arcs.
+ */
+ public long numArcs() {
+ throw new UnsupportedOperationException();
+ }
+
+ /** Checks whether this graph provides random access to successor lists.
+ *
+ * @return true if this graph provides random access to successor lists.
+ */
+ public abstract boolean randomAccess();
+
+ /** Returns whether the node iterators returned by this graph support {@link NodeIterator#copy(int)}.
+ *
+ * <p>This implementation returns the value of {@link #randomAccess()}.
+ *
+ * @return true if this graph provides copiable iterators.
+ */
+ public boolean hasCopiableIterators() {
+ return randomAccess();
+ }
+
+ /** Returns a symbolic basename for this graph (optional operation).
+ *
+ * <P>Implementors of this class may provide a basename (usually
+ * a pathname from which various files storing the graph are stemmed).
+ * This method is optional because it is sometimes meaningless (e.g.,
+ * for one-off anonymous classes).
+ *
+ * @return the basename.
+ */
+ public CharSequence basename() {
+ throw new UnsupportedOperationException();
+ }
+
+ /** Returns a lazy iterator over the successors of a given node. The iteration terminates
+ * when -1 is returned.
+ *
+ * <P>This implementation just wraps the array returned by {@link #successorArray(int)}. Subclasses
+ * are encouraged to override this implementation.
+ *
+ * <p>The semantics of this method has been significantly modified in WebGraph 2.0 to take advantage of the new,
+ * faster lazy architecture.
+ *
+ * @param x a node.
+ * @return a lazy iterator over the successors of the node.
+ */
+ public LazyIntIterator successors(final int x) {
+ return LazyIntIterators.wrap(successorArray(x), outdegree(x));
+ }
+
+ /** Returns a reference to an array containing the successors of a given node.
+ *
+ * <P>The returned array may contain more entries than the outdegree of <code>x</code>.
+ * However, only those with indices from 0 (inclusive) to the outdegree of <code>x</code> (exclusive)
+ * contain valid data.
+ *
+ * <P>This implementation just unwraps the iterator returned by {@link #successors(int)}. Subclasses
+ * are encouraged to override this implementation.
+ *
+ * <p><strong>Warning</strong>: all implementations must guarantee that a distinct array is returned for
+ * each node. The caller, in turn, must treat the array as a read-only object.
+ *
+ *
+ * @param x a node.
+ * @return an array whose first elements are the successors of the node; the array must not
+ * be modified by the caller.
+ */
+ public int[] successorArray(final int x) {
+ final int[] successor = new int[outdegree(x)];
+ LazyIntIterators.unwrap(successors(x), successor);
+ return successor;
+ }
+
+ /** Returns the outdegree of a node.
+ *
+ * @param x a node.
+ * @throws IllegalStateException if called without offsets.
+ * @return the outdegree of the given node.
+ */
+ public abstract int outdegree(int x);
+
+ /** Returns a node iterator for scanning the graph sequentially, starting from the given node.
+ *
+ * <P>This implementation just calls the random-access methods ({@link #successors(int)} and
+ * {@link #outdegree(int)}). More specific implementations may choose to maintain some extra state
+ * to make the enumeration more efficient.
+ *
+ * @param from the node from which the iterator will iterate.
+ * @return a {@link NodeIterator} for accessing nodes and successors sequentially.
+ */
+ public NodeIterator nodeIterator(final int from) {
+ return new ImmutableGraphNodeIterator(this, from, Integer.MAX_VALUE);
+ }
+
+ /** Returns a node iterator for scanning the graph sequentially, starting from the first node.
+ *
+ * @return a {@link NodeIterator} for accessing nodes and successors sequentially.
+ */
+ public NodeIterator nodeIterator() {
+ return nodeIterator(0);
+ }
+
+ /** Returns an array of node iterators, each scanning a portion of the nodes of
+ * this graph. Iterators are guaranteed to scan mutually disjoint sets of nodes,
+ * and every node is guaranteed to be scanned by one iterator.
+ *
+ * <p>This is an optional operation. If implemented, though, the returned iterators must
+ * properly implement {@link NodeIterator#copy(int)}.
+ *
+ * @param howMany the number of iterators to be returned (at the end of the array, some of them may be empty).
+ * @return the required iterators.
+ */
+ public NodeIterator[] splitNodeIterators(final int howMany) {
+ if (numNodes() == 0 && howMany == 0) return new NodeIterator[0];
+ if (howMany < 1) throw new IllegalArgumentException();
+ final NodeIterator[] result = new NodeIterator[howMany];
+ if (! hasCopiableIterators()) {
+ // No possibility to split
+ result[0] = nodeIterator();
+ return result;
+ }
+ final int n = numNodes();
+ final int m = (int)Math.ceil((double)n / howMany);
+ if (randomAccess()) {
+ int from, i;
+ // This approach is slightly wasteful, but replicating the state should have an infinitesimal cost.
+ for (from = i = 0; from < n; from += m, i++) result[i] = nodeIterator(from).copy(from + m);
+ Arrays.fill(result, i, result.length, NodeIterator.EMPTY);
+ return result;
+ } else {
+ final NodeIterator nodeIterator = nodeIterator();
+ int i = 0;
+ int nextNode = 0;
+ while (i < result.length && nodeIterator.hasNext()) {
+ if (nextNode % m == 0) result[i++] = nodeIterator.copy(nextNode + m);
+ final int node = nodeIterator.nextInt();
+ assert node == nextNode;
+ nextNode++;
+ }
+ Arrays.fill(result, i, result.length, NodeIterator.EMPTY);
+ return result;
+ }
+ }
+
+ /** Returns a flyweight copy of this immutable graph.
+ *
+ * @return a flyweight copy of this immutable graph.
+ * @throws UnsupportedOperationException if flyweight copies are not supported:
+ * support is guaranteed only if {@link #randomAccess()} returns true.
+ * @see FlyweightPrototype
+ */
+
+ @Override
+ public abstract ImmutableGraph copy();
+
+ @Override
+ public String toString() {
+ final StringBuilder s = new StringBuilder();
+
+ long numArcs = -1;
+ try {
+ numArcs = numArcs();
+ }
+ catch(final UnsupportedOperationException ignore) {}
+
+ s.append("Nodes: " + numNodes() + "\nArcs: " + (numArcs == -1 ? "unknown" : Long.toString(numArcs)) + "\n");
+
+ final NodeIterator nodeIterator = nodeIterator();
+ LazyIntIterator successors;
+ int curr;
+ for (int i = numNodes(); i-- != 0;) {
+ curr = nodeIterator.nextInt();
+ s.append("Successors of " + curr + " (degree " + nodeIterator.outdegree() + "):");
+ successors = nodeIterator.successors();
+ int d = nodeIterator.outdegree();
+ while (d-- != 0) s.append(" " + successors.nextInt());
+ s.append('\n');
+ }
+ return s.toString();
+ }
+
+ /** Returns an iterator enumerating the outdegrees of the nodes of this graph.
+ *
+ * @return an iterator enumerating the outdegrees of the nodes of this graph.
+ */
+ public IntIterator outdegrees() {
+ return randomAccess() ?
+ new IntIterator() {
+ private final int n = numNodes();
+ private int next = 0;
+ @Override
+ public boolean hasNext() {
+ return next < n;
+ }
+ @Override
+ public int nextInt() {
+ if (! hasNext()) throw new NoSuchElementException();
+ return outdegree(next++);
+ }
+ } :
+ new IntIterator() {
+ private final NodeIterator nodeIterator = nodeIterator();
+ @Override
+ public boolean hasNext() {
+ return nodeIterator.hasNext();
+ }
+ @Override
+ public int nextInt() {
+ nodeIterator.nextInt();
+ return nodeIterator.outdegree();
+ }
+ };
+ }
+
+
+ /** Creates a new {@link ImmutableGraph} by loading a graph file from disk to memory, without
+ * offsets.
+ *
+ * <P>This method uses the properties convention described in the {@linkplain ImmutableGraph introduction}.
+ *
+ * @param basename the basename of the graph.
+ * @return an {@link ImmutableGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ * @deprecated Use {@link #loadOffline(CharSequence)} or {@link #loadMapped(CharSequence)} instead.
+ */
+ @Deprecated
+ public static ImmutableGraph loadSequential(CharSequence basename) throws IOException {
+ return load(LoadMethod.SEQUENTIAL, basename, null);
+ }
+
+ /** Creates a new {@link ImmutableGraph} by loading a graph file from disk to memory, without
+ * offsets.
+ *
+ * <P>This method uses the properties convention described in the {@linkplain ImmutableGraph introduction}.
+ *
+ * @param basename the basename of the graph.
+ * @param pl a progress logger used while loading the graph, or <code>null</code>.
+ * @return an {@link ImmutableGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ * @deprecated Use {@link #loadOffline(CharSequence, ProgressLogger)} or {@link #loadMapped(CharSequence, ProgressLogger)} instead.
+ */
+ @Deprecated
+ public static ImmutableGraph loadSequential(CharSequence basename, ProgressLogger pl) throws IOException {
+ return load(LoadMethod.SEQUENTIAL, basename, null, pl);
+ }
+
+ /** Creates a new {@link ImmutableGraph} by loading a graph file offline.
+ *
+ *
+ * <P>This method uses the properties convention described in the {@linkplain ImmutableGraph introduction}.
+ *
+ * @param basename the basename of the graph.
+ * @return an {@link ImmutableGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ */
+
+ public static ImmutableGraph loadOffline(CharSequence basename) throws IOException {
+ return load(LoadMethod.OFFLINE, basename, null);
+ }
+
+
+ /** Creates a new {@link ImmutableGraph} by loading a graph file offline.
+ *
+ * <P>This method uses the properties convention described in the {@linkplain ImmutableGraph introduction}.
+ *
+ * @param basename the basename of the graph.
+ * @param pl a progress logger used while loading the graph, or <code>null</code>.
+ * @return an {@link ImmutableGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ */
+
+ public static ImmutableGraph loadOffline(CharSequence basename, ProgressLogger pl) throws IOException {
+ return load(LoadMethod.OFFLINE, basename, null, pl);
+ }
+
+
+ /** Creates a new {@link ImmutableGraph} by memory-mapping a graph file.
+ *
+ * <P>This method uses the properties convention described in the {@linkplain ImmutableGraph introduction}.
+ *
+ * @param basename the basename of the graph.
+ * @param pl a progress logger used while loading the offsets, or <code>null</code>.
+ * @return an {@link ImmutableGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while memory-mapping the graph or reading the offsets.
+ */
+
+ public static ImmutableGraph loadMapped(CharSequence basename, ProgressLogger pl) throws IOException {
+ return load(LoadMethod.MAPPED, basename, null, pl);
+ }
+
+ /** Creates a new {@link ImmutableGraph} by memory-mapping a graph file.
+ *
+ * <P>This method uses the properties convention described in the {@linkplain ImmutableGraph introduction}.
+ *
+ * @param basename the basename of the graph.
+ * @return an {@link ImmutableGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while memory-mapping the graph or reading the offsets.
+ */
+
+ public static ImmutableGraph loadMapped(CharSequence basename) throws IOException {
+ return load(LoadMethod.MAPPED, basename, null);
+ }
+
+
+ /** Creates a new {@link ImmutableGraph} by loading a read-once graph from an input stream.
+ *
+ * <p>This implementation just throws an {@link UnsupportedOperationException}. There
+ * is no way to write a generic implementation, because there is no way to know
+ * in advance the class that should read the graph.
+ *
+ * @param is an input stream containing the graph.
+ * @return an {@link ImmutableGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ * @throws UnsupportedOperationException if this graph class does not support read-once graphs.
+ */
+
+ public static ImmutableGraph loadOnce(final InputStream is) throws IOException {
+ throw new UnsupportedOperationException("This class does not support read-once loading");
+ }
+
+
+ /** Creates a new {@link ImmutableGraph} by loading a graph file from disk to memory, with
+ * all offsets, using no progress logger.
+ *
+ * <P>This method uses the properties convention described in the {@linkplain ImmutableGraph introduction}.
+ *
+ * @param basename the basename of the graph.
+ * @return an {@link ImmutableGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ */
+
+
+ public static ImmutableGraph load(CharSequence basename) throws IOException {
+ return load(LoadMethod.STANDARD, basename, null);
+ }
+
+ /** Creates a new {@link ImmutableGraph} by loading a graph file from disk to memory, with
+ * all offsets, using a progress logger.
+ *
+ * <P>This method uses the properties convention described in the {@linkplain ImmutableGraph introduction}.
+ *
+ * @param basename the basename of the graph.
+ * @param pl a progress logger used while loading the graph, or <code>null</code>.
+ * @return an {@link ImmutableGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ */
+
+ public static ImmutableGraph load(CharSequence basename, ProgressLogger pl) throws IOException {
+ return load(LoadMethod.STANDARD, basename, null, pl);
+ }
+
+ private static final ProgressLogger UNUSED = new ProgressLogger();
+
+ /** Creates a new {@link ImmutableGraph} using the given method and no progress logger.
+ *
+ * @param method the load method.
+ * @param basename the basename of the graph, if <code>method</code> is not {@link LoadMethod#ONCE}.
+ * @param is an input stream containing the graph, if <code>method</code> is {@link LoadMethod#ONCE}.
+ * @return an {@link ImmutableGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ */
+ private static ImmutableGraph load(LoadMethod method, CharSequence basename, InputStream is) throws IOException {
+ return load(method, basename, is, UNUSED);
+ }
+
+ /** Creates a new immutable graph by loading a graph file from disk to memory, delegating the
+ * actual loading to the class specified in the <code>graphclass</code> property within the property
+ * file (named <code><var>basename</var>.properties</code>). The exact load method to be used
+ * depends on the <code>method</code> argument.
+ *
+ * <P>This method uses the properties convention described in the {@linkplain ImmutableGraph introduction}.
+ *
+ * @param method the method to be used to load the graph.
+ * @param basename the basename of the graph, if <code>method</code> is not {@link LoadMethod#ONCE}.
+ * @param is an input stream containing the graph, if <code>method</code> is {@link LoadMethod#ONCE}.
+ * @param pl the progress logger; it can be <code>null</code>.
+ * @return an {@link ImmutableGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ */
+ protected static ImmutableGraph load(LoadMethod method, CharSequence basename, InputStream is, ProgressLogger pl) throws IOException {
+ final FileInputStream propertyFile = new FileInputStream(basename + PROPERTIES_EXTENSION);
+ final Properties properties = new Properties();
+ String graphClassName;
+ properties.load(propertyFile);
+ propertyFile.close();
+
+ if ((graphClassName = properties.getProperty(GRAPHCLASS_PROPERTY_KEY)) == null) throw new IOException("The property file for " + basename + " does not contain a graphclass property");
+
+ // Small kludge to fix old usage of toString() instead of getName().
+ if (graphClassName.startsWith("class ")) graphClassName = graphClassName.substring(6);
+
+ // Small kludge to try to load small graphs created with the big version.
+ if (graphClassName.startsWith("it.unimi.dsi.big.webgraph")) {
+ final String standardGraphClassName = graphClassName.replace("it.unimi.dsi.big.webgraph", "it.unimi.dsi.webgraph");
+ LOGGER.warn("Replacing class " + graphClassName + " with " + standardGraphClassName);
+ graphClassName = standardGraphClassName;
+ }
+
+ final Class<?> graphClass;
+ ImmutableGraph graph = null;
+
+ try {
+ graphClass = Class.forName(graphClassName);
+
+ if (method == LoadMethod.ONCE) graph = (ImmutableGraph)graphClass.getMethod(method.toMethod(), InputStream.class).invoke(null, is);
+ else {
+ if (pl == UNUSED) graph = (ImmutableGraph)graphClass.getMethod(method.toMethod(), CharSequence.class).invoke(null, basename);
+ else graph = (ImmutableGraph)graphClass.getMethod(method.toMethod(), CharSequence.class, ProgressLogger.class).invoke(null, basename, pl);
+ }
+ } catch (final InvocationTargetException e) {
+ if (e.getCause() instanceof IOException) throw (IOException) e.getCause();
+ throw new RuntimeException(e);
+ } catch(final Exception e) {
+ throw new RuntimeException(e);
+ }
+
+ return graph;
+ }
+
+
+ /** Stores an immutable graph using a specified subclass and a progress logger.
+ *
+ * <P>This method is a useful shorthand that invokes, by reflection, the <code>store</code> method of a given subclass.
+ * Note, however, that a subclass will usually provide more refined store methods with more parameters.
+ *
+ * @param graphClass the subclass of {@link ImmutableGraph} that should store the graph.
+ * @param graph the graph to store.
+ * @param basename the basename.
+ * @param pl a progress logger, or <code>null</code>.
+ */
+
+ public static void store(final Class<?> graphClass, final ImmutableGraph graph, final CharSequence basename, final ProgressLogger pl) throws IOException {
+ if (! ImmutableGraph.class.isAssignableFrom(graphClass)) throw new ClassCastException(graphClass.getName() + " is not a subclass of ImmutableGraph");
+ try {
+ if (pl == UNUSED) graphClass.getMethod("store", ImmutableGraph.class, CharSequence.class).invoke(null, graph, basename);
+ else graphClass.getMethod("store", ImmutableGraph.class, CharSequence.class, ProgressLogger.class).invoke(null, graph, basename, pl);
+ } catch (final InvocationTargetException e) {
+ if (e.getCause() instanceof IOException) throw (IOException) e.getCause();
+ throw new RuntimeException(e);
+ } catch(final Exception e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+ /** Stores an immutable graph using a specified subclass.
+ *
+ * @param graphClass the subclass of {@link ImmutableGraph} that should store the graph.
+ * @param graph the graph to store.
+ * @param basename the basename.
+ * @see #store(Class, ImmutableGraph, CharSequence, ProgressLogger)
+ */
+
+ public static void store(final Class<?> graphClass, final ImmutableGraph graph, final CharSequence basename) throws IOException {
+ store(graphClass, graph, basename, UNUSED);
+ }
+
+ /** Compares this immutable graph to another object.
+ *
+ * @return true iff the given object is an immutable graph of the same size, and
+ * the successor list of every node of this graph is equal to the successor list of the corresponding node of <code>o</code>.
+ */
+
+ @Override
+ public boolean equals(final Object o) {
+ if (! (o instanceof ImmutableGraph)) return false;
+ final ImmutableGraph g = (ImmutableGraph) o;
+ int n = numNodes();
+ if (n != g.numNodes()) return false;
+ final NodeIterator i = nodeIterator(), j = g.nodeIterator();
+ int[] s, t;
+ int d;
+ while(n-- != 0) {
+ i.nextInt();
+ j.nextInt();
+ if ((d = i.outdegree()) != j.outdegree()) return false;
+ s = i.successorArray();
+ t = j.successorArray();
+ while(d-- != 0) if (s[d] != t[d]) return false;
+ }
+
+ return true;
+ }
+
+ /** Returns a hash code for this immutable graph.
+ *
+ * @return a hash code for this immutable graph.
+ */
+
+ @Override
+ public int hashCode() {
+ int n = numNodes(), h = -1;
+ final NodeIterator i = nodeIterator();
+ int[] s;
+ int d;
+ while(n-- != 0) {
+ h = h * 31 + i.nextInt();
+ s = i.successorArray();
+ d = i.outdegree();
+ while(d-- != 0) h = h * 31 + s[d];
+ }
+
+ return h;
+ }
+
+}
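The ImmutableGraph.load() method above resolves the class named by the graphclass property and invokes its static load method by reflection, unwrapping any IOException carried by an InvocationTargetException. The following is a minimal, self-contained sketch of that dispatch pattern, not WebGraph code: DemoGraph and its load(CharSequence) factory are hypothetical, and the properties are read from a string rather than from basename.properties.

```java
import java.io.IOException;
import java.io.StringReader;
import java.lang.reflect.InvocationTargetException;
import java.util.Properties;

public class ReflectiveLoadSketch {
    /** Hypothetical graph class: only its static load(CharSequence) factory matters here. */
    public static class DemoGraph {
        public final String basename;

        DemoGraph(final String basename) { this.basename = basename; }

        public static DemoGraph load(final CharSequence basename) throws IOException {
            if (basename.length() == 0) throw new IOException("Empty basename");
            return new DemoGraph(basename.toString());
        }
    }

    /** Reads the graphclass property and invokes that class's static load(CharSequence) by reflection. */
    public static Object load(final String propertyText, final CharSequence basename) throws IOException {
        final Properties properties = new Properties();
        properties.load(new StringReader(propertyText));
        String className = properties.getProperty("graphclass");
        if (className == null) throw new IOException("No graphclass property for " + basename);
        // Same kludge as in ImmutableGraph.load(): accept names written with Class.toString().
        if (className.startsWith("class ")) className = className.substring(6);
        try {
            return Class.forName(className).getMethod("load", CharSequence.class).invoke(null, basename);
        } catch (final InvocationTargetException e) {
            // Rethrow I/O errors raised by the invoked method itself; wrap anything else.
            if (e.getCause() instanceof IOException) throw (IOException) e.getCause();
            throw new RuntimeException(e);
        } catch (final ReflectiveOperationException e) {
            // Unknown class, missing method, or inaccessible method.
            throw new RuntimeException(e);
        }
    }

    public static void main(final String[] args) throws IOException {
        final String props = "graphclass=class " + DemoGraph.class.getName() + "\n";
        final DemoGraph g = (DemoGraph) load(props, "example");
        System.out.println(g.basename); // prints "example"
    }
}
```

Passing an empty basename makes DemoGraph.load() throw an IOException, which reaches the caller unwrapped, as in the original method; other reflective failures are wrapped in a RuntimeException.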
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/ImmutableSequentialGraph.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/ImmutableSequentialGraph.java
new file mode 100644
index 0000000..99c3846
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/ImmutableSequentialGraph.java
@@ -0,0 +1,54 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+/** An abstract immutable graph that throws an {@link java.lang.UnsupportedOperationException}
+ * on all random-access methods.
+ *
+ * <p>The main purpose of this class is to be used as a base for the numerous anonymous
+ * classes that do not support random access. Note that we override {@link ImmutableGraph}'s
+ * implementation of {@link #nodeIterator(int)}: here we just call
+ * {@link #nodeIterator()} and skip to the desired node. This makes <code>nodeIterator()</code>
+ * and <code>nodeIterator(0)</code> equivalent, which is usually what you want.
+ */
+
+public abstract class ImmutableSequentialGraph extends ImmutableGraph {
+ /** Throws an {@link java.lang.UnsupportedOperationException}. */
+ @Override
+ public int[] successorArray(final int x) { throw new UnsupportedOperationException(); }
+ /** Throws an {@link java.lang.UnsupportedOperationException}. */
+ @Override
+ public int outdegree(final int x) { throw new UnsupportedOperationException(); }
+ /** Returns false.
+ * @return false.
+ */
+ @Override
+ public boolean randomAccess() { return false; }
+
+ @Override
+ public NodeIterator nodeIterator(int from) {
+ final NodeIterator nodeIterator = nodeIterator();
+ while(from-- != 0) nodeIterator.nextInt();
+ return nodeIterator;
+ }
+
+ /** Throws an {@link UnsupportedOperationException}. */
+ @Override
+ public ImmutableGraph copy() { throw new UnsupportedOperationException(); }
+}
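The nodeIterator(int) override above starts a fresh sequential iterator and discards nodes until it reaches from, so its cost grows linearly with from. The sketch below isolates that pattern; it uses java.util.PrimitiveIterator.OfInt in place of WebGraph's NodeIterator, and the SequentialNodes class is hypothetical.

```java
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;

public class SequentialSkipSketch {
    /** Hypothetical sequential-only source of the node indices 0, 1, ..., n - 1. */
    public static final class SequentialNodes {
        private final int n;

        public SequentialNodes(final int n) { this.n = n; }

        /** The only primitive access path: a fresh iterator starting at node 0. */
        public PrimitiveIterator.OfInt iterator() {
            return new PrimitiveIterator.OfInt() {
                private int cursor = 0;

                @Override
                public boolean hasNext() { return cursor < n; }

                @Override
                public int nextInt() {
                    if (!hasNext()) throw new NoSuchElementException();
                    return cursor++;
                }
            };
        }

        /** Emulates starting from node from by discarding the first from nodes, in O(from) time. */
        public PrimitiveIterator.OfInt iterator(int from) {
            final PrimitiveIterator.OfInt it = iterator();
            while (from-- != 0) it.nextInt();
            return it;
        }
    }

    public static void main(final String[] args) {
        final SequentialNodes nodes = new SequentialNodes(5);
        System.out.println(nodes.iterator(3).nextInt()); // prints 3
    }
}
```

Each call replays the first from nodes, so this emulation is only sensible when, as in ImmutableSequentialGraph, random access is unavailable; with it, iterator(0) and iterator() behave identically.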
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/ImmutableSubgraph.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/ImmutableSubgraph.java
new file mode 100644
index 0000000..b9a3a45
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/ImmutableSubgraph.java
@@ -0,0 +1,593 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2003-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.ints.IntArrays;
+import it.unimi.dsi.fastutil.ints.IntSet;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.lang.MutableString;
+import it.unimi.dsi.logging.ProgressLogger;
+
+import java.io.DataInput;
+import java.io.FileInputStream;
+import java.io.FileNotFoundException;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.OutputStreamWriter;
+import java.io.PrintWriter;
+import java.io.UnsupportedEncodingException;
+import java.util.Arrays;
+import java.util.Properties;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/** An induced subgraph of a given immutable graph.
+ *
+ * <P><strong>Warning</strong>: this class is experimental, and might be subject to changes.
+ *
+ * <P>The nodes of the subgraph are
+ * specified via an {@link it.unimi.dsi.fastutil.ints.IntSet} or an array of integers. In general, each node in the subgraph will have
+ * a different index than the corresponding node in the supergraph. The two methods {@link #toSupergraphNode(int)} and {@link #fromSupergraphNode(int)}
+ * are used to translate indices back and forth.
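+ *
+ * <P>For instance, assuming that <code>graph</code> is an already loaded immutable graph with at least eight nodes,
+ * the following sketch shows how indices are translated:
+ * <pre>
+ * ImmutableSubgraph sub = new ImmutableSubgraph(graph, new int[] { 1, 3, 7 });
+ * sub.numNodes();            // 3
+ * sub.toSupergraphNode(1);   // 3
+ * sub.fromSupergraphNode(7); // 2
+ * sub.fromSupergraphNode(2); // -1, as supergraph node 2 is not in the subgraph
+ * </pre>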
+ *
+ * <P>An immutable subgraph is stored as a property file (which follows the convention established
+ * in {@link it.unimi.dsi.webgraph.ImmutableGraph}), and as a node subset file. The latter must contain an
+ * (increasing) list of integers in {@link java.io.DataOutput} format representing
+ * the set of nodes of the subgraph.
+ *
+ * <P>The property file, named <code><var>basename</var>.properties</code>, contains the following entries:
+ * <UL>
+ * <LI><code>supergraphbasename</code>: the basename of the supergraph. Note that this name is system-dependent:
+ * it is suggested that you use a path-free filename.
+ * <LI><code>subgraphnodes</code>: the filename of the node subset file, which must be a list of integers in {@link DataInput} format.
+ * If this property is not present, it is assumed to be <code><var>basename</var>.nodes</code>.
+ * </UL>
+ *
+ * <P>You can create an immutable subgraph using the public constructor, or you can load one using one of the provided
+ * load methods. Note that the <code>store</code> methods just throw an {@link UnsupportedOperationException}, because it makes no sense to store a generic {@link it.unimi.dsi.webgraph.ImmutableGraph}
+ * as a subgraph. There is, however, a {@linkplain #save(CharSequence, ProgressLogger) save method} that allows one to save
+ * the files related to a subgraph (the property file and the subgraph node file).
+ *
+ * <H2>Root graphs</H2>
+ *
+ * <P>When creating tree-shaped hierarchies of subgraphs, the methods {@link #rootBasename()}, {@link #fromRootNode(int)}
+ * and {@link #toRootNode(int)} may be used to access information about the root (i.e., the unique highest graph
+ * in the hierarchy: note that it cannot be an <code>ImmutableSubgraph</code>).
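+ *
+ * <P>For example, assuming that <code>root</code> is an immutable graph (not an <code>ImmutableSubgraph</code>) with at least nine nodes:
+ * <pre>
+ * ImmutableSubgraph sub1 = new ImmutableSubgraph(root, new int[] { 2, 4, 6, 8 });
+ * ImmutableSubgraph sub2 = new ImmutableSubgraph(sub1, new int[] { 1, 3 });
+ * sub2.toRootNode(0);   // 4: node 0 of sub2 is node 1 of sub1, which is node 4 of root
+ * sub2.fromRootNode(8); // 1
+ * sub2.fromRootNode(6); // -1: node 6 of root is node 2 of sub1, which is not in sub2
+ * </pre>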
+ *
+ * <P>Should you need to treat a generic immutable graph uniformly as an immutable subgraph, the method
+ * {@link #asImmutableSubgraph(ImmutableGraph)} will return a subgraph view of the given immutable graph in which
+ * all to/from functions are identities.
+ */
+public class ImmutableSubgraph extends ImmutableGraph {
+ private final static boolean ASSERTS = false;
+ private final static boolean DEBUG = false;
+
+ /** The standard property key for the name of the file containing the subgraph nodes. */
+ public static final String SUBGRAPHNODES_PROPERTY_KEY = "subgraphnodes";
+ /** The standard property key for the supergraph basename. */
+ public static final String SUPERGRAPHBASENAME_PROPERTY_KEY = "supergraphbasename";
+
+ /** The supergraph. */
+ final protected ImmutableGraph supergraph;
+
+ /** If {@link #supergraph} is an instance of {@link ImmutableSubgraph}, it is cached here. */
+ protected final ImmutableSubgraph supergraphAsSubgraph;
+
+ /** The nodes of the subgraph, in increasing order. */
+ protected final int subgraphNode[];
+
+ /** A mapping from nodes of the supergraph to nodes in the subgraph (-1 for missing nodes). */
+ protected final int supergraphNode[];
+
+ /** The number of nodes in the subgraph. */
+ protected final int subgraphSize;
+
+ /** The number of nodes in the supergraph. */
+ protected final int supergraphNumNodes;
+
+ /** The basename of this immutable subgraph, if it was loaded from disk, or <code>null</code>. */
+ protected CharSequence basename;
+
+ private static final int[] set2sortedArray(final IntSet subgraphNodes) {
+ final int a[] = subgraphNodes.toIntArray();
+ IntArrays.parallelQuickSort(a);
+ return a;
+ }
+
+ /** Creates a new immutable subgraph using a given subset of nodes.
+ *
+ * @param supergraph the supergraph.
+ * @param subgraphNodes the set of nodes defining the induced subgraph.
+ */
+ public ImmutableSubgraph(final ImmutableGraph supergraph, final IntSet subgraphNodes) {
+ this(supergraph, set2sortedArray(subgraphNodes));
+ }
+
+ /** Creates a new immutable subgraph using a given backing node array.
+ *
+ * <P>Note that <code>subgraphNode</code> is <em>not</em> copied: thus, it must not
+ * be modified until this subgraph is no longer in use.
+ *
+ * @param supergraph the supergraph.
+ * @param subgraphNode an increasing array containing the nodes defining the induced subgraph.
+ */
+ public ImmutableSubgraph(final ImmutableGraph supergraph, final int subgraphNode[]) {
+ this.supergraph = supergraph;
+ this.supergraphAsSubgraph = supergraph instanceof ImmutableSubgraph ? (ImmutableSubgraph)supergraph : null;
+ this.subgraphNode = subgraphNode;
+ this.subgraphSize = subgraphNode.length;
+ this.supergraphNumNodes = supergraph.numNodes();
+ this.supergraphNode = new int[supergraphNumNodes];
+ // Validate the node array before using its elements as indices, so that invalid input is reported as an IllegalArgumentException
+ for (int i = 1; i < subgraphSize; i++)
+ if (subgraphNode[i - 1] >= subgraphNode[i])
+ throw new IllegalArgumentException("The provided integer array is not strictly increasing: " + (i-1) + "-th element is " + subgraphNode[i - 1] + ", " + i + "-th element is " + subgraphNode[i]);
+ if (subgraphSize > 0 && subgraphNode[0] < 0) throw new IllegalArgumentException("Subnode index out of bounds: " + subgraphNode[0]);
+ if (subgraphSize > 0 && subgraphNode[subgraphSize - 1] >= supergraphNumNodes) throw new IllegalArgumentException("Subnode index out of bounds: " + subgraphNode[subgraphSize - 1]);
+ Arrays.fill(supergraphNode, -1);
+ for(int i = subgraphSize; i-- != 0;) supergraphNode[subgraphNode[i]] = i;
+ }
+
+ /** Creates a new immutable subgraph by copying an existing one.
+ *
+ * @param immutableSubgraph an immutable subgraph.
+ */
+ protected ImmutableSubgraph(ImmutableSubgraph immutableSubgraph) {
+ this.supergraphNumNodes = immutableSubgraph.supergraphNumNodes;
+ this.subgraphSize = immutableSubgraph.subgraphSize;
+ this.supergraph = immutableSubgraph.supergraph.copy();
+ this.supergraphAsSubgraph = supergraph instanceof ImmutableSubgraph ? (ImmutableSubgraph)supergraph : null;
+ this.subgraphNode = immutableSubgraph.subgraphNode;
+ this.supergraphNode = immutableSubgraph.supergraphNode;
+ }
+
+ /** Creates a new immutable subgraph by wrapping an immutable graph.
+ *
+ * @param immutableGraph an immutable graph.
+ */
+ protected ImmutableSubgraph(ImmutableGraph immutableGraph) {
+ this.subgraphSize = this.supergraphNumNodes = immutableGraph.numNodes();
+ this.supergraph = immutableGraph;
+ this.supergraphAsSubgraph = null;
+ this.subgraphNode = this.supergraphNode = null;
+ }
+
+ @Override
+ public int numNodes() {
+ return subgraphSize;
+ }
+
+ @Override
+ public long numArcs() {
+ throw new UnsupportedOperationException("Cannot determine the number of arcs in a subgraph");
+ }
+
+ @Override
+ public boolean randomAccess() {
+ return supergraph.randomAccess();
+ }
+
+ @Override
+ public boolean hasCopiableIterators() {
+ return supergraph.hasCopiableIterators();
+ }
+
+ @Override
+ public CharSequence basename() {
+ if (basename == null) throw new IllegalStateException("This immutable subgraph has no basename");
+ return basename;
+ }
+
+ /** Returns the basename of the root graph.
+ *
+ * @return the {@linkplain ImmutableGraph#basename() basename} of the root graph.
+ */
+ public CharSequence rootBasename() {
+ return supergraphAsSubgraph != null ? supergraphAsSubgraph.rootBasename() : supergraph.basename();
+ }
+
+ /** Returns the index of a node of this graph in its supergraph.
+ *
+ * @param x an index of a node in this graph.
+ * @return the index of node <code>x</code> in the supergraph.
+ */
+ public int toSupergraphNode(final int x) {
+ if (x < 0 || x >= subgraphSize) throw new IllegalArgumentException();
+ return subgraphNode[x];
+ }
+
+ /** Returns the index of a node of the supergraph in this graph.
+ *
+ * @param x an index of a node in the supergraph.
+ * @return the index of node <code>x</code> in this graph, or a negative value if <code>x</code> does not belong to the subgraph.
+ */
+ public int fromSupergraphNode(final int x) {
+ return supergraphNode[x];
+ }
+
+ /** Returns the index of a node of this graph in its root graph.
+ *
+ * @param x an index of a node in this graph.
+ * @return the index of node <code>x</code> in the root graph.
+ */
+ public int toRootNode(final int x) {
+ return supergraphAsSubgraph != null ? supergraphAsSubgraph.toRootNode(toSupergraphNode(x)) : toSupergraphNode(x);
+ }
+
+ /** Returns the index of a node of the root graph in this graph.
+ *
+ * @param x an index of a node in the root graph.
+ * @return the index of node <code>x</code> in this graph, or a negative value if <code>x</code> does not belong to this graph.
+ */
+ public int fromRootNode(final int x) {
+ if (supergraphAsSubgraph == null) return fromSupergraphNode(x);
+ final int y = supergraphAsSubgraph.fromRootNode(x);
+ if (y < 0) return -1;
+ return fromSupergraphNode(y);
+ }
+
+ /** If this variable is non-negative, we are caching the successors' array of node <code>cacheNode</code> (in the subgraph). */
+ private int cacheNode = -1;
+
+ /** If <code>cacheNode</code> &ge; 0, this array contains the successors of node <code>cacheNode</code> (in the subgraph). */
+ private int cacheSuccessors[];
+
+ @Override
+ public NodeIterator nodeIterator(final int from) {
+ /* The invariant assumed by ImmutableSubgraphNodeIterator is the following: at any time, <code>node</code> is the next (subgraph)
+ * node to be returned by <code>nextInt()</code>; it denotes a valid subgraph node
+ * only when <code>node</code> &lt; <code>subgraphSize</code>. Moreover, if <code>outdegree</code> &ge; 0, then it is
+ * the outdegree of <code>node</code>-1, and <code>successorsCache</code> contains its successors. */
+
+ // TODO: decide on a strategy. Note that super.nodeIterator is very dangerous, as it uses random access.
+ return supergraph.randomAccess() && subgraphSize < supergraphNumNodes / 8 ? super.nodeIterator(from) :
+
+ new ImmutableSubgraphNodeIterator(from, Integer.MAX_VALUE);
+ }
+
+ @Override
+ public LazyIntIterator successors(final int x) {
+ return successors(x, supergraph.successors(toSupergraphNode(x)));
+ }
+
+ private LazyIntIterator successors(final int x, final LazyIntIterator supergraphSuccessors) {
+ if (DEBUG) System.err.println(this.getClass().getName() + ".successors(" + x + ", " + supergraphSuccessors + ")");
+
+ if (x < 0 || x >= subgraphSize) throw new IllegalArgumentException();
+ if (cacheNode == x) return LazyIntIterators.wrap(cacheSuccessors);
+
+ if (DEBUG) System.err.println(this.getClass().getName() + ": returning new iterator");
+
+ return new LazyIntIterator() {
+
+ @Override
+ public int nextInt() {
+ int x, result;
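+ // For instance (hypothetical values), with subgraphNode = { 1, 3, 7 } and supergraph successors 0, 3, 5, 7,
+ // successive calls return 1 and 2 (the subgraph indices of supergraph nodes 3 and 7), and then -1.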
+ while ((x = supergraphSuccessors.nextInt()) != -1) {
+ result = supergraphNode[x];
+ if (result >= 0) return result;
+ }
+
+ return -1;
+ }
+
+ @Override
+ public int skip(final int n) {
+ int i;
+ for(i = 0; i < n && nextInt() != -1; i++);
+ return i;
+ }
+ };
+ }
+
+ @Override
+ public int outdegree(final int x) {
+ return outdegree(x, supergraph.successors(toSupergraphNode(x)));
+ }
+
+ public int outdegree(final int x, final LazyIntIterator supergraphSuccessors) {
+ if (x < 0 || x >= subgraphSize) throw new IllegalArgumentException();
+ if (cacheNode == x) return cacheSuccessors.length;
+ // TODO: this is not really efficient--we should reuse the cache.
+ cacheSuccessors = LazyIntIterators.unwrap(successors(x, supergraphSuccessors));
+ cacheNode = x;
+ if (ASSERTS) assert cacheSuccessors != null;
+ return cacheSuccessors.length;
+ }
+
+ private final class ImmutableSubgraphNodeIterator extends NodeIterator {
+ private final int from;
+ /** The current node (the next to be returned). */
+ int node;
+ /** This array caches the successors of the node that was returned last (<code>node</code>-1). */
+ int[] successorsCache = IntArrays.EMPTY_ARRAY;
+ /** The outdegree of the node that was returned last (<code>node</code>-1). */
+ int outdegree = -1;
+ final NodeIterator supergraphNodeIterator;
+ /** No node &ge; this will ever be returned. */
+ final int upperBound;
+
+ private ImmutableSubgraphNodeIterator(final int from, final int to) {
+ this.from = from;
+ node = from;
+ supergraphNodeIterator = supergraph.nodeIterator(subgraphNode[from]);
+ upperBound = to;
+ }
+
+ @Override
+ public int nextInt() {
+ if (! hasNext()) throw new java.util.NoSuchElementException();
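+ // For instance (hypothetical values), with subgraphNode = { 1, 3, 7 } and from = 0, the first call positions
+ // supergraphNodeIterator on supergraph node 1, and the next two calls skip 3 - 1 = 2 and 7 - 3 = 4 supergraph
+ // nodes, landing on supergraph nodes 3 and 7.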
+ if (node != from) supergraphNodeIterator.skip(subgraphNode[node] - subgraphNode[node - 1]);
+ else supergraphNodeIterator.nextInt();
+ outdegree = -1;
+ return node++;
+ }
+
+ @Override
+ public boolean hasNext() {
+ return node < Math.min(subgraphSize, upperBound);
+ }
+
+ private void unwrapSuccessors() {
+ int start = 0, done;
+ final LazyIntIterator i = ImmutableSubgraph.this.successors(node - 1, supergraphNodeIterator.successors());
+ // ALERT: we removed i.hasNext() from the end of this check, as it is not necessary
+ while((done = LazyIntIterators.unwrap(i, successorsCache, start, successorsCache.length - start)) == successorsCache.length - start) {
+ start = successorsCache.length;
+ successorsCache = IntArrays.grow(successorsCache, successorsCache.length + 1);
+ }
+ outdegree = start + done;
+ }
+
+ @Override
+ public int[] successorArray() {
+ if (node == from) throw new IllegalStateException();
+ if (outdegree == -1) unwrapSuccessors();
+ return successorsCache;
+ }
+
+ @Override
+ public LazyIntIterator successors() {
+ if (node == from) throw new IllegalStateException();
+ if (outdegree == -1) unwrapSuccessors();
+ return LazyIntIterators.wrap(successorsCache, outdegree);
+ }
+
+ @Override
+ public int outdegree() {
+ if (node == from) throw new IllegalStateException();
+ if (outdegree == -1) unwrapSuccessors();
+ return outdegree;
+ }
+
+ @Override
+ public NodeIterator copy(final int upperBound) {
+ ImmutableSubgraphNodeIterator result = new ImmutableSubgraphNodeIterator(from, upperBound);
+ result.node = node;
+ result.outdegree = outdegree;
+ return result;
+ }
+
+ }
+
+ /** A wrapper for immutable graphs, which exhibits them as immutable subgraphs.
+ * Essentially, all functions concerning supergraphs are defined as identities.
+ */
+
+ private static class ImmutableGraphWrapper extends ImmutableSubgraph {
+
+ public ImmutableGraphWrapper(final ImmutableGraph graph) {
+ super(graph);
+ try {
+ basename = graph.basename();
+ }
+ catch(UnsupportedOperationException e) {
+ basename = null;
+ }
+ }
+
+ @Override
+ public NodeIterator nodeIterator() { return supergraph.nodeIterator(); }
+ @Override
+ public NodeIterator nodeIterator(final int from) { return supergraph.nodeIterator(from); }
+ @Override
+ public long numArcs() { return supergraph.numArcs(); }
+ @Override
+ public int numNodes() { return supergraph.numNodes(); }
+ @Override
+ public int outdegree(final int x) { return supergraph.outdegree(x); }
+ @Override
+ public int[] successorArray(final int x) { return supergraph.successorArray(x); }
+ @Override
+ public LazyIntIterator successors(final int x) { return supergraph.successors(x); }
+ @Override
+ public int toSupergraphNode(final int x) { return x; }
+ @Override
+ public int fromSupergraphNode(final int x) { return x; }
+ @Override
+ public int toRootNode(final int x) { return x; }
+ @Override
+ public int fromRootNode(final int x) { return x; }
+ }
+
+ @Override
+ public ImmutableSubgraph copy() {
+ return new ImmutableSubgraph(this);
+ }
+
+ /** Returns a subgraph view of the given immutable graph.
+ *
+ * <P>The wrapper returned by this method may be used whenever immutable
+ * graphs and subgraphs must be mixed.
+ *
+ * @param graph an immutable graph.
+ * @return the given graph, viewed as a trivial subgraph of itself.
+ */
+ public static ImmutableSubgraph asImmutableSubgraph(final ImmutableGraph graph) {
+ return new ImmutableGraphWrapper(graph);
+ }
+
+ @Deprecated
+ public static ImmutableGraph loadSequential(final CharSequence basename) throws IOException {
+ return load(LoadMethod.STANDARD, basename); // TODO: is this what we really want?
+ }
+
+ @Deprecated
+ public static ImmutableGraph loadSequential(final CharSequence basename, final ProgressLogger pl) throws IOException {
+ return load(LoadMethod.STANDARD, basename, pl);
+ }
+
+ public static ImmutableGraph loadOffline(final CharSequence basename) throws IOException {
+ return load(LoadMethod.OFFLINE, basename);
+ }
+
+ public static ImmutableGraph loadOffline(final CharSequence basename, final ProgressLogger pl) throws IOException {
+ return load(LoadMethod.OFFLINE, basename, pl);
+ }
+
+ public static ImmutableGraph load(final CharSequence basename) throws IOException {
+ return load(LoadMethod.STANDARD, basename);
+ }
+
+ public static ImmutableGraph load(final CharSequence basename, final ProgressLogger pl) throws IOException {
+ return load(LoadMethod.STANDARD, basename, pl);
+ }
+
+ private static ImmutableGraph load(final LoadMethod method, final CharSequence basename) throws IOException {
+ return load(method, basename, null);
+ }
+
+ public static ImmutableGraph loadMapped(CharSequence basename, ProgressLogger pl) throws IOException {
+ return load(LoadMethod.MAPPED, basename, pl);
+ }
+
+ public static ImmutableGraph loadMapped(CharSequence basename) throws IOException {
+ return load(LoadMethod.MAPPED, basename, null);
+ }
+
+
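+ /** Creates a new immutable subgraph by loading the supergraph, whose basename is specified by the
+ * <code>supergraphbasename</code> property within the property file (named <code><var>basename</var>.properties</code>),
+ * and then loading the subgraph node array in memory from the file specified by the <code>subgraphnodes</code> property
+ * (or from <code><var>basename</var>.nodes</code>, if the property is missing). Loading the supergraph is delegated to
+ * {@link ImmutableGraph}, and the exact load method to be used depends on the <code>method</code> argument.
+ *
+ * @param method the method to be used to load the supergraph.
+ * @param basename the basename of the subgraph.
+ * @param pl the progress logger; it can be <code>null</code>.
+ * @return an immutable subgraph containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the property file, the supergraph or the subgraph nodes.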
+ */
+
+ protected static ImmutableGraph load(final LoadMethod method, final CharSequence basename, final ProgressLogger pl) throws IOException {
+ final FileInputStream propertyFile = new FileInputStream(basename + PROPERTIES_EXTENSION);
+ final Properties properties = new Properties();
+ properties.load(propertyFile);
+ propertyFile.close();
+
+ final String graphClassName = properties.getProperty(ImmutableGraph.GRAPHCLASS_PROPERTY_KEY);
+ if (! ImmutableSubgraph.class.getName().equals(graphClassName)) throw new IOException("This class (" + ImmutableSubgraph.class.getName() + ") cannot load a graph stored using " + graphClassName);
+
+ final String supergraphBasename = properties.getProperty(SUPERGRAPHBASENAME_PROPERTY_KEY);
+ if (supergraphBasename == null) throw new IOException("This property file does not specify the required property supergraphbasename");
+
+ final ImmutableGraph supergraph = ImmutableGraph.load(method, supergraphBasename, null, pl);
+
+ if (pl != null) pl.start("Reading nodes...");
+ final String nodes = properties.getProperty(SUBGRAPHNODES_PROPERTY_KEY);
+ final ImmutableSubgraph isg = new ImmutableSubgraph(supergraph, BinIO.loadInts(nodes != null ? nodes : basename + ".nodes"));
+ if (pl != null) {
+ pl.count = isg.numNodes();
+ pl.done();
+ }
+ isg.basename = new MutableString(basename);
+ return isg;
+ }
+
+ /** Throws an {@link UnsupportedOperationException}. */
+ @SuppressWarnings("unused")
+ public static void store(final ImmutableGraph graph, final CharSequence basename, final ProgressLogger pm) {
+ throw new UnsupportedOperationException("You cannot store a generic immutable graph as a subgraph");
+ }
+
+ /** Throws an {@link UnsupportedOperationException}. */
+ public static void store(final ImmutableGraph graph, final CharSequence basename) {
+ store(graph, basename, (ProgressLogger)null);
+ }
+
+ /** Saves this immutable subgraph with a given basename.
+ *
+ * <P>Note that this method will <strong>not</strong> save the
+ * supergraph, but only the subgraph files, that is, the subgraph property file
+ * (with extension <code>.properties</code>) and the file containing
+ * the subgraph nodes (with extension <code>.nodes</code>). A reference
+ * to the supergraph basename will be stored in the property file.
+ *
+ * @param basename the basename to be used to save the subgraph.
+ * @param pl a progress logger, or <code>null</code>.
+ */
+ public void save(final CharSequence basename, final ProgressLogger pl) throws IOException {
+
+ final Properties properties = new Properties();
+ properties.setProperty(ImmutableGraph.GRAPHCLASS_PROPERTY_KEY, ImmutableSubgraph.class.getName());
+ properties.setProperty(SUPERGRAPHBASENAME_PROPERTY_KEY, supergraph.basename().toString());
+
+ try (final FileOutputStream propertyFile = new FileOutputStream(basename + PROPERTIES_EXTENSION)) {
+ properties.store(propertyFile, null);
+ }
+
+ // Save the subgraph nodes
+ if (pl != null) pl.start("Saving nodes...");
+ BinIO.storeInts(subgraphNode, 0, subgraphNode.length, basename + ".nodes");
+
+ if (pl != null) {
+ pl.count = subgraphNode.length;
+ pl.done();
+ }
+ }
+
+
+ public void save(final CharSequence basename) throws IOException {
+ save(basename, (ProgressLogger)null);
+ }
+
+ public static void main(String args[]) throws IllegalArgumentException, SecurityException, JSAPException, UnsupportedEncodingException, FileNotFoundException {
+
+ final SimpleJSAP jsap = new SimpleJSAP(ImmutableSubgraph.class.getName(), "Writes the property file of an immutable subgraph.",
+ new Parameter[] {
+ new UnflaggedOption("supergraphBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the supergraph."),
+ new FlaggedOption("subgraphNodes", JSAP.STRING_PARSER, null, JSAP.NOT_REQUIRED, 's', "subgraph-nodes", "Sets a subgraph node file (a list of integers in DataInput format). If not specified, the name will be derived from the basename by appending .nodes."),
+ new UnflaggedOption("basename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the resulting immutable subgraph."),
+ }
+ );
+
+ final JSAPResult jsapResult = jsap.parse(args);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final PrintWriter pw = new PrintWriter(new OutputStreamWriter(new FileOutputStream(jsapResult.getString("basename") + ImmutableGraph.PROPERTIES_EXTENSION), "UTF-8"));
+ pw.println(ImmutableGraph.GRAPHCLASS_PROPERTY_KEY + " = " + ImmutableSubgraph.class.getName());
+ pw.println("supergraphbasename = " + jsapResult.getString("supergraphBasename"));
+ if (jsapResult.userSpecified("subgraphNodes")) pw.println("subgraphnodes = " + jsapResult.getString("subgraphNodes"));
+ pw.close();
+ }
+
+
+}
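To make the property-file layout concrete, here is a hypothetical sketch (the class SubgraphPropertiesSketch is not part of WebGraph) of how the loader resolves the subgraph node file. The key names are the ones written by main() above; "graphclass" is assumed to be the value of ImmutableGraph.GRAPHCLASS_PROPERTY_KEY.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Hypothetical sketch (not WebGraph code) of how a subgraph property file is resolved. */
final class SubgraphPropertiesSketch {
    /** Returns the node file to read: the subgraphnodes property if present, else basename + ".nodes". */
    static String nodeFile(final String propertyText, final String basename) {
        final Properties properties = new Properties();
        try {
            properties.load(new StringReader(propertyText));
        } catch (final IOException e) {
            throw new IllegalStateException(e); // not expected when reading from a String
        }
        // Assumption: "graphclass" is the value of ImmutableGraph.GRAPHCLASS_PROPERTY_KEY
        if (!"it.unimi.dsi.webgraph.ImmutableSubgraph".equals(properties.getProperty("graphclass")))
            throw new IllegalArgumentException("Not an immutable subgraph property file");
        if (properties.getProperty("supergraphbasename") == null)
            throw new IllegalArgumentException("Missing supergraphbasename property");
        final String nodes = properties.getProperty("subgraphnodes");
        return nodes != null ? nodes : basename + ".nodes";
    }

    public static void main(final String[] args) {
        final String text = "graphclass = it.unimi.dsi.webgraph.ImmutableSubgraph\n"
                + "supergraphbasename = web\n";
        System.out.println(nodeFile(text, "sub")); // sub.nodes
        System.out.println(nodeFile(text + "subgraphnodes = custom.ints\n", "sub")); // custom.ints
    }
}
```

Since save() never writes subgraphnodes, a saved subgraph always falls back to the basename-derived node file.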
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/IncrementalImmutableSequentialGraph.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/IncrementalImmutableSequentialGraph.java
new file mode 100644
index 0000000..1276b97
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/IncrementalImmutableSequentialGraph.java
@@ -0,0 +1,148 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2013-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import java.util.Arrays;
+import java.util.NoSuchElementException;
+import java.util.concurrent.ArrayBlockingQueue;
+
+
+/** An adapter exposing an {@link ImmutableGraph} that can be filled incrementally using
+ * a family of {@linkplain #add(int[], int, int) addition methods} that make it possible to specify
+ * the list of successors of each node in increasing order. At the end of the process, the user
+ * must add the special marker list {@link #END_OF_GRAPH}.
+ *
+ * <p>The class supports a single
+ * call to {@link #nodeIterator()}: once the returned {@link NodeIterator} has been exhausted, {@link #numNodes()} will return the number of nodes,
+ * which will be equal to the number of calls to addition methods, excluding the final call adding {@link #END_OF_GRAPH}.
+ *
+ * <p>The class works using a producer/consumer pattern: in a typical usage, the thread invoking the
+ * addition methods will be different from the thread performing the traversal, as in
+ * <pre class="code">
+ * final IncrementalImmutableSequentialGraph g = new IncrementalImmutableSequentialGraph();
+ * ExecutorService executor = Executors.newSingleThreadExecutor();
+ * final Future&lt;Void&gt; future = executor.submit(new Callable&lt;Void&gt;() {
+ * public Void call() throws IOException {
+ * BVGraph.store(g, basename);
+ * return null;
+ * }
+ * });
+ *
+ * // Do one add() for each node, to specify the successors
+ *
+ * g.add(IncrementalImmutableSequentialGraph.END_OF_GRAPH);
+ * future.get();
+ * executor.shutdown();
+ *</pre>
+ */
+
+public class IncrementalImmutableSequentialGraph extends ImmutableSequentialGraph {
+ /** A marker for the end of the graph. */
+ public static final int[] END_OF_GRAPH = new int[0];
+
+ /** The number of nodes (known after a traversal). */
+ private int n;
+ /** The queue connecting the add methods and the successor methods of the node iterator. */
+ private final ArrayBlockingQueue<int[]> successorQueue;
+
+ public IncrementalImmutableSequentialGraph() {
+ n = -1;
+ this.successorQueue = new ArrayBlockingQueue<>(100);
+ }
+
+ @Override
+ public int numNodes() {
+ if (n == -1) throw new UnsupportedOperationException("The number of nodes is unknown (you need to complete a traversal)");
+ return n;
+ }
+
+ @Override
+ public NodeIterator nodeIterator() {
+ if (n != -1) throw new IllegalStateException("This graph can be traversed only once");
+ return new NodeIterator() {
+ int i = 0;
+ private int[] currentSuccessor;
+ private int[] nextSuccessor;
+ @Override
+ public boolean hasNext() {
+ if (nextSuccessor == END_OF_GRAPH) return false;
+ if (nextSuccessor != null) return true;
+
+ try {
+ nextSuccessor = successorQueue.take();
+ }
+ catch (InterruptedException e) {
+ throw new RuntimeException(e.getMessage(), e);
+ }
+
+ final boolean end = nextSuccessor == END_OF_GRAPH;
+ if (end) n = i;
+ return ! end;
+ }
+
+ @Override
+ public int nextInt() {
+ if (! hasNext()) throw new NoSuchElementException();
+ currentSuccessor = nextSuccessor;
+ nextSuccessor = null;
+ return i++;
+ }
+
+ @Override
+ public int outdegree() {
+ if (currentSuccessor == null) throw new IllegalStateException();
+ return currentSuccessor.length;
+ }
+
+ @Override
+ public int[] successorArray() {
+ if (currentSuccessor == null) throw new IllegalStateException();
+ return currentSuccessor;
+ }
+
+ @Override
+ public NodeIterator copy(int k) {
+ throw new UnsupportedOperationException();
+ }
+ };
+ }
+
+ /** Adds a new node whose successors are contained in the specified array fragment.
+ *
+ * <p>The array fragment must be sorted in increasing order. The fragment is copied, so
+ * the array can be reused after this call.
+ *
+ * @param successor an array.
+ * @param offset the first valid entry in <code>successor</code>.
+ * @param length the number of valid entries.
+ */
+ public void add(final int[] successor, final int offset, final int length) throws InterruptedException {
+ successorQueue.put(Arrays.copyOfRange(successor, offset, offset + length));
+ }
+
+ /** Adds a new node whose successors are contained in the specified array.
+ *
+ * <p>The array must be sorted in increasing order. Note that the array is <em>not</em> copied,
+ * so it must not be modified after this call.
+ *
+ * @param successor an array.
+ */
+ public void add(final int[] successor) throws InterruptedException {
+ successorQueue.put(successor);
+ }
+}
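The hand-off above relies on a bounded blocking queue and on recognizing the end marker by reference. The following hypothetical, self-contained sketch (IncrementalHandOffSketch and its END field are illustrative names, not WebGraph API) reproduces that pattern with plain java.util.concurrent: a producer thread enqueues successor arrays and the consumer counts nodes until it takes the sentinel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;

/** Hypothetical sketch (not WebGraph code) of the producer/consumer hand-off. */
final class IncrementalHandOffSketch {
    /** End marker, recognized by reference (==) like END_OF_GRAPH; any other empty array is a node without successors. */
    static final int[] END = new int[0];

    /** Feeds successor lists through a bounded queue from a producer thread; returns the outdegrees seen by the consumer. */
    static List<Integer> outdegrees(final int[][] successorLists) {
        final ArrayBlockingQueue<int[]> queue = new ArrayBlockingQueue<>(2); // small capacity: the producer blocks when the consumer lags
        final Thread producer = new Thread(() -> {
            try {
                for (final int[] s : successorLists) queue.put(s);
                queue.put(END); // plays the role of add(END_OF_GRAPH)
            } catch (final InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        final List<Integer> result = new ArrayList<>();
        try {
            for (int[] s; (s = queue.take()) != END; ) result.add(s.length);
            producer.join();
        } catch (final InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return result; // result.size() is the number of nodes, known only once END has been taken
    }

    public static void main(final String[] args) {
        System.out.println(outdegrees(new int[][] { { 1, 2 }, {}, { 0 } })); // [2, 0, 1]
    }
}
```

Note that the literal {} creates a fresh empty array, so it is counted as a node with no successors; only the sentinel object itself ends the traversal.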
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/IntIntervalSequenceIterator.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/IntIntervalSequenceIterator.java
new file mode 100644
index 0000000..4d9e8f5
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/IntIntervalSequenceIterator.java
@@ -0,0 +1,98 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2003-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+/** An iterator returning the integers contained in a sequence of intervals. */
+public class IntIntervalSequenceIterator implements LazyIntIterator {
+
+ /** The left extremes. */
+ private final int left[];
+ /** The lengths. */
+ private final int len[];
+ /** The number of remaining intervals (including the current one). It is zero exactly when the iterator is exhausted. */
+ private int remaining;
+ /** The index of the current interval. */
+ private int currInterval;
+ /** The current position in the current interval: the next integer to be output is {@link #currLeft} + {@link #currIndex}. */
+ private int currIndex;
+ /** The left point of the current interval. */
+ private int currLeft;
+
+ /** Creates a new interval-sequence iterator by specifying
+ * arrays of left extremes and lengths. Note that the two arrays are <em>not</em> copied,
+ * so they must not be modified during the iteration.
+ *
+ * @param left an array containing the left extremes of the intervals generating this iterator.
+ * @param len an array (of the same length as <code>left</code>) containing the number of integers (greater than zero) in each interval.
+ */
+
+ public IntIntervalSequenceIterator(final int left[], final int len[]) {
+ this(left, len, left.length);
+ }
+
+ /** Creates a new interval-sequence iterator by specifying
+ * arrays of left extremes and lengths, and the number of valid entries. Note that the two arrays are <em>not</em> copied,
+ * so they must not be modified during the iteration.
+ *
+ * @param left an array containing the left extremes of the intervals generating this iterator.
+ * @param len an array (of the same length as <code>left</code>) containing the number of integers (greater than zero) in each interval.
+ * @param n the number of valid entries in <code>left</code> and <code>len</code>.
+ */
+
+ public IntIntervalSequenceIterator(final int left[], final int len[], final int n) {
+ this.left = left;
+ this.len = len;
+ this.remaining = n;
+ if (n != 0) currLeft = left[0];
+ }
+
+ private void advance() {
+ remaining--;
+ if (remaining != 0) currLeft = left[++currInterval];
+ currIndex = 0;
+ }
+
+ @Override
+ public int nextInt() {
+ if (remaining == 0) return -1;
+
+ final int next = currLeft + currIndex++;
+ if (currIndex == len[currInterval]) advance();
+ return next;
+ }
+
+ @Override
+ public int skip(final int n) {
+ int skipped = 0;
+
+ while(skipped < n && remaining != 0) {
+ if (n - skipped < len[currInterval] - currIndex) {
+ currIndex += (n - skipped);
+ return n;
+ }
+ else {
+ skipped += len[currInterval] - currIndex;
+ advance();
+ }
+ }
+
+ return skipped;
+ }
+}
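As a reference for what the iterator enumerates, here is a hypothetical eager sketch (IntervalExpansionSketch is an illustrative name, not WebGraph API) that expands the first n intervals into an array; each value equals the iterator's currLeft + currIndex at the corresponding step.

```java
import java.util.Arrays;

/** Hypothetical eager sketch (not WebGraph code) of interval-sequence expansion. */
final class IntervalExpansionSketch {
    /** Expands intervals: left[i] is the left extreme of interval i, len[i] > 0 its number of integers. */
    static int[] expand(final int[] left, final int[] len, final int n) {
        int total = 0;
        for (int i = 0; i < n; i++) total += len[i];
        final int[] out = new int[total];
        int k = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < len[i]; j++) out[k++] = left[i] + j; // same value as currLeft + currIndex
        return out;
    }

    public static void main(final String[] args) {
        // Intervals [3..5] and [10..11]
        System.out.println(Arrays.toString(expand(new int[] { 3, 10 }, new int[] { 3, 2 }, 2))); // [3, 4, 5, 10, 11]
    }
}
```

On the lazy iterator built from the same arrays, skip(4) consumes the whole first interval plus 10, so the following nextInt() returns 11 without materializing anything.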
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/LazyIntIterator.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/LazyIntIterator.java
new file mode 100644
index 0000000..d6a1e05
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/LazyIntIterator.java
@@ -0,0 +1,45 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+/** A lazy iterator over the integers.
+ *
+ * <p>An instance of this class represents a (skippable) iterator over the integers.
+ * The iterator is exhausted when an implementation-dependent special marker is
+ * returned. This fully lazy architecture halves the number of method
+ * calls with respect to Java's eager iterators, which need a call to <code>hasNext()</code>
+ * and a call to <code>next()</code> for each element.
+ */
+
+public interface LazyIntIterator {
+ /** Returns the next integer of this iterator, or the special
+ * marker if this iterator is exhausted.
+ *
+ * @return the next integer of this iterator, or the special
+ * marker if this iterator is exhausted.
+ public int nextInt();
+
+ /** Skips a given number of elements.
+ *
+ * @param n the number of elements to skip.
+ * @return the number of elements actually skipped (which might
+ * be less than <code>n</code> if this iterator is exhausted).
+ */
+ public int skip(int n);
+}
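The one-call-per-element idiom can be illustrated with a hypothetical copy of the protocol (LazyInts and LazyLoopSketch are illustrative names, not WebGraph API), assuming -1 as the end marker, which is the marker used by the implementations in LazyIntIterators.

```java
/** Hypothetical copy of the lazy protocol, using -1 as the end marker. */
interface LazyInts {
    int nextInt();
    int skip(int n);
}

final class LazyLoopSketch {
    /** A lazy iterator over an array of nonnegative values. */
    static LazyInts over(final int[] a) {
        return new LazyInts() {
            private int pos;
            @Override public int nextInt() { return pos < a.length ? a[pos++] : -1; }
            @Override public int skip(final int n) {
                final int s = Math.min(n, a.length - pos); // never skip past the end
                pos += s;
                return s;
            }
        };
    }

    /** Sums the remaining elements with one call per element: the returned value doubles as the end test. */
    static long sum(final LazyInts it) {
        long sum = 0;
        for (int x; (x = it.nextInt()) != -1; ) sum += x;
        return sum;
    }

    public static void main(final String[] args) {
        final LazyInts it = over(new int[] { 1, 4, 9, 16 });
        it.skip(1); // skips the element 1
        System.out.println(sum(it)); // 4 + 9 + 16 = 29
    }
}
```

The price of this design is that the marker can never be a legitimate element, which is harmless for node identifiers since they are nonnegative.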
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/LazyIntIterators.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/LazyIntIterators.java
new file mode 100644
index 0000000..b80a083
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/LazyIntIterators.java
@@ -0,0 +1,249 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import java.util.NoSuchElementException;
+
+import it.unimi.dsi.fastutil.ints.IntArrays;
+import it.unimi.dsi.fastutil.ints.IntIterator;
+
+/** A class providing static methods and objects that do useful
+ * things with {@linkplain LazyIntIterator lazy integer iterators}. */
+
+public class LazyIntIterators {
+
+ protected LazyIntIterators() {}
+
+ /** An empty lazy iterator. */
+ public final static LazyIntIterator EMPTY_ITERATOR = new LazyIntIterator() {
+ @Override
+ public int nextInt() { return -1; }
+ @Override
+ public int skip(final int n) { return 0; }
+ };
+
+ /** Unwraps the elements returned by a lazy iterator into an array.
+ *
+ * @param lazyIntIterator a lazy integer iterator.
+ * @param array an array.
+ * @return the number of elements unwrapped into <code>array</code> starting from index 0.
+ */
+ public static int unwrap(final LazyIntIterator lazyIntIterator, final int array[]) {
+ int j, t, l = array.length;
+ for(j = 0; j < l && (t = lazyIntIterator.nextInt()) != -1; j++) array[j] = t;
+ return j;
+ }
+
+ /** Unwraps the elements returned by a lazy iterator into an array fragment.
+ *
+ * @param lazyIntIterator a lazy integer iterator.
+ * @param array an array.
+ * @param offset the index of the first element of <code>array</code> to be used.
+ * @param length the maximum number of elements to be unwrapped.
+ * @return the number of elements unwrapped into <code>array</code> starting from index <code>offset</code>.
+ */
+ public static int unwrap(final LazyIntIterator lazyIntIterator, final int array[], final int offset, final int length) {
+ int j, t, l = Math.min(length, array.length - offset);
+ for(j = 0; j < l && (t = lazyIntIterator.nextInt()) != -1; j++) array[offset + j] = t;
+ return j;
+ }
+
+ /** Unwraps the elements returned by a lazy iterator into a new array.
+ *
+ * <p>If you need the resulting array to contain the
+ * elements returned by <code>lazyIntIterator</code>, but additional trailing entries set to zero
+ * would cause no harm, consider using {@link #unwrapLoosely(LazyIntIterator)}, which
+ * avoids the final call to {@link IntArrays#trim(int[], int)}.
+ *
+ * @param lazyIntIterator a lazy integer iterator.
+ * @return an array containing the elements returned by <code>lazyIntIterator</code>.
+ * @see #unwrapLoosely(LazyIntIterator)
+ */
+ public static int[] unwrap(final LazyIntIterator lazyIntIterator) {
+ int array[] = new int[16];
+ int j = 0, t;
+
+ while((t = lazyIntIterator.nextInt()) != -1) {
+ if (j == array.length) array = IntArrays.grow(array, j + 1);
+ array[j++] = t;
+ }
+
+ return IntArrays.trim(array, j);
+ }
+
+ /** Unwraps the elements returned by a lazy iterator into a new array that can contain additional entries set to zero.
+ *
+ * <p>If you need the resulting array to contain <em>exactly</em> the
+ * elements returned by <code>lazyIntIterator</code>, consider using {@link #unwrap(LazyIntIterator)}; this
+ * method, instead, avoids a final call to {@link IntArrays#trim(int[], int)}.
+ *
+ * @param lazyIntIterator a lazy integer iterator.
+ * @return an array containing the elements returned by <code>lazyIntIterator</code>; note
+ * that in general it might contain some final zeroes beyond the elements returned by <code>lazyIntIterator</code>,
+ * so the number of elements actually written into the array must be known externally.
+ * @see #unwrap(LazyIntIterator)
+ */
+ public static int[] unwrapLoosely(final LazyIntIterator lazyIntIterator) {
+ int array[] = new int[16];
+ int j = 0, t;
+
+ while((t = lazyIntIterator.nextInt()) != -1) {
+ if (j == array.length) array = IntArrays.grow(array, j + 1);
+ array[j++] = t;
+ }
+
+ return array;
+ }
+
+ /** A lazy iterator returning the elements of a given array. */
+
+ private static final class ArrayLazyIntIterator implements LazyIntIterator {
+ /** The underlying array. */
+ private final int[] a;
+ /** The number of valid elements in {@link #a}, starting from 0. */
+ private final int length;
+ /** The next element of {@link #a} that will be returned. */
+ private int pos;
+
+ public ArrayLazyIntIterator(final int a[], final int length) {
+ this.a = a;
+ this.length = length;
+ }
+
+ @Override
+ public int nextInt() {
+ if (pos == length) return -1;
+ return a[pos++];
+ }
+
+ @Override
+ public int skip(final int n) {
+ final int toSkip = Math.min(n, length - pos);
+ pos += toSkip;
+ return toSkip;
+ }
+ }
+
+ /** Returns a lazy integer iterator enumerating the given number of elements of an array.
+ *
+ * @param array an array.
+ * @param length the number of elements to enumerate.
+ * @return a lazy integer iterator enumerating the first <code>length</code> elements of <code>array</code>.
+ */
+
+ public static LazyIntIterator wrap(final int array[], final int length) {
+ if (length == 0) return EMPTY_ITERATOR;
+ return new ArrayLazyIntIterator(array, length);
+ }
+
+ /** Returns a lazy integer iterator enumerating the elements of an array.
+ *
+ * @param array an array.
+ * @return a lazy integer iterator enumerating the elements of <code>array</code>.
+ */
+
+ public static LazyIntIterator wrap(final int array[]) {
+ return wrap(array, array.length);
+ }
+
+ /** An adapter from lazy to eager iteration. */
+ private static final class LazyToEagerIntIterator implements IntIterator {
+ /** The underlying lazy iterator. */
+ private final LazyIntIterator lazyIntIterator;
+ /** Whether this iterator has already been advanced, that is, whether {@link #next} is valid. */
+ private boolean advanced;
+ /** The next value to be returned, if {@link #advanced} is true. */
+ private int next;
+
+ public LazyToEagerIntIterator(final LazyIntIterator lazyIntIterator) {
+ this.lazyIntIterator = lazyIntIterator;
+ }
+
+ @Override
+ public boolean hasNext() {
+ if (! advanced) {
+ advanced = true;
+ next = lazyIntIterator.nextInt();
+ }
+ return next != -1;
+ }
+
+ @Override
+ public int nextInt() {
+ if (! hasNext()) throw new NoSuchElementException();
+ advanced = false;
+ return next;
+ }
+
+ @Override
+ public int skip(final int n) {
+ if (n == 0) return 0;
+ if (advanced) {
+ // The buffered element, if any, counts as the first skipped element
+ if (next == -1) return 0;
+ advanced = false;
+ return lazyIntIterator.skip(n - 1) + 1;
+ }
+ return lazyIntIterator.skip(n);
+ }
+ }
+
+ /** Returns an eager {@link IntIterator} enumerating the same elements as
+ * a given lazy integer iterator.
+ *
+ * @param lazyIntIterator a lazy integer iterator.
+ * @return an eager {@link IntIterator} enumerating the same elements as
+ * <code>lazyIntIterator</code>.
+ */
+
+ public static IntIterator eager(final LazyIntIterator lazyIntIterator) {
+ return new LazyToEagerIntIterator(lazyIntIterator);
+ }
+
+
+ private static final class EagerToLazyIntIterator implements LazyIntIterator {
+ private final IntIterator underlying;
+
+
+ public EagerToLazyIntIterator(final IntIterator underlying) {
+ this.underlying = underlying;
+ }
+
+ @Override
+ public int nextInt() {
+ return underlying.hasNext() ? underlying.nextInt() : -1;
+ }
+
+ @Override
+ public int skip(final int n) {
+ return underlying.skip(n);
+ }
+
+ }
+
+ /** Returns a {@link LazyIntIterator} enumerating the same elements as
+ * a given eager integer iterator.
+ *
+ * @param eagerIntIterator an eager integer iterator.
+ * @return a lazy integer iterator enumerating the same elements as
+ * <code>eagerIntIterator</code>.
+ */
+
+ public static LazyIntIterator lazy(final IntIterator eagerIntIterator) {
+ return new EagerToLazyIntIterator(eagerIntIterator);
+ }
+}
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/LazyIntSkippableIterator.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/LazyIntSkippableIterator.java
new file mode 100644
index 0000000..3959fb5
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/LazyIntSkippableIterator.java
@@ -0,0 +1,46 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2013-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+/** A skippable {@linkplain LazyIntIterator lazy iterator over the integers}.
+ *
+ * <p>An instance of this class represents an iterator over integers
+ * that returns elements in increasing order. The iterator makes it possible to {@linkplain #skipTo(int) skip elements
+ * by <em>value</em>}.
+ */
+
+public interface LazyIntSkippableIterator extends LazyIntIterator {
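+ /** The marker returned by {@link #skipTo(int)} when no element greater than or equal to the given lower bound exists. */
+ public static final int END_OF_LIST = Integer.MAX_VALUE;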
+
+ /** Skips to a given element.
+ *
+ * <p>Note that this interface is <em>fragile</em>: after {@link #END_OF_LIST}
+ * has been returned, the behavior of further calls to this method will be
+ * unpredictable.
+ *
+ * @param lowerBound a lower bound to the returned element.
+ * @return if the last returned element is greater than or equal to
+ * {@code lowerBound}, the last returned element; otherwise,
+ * the smallest element greater
+ * than or equal to <code>lowerBound</code> that would be
+ * returned by this iterator, or {@link #END_OF_LIST}
+ * if no such element exists.
+ */
+ public int skipTo(int lowerBound);
+}
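The skipTo(int) contract, in particular the case in which the last returned element already satisfies the bound, can be sketched on a sorted array. SkipToSketch is a hypothetical class, not a WebGraph implementation; it assumes strictly increasing nonnegative elements, uses a linear scan for simplicity, and (as an assumption) also returns END_OF_LIST from nextInt() at the end.

```java
/** Hypothetical array-backed sketch (not WebGraph code) of the skipTo(int) contract. */
final class SkipToSketch {
    static final int END_OF_LIST = Integer.MAX_VALUE; // same value as LazyIntSkippableIterator.END_OF_LIST

    private final int[] a;  // strictly increasing, nonnegative values
    private int pos;        // index of the next element to be returned
    private int last = -1;  // last returned element, -1 if none yet

    SkipToSketch(final int[] a) { this.a = a; }

    int nextInt() {
        return last = pos < a.length ? a[pos++] : END_OF_LIST;
    }

    int skipTo(final int lowerBound) {
        if (last >= lowerBound) return last; // the last returned element already satisfies the bound
        while (pos < a.length && a[pos] < lowerBound) pos++; // linear scan; real implementations may jump
        return nextInt();
    }

    public static void main(final String[] args) {
        final SkipToSketch it = new SkipToSketch(new int[] { 2, 5, 9, 14 });
        System.out.println(it.skipTo(4)); // 5
        System.out.println(it.skipTo(5)); // 5 again: the last returned element is >= 5
        System.out.println(it.skipTo(6)); // 9
        System.out.println(it.nextInt()); // 14
        System.out.println(it.skipTo(15) == END_OF_LIST); // true
    }
}
```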
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/MaskedIntIterator.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/MaskedIntIterator.java
new file mode 100644
index 0000000..a759b5e
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/MaskedIntIterator.java
@@ -0,0 +1,129 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2003-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+/** An iterator returning the elements of an underlying iterator, filtered
+ * using an inclusion-exclusion block list.
+ *
+ * <p>A <em>mask</em> is an array of integers. The sum of the values contained in the mask
+ * must not exceed the number of elements returned by the underlying iterator. Moreover, all integers in the mask
+ * must be positive, except possibly for the first one, which may be zero.
+ *
+ * <P>Mask values are interpreted as specifying inclusion-exclusion blocks.
+ * Suppose that the underlying iterator returns <var>N</var> values, and that the mask is
+ * <var>n</var><sub>0</sub>, <var>n</var><sub>1</sub>, &hellip;, <var>n</var><sub>k</sub>.
+ * Then, the first <var>n</var><sub>0</sub> values returned by the underlying iterator must be kept,
+ * the next <var>n</var><sub>1</sub> values must be ignored, the next <var>n</var><sub>2</sub> must be
+ * kept and so on. The last <var>N</var>&minus;(<var>n</var><sub>0</sub>+&hellip;+<var>n</var><sub>k</sub>)
+ * must be kept if <var>k</var> is odd, and must be ignored otherwise.
+ * An instance of this class will return the kept values only, in the order in which the underlying iterator returns them.
+ */
+
+public class MaskedIntIterator implements LazyIntIterator {
+ private final static boolean ASSERTS = false;
+
+ /** The underlying iterator. */
+ private final LazyIntIterator underlying;
+ /** The mask. */
+ private final int mask[];
+ /** The number of valid entries in {@link #mask}. */
+ private final int maskLen;
+ /** The index of the next entry of {@link #mask} to be read; it always refers to an exclusion block. */
+ private int currMask;
+ /** How many integers are left in the current inclusion block. If <code>0</code> everything left must be discarded; if
+ * <code>-1</code> all remaining values must be kept. */
+ private int left;
+
+ /** Creates a new masked iterator using a given mask and underlying iterator.
+ *
+ * @param mask a mask, or <code>null</code>, meaning an empty mask (everything is copied).
+ * @param underlying an underlying iterator.
+ */
+ public MaskedIntIterator(final int mask[], final LazyIntIterator underlying) {
+ this(mask, mask == null ? 0 : mask.length, underlying);
+ }
+
+ /** Creates a new masked iterator using a given mask, mask length and underlying iterator.
+ *
+ * @param mask a mask, or <code>null</code>, meaning an empty mask (everything is copied).
+ * @param maskLen an explicit mask length.
+ * @param underlying an underlying iterator.
+ */
+ public MaskedIntIterator(final int mask[], final int maskLen, final LazyIntIterator underlying) {
+
+ this.mask = mask;
+ this.maskLen = maskLen;
+ this.underlying = underlying;
+
+ if (maskLen != 0) {
+ left = mask[currMask++];
+ advance();
+ }
+ else left = -1;
+ }
+
+ @Override
+ public int nextInt() {
+ if (left == 0) return -1;
+ final int next = underlying.nextInt();
+
+ if (left == -1 || next == -1) return next;
+ if (left > 0) {
+ left--;
+ advance();
+ }
+ return next;
+ }
+
+ private void advance() {
+ if (ASSERTS) assert left != -1;
+ if (left == 0 && currMask < maskLen) {
+ underlying.skip(mask[currMask++]);
+ left = currMask < maskLen ? mask[currMask++] : -1;
+ }
+ }
+
+ @Override
+ public int skip(final int n) {
+ int skipped = 0;
+
+ while(skipped < n && left != 0) {
+ if (left == -1) {
+ final int result = underlying.skip(n - skipped);
+ skipped += result;
+ if (skipped < n) break;
+ }
+ else {
+ if (n - skipped < left) {
+ underlying.skip(n - skipped);
+ left -= (n - skipped);
+ return n;
+ }
+ else {
+ underlying.skip(left);
+ skipped += left;
+ left = 0;
+ advance();
+ }
+ }
+ }
+
+ return skipped;
+ }
+}
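The block semantics described in the class comment can be checked against a hypothetical eager reference (MaskSketch is an illustrative name, not WebGraph API) that applies a mask to an array: even-indexed blocks are kept, odd-indexed blocks are skipped, and the tail is kept exactly when the mask has an even number of entries (that is, k is odd, or the mask is empty).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical eager reference (not WebGraph code) for the mask semantics of MaskedIntIterator. */
final class MaskSketch {
    /** Applies an inclusion-exclusion mask; the sum of the mask entries must not exceed values.length. */
    static List<Integer> applyMask(final int[] values, final int[] mask) {
        final List<Integer> kept = new ArrayList<>();
        int pos = 0;
        for (int b = 0; b < mask.length; b++)
            for (int j = 0; j < mask[b]; j++, pos++)
                if (b % 2 == 0) kept.add(values[pos]); // even-indexed blocks are kept, odd-indexed ones skipped
        // The tail is kept iff the last block index k = mask.length - 1 is odd (or the mask is empty)
        if (mask.length % 2 == 0) for (; pos < values.length; pos++) kept.add(values[pos]);
        return kept;
    }

    public static void main(final String[] args) {
        final int[] values = { 10, 11, 12, 13, 14, 15, 16 };
        System.out.println(applyMask(values, new int[] { 2, 1, 2 })); // [10, 11, 13, 14]
        System.out.println(applyMask(values, new int[] { 0, 3 }));    // [13, 14, 15, 16]
    }
}
```

The second call shows why the first mask entry may be zero: it lets a mask start with an exclusion block.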
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/MergedIntIterator.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/MergedIntIterator.java
new file mode 100644
index 0000000..e238285
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/MergedIntIterator.java
@@ -0,0 +1,97 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2003-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+/** An iterator returning the union of the integers returned by two {@link LazyIntIterator}s.
+ * The two iterators must return integers in increasing order; the resulting
+ * {@link MergedIntIterator} will do the same. Integers returned by both iterators are returned only once.
+ */
+
+public class MergedIntIterator implements LazyIntIterator {
+ /** The first component iterator. */
+ private final LazyIntIterator it0;
+ /** The second component iterator. */
+ private final LazyIntIterator it1;
+ /** The last integer returned by {@link #it0}. */
+ private int curr0;
+ /** The last integer returned by {@link #it1}. */
+ private int curr1;
+
+ /** Creates a new merged iterator by merging two given iterators.
+ *
+ * @param it0 the first (monotonically nondecreasing) component iterator.
+ * @param it1 the second (monotonically nondecreasing) component iterator.
+ */
+ public MergedIntIterator(final LazyIntIterator it0, final LazyIntIterator it1) {
+ this.it0 = it0;
+ this.it1 = it1;
+ curr0 = it0.nextInt();
+ curr1 = it1.nextInt();
+ }
+
+ @Override
+ public int nextInt() {
+ if (curr0 < curr1) {
+ if (curr0 == -1) {
+ final int result = curr1;
+ curr1 = it1.nextInt();
+ return result;
+ }
+
+ final int result = curr0;
+ curr0 = it0.nextInt();
+ return result;
+ }
+ else {
+ if (curr1 == -1) {
+ final int result = curr0;
+ curr0 = it0.nextInt();
+ return result;
+ }
+
+ final int result = curr1;
+ if (curr0 == curr1) curr0 = it0.nextInt();
+ curr1 = it1.nextInt();
+ return result;
+ }
+ }
+
+ @Override
+ public int skip(final int s) {
+ int i;
+ for(i = 0; i < s; i++) {
+ if (curr0 == -1 && curr1 == -1) break;
+
+ if (curr0 < curr1) {
+ if (curr0 == -1) curr1 = it1.nextInt();
+ else curr0 = it0.nextInt();
+ }
+ else {
+ if (curr1 == -1) curr0 = it0.nextInt();
+ else {
+ if (curr0 == curr1) curr0 = it0.nextInt();
+ curr1 = it1.nextInt();
+ }
+ }
+ }
+ return i;
+ }
+}
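The merge performed by `MergedIntIterator` can be sketched without WebGraph dependencies. The names `MergeSketch`, `LazyInts`, `of`, `merge` and `drain` are hypothetical: `LazyInts` only stands in for `LazyIntIterator`, keeping the convention that `nextInt()` returns -1 once a sequence is exhausted. Like the class above, the sketch keeps a one-element lookahead per input and emits a value present in both inputs only once. It assumes each input is itself duplicate-free and increasing.

```java
import java.util.Arrays;

// Sketch of the two-way merge used by MergedIntIterator, without WebGraph.
// LazyInts is a hypothetical stand-in for LazyIntIterator: nextInt() returns -1
// once the sequence is exhausted.
class MergeSketch {
    interface LazyInts {
        int nextInt();
    }

    // Wraps a sorted array of nonnegative integers; returns -1 after the last element.
    static LazyInts of(final int... a) {
        return new LazyInts() {
            private int pos = 0;

            @Override
            public int nextInt() {
                return pos < a.length ? a[pos++] : -1;
            }
        };
    }

    // Merges two increasing sequences; a value present in both is emitted once.
    static LazyInts merge(final LazyInts it0, final LazyInts it1) {
        return new LazyInts() {
            // One-element lookahead on each input (-1 means exhausted).
            private int curr0 = it0.nextInt();
            private int curr1 = it1.nextInt();

            @Override
            public int nextInt() {
                final int result;
                if (curr0 == -1 && curr1 == -1) return -1; // both inputs exhausted
                if (curr1 == -1 || (curr0 != -1 && curr0 < curr1)) {
                    result = curr0; // only it0 left, or it0 has the smaller head
                    curr0 = it0.nextInt();
                } else if (curr0 == -1 || curr1 < curr0) {
                    result = curr1; // only it1 left, or it1 has the smaller head
                    curr1 = it1.nextInt();
                } else {
                    result = curr0; // equal heads: emit once, advance both
                    curr0 = it0.nextInt();
                    curr1 = it1.nextInt();
                }
                return result;
            }
        };
    }

    // Collects all remaining values until the -1 terminator.
    static int[] drain(final LazyInts it) {
        int[] out = new int[8];
        int n = 0;
        for (int x; (x = it.nextInt()) != -1;) {
            if (n == out.length) out = Arrays.copyOf(out, 2 * n);
            out[n++] = x;
        }
        return Arrays.copyOf(out, n);
    }

    public static void main(final String[] args) {
        final int[] merged = drain(merge(of(1, 3, 5, 7), of(2, 3, 6)));
        System.out.println(Arrays.toString(merged)); // prints [1, 2, 3, 5, 6, 7]
    }
}
```

The explicit three-way branch makes the exhausted-input cases visible; the original folds them into the `curr0 < curr1` test by relying on -1 being smaller than every valid node.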
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/NodeIterator.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/NodeIterator.java
new file mode 100644
index 0000000..27f364c
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/NodeIterator.java
@@ -0,0 +1,87 @@
+package it.unimi.dsi.webgraph;
+
+import java.util.NoSuchElementException;
+
+import it.unimi.dsi.fastutil.ints.IntIterator;
+
+/** This abstract class implements {@link IntIterator} and is used to scan a graph, that is, to read its nodes and their successor lists
+ * sequentially. The {@link #nextInt()} method returns the node that will be scanned. After a call to this method, calling
+ * {@link #successors()} or {@link #successorArray()} will return the list of successors.
+ *
+ * <p>Concrete subclasses can override either {@link #successors()} or
+ * {@link #successorArray()}, but at least one of them <strong>must</strong> be overridden, as the default implementations call each other.
+ *
+ * <p>The {@link #copy(int)} method is optional, but it should be implemented whenever the graph can be
+ * scanned more than once.
+ */
+public abstract class NodeIterator implements IntIterator {
+
+ /** An empty node iterator.
+ */
+ public static final NodeIterator EMPTY = new NodeIterator() {
+ @Override
+ public NodeIterator copy(final int upperBound) {
+ return this;
+ }
+ @Override
+ public boolean hasNext() {
+ return false;
+ }
+ @Override
+ public int outdegree() {
+ throw new IllegalStateException();
+ }
+ @Override
+ public int nextInt() {
+ throw new NoSuchElementException();
+ }
+ };
+
+ /** Returns the outdegree of the current node.
+ *
+ * @return the outdegree of the current node.
+ */
+ public abstract int outdegree();
+
+ /** Returns a lazy iterator over the successors of the current node. The iteration terminates
+ * when -1 is returned.
+ *
+ * <P>This implementation just wraps the array returned by {@link #successorArray()}.
+ *
+ * @return a lazy iterator over the successors of the current node.
+ */
+ public LazyIntIterator successors() {
+ return LazyIntIterators.wrap(successorArray(), outdegree());
+ }
+
+ /** Returns a reference to an array containing the successors of the current node.
+ *
+ * <P>The returned array may contain more entries than the outdegree of the current node.
+ * However, only those with indices from 0 (inclusive) to the outdegree of the current node (exclusive)
+ * contain valid data.
+ *
+ * <P>This implementation just unwraps the iterator returned by {@link #successors()}.
+ *
+ * @return an array whose first elements are the successors of the current node; the array must not
+ * be modified by the caller.
+ */
+ public int[] successorArray() {
+ final int[] successor = new int[outdegree()];
+ LazyIntIterators.unwrap(successors(), successor);
+ return successor;
+ }
+
+ /** Creates a copy of this iterator that will never return nodes &ge; the specified bound; the copy
+ * must be accessible by a different thread. Optional operation (it should be implemented by all classes that allow
+ * the graph to be scanned more than once).
+ *
+ * <p>This implementation just throws an {@link UnsupportedOperationException}. It should be kept
+ * in sync with the result of {@link ImmutableGraph#hasCopiableIterators()}.
+ *
+ * @param upperBound the upper bound.
+ * @return a copy of this iterator, with the given upper bound.
+ */
+ public NodeIterator copy(int upperBound) {
+ throw new UnsupportedOperationException();
+ }
+}
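The scanning protocol documented by `NodeIterator` (call `nextInt()`, then read `outdegree()` and the successors, using only the first `outdegree()` entries of `successorArray()`) can be sketched without WebGraph. `NodeScanSketch`, `ArrayNodeIterator` and `indegrees` are hypothetical names; the iterator does not extend the real `NodeIterator`. Its reused buffer is an assumption chosen to mimic the documented possibility that the returned array is longer than the outdegree. The example graph is the four-arc graph from the `ScatteredArcsASCIIGraph` documentation.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

// Sketch of the sequential-scan protocol documented by NodeIterator, without WebGraph.
// ArrayNodeIterator is hypothetical: it does not extend the real NodeIterator.
class NodeScanSketch {
    static final class ArrayNodeIterator {
        private final int[][] adj;  // adj[x] = sorted successors of node x
        private final int[] buffer; // reused for every node, so it can be longer than the outdegree
        private int curr = -1;      // current node, -1 before the first call to nextInt()

        ArrayNodeIterator(final int[][] adj) {
            this.adj = adj;
            int max = 0;
            for (final int[] s : adj) max = Math.max(max, s.length);
            buffer = new int[max];
        }

        boolean hasNext() {
            return curr + 1 < adj.length;
        }

        int nextInt() {
            if (!hasNext()) throw new NoSuchElementException();
            return ++curr;
        }

        int outdegree() {
            if (curr == -1) throw new IllegalStateException("nextInt() not called yet");
            return adj[curr].length;
        }

        // As in NodeIterator, only the first outdegree() entries are valid.
        int[] successorArray() {
            final int d = outdegree();
            System.arraycopy(adj[curr], 0, buffer, 0, d);
            return buffer;
        }
    }

    // Scans the graph once and counts the predecessors of each node.
    static int[] indegrees(final int[][] adj) {
        final int[] indeg = new int[adj.length];
        final ArrayNodeIterator it = new ArrayNodeIterator(adj);
        while (it.hasNext()) {
            it.nextInt();
            final int d = it.outdegree();
            final int[] succ = it.successorArray();
            // Looping up to succ.length instead of d would read stale entries.
            for (int i = 0; i < d; i++) indeg[succ[i]]++;
        }
        return indeg;
    }

    public static void main(final String[] args) {
        // Arcs 0->1, 0->2, 1->2, 2->0.
        final int[][] adj = { { 1, 2 }, { 2 }, { 0 } };
        System.out.println(Arrays.toString(indegrees(adj))); // prints [1, 1, 2]
    }
}
```

In this example node 1 has outdegree 1, but the shared buffer still holds a stale 2 in position 1 from node 0, which is why the loop is bounded by `outdegree()` rather than by the array length.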
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/ScatteredArcsASCIIGraph.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/ScatteredArcsASCIIGraph.java
new file mode 100644
index 0000000..78519c2
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/ScatteredArcsASCIIGraph.java
@@ -0,0 +1,837 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2011-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static it.unimi.dsi.fastutil.HashCommon.bigArraySize;
+import static it.unimi.dsi.fastutil.HashCommon.maxFill;
+import it.unimi.dsi.fastutil.BigArrays;
+import it.unimi.dsi.fastutil.Hash;
+import it.unimi.dsi.fastutil.booleans.BooleanBigArrays;
+import it.unimi.dsi.fastutil.bytes.ByteArrays;
+import it.unimi.dsi.fastutil.ints.IntBigArrays;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.io.FastBufferedInputStream;
+import it.unimi.dsi.fastutil.longs.LongBigArrays;
+import it.unimi.dsi.fastutil.objects.Object2IntFunction;
+import it.unimi.dsi.fastutil.objects.Object2LongFunction;
+import it.unimi.dsi.fastutil.objects.ObjectArrayList;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.sux4j.mph.GOV3Function;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.charset.Charset;
+import java.util.Iterator;
+import java.util.concurrent.TimeUnit;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+
+/** An {@link ImmutableGraph} that corresponds to a graph stored as a scattered list of arcs.
+ *
+ * <p>A <em>scattered list of arcs</em> describes a graph in a fairly loose way. Each line
+ * contains an arc specified as two node identifiers separated by whitespace
+ * (but we suggest exactly one TAB character). Sources and targets can be in any order.
+ *
+ * <p>In the <em>standard</em> description, node
+ * identifiers can be in the range [-2<sup>63</sup>..2<sup>63</sup>): they will be remapped
+ * in a compact identifier space by assigning to each newly seen identifier a new node number. The
+ * list of identifiers in order of appearance is available in {@link #ids}.
+ * Lines can be empty, or comments starting with <code>#</code>. Characters following the
+ * target will be discarded with a warning.
+ *
+ * <p><strong>Warning:</strong> Lines not conforming to the above specification
+ * will cause an error to be logged, but will be otherwise ignored.
+ *
+ * <p>Alternatively, you can {@linkplain #ScatteredArcsASCIIGraph(InputStream, Object2LongFunction, Charset, int, boolean, boolean, int, File, ProgressLogger) provide}
+ * an {@link Object2LongFunction Object2LongFunction&lt;String>} with default return
+ * value -1 that will be used to map identifiers to node numbers, along with a {@link Charset} to parse lines and
+ * the number of nodes of the graph (which must be a strict upper bound for the largest value returned by the function).
+ * Note that in principle an {@link Object2IntFunction} would be sufficient, but we want to make it easier
+ * to use functions from Sux4J such as {@link GOV3Function}.
+ *
+ * <p>Additionally, the resulting graph can be symmetrized, and its loops be removed, using
+ * {@linkplain #ScatteredArcsASCIIGraph(InputStream, boolean, boolean, int, File, ProgressLogger) suitable constructor options}.
+ *
+ * <P>This class has no load method, and its main method converts a scattered-arcs representation
+ * directly into a {@link BVGraph}.
+ *
+ * <h2>Using {@link ScatteredArcsASCIIGraph} to convert your data</h2>
+ *
+ * <p>A simple (albeit rather inefficient) way to import data into WebGraph is using ASCII graphs specified by scattered arcs. Suppose you
+ * create the following file, named <code>example.arcs</code>:
+ * <pre>
+ * # My graph
+ * -1 15
+ * 15 2
+ * 2 -1 This will cause a warning to be logged
+ * OOPS! (This will cause an error to be logged)
+ * -1 2
+ * </pre>
+ * Then, the command
+ * <pre>
+ * java it.unimi.dsi.webgraph.ScatteredArcsASCIIGraph example &lt;example.arcs
+ * </pre>
+ * will produce a compressed graph in {@link it.unimi.dsi.webgraph.BVGraph} format
+ * with basename <code>example</code>. The file <code>example.ids</code> will contain
+ * the list of longs -1, 15, 2. The node with identifier -1 will be node 0 in the
+ * output graph, the node with identifier 15 will be node 1, and the node with identifier 2 will be node 2.
+ * The graph <code>example</code> will thus have three nodes and four arcs (viz., &lt;0,1&gt;, &lt;0,2&gt;, &lt;1,2&gt; and &lt;2,0&gt;).
+ *
+ * <h2>Memory requirements</h2>
+ *
+ * <p>To convert node identifiers to node numbers, instances of this class use a custom map that in the
+ * worst case will require 19.5&times;2<sup><big>&lceil;</big>log(4<var>n</var>/3)<big>&rceil;</big></sup>&nbsp;&le;&nbsp;52<var>n</var> bytes,
+ * where <var>n</var> is the number of distinct identifiers. Storing batches of arcs in memory requires 12 bytes per arc.
+ */
+
+
+public class ScatteredArcsASCIIGraph extends ImmutableSequentialGraph {
+ private static final Logger LOGGER = LoggerFactory.getLogger(ScatteredArcsASCIIGraph.class);
+ private final static boolean DEBUG = false;
+
+ /** The default batch size. */
+ public static final int DEFAULT_BATCH_SIZE = 1000000;
+ /** The extension of the identifier file (a binary list of longs). */
+ private static final String IDS_EXTENSION = ".ids";
+ /** The batch graph used to return node iterators. */
+ private final Transform.BatchGraph batchGraph;
+ /** The list of identifiers in order of appearance. */
+ public long[] ids;
+
+ private static final class Long2IntOpenHashBigMap implements java.io.Serializable, Cloneable, Hash {
+ public static final long serialVersionUID = 0L;
+
+ /** The big array of keys. */
+ public transient long[][] key;
+
+ /** The big array of values. */
+ public transient int[][] value;
+
+ /** The big array telling whether a position is used. */
+ protected transient boolean[][] used;
+
+ /** The acceptable load factor. */
+ protected final float f;
+
+ /** The current table size (always a power of 2). */
+ protected transient long n;
+
+ /** Threshold after which we rehash. It must be the table size times {@link #f}. */
+ protected transient long maxFill;
+
+ /** The mask for wrapping a position counter. */
+ protected transient long mask;
+
+ /** The mask for wrapping a segment counter. */
+ protected transient int segmentMask;
+
+ /** The mask for wrapping a base counter. */
+ protected transient int baseMask;
+
+ /** Number of entries in the map. */
+ protected long size;
+
+ /** Initialises the mask values. */
+ private void initMasks() {
+ mask = n - 1;
+ /*
+ * Note that either we have more than one segment, and in this case all segments are
+ * BigArrays.SEGMENT_SIZE long, or we have exactly one segment whose length is a power of
+ * two.
+ */
+ segmentMask = key[0].length - 1;
+ baseMask = key.length - 1;
+ }
+
+ /**
+ * Creates a new hash big map.
+ *
+ * <p>The actual table size will be the least power of two greater than
+ * <code>expected</code>/<code>f</code>.
+ *
+ * @param expected the expected number of elements in the map.
+ * @param f the load factor.
+ */
+ public Long2IntOpenHashBigMap(final long expected, final float f) {
+ if (f <= 0 || f > 1) throw new IllegalArgumentException("Load factor must be greater than 0 and smaller than or equal to 1");
+ if (expected < 0) throw new IllegalArgumentException("The expected number of elements must be nonnegative");
+ this.f = f;
+ n = bigArraySize(expected, f);
+ maxFill = maxFill(n, f);
+ key = LongBigArrays.newBigArray(n);
+ value = IntBigArrays.newBigArray(n);
+ used = BooleanBigArrays.newBigArray(n);
+ initMasks();
+ }
+
+ /**
+ * Creates a new hash big map with initial expected {@link Hash#DEFAULT_INITIAL_SIZE} elements
+ * and {@link Hash#DEFAULT_LOAD_FACTOR} as load factor.
+ */
+
+ public Long2IntOpenHashBigMap() {
+ this(DEFAULT_INITIAL_SIZE, DEFAULT_LOAD_FACTOR);
+ }
+
+ public int put(final long k, final int v) {
+ final long h = it.unimi.dsi.fastutil.HashCommon.murmurHash3(k);
+
+ // The starting point.
+ int displ = (int)(h & segmentMask);
+ int base = (int)((h & mask) >>> BigArrays.SEGMENT_SHIFT);
+
+ // There's always an unused entry.
+ while (used[base][displ]) {
+ if (k == key[base][displ]) {
+ final int oldValue = value[base][displ];
+ value[base][displ] = v;
+ return oldValue;
+ }
+ base = (base + ((displ = (displ + 1) & segmentMask) == 0 ? 1 : 0)) & baseMask;
+ }
+
+ used[base][displ] = true;
+ key[base][displ] = k;
+ value[base][displ] = v;
+
+ if (++size >= maxFill) rehash(2 * n);
+ return -1;
+ }
+
+ public int get(final long k) {
+ final long h = it.unimi.dsi.fastutil.HashCommon.murmurHash3(k);
+
+ // The starting point.
+ int displ = (int)(h & segmentMask);
+ int base = (int)((h & mask) >>> BigArrays.SEGMENT_SHIFT);
+
+ // There's always an unused entry.
+ while (used[base][displ]) {
+ if (k == key[base][displ]) return value[base][displ];
+ base = (base + ((displ = (displ + 1) & segmentMask) == 0 ? 1 : 0)) & baseMask;
+ }
+
+ return -1;
+ }
+
+ protected void rehash(final long newN) {
+ final boolean used[][] = this.used;
+ final long key[][] = this.key;
+ final int[][] value = this.value;
+ final boolean newUsed[][] = BooleanBigArrays.newBigArray(newN);
+ final long newKey[][] = LongBigArrays.newBigArray(newN);
+ final int newValue[][] = IntBigArrays.newBigArray(newN);
+ final long newMask = newN - 1;
+ final int newSegmentMask = newKey[0].length - 1;
+ final int newBaseMask = newKey.length - 1;
+
+ int base = 0, displ = 0;
+ long h;
+ long k;
+
+ for (long i = size; i-- != 0;) {
+
+ while (!used[base][displ])
+ base = (base + ((displ = (displ + 1) & segmentMask) == 0 ? 1 : 0));
+
+ k = key[base][displ];
+ h = it.unimi.dsi.fastutil.HashCommon.murmurHash3(k);
+
+ // The starting point.
+ int d = (int)(h & newSegmentMask);
+ int b = (int)((h & newMask) >>> BigArrays.SEGMENT_SHIFT);
+
+ while (newUsed[b][d])
+ b = (b + ((d = (d + 1) & newSegmentMask) == 0 ? 1 : 0)) & newBaseMask;
+
+ newUsed[b][d] = true;
+ newKey[b][d] = k;
+ newValue[b][d] = value[base][displ];
+
+ base = (base + ((displ = (displ + 1) & segmentMask) == 0 ? 1 : 0));
+ }
+
+ this.n = newN;
+ this.key = newKey;
+ this.value = newValue;
+ this.used = newUsed;
+ initMasks();
+ maxFill = maxFill(n, f);
+ }
+
+ public void compact() {
+ int base = 0, displ = 0, b = 0, d = 0;
+ for(long i = size; i-- != 0;) {
+ while (! used[base][displ]) base = (base + ((displ = (displ + 1) & segmentMask) == 0 ? 1 : 0)) & baseMask;
+ key[b][d] = key[base][displ];
+ value[b][d] = value[base][displ];
+ base = (base + ((displ = (displ + 1) & segmentMask) == 0 ? 1 : 0)) & baseMask;
+ b = (b + ((d = (d + 1) & segmentMask) == 0 ? 1 : 0)) & baseMask;
+ }
+ }
+
+ public long size() {
+ return size;
+ }
+ }
+
+ /** Creates a scattered-arcs ASCII graph.
+ *
+ * @param is an input stream containing a standard scattered list of arcs.
+ */
+ public ScatteredArcsASCIIGraph(final InputStream is) throws IOException {
+ this(is, false);
+ }
+
+ /** Creates a scattered-arcs ASCII graph.
+ *
+ * @param is an input stream containing a standard scattered list of arcs.
+ * @param symmetrize the new graph will be forced to be symmetric.
+ */
+ public ScatteredArcsASCIIGraph(final InputStream is, final boolean symmetrize) throws IOException {
+ this(is, symmetrize, false);
+ }
+
+ /** Creates a scattered-arcs ASCII graph.
+ *
+ * @param is an input stream containing a standard scattered list of arcs.
+ * @param symmetrize the new graph will be forced to be symmetric.
+ * @param noLoops the new graph will have no loops.
+ */
+ public ScatteredArcsASCIIGraph(final InputStream is, final boolean symmetrize, final boolean noLoops) throws IOException {
+ this(is, symmetrize, noLoops, DEFAULT_BATCH_SIZE);
+ }
+
+ /** Creates a scattered-arcs ASCII graph.
+ *
+ * @param is an input stream containing a standard scattered list of arcs.
+ * @param symmetrize the new graph will be forced to be symmetric.
+ * @param noLoops the new graph will have no loops.
+ * @param batchSize the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
+ */
+ public ScatteredArcsASCIIGraph(final InputStream is, final boolean symmetrize, final boolean noLoops, final int batchSize) throws IOException {
+ this(is, symmetrize, noLoops, batchSize, null);
+ }
+
+ /** Creates a scattered-arcs ASCII graph.
+ *
+ * @param is an input stream containing a standard scattered list of arcs.
+ * @param symmetrize the new graph will be forced to be symmetric.
+ * @param noLoops the new graph will have no loops.
+ * @param batchSize the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
+ * @param tempDir a temporary directory for the batches, or <code>null</code> for {@link File#createTempFile(java.lang.String, java.lang.String)}'s choice.
+ */
+ public ScatteredArcsASCIIGraph(final InputStream is, final boolean symmetrize, final boolean noLoops, final int batchSize, final File tempDir) throws IOException {
+ this(is, symmetrize, noLoops, batchSize, tempDir, null);
+ }
+
+ /** Creates a scattered-arcs ASCII graph.
+ *
+ * @param is an input stream containing a standard scattered list of arcs.
+ * @param symmetrize the new graph will be forced to be symmetric.
+ * @param noLoops the new graph will have no loops.
+ * @param batchSize the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
+ * @param tempDir a temporary directory for the batches, or <code>null</code> for {@link File#createTempFile(java.lang.String, java.lang.String)}'s choice.
+ * @param pl a progress logger, or <code>null</code>.
+ */
+ public ScatteredArcsASCIIGraph(final InputStream is, final boolean symmetrize, final boolean noLoops, final int batchSize, final File tempDir, final ProgressLogger pl) throws IOException {
+ this(is, null, null, -1, symmetrize, noLoops, batchSize, tempDir, pl);
+ }
+
+ /** Creates a scattered-arcs ASCII graph.
+ *
+ * @param is an input stream containing a scattered list of arcs.
+ * @param function an explicitly provided function from strings representing nodes to node numbers, or <code>null</code> for the standard behaviour.
+ * @param charset a character set that will be used to read the identifiers passed to <code>function</code>, or <code>null</code> for ISO-8859-1 (used only if <code>function</code> is not <code>null</code>).
+ * @param n the number of nodes of the graph (used only if <code>function</code> is not <code>null</code>).
+ */
+ public ScatteredArcsASCIIGraph(final InputStream is, final Object2LongFunction<? extends CharSequence> function, final Charset charset, final int n) throws IOException {
+ this(is, function, charset, n, false);
+ }
+
+ /** Creates a scattered-arcs ASCII graph.
+ *
+ * @param is an input stream containing a scattered list of arcs.
+ * @param function an explicitly provided function from strings representing nodes to node numbers, or <code>null</code> for the standard behaviour.
+ * @param charset a character set that will be used to read the identifiers passed to <code>function</code>, or <code>null</code> for ISO-8859-1 (used only if <code>function</code> is not <code>null</code>).
+ * @param n the number of nodes of the graph (used only if <code>function</code> is not <code>null</code>).
+ * @param symmetrize the new graph will be forced to be symmetric.
+ */
+ public ScatteredArcsASCIIGraph(final InputStream is, final Object2LongFunction<? extends CharSequence> function, final Charset charset, final int n, final boolean symmetrize) throws IOException {
+ this(is, function, charset, n, symmetrize, false);
+ }
+
+ /** Creates a scattered-arcs ASCII graph.
+ *
+ * @param is an input stream containing a scattered list of arcs.
+ * @param function an explicitly provided function from strings representing nodes to node numbers, or <code>null</code> for the standard behaviour.
+ * @param charset a character set that will be used to read the identifiers passed to <code>function</code>, or <code>null</code> for ISO-8859-1 (used only if <code>function</code> is not <code>null</code>).
+ * @param n the number of nodes of the graph (used only if <code>function</code> is not <code>null</code>).
+ * @param symmetrize the new graph will be forced to be symmetric.
+ * @param noLoops the new graph will have no loops.
+ */
+ public ScatteredArcsASCIIGraph(final InputStream is, final Object2LongFunction<? extends CharSequence> function, final Charset charset, final int n, final boolean symmetrize, final boolean noLoops) throws IOException {
+ this(is, function, charset, n, symmetrize, noLoops, DEFAULT_BATCH_SIZE);
+ }
+
+ /** Creates a scattered-arcs ASCII graph.
+ *
+ * @param is an input stream containing a scattered list of arcs.
+ * @param function an explicitly provided function from strings representing nodes to node numbers, or <code>null</code> for the standard behaviour.
+ * @param charset a character set that will be used to read the identifiers passed to <code>function</code>, or <code>null</code> for ISO-8859-1 (used only if <code>function</code> is not <code>null</code>).
+ * @param n the number of nodes of the graph (used only if <code>function</code> is not <code>null</code>).
+ * @param symmetrize the new graph will be forced to be symmetric.
+ * @param noLoops the new graph will have no loops.
+ * @param batchSize the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
+ */
+ public ScatteredArcsASCIIGraph(final InputStream is, final Object2LongFunction<? extends CharSequence> function, final Charset charset, final int n, final boolean symmetrize, final boolean noLoops, final int batchSize) throws IOException {
+ this(is, function, charset, n, symmetrize, noLoops, batchSize, null);
+ }
+
+ /** Creates a scattered-arcs ASCII graph.
+ *
+ * @param is an input stream containing a scattered list of arcs.
+ * @param function an explicitly provided function from strings representing nodes to node numbers, or <code>null</code> for the standard behaviour.
+ * @param charset a character set that will be used to read the identifiers passed to <code>function</code>, or <code>null</code> for ISO-8859-1 (used only if <code>function</code> is not <code>null</code>).
+ * @param n the number of nodes of the graph (used only if <code>function</code> is not <code>null</code>).
+ * @param symmetrize the new graph will be forced to be symmetric.
+ * @param noLoops the new graph will have no loops.
+ * @param batchSize the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
+ * @param tempDir a temporary directory for the batches, or <code>null</code> for {@link File#createTempFile(java.lang.String, java.lang.String)}'s choice.
+ */
+ public ScatteredArcsASCIIGraph(final InputStream is, final Object2LongFunction<? extends CharSequence> function, final Charset charset, final int n, final boolean symmetrize, final boolean noLoops, final int batchSize, final File tempDir) throws IOException {
+ this(is, function, charset, n, symmetrize, noLoops, batchSize, tempDir, null);
+ }
+
+ /** Creates a scattered-arcs ASCII graph.
+ *
+ * @param is an input stream containing a scattered list of arcs.
+ * @param function an explicitly provided function from strings representing nodes to node numbers, or <code>null</code> for the standard behaviour.
+ * @param charset a character set that will be used to read the identifiers passed to <code>function</code>, or <code>null</code> for ISO-8859-1 (used only if <code>function</code> is not <code>null</code>).
+ * @param n the number of nodes of the graph (used only if <code>function</code> is not <code>null</code>).
+ * @param symmetrize the new graph will be forced to be symmetric.
+ * @param noLoops the new graph will have no loops.
+ * @param batchSize the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
+ * @param tempDir a temporary directory for the batches, or <code>null</code> for {@link File#createTempFile(java.lang.String, java.lang.String)}'s choice.
+ * @param pl a progress logger, or <code>null</code>.
+ */
+ public ScatteredArcsASCIIGraph(final InputStream is, final Object2LongFunction<? extends CharSequence> function, Charset charset, final int n, final boolean symmetrize, final boolean noLoops, final int batchSize, final File tempDir, final ProgressLogger pl) throws IOException {
+ @SuppressWarnings("resource")
+ final FastBufferedInputStream fbis = new FastBufferedInputStream(is);
+ Long2IntOpenHashBigMap map = new Long2IntOpenHashBigMap();
+
+ int numNodes = -1;
+ if (charset == null) charset = Charset.forName("ISO-8859-1");
+
+ int j;
+ int[] source = new int[batchSize] , target = new int[batchSize];
+ ObjectArrayList<File> batches = new ObjectArrayList<>();
+
+ if (pl != null) {
+ pl.itemsName = "arcs";
+ pl.start("Creating sorted batches...");
+ }
+
+ j = 0;
+ long pairs = 0; // Number of pairs
+ byte[] array = new byte[1024];
+ for(long line = 1; ; line++) {
+ int start = 0, len;
+ while((len = fbis.readLine(array, start, array.length - start, FastBufferedInputStream.ALL_TERMINATORS)) == array.length - start) {
+ start += len;
+ array = ByteArrays.grow(array, array.length + 1);
+ }
+
+ if (len == -1) break; // EOF
+
+ final int lineLength = start + len;
+
+ if (DEBUG) System.err.println("Reading line " + line + "... (" + new String(array, 0, lineLength, charset) + ")");
+
+ // Skip whitespace at the start of the line.
+ int offset = 0;
+ while(offset < lineLength && array[offset] >= 0 && array[offset] <= ' ') offset++;
+
+ if (offset == lineLength) {
+ if (DEBUG) System.err.println("Skipping line " + line + "...");
+ continue; // Whitespace line
+ }
+
+ if (array[0] == '#') continue;
+
+ // Scan source id.
+ start = offset;
+ while(offset < lineLength && (array[offset] < 0 || array[offset] > ' ')) offset++;
+
+ int s;
+
+ if (function == null) {
+ final long sl;
+ try {
+ sl = getLong(array, start, offset - start);
+ }
+ catch(RuntimeException e) {
+ // Discard up to the end of line
+ LOGGER.error("Error at line " + line + ": " + e.getMessage());
+ continue;
+ }
+
+ s = map.get(sl);
+ if (s == -1) map.put(sl, s = (int)map.size());
+
+ if (DEBUG) System.err.println("Parsed source at line " + line + ": " + sl + " => " + s);
+ }
+ else {
+ final String ss = new String(array, start, offset - start, charset);
+ final long sl = function.getLong(ss);
+ if (sl == -1) {
+ LOGGER.warn("Unknown source identifier " + ss + " at line " + line);
+ continue;
+ }
+ if (sl < 0 || sl >= n) throw new IllegalArgumentException("Source node number out of range for node " + ss + ": " + sl);
+ s = (int)sl;
+ if (DEBUG) System.err.println("Parsed source at line " + line + ": " + ss + " => " + s);
+ }
+
+
+ // Skip whitespace between identifiers.
+ while(offset < lineLength && array[offset] >= 0 && array[offset] <= ' ') offset++;
+
+ if (offset == lineLength) {
+ LOGGER.error("Error at line " + line + ": no target");
+ continue;
+ }
+
+ // Scan target id.
+ start = offset;
+ while(offset < lineLength && (array[offset] < 0 || array[offset] > ' ')) offset++;
+
+ int t;
+
+ if (function == null) {
+ final long tl;
+ try {
+ tl = getLong(array, start, offset - start);
+ }
+ catch(RuntimeException e) {
+ // Discard up to the end of line
+ LOGGER.error("Error at line " + line + ": " + e.getMessage());
+ continue;
+ }
+
+ t = map.get(tl);
+ if (t == -1) map.put(tl, t = (int)map.size());
+
+ if (DEBUG) System.err.println("Parsed target at line " + line + ": " + tl + " => " + t);
+ }
+ else {
+ final String ts = new String(array, start, offset - start, charset);
+ final long tl = function.getLong(ts);
+ if (tl == -1) {
+ LOGGER.warn("Unknown target identifier " + ts + " at line " + line);
+ continue;
+ }
+
+ if (tl < 0 || tl >= n) throw new IllegalArgumentException("Target node number out of range for node " + ts + ": " + tl);
+ t = (int)tl;
+ if (DEBUG) System.err.println("Parsed target at line " + line + ": " + ts + " => " + t);
+ }
+
+ // Skip whitespace after target.
+ while(offset < lineLength && array[offset] >= 0 && array[offset] <= ' ') offset++;
+
+ if (offset < lineLength) LOGGER.warn("Trailing characters ignored at line " + line);
+
+ if (DEBUG) System.err.println("Parsed arc at line " + line + ": " + s + " -> " + t);
+
+ if (s != t || ! noLoops) {
+ source[j] = s;
+ target[j++] = t;
+
+ if (j == batchSize) {
+ pairs += Transform.processBatch(batchSize, source, target, tempDir, batches);
+ j = 0;
+ }
+
+ if (symmetrize && s != t) {
+ source[j] = t;
+ target[j++] = s;
+ if (j == batchSize) {
+ pairs += Transform.processBatch(batchSize, source, target, tempDir, batches);
+ j = 0;
+ }
+ }
+
+ if (pl != null) pl.lightUpdate();
+ }
+ }
+
+ if (j != 0) pairs += Transform.processBatch(j, source, target, tempDir, batches);
+
+ if (pl != null) {
+ pl.done();
+ Transform.logBatches(batches, pairs, pl);
+ }
+
+ numNodes = function == null ? (int)map.size() : function.size();
+ source = null;
+ target = null;
+
+ map.compact();
+
+ final File keyFile = File.createTempFile(ScatteredArcsASCIIGraph.class.getSimpleName(), "keys", tempDir);
+ keyFile.deleteOnExit();
+ final File valueFile = File.createTempFile(ScatteredArcsASCIIGraph.class.getSimpleName(), "values", tempDir);
+ valueFile.deleteOnExit();
+
+ BinIO.storeLongs(map.key, 0, map.size(), keyFile);
+ BinIO.storeInts(map.value, 0, map.size(), valueFile);
+
+ map = null;
+
+ long[][] key = BinIO.loadLongsBig(keyFile);
+ keyFile.delete();
+ int[][] value = BinIO.loadIntsBig(valueFile);
+ valueFile.delete();
+
+ if (function == null) {
+ ids = new long[numNodes];
+
+ final long[] result = new long[numNodes];
+ for(int i = numNodes; i--!= 0;) result[IntBigArrays.get(value, i)] = LongBigArrays.get(key, i);
+ ids = result;
+ }
+
+ key = null;
+ value = null;
+
+ batchGraph = new Transform.BatchGraph(function == null ? numNodes : n, pairs, batches);
+ }
+
+ private final static long getLong(final byte[] array, int offset, int length) {
+ if (length == 0) throw new NumberFormatException("Empty number");
+ int sign = 1;
+ if(array[offset] == '-') {
+ sign = -1;
+ offset++;
+ if (--length == 0) throw new NumberFormatException("Sign without digits");
+ }
+
+ long value = 0;
+ for(int i = 0; i < length; i++) {
+ final byte digit = array[offset + i];
+ if (digit < '0' || digit > '9') throw new NumberFormatException("Not a digit: " + (char)digit);
+ value *= 10;
+ value += digit - '0';
+ }
+
+ return sign * value;
+ }
+
+ /** Creates a scattered-arcs ASCII graph.
+ *
+ * @param arcs an iterator returning the arcs as two-element arrays.
+ * @param symmetrize the new graph will be forced to be symmetric.
+ * @param noLoops the new graph will have no loops.
+ * @param batchSize the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
+ * @param tempDir a temporary directory for the batches, or <code>null</code> for {@link File#createTempFile(java.lang.String, java.lang.String)}'s choice.
+ * @param pl a progress logger, or <code>null</code>.
+ */
+ public ScatteredArcsASCIIGraph(final Iterator<long[]> arcs, final boolean symmetrize, final boolean noLoops, final int batchSize, final File tempDir, final ProgressLogger pl) throws IOException {
+ Long2IntOpenHashBigMap map = new Long2IntOpenHashBigMap();
+
+ int numNodes = -1;
+
+ int j;
+ int[] source = new int[batchSize], target = new int[batchSize];
+ ObjectArrayList<File> batches = new ObjectArrayList<>();
+
+ if (pl != null) {
+ pl.itemsName = "arcs";
+ pl.start("Creating sorted batches...");
+ }
+
+ j = 0;
+ long pairs = 0; // Number of pairs
+ while(arcs.hasNext()) {
+ long[] arc = arcs.next();
+ final long sl = arc[0];
+ int s = map.get(sl);
+ if (s == -1) map.put(sl, s = (int)map.size());
+ final long tl = arc[1];
+ int t = map.get(tl);
+ if (t == -1) map.put(tl, t = (int)map.size());
+
+ if (s != t || ! noLoops) {
+ source[j] = s;
+ target[j++] = t;
+
+ if (j == batchSize) {
+ pairs += Transform.processBatch(batchSize, source, target, tempDir, batches);
+ j = 0;
+ }
+
+ if (symmetrize && s != t) {
+ source[j] = t;
+ target[j++] = s;
+ if (j == batchSize) {
+ pairs += Transform.processBatch(batchSize, source, target, tempDir, batches);
+ j = 0;
+ }
+ }
+
+ if (pl != null) pl.lightUpdate();
+ }
+ }
+
+ if (j != 0) pairs += Transform.processBatch(j, source, target, tempDir, batches);
+
+ if (pl != null) {
+ pl.done();
+ Transform.logBatches(batches, pairs, pl);
+ }
+
+ numNodes = (int)map.size();
+ source = null;
+ target = null;
+
+ map.compact();
+
+ final File keyFile = File.createTempFile(ScatteredArcsASCIIGraph.class.getSimpleName(), "keys", tempDir);
+ keyFile.deleteOnExit();
+ final File valueFile = File.createTempFile(ScatteredArcsASCIIGraph.class.getSimpleName(), "values", tempDir);
+ valueFile.deleteOnExit();
+
+ BinIO.storeLongs(map.key, 0, map.size(), keyFile);
+ BinIO.storeInts(map.value, 0, map.size(), valueFile);
+
+ map = null;
+
+ long[][] key = BinIO.loadLongsBig(keyFile);
+ keyFile.delete();
+ int[][] value = BinIO.loadIntsBig(valueFile);
+ valueFile.delete();
+
+ ids = new long[numNodes];
+
+ final long[] result = new long[numNodes];
+ for(int i = numNodes; i--!= 0;) result[IntBigArrays.get(value, i)] = LongBigArrays.get(key, i);
+ ids = result;
+
+ key = null;
+ value = null;
+
+ batchGraph = new Transform.BatchGraph(numNodes, pairs, batches);
+ }
+
+ @Override
+ public int numNodes() {
+ if (batchGraph == null) throw new UnsupportedOperationException("The number of nodes is unknown (you need to exhaust the input)");
+ return batchGraph.numNodes();
+ }
+
+ @Override
+ public long numArcs() {
+ if (batchGraph == null) throw new UnsupportedOperationException("The number of arcs is unknown (you need to exhaust the input)");
+ return batchGraph.numArcs();
+ }
+
+ @Override
+ public NodeIterator nodeIterator(final int from) {
+ return batchGraph.nodeIterator(from);
+ }
+
+ @SuppressWarnings("unchecked")
+ public static void main(String args[]) throws IllegalArgumentException, SecurityException, IOException, JSAPException, ClassNotFoundException {
+ String basename;
+ SimpleJSAP jsap = new SimpleJSAP(ScatteredArcsASCIIGraph.class.getName(), "Converts a scattered list of arcs from standard input into a BVGraph. The list of " +
+ "identifiers in order of appearance will be saved with extension \"" + IDS_EXTENSION + "\", unless a translation function has been specified.",
+ new Parameter[] {
+ new FlaggedOption("logInterval", JSAP.LONG_PARSER, Long.toString(ProgressLogger.DEFAULT_LOG_INTERVAL), JSAP.NOT_REQUIRED, 'l', "log-interval", "The minimum time interval between activity logs in milliseconds."),
+ new FlaggedOption("batchSize", JSAP.INTSIZE_PARSER, Integer.toString(DEFAULT_BATCH_SIZE), JSAP.NOT_REQUIRED, 's', "batch-size", "The maximum size of a batch, in arcs."),
+ new FlaggedOption("tempDir", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'T', "temp-dir", "A directory for all temporary batch files."),
+ new Switch("symmetrize", 'S', "symmetrize", "Force the output graph to be symmetric."),
+ new Switch("noLoops", 'L', "no-loops", "Remove loops from the output graph."),
+ new FlaggedOption("function", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'f', "function", "A serialised function from strings to longs that will be used to translate identifiers to node numbers."),
+ new FlaggedOption("charset", JSAP.STRING_PARSER, "ISO-8859-1", JSAP.NOT_REQUIRED, 'C', "charset", "The charset used to read the list of arcs."),
+ new FlaggedOption("n", JSAP.INTSIZE_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'n', "n", "The number of nodes of the graph (only if you specified a function that does not return the size of the key set, or if you want to override that size)."),
+ new FlaggedOption("comp", JSAP.STRING_PARSER, null, JSAP.NOT_REQUIRED, 'c', "comp", "A compression flag (may be specified several times).").setAllowMultipleDeclarations(true),
+ new FlaggedOption("windowSize", JSAP.INTEGER_PARSER, String.valueOf(BVGraph.DEFAULT_WINDOW_SIZE), JSAP.NOT_REQUIRED, 'w', "window-size", "Reference window size (0 to disable)."),
+ new FlaggedOption("maxRefCount", JSAP.INTEGER_PARSER, String.valueOf(BVGraph.DEFAULT_MAX_REF_COUNT), JSAP.NOT_REQUIRED, 'm', "max-ref-count", "Maximum number of backward references (-1 for ∞)."),
+ new FlaggedOption("minIntervalLength", JSAP.INTEGER_PARSER, String.valueOf(BVGraph.DEFAULT_MIN_INTERVAL_LENGTH), JSAP.NOT_REQUIRED, 'i', "min-interval-length", "Minimum length of an interval (0 to disable)."),
+ new FlaggedOption("zetaK", JSAP.INTEGER_PARSER, String.valueOf(BVGraph.DEFAULT_ZETA_K), JSAP.NOT_REQUIRED, 'k', "zeta-k", "The k parameter for zeta-k codes."),
+ new UnflaggedOption("basename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the output graph"),
+ }
+ );
+
+ JSAPResult jsapResult = jsap.parse(args);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ basename = jsapResult.getString("basename");
+
+ int flags = 0;
+ for(String compressionFlag: jsapResult.getStringArray("comp")) {
+ try {
+ flags |= BVGraph.class.getField(compressionFlag).getInt(BVGraph.class);
+ }
+ catch (Exception notFound) {
+ throw new JSAPException("Compression method " + compressionFlag + " unknown.");
+ }
+ }
+
+ final int windowSize = jsapResult.getInt("windowSize");
+ final int zetaK = jsapResult.getInt("zetaK");
+ int maxRefCount = jsapResult.getInt("maxRefCount");
+ if (maxRefCount == -1) maxRefCount = Integer.MAX_VALUE;
+ final int minIntervalLength = jsapResult.getInt("minIntervalLength");
+
+ Object2LongFunction<String> function = null;
+ Charset charset = null;
+ int n = -1;
+ if (jsapResult.userSpecified("function")) {
+ function = (Object2LongFunction<String>)BinIO.loadObject(jsapResult.getString("function"));
+ charset = Charset.forName(jsapResult.getString("charset"));
+ if (function.size() == -1) {
+ if (! jsapResult.userSpecified("n")) throw new IllegalArgumentException("You must specify a graph size if you specify a translation function that does not return the size of the key set.");
+ n = jsapResult.getInt("n");
+ }
+ else n = function.size();
+ }
+
+ File tempDir = null;
+ if (jsapResult.userSpecified("tempDir")) tempDir = new File(jsapResult.getString("tempDir"));
+
+ final ProgressLogger pl = new ProgressLogger(LOGGER, jsapResult.getLong("logInterval"), TimeUnit.MILLISECONDS);
+ ScatteredArcsASCIIGraph graph = new ScatteredArcsASCIIGraph(System.in, function, charset, n, jsapResult.userSpecified("symmetrize"), jsapResult.userSpecified("noLoops"), jsapResult.getInt("batchSize"), tempDir, pl);
+ BVGraph.store(graph, basename, windowSize, maxRefCount, minIntervalLength, zetaK, flags, pl);
+ if (function == null) BinIO.storeLongs(graph.ids, basename + IDS_EXTENSION);
+ }
+}
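The constructors above give arbitrary long identifiers consecutive node numbers in order of first appearance. They optionally drop loops and add reverse arcs, and then invert the map into the `ids` array. The following is a minimal in-memory sketch of that renumbering, under stated assumptions. It uses a `java.util.HashMap` instead of fastutil's `Long2IntOpenHashBigMap` and keeps arcs in a list instead of sorted on-disk batches. `RenumberSketch`, `nodeOf`, `addArc` and `ids()` are hypothetical names, not part of WebGraph.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch of the renumbering; not part of WebGraph.
class RenumberSketch {
    // Node number assigned to each original identifier, in order of first appearance.
    final Map<Long, Integer> map = new HashMap<>();
    // Accepted arcs as {source, target} pairs of node numbers.
    final List<int[]> arcs = new ArrayList<>();

    int nodeOf(final long id) {
        Integer node = map.get(id);
        if (node == null) {
            node = map.size(); // next free node number
            map.put(id, node);
        }
        return node;
    }

    void addArc(final long sl, final long tl, final boolean symmetrize, final boolean noLoops) {
        // Both endpoints get a number, even if the arc turns out to be a dropped loop.
        final int s = nodeOf(sl), t = nodeOf(tl);
        if (s == t && noLoops) return;
        arcs.add(new int[] { s, t });
        if (symmetrize && s != t) arcs.add(new int[] { t, s }); // reverse arc
    }

    // Inverse of the map: ids[node] is the original identifier of node.
    long[] ids() {
        final long[] ids = new long[map.size()];
        for (final Map.Entry<Long, Integer> e : map.entrySet()) ids[e.getValue()] = e.getKey();
        return ids;
    }

    public static void main(final String[] args) {
        final RenumberSketch r = new RenumberSketch();
        r.addArc(42, 7, true, true);
        r.addArc(7, 7, true, true); // loop, dropped because noLoops is true
        r.addArc(7, 1000, true, true);
        System.out.println(Arrays.toString(r.ids())); // [42, 7, 1000]
        System.out.println(r.arcs.size()); // 4
    }
}
```

With symmetrization on, each non-loop arc contributes two pairs. This is why the real constructor checks for a full batch after each of the two writes.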
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/ShiftedByOneArcListASCIIGraph.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/ShiftedByOneArcListASCIIGraph.java
new file mode 100644
index 0000000..65e0c55
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/ShiftedByOneArcListASCIIGraph.java
@@ -0,0 +1,99 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.io.FastBufferedInputStream;
+import it.unimi.dsi.logging.ProgressLogger;
+
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+
+/** An {@link ArcListASCIIGraph} with fixed shift -1. Very useful to read
+ * graphs specified as lists of arcs (pairs of node numbers) with node numbering starting from one.
+ *
+ * <h2>Using {@link ArcListASCIIGraph} with MatLab-like sparse matrix files</h2>
+ *
+ * <p>The main intended usage of this class is that of interfacing easily with MatLab-like
+ * sparse matrix files. Note that for this to happen it is necessary to shift all
+ * indices by one. Assume you have a file named <code>example.arcs</code>:
+ * <pre>
+ * 1 2
+ * 2 3
+ * 3 2
+ * </pre>
+ * Then, the command
+ * <pre>
+ * java it.unimi.dsi.webgraph.BVGraph -1 -g ShiftedByOneArcListASCIIGraph dummy bvexample &lt;example.arcs
+ * </pre>
+ * will generate a {@link BVGraph} as expected (e.g., there is an arc from 0 to 1).
+ */
+
+public final class ShiftedByOneArcListASCIIGraph extends ArcListASCIIGraph {
+
+ protected ShiftedByOneArcListASCIIGraph(InputStream is, int shift) throws NumberFormatException, IOException {
+ super(is, shift);
+ }
+
+ @Deprecated
+ public static ImmutableGraph loadSequential(CharSequence basename) throws IOException {
+ return load(basename);
+ }
+
+ @Deprecated
+ public static ImmutableGraph loadSequential(CharSequence basename, ProgressLogger unused) throws IOException {
+ return load(basename);
+ }
+
+ public static ImmutableGraph loadOffline(CharSequence basename) throws IOException {
+ return load(basename);
+ }
+
+ public static ImmutableGraph loadOffline(CharSequence basename, ProgressLogger unused) throws IOException {
+ return load(basename);
+ }
+
+ public static ImmutableGraph loadMapped(CharSequence basename) throws IOException {
+ return load(basename);
+ }
+
+ public static ImmutableGraph loadMapped(CharSequence basename, ProgressLogger unused) throws IOException {
+ return load(basename);
+ }
+
+ public static ArcListASCIIGraph loadOnce(final InputStream is) throws IOException {
+ return new ArcListASCIIGraph(is, -1);
+ }
+
+ public static ImmutableGraph load(CharSequence basename) throws IOException {
+ return load(basename, null);
+ }
+
+ public static ImmutableGraph load(CharSequence basename, ProgressLogger unused) throws IOException {
+ return new ArrayListMutableGraph(loadOnce(new FastBufferedInputStream(new FileInputStream(basename.toString())))).immutableView();
+ }
+
+ public static void store(ImmutableGraph graph, CharSequence basename, ProgressLogger unused) throws IOException {
+ store(graph, basename, 1);
+ }
+
+ public static void main(final String arg[]) throws NoSuchMethodException {
+ throw new NoSuchMethodException("Please use the main method of " + ArcListASCIIGraph.class.getSimpleName() + ".");
+ }
+}
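The class documentation relies on a shift of -1 turning one-based node numbers into WebGraph's zero-based ones. The following is a hedged illustration of what that shift does to `example.arcs`. It is plain Java, not the WebGraph parser. `ShiftSketch` and `readArcs` are hypothetical names, and the sketch assumes only whitespace-separated pairs, one per line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the effect of a shift of -1; not the WebGraph parser.
class ShiftSketch {
    // Parses "source target" lines and adds shift to both endpoints; blank lines are skipped.
    static List<int[]> readArcs(final String text, final int shift) {
        final List<int[]> arcs = new ArrayList<>();
        for (final String raw : text.split("\n")) {
            final String line = raw.trim();
            if (line.isEmpty()) continue;
            final String[] f = line.split("\\s+");
            arcs.add(new int[] { Integer.parseInt(f[0]) + shift, Integer.parseInt(f[1]) + shift });
        }
        return arcs;
    }

    public static void main(final String[] args) {
        // The contents of example.arcs from the class documentation, numbered from one.
        for (final int[] a : readArcs("1 2\n2 3\n3 2\n", -1)) System.out.println(a[0] + " -> " + a[1]);
    }
}
```

Running it prints the arcs 0 -> 1, 1 -> 2 and 2 -> 1, which matches the arc from 0 to 1 described in the class documentation.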
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/Stats.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/Stats.java
new file mode 100644
index 0000000..686c717
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/Stats.java
@@ -0,0 +1,297 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import it.unimi.dsi.bits.Fast;
+import it.unimi.dsi.bits.LongArrayBitVector;
+import it.unimi.dsi.fastutil.ints.IntArrays;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.io.TextIO;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.algo.StronglyConnectedComponents;
+
+import java.io.BufferedWriter;
+import java.io.File;
+import java.io.FileWriter;
+import java.io.IOException;
+import java.io.PrintWriter;
+import java.lang.reflect.InvocationTargetException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+import java.math.RoundingMode;
+import java.util.Arrays;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/** Computes basic statistical data about a given graph.
+ *
+ * <p>This class loads a graph of given basename, and computes the following data:
+ * <ol>
+ * <li>an ASCII file containing the <em>outdegree distribution</em>; line <var>n</var> contains the number of nodes with outdegree <var>n</var> (starting from 0);
+ * <li>an ASCII file containing the <em>indegree distribution</em>; line <var>n</var> contains the number of nodes with indegree <var>n</var> (starting from 0);
+ * <li>a property file containing several self-descriptive data, such as the average indegree/outdegree (which should be identical), sample nodes with minimum
+ * or maximum indegree/outdegree, and so on; additional data will be computed if files produced by {@link StronglyConnectedComponents} are present
+ * with the same basename (in particular, buckets and component sizes);
+ * <li>if files produced by {@link StronglyConnectedComponents} are present with the same basename, an ASCII file containing the <em>distribution
+ * of strongly connected components</em>, specified as a sequence of lines each containing a pair of integer &lt;<var>size</var>, <var>count</var>&gt;.
+ * </ol>
+ *
+ * <p>The graph is loaded {@linkplain ImmutableGraph#loadOffline(CharSequence) offline}: the only memory allocated is for the indegree counts (one integer
+ * per node) and for storing the actual counts (one integer per indegree/outdegree value).
+ */
+
+public class Stats {
+
+ private Stats() {}
+
+ /** Computes stats for the given graph using a single traversal, storing the results in files with given basename.
+ *
+ * @param graph the graph to be examined.
+ * @param buckets the set of buckets of this graph, or <code>null</code> if this information is not available.
+ * @param sccsize the sizes of strongly connected components, or <code>null</code> if this information is not available.
+ * @param resultsBasename the basename for result files (see the {@linkplain Stats class description}).
+ * @param pl a progress logger.
+ */
+
+ public static void run(final ImmutableGraph graph, final LongArrayBitVector buckets, final int[] sccsize, final CharSequence resultsBasename, final ProgressLogger pl) throws IOException {
+ run(graph, buckets, sccsize, resultsBasename, false, pl);
+ }
+
+ /** Computes stats for the given graph using a single traversal, storing the results in files with given basename.
+ *
+ * @param graph the graph to be examined.
+ * @param buckets the set of buckets of this graph, or <code>null</code> if this information is not available.
+ * @param sccsize the sizes of strongly connected components, or <code>null</code> if this information is not available.
+ * @param resultsBasename the basename for result files (see the {@linkplain Stats class description}).
+ * @param saveDegrees if true, indegrees and outdegrees will be saved.
+ * @param pl a progress logger.
+ */
+
+ public static void run(final ImmutableGraph graph, final LongArrayBitVector buckets, final int[] sccsize, final CharSequence resultsBasename, final boolean saveDegrees, final ProgressLogger pl) throws IOException {
+ final NodeIterator nodeIterator = graph.nodeIterator();
+ int[] count = IntArrays.EMPTY_ARRAY, indegree = new int[graph.numNodes()], successor;
+ int curr, d, maxd = 0, maxNode = 0, mind = Integer.MAX_VALUE, minNode = 0;
+ long dangling = 0, terminal = 0, loops = 0, numArcs = 0, numGaps = 0;
+ BigInteger totLoc = BigInteger.ZERO, totGap = BigInteger.ZERO;
+
+
+ if (pl != null) {
+ pl.itemsName = "nodes";
+ pl.expectedUpdates = graph.numNodes();
+ pl.start("Scanning...");
+ }
+
+ final PrintWriter outdegreesPrintWriter = saveDegrees ? new PrintWriter(new BufferedWriter(new FileWriter(resultsBasename + ".outdegrees"))) : null;
+
+ /** Statistics for the gap width of successor lists (exponentially binned). */
+ final long[] successorDeltaStats = new long[32];
+
+ for(int i = graph.numNodes(); i-- != 0;) {
+ curr = nodeIterator.nextInt();
+ d = nodeIterator.outdegree();
+ if (saveDegrees) outdegreesPrintWriter.println(d);
+ successor = nodeIterator.successorArray();
+
+ if (d > 1) {
+ totGap = totGap.add(BigInteger.valueOf(successor[d - 1] - successor[0]));
+ totGap = totGap.add(BigInteger.valueOf(Fast.int2nat(successor[0] - curr)));
+ numGaps += d;
+ }
+ for(int s = d; s-- != 0;) {
+ totLoc = totLoc.add(BigInteger.valueOf(Math.abs(successor[s] - curr)));
+
+ if (successor[s] != curr) successorDeltaStats[Fast.mostSignificantBit(Math.abs(curr - successor[s]))]++;
+ else loops++;
+
+ indegree[successor[s]]++;
+ }
+
+ if (d == 0) {
+ dangling++;
+ terminal++;
+ }
+
+ if (d == 1 && successor[0] == curr) terminal++;
+
+ if (d < mind) {
+ mind = d;
+ minNode = curr;
+ }
+
+ if (d > maxd) {
+ maxd = d;
+ maxNode = curr;
+ }
+
+ numArcs += d;
+
+ if (d >= count.length) count = IntArrays.grow(count, d + 1);
+ count[d]++;
+
+ if (pl != null) pl.lightUpdate();
+ }
+
+ if (pl != null) pl.done();
+
+ if (saveDegrees) {
+ outdegreesPrintWriter.close();
+ TextIO.storeInts(indegree, resultsBasename + ".indegrees");
+ }
+
+ PrintWriter properties = new PrintWriter(new FileWriter(resultsBasename + ".stats"));
+ properties.println("nodes=" + graph.numNodes());
+ properties.println("arcs=" + numArcs);
+ properties.println("loops=" + loops);
+ properties.println("successoravggap=" + new BigDecimal(totGap).divide(BigDecimal.valueOf(Math.max(1, numGaps)), 3, RoundingMode.HALF_EVEN));
+ properties.println("avglocality=" + new BigDecimal(totLoc).divide(BigDecimal.valueOf(Math.max(1, numArcs)), 3, RoundingMode.HALF_EVEN));
+ properties.println("minoutdegree=" + mind);
+ properties.println("maxoutdegree=" + maxd);
+ properties.println("minoutdegreenode=" + minNode);
+ properties.println("maxoutdegreenode=" + maxNode);
+ properties.println("dangling=" + dangling);
+ properties.println("terminal=" + terminal);
+ properties.println("percdangling=" + 100.0 * dangling / graph.numNodes());
+ properties.println("avgoutdegree=" + (double)numArcs/graph.numNodes());
+
+ int l;
+ for(l = successorDeltaStats.length; l-- != 0;) if (successorDeltaStats[l] != 0) break;
+ StringBuilder s = new StringBuilder();
+ double totLogDelta = 0;
+ long numDelta = 0;
+
+ long g = 1;
+ for(int i = 0; i <= l; i++) {
+ if (i != 0) s.append(',');
+ s.append(successorDeltaStats[i]);
+ numDelta += successorDeltaStats[i];
+ totLogDelta += (Fast.log2(g * 2 + g + 1) - 1) * successorDeltaStats[i];
+ g *= 2;
+ }
+
+ properties.println("successorlogdeltastats=" + s.toString());
+ properties.println("successoravglogdelta=" + (numDelta == 0 ? "0" : new BigDecimal(totLogDelta).divide(BigDecimal.valueOf(Math.max(1, numDelta * 2)), 3, RoundingMode.HALF_EVEN).toString()));
+
+ TextIO.storeInts(count, 0, maxd + 1, resultsBasename + ".outdegree");
+
+ Arrays.fill(count, 0);
+
+ maxd = maxNode = minNode = 0;
+ mind = Integer.MAX_VALUE;
+ for(int i = indegree.length; i-- != 0;) {
+ d = indegree[i];
+ if (d >= count.length) count = IntArrays.grow(count, d + 1);
+ if (d < mind) {
+ mind = d;
+ minNode = i;
+ }
+
+ if (d > maxd) {
+ maxd = d;
+ maxNode = i;
+ }
+
+ count[d]++;
+ }
+
+ TextIO.storeInts(count, 0, maxd + 1, resultsBasename + ".indegree");
+
+ properties.println("minindegree=" + mind);
+ properties.println("maxindegree=" + maxd);
+ properties.println("minindegreenode=" + minNode);
+ properties.println("maxindegreenode=" + maxNode);
+ properties.println("avgindegree=" + (double)numArcs/graph.numNodes());
+
+ if (buckets != null) {
+ final long numBuckets = buckets.count();
+ properties.println("buckets=" + numBuckets);
+ properties.println("percbuckets=" + 100.0 * numBuckets / graph.numNodes());
+ }
+
+ if (sccsize != null) {
+ IntArrays.parallelQuickSort(sccsize);
+ final int m = sccsize.length;
+ int maxSize = sccsize[m - 1];
+ int minSize = sccsize[0];
+
+ properties.println("sccs=" + m);
+ properties.println("maxsccsize=" + maxSize);
+ properties.println("percmaxscc=" + 100.0 * maxSize / graph.numNodes());
+ properties.println("minsccsize=" + minSize);
+ properties.println("percminscc=" + 100.0 * minSize / graph.numNodes());
+
+ PrintWriter pw = new PrintWriter(resultsBasename + ".sccdistr");
+ int current = maxSize;
+ int c = 0;
+ for(int i = sccsize.length; i-- != 0;) {
+ if(sccsize[i] != current) {
+ pw.println(current + "\t" + c);
+ current = sccsize[i];
+ c = 0;
+ }
+ c++;
+ }
+ pw.println(current + "\t" + c);
+
+ pw.flush();
+ pw.close();
+ }
+
+ properties.close();
+ }
+
+ static public void main(String arg[]) throws IllegalArgumentException, SecurityException, IllegalAccessException, InvocationTargetException, NoSuchMethodException, JSAPException, IOException, ClassNotFoundException {
+ SimpleJSAP jsap = new SimpleJSAP(Stats.class.getName(), "Computes statistical data of a given graph.",
+ new Parameter[] {
+ new FlaggedOption("graphClass", GraphClassParser.getParser(), null, JSAP.NOT_REQUIRED, 'g', "graph-class", "Forces a Java class for the source graph."),
+ new FlaggedOption("logInterval", JSAP.LONG_PARSER, Long.toString(ProgressLogger.DEFAULT_LOG_INTERVAL), JSAP.NOT_REQUIRED, 'l', "log-interval", "The minimum time interval between activity logs in milliseconds."),
+ new Switch("saveDegrees", 's', "save-degrees", "Save indegrees and outdegrees in text format."),
+ new UnflaggedOption("basename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the graph."),
+ new UnflaggedOption("resultsBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, JSAP.NOT_GREEDY, "The basename of the resulting files."),
+ }
+ );
+
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final Class<?> graphClass = jsapResult.getClass("graphClass");
+ final String basename = jsapResult.getString("basename");
+ final String resultsBasename = jsapResult.userSpecified("resultsBasename") ? jsapResult.getString("resultsBasename") : basename;
+
+ final ProgressLogger pl = new ProgressLogger();
+ pl.logInterval = jsapResult.getLong("logInterval");
+
+ final ImmutableGraph graph;
+
+ if (graphClass != null) graph = (ImmutableGraph)graphClass.getMethod("loadOffline", CharSequence.class).invoke(null, basename);
+ else graph = ImmutableGraph.loadOffline(basename, pl);
+
+ final LongArrayBitVector buckets = (LongArrayBitVector)(new File(basename + ".buckets").exists() ? BinIO.loadObject(basename + ".buckets") : null);
+ final int[] sccsize = new File(basename + ".sccsizes").exists() ? BinIO.loadInts(basename + ".sccsizes") : null;
+
+ run(graph, buckets, sccsize, resultsBasename, jsapResult.getBoolean("saveDegrees"), pl);
+ }
+}
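`Stats.run()` fills the outdegree and indegree distributions by counting degrees. It also bins the gap |x - y| of each non-loop arc x -> y by the position of its most significant bit. The following in-memory sketch reproduces those quantities for a small made-up graph. `DegreeStatsSketch`, `distribution` and `gapBins` are hypothetical names. The expression `31 - Integer.numberOfLeadingZeros(v)` stands in for `Fast.mostSignificantBit(v)`, on the assumption that both yield the floor of the base-2 logarithm for positive `v`.

```java
import java.util.Arrays;

// Hypothetical in-memory sketch of the degree and gap statistics; not part of WebGraph.
class DegreeStatsSketch {
    // result[d] is the number of nodes whose degree is d.
    static int[] distribution(final int[] degree) {
        int max = 0;
        for (final int d : degree) max = Math.max(max, d);
        final int[] count = new int[max + 1];
        for (final int d : degree) count[d]++;
        return count;
    }

    // Bin k counts arcs x -> y with |x - y| in [2^k, 2^(k+1)); loops are skipped.
    static long[] gapBins(final int[][] successors) {
        final long[] bins = new long[32];
        for (int x = 0; x < successors.length; x++)
            for (final int y : successors[x])
                if (y != x) bins[31 - Integer.numberOfLeadingZeros(Math.abs(x - y))]++;
        return bins;
    }

    public static void main(final String[] args) {
        final int[][] successors = { { 1, 2 }, { 2 }, { 0, 2 }, {} }; // node 2 has a loop
        final int n = successors.length;
        final int[] outdegree = new int[n], indegree = new int[n];
        for (int x = 0; x < n; x++) {
            outdegree[x] = successors[x].length;
            for (final int y : successors[x]) indegree[y]++;
        }
        System.out.println(Arrays.toString(distribution(outdegree))); // [1, 1, 2]
        System.out.println(Arrays.toString(distribution(indegree))); // [1, 2, 0, 1]
        System.out.println(Arrays.toString(Arrays.copyOf(gapBins(successors), 2))); // [2, 2]
    }
}
```

The real class streams the graph once and writes these arrays to `.outdegree` and `.indegree` files. The sketch keeps everything in memory for clarity.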
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/Transform.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/Transform.java
new file mode 100644
index 0000000..41f11a8
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/Transform.java
@@ -0,0 +1,2782 @@
+package it.unimi.dsi.webgraph;
+
+import java.io.BufferedOutputStream;
+import java.io.DataInput;
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.lang.reflect.Field;
+import java.lang.reflect.InvocationTargetException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.NoSuchElementException;
+import java.util.concurrent.TimeUnit;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/*
+ * Copyright (C) 2003-2017 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.Util;
+import it.unimi.dsi.fastutil.Hash;
+import it.unimi.dsi.fastutil.ints.Int2ObjectOpenHashMap;
+import it.unimi.dsi.fastutil.ints.IntArrays;
+import it.unimi.dsi.fastutil.ints.IntComparator;
+import it.unimi.dsi.fastutil.ints.IntHeapSemiIndirectPriorityQueue;
+import it.unimi.dsi.fastutil.ints.IntOpenHashSet;
+import it.unimi.dsi.fastutil.ints.IntSet;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.io.FastByteArrayOutputStream;
+import it.unimi.dsi.fastutil.io.TextIO;
+import it.unimi.dsi.fastutil.longs.LongArrays;
+import it.unimi.dsi.fastutil.objects.ObjectArrayList;
+import it.unimi.dsi.fastutil.objects.ObjectArrays;
+import it.unimi.dsi.io.InputBitStream;
+import it.unimi.dsi.io.OutputBitStream;
+import it.unimi.dsi.lang.ObjectParser;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.util.XoRoShiRo128PlusRandom;
+import it.unimi.dsi.webgraph.labelling.ArcLabelledImmutableGraph;
+import it.unimi.dsi.webgraph.labelling.ArcLabelledImmutableSequentialGraph;
+import it.unimi.dsi.webgraph.labelling.ArcLabelledNodeIterator;
+import it.unimi.dsi.webgraph.labelling.ArcLabelledNodeIterator.LabelledArcIterator;
+import it.unimi.dsi.webgraph.labelling.BitStreamArcLabelledImmutableGraph;
+import it.unimi.dsi.webgraph.labelling.Label;
+import it.unimi.dsi.webgraph.labelling.LabelMergeStrategy;
+import it.unimi.dsi.webgraph.labelling.LabelSemiring;
+import it.unimi.dsi.webgraph.labelling.Labels;
+import it.unimi.dsi.webgraph.labelling.UnionArcLabelledImmutableGraph;
+
+/** Static methods that manipulate immutable graphs.
+ *
+ * <P>Most methods take an {@link
+ * it.unimi.dsi.webgraph.ImmutableGraph} (along with some other data, that
+ * depend on the kind of transformation), and return another {@link
+ * it.unimi.dsi.webgraph.ImmutableGraph} that represents the transformed
+ * version.
+ */
+
+public class Transform {
+
+ private static final Logger LOGGER = LoggerFactory.getLogger(Transform.class);
+
+ private static final boolean DEBUG = false;
+ private static final boolean ASSERTS = false;
+
+ private Transform() {}
+
+ /** Provides a method to accept or reject an arc.
+ *
+ * <P>Note that arc filters are usually stateless. Thus, their declaration
+ * should comprise a static singleton (e.g., {@link Transform#NO_LOOPS}).
+ */
+ public interface ArcFilter {
+
+ /**
+ * Tells if the arc <code>(i,j)</code> has to be accepted or not.
+ *
+ * @param i the source of the arc.
+ * @param j the destination of the arc.
+ * @return whether the arc has to be accepted.
+ */
+ public boolean accept(int i, int j);
+ }
+
+ /** Provides a method to accept or reject a labelled arc.
+ *
+ * <P>Note that arc filters are usually stateless. Thus, their declaration
+ * should comprise a static singleton (e.g., {@link Transform#NO_LOOPS}).
+ */
+ public interface LabelledArcFilter {
+
+ /**
+ * Tells if the arc <code>(i,j)</code> with label <code>label</code> has to be accepted or not.
+ *
+ * @param i the source of the arc.
+ * @param j the destination of the arc.
+ * @param label the label of the arc.
+ * @return whether the arc has to be accepted.
+ */
+ public boolean accept(int i, int j, Label label);
+ }
+
+ /** An arc filter that rejects loops. */
+ final static private class NoLoops implements ArcFilter, LabelledArcFilter {
+ private NoLoops() {}
+ /** Returns true if the two arguments differ.
+ *
+ * @return <code>i != j</code>.
+ */
+ @Override
+ public boolean accept(final int i, final int j) {
+ return i != j;
+ }
+ @Override
+ public boolean accept(int i, int j, Label label) {
+ return i != j;
+ }
+ }
+
+ /** An arc filter that only accepts arcs whose endpoints belong to the same
+ * (if the parameter <code>keepOnlySame</code> is true) or to
+ * different (if <code>keepOnlySame</code> is false) classes.
+ * Classes are specified by one integer per node, read from a given file in {@link DataInput} format. */
+ public final static class NodeClassFilter implements ArcFilter, LabelledArcFilter {
+ private final boolean keepOnlySame;
+ private final int[] nodeClass;
+
+ /** Creates a new instance.
+ *
+ * @param classFile name of the class file.
+ * @param keepOnlySame if true, keep only arcs whose endpoints are in the same class; if false, keep only arcs whose endpoints are in different classes.
+ */
+ public NodeClassFilter(final String classFile, final boolean keepOnlySame) {
+ try {
+ nodeClass = BinIO.loadInts(classFile);
+ }
+ catch (final IOException e) {
+ throw new RuntimeException(e);
+ }
+ this.keepOnlySame = keepOnlySame;
+ }
+
+ /** Creates a new instance.
+ *
+ * <p>This constructor has the same arguments as {@link it.unimi.dsi.webgraph.Transform.NodeClassFilter#NodeClassFilter(String, boolean)},
+ * but it can be used with an {@link ObjectParser}.
+ *
+ * @param classFile name of the class file.
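+ * @param keepOnlySame a string parsed by {@link Boolean#parseBoolean(String)}; if true, keep only arcs whose endpoints are in the same class, otherwise keep only arcs whose endpoints are in different classes.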
+ */
+ public NodeClassFilter(String classFile, String keepOnlySame) {
+ this(classFile, Boolean.parseBoolean(keepOnlySame));
+ }
+
+ @Override
+ public boolean accept(final int i, final int j) {
+ return keepOnlySame == (nodeClass[i] == nodeClass[j]);
+ }
+
+ @Override
+ public boolean accept(int i, int j, Label label) {
+ return keepOnlySame == (nodeClass[i] == nodeClass[j]);
+ }
+ }
+
+ /** An arc filter that rejects arcs whose well-known attribute has a value smaller than a given threshold. */
+ final static public class LowerBound implements LabelledArcFilter {
+ private final int lowerBound;
+
+ public LowerBound(final int lowerBound) {
+ this.lowerBound = lowerBound;
+ }
+
+ public LowerBound(String lowerBound) {
+ this(Integer.parseInt(lowerBound));
+ }
+ /** Returns true if the integer value associated with the well-known attribute of the label is greater than or equal to the threshold.
+ *
+ * @return true if <code>label.{@link Label#getInt()}</code> is greater than or equal to the threshold.
+ */
+ @Override
+ public boolean accept(int i, int j, Label label) {
+ return label.getInt() >= lowerBound;
+ }
+ }
+
+
+ /** A singleton providing an arc filter that rejects loops. */
+ final static public NoLoops NO_LOOPS = new NoLoops();
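The three filters above reduce to simple predicates on an arc. The following is a standalone sketch, not WebGraph code: it uses plain arrays instead of graphs, and `SketchArcFilter`, `FilterSketch`, `filterSuccessors` and `filterByLabel` are hypothetical names. It expresses the loop, class and lower-bound tests as lambdas and applies them to a single successor list.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Standalone sketch; all names are hypothetical, not WebGraph API.

/** Hypothetical stand-in for ArcFilter: decides whether the arc i -> j is kept. */
interface SketchArcFilter {
    boolean accept(int i, int j);
}

final class FilterSketch {
    /** Rejects loops, like NO_LOOPS. */
    static final SketchArcFilter NO_LOOPS = (i, j) -> i != j;

    /** Keeps arcs within one class (keepOnlySame) or across classes (!keepOnlySame), like NodeClassFilter. */
    static SketchArcFilter nodeClass(final int[] nodeClass, final boolean keepOnlySame) {
        return (i, j) -> keepOnlySame == (nodeClass[i] == nodeClass[j]);
    }

    /** Keeps the successors of x accepted by the filter, preserving their order. */
    static int[] filterSuccessors(final int x, final int[] successors, final SketchArcFilter filter) {
        return Arrays.stream(successors).filter(y -> filter.accept(x, y)).toArray();
    }

    /** Keeps the successors whose integer label is at least lowerBound (>=, as in LowerBound.accept()). */
    static int[] filterByLabel(final int[] successors, final int[] labels, final int lowerBound) {
        return IntStream.range(0, successors.length)
                .filter(k -> labels[k] >= lowerBound)
                .map(k -> successors[k])
                .toArray();
    }
}
```

Here a label is modelled as one `int` per arc, standing in for `Label.getInt()`.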
+
+ /** A class that exposes an immutable graph viewed through a filter. */
+ private static final class FilteredImmutableGraph extends ImmutableGraph {
+ private final class FilteredImmutableGraphNodeIterator extends NodeIterator {
+ private final NodeIterator nodeIterator;
+ private int nextNode;
+ private int outdegree;
+ private @SuppressWarnings("hiding") int[] succ;
+
+ public FilteredImmutableGraphNodeIterator(final NodeIterator nodeIterator) {
+ this(nodeIterator, 0, -1, IntArrays.EMPTY_ARRAY);
+ }
+
+ public FilteredImmutableGraphNodeIterator(final NodeIterator nodeIterator, final int nextNode, final int outdegree, final int[] succ) {
+ this.nodeIterator = nodeIterator;
+ this.nextNode = nextNode;
+ this.outdegree = outdegree;
+ this.succ = succ;
+ }
+
+ @Override
+ public int outdegree() {
+ if (outdegree == -1) throw new IllegalStateException();
+ return outdegree;
+ }
+
+ @Override
+ public int nextInt() {
+ final int currNode = nodeIterator.nextInt();
+ assert nextNode == currNode;
+ nextNode++;
+ final int oldOutdegree = nodeIterator.outdegree();
+ final int[] oldSucc = nodeIterator.successorArray();
+ succ = IntArrays.ensureCapacity(succ, oldOutdegree, 0);
+ outdegree = 0;
+ for(int i = 0; i < oldOutdegree; i++) if (filter.accept(currNode, oldSucc[i])) succ[outdegree++] = oldSucc[i];
+ return currNode;
+ }
+
+ @Override
+ public int[] successorArray() {
+ if (outdegree == -1) throw new IllegalStateException();
+ return succ;
+ }
+
+ @Override
+ public boolean hasNext() {
+ return nodeIterator.hasNext();
+ }
+
+ @Override
+ public NodeIterator copy(final int upperBound) {
+ return new FilteredImmutableGraphNodeIterator(nodeIterator.copy(upperBound), nextNode, outdegree, Arrays.copyOf(succ, Math.max(0, outdegree)));
+ }
+ }
+
+ private final ArcFilter filter;
+ private final ImmutableGraph graph;
+ private int succ[];
+ private int cachedNode = -1;
+
+ private FilteredImmutableGraph(ArcFilter filter, ImmutableGraph graph) {
+ this.filter = filter;
+ this.graph = graph;
+ }
+
+ @Override
+ public int numNodes() {
+ return graph.numNodes();
+ }
+
+ @Override
+ public FilteredImmutableGraph copy() {
+ return new FilteredImmutableGraph(filter, graph.copy());
+ }
+
+ @Override
+ public boolean randomAccess() {
+ return graph.randomAccess();
+ }
+
+ @Override
+ public boolean hasCopiableIterators() {
+ return graph.hasCopiableIterators();
+ }
+
+ @Override
+ public LazyIntIterator successors(final int x) {
+ return new AbstractLazyIntIterator() {
+
+ private final LazyIntIterator s = graph.successors(x);
+
+ @Override
+ public int nextInt() {
+ int t;
+ while ((t = s.nextInt()) != -1) if (filter.accept(x, t)) return t;
+ return -1;
+ }
+ };
+ }
+
+ private void fillCache(final int x) {
+ if (x == cachedNode) return;
+ succ = LazyIntIterators.unwrap(successors(x));
+ cachedNode = x;
+ }
+
+ @Override
+ public int[] successorArray(int x) {
+ fillCache(x);
+ return succ;
+ }
+
+ @Override
+ public int outdegree(int x) {
+ fillCache(x);
+ return succ.length;
+ }
+
+ @Override
+ public NodeIterator nodeIterator() {
+ return new FilteredImmutableGraphNodeIterator(graph.nodeIterator());
+ }
+
+ @Override
+ public NodeIterator nodeIterator(final int from) {
+ return new FilteredImmutableGraphNodeIterator(graph.nodeIterator(from), from, -1, IntArrays.EMPTY_ARRAY);
+ }
+
+ }
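`FilteredImmutableGraph.successors()` never materializes the filtered list. It wraps the underlying lazy iterator and skips rejected targets on demand, and only `fillCache()` drains the result into an array. The following is a standalone sketch of that pattern. `SketchLazyInts` is a hypothetical stand-in for `LazyIntIterator` that keeps its "-1 means exhausted" convention, and `unwrap` here is a local helper, not `LazyIntIterators.unwrap`.

```java
import java.util.Arrays;

// Standalone sketch; all names are hypothetical, not WebGraph API.

/** Hypothetical stand-in for LazyIntIterator: nextInt() returns -1 once exhausted. */
interface SketchLazyInts {
    int nextInt();
}

/** Hypothetical stand-in for ArcFilter. */
interface SketchFilter {
    boolean accept(int x, int t);
}

final class LazyFilterSketch {
    /** A lazy iterator over an array. */
    static SketchLazyInts over(final int[] a) {
        return new SketchLazyInts() {
            private int pos = 0;

            @Override
            public int nextInt() {
                return pos < a.length ? a[pos++] : -1;
            }
        };
    }

    /** Skips, on demand, the successors t of x that the filter rejects. */
    static SketchLazyInts filtered(final int x, final SketchLazyInts s, final SketchFilter filter) {
        return () -> {
            int t;
            while ((t = s.nextInt()) != -1) if (filter.accept(x, t)) return t;
            return -1;
        };
    }

    /** Drains a lazy iterator into an exactly sized array, as a successor-array cache would store. */
    static int[] unwrap(final SketchLazyInts it) {
        int[] buf = new int[4];
        int n = 0;
        int t;
        while ((t = it.nextInt()) != -1) {
            if (n == buf.length) buf = Arrays.copyOf(buf, 2 * n);
            buf[n++] = t;
        }
        return Arrays.copyOf(buf, n);
    }
}
```

Because the wrapper is lazy, a caller that stops early (for instance, after finding one accepted successor) never pays for filtering the rest of the list.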
+
+ /** A class that exposes an arc-labelled immutable graph viewed through a filter. */
+ private static final class FilteredArcLabelledImmutableGraph extends ArcLabelledImmutableGraph {
+ private final LabelledArcFilter filter;
+ private final ArcLabelledImmutableGraph graph;
+ private int succ[];
+ private Label label[];
+ private int cachedNode = -1;
+
+ @Override
+ public boolean hasCopiableIterators() {
+ return graph.hasCopiableIterators();
+ }
+
+ private final class FilterdArcLabelledNodeIterator extends ArcLabelledNodeIterator {
+ private final ArcLabelledNodeIterator nodeIterator;
+ private final int upperBound;
+ private int currNode;
+ private int outdegree;
+
+ public FilterdArcLabelledNodeIterator(final int upperBound) {
+ this(upperBound, graph.nodeIterator(), -1, -1);
+ }
+
+ public FilterdArcLabelledNodeIterator(final int upperBound, final ArcLabelledNodeIterator nodeIterator, final int currNode, final int outdegree) {
+ this.upperBound = upperBound;
+ this.nodeIterator = nodeIterator;
+ this.currNode = currNode;
+ this.outdegree = outdegree;
+ }
+
+ @Override
+ public int outdegree() {
+ if (currNode == -1) throw new IllegalStateException();
+ if (outdegree == -1) {
+ int d = 0;
+ final LabelledArcIterator successors = successors();
+ while(successors.nextInt() != -1) d++;
+ outdegree = d;
+ }
+ return outdegree;
+ }
+
+ @Override
+ public int nextInt() {
+ outdegree = -1;
+ currNode = nodeIterator.nextInt();
+ return currNode;
+ }
+
+ @Override
+ public boolean hasNext() {
+ return currNode + 1 < upperBound && nodeIterator.hasNext();
+ }
+
+ @Override
+ public LabelledArcIterator successors() {
+ return new FilteredLabelledArcIterator(currNode, nodeIterator.successors());
+ }
+
+ @Override
+ public ArcLabelledNodeIterator copy(final int upperBound) {
+ return new FilterdArcLabelledNodeIterator(upperBound, nodeIterator.copy(upperBound), currNode, outdegree);
+ }
+ }
+
+ private final class FilteredLabelledArcIterator extends AbstractLazyIntIterator implements LabelledArcIterator {
+ private final int x;
+
+ private final LabelledArcIterator successors;
+
+ private FilteredLabelledArcIterator(final int x, final LabelledArcIterator successors) {
+ this.x = x;
+ this.successors = successors;
+ }
+
+ @Override
+ public int nextInt() {
+ int t;
+ while ((t = successors.nextInt()) != -1) if (filter.accept(x, t, successors.label())) return t;
+ return -1;
+ }
+
+ @Override
+ public Label label() {
+ return successors.label();
+ }
+ }
+
+ private FilteredArcLabelledImmutableGraph(LabelledArcFilter filter, ArcLabelledImmutableGraph graph) {
+ this.filter = filter;
+ this.graph = graph;
+ }
+
+ @Override
+ public int numNodes() {
+ return graph.numNodes();
+ }
+
+ @Override
+ public ArcLabelledImmutableGraph copy() {
+ return new FilteredArcLabelledImmutableGraph(filter, graph.copy());
+ }
+
+ @Override
+ public boolean randomAccess() {
+ return graph.randomAccess();
+ }
+
+ @Override
+ public Label prototype() {
+ return graph.prototype();
+ }
+
+ private void fillCache(final int x) {
+ if (x == cachedNode) return;
+ cachedNode = x;
+ succ = LazyIntIterators.unwrap(successors(x));
+ label = super.labelArray(x);
+ }
+
+ @Override
+ public LabelledArcIterator successors(final int x) {
+ return new FilteredLabelledArcIterator(x, graph.successors(x));
+ }
+
+ @Override
+ public int[] successorArray(final int x) {
+ fillCache(x);
+ return succ;
+ }
+
+ @Override
+ public Label[] labelArray(final int x) {
+ fillCache(x);
+ return label;
+ }
+
+ @Override
+ public int outdegree(int x) {
+ fillCache(x);
+ return succ.length;
+ }
+
+ @Override
+ public ArcLabelledNodeIterator nodeIterator() {
+ return new FilterdArcLabelledNodeIterator(Integer.MAX_VALUE);
+ }
+
+ }
+
+ /** Returns a graph with some arcs possibly stripped, according to the given filter.
+ *
+ * @param graph a graph.
+ * @param filter the filter (telling whether each arc should be kept or not).
+ * @return the filtered graph.
+ */
+ public static ImmutableGraph filterArcs(final ImmutableGraph graph, final ArcFilter filter) {
+ return new FilteredImmutableGraph(filter, graph);
+ }
+
+ /** Returns a graph with some arcs possibly stripped, according to the given filter.
+ *
+ * @param graph a graph.
+ * @param filter the filter (telling whether each arc should be kept or not).
+ * @param ignored a progress logger (ignored).
+ * @return the filtered graph.
+ */
+ public static ImmutableGraph filterArcs(final ImmutableGraph graph, final ArcFilter filter, final ProgressLogger ignored) {
+ return filterArcs(graph, filter);
+ }
+
+ /** Returns a labelled graph with some arcs possibly stripped, according to the given filter.
+ *
+ * @param graph a labelled graph.
+ * @param filter the filter (telling whether each arc should be kept or not).
+ * @return the filtered graph.
+ */
+ public static ArcLabelledImmutableGraph filterArcs(final ArcLabelledImmutableGraph graph, final LabelledArcFilter filter) {
+ return new FilteredArcLabelledImmutableGraph(filter, graph);
+ }
+
+ /** Returns a labelled graph with some arcs possibly stripped, according to the given filter.
+ *
+ * @param graph a labelled graph.
+ * @param filter the filter (telling whether each arc should be kept or not).
+ * @param ignored a progress logger (ignored).
+ * @return the filtered graph.
+ */
+ public static ArcLabelledImmutableGraph filterArcs(final ArcLabelledImmutableGraph graph, final LabelledArcFilter filter, final ProgressLogger ignored) {
+ return filterArcs(graph, filter);
+ }
+
+ private static final class RemappedImmutableGraph extends ImmutableGraph {
+ private final int[] map;
+ private final ImmutableGraph g;
+ private final boolean isInjective;
+ private final boolean isPermutation;
+ private final int remappedNodes;
+ private final int destNumNodes;
+ private final int[] pseudoInverse;
+ private int[] succ;
+ private int outdegree;
+ private int currentNode = -1;
+
+ private RemappedImmutableGraph(int[] map, ImmutableGraph g, boolean isInjective, boolean isPermutation, int remappedNodes, int destNumNodes, int[] pseudoInverse) {
+ this.map = map;
+ this.g = g;
+ this.isInjective = isInjective;
+ this.isPermutation = isPermutation;
+ this.remappedNodes = remappedNodes;
+ this.destNumNodes = destNumNodes;
+ this.pseudoInverse = pseudoInverse;
+ }
+
+ @Override
+ public RemappedImmutableGraph copy() {
+ return new RemappedImmutableGraph(map, g.copy(), isInjective, isPermutation, remappedNodes, destNumNodes, pseudoInverse);
+ }
+
+ @Override
+ public int numNodes() {
+ return destNumNodes;
+ }
+
+ @Override
+ public boolean randomAccess() {
+ return true;
+ }
+
+ @Override
+ public boolean hasCopiableIterators() {
+ return true;
+ }
+
+ @Override
+ public int[] successorArray(int x) {
+ if (currentNode != x) {
+ final IntSet succSet = new IntOpenHashSet();
+
+ if (isPermutation) {
+ final LazyIntIterator i = g.successors(pseudoInverse[x]);
+ for(int d = g.outdegree(pseudoInverse[x]); d-- != 0;) succSet.add(map[i.nextInt()]);
+ }
+ else {
+ int low = 0, high = remappedNodes - 1, mid = 0;
+ while (low <= high) {
+ mid = (low + high) >>> 1;
+ final int midVal = map[pseudoInverse[mid]];
+ if (midVal < x) low = mid + 1;
+ else if (midVal > x) high = mid - 1;
+ else break;
+ }
+ int t, p;
+ if (isInjective) {
+ if (map[p = pseudoInverse[mid]] == x) {
+ final LazyIntIterator i = g.successors(p);
+ for(int d = g.outdegree(p); d-- != 0;) if ((t = map[i.nextInt()]) != -1) succSet.add(t);
+ }
+ }
+ else {
+ while (mid > 0 && map[pseudoInverse[mid - 1]] == x) mid--;
+ while (mid < remappedNodes && map[p = pseudoInverse[mid]] == x) {
+ final LazyIntIterator i = g.successors(p);
+ for(int d = g.outdegree(p); d-- != 0;) if ((t = map[i.nextInt()]) != -1) succSet.add(t);
+ mid++;
+ }
+ }
+ }
+ outdegree = succSet.size();
+ currentNode = x;
+ succ = succSet.toIntArray();
+ if (outdegree > 0) IntArrays.quickSort(succ, 0, outdegree);
+ }
+ return succ;
+ }
+
+ @Override
+ public int outdegree(int x) {
+ if (currentNode != x) successorArray(x);
+ return outdegree;
+ }
+ }
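When `map` is not a permutation, `successorArray(x)` must find every source node whose image is `x` (the fibre of `x`). It does so by binary search on `pseudoInverse`, which lists the mapped nodes sorted by `map` value. The search may land anywhere inside the fibre, so it first moves back to the fibre's start and then scans forward. The following is a standalone sketch of just that lookup. It assumes a stable `Stream` sort in place of the `IntArrays.radixSortIndirect` call used in `map()`, and `FibreSketch` and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

// Standalone sketch; names are hypothetical, not WebGraph API.
final class FibreSketch {
    /** The nodes i with map[i] >= 0, sorted by map[i] (the stable sort keeps ties in index order). */
    static int[] pseudoInverse(final int[] map) {
        return IntStream.range(0, map.length).filter(i -> map[i] >= 0).boxed()
                .sorted(Comparator.comparingInt(i -> map[i]))
                .mapToInt(Integer::intValue).toArray();
    }

    /** Returns the fibre of x (the nodes i with map[i] == x) by binary search on pseudoInverse. */
    static int[] fibre(final int[] map, final int[] pseudoInverse, final int x) {
        int low = 0, high = pseudoInverse.length - 1, mid = 0;
        while (low <= high) {
            mid = (low + high) >>> 1;
            final int midVal = map[pseudoInverse[mid]];
            if (midVal < x) low = mid + 1;
            else if (midVal > x) high = mid - 1;
            else break;
        }
        if (pseudoInverse.length == 0 || map[pseudoInverse[mid]] != x) return new int[0];
        // Move back to the first element of the fibre, then scan forward to its end.
        while (mid > 0 && map[pseudoInverse[mid - 1]] == x) mid--;
        int end = mid;
        while (end < pseudoInverse.length && map[pseudoInverse[end]] == x) end++;
        return Arrays.copyOfRange(pseudoInverse, mid, end);
    }
}
```

In the injective case the fibre has at most one element, which is why the original skips the backward and forward scans there.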
+
+ /** Remaps the graph nodes through a partial function specified via
+ * an array. More specifically, <code>map.length=g.numNodes()</code>,
+ * and <code>map[i]</code> is the new name of node <code>i</code>, or -1 if the node
+ * should not be mapped. If some
+ * value appearing in <code>map</code> is larger than or equal to the
+ * number of nodes of <code>g</code>, the resulting graph is enlarged correspondingly.
+ *
+ * <P>Arcs are mapped in the obvious way; in other words, there is
+ * an arc from <code>map[i]</code> to <code>map[j]</code> (both nonnegative)
+ * in the transformed
+ * graph iff there was an arc from <code>i</code> to <code>j</code>
+ * in the original graph.
+ *
+ * <P>Note that if <code>map</code> is bijective, the returned graph
+ * is simply a permutation of the original graph.
+ * Otherwise, the returned graph is obtained by deleting nodes mapped
+ * to -1, quotienting nodes w.r.t. the equivalence relation induced by the fibres of <code>map</code>
+ * and renumbering the result, always according to <code>map</code>.
+ *
+ * <P>This method <strong>requires</strong> {@linkplain ImmutableGraph#randomAccess() random access}.
+ *
+ * @param g the graph to be transformed.
+ * @param map the transformation map.
+ * @param pl a progress logger to be used during the precomputation, or <code>null</code>.
+ * @return the transformed graph (provides {@linkplain ImmutableGraph#randomAccess() random access}).
+ */
+ public static ImmutableGraph map(final ImmutableGraph g, final int map[], final ProgressLogger pl) {
+ int i, j;
+ if (! g.randomAccess()) throw new IllegalArgumentException("Graph mapping requires random access");
+
+ final int sourceNumNodes = g.numNodes();
+ if (map.length != sourceNumNodes) throw new IllegalArgumentException("The graph to be mapped has " + sourceNumNodes + " nodes, whereas the map contains " + map.length + " entries");
+
+ int max = -1;
+ if (pl != null) {
+ pl.itemsName = "nodes";
+ pl.start("Storing identity...");
+ }
+
+ // Compute the number of actually remapped nodes (those with map[i] >= 0)
+ for (i = j = 0; i < sourceNumNodes; i++) if (map[i] >= 0) j++;
+ final int remappedNodes = j;
+ final boolean everywhereDefined = remappedNodes == sourceNumNodes;
+
+ /* The pseudoinverse array: for each node of the transformed graph that is image of a node
+ * of the source graph, it contains the index of that node. */
+ final int pseudoInverse[] = new int[remappedNodes];
+
+ for (i = j = 0; i < sourceNumNodes; i++) {
+ if (max < map[i]) max = map[i];
+ //if (f[i] < 0) throw new IllegalArgumentException("The supplied map contains a negative value (" + f[i] +") at index " + i);
+ if (map[i] >= 0) pseudoInverse[j++] = i;
+ }
+
+ final int destNumNodes = max + 1;
+ final boolean notEnlarged = destNumNodes <= sourceNumNodes;
+
+ if (pl != null) {
+ pl.count = remappedNodes;
+ pl.done();
+ }
+
+ // Sort pseudoInverse[] indirectly, by the values of map[]
+ if (pl != null) pl.start("Sorting to obtain pseudoinverse...");
+ IntArrays.radixSortIndirect(pseudoInverse, map, 0, remappedNodes, false);
+ if (pl != null) {
+ pl.count = sourceNumNodes;
+ pl.done();
+ }
+
+ // Check whether map is injective on the nodes it maps
+ if (pl != null) pl.start("Checking whether it is injective...");
+ int k = remappedNodes - 1;
+ // Note that we need the first check for the empty graph.
+ if (k >= 0) while(k-- != 0) if (map[pseudoInverse[k]] == map[pseudoInverse[k + 1]]) break;
+ final boolean isInjective = k == -1;
+ if (pl != null) {
+ pl.count = sourceNumNodes;
+ pl.stop("(It is" + (isInjective ? "" : " not") + " injective.)");
+ pl.done();
+ }
+
+ final boolean isPermutation = isInjective && everywhereDefined && notEnlarged;
+
+ return new RemappedImmutableGraph(map, g, isInjective, isPermutation, remappedNodes, destNumNodes, pseudoInverse);
+ }
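The semantics documented for `map()` can be stated as an eager reference computation. Unmapped nodes are dropped, each destination node receives the union of the successors of its fibre (the quotient), and the result has `max(map) + 1` nodes. The following is a standalone sketch on in-memory successor lists, useful for checking expectations; it is not the lazy implementation above. `MapSketch` and its methods are hypothetical names, and the injectivity test mirrors the sort-then-compare-neighbours idea under that assumption.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Standalone reference sketch; names are hypothetical, not WebGraph API.
final class MapSketch {
    /** Successor lists of the graph obtained by remapping nodes through map (-1 means "dropped"). */
    static int[][] map(final int[][] succ, final int[] map) {
        if (map.length != succ.length) throw new IllegalArgumentException("map.length must equal the number of nodes");
        int max = -1;
        for (final int v : map) max = Math.max(max, v);
        // One sorted set per destination node: merges fibres and removes duplicate arcs.
        @SuppressWarnings("unchecked")
        final TreeSet<Integer>[] out = new TreeSet[max + 1];
        for (int k = 0; k <= max; k++) out[k] = new TreeSet<>();
        for (int i = 0; i < succ.length; i++) {
            if (map[i] < 0) continue;
            for (final int j : succ[i]) if (map[j] >= 0) out[map[i]].add(map[j]);
        }
        final int[][] result = new int[max + 1][];
        for (int k = 0; k <= max; k++) result[k] = out[k].stream().mapToInt(Integer::intValue).toArray();
        return result;
    }

    /** True if no two mapped nodes share an image: sort the defined images and compare neighbours. */
    static boolean isInjective(final int[] map) {
        final int[] images = Arrays.stream(map).filter(v -> v >= 0).sorted().toArray();
        for (int k = 1; k < images.length; k++) if (images[k] == images[k - 1]) return false;
        return true;
    }
}
```

For example, on the cycle 0 → 1 → 2 → 0 with `map = {0, 0, 1}`, nodes 0 and 1 collapse into node 0, which gains a loop.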
+
+ /** Remaps the graph nodes through a partial function specified via
+ * an array.
+ *
+ * @param g the graph to be transformed.
+ * @param f the transformation map.
+ * @return the transformed graph.
+ * @see #map(ImmutableGraph, int[], ProgressLogger)
+ */
+ public static ImmutableGraph map(final ImmutableGraph g, final int f[]) {
+ return map(g, f, null);
+ }
+
+ /** Returns a symmetrized graph using an offline transposition.
+ *
+ * @param g the source graph.
+ * @param batchSize the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
+ * @return the symmetrized graph.
+ * @see #symmetrizeOffline(ImmutableGraph, int, File, ProgressLogger)
+ */
+ public static ImmutableGraph symmetrizeOffline(final ImmutableGraph g, final int batchSize) throws IOException {
+ return symmetrizeOffline(g, batchSize, null, null);
+ }
+
+ /** Returns a symmetrized graph using an offline transposition.
+ *
+ * @param g the source graph.
+ * @param batchSize the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
+ * @param tempDir a temporary directory for the batches, or <code>null</code> for {@link File#createTempFile(java.lang.String, java.lang.String)}'s choice.
+ * @return the symmetrized graph.
+ * @see #symmetrizeOffline(ImmutableGraph, int, File, ProgressLogger)
+ */
+ public static ImmutableGraph symmetrizeOffline(final ImmutableGraph g, final int batchSize, final File tempDir) throws IOException {
+ return symmetrizeOffline(g, batchSize, tempDir, null);
+ }
+
+ /** Returns a symmetrized graph using an offline transposition.
+ *
+ * <P>The symmetrized graph is the union of a graph and of its transpose. This method will
+ * compute the transpose on the fly using {@link #transposeOffline(ImmutableGraph, int, File, ProgressLogger)}.
+ *
+ * @param g the source graph.
+ * @param batchSize the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
+ * @param tempDir a temporary directory for the batches, or <code>null</code> for {@link File#createTempFile(java.lang.String, java.lang.String)}'s choice.
+ * @param pl a progress logger, or <code>null</code>.
+ * @return the symmetrized graph.
+ */
+ public static ImmutableGraph symmetrizeOffline(final ImmutableGraph g, final int batchSize, final File tempDir, final ProgressLogger pl) throws IOException {
+ return union(g, transposeOffline(g, batchSize, tempDir, pl));
+ }
+
+
+ /** Returns a symmetrized graph.
+ *
+ * <P>The symmetrized graph is the union of a graph and of its transpose. This method will
+ * use the provided transposed graph, if any, instead of computing it on the fly.
+ *
+ * @param g the source graph.
+ * @param t the graph <code>g</code> transposed; if <code>null</code>, the transposed graph will be computed on the fly.
+ * @param pl a progress logger, or <code>null</code>.
+ * @return the symmetrized graph.
+ */
+ public static ImmutableGraph symmetrize(final ImmutableGraph g, final ImmutableGraph t, final ProgressLogger pl) {
+ return t == null ?
+ union(g, transpose(g, pl)) :
+ union(g, t);
+ }
+
+ /** Returns a symmetrized graph.
+ *
+ * <P>The symmetrized graph is the union of a graph and of its transpose. This method will
+ * use the provided transposed graph, if any, instead of computing it on the fly.
+ *
+ * @param g the source graph.
+ * @param t the graph <code>g</code> transposed; if <code>null</code>, the transposed graph will be computed on the fly.
+ * @return the symmetrized graph.
+ */
+ public static ImmutableGraph symmetrize(final ImmutableGraph g, final ImmutableGraph t) {
+ return symmetrize(g, t, null);
+ }
+
+ /** Returns a symmetrized graph.
+ *
+ * @param g the source graph.
+ * @param pl a progress logger.
+ * @return the symmetrized graph.
+ * @see #symmetrize(ImmutableGraph, ImmutableGraph, ProgressLogger)
+ */
+ public static ImmutableGraph symmetrize(final ImmutableGraph g, final ProgressLogger pl) {
+ return symmetrize(g, null, pl);
+ }
+
+ /** Returns a symmetrized graph.
+ *
+ * @param g the source graph.
+ * @return the symmetrized graph.
+ * @see #symmetrize(ImmutableGraph, ImmutableGraph, ProgressLogger)
+ */
+ public static ImmutableGraph symmetrize(final ImmutableGraph g) {
+ return symmetrize(g, null, null);
+ }
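All `symmetrize` variants return the union of a graph and its transpose. Since successor lists are sorted and duplicate-free, the per-node union is a linear merge. The following is a standalone sketch of that merge on in-memory lists. It assumes the transpose is already available, as in `symmetrize(g, t)`. How `union()` is actually implemented is not shown in this file, and `SymmetrizeSketch` is a hypothetical name.

```java
import java.util.Arrays;

// Standalone sketch; names are hypothetical, not WebGraph API.
final class SymmetrizeSketch {
    /** Merges two sorted, duplicate-free arrays into their sorted, duplicate-free union. */
    static int[] union(final int[] a, final int[] b) {
        final int[] out = new int[a.length + b.length];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) out[n++] = a[i++];
            else if (a[i] > b[j]) out[n++] = b[j++];
            else { out[n++] = a[i++]; j++; } // common element: emit once
        }
        while (i < a.length) out[n++] = a[i++];
        while (j < b.length) out[n++] = b[j++];
        return Arrays.copyOf(out, n);
    }

    /** Symmetrizes a graph given its successor lists and those of its transpose. */
    static int[][] symmetrize(final int[][] succ, final int[][] transposed) {
        final int[][] result = new int[succ.length][];
        for (int x = 0; x < succ.length; x++) result[x] = union(succ[x], transposed[x]);
        return result;
    }
}
```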
+
+ /** Returns an immutable graph obtained by reversing all arcs in <code>g</code>.
+ *
+ * <P>This method can process {@linkplain ImmutableGraph#loadOffline(CharSequence) offline graphs}.
+ *
+ * @param g an immutable graph.
+ * @param pl a progress logger, or <code>null</code>.
+ * @return an immutable graph obtained by transposing <code>g</code>.
+ */
+
+ public static ImmutableGraph transpose(final ImmutableGraph g, final ProgressLogger pl) {
+
+ int i, j, d, a[];
+
+ final int n = g.numNodes();
+ final int numPred[] = new int[n];
+
+ if (pl != null) {
+ pl.itemsName = "nodes";
+ pl.expectedUpdates = n;
+ pl.start("Counting predecessors...");
+ }
+
+ NodeIterator nodeIterator = g.nodeIterator();
+
+ long m = 0; // Number of arcs, computed on the fly.
+
+ for(i = n; i-- != 0;) {
+ nodeIterator.nextInt();
+ d = nodeIterator.outdegree();
+ a = nodeIterator.successorArray();
+ m += d;
+ while(d-- != 0) numPred[a[d]]++;
+ if (pl != null) pl.lightUpdate();
+ }
+
+ if (pl != null) pl.done();
+
+ final int pred[][] = new int[n][];
+
+ if (pl != null) {
+ pl.expectedUpdates = n;
+ pl.start("Allocating memory for predecessors...");
+ }
+
+ for(i = n; i-- != 0;) {
+ if (numPred[i] != 0) pred[i] = new int[numPred[i]];
+ if (pl != null) pl.lightUpdate();
+ }
+
+ if (pl != null) pl.done();
+
+ Arrays.fill(numPred, 0);
+
+ if (pl != null) {
+ pl.expectedUpdates = n;
+ pl.start("Computing predecessors...");
+ }
+
+ nodeIterator = g.nodeIterator();
+
+ for(i = n; i-- != 0;) {
+ j = nodeIterator.nextInt();
+ d = nodeIterator.outdegree();
+ a = nodeIterator.successorArray();
+ while(d-- != 0) pred[a[d]][numPred[a[d]]++] = j;
+ if (pl != null) pl.update();
+ }
+
+ if (pl != null) pl.done();
+
+ if (pl != null) {
+ pl.expectedUpdates = n;
+ pl.start("Sorting predecessors...");
+ }
+
+ for(i = n; i-- != 0;) {
+ if (pred[i] != null) Arrays.sort(pred[i]);
+ if (pl != null) pl.lightUpdate();
+ }
+
+ if (pl != null) pl.done();
+
+ final long numArcs = m;
+ return new ImmutableGraph() {
+ @Override
+ public int numNodes() { return n; }
+ @Override
+ public long numArcs() { return numArcs; }
+ @Override
+ public ImmutableGraph copy() { return this; }
+ @Override
+ public boolean randomAccess() { return true; }
+ @Override
+ public int[] successorArray(final int x) { return pred[x] != null ? pred[x] : IntArrays.EMPTY_ARRAY; }
+ @Override
+ public int outdegree(final int x) { return successorArray(x).length; }
+ };
+ }
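The in-memory `transpose()` makes two sequential passes over the graph. The first counts predecessors so that every predecessor array can be allocated with its exact size. The second fills those arrays, reusing `numPred` as a per-node write cursor. The following is a standalone sketch of the same scheme on successor lists. `TransposeSketch` is a hypothetical name; in this sketch sources are scanned in increasing order, so each predecessor list comes out already sorted, and the final sort only mirrors the original's last pass.

```java
import java.util.Arrays;

// Standalone sketch; names are hypothetical, not WebGraph API.
final class TransposeSketch {
    static int[][] transpose(final int[][] succ) {
        final int n = succ.length;
        final int[] numPred = new int[n];
        // Pass 1: count the predecessors of each node.
        for (final int[] s : succ) for (final int y : s) numPred[y]++;
        // Allocate each predecessor array with its exact size.
        final int[][] pred = new int[n][];
        for (int y = 0; y < n; y++) pred[y] = new int[numPred[y]];
        // Pass 2: reuse numPred as the write position inside each pred[y].
        Arrays.fill(numPred, 0);
        for (int x = 0; x < n; x++) for (final int y : succ[x]) pred[y][numPred[y]++] = x;
        // Already sorted here (x increases), kept to mirror the original's sorting pass.
        for (final int[] p : pred) Arrays.sort(p);
        return pred;
    }
}
```

The two-pass design needs only one integer counter per node beyond the output itself. This is also why the original requires no random access: it only uses `nodeIterator()`.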
+
+
+
+ /** Provides a sequential immutable graph by merging batches on the fly. */
+ public final static class BatchGraph extends ImmutableSequentialGraph {
+ private final static class BatchGraphNodeIterator extends NodeIterator {
+ /** The buffer size. We can't make it too big&mdash;there's one per batch, per thread. */
+ private static final int STD_BUFFER_SIZE = 128 * 1024;
+ /** The indirect queue used to merge the batches. */
+ private final IntHeapSemiIndirectPriorityQueue queue;
+ /** The reference array for {@link #queue}. */
+ private final int[] refArray;
+ /** The input bit streams over the batches. */
+ private final InputBitStream[] batchIbs;
+ /** An upper bound on the last node to be returned. */
+ private final int upperBound;
+ /** The number of pairs still to be read from each {@linkplain #batchIbs batch}. */
+ private final int[] inputStreamLength;
+ /** For each batch, the target of the last returned arc (-1 if no arc has been returned for the current source). */
+ private final int[] prevTarget;
+ /** The last returned node (-1 if no node has been returned yet). */
+ private int last;
+ /** The outdegree of the current node (valid if {@link #last} is not -1). */
+ private int outdegree;
+ /** The successors of the current node (valid if {@link #last} is not -1);
+ * only the first {@link #outdegree} entries are meaningful. */
+ private int[] successor;
+ /** The batches underlying this iterator. */
+ private final ObjectArrayList<File> batches;
+ /** The number of nodes in the graph. */
+ private final int n;
+
+ private BatchGraphNodeIterator(final int n, final ObjectArrayList<File> batches, final int upperBound) throws IOException {
+ this(n, batches, upperBound, null, null, null, null, -1, 0, IntArrays.EMPTY_ARRAY);
+ }
+
+ private BatchGraphNodeIterator(final int n, final ObjectArrayList<File> batches, final int upperBound, final InputBitStream[] baseIbs, final int[] refArray, final int[] prevTarget, int[] inputStreamLength, final int last, final int outdegree, final int successor[]) throws IOException {
+ this.n = n;
+ this.batches = batches;
+ this.upperBound = Math.min(n, upperBound);
+ this.last = last;
+ this.outdegree = outdegree;
+ this.successor = successor;
+ batchIbs = new InputBitStream[batches.size()];
+
+ if (refArray == null) {
+ this.refArray = new int[batches.size()];
+ this.prevTarget = new int[batches.size()];
+ this.inputStreamLength = new int[batches.size()];
+ Arrays.fill(this.prevTarget, -1);
+ queue = new IntHeapSemiIndirectPriorityQueue(this.refArray);
+ // We open all files and load the first element into the reference array.
+ for(int i = 0; i < batches.size(); i++) {
+ batchIbs[i] = new InputBitStream(batches.get(i), STD_BUFFER_SIZE);
+ this.inputStreamLength[i] = batchIbs[i].readDelta();
+ this.refArray[i] = batchIbs[i].readDelta();
+ queue.enqueue(i);
+ }
+ }
+ else {
+ this.refArray = refArray;
+ this.prevTarget = prevTarget;
+ this.inputStreamLength = inputStreamLength;
+ queue = new IntHeapSemiIndirectPriorityQueue(refArray);
+
+ for(int i = 0; i < refArray.length; i++) {
+ if (baseIbs[i] != null) {
+ batchIbs[i] = new InputBitStream(batches.get(i), STD_BUFFER_SIZE);
+ batchIbs[i].position(baseIbs[i].position());
+ queue.enqueue(i);
+ }
+ }
+ }
+ }
+
+ @Override
+ public NodeIterator copy(int upperBound) {
+ try {
+ if (last == -1) return new BatchGraphNodeIterator(n, batches, upperBound);
+ else return new BatchGraphNodeIterator(n, batches, upperBound, batchIbs, refArray.clone(), prevTarget.clone(), inputStreamLength.clone(), last, outdegree, Arrays.copyOf(successor, outdegree));
+ }
+ catch (final IOException e) {
+ throw new RuntimeException(e.getMessage(), e);
+ }
+ }
+
+ @Override
+ public int outdegree() {
+ if (last == -1) throw new IllegalStateException();
+ return outdegree;
+ }
+
+ @Override
+ public boolean hasNext() {
+ return last < upperBound - 1;
+ }
+
+ @Override
+ public int nextInt() {
+ if (! hasNext()) throw new NoSuchElementException();
+ last++;
+ int d = 0;
+ int i;
+
+ try {
+ /* We extract elements from the queue as long as their target is equal
+ * to last. If during the process we exhaust a batch, we close it. */
+
+ while(! queue.isEmpty() && refArray[i = queue.first()] == last) {
+ successor = IntArrays.grow(successor, d + 1);
+ successor[d] = (prevTarget[i] += batchIbs[i].readDelta() + 1);
+ if (--inputStreamLength[i] == 0) {
+ queue.dequeue();
+ batchIbs[i].close();
+ batchIbs[i] = null;
+ }
+ else {
+ // We read a new source and update the queue.
+ final int sourceDelta = batchIbs[i].readDelta();
+ if (sourceDelta != 0) {
+ refArray[i] += sourceDelta;
+ prevTarget[i] = -1;
+ queue.changed();
+ }
+ }
+ d++;
+ }
+ // Neither quicksort nor heaps are stable, so we reestablish order here.
+ IntArrays.quickSort(successor, 0, d);
+ if (d != 0) {
+ int p = 0;
+ for(int j = 1; j < d; j++) if (successor[p] != successor[j]) successor[++p] = successor[j];
+ d = p + 1;
+ }
+ }
+ catch(final IOException e) {
+ e.printStackTrace();
+ throw new RuntimeException(this + " " + e);
+ }
+
+ outdegree = d;
+ return last;
+ }
+
+ @Override
+ public int[] successorArray() {
+ if (last == -1) throw new IllegalStateException();
+ return successor;
+ }
+
+ @Override
+ protected void finalize() throws Throwable {
+ try {
+ for(final InputBitStream ibs: batchIbs) if (ibs != null) ibs.close();
+ }
+ finally {
+ super.finalize();
+ }
+ }
+
+ }
+
+ private final ObjectArrayList<File> batches;
+ private final int n;
+ private final long numArcs;
+
+ public BatchGraph(final int n, final long m, final ObjectArrayList<File> batches) {
+ this.batches = batches;
+ this.n = n;
+ this.numArcs = m;
+ }
+
+ @Override
+ public int numNodes() { return n; }
+ @Override
+ public long numArcs() { return numArcs; }
+
+ @Override
+ public boolean hasCopiableIterators() {
+ return true;
+ }
+
+ @Override
+ public BatchGraph copy() {
+ return this;
+ }
+
+ @Override
+ public NodeIterator nodeIterator() {
+ try {
+ return new BatchGraphNodeIterator(n, batches, n);
+ }
+ catch(final IOException e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+ @Override
+ protected void finalize() throws Throwable {
+ try {
+ for(final File f : batches) f.delete();
+ }
+ finally {
+ super.finalize();
+ }
+ }
+
+ };
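`BatchGraphNodeIterator.nextInt()` performs a k-way merge. Batches are ordered by the source of their next pair. Every pair whose source equals the current node contributes a successor, and the collected successors are then sorted and deduplicated, because different batches interleave. The following is a standalone sketch of the same merge. It assumes in-memory batches of `{source, target}` pairs instead of δ-coded bit streams, and `java.util.PriorityQueue` instead of fastutil's `IntHeapSemiIndirectPriorityQueue`. `BatchMergeSketch` and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Standalone sketch; names are hypothetical, not WebGraph API.
final class BatchMergeSketch {
    /** Source of the next unread pair of batch b. */
    private static int head(final List<int[][]> batches, final int[] pos, final int b) {
        return batches.get(b)[pos[b]][0];
    }

    /** Each batch holds {source, target} pairs sorted by source, then target; returns the merged successor lists. */
    static int[][] merge(final int n, final List<int[][]> batches) {
        final int[] pos = new int[batches.size()]; // next unread pair in each batch
        final PriorityQueue<Integer> queue =
                new PriorityQueue<Integer>((a, b) -> Integer.compare(head(batches, pos, a), head(batches, pos, b)));
        for (int b = 0; b < batches.size(); b++) if (batches.get(b).length > 0) queue.add(b);
        final int[][] result = new int[n][];
        for (int last = 0; last < n; last++) {
            int[] buf = new int[8];
            int d = 0;
            while (!queue.isEmpty() && head(batches, pos, queue.peek()) == last) {
                // PriorityQueue does not notice key changes: poll, advance, then re-insert.
                final int b = queue.poll();
                final int[][] batch = batches.get(b);
                while (pos[b] < batch.length && batch[pos[b]][0] == last) {
                    if (d == buf.length) buf = Arrays.copyOf(buf, 2 * d);
                    buf[d++] = batch[pos[b]++][1];
                }
                if (pos[b] < batch.length) queue.add(b);
            }
            // Targets from different batches interleave: sort, then drop duplicates.
            Arrays.sort(buf, 0, d);
            int u = 0;
            for (int k = 0; k < d; k++) if (u == 0 || buf[u - 1] != buf[k]) buf[u++] = buf[k];
            result[last] = Arrays.copyOf(buf, u);
        }
        return result;
    }
}
```

The original avoids the poll/re-insert step by calling `queue.changed()` after updating `refArray`, which a semi-indirect queue supports.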
+
+
+ /** Sorts the given source and target arrays lexicographically (by source, then by target) and stores the resulting pairs, without duplicates, in a temporary file.
+ *
+ * @param n the index of the last element to be sorted (exclusive).
+ * @param source the source array.
+ * @param target the target array.
+ * @param tempDir a temporary directory where to store the sorted arrays, or <code>null</code> for the system default temporary directory.
+ * @param batches a list of files to which the batch file will be added.
+ * @return the number of pairs in the batch (might be less than <code>n</code> because duplicates are eliminated).
+ */
+
+ public static int processBatch(final int n, final int[] source, final int[] target, final File tempDir, final List<File> batches) throws IOException {
+
+ IntArrays.parallelQuickSort(source, target, 0, n);
+
+ final File batchFile = File.createTempFile("batch", ".bitstream", tempDir);
+ batchFile.deleteOnExit();
+ batches.add(batchFile);
+ final OutputBitStream batch = new OutputBitStream(batchFile);
+ int u = 0;
+ if (n != 0) {
+ // Compute unique pairs
+ u = 1;
+ for(int i = n - 1; i-- != 0;) if (source[i] != source[i + 1] || target[i] != target[i + 1]) u++;
+ batch.writeDelta(u);
+ int prevSource = source[0];
+ batch.writeDelta(prevSource);
+ batch.writeDelta(target[0]);
+
+ for(int i = 1; i < n; i++) {
+ if (source[i] != prevSource) {
+ batch.writeDelta(source[i] - prevSource);
+ batch.writeDelta(target[i]);
+ prevSource = source[i];
+ }
+ else if (target[i] != target[i - 1]) {
+ // We don't write duplicate pairs
+ batch.writeDelta(0);
+ if (ASSERTS) assert target[i] > target[i - 1] : target[i] + "<=" + target[i - 1];
+ batch.writeDelta(target[i] - target[i - 1] - 1);
+ }
+ }
+ }
+ else batch.writeDelta(0);
+
+ batch.close();
+ return u;
+ }
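A batch file produced by `processBatch` is a sequence of integers. It starts with the number of unique pairs, followed by the first source and first target. Each later pair then contributes either a positive source delta plus a raw target, or a 0 followed by the target gap minus one. `BatchGraphNodeIterator` reverses this by starting each source's targets from -1 and adding `readDelta() + 1`. The following is a standalone sketch that writes the integers to a `List` instead of Elias δ-coding them with `OutputBitStream.writeDelta()`, to show the layout. It assumes the input pairs are already sorted, and `BatchCodingSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch; names are hypothetical, not WebGraph API. Input pairs must be sorted by source, then target.
final class BatchCodingSketch {
    /** The integers processBatch would pass to writeDelta(), duplicates skipped. */
    static List<Integer> encode(final int n, final int[] source, final int[] target) {
        final List<Integer> out = new ArrayList<>();
        if (n == 0) { out.add(0); return out; }
        int u = 1;
        for (int i = 1; i < n; i++) if (source[i] != source[i - 1] || target[i] != target[i - 1]) u++;
        out.add(u);
        int prevSource = source[0];
        out.add(prevSource);
        out.add(target[0]);
        for (int i = 1; i < n; i++) {
            if (source[i] != prevSource) { // new source: delta, then the raw target
                out.add(source[i] - prevSource);
                out.add(target[i]);
                prevSource = source[i];
            }
            else if (target[i] != target[i - 1]) { // same source: 0, then gap minus one
                out.add(0);
                out.add(target[i] - target[i - 1] - 1);
            }
        }
        return out;
    }

    /** Decodes the integers back into pairs, following the reading logic of BatchGraphNodeIterator. */
    static int[][] decode(final List<Integer> in) {
        int p = 0;
        final int u = in.get(p++);
        final int[][] pairs = new int[u][];
        if (u == 0) return pairs;
        int source = in.get(p++);
        int prevTarget = -1;
        for (int k = 0; k < u; k++) {
            prevTarget += in.get(p++) + 1;
            pairs[k] = new int[] {source, prevTarget};
            if (k + 1 < u) {
                final int sourceDelta = in.get(p++);
                if (sourceDelta != 0) { source += sourceDelta; prevTarget = -1; }
            }
        }
        return pairs;
    }
}
```

For instance, the sorted pairs (0,1), (0,1), (0,3), (2,0) encode as `3, 0, 1, 0, 1, 2, 0`. The duplicate (0,1) is dropped, the gap from 1 to 3 is stored as 1, and the source jump from 0 to 2 is stored as 2.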
+
+ /** Sorts the given source and target arrays lexicographically (by source, then by target) and stores them in two temporary files.
+ * An additional positionable input bit stream is provided that contains labels, starting at given positions.
+ * Labels are also written onto the appropriate file.
+ *
+ * @param n the index of the last element to be sorted (exclusive).
+ * @param source the source array.
+ * @param target the target array.
+ * @param start the array containing the bit position (within the given input stream) where the label of the arc starts.
+ * @param labelBitStream the positionable bit stream containing the labels.
+ * @param tempDir a temporary directory where to store the sorted arrays.
+ * @param batches a list of files to which the batch file will be added.
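+ * @param labelBatches a list of files to which the label batch file will be added.
+ * @param prototype a label prototype, used to read labels from <code>labelBitStream</code> and write them to the label batch file.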
+ */
+
+ private static void processTransposeBatch(final int n, final int[] source, final int[] target, final long[] start,
+ final InputBitStream labelBitStream, final File tempDir, final List<File> batches, final List<File> labelBatches,
+ final Label prototype) throws IOException {
+ it.unimi.dsi.fastutil.Arrays.quickSort(0, n, (x,y) -> {
+ final int t = source[x] - source[y];
+ if (t != 0) return t;
+ return target[x] - target[y];
+ },
+ (x, y) -> {
+ int t = source[x];
+ source[x] = source[y];
+ source[y] = t;
+ t = target[x];
+ target[x] = target[y];
+ target[y] = t;
+ final long u = start[x];
+ start[x] = start[y];
+ start[y] = u;
+ });
+
+ final File batchFile = File.createTempFile("batch", ".bitstream", tempDir);
+ batchFile.deleteOnExit();
+ batches.add(batchFile);
+ final OutputBitStream batch = new OutputBitStream(batchFile);
+
+ if (n != 0) {
+ // Write the number of pairs (arcs of a graph are never duplicated, so all n pairs are distinct)
+ batch.writeDelta(n);
+ int prevSource = source[0];
+ batch.writeDelta(prevSource);
+ batch.writeDelta(target[0]);
+
+ for(int i = 1; i < n; i++) {
+ if (source[i] != prevSource) {
+ batch.writeDelta(source[i] - prevSource);
+ batch.writeDelta(target[i]);
+ prevSource = source[i];
+ }
+ else if (target[i] != target[i - 1]) {
+ // We don't write duplicate pairs
+ batch.writeDelta(0);
+ batch.writeDelta(target[i] - target[i - 1] - 1);
+ }
+ }
+ }
+ else batch.writeDelta(0);
+
+ batch.close();
+
+ final File labelFile = File.createTempFile("label-", ".bits", tempDir);
+ labelFile.deleteOnExit();
+ labelBatches.add(labelFile);
+ final OutputBitStream labelObs = new OutputBitStream(labelFile);
+ for (int i = 0; i < n; i++) {
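+ labelBitStream.position(start[i]);
+ // Labels were serialised with their original source node (now stored in target[i]), and the merging
+ // iterator deserialises them with the transposed source (source[i]), so we use the same nodes here.
+ prototype.fromBitStream(labelBitStream, target[i]);
+ prototype.toBitStream(labelObs, source[i]);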
+ }
+ labelObs.close();
+ }
+
+ /** Returns an immutable graph obtained by reversing all arcs in <code>g</code>, using an offline method.
+ *
+ * @param g an immutable graph.
+ * @param batchSize the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
+ * @return an immutable, sequentially accessible graph obtained by transposing <code>g</code>.
+ * @see #transposeOffline(ImmutableGraph, int, File, ProgressLogger)
+ */
+
+ public static ImmutableSequentialGraph transposeOffline(final ImmutableGraph g, final int batchSize) throws IOException {
+ return transposeOffline(g, batchSize, null);
+ }
+
+ /** Returns an immutable graph obtained by reversing all arcs in <code>g</code>, using an offline method.
+ *
+ * @param g an immutable graph.
+ * @param batchSize the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
+ * @param tempDir a temporary directory for the batches, or <code>null</code> for {@link File#createTempFile(java.lang.String, java.lang.String)}'s choice.
+ * @return an immutable, sequentially accessible graph obtained by transposing <code>g</code>.
+ * @see #transposeOffline(ImmutableGraph, int, File, ProgressLogger)
+ */
+
+ public static ImmutableSequentialGraph transposeOffline(final ImmutableGraph g, final int batchSize, final File tempDir) throws IOException {
+ return transposeOffline(g, batchSize, tempDir, null);
+ }
+
+ /** Returns an immutable graph obtained by reversing all arcs in <code>g</code>, using an offline method.
+ *
+ * <p>This method should be used to transpose very large graphs when {@link #transpose(ImmutableGraph)}
+ * requires too much memory. It creates a number of sorted batches on disk containing arcs
+ * represented by pairs of gap-compressed integers ordered by target (that is, by the source of the transposed arc)
+ * and returns an {@link ImmutableSequentialGraph}
+ * that can be accessed only using a {@link ImmutableGraph#nodeIterator() node iterator}. The node iterator
+ * merges the batches on the fly, providing the transposed graph. The files are marked with
+ * {@link File#deleteOnExit()}, so they should disappear when the JVM exits. An additional safety-net
+ * finaliser tries to delete the batches, too.
+ *
+ * <p>Note that each {@link NodeIterator} returned by the transpose requires opening all batches at the same time.
+ * The batches are closed when they are exhausted, so a complete scan of the graph closes them all. In any case,
+ * another safety-net finaliser closes all files when the iterator is collected.
+ *
+ * <P>This method can process {@linkplain ImmutableGraph#loadOffline(CharSequence) offline graphs}.
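+ *
+ * <p>A minimal usage sketch (assuming the enclosing class is {@code Transform}; the basename {@code "example"}
+ * and the batch size are illustrative):
+ * <pre>
+ * ImmutableGraph g = ImmutableGraph.loadOffline("example");
+ * ImmutableSequentialGraph t = Transform.transposeOffline(g, 1000000, null, new ProgressLogger());
+ * BVGraph.store(t, "example-t"); // one sequential scan merges, and then closes, all batches
+ * </pre>
+ * The method allocates two <code>int</code> arrays of <code>batchSize</code> elements, so about 8&nbsp;MB with this choice.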
+ *
+ * @param g an immutable graph.
+ * @param batchSize the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
+ * @param tempDir a temporary directory for the batches, or <code>null</code> for {@link File#createTempFile(java.lang.String, java.lang.String)}'s choice.
+ * @param pl a progress logger, or <code>null</code>.
+ * @return an immutable, sequentially accessible graph obtained by transposing <code>g</code>.
+ */
+
+ public static ImmutableSequentialGraph transposeOffline(final ImmutableGraph g, final int batchSize, final File tempDir, final ProgressLogger pl) throws IOException {
+
+ int j, currNode;
+ final int[] source = new int[batchSize], target = new int[batchSize];
+ final ObjectArrayList<File> batches = new ObjectArrayList<>();
+
+ final int n = g.numNodes();
+
+ if (pl != null) {
+ pl.itemsName = "nodes";
+ pl.expectedUpdates = n;
+ pl.start("Creating sorted batches...");
+ }
+
+ final NodeIterator nodeIterator = g.nodeIterator();
+
+ // Phase one: we scan the graph, accumulating pairs <source,target> and dumping them on disk.
+ int succ[];
+ long m = 0; // Number of arcs, computed on the fly.
+ j = 0;
+ for(long i = n; i-- != 0;) {
+ currNode = nodeIterator.nextInt();
+ final int d = nodeIterator.outdegree();
+ succ = nodeIterator.successorArray();
+ m += d;
+
+ for(int k = 0; k < d; k++) {
+ target[j] = currNode;
+ source[j++] = succ[k];
+
+ if (j == batchSize) {
+ processBatch(batchSize, source, target, tempDir, batches);
+ j = 0;
+ }
+ }
+
+
+ if (pl != null) pl.lightUpdate();
+ }
+
+ if (j != 0) processBatch(j, source, target, tempDir, batches);
+
+ if (pl != null) {
+ pl.done();
+ logBatches(batches, m, pl);
+ }
+
+ return new BatchGraph(n, m, batches);
+ }
+
+ protected static void logBatches(final ObjectArrayList<File> batches, final long pairs, final ProgressLogger pl) {
+ long length = 0;
+ for(final File f : batches) length += f.length();
+ pl.logger().info("Created " + batches.size() + " batches using " + Util.format((double)Byte.SIZE * length / pairs) + " bits/arc.");
+ }
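+
+ /* Worked example of the statistic logged above: if three batches occupy 1,000 bytes overall and contain
+ * 2,000 pairs, the logged value is Byte.SIZE * 1000 / 2000 = 8 * 1000 / 2000 = 4 bits per arc. */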
+
+ /** Returns an immutable graph obtained by remapping offline the graph nodes through a partial function specified via an array.
+ *
+ * @param g an immutable graph.
+ * @param map the transformation map.
+ * @param batchSize the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
+ * @return an immutable, sequentially accessible graph obtained by transforming <code>g</code>.
+ * @see #mapOffline(ImmutableGraph, int[], int, File, ProgressLogger)
+ */
+ public static ImmutableSequentialGraph mapOffline(final ImmutableGraph g, final int map[], final int batchSize) throws IOException {
+ return mapOffline(g, map, batchSize, null);
+ }
+
+ /** Returns an immutable graph obtained by remapping offline the graph nodes through a partial function specified via an array.
+ *
+ * @param g an immutable graph.
+ * @param map the transformation map.
+ * @param batchSize the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
+ * @param tempDir a temporary directory for the batches, or <code>null</code> for {@link File#createTempFile(java.lang.String, java.lang.String)}'s choice.
+ * @return an immutable, sequentially accessible graph obtained by transforming <code>g</code>.
+ * @see #mapOffline(ImmutableGraph, int[], int, File, ProgressLogger)
+ */
+ public static ImmutableSequentialGraph mapOffline(final ImmutableGraph g, final int map[], final int batchSize, final File tempDir) throws IOException {
+ return mapOffline(g, map, batchSize, tempDir, null);
+ }
+
+ /** Returns an immutable graph obtained by remapping offline the graph nodes through a partial function specified via an array.
+ *
+ * See {@link #map(ImmutableGraph, int[], ProgressLogger)} for the semantics of this method and {@link #transposeOffline(ImmutableGraph, int, File, ProgressLogger)} for
+ * implementation and performance-related details.
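+ *
+ * <p>For instance (an illustrative sketch, assuming the enclosing class is {@code Transform}), given a graph
+ * <code>g</code> with three nodes and the map below, node 1 and all arcs incident on it are dropped, node 0 becomes
+ * node 1 and node 2 becomes node 0 (so an arc from 0 to 2 becomes an arc from 1 to 0), and the result has
+ * <code>max(map) + 1 = 2</code> nodes:
+ * <pre>
+ * int[] map = { 1, -1, 0 };
+ * ImmutableSequentialGraph h = Transform.mapOffline(g, map, 1000000);
+ * </pre>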
+ *
+ * @param g an immutable graph.
+ * @param map the transformation map.
+ * @param batchSize the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
+ * @param tempDir a temporary directory for the batches, or <code>null</code> for {@link File#createTempFile(java.lang.String, java.lang.String)}'s choice.
+ * @param pl a progress logger, or <code>null</code>.
+ * @return an immutable, sequentially accessible graph obtained by transforming <code>g</code>.
+ */
+ public static ImmutableSequentialGraph mapOffline(final ImmutableGraph g, final int map[], final int batchSize, final File tempDir, final ProgressLogger pl) throws IOException {
+
+ int j, currNode;
+ final int[] source = new int[batchSize], target = new int[batchSize];
+ final ObjectArrayList<File> batches = new ObjectArrayList<>();
+
+ //final int n = g.numNodes();
+
+ int max = -1;
+ for (final int x: map) if (max < x) max = x;
+
+ if (pl != null) {
+ pl.itemsName = "nodes";
+ pl.expectedUpdates = g.numNodes();
+ pl.start("Creating sorted batches...");
+ }
+
+ final NodeIterator nodeIterator = g.nodeIterator();
+
+ // Phase one: we scan the graph, accumulating pairs <map[source],map[target]> (if we have to) and dumping them on disk.
+ int succ[];
+ j = 0;
+ long pairs = 0; // Number of pairs
+ for(long i = g.numNodes(); i-- != 0;) {
+ currNode = nodeIterator.nextInt();
+ if (map[currNode] != -1) {
+ final int d = nodeIterator.outdegree();
+ succ = nodeIterator.successorArray();
+
+ for(int k = 0; k < d; k++) {
+ if (map[succ[k]] != -1) {
+ source[j] = map[currNode];
+ target[j++] = map[succ[k]];
+
+ if (j == batchSize) {
+ pairs += processBatch(batchSize, source, target, tempDir, batches);
+ j = 0;
+ }
+ }
+ }
+ }
+
+ if (pl != null) pl.lightUpdate();
+ }
+
+ // At this point the number of nodes is always known (a traversal has been completed).
+ if (g.numNodes() != map.length) throw new IllegalArgumentException("Mismatch between number of nodes (" + g.numNodes() + ") and map length (" + map.length + ")");
+
+ if (j != 0) pairs += processBatch(j, source, target, tempDir, batches);
+
+ if (pl != null) {
+ pl.done();
+ logBatches(batches, pairs, pl);
+ }
+
+ return new BatchGraph(max + 1, -1, batches);
+ }
+
+ /** Returns an arc-labelled immutable graph obtained by reversing all arcs in <code>g</code>, using an offline method.
+ *
+ * @param g an immutable graph.
+ * @param batchSize the number of integers in a batch; two arrays of integers and an array of longs of this size will be allocated by this method,
+ * plus an additional {@link FastByteArrayOutputStream} needed to store all the labels for a batch.
+ * @return an immutable, sequentially accessible graph obtained by transposing <code>g</code>.
+ * @see #transposeOffline(ArcLabelledImmutableGraph, int, File, ProgressLogger)
+ */
+ public static ArcLabelledImmutableGraph transposeOffline(final ArcLabelledImmutableGraph g, final int batchSize) throws IOException {
+ return transposeOffline(g, batchSize, null);
+ }
+
+ /** Returns an arc-labelled immutable graph obtained by reversing all arcs in <code>g</code>, using an offline method.
+ *
+ * @param g an immutable graph.
+ * @param batchSize the number of integers in a batch; two arrays of integers and an array of longs of this size will be allocated by this method,
+ * plus an additional {@link FastByteArrayOutputStream} needed to store all the labels for a batch.
+ * @param tempDir a temporary directory for the batches, or <code>null</code> for {@link File#createTempFile(java.lang.String, java.lang.String)}'s choice.
+ * @return an immutable, sequentially accessible graph obtained by transposing <code>g</code>.
+ * @see #transposeOffline(ArcLabelledImmutableGraph, int, File, ProgressLogger)
+ */
+ public static ArcLabelledImmutableGraph transposeOffline(final ArcLabelledImmutableGraph g, final int batchSize, final File tempDir) throws IOException {
+ return transposeOffline(g, batchSize, tempDir, null);
+ }
+
+
+ /** Returns an arc-labelled immutable graph obtained by reversing all arcs in <code>g</code>, using an offline method.
+ *
+ * <p>This method should be used to transpose very large graphs when {@link #transpose(ImmutableGraph)}
+ * requires too much memory. It creates a number of sorted batches on disk containing arcs
+ * represented by pairs of gap-compressed integers ordered by target (that is, by the source of the transposed arc)
+ * and returns an {@link ArcLabelledImmutableGraph}
+ * that can be accessed only using a {@link ImmutableGraph#nodeIterator() node iterator}. The node iterator
+ * merges the batches on the fly, providing the transposed graph. The files are marked with
+ * {@link File#deleteOnExit()}, so they should disappear when the JVM exits. An additional safety-net
+ * finaliser tries to delete the batches, too. Labels are temporarily stored in
+ * an in-memory bit stream, which is permuted when it is written to disk.
+ *
+ * <p>Note that each {@link NodeIterator} returned by the transpose requires opening all batches at the same time.
+ * The batches are closed when they are exhausted, so a complete scan of the graph closes them all. In any case,
+ * another safety-net finaliser closes all files when the iterator is collected.
+ *
+ * <P>This method can process {@linkplain ArcLabelledImmutableGraph#loadOffline(CharSequence) offline graphs}. Note that
+ * no method to transpose on-line arc-labelled graphs is currently provided.
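+ *
+ * <p>A minimal usage sketch (assuming the enclosing class is {@code Transform}; the basename
+ * {@code "example-labelled"} and the batch size are illustrative):
+ * <pre>
+ * ArcLabelledImmutableGraph g = ArcLabelledImmutableGraph.loadOffline("example-labelled");
+ * ArcLabelledImmutableGraph t = Transform.transposeOffline(g, 1000000, null, null);
+ * ArcLabelledNodeIterator it = t.nodeIterator();
+ * while (it.hasNext()) {
+ *     final int x = it.nextInt();
+ *     final Label[] labels = it.labelArray(); // labels of the arcs entering x in g
+ *     // only the first it.outdegree() entries of labels are meaningful
+ * }
+ * </pre>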
+ *
+ * @param g an immutable graph.
+ * @param batchSize the number of integers in a batch; two arrays of integers and an array of longs of this size will be allocated by this method,
+ * plus an additional {@link FastByteArrayOutputStream} needed to store all the labels for a batch.
+ * @param tempDir a temporary directory for the batches, or <code>null</code> for {@link File#createTempFile(java.lang.String, java.lang.String)}'s choice.
+ * @param pl a progress logger, or <code>null</code>.
+ * @return an immutable, sequentially accessible graph obtained by transposing <code>g</code>.
+ */
+
+ public static ArcLabelledImmutableGraph transposeOffline(final ArcLabelledImmutableGraph g, final int batchSize, final File tempDir, final ProgressLogger pl) throws IOException {
+
+ int i, j, d, currNode;
+ final int[] source = new int[batchSize], target = new int[batchSize];
+ final long[] start = new long[batchSize];
+ FastByteArrayOutputStream fbos = new FastByteArrayOutputStream();
+ OutputBitStream obs = new OutputBitStream(fbos);
+ final ObjectArrayList<File> batches = new ObjectArrayList<>(), labelBatches = new ObjectArrayList<>();
+ final Label prototype = g.prototype().copy();
+
+ final int n = g.numNodes();
+
+ if (pl != null) {
+ pl.itemsName = "nodes";
+ pl.expectedUpdates = n;
+ pl.start("Creating sorted batches...");
+ }
+
+ final ArcLabelledNodeIterator nodeIterator = g.nodeIterator();
+
+ // Phase one: we scan the graph, accumulating pairs <source,target> and dumping them on disk.
+ int succ[];
+ Label label[] = null;
+ long m = 0; // Number of arcs, computed on the fly.
+ j = 0;
+ for(i = n; i-- != 0;) {
+ currNode = nodeIterator.nextInt();
+ d = nodeIterator.outdegree();
+ succ = nodeIterator.successorArray();
+ label = nodeIterator.labelArray();
+ m += d;
+
+ for(int k = 0; k < d; k++) {
+ source[j] = succ[k];
+ target[j] = currNode;
+ start[j] = obs.writtenBits();
+ label[k].toBitStream(obs, currNode);
+ j++;
+
+ if (j == batchSize) {
+ obs.flush();
+ processTransposeBatch(batchSize, source, target, start, new InputBitStream(fbos.array), tempDir, batches, labelBatches, prototype);
+ fbos = new FastByteArrayOutputStream();
+ obs = new OutputBitStream(fbos); // TODO: reuse the streams instead of reallocating them for each batch
+ j = 0;
+ }
+ }
+
+
+ if (pl != null) pl.lightUpdate();
+ }
+
+ if (j != 0) {
+ obs.flush();
+ processTransposeBatch(j, source, target, start, new InputBitStream(fbos.array), tempDir, batches, labelBatches, prototype);
+ }
+
+ if (pl != null) {
+ pl.done();
+ logBatches(batches, m, pl);
+ }
+
+ final long numArcs = m;
+
+ // Now we return an immutable graph whose nodeIterator() merges the batches on the fly.
+ return new ArcLabelledImmutableSequentialGraph() {
+ @Override
+ public int numNodes() { return n; }
+ @Override
+ public long numArcs() { return numArcs; }
+ @Override
+ public boolean hasCopiableIterators() { return true; }
+
+ class InternalArcLabelledNodeIterator extends ArcLabelledNodeIterator {
+ /** The buffer size. We can't make it too big: there are two buffers per batch, per thread. */
+ private static final int STD_BUFFER_SIZE = 64 * 1024;
+ private final int[] refArray;
+ private final InputBitStream[] batchIbs;
+ private final InputBitStream[] labelInputBitStream;
+ private final int[] inputStreamLength;
+ private final int[] prevTarget;
+
+ // The indirect queue used to merge the batches.
+ private final IntHeapSemiIndirectPriorityQueue queue;
+ private final int upperBound;
+
+ /** The last returned node. */
+ private int last;
+ /** The outdegree of the current node (valid if {@link #last} is not -1). */
+ private int outdegree;
+ /** The successors of the current node (valid if {@link #last} is not -1);
+ * only the first {@link #outdegree} entries are meaningful. */
+ private int[] successor;
+ /** The labels of the arcs going out of the current node (valid if {@link #last} is not -1);
+ * only the first {@link #outdegree} entries are meaningful. */
+ @SuppressWarnings("hiding")
+ private Label[] label;
+
+ public InternalArcLabelledNodeIterator(final int upperBound) throws IOException {
+ this(upperBound, null, null, null, null, null, -1, 0, IntArrays.EMPTY_ARRAY, new Label[0]);
+ }
+
+ public InternalArcLabelledNodeIterator(final int upperBound, final InputBitStream[] baseIbs, final InputBitStream[] baseLabelInputBitStream, final int[] refArray, final int[] prevTarget, int[] inputStreamLength, final int last, final int outdegree, final int successor[], final Label[] label) throws IOException {
+ this.upperBound = Math.min(n, upperBound);
+ this.last = last;
+ this.outdegree = outdegree;
+ this.successor = successor;
+ this.label = label;
+ batchIbs = new InputBitStream[batches.size()];
+ labelInputBitStream = new InputBitStream[batches.size()];
+
+ if (refArray == null) {
+ this.refArray = new int[batches.size()];
+ this.prevTarget = new int[batches.size()];
+ this.inputStreamLength = new int[batches.size()];
+ Arrays.fill(this.prevTarget, -1);
+ queue = new IntHeapSemiIndirectPriorityQueue(this.refArray);
+ // We open all files and load the first element into the reference array.
+ for(int i = 0; i < batches.size(); i++) {
+ batchIbs[i] = new InputBitStream(batches.get(i), STD_BUFFER_SIZE);
+ labelInputBitStream[i] = new InputBitStream(labelBatches.get(i), STD_BUFFER_SIZE);
+ this.inputStreamLength[i] = batchIbs[i].readDelta();
+ this.refArray[i] = batchIbs[i].readDelta();
+ queue.enqueue(i);
+ }
+ }
+ else {
+ this.refArray = refArray;
+ this.prevTarget = prevTarget;
+ this.inputStreamLength = inputStreamLength;
+ queue = new IntHeapSemiIndirectPriorityQueue(refArray);
+
+ for(int i = 0; i < refArray.length; i++) {
+ if (baseIbs[i] != null) {
+ batchIbs[i] = new InputBitStream(batches.get(i), STD_BUFFER_SIZE);
+ batchIbs[i].position(baseIbs[i].position());
+ labelInputBitStream[i] = new InputBitStream(labelBatches.get(i), STD_BUFFER_SIZE);
+ labelInputBitStream[i].position(baseLabelInputBitStream[i].position());
+ queue.enqueue(i);
+ }
+ }
+ }
+ }
+
+ @Override
+ public int outdegree() {
+ if (last == -1) throw new IllegalStateException();
+ return outdegree;
+ }
+
+ @Override
+ public boolean hasNext() {
+ return last < Math.min(n - 1, upperBound - 1);
+ }
+
+ @Override
+ public int nextInt() {
+ last++;
+ int d = 0;
+ int i;
+
+ try {
+ /* We extract elements from the queue as long as their target is equal
+ * to last. If during the process we exhaust a batch, we close it. */
+
+ while(! queue.isEmpty() && refArray[i = queue.first()] == last) {
+ successor = IntArrays.grow(successor, d + 1);
+ successor[d] = (prevTarget[i] += batchIbs[i].readDelta() + 1);
+ label = ObjectArrays.grow(label, d + 1);
+ label[d] = prototype.copy();
+ label[d].fromBitStream(labelInputBitStream[i], last);
+
+ if (--inputStreamLength[i] == 0) {
+ queue.dequeue();
+ batchIbs[i].close();
+ labelInputBitStream[i].close();
+ batchIbs[i] = null;
+ labelInputBitStream[i] = null;
+ }
+ else {
+ // We read a new source and update the queue.
+ final int sourceDelta = batchIbs[i].readDelta();
+ if (sourceDelta != 0) {
+ refArray[i] += sourceDelta;
+ prevTarget[i] = -1;
+ queue.changed();
+ }
+ }
+ d++;
+ }
+ // Neither quicksort nor heaps are stable, so we reestablish order here.
+ it.unimi.dsi.fastutil.Arrays.quickSort(0, d, (x,y) -> successor[x] - successor[y],
+ (x, y) -> {
+ final int t = successor[x];
+ successor[x] = successor[y];
+ successor[y] = t;
+ final Label l = label[x];
+ label[x] = label[y];
+ label[y] = l;
+ });
+ }
+ catch(final IOException e) {
+ throw new RuntimeException(e);
+ }
+
+ outdegree = d;
+ return last;
+ }
+
+ @Override
+ public int[] successorArray() {
+ if (last == -1) throw new IllegalStateException();
+ return successor;
+ }
+
+ @Override
+ protected void finalize() throws Throwable {
+ try {
+ for(final InputBitStream ibs: batchIbs) if (ibs != null) ibs.close();
+ for(final InputBitStream ibs: labelInputBitStream) if (ibs != null) ibs.close();
+ }
+ finally {
+ super.finalize();
+ }
+ }
+
+ @Override
+ public LabelledArcIterator successors() {
+ if (last == -1) throw new IllegalStateException();
+ return new LabelledArcIterator() {
+ @SuppressWarnings("hiding")
+ int last = -1;
+
+ @Override
+ public Label label() {
+ return label[last];
+ }
+
+ @Override
+ public int nextInt() {
+ if (last + 1 == outdegree) return -1;
+ return successor[++last];
+ }
+
+ @Override
+ public int skip(int k) {
+ final int toSkip = Math.min(k, outdegree - last - 1);
+ last += toSkip;
+ return toSkip;
+ }
+ };
+ }
+
+
+ @Override
+ public ArcLabelledNodeIterator copy(final int upperBound) {
+ try {
+ if (last == -1) return new InternalArcLabelledNodeIterator(upperBound);
+ else return new InternalArcLabelledNodeIterator(upperBound, batchIbs, labelInputBitStream,
+ refArray.clone(), prevTarget.clone(), inputStreamLength.clone(), last, outdegree, Arrays.copyOf(successor, outdegree), Arrays.copyOf(label, outdegree));
+ }
+ catch (final IOException e) {
+ throw new RuntimeException(e);
+ }
+ }
+ };
+
+
+ @Override
+ public ArcLabelledNodeIterator nodeIterator() {
+ try {
+ return new InternalArcLabelledNodeIterator(Integer.MAX_VALUE);
+ }
+ catch (final IOException e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+ @Override
+ protected void finalize() throws Throwable {
+ try {
+ for(final File f : batches) f.delete();
+ for(final File f : labelBatches) f.delete();
+ }
+ finally {
+ super.finalize();
+ }
+ }
+ @Override
+ public Label prototype() {
+ return prototype;
+ }
+
+ };
+ }
+
+
+ /** Returns an immutable graph obtained by reversing all arcs in <code>g</code>.
+ *
+ * <P>This method can process {@linkplain ImmutableGraph#loadOffline(CharSequence) offline graphs}.
+ *
+ * @param g an immutable graph.
+ * @return an immutable graph obtained by transposing <code>g</code>.
+ * @see #transpose(ImmutableGraph, ProgressLogger)
+ */
+
+ public static ImmutableGraph transpose(ImmutableGraph g) {
+ return transpose(g, null);
+ }
+
+ /** Returns the union of two arc-labelled immutable graphs.
+ *
+ * <P>The two arguments may differ in the number of nodes, in which case the
+ * resulting graph will be as large as the larger graph.
+ *
+ * @param g0 the first graph.
+ * @param g1 the second graph.
+ * @param labelMergeStrategy the strategy used to merge labels when the same arc
+ * is present in both graphs; if <code>null</code>, {@link Labels#KEEP_FIRST_MERGE_STRATEGY}
+ * is used.
+ * @return the union of the two graphs.
+ */
+ public static ArcLabelledImmutableGraph union(final ArcLabelledImmutableGraph g0, final ArcLabelledImmutableGraph g1, final LabelMergeStrategy labelMergeStrategy) {
+ return new UnionArcLabelledImmutableGraph(g0, g1, labelMergeStrategy == null? Labels.KEEP_FIRST_MERGE_STRATEGY : labelMergeStrategy);
+ }
+
+ /** Returns the union of two immutable graphs.
+ *
+ * <P>The two arguments may differ in the number of nodes, in which case the
+ * resulting graph will be as large as the larger graph.
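+ *
+ * <P>The successors of a node in the union are the union of its successors in the two graphs. For instance,
+ * if node 0 has successors 1 and 2 in <code>g0</code>, and successors 2 and 3 in <code>g1</code>, then it has
+ * successors 1, 2 and 3 in the graph returned by the following call (assuming the enclosing class is {@code Transform}):
+ * <pre>
+ * ImmutableGraph u = Transform.union(g0, g1);
+ * </pre>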
+ *
+ * @param g0 the first graph.
+ * @param g1 the second graph.
+ * @return the union of the two graphs.
+ */
+ public static ImmutableGraph union(final ImmutableGraph g0, final ImmutableGraph g1) {
+ return g0 instanceof ArcLabelledImmutableGraph && g1 instanceof ArcLabelledImmutableGraph
+ ? union((ArcLabelledImmutableGraph)g0, (ArcLabelledImmutableGraph)g1, (LabelMergeStrategy)null)
+ : new UnionImmutableGraph(g0, g1);
+ }
+
+
+ private static final class ComposedGraph extends ImmutableSequentialGraph {
+ private final class ComposedGraphNodeIterator extends NodeIterator {
+ private final NodeIterator it0;
+ private final int upperBound;
+ private int[] succ;
+ private final IntOpenHashSet successors;
+ private int outdegree; // -1 means that the cache is empty
+ private int nextNode;
+
+ public ComposedGraphNodeIterator(final int upperBound) {
+ this(upperBound, g0.nodeIterator(), IntArrays.EMPTY_ARRAY, new IntOpenHashSet(Hash.DEFAULT_INITIAL_SIZE, Hash.FAST_LOAD_FACTOR), -1, 0);
+ }
+
+ public ComposedGraphNodeIterator(final int upperBound, final NodeIterator it, final int[] succ, final IntOpenHashSet successors, final int outdegree, final int nextNode) {
+ this.it0 = it;
+ this.upperBound = upperBound;
+ this.succ = succ;
+ this.successors = successors;
+ this.outdegree = outdegree;
+ this.nextNode = nextNode;
+ }
+
+ @Override
+ public int nextInt() {
+ outdegree = -1;
+ final int result = it0.nextInt();
+ assert result == nextNode;
+ nextNode++;
+ return result;
+ }
+
+ @Override
+ public boolean hasNext() {
+ return nextNode < upperBound && it0.hasNext();
+ }
+
+ @Override
+ public int outdegree() {
+ if (outdegree < 0) successorArray();
+ return outdegree;
+ }
+
+ @Override
+ public int[] successorArray() {
+ if (outdegree < 0) {
+ final int d = it0.outdegree();
+ final int[] s = it0.successorArray();
+ successors.clear();
+ for (int i = 0; i < d; i++) {
+ final LazyIntIterator s1 = g1.successors(s[i]);
+ int x;
+ while ((x = s1.nextInt()) >= 0) successors.add(x);
+ }
+ outdegree = successors.size();
+ succ = IntArrays.ensureCapacity(succ, outdegree, 0);
+ successors.toArray(succ);
+ IntArrays.quickSort(succ, 0, outdegree);
+ }
+ return succ;
+ }
+
+ @Override
+ public NodeIterator copy(final int upperBound) {
+ return new ComposedGraphNodeIterator(upperBound, it0.copy(Integer.MAX_VALUE), Arrays.copyOf(succ, succ.length), new IntOpenHashSet(successors), outdegree, nextNode);
+ }
+ }
+
+ private final ImmutableGraph g0;
+ private final ImmutableGraph g1;
+
+ private ComposedGraph(ImmutableGraph g0, ImmutableGraph g1) {
+ this.g0 = g0;
+ this.g1 = g1;
+ }
+
+ @Override
+ public int numNodes() {
+ return Math.max(g0.numNodes(), g1.numNodes());
+ }
+
+ @Override
+ public ImmutableSequentialGraph copy() {
+ // Note that only the second graph needs duplication.
+ return new ComposedGraph(g0, g1.copy());
+ }
+
+ @Override
+ public boolean hasCopiableIterators() {
+ return true;
+ }
+
+ @Override
+ public NodeIterator nodeIterator() {
+ return new ComposedGraphNodeIterator(Integer.MAX_VALUE);
+ }
+ }
+
+ /** Returns the composition (a.k.a. matrix product) of two immutable graphs.
+ *
+ * <P>The two arguments may differ in the number of nodes, in which case the
+ * resulting graph will be as large as the larger graph.
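+ *
+ * <P>There is an arc from <var>x</var> to <var>z</var> in the result if and only if, for some node <var>y</var>,
+ * &lt;<var>x</var>,&nbsp;<var>y</var>&gt; is an arc of <code>g0</code> and &lt;<var>y</var>,&nbsp;<var>z</var>&gt; is an arc
+ * of <code>g1</code>. Since <code>g0</code> is scanned sequentially while the successors in <code>g1</code> are queried
+ * node by node, <code>g1</code> must support random access. For instance, the following sketch (assuming the enclosing
+ * class is {@code Transform} and that <code>g</code> supports random access) yields the graph whose arcs join the
+ * endpoints of the paths of length two in <code>g</code>:
+ * <pre>
+ * ImmutableGraph g2 = Transform.compose(g, g);
+ * </pre>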
+ *
+ * @param g0 the first graph.
+ * @param g1 the second graph.
+ * @return the composition of the two graphs.
+ */
+ public static ImmutableGraph compose(final ImmutableGraph g0, final ImmutableGraph g1) {
+ return new ComposedGraph(g0, g1);
+ }
+
+
+ /** Returns the composition (a.k.a. matrix product) of two arc-labelled immutable graphs.
+ *
+ * <P>The two arguments may differ in the number of nodes, in which case the
+ * resulting graph will be as large as the larger graph.
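+ *
+ * <P>For each arc &lt;<var>x</var>,&nbsp;<var>y</var>&gt; of <code>g0</code> and each arc &lt;<var>y</var>,&nbsp;<var>z</var>&gt;
+ * of <code>g1</code>, the label of &lt;<var>x</var>,&nbsp;<var>z</var>&gt; in the result accumulates the product of the two
+ * labels, starting from the semiring zero. In terms of the {@link LabelSemiring} methods used by the implementation
+ * (<code>label0</code> and <code>label1</code> are illustrative names for the arc labels of the two graphs):
+ * <pre>
+ * acc = strategy.zero();
+ * for each y such that x&rarr;y in g0 and y&rarr;z in g1:
+ *     acc = strategy.add(strategy.multiply(label0(x, y), label1(y, z)), acc);
+ * label(x, z) = acc;
+ * </pre>
+ * For instance, with a (min,&nbsp;+) semiring on numeric labels (assuming one is available as a {@link LabelSemiring}),
+ * each arc of the result is labelled with the minimum total weight of a two-step path.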
+ *
+ * @param g0 the first graph.
+ * @param g1 the second graph.
+ * @param strategy a label semiring.
+ * @return the composition of the two graphs.
+ */
+ public static ArcLabelledImmutableGraph compose(final ArcLabelledImmutableGraph g0, final ArcLabelledImmutableGraph g1, final LabelSemiring strategy) {
+ if (g0.prototype().getClass() != g1.prototype().getClass()) throw new IllegalArgumentException("The two graphs have different label classes (" + g0.prototype().getClass().getSimpleName() + ", " +g1.prototype().getClass().getSimpleName() + ")");
+
+ return new ArcLabelledImmutableSequentialGraph() {
+
+ class InternalArcLabelledNodeIterator extends ArcLabelledNodeIterator {
+ private final int upperBound;
+ private int next = 0;
+ private int[] succ = IntArrays.EMPTY_ARRAY;
+ private Label[] label = new Label[0];
+ private int maxOutDegree;
+ private int smallCount;
+ private Int2ObjectOpenHashMap<Label> successors = new Int2ObjectOpenHashMap<>(Hash.DEFAULT_INITIAL_SIZE, Hash.FAST_LOAD_FACTOR);
+ private int outdegree = -1; // -1 means that the cache is empty
+ private ArcLabelledNodeIterator it0;
+
+ public InternalArcLabelledNodeIterator(final int upperBound) {
+ successors.defaultReturnValue(strategy.zero());
+ it0 = g0.nodeIterator();
+ this.upperBound = upperBound;
+ }
+
+ @Override
+ public int nextInt() {
+ outdegree = -1;
+ final int result = it0.nextInt();
+ assert result == next;
+ next++;
+ return result;
+ }
+
+ @Override
+ public boolean hasNext() {
+ return next < upperBound && it0.hasNext();
+ }
+
+
+ @Override
+ public int outdegree() {
+ if (outdegree < 0) successorArray();
+ return outdegree;
+ }
+
+ private void ensureCache() {
+ if (outdegree < 0) {
+ final int d = it0.outdegree();
+ final LabelledArcIterator s = it0.successors();
+ if (successors.size() < maxOutDegree / 2 && smallCount++ > 100) {
+ smallCount = 0;
+ maxOutDegree = 0;
+ successors = new Int2ObjectOpenHashMap<>(Hash.DEFAULT_INITIAL_SIZE, Hash.FAST_LOAD_FACTOR);
+ successors.defaultReturnValue(strategy.zero());
+ }
+ else successors.clear();
+
+ for (int i = 0; i < d; i++) {
+ final LabelledArcIterator s1 = g1.successors(s.nextInt());
+ int x;
+ while ((x = s1.nextInt()) >= 0) successors.put(x, strategy.add(strategy.multiply(s.label(), s1.label()), successors.get(x)));
+ }
+ outdegree = successors.size();
+ succ = IntArrays.ensureCapacity(succ, outdegree, 0);
+ label = ObjectArrays.ensureCapacity(label, outdegree, 0);
+ successors.keySet().toArray(succ);
+ IntArrays.quickSort(succ, 0, outdegree);
+ for(int i = outdegree; i-- != 0;) label[i] = successors.get(succ[i]);
+ if (outdegree > maxOutDegree) maxOutDegree = outdegree;
+ }
+ }
+
+ @Override
+ public int[] successorArray() {
+ ensureCache();
+ return succ;
+ }
+
+ @Override
+ public Label[] labelArray() {
+ ensureCache();
+ return label;
+ }
+
+ @Override
+ public LabelledArcIterator successors() {
+ ensureCache();
+ return new LabelledArcIterator() {
+ int i = -1;
+ @Override
+ public Label label() {
+ return label[i];
+ }
+
+ @Override
+ public int nextInt() {
+ return i < outdegree - 1 ? succ[++i] : -1;
+ }
+
+ @Override
+ public int skip(final int n) {
+ final int incr = Math.min(n, outdegree - i - 1);
+ i += incr;
+ return incr;
+ }
+ };
+ }
+
+ @Override
+ public ArcLabelledNodeIterator copy(int upperBound) {
+ final InternalArcLabelledNodeIterator result = new InternalArcLabelledNodeIterator(upperBound);
+ result.it0 = it0.copy(upperBound);
+ result.next = next;
+ result.succ = Arrays.copyOf(succ, succ.length);
+ result.label = Arrays.copyOf(label, label.length);
+ result.maxOutDegree = maxOutDegree;
+ result.smallCount = smallCount;
+ result.successors = new Int2ObjectOpenHashMap<>(successors);
+ result.successors.defaultReturnValue(successors.defaultReturnValue());
+ result.outdegree = outdegree;
+ return result;
+ }
+ };
+
+ @Override
+ public Label prototype() {
+ return g0.prototype();
+ }
+
+ @Override
+ public int numNodes() {
+ return Math.max(g0.numNodes(), g1.numNodes());
+ }
+
+ @Override
+ public boolean hasCopiableIterators() {
+ return g0.hasCopiableIterators() && g1.hasCopiableIterators();
+ }
+
+ @Override
+ public ArcLabelledNodeIterator nodeIterator() {
+ return new InternalArcLabelledNodeIterator(Integer.MAX_VALUE);
+ }
+ };
+ }
+
+ /** Computes the line graph of a given symmetric graph. The line graph of <var>g</var> is a graph whose nodes are
+ * identified with pairs of the form &lt;<var>x</var>,&nbsp;<var>y</var>&gt;, where <var>x</var> and <var>y</var> are nodes of <var>g</var>
+ * and &lt;<var>x</var>,&nbsp;<var>y</var>&gt; is an arc of <var>g</var>. Moreover, there is an arc from &lt;<var>x</var>,&nbsp;<var>y</var>&gt; to
+ * &lt;<var>y</var>,&nbsp;<var>z</var>&gt; for every arc &lt;<var>y</var>,&nbsp;<var>z</var>&gt; of <var>g</var>.
+ *
+ * <P>Two additional files are created, with names stemmed from <code>mapBasename</code>; the <var>i</var>-th entries of the two files
+ * identify the source and target node (in the original graph) corresponding to the node <var>i</var> in the line graph.
+ *
+ * @param g the graph (it must be symmetric and loopless).
+ * @param mapBasename the basename of two files that will, at the end, contain as many integers as the number of nodes in the line graph: the <var>i</var>-th
+ * integer in the file <code><var>mapBasename</var>.source</code> will contain the source of the arc corresponding to the <var>i</var>-th
+ * node in the line graph, and similarly <code><var>mapBasename</var>.target</code> will give the target.
+ * @param tempDir the temporary directory to be used.
+ * @param batchSize the size used for batches.
+ * @param pl the progress logger to be used (may be <code>null</code>).
+ * @return the line graph of <code>g</code>.
+ * @throws IOException if an I/O error occurs while writing the batches or the map files.
+ */
+ public static ImmutableSequentialGraph line(final ImmutableGraph g, final String mapBasename, final File tempDir, final int batchSize, final ProgressLogger pl) throws IOException {
+ final int n = g.numNodes();
+ final int[] source = new int[batchSize], target = new int[batchSize];
+ int currBatch = 0, pairs = 0;
+ final ObjectArrayList<File> batches = new ObjectArrayList<>();
+ final long[] edge = new long[(int)g.numArcs()];
+ int edgesSoFar = 0;
+ NodeIterator nodeIterator = g.nodeIterator();
+ if (pl != null) {
+ pl.itemsName = "nodes";
+ pl.expectedUpdates = n;
+ pl.start("Producing batches for line graph");
+ }
+ long expNumberOfArcs = 0;
+ while (nodeIterator.hasNext()) {
+ final int x = nodeIterator.nextInt();
+ final int d = nodeIterator.outdegree();
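+ expNumberOfArcs += (long)d * d; // Cast avoids int overflow for large outdegrees.
+ final int[] succ = nodeIterator.successorArray();
+ // Pack each arc <x, succ[i]> into a long, with the source in the upper 32 bits.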
+ for (int i = 0; i < d; i++) {
+ if (succ[i] == x) throw new IllegalArgumentException("The graph contains a loop on node " + x);
+ edge[edgesSoFar++] = ((long)x << 32) | succ[i];
+ }
+ }
+ LOGGER.info("Expected number of arcs: " + expNumberOfArcs);
+ LongArrays.parallelQuickSort(edge);
+ nodeIterator = g.nodeIterator();
+
+ while (nodeIterator.hasNext()) {
+ final int x = nodeIterator.nextInt();
+ final int d = nodeIterator.outdegree();
+ final int[] succ = nodeIterator.successorArray().clone();
+ for (int i = 0; i < d; i++) {
+ final int from0 = x;
+ final int to0 = succ[i];
+ final int edge0 = LongArrays.binarySearch(edge, 0, edgesSoFar, ((long)from0 << 32) | to0);
+ if (ASSERTS) assert edge0 >= 0;
+ final int dNext = g.outdegree(to0);
+ final int[] succNext = g.successorArray(to0);
+ for (int j = 0; j < dNext; j++) {
+ final int from1 = to0;
+ final int to1 = succNext[j];
+ final int edge1 = LongArrays.binarySearch(edge, 0, edgesSoFar, ((long)from1 << 32) | to1);
+ if (ASSERTS) assert edge1 >= 0;
+ if (currBatch == batchSize) {
+ pairs += processBatch(batchSize, source, target, tempDir, batches);
+ currBatch = 0;
+ }
+ source[currBatch] = edge0;
+ target[currBatch++] = edge1;
+ }
+ }
+ if (pl != null) pl.lightUpdate();
+ }
+ if (currBatch > 0) {
+ pairs += processBatch(currBatch, source, target, tempDir, batches);
+ currBatch = 0;
+ }
+ if (edgesSoFar != edge.length) throw new IllegalArgumentException("Something went wrong (probably the graph was not symmetric)");
+ final DataOutputStream dosSource = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(mapBasename + ".source")));
+ final DataOutputStream dosTarget = new DataOutputStream(new BufferedOutputStream(new FileOutputStream(mapBasename + ".target")));
+ for (final long e: edge) {
+ dosSource.writeInt((int)(e >> 32));
+ dosTarget.writeInt((int)(e & 0xFFFFFFFFL));
+ }
+ dosSource.close();
+ dosTarget.close();
+ if (DEBUG)
+ for (int i = 0; i < edgesSoFar; i++) {
+ System.out.println(i + " <- (" + (edge[i] >> 32) + "," + (edge[i] & 0xFFFFFFFFL) +")");
+ }
+ if (pl != null) {
+ pl.done();
+ logBatches(batches, pairs, pl);
+ }
+ return new BatchGraph(edgesSoFar, -1, batches);
+ }
+
+ /** Returns a permutation that would put the adjacency lists of the given graph in Gray-code order.
+ *
+ * <P>Gray codes list all sequences of <var>n</var> zeros and ones in such a way that
+ * consecutive sequences differ by exactly one bit. If we read each row of the adjacency matrix of
+ * a graph as a Gray code and sort rows by the position of their code in the Gray sequence, we obtain a
+ * permutation that brings similar rows closer together.
+ *
+ * <P>Note that since a graph permutation permutes <em>both</em> rows and columns, this transformation is
+ * not idempotent: the Gray-code permutation produced from a matrix that has been Gray-code sorted will
+ * <em>not</em> be, in general, the identity.
+ *
+ * <P>The important feature of Gray-code ordering is that it is completely endogenous (i.e., determined
+ * by the graph itself), unlike, say, lexicographic URL ordering (which relies on the knowledge
+ * of the URL associated with each node).
+ *
+ * @param g an immutable graph.
+ * @return the permutation that would order the graph adjacency lists by Gray order
+ * (you can just pass it to {@link #map(ImmutableGraph, int[], ProgressLogger)}).
+ */
+ public static int[] grayCodePermutation(final ImmutableGraph g) {
+ final int n = g.numNodes();
+ final int[] perm = new int[n];
+ int i = n;
+ while(i-- != 0) perm[i] = i;
+
+ final IntComparator grayComparator = (x, y) -> {
+ final LazyIntIterator i1 = g.successors(x), j = g.successors(y);
+ int a, b;
+
+ /* This code eagerly duplicates the behaviour of the lazy comparator
+ below. It is here for documentation and debugging purposes.
+
+ byte[] g1 = new byte[g.numNodes()], g2 = new byte[g.numNodes()];
+ for(int s; (s = i1.nextInt()) != -1;) g1[g.numNodes() - 1 - s] = 1;
+ for(int s; (s = j.nextInt()) != -1;) g2[g.numNodes() - 1 - s] = 1;
+ for(int k = g.numNodes() - 2; k >= 0; k--) {
+ g1[k] ^= g1[k + 1];
+ g2[k] ^= g2[k + 1];
+ }
+ for(int k = g.numNodes() - 1; k >= 0; k--) if (g1[k] != g2[k]) return g1[k] - g2[k];
+ return 0;
+ */
+
+ boolean parity = false; // Parity of the number of successors shared by the two rows so far.
+ for(;;) {
+ a = i1.nextInt();
+ b = j.nextInt();
+ if (a == -1 && b == -1) return 0;
+ if (a == -1) return parity ? 1 : -1;
+ if (b == -1) return parity ? -1 : 1;
+ if (a != b) return parity ^ (a < b) ? 1 : -1;
+ parity = ! parity;
+ }
+ };
+
+ IntArrays.parallelQuickSort(perm, 0, n, grayComparator);
+
+ if (ASSERTS) for(int k = 0; k < n - 1; k++) assert grayComparator.compare(perm[k], perm[k + 1]) <= 0;
+
+ final int[] invPerm = new int[n];
+ i = n;
+ while(i-- != 0) invPerm[perm[i]] = i;
+
+ return invPerm;
+ }
+
+ /** Returns a random permutation for a given graph.
+ *
+ * @param g an immutable graph.
+ * @param seed the seed for the {@link XoRoShiRo128PlusRandom} generator used to shuffle the nodes.
+ * @return a random permutation for the given graph
+ */
+ public static int[] randomPermutation(final ImmutableGraph g, final long seed) {
+ return IntArrays.shuffle(Util.identity(g.numNodes()), new XoRoShiRo128PlusRandom(seed));
+ }
+
+ /** Returns a permutation that would put the adjacency lists of the given graph in host-by-host Gray-code order.
+ *
+ * <p>This permutation differs from {@link #grayCodePermutation(ImmutableGraph)} in that Gray codes
+ * are computed host by host. There are two variants, <em>strict</em> and <em>loose</em>. In the first case,
+ * we restrict the adjacency matrix to the submatrix corresponding to a host and compute the ordering. In
+ * the second case, we just restrict to the rows corresponding to a host, but then entire rows are used
+ * to compute the ordering.
+ *
+ * @param g an immutable graph.
+ * @param hostMap an array mapping each node (URL) to its host (it is sufficient that each host is assigned a distinct number).
+ * @param strict if true, the host-by-host Gray-code computation is strict, that is, when two rows are compared only
+ * the columns belonging to the same host as the rows are considered.
+ * @return the permutation that would order the graph adjacency lists by host-by-host Gray order
+ * (you can just pass it to {@link #map(ImmutableGraph, int[], ProgressLogger)}).
+ */
+ public static int[] hostByHostGrayCodePermutation(final ImmutableGraph g, final int[] hostMap, final boolean strict) {
+ final int n = g.numNodes();
+ final int[] perm = new int[n];
+ int i = n;
+ while(i-- != 0) perm[i] = i;
+
+ final IntComparator hostByHostGrayComparator = (x, y) -> {
+ final int t = hostMap[x] - hostMap[y];
+ if (t != 0) return t;
+ final LazyIntIterator i1 = g.successors(x), j = g.successors(y);
+ int a, b;
+
+ boolean parity = false; // Parity of the number of successors shared by the two rows so far.
+ for(;;) {
+ if (strict) {
+ final int h = hostMap[x];
+ do a = i1.nextInt(); while(a != -1 && hostMap[a] != h);
+ do b = j.nextInt(); while(b != -1 && hostMap[b] != h);
+ }
+ else {
+ a = i1.nextInt();
+ b = j.nextInt();
+ }
+ if (a == -1 && b == -1) return 0;
+ if (a == -1) return parity ? 1 : -1;
+ if (b == -1) return parity ? -1 : 1;
+ if (a != b) return parity ^ (a < b) ? 1 : -1;
+ parity = ! parity;
+ }
+ };
+
+ IntArrays.parallelQuickSort(perm, 0, n, hostByHostGrayComparator);
+
+ if (ASSERTS) for(int k = 0; k < n - 1; k++) assert hostByHostGrayComparator.compare(perm[k], perm[k + 1]) <= 0;
+
+ final int[] invPerm = new int[n];
+ i = n;
+ while(i-- != 0) invPerm[perm[i]] = i;
+
+ return invPerm;
+ }
+
+
+
+ /** Returns a permutation that would put the adjacency lists of the given graph in lexicographical order.
+ *
+ * <P>Note that since a graph permutation permutes <em>both</em> rows and columns, this transformation is
+ * not idempotent: the lexicographical permutation produced from a matrix that has been
+ * lexicographically sorted will
+ * <em>not</em> be, in general, the identity.
+ *
+ * <P>The important feature of lexicographical ordering is that it is completely endogenous (i.e., determined
+ * by the graph itself), unlike, say, lexicographic URL ordering (which relies on the knowledge
+ * of the URL associated with each node).
+ *
+ * <p><strong>Warning</strong>: columns are numbered from zero <em>from the left</em>, and rows are compared as bit strings. This means,
+ * for instance, that nodes with an arc towards node zero are lexicographically larger
+ * than nodes without it.
+ *
+ * @param g an immutable graph.
+ * @return the permutation that would order the graph adjacency lists by lexicographical order
+ * (you can just pass it to {@link #map(ImmutableGraph, int[], ProgressLogger)}).
+ */
+ public static int[] lexicographicalPermutation(final ImmutableGraph g) {
+ final int n = g.numNodes();
+ final int[] perm = new int[n];
+ int i = n;
+ while(i-- != 0) perm[i] = i;
+
+ final IntComparator lexicographicalComparator = (x, y) -> {
+ final LazyIntIterator i1 = g.successors(x), j = g.successors(y);
+ int a, b;
+ for(;;) {
+ a = i1.nextInt();
+ b = j.nextInt();
+ if (a == -1 && b == -1) return 0;
+ if (a == -1) return -1;
+ if (b == -1) return 1;
+ if (a != b) return b - a;
+ }
+ };
+
+ IntArrays.parallelQuickSort(perm, 0, n, lexicographicalComparator);
+
+ if (ASSERTS) for(int k = 0; k < n - 1; k++) assert lexicographicalComparator.compare(perm[k], perm[k + 1]) <= 0;
+
+ final int[] invPerm = new int[n];
+ i = n;
+ while(i-- != 0) invPerm[perm[i]] = i;
+
+ return invPerm;
+ }
+
+
+
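+ /** Checks that the number of arguments is exactly <code>n</code>, if <code>n</code> is nonnegative, or
+ * at least -<code>n</code>, otherwise.
+ *
+ * @param param the arguments.
+ * @param n the exact number of arguments, if nonnegative, or the negated minimum number of arguments, if negative.
+ * @return true if the number of arguments is acceptable.
+ */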
+ protected static boolean ensureNumArgs(String param[], int n) {
+ if (n >= 0 && param.length != n || n < 0 && param.length < -n) {
+ return false;
+ }
+ return true;
+ }
+
+ /** Loads a graph with given data and returns it.
+ *
+ * @param graphClass the class of the graph to be loaded.
+ * @param baseName the graph basename.
+ * @param offline whether the graph is to be loaded in an offline fashion.
+ * @param pl a progress logger.
+ * @return the loaded graph.
+ */
+ protected static ImmutableGraph load(Class<?> graphClass, String baseName, boolean offline, ProgressLogger pl) throws IllegalArgumentException, SecurityException, IllegalAccessException, InvocationTargetException, NoSuchMethodException, IOException {
+ ImmutableGraph graph = null;
+
+ if (graphClass != null) {
+ if (offline) graph = (ImmutableGraph)graphClass.getMethod("loadOffline", CharSequence.class).invoke(null, baseName);
+ else graph = (ImmutableGraph)graphClass.getMethod("load", CharSequence.class, ProgressLogger.class).invoke(null, baseName, pl);
+ }
+ else graph = offline ? ImmutableGraph.loadOffline(baseName) : ImmutableGraph.load(baseName, pl);
+
+ return graph;
+ }
+
+
+ public static void main(String args[]) throws IOException, IllegalArgumentException, SecurityException, InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException, ClassNotFoundException, JSAPException {
+ Class<?> sourceGraphClass = null, destGraphClass = BVGraph.class;
+ boolean offline = false, ascii = false;
+
+ final Field[] field = Transform.class.getDeclaredFields();
+ final List<String> filterList = new ArrayList<>();
+ final List<String> labelledFilterList = new ArrayList<>();
+
+ for(final Field f: field) {
+ if (ArcFilter.class.isAssignableFrom(f.getType())) filterList.add(f.getName());
+ if (LabelledArcFilter.class.isAssignableFrom(f.getType())) labelledFilterList.add(f.getName());
+ }
+
+ final SimpleJSAP jsap = new SimpleJSAP(Transform.class.getName(),
+ "Transforms one or more graphs. All transformations require, after the name,\n" +
+ "some parameters specified below:\n" +
+ "\n" +
+ "identity sourceBasename destBasename\n" +
+ "removeDangling sourceBasename destBasename\n" +
+ "map sourceBasename destBasename map [cutoff]\n" +
+ "mapOffline sourceBasename destBasename map [batchSize] [tempDir]\n" +
+ "transpose sourceBasename destBasename\n" +
+ "transposeOffline sourceBasename destBasename [batchSize] [tempDir]\n" +
+ "symmetrize sourceBasename [transposeBasename] destBasename\n" +
+ "symmetrizeOffline sourceBasename destBasename [batchSize] [tempDir]\n" +
+ "union source1Basename source2Basename destBasename [strategy]\n" +
+ "compose source1Basename source2Basename destBasename [semiring]\n" +
+ "gray sourceBasename destBasename\n" +
+ "grayPerm sourceBasename dest\n" +
+ "strictHostByHostGray sourceBasename destBasename hostMap\n" +
+ "strictHostByHostGrayPerm sourceBasename dest hostMap\n" +
+ "looseHostByHostGray sourceBasename destBasename hostMap\n" +
+ "looseHostByHostGrayPerm sourceBasename dest hostMap\n" +
+ "lex sourceBasename destBasename\n" +
+ "lexPerm sourceBasename dest\n" +
+ "line sourceBasename destBasename mapName [batchSize]\n" +
+ "random sourceBasename destBasename [seed]\n" +
+ "arcfilter sourceBasename destBasename arcFilter (available filters: " + filterList + ")\n" +
+ "larcfilter sourceBasename destBasename arcFilter (available filters: " + labelledFilterList + ")\n" +
+ "\n" +
+ "Please consult the Javadoc documentation for more information on each transform.",
+ new Parameter[] {
+ new FlaggedOption("sourceGraphClass", GraphClassParser.getParser(), JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 's', "source-graph-class", "Forces a Java class to load the source graph."),
+ new FlaggedOption("destGraphClass", GraphClassParser.getParser(), BVGraph.class.getName(), JSAP.NOT_REQUIRED, 'd', "dest-graph-class", "Forces a Java class to store the destination graph."),
+ new FlaggedOption("destArcLabelledGraphClass", GraphClassParser.getParser(), BitStreamArcLabelledImmutableGraph.class.getName(), JSAP.NOT_REQUIRED, 'L', "dest-arc-labelled-graph-class", "Forces a Java class to store the labels of the destination graph."),
+ new FlaggedOption("logInterval", JSAP.LONG_PARSER, Long.toString(ProgressLogger.DEFAULT_LOG_INTERVAL), JSAP.NOT_REQUIRED, 'l', "log-interval", "The minimum time interval between activity logs in milliseconds."),
+ new Switch("offline", 'o', "offline", "Use the offline load method to reduce memory consumption."),
+ new Switch("sequential", 'S', "sequential", "Equivalent to offline."),
+ new Switch("ascii", 'a', "ascii", "Maps are in ASCII form (one integer per line)."),
+ new UnflaggedOption("transform", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The transformation to be applied."),
+ new UnflaggedOption("param", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.GREEDY, "The remaining parameters."),
+ }
+ );
+
+ final JSAPResult jsapResult = jsap.parse(args);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ sourceGraphClass = jsapResult.getClass("sourceGraphClass");
+ destGraphClass = jsapResult.getClass("destGraphClass");
+ offline = jsapResult.getBoolean("offline") || jsapResult.getBoolean("sequential");
+ ascii = jsapResult.getBoolean("ascii");
+ final String transform = jsapResult.getString("transform");
+ final String[] param = jsapResult.getStringArray("param");
+
+ String source[] = null, dest = null, map = null;
+ ArcFilter arcFilter = null;
+ LabelledArcFilter labelledArcFilter = null;
+ LabelSemiring labelSemiring = null;
+ LabelMergeStrategy labelMergeStrategy = null;
+ int batchSize = 1000000, cutoff = -1;
+ long seed = 0;
+ File tempDir = null;
+
+ if (! ensureNumArgs(param, -2)) return;
+
+ if (transform.equals("identity") || transform.equals("transpose") || transform.equals("removeDangling") || transform.equals("gray") || transform.equals("grayPerm") || transform.equals("lex") || transform.equals("lexPerm")) {
+ source = new String[] { param[0] };
+ dest = param[1];
+ if (! ensureNumArgs(param, 2)) return;
+ }
+ else if (transform.equals("map") || transform.equals("strictHostByHostGray") || transform.equals("strictHostByHostGrayPerm") || transform.equals("looseHostByHostGray") || transform.equals("looseHostByHostGrayPerm")) {
+ if (! ensureNumArgs(param, -3)) return;
+ source = new String[] { param[0] };
+ dest = param[1];
+ map = param[2];
+ if (param.length == 4) cutoff = Integer.parseInt(param[3]);
+ else if (! ensureNumArgs(param, 3)) return;
+ }
+ else if (transform.equals("mapOffline")) {
+ if (! ensureNumArgs(param, -3)) return;
+ source = new String[] { param[0] };
+ dest = param[1];
+ map = param[2];
+ if (param.length >= 4) {
+ batchSize = ((Integer)JSAP.INTSIZE_PARSER.parse(param[3])).intValue();
+ if (param.length == 5) tempDir = new File(param[4]);
+ else if (! ensureNumArgs(param, 4)) return;
+ }
+ else if (! ensureNumArgs(param, 3)) return;
+ }
+ else if (transform.equals("symmetrize")) {
+ if (param.length == 2) {
+ source = new String[] { param[0], null };
+ dest = param[1];
+ }
+ else if (ensureNumArgs(param, 3)) {
+ source = new String[] { param[0], param[1] };
+ dest = param[2];
+ }
+ else return;
+ }
+ else if (transform.equals("random")) {
+ if (param.length == 2) {
+ source = new String[] { param[0], null };
+ dest = param[1];
+ }
+ else if (ensureNumArgs(param, 3)) {
+ source = new String[] { param[0] };
+ dest = param[1];
+ seed = Long.parseLong(param[2]);
+ }
+ else return;
+ }
+ else if (transform.equals("arcfilter")) {
+ if (ensureNumArgs(param, 3)) {
+ try {
+ // First try: a public field
+ arcFilter = (ArcFilter) Transform.class.getField(param[2]).get(null);
+ }
+ catch(final NoSuchFieldException e) {
+ // No chance: let's try with a class
+ arcFilter = ObjectParser.fromSpec(param[2], ArcFilter.class, GraphClassParser.PACKAGE);
+ }
+ source = new String[] { param[0], null };
+ dest = param[1];
+ }
+ else return;
+ }
+ else if (transform.equals("larcfilter")) {
+ if (ensureNumArgs(param, 3)) {
+ try {
+ // First try: a public field
+ labelledArcFilter = (LabelledArcFilter) Transform.class.getField(param[2]).get(null);
+ }
+ catch(final NoSuchFieldException e) {
+ // No chance: let's try with a class
+ labelledArcFilter = ObjectParser.fromSpec(param[2], LabelledArcFilter.class, GraphClassParser.PACKAGE);
+ }
+ source = new String[] { param[0], null };
+ dest = param[1];
+ }
+ else return;
+ }
+ else if (transform.equals("union")) {
+ if (! ensureNumArgs(param, -3)) return;
+ source = new String[] { param[0], param[1] };
+ dest = param[2];
+ if (param.length == 4) labelMergeStrategy = ObjectParser.fromSpec(param[3], LabelMergeStrategy.class, GraphClassParser.PACKAGE);
+ else if (! ensureNumArgs(param, 3)) return;
+ }
+ else if (transform.equals("compose")) {
+ if (! ensureNumArgs(param, -3)) return;
+ source = new String[] { param[0], param[1] };
+ dest = param[2];
+ if (param.length == 4) labelSemiring = ObjectParser.fromSpec(param[3], LabelSemiring.class, GraphClassParser.PACKAGE);
+ else if (! ensureNumArgs(param, 3)) return;
+ }
+ else if (transform.equals("transposeOffline") || transform.equals("symmetrizeOffline")) {
+ if (! ensureNumArgs(param, -2)) return;
+ source = new String[] { param[0] };
+ dest = param[1];
+ if (param.length >= 3) {
+ batchSize = ((Integer)JSAP.INTSIZE_PARSER.parse(param[2])).intValue();
+ if (param.length == 4) tempDir = new File(param[3]);
+ else if (! ensureNumArgs(param, 3)) return;
+ }
+ else if (! ensureNumArgs(param, 2)) return;
+ }
+ else if (transform.equals("line")) {
+ if (! ensureNumArgs(param, -3)) return;
+ source = new String[] { param[0] };
+ dest = param[1];
+ map = param[2];
+ if (param.length == 4) batchSize = Integer.parseInt(param[3]);
+ else if (! ensureNumArgs(param, 3)) return;
+ }
+ else {
+ System.err.println("Unknown transform: " + transform);
+ return;
+ }
+
+ final ProgressLogger pl = new ProgressLogger(LOGGER, jsapResult.getLong("logInterval"), TimeUnit.MILLISECONDS);
+ final ImmutableGraph[] graph = new ImmutableGraph[source.length];
+ final ImmutableGraph result;
+ final Class<?> destLabelledGraphClass = jsapResult.getClass("destArcLabelledGraphClass");
+ if (! ArcLabelledImmutableGraph.class.isAssignableFrom(destLabelledGraphClass)) throw new IllegalArgumentException("The arc-labelled destination class " + destLabelledGraphClass.getName() + " is not an instance of ArcLabelledImmutableGraph");
+
+ for (int i = 0; i < source.length; i++)
+ // Note that composition always requires the second graph to be random access, so it is never loaded offline.
+ if (source[i] == null) graph[i] = null;
+ else graph[i] = load(sourceGraphClass, source[i], offline && ! (i == 1 && transform.equals("compose")), pl);
+
+ final boolean graph0IsLabelled = graph[0] instanceof ArcLabelledImmutableGraph;
+ final ArcLabelledImmutableGraph graph0Labelled = graph0IsLabelled ? (ArcLabelledImmutableGraph)graph[0] : null;
+ final boolean graph1IsLabelled = graph.length > 1 && graph[1] instanceof ArcLabelledImmutableGraph;
+
+ final String notForLabelled = "This transformation will just apply to the unlabelled graph; label information will be absent";
+
+ if (transform.equals("identity")) result = graph[0];
+ else if (transform.equals("map") || transform.equals("mapOffline")) {
+ if (graph0IsLabelled) LOGGER.warn(notForLabelled);
+ pl.start("Reading map...");
+
+ final int n = graph[0].numNodes();
+ final int[] f = new int[n];
+ final long loaded;
+ if (ascii) loaded = TextIO.loadInts(map, f);
+ else loaded = BinIO.loadInts(map, f);
+
+ if (n != loaded) throw new IllegalArgumentException("The source graph has " + n + " nodes, but the map contains " + loaded + " integers");
+
+ // Delete from the graph all nodes whose index is above the cutoff, if any.
+ if (cutoff != -1) for(int i = f.length; i-- != 0;) if (f[i] >= cutoff) f[i] = -1;
+
+ pl.count = n;
+ pl.done();
+
+ result = transform.equals("map") ? map(graph[0], f, pl) : mapOffline(graph[0], f, batchSize, tempDir, pl);
+ LOGGER.info("Transform computation completed.");
+ }
+ else if (transform.equals("arcfilter")) {
+ if (graph0IsLabelled && ! (arcFilter instanceof LabelledArcFilter)) {
+ LOGGER.warn(notForLabelled);
+ result = filterArcs(graph[0], arcFilter, pl);
+ }
+ else result = filterArcs(graph[0], arcFilter, pl);
+ }
+ else if (transform.equals("larcfilter")) {
+ if (! graph0IsLabelled) throw new IllegalArgumentException("Filtering on labelled arcs requires a labelled graph");
+ result = filterArcs(graph0Labelled, labelledArcFilter, pl);
+ }
+ else if (transform.equals("symmetrize")) {
+ if (graph0IsLabelled) LOGGER.warn(notForLabelled);
+ result = symmetrize(graph[0], graph[1], pl);
+ }
+ else if (transform.equals("symmetrizeOffline")) {
+ if (graph0IsLabelled) LOGGER.warn(notForLabelled);
+ result = symmetrizeOffline(graph[0], batchSize, tempDir, pl);
+ }
+ else if (transform.equals("removeDangling")) {
+ if (graph0IsLabelled) LOGGER.warn(notForLabelled);
+
+ final int n = graph[0].numNodes();
+ LOGGER.info("Finding dangling nodes...");
+
+ final int[] f = new int[n];
+ final NodeIterator nodeIterator = graph[0].nodeIterator();
+ int c = 0;
+ for(int i = 0; i < n; i++) {
+ nodeIterator.nextInt();
+ f[i] = nodeIterator.outdegree() != 0 ? c++ : -1;
+ }
+ result = map(graph[0], f, pl);
+ }
+ else if (transform.equals("transpose")) {
+ if (graph0IsLabelled) LOGGER.warn(notForLabelled);
+ result = transpose(graph[0], pl);
+ }
+ else if (transform.equals("transposeOffline")) {
+ result = graph0IsLabelled ? transposeOffline(graph0Labelled, batchSize, tempDir, pl) : transposeOffline(graph[0], batchSize, tempDir, pl);
+ }
+ else if (transform.equals("union")) {
+ if (graph0IsLabelled && graph1IsLabelled) {
+ if (labelMergeStrategy == null) throw new IllegalArgumentException("Uniting labelled graphs requires a merge strategy");
+ result = union(graph0Labelled, (ArcLabelledImmutableGraph)graph[1], labelMergeStrategy);
+ }
+ else {
+ if (graph0IsLabelled || graph1IsLabelled) LOGGER.warn(notForLabelled);
+ result = union(graph[0], graph[1]);
+ }
+ }
+ else if (transform.equals("compose")) {
+ if (graph0IsLabelled && graph1IsLabelled) {
+ if (labelSemiring == null) throw new IllegalArgumentException("Composing labelled graphs requires a label semiring");
+ result = compose(graph0Labelled, (ArcLabelledImmutableGraph)graph[1], labelSemiring);
+ }
+ else {
+ if (graph0IsLabelled || graph1IsLabelled) LOGGER.warn(notForLabelled);
+ result = compose(graph[0], graph[1]);
+ }
+ }
+ else if (transform.equals("gray")) {
+ if (graph0IsLabelled) LOGGER.warn(notForLabelled);
+ result = map(graph[0], grayCodePermutation(graph[0]));
+ }
+ else if (transform.equals("grayPerm")) {
+ if (graph0IsLabelled) LOGGER.warn(notForLabelled);
+ BinIO.storeInts(grayCodePermutation(graph[0]), param[1]);
+ return;
+ }
+ else if (transform.equals("strictHostByHostGray")) {
+ if (graph0IsLabelled) LOGGER.warn(notForLabelled);
+ final int[] f = new int[graph[0].numNodes()];
+ if (ascii) TextIO.loadInts(map, f);
+ else BinIO.loadInts(map, f);
+ result = map(graph[0], hostByHostGrayCodePermutation(graph[0], f, true));
+ }
+ else if (transform.equals("strictHostByHostGrayPerm")) {
+ if (graph0IsLabelled) LOGGER.warn(notForLabelled);
+ final int[] f = new int[graph[0].numNodes()];
+ if (ascii) TextIO.loadInts(map, f);
+ else BinIO.loadInts(map, f);
+ BinIO.storeInts(hostByHostGrayCodePermutation(graph[0], f, true), param[1]);
+ return;
+ }
+ else if (transform.equals("looseHostByHostGray")) {
+ if (graph0IsLabelled) LOGGER.warn(notForLabelled);
+ final int[] f = new int[graph[0].numNodes()];
+ if (ascii) TextIO.loadInts(map, f);
+ else BinIO.loadInts(map, f);
+ result = map(graph[0], hostByHostGrayCodePermutation(graph[0], f, false));
+ }
+ else if (transform.equals("looseHostByHostGrayPerm")) {
+ if (graph0IsLabelled) LOGGER.warn(notForLabelled);
+ final int[] f = new int[graph[0].numNodes()];
+ if (ascii) TextIO.loadInts(map, f);
+ else BinIO.loadInts(map, f);
+ BinIO.storeInts(hostByHostGrayCodePermutation(graph[0], f, false), param[1]);
+ return;
+ }
+ else if (transform.equals("lex")) {
+ if (graph0IsLabelled) LOGGER.warn(notForLabelled);
+ result = map(graph[0], lexicographicalPermutation(graph[0]));
+ }
+ else if (transform.equals("lexPerm")) {
+ if (graph0IsLabelled) LOGGER.warn(notForLabelled);
+ BinIO.storeInts(lexicographicalPermutation(graph[0]), param[1]);
+ return;
+ }
+ else if (transform.equals("random")) {
+ if (graph0IsLabelled) LOGGER.warn(notForLabelled);
+ result = map(graph[0], randomPermutation(graph[0], seed));
+ }
+ else if (transform.equals("line")) {
+ if (graph0IsLabelled) LOGGER.warn(notForLabelled);
+ result = line(graph[0], map, tempDir, batchSize, pl);
+ } else result = null;
+
+ if (result instanceof ArcLabelledImmutableGraph) {
+ // Note that we derelativise non-absolute pathnames to build the underlying graph name.
+ LOGGER.info("The result is a labelled graph (class: " + destLabelledGraphClass.getName() + ")");
+ final File destFile = new File(dest);
+ final String underlyingName = (destFile.isAbsolute() ? dest : destFile.getName()) + ArcLabelledImmutableGraph.UNDERLYINGGRAPH_SUFFIX;
+ destLabelledGraphClass.getMethod("store", ArcLabelledImmutableGraph.class, CharSequence.class, CharSequence.class, ProgressLogger.class).invoke(null, result, dest, underlyingName, pl);
+ ImmutableGraph.store(destGraphClass, result, dest + ArcLabelledImmutableGraph.UNDERLYINGGRAPH_SUFFIX, pl);
+ }
+ else ImmutableGraph.store(destGraphClass, result, dest, pl);
+ }
+}
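The `line()` method above packs every arc &lt;x, y&gt; into a long (source in the upper 32 bits), sorts the packed arcs, and uses binary search to turn each arc into a line-graph node index; it then emits an arc from &lt;x, y&gt; to &lt;y, z&gt; for every successor z of y. The following is a minimal in-memory sketch of that idea, assuming the graph fits in memory as sorted adjacency arrays. The class `LineGraphSketch` and its methods are hypothetical and not part of WebGraph; the real method writes batches to disk. Note the `0xFFFFFFFFL` mask: the `int` literal `0xFFFFFFFF` is -1 and widens to -1L, so masking a long with it leaves the value unchanged.

```java
import java.util.Arrays;

/** Hypothetical in-memory sketch of a line-graph construction; not part of WebGraph. */
final class LineGraphSketch {
    /** Packs the arc (x, y) into a long: source in the upper 32 bits, target in the lower 32 bits. */
    static long pack(final int x, final int y) {
        return ((long) x << 32) | y; // y >= 0, so sign extension of y is harmless
    }

    static int source(final long e) {
        return (int) (e >>> 32);
    }

    static int target(final long e) {
        return (int) (e & 0xFFFFFFFFL); // the L suffix matters: 0xFFFFFFFF alone is the int -1
    }

    /** Returns all arcs of adj, packed and sorted; the i-th entry is node i of the line graph. */
    static long[] sortedArcs(final int[][] adj) {
        int m = 0;
        for (final int[] s : adj) m += s.length;
        final long[] edge = new long[m];
        int k = 0;
        for (int x = 0; x < adj.length; x++) {
            for (final int y : adj[x]) {
                if (y == x) throw new IllegalArgumentException("The graph contains a loop on node " + x);
                edge[k++] = pack(x, y);
            }
        }
        Arrays.sort(edge);
        return edge;
    }

    /** Successor lists of the line graph: node <x, y> points to node <y, z> for every successor z of y. */
    static int[][] line(final int[][] adj) {
        final long[] edge = sortedArcs(adj);
        final int[][] result = new int[edge.length][];
        for (int e0 = 0; e0 < edge.length; e0++) {
            final int y = target(edge[e0]);
            final int[] succ = new int[adj[y].length];
            for (int j = 0; j < succ.length; j++) {
                // <y, z> is an arc of adj, so the search always finds it
                succ[j] = Arrays.binarySearch(edge, pack(y, adj[y][j]));
            }
            Arrays.sort(succ);
            result[e0] = succ;
        }
        return result;
    }
}
```

For the triangle on nodes 0, 1, 2 (symmetric and loopless) this sketch yields 6 line-graph nodes and 12 arcs, which equals the sum of squared outdegrees that `line()` logs as the expected number of arcs.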
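The comparator in `grayCodePermutation()` never materialises rows: it walks two sorted successor lists in lockstep, and the parity of the number of common successors seen so far decides which row has the larger Gray-code rank at the first difference. The sketch below, assuming rows are given as strictly increasing `int[]` successor arrays, mirrors that lazy comparator and checks it against an eager Gray-to-binary decoding in the spirit of the commented-out code (column 0 taken as the most significant bit). The class `GrayOrderSketch` is hypothetical, not a WebGraph API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of Gray-code row ordering; rows are strictly increasing successor arrays. */
final class GrayOrderSketch {
    /** Lazy comparison, mirroring the comparator in grayCodePermutation(). */
    static int compareLazy(final int[] x, final int[] y) {
        boolean parity = false; // parity of the number of successors shared by x and y so far
        for (int i = 0; ; i++) {
            final int a = i < x.length ? x[i] : -1;
            final int b = i < y.length ? y[i] : -1;
            if (a == -1 && b == -1) return 0;
            if (a == -1) return parity ? 1 : -1;
            if (b == -1) return parity ? -1 : 1;
            if (a != b) return parity ^ (a < b) ? 1 : -1;
            parity = !parity;
        }
    }

    /** Eager comparison: decode each row from Gray code to binary (column 0 most significant), then compare. */
    static int compareEager(final int[] x, final int[] y, final int n) {
        final int[] bx = decode(x, n), by = decode(y, n);
        for (int k = 0; k < n; k++) if (bx[k] != by[k]) return bx[k] - by[k];
        return 0;
    }

    /** Gray-to-binary conversion: bit k of the result is the XOR of the row bits in columns 0..k. */
    private static int[] decode(final int[] row, final int n) {
        final int[] bits = new int[n];
        for (final int s : row) bits[s] = 1;
        for (int k = 1; k < n; k++) bits[k] ^= bits[k - 1];
        return bits;
    }

    /** All subsets of {0, ..., n - 1}, each as a sorted array. */
    static List<int[]> allRows(final int n) {
        final List<int[]> rows = new ArrayList<>();
        for (int mask = 0; mask < 1 << n; mask++) {
            final int[] row = new int[Integer.bitCount(mask)];
            int j = 0;
            for (int k = 0; k < n; k++) if ((mask & 1 << k) != 0) row[j++] = k;
            rows.add(row);
        }
        return rows;
    }

    public static void main(final String[] args) {
        final List<int[]> rows = allRows(3);
        rows.sort(GrayOrderSketch::compareLazy);
        for (final int[] row : rows) System.out.println(Arrays.toString(row));
    }
}
```

Sorting all 2<sup>n</sup> possible rows with `compareLazy` reproduces the reflected Gray sequence, so consecutive rows differ in exactly one column; that property is a convenient sanity check for the parity trick.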
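The three permutation methods above (`grayCodePermutation()`, `hostByHostGrayCodePermutation()` and `lexicographicalPermutation()`) all end the same way: they sort the node identifiers with a row comparator and then invert the sorted array, so that `invPerm[v]` is the new index of node `v`, which is the array their Javadoc says can be passed to `map()`. The sketch below reproduces that pattern with a copy of the comparator logic from `lexicographicalPermutation()`. It uses a boxed `Integer[]` and `Arrays.sort` for brevity instead of fastutil's `IntArrays.parallelQuickSort`, and the class `LexPermutationSketch` is hypothetical. As written, that comparator treats each row as a bit string read from column 0, so a row containing node 0 compares as larger than one that does not.

```java
import java.util.Arrays;

/** Hypothetical sketch: sort nodes by a row comparator and invert the result, as the permutation methods do. */
final class LexPermutationSketch {
    /** Copy of the comparator logic in lexicographicalPermutation(), on sorted successor arrays. */
    static int compare(final int[] x, final int[] y) {
        for (int i = 0; ; i++) {
            final int a = i < x.length ? x[i] : -1;
            final int b = i < y.length ? y[i] : -1;
            if (a == -1 && b == -1) return 0;
            if (a == -1) return -1; // x has no more ones, y has a one at column b
            if (b == -1) return 1;
            if (a != b) return b - a; // the row with the smaller column has a one where the other has a zero
        }
    }

    /** Returns invPerm, where invPerm[v] is the position of node v after sorting the rows. */
    static int[] permutation(final int[][] adj) {
        final Integer[] perm = new Integer[adj.length];
        for (int i = 0; i < perm.length; i++) perm[i] = i;
        Arrays.sort(perm, (u, v) -> compare(adj[u], adj[v]));
        final int[] invPerm = new int[adj.length];
        for (int i = 0; i < perm.length; i++) invPerm[perm[i]] = i;
        return invPerm;
    }

    public static void main(final String[] args) {
        final int[][] adj = { { 2 }, { 0, 1 }, { }, { 0 } };
        System.out.println(Arrays.toString(permutation(adj)));
    }
}
```

For the four-node graph in `main()` the rows read 0010, 1100, 0000 and 1000, so the sorted order is nodes 2, 0, 3, 1 and the program prints `[1, 3, 0, 2]`.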
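The `union` transform in `main()` above relies on `UnionImmutableGraph`, whose documentation states that an arc belongs to the union iff it belongs to at least one graph, and that the union has as many nodes as the larger graph. The following sketch shows that semantics on in-memory sorted successor arrays by merging them and keeping each common successor once. `UnionSketch` and its methods are hypothetical helpers and do not reproduce the class's actual implementation, which merges lazily through `MergedIntIterator`.

```java
import java.util.Arrays;

/** Hypothetical sketch of graph union on sorted adjacency arrays; not part of WebGraph. */
final class UnionSketch {
    /** Merges two strictly increasing successor arrays, emitting each successor once. */
    static int[] mergeSuccessors(final int[] s0, final int[] s1) {
        final int[] out = new int[s0.length + s1.length];
        int i = 0, j = 0, k = 0;
        while (i < s0.length || j < s1.length) {
            final int next;
            if (j == s1.length || i < s0.length && s0[i] < s1[j]) next = s0[i++];
            else if (i == s0.length || s1[j] < s0[i]) next = s1[j++];
            else { next = s0[i++]; j++; } // successor present in both lists: keep one copy
            out[k++] = next;
        }
        return Arrays.copyOf(out, k);
    }

    /** Union of two graphs given as adjacency arrays; the result has max(n0, n1) nodes. */
    static int[][] union(final int[][] g0, final int[][] g1) {
        final int n = Math.max(g0.length, g1.length);
        final int[][] result = new int[n][];
        for (int x = 0; x < n; x++) {
            final int[] s0 = x < g0.length ? g0[x] : new int[0]; // nodes beyond a graph's size have no arcs there
            final int[] s1 = x < g1.length ? g1[x] : new int[0];
            result[x] = mergeSuccessors(s0, s1);
        }
        return result;
    }
}
```

Because both inputs are sorted, the merge runs in time linear in the two outdegrees, which is why a lazy, iterator-based merge works well for this transform.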
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/UnionImmutableGraph.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/UnionImmutableGraph.java
new file mode 100644
index 0000000..22f0c4f
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/UnionImmutableGraph.java
@@ -0,0 +1,198 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2003-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import java.util.Arrays;
+
+import it.unimi.dsi.fastutil.ints.IntArrays;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** An immutable graph representing the union of two given graphs. Here by &ldquo;union&rdquo;
+ * we mean that an arc belongs to the union iff it belongs to at least one of the two graphs (the number of
+ * nodes of the union is the maximum of the numbers of nodes of the two graphs).
+ */
+public class UnionImmutableGraph extends ImmutableGraph {
+ @SuppressWarnings("unused")
+ private static final Logger LOGGER = LoggerFactory.getLogger(UnionImmutableGraph.class);
+ @SuppressWarnings("unused")
+ private static final boolean DEBUG = false;
+ @SuppressWarnings("unused")
+ private static final boolean ASSERTS = false;
+
+ private static final int INITIAL_ARRAY_SIZE = 16;
+
+ private final ImmutableGraph g0, g1;
+ private final int n0, n1, numNodes;
+
+ /** The node whose successors are cached, or -1 if no successors are currently cached. */
+ private int cachedNode = -1;
+
+ /** The outdegree of the cached node, if any. */
+ private int outdegree;
+
+ /** The successors of the cached node, if any; note that the array might be larger. */
+ private int cache[];
+
+ /** Creates the union of two given graphs.
+ *
+ * @param g0 the first graph.
+ * @param g1 the second graph.
+ */
+ public UnionImmutableGraph(ImmutableGraph g0, ImmutableGraph g1) {
+ this.g0 = g0;
+ this.g1 = g1;
+ n0 = g0.numNodes();
+ n1 = g1.numNodes();
+ numNodes = Math.max(n0, n1);
+ }
+
+ @Override
+ public UnionImmutableGraph copy() {
+ return new UnionImmutableGraph(g0.copy(), g1.copy());
+ }
+
+
+ private static class InternalNodeIterator extends NodeIterator {
+ /** If outdegree is nonnegative, the successors of the current node (this array may, however, be larger). */
+ private int cache[];
+ /** The outdegree of the current node, or -1 if the successor array for the current node has not been computed yet. */
+ private int outdegree;
+ private NodeIterator i0;
+ private NodeIterator i1;
+
+ public InternalNodeIterator(final NodeIterator i0, final NodeIterator i1) {
+ this(i0, i1, -1, IntArrays.EMPTY_ARRAY);
+ }
+
+ public InternalNodeIterator(final NodeIterator i0, final NodeIterator i1, final int outdegree, final int[] cache) {
+ this.i0 = i0;
+ this.i1 = i1;
+ this.outdegree = outdegree;
+ this.cache = cache;
+ }
+
+ @Override
+ public boolean hasNext() {
+ return i0 != null && i0.hasNext() || i1 != null && i1.hasNext();
+ }
+
+ @Override
+ public int nextInt() {
+ if (! hasNext()) throw new java.util.NoSuchElementException();
+ outdegree = -1;
+ int result = -1;
+ if (i0 != null) {
+ if (i0.hasNext()) result = i0.nextInt();
+ else i0 = null;
+ }
+ if (i1 != null) {
+ if (i1.hasNext()) result = i1.nextInt();
+ else i1 = null;
+ }
+ return result;
+ }
+
+ @Override
+ public int[] successorArray() {
+ if (outdegree != -1) return cache;
+ if (i0 == null) {
+ outdegree = i1.outdegree();
+ return cache = i1.successorArray();
+ }
+ if (i1 == null) {
+ outdegree = i0.outdegree();
+ return cache = i0.successorArray();
+ }
+
+ MergedIntIterator merge = new MergedIntIterator(i0.successors(), i1.successors());
+ outdegree = LazyIntIterators.unwrap(merge, cache);
+ int upto, t;
+ while ((t = merge.nextInt()) != -1) {
+ upto = cache.length;
+ cache = IntArrays.grow(cache, upto + 1);
+ cache[upto++] = t;
+ outdegree++;
+ outdegree += LazyIntIterators.unwrap(merge, cache, upto, cache.length - upto);
+ }
+ return cache;
+ }
+
+ @Override
+ public int outdegree() {
+ successorArray(); // So that the cache is filled up
+ return outdegree;
+ }
+
+ @Override
+ public NodeIterator copy(final int upperBound) {
+ return new InternalNodeIterator(i0 == null ? null : i0.copy(upperBound), i1 == null ? null : i1.copy(upperBound), outdegree, Arrays.copyOf(cache, Math.max(outdegree, 0)));
+ }
+
+ }
+ @Override
+ public NodeIterator nodeIterator(final int from) {
+ return new InternalNodeIterator(from < n0 ? g0.nodeIterator(from) : null, from < n1 ? g1.nodeIterator(from) : null);
+ }
+
+ @Override
+ public int numNodes() {
+ return numNodes;
+ }
+
+ @Override
+ public boolean randomAccess() {
+ return g0.randomAccess() && g1.randomAccess();
+ }
+
+ @Override
+ public boolean hasCopiableIterators() {
+ return g0.hasCopiableIterators() && g1.hasCopiableIterators();
+ }
+
+ private void fillCache(int x) {
+ if (x == cachedNode) return;
+ MergedIntIterator merge = new MergedIntIterator(x < n0? g0.successors(x) : LazyIntIterators.EMPTY_ITERATOR, x < n1? g1.successors(x) : LazyIntIterators.EMPTY_ITERATOR);
+ outdegree = 0;
+ cache = new int[INITIAL_ARRAY_SIZE];
+ outdegree += LazyIntIterators.unwrap(merge, cache);
+ int upto, t;
+ while ((t = merge.nextInt()) != -1) {
+ upto = cache.length;
+ cache = IntArrays.grow(cache, upto + 1);
+ cache[upto++] = t;
+ outdegree++;
+ outdegree += LazyIntIterators.unwrap(merge, cache, upto, cache.length - upto);
+ }
+ cachedNode = x;
+ }
+
+ @Override
+ public int[] successorArray(int x) {
+ fillCache(x);
+ return cache;
+ }
+
+ @Override
+ public int outdegree(int x) {
+ fillCache(x);
+ return outdegree;
+ }
+}
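The union semantics described in the class comment can be illustrated without WebGraph. The class merges the two successor lists of each node lazily (through `MergedIntIterator`), relying on WebGraph's convention that successor lists are returned in increasing order. The following is a minimal array-based sketch, assuming duplicate-free, increasing successor lists; `UnionSketch`, `union` and `unionGraph` are hypothetical names, not part of WebGraph.

```java
import java.util.Arrays;

// Hypothetical helper, not WebGraph API: array-based illustration of a graph union.
final class UnionSketch {
    private UnionSketch() {}

    /** Merges two strictly increasing successor lists, keeping each successor once. */
    static int[] union(final int[] a, final int[] b) {
        final int[] out = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) out[k++] = a[i++];
            else if (a[i] > b[j]) out[k++] = b[j++];
            else { out[k++] = a[i++]; j++; } // arc present in both graphs: emit it once
        }
        while (i < a.length) out[k++] = a[i++];
        while (j < b.length) out[k++] = b[j++];
        return Arrays.copyOf(out, k);
    }

    /** succ0 and succ1 are adjacency lists; nodes beyond a graph's size have no arcs in it. */
    static int[][] unionGraph(final int[][] succ0, final int[][] succ1) {
        final int n = Math.max(succ0.length, succ1.length); // as in numNodes above
        final int[][] result = new int[n][];
        for (int x = 0; x < n; x++) {
            final int[] a = x < succ0.length ? succ0[x] : new int[0];
            final int[] b = x < succ1.length ? succ1[x] : new int[0];
            result[x] = union(a, b);
        }
        return result;
    }
}
```

For example, `union(new int[] {1, 3, 5}, new int[] {2, 3, 6})` yields `{1, 2, 3, 5, 6}`: successor 3 appears in both lists but is kept once.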
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/ApproximateNeighbourhoodFunctions.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/ApproximateNeighbourhoodFunctions.java
new file mode 100644
index 0000000..4f89a2c
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/ApproximateNeighbourhoodFunctions.java
@@ -0,0 +1,172 @@
+package it.unimi.dsi.webgraph.algo;
+
+/*
+ * Copyright (C) 2011-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.objects.ObjectArrayList;
+import it.unimi.dsi.fastutil.objects.ObjectIterators;
+import it.unimi.dsi.fastutil.objects.ObjectList;
+import it.unimi.dsi.stat.Jackknife;
+import it.unimi.dsi.stat.Jackknife.Statistic;
+
+import java.math.BigDecimal;
+import java.math.MathContext;
+import java.util.Arrays;
+
+/** Static methods and objects that manipulate approximate neighbourhood functions.
+ *
+ * <p>A number of {@linkplain Statistic statistics} that can be used with {@link Jackknife}, such as
+ * {@link #CDF}, {@link #AVERAGE_DISTANCE}, {@link #HARMONIC_DIAMETER} and {@link #SPID} are available.
+ */
+public class ApproximateNeighbourhoodFunctions {
+
+ private ApproximateNeighbourhoodFunctions() {}
+
+ /** Combines several approximate neighbourhood functions for the same
+ * graph by averaging their values.
+ *
+ * <p>Note that the resulting approximate neighbourhood function has its standard
+ * deviation reduced by a factor equal to the square root of the number of samples (the standard error). However,
+ * if the cumulative distribution function has to be computed instead, calling this method and dividing
+ * all values by the last value is not the best approach, as it leads to a biased estimate.
+ * Rather, the samples should be combined using the {@linkplain Jackknife jackknife} and
+ * the {@link #CDF} statistic.
+ *
+ * <p>If you want to obtain estimates on the standard error of each data point, please consider using
+ * the {@linkplain Jackknife jackknife} with the {@linkplain Jackknife#IDENTITY identity} statistic instead of this method.
+ *
+ * @param anf an iterable object returning arrays of doubles representing approximate neighbourhood functions.
+ * @return the combined approximate neighbourhood function.
+ */
+ public static double[] combine(final Iterable<double[]> anf) {
+ final Object[] t = ObjectIterators.unwrap(anf.iterator());
+ final double a[][] = Arrays.copyOf(t, t.length, double[][].class);
+
+ final int n = a.length;
+
+ int length = 0;
+ for(double[] b : a) length = Math.max(length, b.length);
+ final double[] result = new double[length];
+
+ BigDecimal last = BigDecimal.ZERO, curr;
+
+ for(int i = 0; i < length; i++) {
+ curr = BigDecimal.ZERO;
+ for(int j = 0; j < n; j++) curr = curr.add(BigDecimal.valueOf(a[j][i < a[j].length ? i : a[j].length - 1]));
+ if (curr.compareTo(last) < 0) curr = last;
+ result[i] = curr.doubleValue() / n;
+ last = curr;
+ }
+
+ return result;
+ }
+
+ /** Evens out several approximate neighbourhood functions for the same
+ * graph by extending them to the same length (by copying the last value). This is usually a
+ * preparatory step for the {@linkplain Jackknife jackknife}.
+ *
+ * @param anf an iterable object returning arrays of doubles representing approximate neighbourhood functions.
+ * @return a list containing the same approximate neighbourhood functions, extended to the same length.
+ */
+ public static ObjectList<double[]> evenOut(final Iterable<double[]> anf) {
+ final Object[] u = ObjectIterators.unwrap(anf.iterator());
+ final double t[][] = Arrays.copyOf(u, u.length, double[][].class);
+ final int n = t.length;
+ int max = 0;
+ for(double[] a: t) max = Math.max(max, a.length);
+
+ final ObjectArrayList<double[]> result = new ObjectArrayList<>(n);
+ for(int i = 0; i < n; i++) {
+ final double[] a = new double[max];
+ System.arraycopy(t[i], 0, a, 0, t[i].length);
+ for(int j = t[i].length; j < max; j++) a[j] = a[j - 1];
+ result.add(a);
+ }
+
+ return result;
+ }
+
+ /** A statistic that computes the {@linkplain NeighbourhoodFunction#spid(double[]) spid}. */
+ public static Jackknife.Statistic SPID = new Jackknife.Statistic() {
+ @Override
+ public BigDecimal[] compute(final BigDecimal[] sample, final MathContext mc) {
+ BigDecimal sumDistances = BigDecimal.ZERO;
+ BigDecimal sumSquareDistances = BigDecimal.ZERO;
+ for(int i = sample.length; i-- != 1;) {
+ final BigDecimal delta = sample[i].subtract(sample[i - 1]);
+ sumDistances = sumDistances.add(delta.multiply(BigDecimal.valueOf(i)));
+ sumSquareDistances = sumSquareDistances.add(delta.multiply(BigDecimal.valueOf((long)i * i)));
+ }
+ return new BigDecimal[] { sumSquareDistances.divide(sumDistances, mc).subtract(sumDistances.divide(sample[sample.length - 1], mc)) };
+ }
+ };
+
+ /** A statistic that computes the {@linkplain NeighbourhoodFunction#averageDistance(double[]) average distance}. */
+ public static Jackknife.Statistic AVERAGE_DISTANCE = new Jackknife.Statistic() {
+ @Override
+ public BigDecimal[] compute(final BigDecimal[] sample, final MathContext mc) {
+ BigDecimal mean = BigDecimal.ZERO;
+ for(int i = sample.length; i-- != 1;) mean = mean.add(sample[i].subtract(sample[i - 1]).multiply(BigDecimal.valueOf(i)));
+ return new BigDecimal[] { mean.divide(sample[sample.length - 1], mc) };
+ }
+ };
+
+ /** A statistic that computes the {@linkplain NeighbourhoodFunction#harmonicDiameter(int, double[]) harmonic diameter}. */
+ public static Jackknife.Statistic HARMONIC_DIAMETER = new Jackknife.Statistic() {
+ @Override
+ public BigDecimal[] compute(final BigDecimal[] sample, final MathContext mc) {
+ BigDecimal sumInverseDistances = BigDecimal.ZERO;
+ for(int i = sample.length; i-- != 1;) sumInverseDistances = sumInverseDistances.add(sample[i].subtract(sample[i - 1]).divide(BigDecimal.valueOf(i), mc));
+ return new BigDecimal[] { sample[0].multiply(sample[0]).divide(sumInverseDistances, mc) };
+ }
+ };
+
+ /** A statistic that computes the {@linkplain NeighbourhoodFunction#effectiveDiameter(double[]) effective diameter}. */
+ public static Jackknife.Statistic EFFECTIVE_DIAMETER = new Jackknife.AbstractStatistic() {
+ @Override
+ public double[] compute(final double[] sample) {
+ return new double[] { NeighbourhoodFunction.effectiveDiameter(sample) };
+ }
+ };
+
+ /** A statistic that divides all values of a sample (an approximate neighbourhood function)
+ * by the last value. Useful for moving from neighbourhood functions to cumulative distribution functions. */
+ public static Jackknife.Statistic CDF = new Jackknife.Statistic() {
+ @Override
+ public BigDecimal[] compute(final BigDecimal[] sample, final MathContext mc) {
+ final BigDecimal[] result = new BigDecimal[sample.length];
+ final BigDecimal norm = BigDecimal.ONE.divide(sample[sample.length - 1], mc);
+ for(int i = result.length; i-- != 0;) result[i] = sample[i].multiply(norm);
+ return result;
+ }
+ };
+
+ /** A statistic that computes the differences between consecutive elements of a sample (an approximate neighbourhood function)
+ * and divides them by the last value (the first element is simply divided by the last value). Useful for moving from
+ * neighbourhood functions or cumulative distribution functions to probability mass functions. */
+ public static Jackknife.Statistic PMF = new Jackknife.Statistic() {
+ @Override
+ public BigDecimal[] compute(final BigDecimal[] sample, final MathContext mc) {
+ final BigDecimal[] result = new BigDecimal[sample.length];
+ final BigDecimal norm = BigDecimal.ONE.divide(sample[sample.length - 1], mc);
+ result[0] = sample[0].multiply(norm);
+ for(int i = result.length - 1; i-- != 0;) result[i + 1] = sample[i + 1].subtract(sample[i]).multiply(norm);
+ return result;
+ }
+ };
+}
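The statistics above operate on `BigDecimal` samples through `Jackknife`. As a rough illustration of what `evenOut`, `CDF`, `PMF` and `AVERAGE_DISTANCE` compute, here is a plain-`double` sketch of the same formulas; `AnfSketch` and its methods are hypothetical names, and the sketch assumes nonempty input arrays whose last value is nonzero.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not WebGraph API: double-precision versions of the formulas above.
final class AnfSketch {
    private AnfSketch() {}

    /** Divides every value by the last one, as the CDF statistic does. */
    static double[] cdf(final double[] anf) {
        final double last = anf[anf.length - 1];
        final double[] result = new double[anf.length];
        for (int i = 0; i < anf.length; i++) result[i] = anf[i] / last;
        return result;
    }

    /** First value, then consecutive differences, all divided by the last value, as PMF does. */
    static double[] pmf(final double[] anf) {
        final double last = anf[anf.length - 1];
        final double[] result = new double[anf.length];
        result[0] = anf[0] / last;
        for (int i = 1; i < anf.length; i++) result[i] = (anf[i] - anf[i - 1]) / last;
        return result;
    }

    /** Sum over i >= 1 of i * (N(i) - N(i - 1)), divided by the last value, as AVERAGE_DISTANCE does. */
    static double averageDistance(final double[] anf) {
        double sum = 0;
        for (int i = 1; i < anf.length; i++) sum += i * (anf[i] - anf[i - 1]);
        return sum / anf[anf.length - 1];
    }

    /** Extends every (nonempty) function to the maximum length by repeating its last value, as evenOut does. */
    static List<double[]> evenOut(final List<double[]> anfs) {
        int max = 0;
        for (final double[] a : anfs) max = Math.max(max, a.length);
        final List<double[]> result = new ArrayList<>(anfs.size());
        for (final double[] a : anfs) {
            final double[] b = Arrays.copyOf(a, max);
            for (int j = a.length; j < max; j++) b[j] = b[j - 1];
            result.add(b);
        }
        return result;
    }
}
```

For the neighbourhood function {4, 8, 12, 16}, the sketch gives a CDF of {0.25, 0.5, 0.75, 1}, a PMF of {0.25, 0.25, 0.25, 0.25} and an average distance of (1*4 + 2*4 + 3*4) / 16 = 1.5. Unlike the `BigDecimal` statistics, it offers no control over precision through a `MathContext`.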
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/BetweennessCentrality.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/BetweennessCentrality.java
new file mode 100644
index 0000000..289775c
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/BetweennessCentrality.java
@@ -0,0 +1,319 @@
+package it.unimi.dsi.webgraph.algo;
+
+/*
+ * Copyright (C) 2012-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.ints.IntArrayList;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.LazyIntIterator;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.concurrent.Callable;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorCompletionService;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.atomic.AtomicInteger;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/** Computes the betweenness centrality using an implementation of Brandes's algorithm
+ * (Ulrik Brandes, &ldquo;A Faster Algorithm for Betweenness Centrality&rdquo;, <i>Journal of
+ * Mathematical Sociology</i> 25(2):163&minus;177, 2001)
+ * that uses multiple parallel breadth-first visits.
+ *
+ * <p>To use this class you first create an instance, and then invoke {@link #compute()}.
+ * After that, you can peek at the field {@link #betweenness} to discover the betweenness of each node.
+ *
+ * <p>For every three distinct nodes <var>s</var>, <var>t</var> and <var>v</var>, let <var>&sigma;</var><sub><var>s</var><var>t</var></sub> be
+ * the number of shortest paths from <var>s</var> to <var>t</var>, and <var>&sigma;</var><sub><var>s</var><var>t</var></sub>(<var>v</var>) the
+ * number of such paths on which <var>v</var> lies. The betweenness centrality of node <var>v</var> is defined to be the sum of
+ * <var>&delta;</var><sub><var>s</var><var>t</var></sub>(<var>v</var>)=<var>&sigma;</var><sub><var>s</var><var>t</var></sub>(<var>v</var>) / <var>&sigma;</var><sub><var>s</var><var>t</var></sub> over all
+ * pairs of distinct nodes <var>s</var>, <var>t</var> different from <var>v</var> (the summand is assumed to be zero whenever the denominator
+ * is zero).
+ *
+ * <p>Brandes's approach consists of performing a breadth-first visit from every node, recording the
+ * distance of the node from the current source. After each visit, nodes are considered in decreasing order of
+ * distance, and for each of them we consider the arcs (<var>v</var>,<var>w</var>) such that the distance of <var>w</var>
+ * is exactly one plus the distance of <var>v</var>: in this case we say that <var>v</var> is a parent of <var>w</var>.
+ * Such parents are used to compute the values of <var>&delta;</var> (exactly as in the original algorithm, but without
+ * any need to keep an explicit set of parents, which is important since this class is memory intensive).
+ *
+ * <p>Every visit is independent and is carried out by a separate thread. The only contention point
+ * is the update of the array accumulating the betweenness scores, whose cost is negligible. The downside is
+ * that running on <var>k</var> cores requires approximately <var>k</var> times the memory of the
+ * sequential algorithm, as only the graph and the betweenness array will be shared.
+ *
+ * <p>This class carefully keeps track of overflows in path counters, and will throw an exception if one happens.
+ * Thanks to David Gleich for pointing out this serious problem, which is often overlooked.
+ */
+
+public class BetweennessCentrality {
+ private final static Logger LOGGER = LoggerFactory.getLogger(BetweennessCentrality.class);
+
+ /** An exception telling that the path count exceeded 64-bit integer arithmetic. */
+ public static final class PathCountOverflowException extends RuntimeException {
+ public PathCountOverflowException() {}
+
+ public PathCountOverflowException(String s) {
+ super(s);
+ }
+
+ private static final long serialVersionUID = 1L;
+ }
+
+ /** The graph under examination. */
+ private final ImmutableGraph graph;
+ /** The global progress logger. */
+ private final ProgressLogger pl;
+ /** The number of threads. */
+ private final int numberOfThreads;
+ /** The next node to be visited. */
+ protected final AtomicInteger nextNode;
+ /** Whether to stop abruptly the visiting process. */
+ protected volatile boolean stop;
+ /** The array of betweenness values. */
+ public final double[] betweenness;
+
+ /** Creates a new class for computing betweenness centrality.
+ *
+ * @param graph a graph.
+ * @param requestedThreads the requested number of threads (0 for {@link Runtime#availableProcessors()}).
+ * @param pl a progress logger, or {@code null}.
+ */
+ public BetweennessCentrality(final ImmutableGraph graph, final int requestedThreads, final ProgressLogger pl) {
+ this.pl = pl;
+ this.graph = graph;
+ this.betweenness = new double[graph.numNodes()];
+ this.nextNode = new AtomicInteger();
+ numberOfThreads = requestedThreads != 0 ? requestedThreads : Runtime.getRuntime().availableProcessors();
+ }
+
+ /** Creates a new class for computing betweenness centrality, using as many threads as
+ * the number of available processors.
+ *
+ * @param graph a graph.
+ * @param pl a progress logger, or {@code null}.
+ */
+ public BetweennessCentrality(final ImmutableGraph graph, final ProgressLogger pl) {
+ this(graph, 0, pl);
+ }
+
+ /** Creates a new class for computing betweenness centrality.
+ *
+ * @param graph a graph.
+ * @param requestedThreads the requested number of threads (0 for {@link Runtime#availableProcessors()}).
+ */
+ public BetweennessCentrality(final ImmutableGraph graph, final int requestedThreads) {
+ this(graph, requestedThreads, null);
+ }
+
+ /** Creates a new class for computing betweenness centrality, using as many threads as
+ * the number of available processors.
+ *
+ * @param graph a graph.
+ */
+ public BetweennessCentrality(final ImmutableGraph graph) {
+ this(graph, 0);
+ }
+
+ private final class IterationThread implements Callable<Void> {
+ /** The queue of visited nodes. */
+ private final IntArrayList queue;
+ /** At the end of a visit, the cutpoints of {@link #queue}. The <var>d</var>-th cutpoint is the first node in the queue at distance <var>d</var>. The
+ * last cutpoint is the queue size. */
+ private final IntArrayList cutPoints;
+ /** The array containing the distance of each node from the current source (or -1 if the node has not yet been reached by the visit). */
+ private final int[] distance;
+ /** The array containing the values of &sigma; incremented for each parent/child pair during each visit, as explained in Brandes's algorithm. */
+ private final long[] sigma;
+ /** The array of dependencies (computed at the end of each visit). */
+ private final double[] delta;
+
+ private IterationThread() {
+ this.distance = new int[graph.numNodes()];
+ this.sigma = new long[graph.numNodes()];
+ this.delta = new double[graph.numNodes()];
+ this.queue = new IntArrayList(graph.numNodes());
+ this.cutPoints = new IntArrayList();
+ }
+
+ private boolean checkOverflow(final long[] sigma, final int node, final long currSigma, int s) {
+ if (sigma[s] > Long.MAX_VALUE - currSigma) throw new PathCountOverflowException(sigma[s] + " > " + (Long.MAX_VALUE - currSigma) + " (" + node + " -> " + s + ")");
+ return true;
+ }
+
+ @Override
+ public Void call() {
+ // We cache frequently used fields.
+ final int[] distance = this.distance;
+ final double[] delta = this.delta;
+ final long[] sigma = this.sigma;
+ final IntArrayList queue = this.queue;
+ final ImmutableGraph graph = BetweennessCentrality.this.graph.copy();
+
+ for(;;) {
+ final int curr = nextNode.getAndIncrement();
+ if (BetweennessCentrality.this.stop || curr >= graph.numNodes()) return null;
+ queue.clear();
+ queue.add(curr);
+ cutPoints.clear();
+ cutPoints.add(0);
+ Arrays.fill(distance, -1);
+ Arrays.fill(sigma, 0);
+ distance[curr] = 0;
+ sigma[curr] = 1;
+ boolean overflow = false;
+
+ int d;
+ for(d = 0; queue.size() != cutPoints.getInt(cutPoints.size() - 1); d++) {
+ cutPoints.add(queue.size());
+ final int start = cutPoints.getInt(d);
+ final int end = cutPoints.getInt(d + 1);
+
+ for(int pos = start; pos < end; pos++) {
+ final int node = queue.getInt(pos);
+ final long currSigma = sigma[node];
+ final LazyIntIterator successors = graph.successors(node);
+ for(int s; (s = successors.nextInt()) != -1;) {
+ if (distance[s] == -1) {
+ distance[s] = d + 1;
+ delta[s] = 0;
+ queue.add(s);
+ assert checkOverflow(sigma, node, currSigma, s);
+ overflow |= sigma[s] > Long.MAX_VALUE - currSigma;
+ sigma[s] += currSigma;
+ }
+ else if (distance[s] == d + 1) {
+ assert checkOverflow(sigma, node, currSigma, s);
+ overflow |= sigma[s] > Long.MAX_VALUE - currSigma;
+ sigma[s] += currSigma;
+ }
+ }
+ }
+ }
+
+ if (overflow) throw new PathCountOverflowException();
+
+ while(--d > 0) {
+ final int start = cutPoints.getInt(d);
+ final int end = cutPoints.getInt(d + 1);
+
+ for(int pos = start; pos < end; pos++) {
+ final int node = queue.getInt(pos);
+ double sigmaNode = sigma[node];
+ final LazyIntIterator succ = graph.successors(node);
+ for(int s; (s = succ.nextInt()) != -1;)
+ if (distance[s] == d + 1) delta[node] += (1 + delta[s]) * sigmaNode / sigma[s];
+ }
+
+ synchronized (BetweennessCentrality.this) {
+ for(int pos = start; pos < end; pos++) {
+ final int node = queue.getInt(pos);
+ betweenness[node] += delta[node];
+ }
+ }
+ }
+
+ if (BetweennessCentrality.this.pl != null)
+ synchronized (BetweennessCentrality.this.pl) {
+ BetweennessCentrality.this.pl.update();
+ }
+ }
+ }
+ }
+
+
+ /** Computes betweenness centrality. Results can be found in {@link BetweennessCentrality#betweenness}. */
+ public void compute() throws InterruptedException {
+ final IterationThread[] thread = new IterationThread[numberOfThreads];
+ for(int i = 0; i < thread.length; i++) thread[i] = new IterationThread();
+
+ if (pl != null) {
+ pl.start("Starting visits...");
+ pl.expectedUpdates = graph.numNodes();
+ pl.itemsName = "nodes";
+ }
+
+ final ExecutorService executorService = Executors.newFixedThreadPool(numberOfThreads);
+ final ExecutorCompletionService<Void> executorCompletionService = new ExecutorCompletionService<>(executorService);
+
+ for(int i = thread.length; i-- != 0;) executorCompletionService.submit(thread[i]);
+
+ try {
+ for(int i = thread.length; i-- != 0;) executorCompletionService.take().get();
+ }
+ catch(ExecutionException e) {
+ stop = true;
+ Throwable cause = e.getCause();
+ throw cause instanceof RuntimeException ? (RuntimeException)cause : new RuntimeException(cause.getMessage(), cause);
+ }
+ finally {
+ executorService.shutdown();
+ }
+
+ if (pl != null) pl.done();
+ }
+
+
+ public static void main(final String[] arg) throws IOException, InterruptedException, JSAPException {
+
+ SimpleJSAP jsap = new SimpleJSAP(BetweennessCentrality.class.getName(), "Computes the betweenness centrality of a graph using an implementation of Brandes's algorithm based on multiple parallel breadth-first visits.",
+ new Parameter[] {
+ new Switch("expand", 'e', "expand", "Expand the graph to increase speed (no compression)."),
+ new Switch("mapped", 'm', "mapped", "Use loadMapped() to load the graph."),
+ new FlaggedOption("threads", JSAP.INTSIZE_PARSER, "0", JSAP.NOT_REQUIRED, 'T', "threads", "The number of threads to be used. If 0, the number will be estimated automatically."),
+ new UnflaggedOption("graphBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the graph."),
+ new UnflaggedOption("rankFilename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The filename where the resulting betweenness scores (doubles in binary form) will be stored.")
+ }
+ );
+
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final boolean mapped = jsapResult.getBoolean("mapped", false);
+ final String graphBasename = jsapResult.getString("graphBasename");
+ final String rankFilename = jsapResult.getString("rankFilename");
+ final int threads = jsapResult.getInt("threads");
+ final ProgressLogger progressLogger = new ProgressLogger(LOGGER, "nodes");
+ progressLogger.displayFreeMemory = true;
+ progressLogger.displayLocalSpeed = true;
+
+ ImmutableGraph graph = mapped? ImmutableGraph.loadMapped(graphBasename, progressLogger) : ImmutableGraph.load(graphBasename, progressLogger);
+ if (jsapResult.userSpecified("expand")) graph = new ArrayListMutableGraph(graph).immutableView();
+
+ BetweennessCentrality betweennessCentralityMultipleVisits = new BetweennessCentrality(graph, threads, progressLogger);
+ betweennessCentralityMultipleVisits.compute();
+
+ BinIO.storeDoubles(betweennessCentralityMultipleVisits.betweenness, rankFilename);
+ }
+}
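The visit-then-accumulate scheme described in the class comment can be sketched sequentially on a plain adjacency array. Like the class, the sketch keeps no explicit parent sets: during the backward pass a successor w counts as a child of v exactly when its distance is one more than that of v. This is a single-threaded sketch, not the parallel implementation; `BrandesSketch` is a hypothetical name, successor lists are assumed duplicate-free, and `Math.addExact` (which throws `ArithmeticException` on `long` overflow) stands in for `PathCountOverflowException`.

```java
import java.util.Arrays;

// Hypothetical helper, not WebGraph API: sequential Brandes betweenness on an unweighted graph.
final class BrandesSketch {
    private BrandesSketch() {}

    /** succ[v] lists the successors of node v; returns the betweenness of every node. */
    static double[] betweenness(final int[][] succ) {
        final int n = succ.length;
        final double[] bc = new double[n];
        final int[] dist = new int[n];
        final long[] sigma = new long[n];   // number of shortest paths from the source
        final double[] delta = new double[n];
        final int[] order = new int[n];     // BFS queue; ends up in nondecreasing distance order
        for (int src = 0; src < n; src++) {
            Arrays.fill(dist, -1);
            Arrays.fill(sigma, 0);
            Arrays.fill(delta, 0);
            dist[src] = 0;
            sigma[src] = 1;
            int head = 0, tail = 0;
            order[tail++] = src;
            while (head < tail) {
                final int v = order[head++];
                for (final int w : succ[v]) {
                    if (dist[w] == -1) { dist[w] = dist[v] + 1; order[tail++] = w; }
                    // Overflow-checked path counting.
                    if (dist[w] == dist[v] + 1) sigma[w] = Math.addExact(sigma[w], sigma[v]);
                }
            }
            // Backward pass in decreasing distance: children are processed before their parents.
            for (int i = tail; i-- != 0;) {
                final int v = order[i];
                for (final int w : succ[v])
                    if (dist[w] == dist[v] + 1) delta[v] += (1 + delta[w]) * sigma[v] / sigma[w];
                if (v != src) bc[v] += delta[v]; // the source itself gets no credit
            }
        }
        return bc;
    }
}
```

On the symmetric path 0-1-2 (`{{1}, {0, 2}, {1}}`) the sketch returns {0, 2, 0}: node 1 lies on the only shortest path between 0 and 2 in both directions. On the directed diamond `{{1, 2}, {3}, {3}, {}}`, nodes 1 and 2 each get 0.5, since each lies on one of the two shortest paths from 0 to 3.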
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/ConnectedComponents.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/ConnectedComponents.java
new file mode 100644
index 0000000..6fab876
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/ConnectedComponents.java
@@ -0,0 +1,214 @@
+package it.unimi.dsi.webgraph.algo;
+
+/*
+ * Copyright (C) 2011-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.Util;
+import it.unimi.dsi.fastutil.ints.IntArrays;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.Transform;
+import it.unimi.dsi.webgraph.UnionImmutableGraph;
+
+import java.io.IOException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicIntegerArray;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/**
+ * Computes the connected components of a <em>symmetric</em> (a.k.a&#46; <em>undirected</em>) graph
+ * using a {@linkplain ParallelBreadthFirstVisit parallel breadth-first visit}.
+ *
+ * <p>The {@link #compute(ImmutableGraph, int, ProgressLogger)} method of this class will return an
+ * instance that contains the data computed by visiting the graph (using an instance of
+ * {@link ParallelBreadthFirstVisit}). Note that it is your responsibility to pass a symmetric graph
+ * to {@link #compute(ImmutableGraph, int, ProgressLogger)}. Otherwise, results will be
+ * unpredictable.
+ *
+ * <p>After getting an instance, it is possible to run the {@link #computeSizes()} and
+ * {@link #sortBySize(int[])} methods to obtain further information. This scheme has been devised to
+ * exploit the available memory as much as possible&mdash;after the components have been computed,
+ * the returned instance keeps no track of the graph, and the related memory can be freed by the
+ * garbage collector.
+ *
+ * <p>Furthermore, it is possible to remove all components except the largest one from a graph
+ * using the method {@link #getLargestComponent}.
+ *
+ * <h2>Performance issues</h2>
+ *
+ * <p>This class uses an instance of {@link ParallelBreadthFirstVisit} to ensure a high degree of
+ * parallelism (see its documentation for memory requirements).
+ */
+
+public class ConnectedComponents {
+ private static final Logger LOGGER = LoggerFactory.getLogger(ConnectedComponents.class);
+
+ /** The number of connected components. */
+ public final int numberOfComponents;
+
+ /** The component of each node. */
+ public final int component[];
+
+ protected ConnectedComponents(final int numberOfComponents, final int[] component) {
+ this.numberOfComponents = numberOfComponents;
+ this.component = component;
+ }
+
+ /**
+ * Computes the connected components of a symmetric graph.
+ *
+ * @param symGraph a symmetric graph.
+ * @param threads the requested number of threads (0 for {@link Runtime#availableProcessors()}).
+ * @param pl a progress logger, or <code>null</code>.
+ * @return an instance of this class containing the computed components.
+ */
+ public static ConnectedComponents compute(final ImmutableGraph symGraph, final int threads, final ProgressLogger pl) {
+ ParallelBreadthFirstVisit visit = new ParallelBreadthFirstVisit(symGraph, threads, false, pl);
+ visit.visitAll();
+ final AtomicIntegerArray visited = visit.marker;
+ final int numberOfComponents = visit.round + 1;
+ visit = null;
+ final int[] component = new int[visited.length()];
+ for (int i = component.length; i-- != 0;)
+ component[i] = visited.get(i);
+ return new ConnectedComponents(numberOfComponents, component);
+ }
+
+ /**
+ * Returns the largest connected components of a symmetric graph.
+ *
+ * @param symGraph a symmetric graph.
+ * @param threads the requested number of threads (0 for {@link Runtime#availableProcessors()}).
+ * @param pl a progress logger, or <code>null</code>.
+ * @return an ImmutableGraph containing the largest connected component of the input graph.
+ */
+ public static ImmutableGraph getLargestComponent(final ImmutableGraph symGraph, final int threads, final ProgressLogger pl) {
+ ParallelBreadthFirstVisit visit = new ParallelBreadthFirstVisit(symGraph, threads, false, pl);
+ visit.visitAll();
+ final AtomicIntegerArray visited = visit.marker;
+ final int numberOfComponents = visit.round + 1;
+ visit = null;
+ final int[] component = new int[visited.length()];
+ final int[] componentSizes = new int [numberOfComponents];
+ final int[] map = new int[symGraph.numNodes()];
+ int largestCC = 0, largestCCSize = Integer.MIN_VALUE;
+
+ for (int i = component.length; i-- != 0;) {
+ component[i] = visited.get(i);
+ componentSizes[component[i]]++;
+ }
+ for (int i = 0; i < componentSizes.length; i++)
+ if (componentSizes[i] > largestCCSize) {
+ largestCC = i;
+ largestCCSize = componentSizes[i];
+ }
+
+ for (int i = symGraph.numNodes(); i-- != 0;) {
+ if (component[i] == largestCC) {
+ map[i] = --largestCCSize;
+ } else {
+ map[i] = -1;
+ }
+ }
+
+ return Transform.map(symGraph, map, pl);
+ }
+
+ /**
+ * Returns the size array for this set of connected components.
+ *
+ * @return the size array for this set of connected components.
+ */
+ public int[] computeSizes() {
+ final int[] size = new int[numberOfComponents];
+ for (int i = component.length; i-- != 0;)
+ size[component[i]]++;
+ return size;
+ }
+
+ /**
+ * Renumbers by decreasing size the components of this set.
+ *
+ * <p>After a call to this method, both the internal state of this object and the argument array
+ * are permuted so that the sizes of connected components are decreasing in the component index.
+ *
+ * @param size the components sizes, as returned by {@link #computeSizes()}.
+ */
+ public void sortBySize(final int[] size) {
+ final int[] perm = Util.identity(size.length);
+ IntArrays.parallelRadixSortIndirect(perm, size, false);
+ IntArrays.reverse(perm);
+ final int[] copy = size.clone();
+ for (int i = size.length; i-- != 0;)
+ size[i] = copy[perm[i]];
+ Util.invertPermutationInPlace(perm);
+ for (int i = component.length; i-- != 0;)
+ component[i] = perm[component[i]];
+ }
+
+ public static void main(String arg[]) throws IOException, JSAPException {
+ SimpleJSAP jsap = new SimpleJSAP(ConnectedComponents.class.getName(),
+ "Computes the connected components of a symmetric graph of given basename. The resulting data is saved " +
+ "in files stemmed from the given basename with extension .wcc (a list of binary integers specifying the " +
+ "component of each node) and .wccsizes (a list of binary integers specifying the size of each component). " +
+ "The symmetric graph can also be specified using a generic (non-symmetric) graph and its transpose.",
+ new Parameter[] {
+ new Switch("sizes", 's', "sizes", "Compute component sizes."),
+ new Switch("renumber", 'r', "renumber", "Renumber components in decreasing-size order."),
+ new FlaggedOption("logInterval", JSAP.LONG_PARSER, Long.toString(ProgressLogger.DEFAULT_LOG_INTERVAL), JSAP.NOT_REQUIRED, 'l', "log-interval", "The minimum time interval between activity logs in milliseconds."),
+ new Switch("mapped", 'm', "mapped", "Do not load the graph in main memory, but rather memory-map it."),
+ new FlaggedOption("threads", JSAP.INTSIZE_PARSER, "0", JSAP.NOT_REQUIRED, 'T', "threads", "The number of threads to be used. If 0, the number will be estimated automatically."),
+ new FlaggedOption("basenamet", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 't', "transpose", "The basename of the transpose, in case the graph is not symmetric."),
+ new UnflaggedOption("basename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of a symmetric graph (or of a generic graph, if the transpose is provided, too)."),
+ new UnflaggedOption("resultsBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, JSAP.NOT_GREEDY, "The basename of the resulting files."),
+ }
+ );
+
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final String basename = jsapResult.getString("basename");
+ final String basenamet = jsapResult.getString("basenamet");
+ final String resultsBasename = jsapResult.getString("resultsBasename", basename);
+ final int threads = jsapResult.getInt("threads");
+ ProgressLogger pl = new ProgressLogger(LOGGER, jsapResult.getLong("logInterval"), TimeUnit.MILLISECONDS);
+
+ ImmutableGraph graph = jsapResult.userSpecified("mapped") ? ImmutableGraph.loadMapped(basename) : ImmutableGraph.load(basename, pl);
+ ImmutableGraph grapht = basenamet == null ? null : jsapResult.userSpecified("mapped") ? ImmutableGraph.loadMapped(basenamet) : ImmutableGraph.load(basenamet, pl);
+ final ConnectedComponents components = ConnectedComponents.compute(basenamet != null ? new UnionImmutableGraph(graph, grapht) : graph, threads, pl);
+
+ if (jsapResult.getBoolean("sizes") || jsapResult.getBoolean("renumber")) {
+ final int size[] = components.computeSizes();
+ if (jsapResult.getBoolean("renumber")) components.sortBySize(size);
+ if (jsapResult.getBoolean("sizes")) BinIO.storeInts(size, resultsBasename + ".wccsizes");
+ }
+ BinIO.storeInts(components.component, resultsBasename + ".wcc");
+ }
+}
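The steps performed by `compute()`, `computeSizes()` and `sortBySize()` can be reproduced on a toy graph without WebGraph. The following is a minimal sequential sketch, assuming a symmetric graph stored as an `int[][]` of adjacency lists; `ComponentsSketch` and its methods are hypothetical names, and a plain queue-based breadth-first visit stands in for `ParallelBreadthFirstVisit`. As in `sortBySize()`, the renumbering permutes both the component labels and the size array.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/** Hypothetical sequential sketch (not part of WebGraph): labels the components of a symmetric
 *  graph given as adjacency lists, counts their sizes and renumbers them by decreasing size. */
class ComponentsSketch {
    /** Returns, for each node, the index of its component (components numbered in discovery order). */
    static int[] label(int[][] adj) {
        int[] component = new int[adj.length];
        Arrays.fill(component, -1);
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        int next = 0;
        for (int s = 0; s < adj.length; s++) {
            if (component[s] != -1) continue; // already reached from an earlier source
            component[s] = next;
            queue.add(s);
            while (!queue.isEmpty()) {
                int x = queue.poll();
                for (int y : adj[x]) {
                    if (component[y] == -1) {
                        component[y] = next;
                        queue.add(y);
                    }
                }
            }
            next++;
        }
        return component;
    }

    /** Counts the nodes of each component. */
    static int[] sizes(int[] component, int numberOfComponents) {
        int[] size = new int[numberOfComponents];
        for (int c : component) size[c]++;
        return size;
    }

    /** Permutes both arrays in place so that component sizes decrease with the component index. */
    static void sortBySize(int[] component, int[] size) {
        Integer[] perm = new Integer[size.length]; // perm[newIndex] = oldIndex
        for (int i = 0; i < perm.length; i++) perm[i] = i;
        final int[] copy = size.clone();
        Arrays.sort(perm, (a, b) -> Integer.compare(copy[b], copy[a])); // largest first
        int[] inverse = new int[size.length]; // inverse[oldIndex] = newIndex
        for (int i = 0; i < perm.length; i++) {
            size[i] = copy[perm[i]];
            inverse[perm[i]] = i;
        }
        for (int i = 0; i < component.length; i++) component[i] = inverse[component[i]];
    }

    public static void main(String[] args) {
        // Two components: {0, 1} and {2, 3, 4}.
        int[][] adj = { {1}, {0}, {3}, {2, 4}, {3} };
        int[] component = label(adj);
        int numberOfComponents = Arrays.stream(component).max().orElse(-1) + 1;
        int[] size = sizes(component, numberOfComponents);
        sortBySize(component, size);
        System.out.println(Arrays.toString(component) + " " + Arrays.toString(size)); // prints [1, 1, 0, 0, 0] [3, 2]
    }
}
```

After the renumbering, component 0 is a largest component, i.e., the one whose nodes `getLargestComponent()` keeps when it builds the node map passed to `Transform.map`.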
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/EliasFanoCumulativeOutdegreeList.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/EliasFanoCumulativeOutdegreeList.java
new file mode 100644
index 0000000..57d2ef9
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/EliasFanoCumulativeOutdegreeList.java
@@ -0,0 +1,155 @@
+package it.unimi.dsi.webgraph.algo;
+
+/*
+ * Copyright (C) 2013-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.bits.BitVector;
+import it.unimi.dsi.bits.Fast;
+import it.unimi.dsi.bits.LongArrayBitVector;
+import it.unimi.dsi.fastutil.longs.LongBigList;
+import it.unimi.dsi.sux4j.bits.SimpleSelectZero;
+import it.unimi.dsi.sux4j.util.EliasFanoMonotoneLongBigList;
+import it.unimi.dsi.util.HyperLogLogCounterArray;
+
+/**<p>A content-addressable representation of the cumulative function of outdegrees that uses a stripped-down
+ * implementation of Elias&ndash;Fano's representation of monotone sequences partially taken from {@link EliasFanoMonotoneLongBigList}.
+ *
+ * <p>The purpose of this class is that of storing quasi-succinctly the outdegrees of a graph so that it
+ * is easy to find quickly a batch of nodes whose overall outdegree is a given quantity. It is most effective
+ * in multicore computations depending on the outdegree, as usually the transposed graph has some very high-degree nodes, and often
+ * in web graphs, due to crawling artifacts, these nodes are very close. As a result, a node-based job assignment
+ * ends up creating batches of nodes that are incredibly expensive, which in turn produces an unbalanced
+ * iteration (e.g., in the last part only a few processors are actually working).
+ *
+ * <p>The main access method is {@link #skipTo(long)}, which will return a value of the cumulative function larger than or equal to
+ * its argument. At that point, {@link #currentIndex()} returns the index of the node that realizes that value.
+ */
+
+public final class EliasFanoCumulativeOutdegreeList {
+ /** The number of lower bits. */
+ private final int l;
+ /** The mask used to round up returned {@link #currentIndex} values when {@link HyperLogLogCounterArray#m} &lt; 64, 0 otherwise. */
+ private final long roundingMask;
+ /** The upper-bits array. */
+ private final long[] upperBits;
+ /** The lower-bits list. */
+ private final LongBigList lowerBits;
+ /** The number of nodes, cached. */
+ private final long numNodes;
+ /** The 64-bit window. */
+ private long window;
+ /** The current word position in the list of upper bits. */
+ private int curr;
+ /** The index of the current prefix sum. */
+ private int currentIndex;
+ /** A zero-selection structure on {@link #upperBits}. */
+ private SimpleSelectZero simpleSelectZero;
+
+ /** Creates a cumulative outdegree list with no rounding mask.
+ *
+ * @param graph a graph.
+ */
+ public EliasFanoCumulativeOutdegreeList(final ImmutableGraph graph) {
+ this(graph, graph.numArcs());
+ }
+
+ /** Creates a cumulative outdegree list with no rounding mask.
+ *
+ * @param graph a graph.
+ * @param numArcs the number of arcs in the graph (this parameter can be useful as some {@link ImmutableGraph} implementations
+ * do not support {@link ImmutableGraph#numArcs()}).
+ */
+ public EliasFanoCumulativeOutdegreeList(final ImmutableGraph graph, final long numArcs) {
+ this(graph, numArcs, 0);
+ }
+
+ /** Creates a cumulative outdegree list with specified rounding mask.
+ *
+ * @param graph a graph.
+ * @param numArcs the number of arcs in the graph (this parameter can be useful as some {@link ImmutableGraph} implementations
+ * do not support {@link ImmutableGraph#numArcs()}).
+ * @param roundingMask a number of the form 2<sup><var>k</var></sup> &minus; 1. After each call to {@link #skipTo(long)},
+ * {@link #currentIndex()} is guaranteed to return a multiple of 2<sup><var>k</var></sup>, unless {@link #currentIndex()} is
+ * equal to the number of nodes in {@code graph}.
+ */
+ public EliasFanoCumulativeOutdegreeList(final ImmutableGraph graph, final long numArcs, final int roundingMask) {
+ if (roundingMask + 1 != Integer.highestOneBit(roundingMask + 1)) throw new IllegalArgumentException("Illegal rounding mask: " + roundingMask);
+ this.roundingMask = roundingMask;
+ final long length = numNodes = graph.numNodes();
+ final long upperBound = numArcs;
+ l = length == 0 ? 0 : Math.max(0, Fast.mostSignificantBit(upperBound / length));
+ final long lowerBitsMask = (1L << l) - 1;
+ final LongBigList lowerBitsList = LongArrayBitVector.getInstance().asLongBigList(l);
+ lowerBitsList.size(length);
+ final BitVector upperBitsVector = LongArrayBitVector.getInstance().length(length + (upperBound >>> l) + 1);
+ long v = 0;
+ for(int i = 0; i < length; i++) {
+ v += graph.outdegree(i);
+ if (v > upperBound) throw new IllegalArgumentException("Too large value: " + v + " > " + upperBound);
+ if (l != 0) lowerBitsList.set(i, v & lowerBitsMask);
+ upperBitsVector.set((v >>> l) + i);
+ }
+
+ lowerBits = lowerBitsList;
+ upperBits = upperBitsVector.bits();
+ simpleSelectZero = new SimpleSelectZero(upperBitsVector);
+ currentIndex = -1;
+ }
+
+ private long getNextUpperBits() {
+ assert currentIndex < numNodes;
+ while(window == 0) window = upperBits[++curr];
+ final long upperBits = curr * (long)Long.SIZE + Long.numberOfTrailingZeros(window) - currentIndex++;
+ window &= window - 1;
+ return upperBits;
+ }
+
+ /** Returns the index realizing the last value returned by {@link #skipTo(long)}, that is,
+ * an index <var>x</var> such that the sum of the outdegrees of the nodes of index (strictly) smaller
+ * than <var>x</var> is equal to the last value returned by {@link #skipTo(long)}.
+ *
+ * @return the index of the node realizing the last value returned by {@link #skipTo(long)}, or -1 if {@link #skipTo(long)} has never been called.
+ */
+ public int currentIndex() {
+ return currentIndex;
+ }
+
+ /** Returns the first value of the cumulative function of outdegrees that is larger than or equal to the provided bound and
+ * that respects the rounding mask provided at construction time.
+ *
+ * @param lowerBound a lower bound on the returned value.
+ * @return the first value of the cumulative function of outdegrees that is larger than or equal to {@code lowerBound} and
+ * that respects the rounding mask provided at construction time.
+ */
+
+ public long skipTo(final long lowerBound) {
+ final long zeroesToSkip = (lowerBound >>> l) - 1;
+ final long position = zeroesToSkip == -1 ? 0 : simpleSelectZero.selectZero(zeroesToSkip);
+ window = upperBits[curr = (int)(position / Long.SIZE)];
+ window &= -1L << (position % Long.SIZE);
+ assert zeroesToSkip == -1 || position - zeroesToSkip <= Integer.MAX_VALUE : position - zeroesToSkip;
+ currentIndex = (int)(zeroesToSkip == -1 ? 0 : position - zeroesToSkip);
+
+ for(;;) {
+ final long lower = lowerBits.getLong(currentIndex);
+ final long last = getNextUpperBits() << l | lower;
+ if (last >= lowerBound && (currentIndex & roundingMask) == 0 || currentIndex == numNodes) return last;
+ }
+ }
+}
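To make the contract of `skipTo(long)` and of the rounding mask concrete, here is an uncompressed sketch that stores the prefix sums in a plain `long[]` and answers queries by binary search, in place of the Elias–Fano upper/lower bits and zero selection used by the class above. It trades the quasi-succinct space bound for simplicity and assumes outdegrees are given as an `int[]`; `CumulativeOutdegreeSketch` and `batches` are hypothetical names, and the batching loop is only one plausible way to obtain the arc-balanced job assignment motivated in the class documentation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical uncompressed stand-in for EliasFanoCumulativeOutdegreeList: prefix sums in a long[]. */
class CumulativeOutdegreeSketch {
    private final long[] prefix; // prefix[x] = sum of the outdegrees of nodes 0..x-1, for x = 0..n
    private final int roundingMask; // must be of the form 2^k - 1
    private int currentIndex = -1;

    CumulativeOutdegreeSketch(int[] outdegree, int roundingMask) {
        if (Integer.bitCount(roundingMask + 1) != 1) throw new IllegalArgumentException("Illegal rounding mask: " + roundingMask);
        this.roundingMask = roundingMask;
        prefix = new long[outdegree.length + 1];
        for (int i = 0; i < outdegree.length; i++) prefix[i + 1] = prefix[i] + outdegree[i];
    }

    int currentIndex() { return currentIndex; }

    /** Returns prefix[x] for the smallest x >= 1 that is a multiple of roundingMask + 1 and has
     *  prefix[x] >= lowerBound, or for x = n if there is no such x; x becomes the current index. */
    long skipTo(long lowerBound) {
        int n = prefix.length - 1;
        int lo = 1, hi = n;
        while (lo < hi) { // smallest x in [1, n] with prefix[x] >= lowerBound (n if none)
            int mid = (lo + hi) >>> 1;
            if (prefix[mid] >= lowerBound) hi = mid;
            else lo = mid + 1;
        }
        // Round up to the next multiple of roundingMask + 1, but never past n.
        currentIndex = (int) Math.min(n, ((long) lo + roundingMask) & ~(long) roundingMask);
        return prefix[currentIndex];
    }

    /** Splits [0, n) into consecutive node ranges holding roughly arcsPerBatch arcs each. */
    List<int[]> batches(long arcsPerBatch) {
        if (arcsPerBatch <= 0) throw new IllegalArgumentException("arcsPerBatch must be positive");
        List<int[]> result = new ArrayList<>();
        int n = prefix.length - 1;
        for (int start = 0; start < n; ) {
            skipTo(prefix[start] + arcsPerBatch);
            int end = currentIndex(); // end > start because arcsPerBatch > 0
            result.add(new int[] { start, end });
            start = end;
        }
        return result;
    }

    public static void main(String[] args) {
        // Node 4 has a much larger outdegree than the others.
        CumulativeOutdegreeSketch list = new CumulativeOutdegreeSketch(new int[] { 5, 1, 1, 1, 10, 2 }, 0);
        for (int[] b : list.batches(6)) System.out.println("[" + b[0] + ", " + b[1] + ")");
    }
}
```

With outdegrees {5, 1, 1, 1, 10, 2} and batches of 6 arcs, `main` prints the ranges [0, 2), [2, 5) and [5, 6). As with the original class, `skipTo` never reports index 0 and falls back to the number of nodes when the bound cannot be met.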
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/FourSweepIterativeFringeDiameter.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/FourSweepIterativeFringeDiameter.java
new file mode 100644
index 0000000..35dceca
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/FourSweepIterativeFringeDiameter.java
@@ -0,0 +1,285 @@
+package it.unimi.dsi.webgraph.algo;
+
+import java.io.IOException;
+import java.lang.reflect.InvocationTargetException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicIntegerArray;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/*
+ * Copyright (C) 2011-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.Util;
+import it.unimi.dsi.fastutil.ints.IntArrayList;
+import it.unimi.dsi.lang.ObjectParser;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.util.XoRoShiRo128PlusRandom;
+import it.unimi.dsi.webgraph.GraphClassParser;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph.LoadMethod;
+
+/** Computes the diameter of a <em>symmetric</em> (a.k.a&#46; <em>undirected</em>) graph.
+ *
+ * <p>This class implements a variant of the heuristic algorithm proposed by Pierluigi Crescenzi, Roberto Grossi, Michel Habib,
+ * Leonardo Lanzi and Andrea Marino in &ldquo;On computing the diameter of real-world undirected graphs&rdquo;, presented
+ * at the <i>Workshop on Graph Algorithms and Applications</i> (Zurich, July 3, 2011), which extends
+ * the double-sweep heuristic for bounding the diameter suggested by Cl&eacute;mence Magnien,
+ * Matthieu Latapy and Michel Habib in &ldquo;Fast computation of empirically tight bounds for the diameter of massive graphs&rdquo;,
+ * <i>J. Exp. Algorithmics</i>, 13:1.10:1&minus;1.10:9, ACM, 2009.
+ *
+ * <p>To understand why the following algorithm works, recall that the <em>eccentricity</em> of a node <var>x</var> is the
+ * maximum distance <i>d</i>(<var>x</var>, <var>y</var>). The minimum eccentricity over all nodes is called the <em>radius</em> of the graph, and
+ * a node with minimum eccentricity is called a <em>center</em>. The diameter is just the maximum eccentricity, so
+ * the diameter is bounded by twice the radius (but it might not be equal: a line with an even number of nodes is a counterexample).
+ * The following two observations are obvious:
+ * <ul>
+ * <li>the eccentricity of a node is a lower bound for the diameter;
+ * <li>given a node <var>x</var> and an integer <var>h</var>, 2<var>h</var> maximised with the
+ * eccentricities of all nodes at distance greater than <var>h</var> from <var>x</var> is an
+ * upper bound for the diameter.
+ * </ul>
+ *
+ * <p>The <em>double-sweep</em> algorithm is the standard algorithm to compute the diameter of a tree:
+ * we take a random node and use a breadth-first visit to locate a
+ * farthest node <var>x</var>. Then, we perform a second breadth-first visit, computing the
+ * eccentricity of <var>x</var>, which turns out to be the diameter of the tree.
+ * When applied to a general graph, the double-sweep algorithm provides a good lower bound (in general, whenever we perform
+ * a breadth-first visit we use the resulting eccentricity to improve the current lower bound).
+ * With some (usually few) additional visits, the <em>iterative
+ * fringe</em> algorithm often makes it possible to make the bounds match.
+ *
+ * <p>More precisely, after the second visit we find a node <var>c</var> that is
+ * halfway between <var>x</var> and a node farthest from <var>x</var>. The
+ * node <var>c</var> is a tentative center of the graph,
+ * and it certainly is if the graph is a tree.
+ *
+ * <p>We then perform a breadth-first visit from <var>c</var> and compute its eccentricity <var>h</var>, obtaining an upper bound
+ * 2<var>h</var> for the diameter.
+ *
+ * <p>In case our upper bound does not match the lower bound, we compute the eccentricities of the <em>fringe</em>, that is, the set
+ * of nodes at distance <var>h</var> from <var>c</var>, by performing a breadth-first visit from each node in the fringe. At each
+ * eccentricity computed, we update our lower bound, and stop if it matches our current upper bound. Finally, when the fringe is exhausted,
+ * assuming <var>M</var> is the maximum of the eccentricities computed, max(2(<var>h</var>&nbsp;&minus;&nbsp;1),&nbsp;<var>M</var>)
+ * is an improved upper bound for the diameter. We iterate the procedure with the new fringe
+ * (nodes at distance <var>h</var>&nbsp;&minus;&nbsp;1), and so on, until the lower and upper bounds do match.
+ *
+ * <p>The description above is a bit simplified: after finding <var>c</var>, we actually
+ * do a double sweep again starting from <var>c</var> and update <var>c</var> accordingly. This
+ * four-sweep procedure often improves the quality (e.g., reduces the eccentricity) of <var>c</var>.
+ *
+ * <h2>Performance issues</h2>
+ *
+ * <p>This class uses an instance of {@link ParallelBreadthFirstVisit} to ensure a high degree of parallelism (see its
+ * documentation for memory requirements).
+ *
+ * @deprecated Superseded by {@link SumSweepDirectedDiameterRadius}/{@link SumSweepUndirectedDiameterRadius}.
+ */
+
+@Deprecated
+public class FourSweepIterativeFringeDiameter {
+ private static final Logger LOGGER = LoggerFactory.getLogger(FourSweepIterativeFringeDiameter.class);
+
+ /** Checks that we are always visiting the same component of the same size and possibly logs a warning or throws an exception.
+ *
+ * @param visit the current visit.
+ * @param componentSize the size of the visited component, or -1 if unknown.
+ * @return the size of the visited component.
+ */
+
+ private static int componentSize(final ParallelBreadthFirstVisit visit, int componentSize) {
+ if (visit.queue.size() != visit.graph.numNodes()) {
+ if (componentSize == -1) {
+ componentSize = visit.queue.size();
+ LOGGER.warn("The graph is not connected: computing the diameter of a component of " + componentSize + " < " + visit.graph.numNodes() + " nodes");
+ }
+ else if (componentSize != visit.queue.size()) throw new IllegalStateException("Queue size (" + visit.queue.size() + ") is different from component size (" + componentSize + "): maybe the graph is not symmetric.");
+ }
+
+ return componentSize;
+ }
+
+ /** Computes the diameter of a symmetric graph.
+ *
+ * @param symGraph a symmetric graph.
+ * @param threads the requested number of threads (0 for {@link Runtime#availableProcessors()}).
+ * @param pl a progress logger, or <code>null</code>.
+ * @param seed a seed for generating random starting points.
+ * @return the diameter.
+ */
+ public static int run(final ImmutableGraph symGraph, final int threads, final ProgressLogger pl, final long seed) {
+ final ParallelBreadthFirstVisit visit = new ParallelBreadthFirstVisit(symGraph, threads, true, pl);
+ final AtomicIntegerArray parent = visit.marker;
+ final XoRoShiRo128PlusRandom random = new XoRoShiRo128PlusRandom(seed);
+ final int n = symGraph.numNodes();
+ int lowerBound = 0, upperBound = n - 1, componentSize = -1;
+
+ while(lowerBound < upperBound) {
+ if (pl != null) pl.logger().info("New round of bound refinement... [" + lowerBound + ".." + upperBound + "]");
+
+ // After the first iteration, we pick a node from the visit queue
+ visit.clear();
+ visit.visit(visit.queue.isEmpty() ? random.nextInt(n) : visit.queue.getInt(random.nextInt(visit.queue.size())), componentSize);
+ int border = visit.nodeAtMaxDistance();
+ componentSize = componentSize(visit, componentSize);
+ lowerBound = Math.max(visit.maxDistance(), lowerBound);
+ upperBound = Math.min(upperBound, 2 * visit.maxDistance());
+
+ if (pl != null) pl.logger().info("After visit from random node: [" + lowerBound + ".." + upperBound + "]");
+ if (lowerBound == upperBound) break;
+
+ visit.clear();
+ visit.visit(border, componentSize);
+ border = visit.nodeAtMaxDistance();
+ componentSize = componentSize(visit, componentSize);
+ lowerBound = Math.max(visit.maxDistance(), lowerBound);
+ upperBound = Math.min(upperBound, 2 * visit.maxDistance());
+
+ if (pl != null) pl.logger().info("After first double sweep: [" + lowerBound + ".." + upperBound + "]");
+ if (lowerBound == upperBound) break;
+
+ // Find first tentative center of the graph (certainly the center if it is a tree).
+ int center = border;
+ for(int i = visit.maxDistance() / 2; i-- != 0;) center = parent.get(center);
+
+ // We now visit from the tentative center.
+ visit.clear();
+ visit.visit(center, componentSize);
+ border = visit.nodeAtMaxDistance();
+ componentSize = componentSize(visit, componentSize);
+ lowerBound = Math.max(visit.maxDistance(), lowerBound);
+ upperBound = Math.min(upperBound, 2 * visit.maxDistance());
+
+ if (pl != null) pl.logger().info("After visit from first tentative center (node " + center + "): [" + lowerBound + ".." + upperBound + "]");
+ if (lowerBound == upperBound) break;
+
+ // Last sweep
+ visit.clear();
+ visit.visit(border);
+ border = visit.nodeAtMaxDistance();
+ componentSize = componentSize(visit, componentSize);
+ lowerBound = Math.max(visit.maxDistance(), lowerBound);
+ upperBound = Math.min(upperBound, 2 * visit.maxDistance());
+
+ if (pl != null) pl.logger().info("After second double sweep: [" + lowerBound + ".." + upperBound + "]");
+ if (lowerBound == upperBound) break;
+
+ // Find new (and hopefully improved) center.
+ center = border;
+ for(int i = visit.maxDistance() / 2; i-- != 0;) center = parent.get(center);
+
+ // We now visit from the new center.
+ visit.clear();
+ visit.visit(center, componentSize);
+ componentSize = componentSize(visit, componentSize);
+ lowerBound = Math.max(visit.maxDistance(), lowerBound);
+ upperBound = Math.min(upperBound, 2 * visit.maxDistance());
+
+ if (pl != null) pl.logger().info("After visit from new center (node " + center + "): [" + lowerBound + ".." + upperBound + "]");
+ if (lowerBound == upperBound) break;
+
+ // Copy cutpoints and queue as they are needed to visit incrementally the fringe (this stuff could go on disk, actually).
+ final IntArrayList cutPoints = visit.cutPoints.clone();
+ final IntArrayList queue = visit.queue.clone();
+
+ final ProgressLogger globalProgressLogger = pl == null ? null : new ProgressLogger(pl.logger(), pl.logInterval, TimeUnit.MILLISECONDS, "visits");
+ if (pl != null) {
+ pl.logger().debug("Cutpoints: " + cutPoints);
+ globalProgressLogger.start("Starting visits...");
+ }
+
+ /* We now incrementally remove nodes at decreasing distance d from the center,
+ * keeping track of the maximum eccentricity maxEcc of the removed nodes.
+ * max(maxEcc, 2(d - 1)) is obviously an upper bound for the diameter. */
+ int maxEcc = 0;
+ for(int d = visit.maxDistance(); d > 0 && lowerBound < upperBound; d--) {
+ if (pl != null) {
+ globalProgressLogger.expectedUpdates = pl.count + cutPoints.getInt(d + 1) - cutPoints.getInt(lowerBound / 2 + 1);
+ pl.logger().info("Examining " + (cutPoints.getInt(d + 1) - cutPoints.getInt(d)) + " nodes at distance " + d + " (at most " + globalProgressLogger.expectedUpdates + " visits to go)...");
+ }
+ for(int pos = cutPoints.getInt(d); pos < cutPoints.getInt(d + 1); pos++) {
+ final int x = queue.getInt(pos);
+ visit.clear();
+ visit.visit(x);
+ componentSize = componentSize(visit, componentSize);
+ maxEcc = Math.max(maxEcc, visit.maxDistance());
+ lowerBound = Math.max(lowerBound, maxEcc);
+ if (lowerBound == upperBound) return lowerBound;
+ }
+
+ upperBound = Math.max(maxEcc, 2 * (d - 1));
+ if (pl != null) {
+ globalProgressLogger.updateAndDisplay(cutPoints.getInt(d + 1) - cutPoints.getInt(d));
+ pl.logger().info("After enlarging fringe: [" + lowerBound + ".." + upperBound + "]");
+ }
+ }
+
+ if (globalProgressLogger != null) globalProgressLogger.done();
+ }
+ return lowerBound;
+ }
+
+ static public void main(String arg[]) throws IllegalArgumentException, SecurityException, IllegalAccessException, InvocationTargetException, NoSuchMethodException, JSAPException, IOException, ClassNotFoundException, InstantiationException {
+ final SimpleJSAP jsap = new SimpleJSAP(FourSweepIterativeFringeDiameter.class.getName(), "Computes the diameter of a symmetric graph using Magnien-Latapy-Habib's technique.",
+ new Parameter[] {
+ new FlaggedOption("graphClass", GraphClassParser.getParser(), null, JSAP.NOT_REQUIRED, 'g', "graph-class", "Forces a Java class for the source graph."),
+ new Switch("spec", 's', "spec", "The basename is rather a specification of the form <ImmutableGraphImplementation>(arg,arg,...)."),
+ new Switch("mapped", 'm', "mapped", "Do not load the graph in main memory, but rather memory-map it."),
+ new FlaggedOption("logInterval", JSAP.LONG_PARSER, Long.toString(ProgressLogger.DEFAULT_LOG_INTERVAL), JSAP.NOT_REQUIRED, 'l', "log-interval", "The minimum time interval between activity logs in milliseconds."),
+ new FlaggedOption("threads", JSAP.INTSIZE_PARSER, "0", JSAP.NOT_REQUIRED, 'T', "threads", "The number of threads to be used. If 0, the number will be estimated automatically."),
+ new UnflaggedOption("basename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the graph."),
+ }
+ );
+
+ final JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final ProgressLogger pl = new ProgressLogger(LOGGER, jsapResult.getLong("logInterval"), TimeUnit.MILLISECONDS);
+ final String basename = jsapResult.getString("basename");
+ final Class<?> graphClass = jsapResult.getClass("graphClass");
+ final boolean spec = jsapResult.getBoolean("spec");
+ final boolean mapped = jsapResult.getBoolean("mapped");
+ final int threads = jsapResult.getInt("threads");
+ final ImmutableGraph graph;
+
+ if (graphClass != null) {
+ if (spec) {
+ System.err.println("Options --graph-class and --spec are incompatible");
+ System.exit(1);
+ return; // Just to avoid spurious errors about graph not being initialised.
+ }
+ else graph = (ImmutableGraph)graphClass.getMethod(mapped ? LoadMethod.MAPPED.toMethod() : LoadMethod.STANDARD.toMethod(), CharSequence.class).invoke(null, basename);
+ }
+ else {
+ if (!spec) graph = mapped ? ImmutableGraph.loadMapped(basename, pl) : ImmutableGraph.load(basename, pl);
+ else graph = ObjectParser.fromSpec(basename, ImmutableGraph.class, GraphClassParser.PACKAGE);
+ }
+
+ System.out.println(run(graph, threads, pl, Util.randomSeed()));
+ }
+}
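The bounds described in the class documentation can be checked on small graphs with a sequential sketch. It assumes a connected symmetric graph with at least one node, given as `int[][]` adjacency lists, and implements only the double sweep, the tentative center and the fringe iteration; it omits the second double sweep of the four-sweep refinement and the parallel visits. `FringeDiameterSketch` is a hypothetical name, not WebGraph API.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/** Hypothetical sequential sketch of the double-sweep / iterative-fringe diameter bounds
 *  for a connected symmetric graph given as adjacency lists. */
class FringeDiameterSketch {
    /** Breadth-first visit from s: fills dist and parent, returns the eccentricity of s. */
    static int bfs(int[][] adj, int s, int[] dist, int[] parent) {
        Arrays.fill(dist, -1);
        dist[s] = 0;
        parent[s] = s;
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        queue.add(s);
        int ecc = 0;
        while (!queue.isEmpty()) {
            int x = queue.poll();
            ecc = dist[x]; // nodes leave the queue in nondecreasing distance order
            for (int y : adj[x]) {
                if (dist[y] == -1) {
                    dist[y] = dist[x] + 1;
                    parent[y] = x;
                    queue.add(y);
                }
            }
        }
        return ecc;
    }

    /** Returns a node at maximum distance according to dist. */
    static int farthest(int[] dist) {
        int best = 0;
        for (int i = 1; i < dist.length; i++) if (dist[i] > dist[best]) best = i;
        return best;
    }

    static int diameter(int[][] adj, int start) {
        int n = adj.length;
        int[] dist = new int[n], parent = new int[n];
        // Double sweep: x is farthest from start, and ecc(x) is a lower bound.
        bfs(adj, start, dist, parent);
        int x = farthest(dist);
        int lowerBound = bfs(adj, x, dist, parent);
        // Tentative center c: halfway along the BFS path from a node farthest from x back to x.
        int c = farthest(dist);
        for (int i = lowerBound / 2; i-- != 0;) c = parent[c];
        int h = bfs(adj, c, dist, parent);
        int upperBound = 2 * h;
        lowerBound = Math.max(lowerBound, h);
        int[] distFromCenter = dist.clone();
        int[] scratchDist = new int[n], scratchParent = new int[n];
        int maxEcc = 0;
        // Fringe iteration: eccentricities of the nodes at distance d from c, for d = h, h - 1, ...
        for (int d = h; d > 0 && lowerBound < upperBound; d--) {
            for (int v = 0; v < n; v++) {
                if (distFromCenter[v] != d) continue;
                maxEcc = Math.max(maxEcc, bfs(adj, v, scratchDist, scratchParent));
                lowerBound = Math.max(lowerBound, maxEcc);
                if (lowerBound == upperBound) return lowerBound;
            }
            // Two nodes within distance d - 1 of c are at most 2(d - 1) apart.
            upperBound = Math.max(maxEcc, 2 * (d - 1));
        }
        return lowerBound;
    }

    public static void main(String[] args) {
        int[][] cycle = { {1, 5}, {0, 2}, {1, 3}, {2, 4}, {3, 5}, {4, 0} };
        System.out.println(diameter(cycle, 0)); // prints 3
    }
}
```

On the 6-node cycle the double sweep already gives the lower bound 3, while the center gives the upper bound 6; the fringe loop lowers it to 4 and then to 3, at which point the bounds match.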
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/GeometricCentralities.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/GeometricCentralities.java
new file mode 100644
index 0000000..a2ab920
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/GeometricCentralities.java
@@ -0,0 +1,281 @@
+package it.unimi.dsi.webgraph.algo;
+
+/*
+ * Copyright (C) 2013-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.ints.IntArrayFIFOQueue;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.LazyIntIterator;
+import it.unimi.dsi.webgraph.algo.TopKGeometricCentrality.Centrality;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.concurrent.Callable;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ExecutorCompletionService;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.atomic.AtomicInteger;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/** Computes exactly a set of <em>positive</em> geometric centralities (more precisely, closeness, Lin's, harmonic and exponential centrality)
+ * and the number of reachable nodes using multiple parallel breadth-first visits.
+ * Terminal nodes will have closeness centrality equal to zero and Lin's centrality equal to one.
+ * A survey about geometric centralities can be found in
+ * &ldquo;<a href="http://vigna.di.unimi.it/BoVAC">Axioms for centrality</a>&rdquo;,
+ * by Paolo Boldi and Sebastiano Vigna, <i>Internet Math.</i>, 10(3-4):222&minus;262, 2014. Explicit
+ * definitions can be found {@linkplain Centrality here}.
+ *
+ * <p>Note that usually one is interested in the <em>negative</em> version of a centrality measure, that is, the version
+ * that depends on the <em>incoming</em> arcs. This class can compute only <em>positive</em> centralities: if you are
+ * interested (as it usually happens) in the negative version, you must pass to this class the <em>transpose</em> of the graph.
+ *
+ * <p>Every visit is independent and is carried out by a separate thread. The only contention point
+ * is the assignment of the next node to visit, which is negligible. The downside is
+ * that running on <var>k</var> cores requires approximately <var>k</var> times the memory of the
+ * sequential algorithm, as only the graph and the centrality arrays will be shared.
+ *
+ * <p>To use this class you first create an instance, and then invoke {@link #compute()}.
+ * After that, you can peek at the fields {@link #closeness}, {@link #lin}, {@link #harmonic}, {@link #exponential} and {@link #reachable}.
+ */
+
+public class GeometricCentralities {
+ private final static Logger LOGGER = LoggerFactory.getLogger(GeometricCentralities.class);
+ /** The default value for {@link #alpha}. */
+ public static final double DEFAULT_ALPHA = 0.5;
+
+ /** The graph under examination. */
+ private final ImmutableGraph graph;
+ /** Harmonic centrality. */
+ public final double[] harmonic;
+ /** Closeness centrality. */
+ public final double[] closeness;
+ /** Lin's centrality. */
+ public final double[] lin;
+ /** Exponential centrality. */
+ public final double[] exponential;
+ /** The &alpha; parameter for exponential centrality: you can modify this field before calling {@link #compute()} (its default value is {@value #DEFAULT_ALPHA}). */
+ public double alpha;
+ /** Number of reachable nodes. */
+ public final long[] reachable;
+ /** The global progress logger. */
+ private final ProgressLogger pl;
+ /** The number of threads. */
+ private final int numberOfThreads;
+ /** The next node to be visited. */
+ protected final AtomicInteger nextNode;
+ /** Whether to stop abruptly the visiting process. */
+ protected volatile boolean stop;
+
+ /** Creates a new class for computing positive geometric centralities and the number of reachable nodes.
+ *
+ * @param graph a graph.
+ * @param requestedThreads the requested number of threads (0 for {@link Runtime#availableProcessors()}).
+ * @param pl a progress logger, or {@code null}.
+ */
+ public GeometricCentralities(final ImmutableGraph graph, final int requestedThreads, final ProgressLogger pl) {
+ this.pl = pl;
+ this.graph = graph;
+ this.harmonic = new double[graph.numNodes()];
+ this.closeness = new double[graph.numNodes()];
+ this.reachable = new long[graph.numNodes()];
+ this.exponential = new double[graph.numNodes()];
+ this.alpha = DEFAULT_ALPHA;
+ this.lin = new double[graph.numNodes()];
+ this.nextNode = new AtomicInteger();
+ numberOfThreads = requestedThreads != 0 ? requestedThreads : Runtime.getRuntime().availableProcessors();
+ }
+
+ /** Creates a new class for computing positive geometric centralities and the number of reachable nodes, using as many threads as
+ * the number of available processors.
+ *
+ * @param graph a graph.
+ * @param pl a progress logger, or {@code null}.
+ */
+ public GeometricCentralities(final ImmutableGraph graph, final ProgressLogger pl) {
+ this(graph, 0, pl);
+ }
+
+ /** Creates a new class for computing positive geometric centralities and the number of reachable nodes.
+ *
+ * @param graph a graph.
+ * @param requestedThreads the requested number of threads (0 for {@link Runtime#availableProcessors()}).
+ */
+ public GeometricCentralities(final ImmutableGraph graph, final int requestedThreads) {
+ this(graph, requestedThreads, null);
+ }
+
+ /** Creates a new class for computing positive geometric centralities and the number of reachable nodes, using as many threads as
+ * the number of available processors.
+ *
+ * @param graph a graph.
+ */
+ public GeometricCentralities(final ImmutableGraph graph) {
+ this(graph, 0);
+ }
+
+ private final class IterationThread implements Callable<Void> {
+ /** The queue of visited nodes. */
+ private final IntArrayFIFOQueue queue;
+ /** The array containing the distance of each node from the current source (or -1 if the node has not yet been reached by the visit). */
+ private final int[] distance;
+
+ private IterationThread() {
+ this.distance = new int[graph.numNodes()];
+ this.queue = new IntArrayFIFOQueue();
+ }
+
+ @Override
+ public Void call() {
+ // We cache frequently used fields.
+ final int[] distance = this.distance;
+ final IntArrayFIFOQueue queue = this.queue;
+ final ImmutableGraph graph = GeometricCentralities.this.graph.copy();
+ final double base = GeometricCentralities.this.alpha;
+
+ for(;;) {
+ final int curr = nextNode.getAndIncrement();
+ if (GeometricCentralities.this.stop || curr >= graph.numNodes()) return null;
+ queue.clear();
+ queue.enqueue(curr);
+ Arrays.fill(distance, -1);
+ distance[curr] = 0;
+ int reachable = 0;
+
+ while(! queue.isEmpty()) {
+ final int node = queue.dequeueInt();
+ reachable++;
+ final int d = distance[node] + 1;
+ final double hd = 1. / d;
+ final double ed = Math.pow(base, d);
+ final LazyIntIterator successors = graph.successors(node);
+ for(int s; (s = successors.nextInt()) != -1;) {
+ if (distance[s] == -1) {
+ queue.enqueue(s);
+ distance[s] = d;
+ closeness[curr] += d;
+ harmonic[curr] += hd;
+ exponential[curr] += ed;
+ }
+ }
+ }
+
+ if (GeometricCentralities.this.pl != null)
+ synchronized (GeometricCentralities.this.pl) {
+ GeometricCentralities.this.pl.update();
+ }
+
+ if (closeness[curr] == 0) lin[curr] = 1; // Terminal node
+ else {
+ closeness[curr] = 1 / closeness[curr];
+ lin[curr] = (double)reachable * reachable * closeness[curr];
+ }
+
+ GeometricCentralities.this.reachable[curr] = reachable;
+ }
+ }
+ }
+
+
+ /** Computes geometric centralities and the number of reachable nodes.
+ * Results can be found in {@link GeometricCentralities#closeness}, {@link GeometricCentralities#lin},
+ * {@link GeometricCentralities#harmonic}, {@link GeometricCentralities#exponential} and {@link GeometricCentralities#reachable}. */
+ public void compute() throws InterruptedException {
+ final IterationThread[] thread = new IterationThread[numberOfThreads];
+ for(int i = 0; i < thread.length; i++) thread[i] = new IterationThread();
+
+ if (pl != null) {
+ pl.start("Starting visits...");
+ pl.expectedUpdates = graph.numNodes();
+ pl.itemsName = "nodes";
+ }
+
+ final ExecutorService executorService = Executors.newFixedThreadPool(numberOfThreads);
+ final ExecutorCompletionService<Void> executorCompletionService = new ExecutorCompletionService<>(executorService);
+
+ for(int i = thread.length; i-- != 0;) executorCompletionService.submit(thread[i]);
+
+ try {
+ for(int i = thread.length; i-- != 0;) executorCompletionService.take().get();
+ }
+ catch(ExecutionException e) {
+ stop = true;
+ Throwable cause = e.getCause();
+ throw cause instanceof RuntimeException ? (RuntimeException)cause : new RuntimeException(cause.getMessage(), cause);
+ }
+ finally {
+ executorService.shutdown();
+ }
+
+ if (pl != null) pl.done();
+ }
+
+
+ public static void main(final String[] arg) throws IOException, JSAPException, InterruptedException {
+
+ SimpleJSAP jsap = new SimpleJSAP(GeometricCentralities.class.getName(), "Computes positive centralities of a graph using multiple parallel breadth-first visits.\n\nPlease note that to compute negative centralities on directed graphs (which is usually what you want) you have to compute positive centralities on the transpose.",
+ new Parameter[] {
+ new Switch("expand", 'e', "expand", "Expand the graph to increase speed (no compression)."),
+ new Switch("mapped", 'm', "mapped", "Use loadMapped() to load the graph."),
+ new FlaggedOption("threads", JSAP.INTSIZE_PARSER, "0", JSAP.NOT_REQUIRED, 'T', "threads", "The number of threads to be used. If 0, the number will be estimated automatically."),
+ new UnflaggedOption("graphBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the graph."),
+ new UnflaggedOption("closenessFilename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The filename where closeness centrality scores (doubles in binary form) will be stored."),
+ new UnflaggedOption("linFilename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The filename where Lin's centrality scores (doubles in binary form) will be stored."),
+ new UnflaggedOption("harmonicFilename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The filename where harmonic centrality scores (doubles in binary form) will be stored."),
+ new UnflaggedOption("exponentialFilename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The filename where exponential centrality scores (doubles in binary form) will be stored."),
+ new UnflaggedOption("reachableFilename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The filename where the number of reachable nodes (longs in binary form) will be stored.")
+ }
+ );
+
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final boolean mapped = jsapResult.getBoolean("mapped", false);
+ final String graphBasename = jsapResult.getString("graphBasename");
+ final int threads = jsapResult.getInt("threads");
+ final ProgressLogger progressLogger = new ProgressLogger(LOGGER, "nodes");
+ progressLogger.displayFreeMemory = true;
+ progressLogger.displayLocalSpeed = true;
+
+ ImmutableGraph graph = mapped? ImmutableGraph.loadMapped(graphBasename, progressLogger) : ImmutableGraph.load(graphBasename, progressLogger);
+ if (jsapResult.userSpecified("expand")) graph = new ArrayListMutableGraph(graph).immutableView();
+
+ GeometricCentralities centralities = new GeometricCentralities(graph, threads, progressLogger);
+ centralities.compute();
+
+ BinIO.storeDoubles(centralities.closeness, jsapResult.getString("closenessFilename"));
+ BinIO.storeDoubles(centralities.lin, jsapResult.getString("linFilename"));
+ BinIO.storeDoubles(centralities.harmonic, jsapResult.getString("harmonicFilename"));
+ BinIO.storeDoubles(centralities.exponential, jsapResult.getString("exponentialFilename"));
+ BinIO.storeLongs(centralities.reachable, jsapResult.getString("reachableFilename"));
+ }
+}
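The exact computation performed by `GeometricCentralities` (one breadth-first visit per source, accumulating closeness, Lin's, harmonic and exponential scores, plus the number of reachable nodes) can be sketched without WebGraph. This is a minimal single-threaded illustration, assuming the graph is stored as an `int[][]` adjacency list; the class `BfsCentralitiesSketch` and its `transpose` helper are hypothetical names, not part of WebGraph. As in the class above, it computes *positive* centralities, so negative ones are obtained by running it on the transpose.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/** Hypothetical sketch: exact positive geometric centralities via one BFS per source (not WebGraph API). */
final class BfsCentralitiesSketch {
    final double[] closeness, lin, harmonic, exponential;
    final long[] reachable;

    /** @param successors successors[x] lists the successors of node x; @param alpha the exponential-centrality base. */
    BfsCentralitiesSketch(final int[][] successors, final double alpha) {
        final int n = successors.length;
        closeness = new double[n];
        lin = new double[n];
        harmonic = new double[n];
        exponential = new double[n];
        reachable = new long[n];
        final int[] distance = new int[n];
        final ArrayDeque<Integer> queue = new ArrayDeque<>();
        for (int source = 0; source < n; source++) {
            Arrays.fill(distance, -1); // -1 marks nodes not yet reached by this visit
            distance[source] = 0;
            queue.add(source);
            long sumOfDistances = 0;
            long count = 0;
            while (!queue.isEmpty()) {
                final int node = queue.poll();
                count++; // the source itself counts as reachable, as in GeometricCentralities
                final int d = distance[node] + 1;
                for (final int s : successors[node]) {
                    if (distance[s] == -1) {
                        distance[s] = d;
                        queue.add(s);
                        sumOfDistances += d;
                        harmonic[source] += 1.0 / d;
                        exponential[source] += Math.pow(alpha, d);
                    }
                }
            }
            reachable[source] = count;
            if (sumOfDistances == 0) lin[source] = 1; // terminal node: same convention as the class
            else {
                closeness[source] = 1.0 / sumOfDistances;
                lin[source] = (double) count * count / sumOfDistances; // reachable^2 / sum of distances
            }
        }
    }

    /** Reverses every arc; running the constructor on the result yields negative centralities. */
    static int[][] transpose(final int[][] successors) {
        final int n = successors.length;
        final int[] indegree = new int[n];
        for (final int[] succ : successors) for (final int s : succ) indegree[s]++;
        final int[][] t = new int[n][];
        for (int i = 0; i < n; i++) t[i] = new int[indegree[i]];
        final int[] fill = new int[n];
        for (int x = 0; x < n; x++) for (final int y : successors[x]) t[y][fill[y]++] = x;
        return t;
    }
}
```

On the path 0 &rarr; 1 &rarr; 2 with &alpha; = 0.5, node 0 reaches three nodes at total distance 3, so its harmonic score is 1 + 1/2 = 1.5, its exponential score is 0.5 + 0.25 = 0.75 and its Lin index is 3&sup2;/3 = 3.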
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/HyperBall.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/HyperBall.java
new file mode 100644
index 0000000..ba37eb7
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/HyperBall.java
@@ -0,0 +1,1514 @@
+package it.unimi.dsi.webgraph.algo;
+
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.NotSerializableException;
+import java.io.ObjectOutputStream;
+import java.io.PrintStream;
+import java.io.RandomAccessFile;
+import java.io.Serializable;
+import java.lang.reflect.InvocationTargetException;
+import java.math.BigDecimal;
+import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
+import java.util.Arrays;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.atomic.AtomicLong;
+import java.util.concurrent.locks.Condition;
+import java.util.concurrent.locks.ReentrantLock;
+
+import org.apache.commons.lang3.BooleanUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/*
+ * Copyright (C) 2010-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.Util;
+import it.unimi.dsi.bits.Fast;
+import it.unimi.dsi.bits.LongArrayBitVector;
+import it.unimi.dsi.fastutil.Hash;
+import it.unimi.dsi.fastutil.doubles.DoubleArrayList;
+import it.unimi.dsi.fastutil.doubles.DoubleIterator;
+import it.unimi.dsi.fastutil.ints.AbstractInt2DoubleFunction;
+import it.unimi.dsi.fastutil.ints.Int2DoubleFunction;
+import it.unimi.dsi.fastutil.ints.IntArrays;
+import it.unimi.dsi.fastutil.ints.IntOpenHashSet;
+import it.unimi.dsi.fastutil.ints.IntSet;
+import it.unimi.dsi.fastutil.ints.IntSets;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.io.FastBufferedOutputStream;
+import it.unimi.dsi.fastutil.longs.LongBigList;
+import it.unimi.dsi.io.SafelyCloseable;
+import it.unimi.dsi.lang.ObjectParser;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.util.HyperLogLogCounterArray;
+import it.unimi.dsi.util.KahanSummation;
+import it.unimi.dsi.util.XoRoShiRo128PlusRandomGenerator;
+import it.unimi.dsi.webgraph.BVGraph;
+import it.unimi.dsi.webgraph.GraphClassParser;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.LazyIntIterator;
+import it.unimi.dsi.webgraph.NodeIterator;
+import it.unimi.dsi.webgraph.Transform;
+
+/** <p>Computes an approximation of the neighbourhood function, of the size of the reachable sets,
+ * and of (discounted) positive geometric centralities of a graph using HyperBall.
+ *
+ * <p>HyperBall is an algorithm that uses dynamic programming to approximate
+ * the sizes of the balls of growing radius around the nodes of a graph. Starting from
+ * these data, it can approximate the <em>neighbourhood function</em> of a graph, that is, the function returning
+ * for each <var>t</var> the number of pairs of nodes at distance at most <var>t</var>,
+ * the number of nodes reachable from each node, Bavelas's closeness centrality, Lin's index, and
+ * <em>harmonic centrality</em> (studied by Paolo Boldi and Sebastiano Vigna in &ldquo;<a href ="http://vigna.di.unimi.it/papers.php#BoVAC">Axioms for Centrality</a>&rdquo;, <i>Internet Math.</i>,
+ * 10(3-4):222&minus;262, 2014).
+ * HyperBall can also compute <em>discounted centralities</em>, in which the <em>discount</em> assigned to a node is some
+ * specified function of its distance. All centralities are computed in their <em>positive</em> version (i.e.,
+ * using distance <em>from</em> the source: see below how to compute the more usual, and useful, <em>negative</em> version).
+ *
+ * <p>HyperBall has been described by Paolo Boldi and Sebastiano Vigna in
+ * &ldquo;In-Core Computation of Geometric Centralities with HyperBall: A Hundred Billion Nodes and Beyond&rdquo;,
+ * <i>Proc. of 2013 IEEE 13th International Conference on Data Mining Workshops (ICDMW 2013)</i>, IEEE, 2013,
+ * and it is a generalization of the method described in &ldquo;HyperANF: Approximating the Neighbourhood Function of Very Large Graphs
+ * on a Budget&rdquo;, by Paolo Boldi, Marco Rosa and Sebastiano Vigna,
+ * <i>Proceedings of the 20th international conference on World Wide Web</i>, pages 625&minus;634, ACM, (2011).
+ *
+ * <p>Incidentally, HyperBall (actually, HyperANF) has been used to show that Facebook has just <a href="http://vigna.dsi.unimi.it/papers.php#BBRFDS">four degrees of separation</a>.
+ *
+ * <p>At step <var>t</var>, for each node we (approximately) keep track (using {@linkplain HyperLogLogCounterArray HyperLogLog counters})
+ * of the set of nodes at distance at most <var>t</var>. At each iteration, the sets associated with the successors of each node are merged,
+ * thus obtaining the new sets. A crucial component in making this process efficient and scalable is the usage of
+ * <em>broadword programming</em> to implement the join (merge) phase, which requires maximising in parallel the list of registers associated with
+ * each successor (the implementation is geared towards 64-bit processors).
+ *
+ * <p>Using the approximate sets, for each <var>t</var> we estimate the number of pairs of nodes (<var>x</var>,<var>y</var>) such
+ * that the distance from <var>x</var> to <var>y</var> is at most <var>t</var>. Since during the computation we are also
+ * in possession of the number of nodes at distance <var>t</var> &minus; 1, we can also perform computations
+ * using the number of nodes at distance <em>exactly</em> <var>t</var> (e.g., centralities).
+ *
+ * <p>To use this class, you must first create an instance.
+ * Then, you call {@link #init()} (once) and then {@link #iterate()} as many times as needed (you can init/iterate several times, if you want).
+ * A {@linkplain #run(long, double) convenience method} will do everything for you.
+ * Finally, you <strong>must</strong> {@link #close()} the instance. The method {@link #modified()} will tell you whether the internal state of
+ * the algorithm has changed.
+ *
+ * <p>If you additionally pass to the constructor (or on the command line) the <em>transpose</em> of your graph (you can compute it using {@link Transform#transposeOffline(ImmutableGraph,int)}),
+ * when three quarters of the nodes stop changing their value
+ * HyperBall will switch to a <em>systolic</em> computation: using the transpose, when a node changes it will signal back
+ * to its predecessors that at the next iteration they could change. At the next scan, only the successors of
+ * signalled nodes will be scanned. In particular,
+ * when a very small number of nodes is modified by an iteration, HyperBall will switch to a systolic <em>local</em> mode,
+ * in which all information about modified nodes is kept in (traditional) dictionaries, rather than being represented as arrays of booleans.
+ * This strategy makes the last phases of the computation orders of magnitude faster, and in practice makes
+ * the running time of HyperBall proportional to the theoretical bound
+ * <i>O</i>(<var>m</var> log <var>n</var>), where <var>n</var>
+ * is the number of nodes and <var>m</var> is the number of the arcs of the graph. Note that
+ * graphs with a large diameter require a correspondingly large number of iterations, and these iterations will have to
+ * pass over all nodes if you do not provide the transpose.
+ *
+ * <p>Deciding when to stop iterating is a rather delicate issue. The only safe way is to iterate until {@link #modified()} is zero,
+ * and systolic (local) computation makes this goal easily attainable.
+ * However, in some cases one can assume that the graph is not pathological, and stop when the relative increment of the number of pairs goes below
+ * some threshold.
+ *
+ * <h2>Computing Centralities</h2>
+ *
+ * <p>Note that usually one is interested in the <em>negative</em> version of a centrality measure, that is, the version
+ * that depends on the <em>incoming</em> arcs. HyperBall can compute only <em>positive</em> centralities: if you are
+ * interested (as it usually happens) in the negative version, you must pass to HyperBall the <em>transpose</em> of the graph
+ * (and if you want to run in systolic mode, the original graph, which is the transpose of the transpose). Note that the
+ * neighbourhood function of the transpose is identical to the neighbourhood function of the original graph, so the exchange
+ * does not alter its computation.
+ *
+ * <h2>Node weights</h2>
+ *
+ * <p>To a certain extent, HyperBall can handle a notion of <em>node weight</em> in its computation of centralities. Weights must
+ * be nonnegative integers, and the initialization phase requires generating a random integer for each unit of overall
+ * weight, as weights are simulated by loading the counter of a node with multiple elements. Combining this feature
+ * with discounts, one can compute <em>discounted-gain centralities</em> as defined in the HyperBall paper.
+ *
+ * <h2>Configuring the JVM</h2>
+ *
+ * <p>HyperBall computations go against all basic assumptions of Java garbage collection. It is thus
+ * essential that you reconfigure your JVM properly. A good starting point is the following command line:
+ * <pre>
+ * java -server -Xss256K -Xms100G -XX:PretenureSizeThreshold=512M -XX:MaxNewSize=4G \
+ * -XX:+UseNUMA -XX:+UseTLAB -XX:+ResizeTLAB \
+ * -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=99 -XX:+UseCMSInitiatingOccupancyOnly \
+ * -verbose:gc -Xloggc:gc.log ...
+ * </pre>
+ *
+ * <ul>
+ * <li><code>-Xss256K</code> reduces the stack memory used by each thread.
+ * <li><code>-Xms100G</code> sizes the heap: the more memory, the more registers per counter
+ * you can use (the amount, of course, depends on your hardware); please note that we set the
+ * <em>starting</em> heap size because expanding a large heap is very expensive.
+ * <li><code>-XX:PretenureSizeThreshold=512M</code> forces the allocation of registers directly into the old generation.
+ * <li><code>-XX:MaxNewSize=4G</code> leaves almost all memory for the old generation.
+ * <li><code>-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=99 -XX:+UseCMSInitiatingOccupancyOnly</code>
+ * set the concurrent garbage collector, and impose that no concurrent collection cycle is started until 99% of the old
+ * generation is filled.
+ * <li><code>-XX:+UseNUMA -XX:+UseTLAB -XX:+ResizeTLAB</code> usually improve performance, but your mileage may vary.
+ * </ul>
+ * <p>Check the garbage collector logs (<code>gc.log</code>) to be sure that your
+ * minor and major collections are very infrequent (as they should be).
+ *
+ * <h2>Performance issues</h2>
+ *
+ * <p>To use HyperBall effectively, you should aim at filling a large percentage of the available core memory. This requires,
+ * of course, sizing the heap properly, but also configuring some parameters.
+ *
+ * <p>Most of the memory goes into storing HyperLogLog registers. By tuning the number of registers per counter, you can
+ * modify the memory allocated for them. The amount of memory is logged, and you should check that the number of registers you
+ * chose almost fills up the heap memory you allocated, possibly leaving space for the graph(s) (but read below).
+ * Note that you can only choose a number of registers per counter that is
+ * a power of two, so your latitude in adjusting the memory used for registers is somewhat limited.
+ *
+ * <p>If you have little memory, this class can perform <em>external</em> computations: instead of keeping in core memory
+ * an old and a new copy of the counters, it can dump on disk an <em>update list</em> containing pairs &lt;<var>node</var>,&nbsp;<var>counter</var>&gt;.
+ * At the end of an iteration, the update list is loaded and applied to the counters in memory.
+ * The process is of course slower, but the core memory used is halved.
+ *
+ * <p>Then, some memory is necessary to load the graph (and possibly its transpose). We suggest checking the offline
+ * option, which will map the graph into memory, rather than loading it. If you map the graph into memory, take care of
+ * leaving some free memory, beside that allocated for the heap, as the operating system will need some space to buffer
+ * the memory-mapped graph(s).
+ *
+ * <p>If there are several available cores, the runs of {@link #iterate()} will be <em>decomposed</em> into relatively
+ * small tasks (small blocks of nodes) and each task will be assigned to the first available core. Since all tasks are completely
+ * independent, this behaviour ensures a very high degree of parallelism. Be careful, however, because this feature requires a graph with
+ * a reasonably fast random access (e.g., in the case of a {@link BVGraph}, short reference chains), as many
+ * calls to {@link ImmutableGraph#nodeIterator(int)} will be made. The <em>granularity</em> of the decomposition
+ * is the number of nodes assigned to each task.
+ *
+ * <p>In any case, when attacking very large graphs (in particular, in external mode) some system tuning (e.g.,
+ * increasing the filesystem commit time) is a good idea. Also experimenting with granularity and buffer sizes
+ * can be useful. Smaller buffers reduce the waits on I/O calls, but increase the time spent in disk seeks.
+ * Large buffers improve I/O, but they use a lot of memory. The best possible setup is the one in which
+ * the cores are 100% busy during the graph scan, and the I/O time
+ * logged at the end of a scan is roughly equal to the time that is necessary to reload the counters from disk:
+ * in such a case, essentially, you are computing as fast as possible.
+ *
+ * @author Sebastiano Vigna
+ * @author Paolo Boldi
+ * @author Marco Rosa
+ */
+
+public class HyperBall extends HyperLogLogCounterArray implements SafelyCloseable {
+ private static final Logger LOGGER = LoggerFactory.getLogger(HyperBall.class);
+ public static final boolean ASSERTS = false;
+ private static final long serialVersionUID = 1L;
+
+ /** An abstract discount function is a facility to implement a discount function (so that only
+ * the {@link #get(int)} method must be actually implemented).
+ *
+ * <p>Note that by contract {@link #get(int)} will never be called with argument (i.e., distance) zero.
+ */
+ public static abstract class AbstractDiscountFunction extends AbstractInt2DoubleFunction {
+ private static final long serialVersionUID = 1L;
+ @Override
+ public int size() { return -1; }
+ @Override
+ public boolean containsKey(int key) { return true; }
+ }
+
+ public static Int2DoubleFunction INV_SQUARE_DISCOUNT = new AbstractDiscountFunction() {
+ private static final long serialVersionUID = 1L;
+ @Override
+ public double get(int distance) { return 1. / ((long)distance * distance); }
+ };
+
+ public static Int2DoubleFunction INV_LOG_DISCOUNT = new AbstractDiscountFunction() {
+ private static final long serialVersionUID = 1L;
+ @Override
+ public double get(int distance) { return 1 / Fast.log2(distance + 1); }
+ };
+
+ /** The default granularity of a task. */
+ public static final int DEFAULT_GRANULARITY = 16 * 1024;
+ /** The default size of a buffer in bytes. */
+ public static final int DEFAULT_BUFFER_SIZE = 4 * 1024 * 1024;
+ /** True if we have the transpose graph. */
+ protected final boolean gotTranspose;
+ /** An array of nonnegative node weights, or {@code null}. */
+ protected final int[] weight;
+ /** True if we started a systolic computation. */
+ protected boolean systolic;
+ /** True if we are preparing a local computation (we are {@link #systolic} and less than 1% of the nodes were modified). */
+ protected boolean preLocal;
+ /** True if we started a local computation. */
+ protected boolean local;
+ /** Whether the sum of distances from each node (inverse of <strong>positive</strong> closeness centrality) should be computed; if false, {@link #sumOfDistances} is <code>null</code>. */
+ protected final boolean doSumOfDistances;
+ /** Whether the sum of inverse distances from each node (<strong>positive</strong> harmonic centrality) should be computed; if false, {@link #sumOfInverseDistances} is <code>null</code>. */
+ protected boolean doSumOfInverseDistances;
+ /** The neighbourhood function, if requested. */
+ public final DoubleArrayList neighbourhoodFunction;
+ /** The sum of the distances from every given node, if requested. */
+ public final float[] sumOfDistances;
+ /** The sum of inverse distances from each given node, if requested. */
+ public final float[] sumOfInverseDistances;
+ /** A number of discounted centralities to be computed, possibly none. */
+ public final Int2DoubleFunction[] discountFunction;
+ /** The overall discounted centrality, for every {@link #discountFunction}. */
+ public final float[][] discountedCentrality;
+ /** The number of nodes of the graph, cached. */
+ protected final int numNodes;
+ /** The number of arcs of the graph, cached. */
+ protected long numArcs;
+ /** The square of {@link #numNodes}, cached. */
+ protected final double squareNumNodes;
+ /** The number of cores used in the computation. */
+ protected final int numberOfThreads;
+ /** The size of an I/O buffer, in counters. */
+ protected final int bufferSize;
+ /** The number of actually scanned nodes per task in a multithreaded environment. <strong>Must</strong> be a multiple of {@link Long#SIZE}. */
+ protected final int granularity;
+ /** The number of nodes per task (obtained by adapting {@link #granularity} to the current ratio of modified nodes). <strong>Must</strong> be a multiple of {@link Long#SIZE}. */
+ protected long adaptiveGranularity;
+ /** The value computed by the last iteration. */
+ protected double last;
+ /** The value computed by the current iteration. */
+ protected double current;
+ /** The current iteration. */
+ protected int iteration;
+ /** If {@link #external} is true, the name of the temporary file that will be used to write the update list. */
+ protected final File updateFile;
+ /** If {@link #external} is true, a file channel used to write to the update list. */
+ protected final FileChannel fileChannel;
+ /** If {@link #external} is true, the random-access file underlying {@link #fileChannel}. */
+ protected RandomAccessFile randomAccessFile;
+ /** The cumulative list of outdegrees. */
+ protected final EliasFanoCumulativeOutdegreeList cumulativeOutdegrees;
+ /** A progress logger, or <code>null</code>. */
+ protected final ProgressLogger pl;
+ /** The lock protecting all critical sections. */
+ protected final ReentrantLock lock;
+ /** A condition that is notified when all iteration threads are waiting to be started. */
+ protected final Condition allWaiting;
+ /** The condition on which all iteration threads wait before starting a new phase. */
+ protected final Condition start;
+ /** The current computation phase. */
+ public int phase;
+ /** Whether this approximator has been already closed. */
+ protected boolean closed;
+ /** The threads performing the computation. */
+ protected final IterationThread thread[];
+ /** An atomic integer keeping track of the number of nodes processed so far. */
+ protected final AtomicInteger nodes;
+ /** An atomic integer keeping track of the number of arcs processed so far. */
+ protected final AtomicLong arcs;
+ /** A variable used to wait for all threads to complete their iteration. */
+ protected volatile int aliveThreads;
+ /** True if the computation is over. */
+ protected volatile boolean completed;
+ /** Total number of write operations performed on {@link #fileChannel}. */
+ protected volatile long numberOfWrites;
+ /** Total wait time in milliseconds of I/O activity on {@link #fileChannel}. */
+ protected volatile long totalIoMillis;
+ /** The starting node of the next chunk of nodes to be processed. */
+ protected int nextNode;
+ /** The number of arcs before {@link #nextNode}. */
+ protected long nextArcs;
+ /** The number of counters modified by the last call to {@link #iterate()}. */
+ protected final AtomicInteger modified;
+ /** Counts the number of unwritten entries when {@link #external} is true, or
+ * the number of counters that did not change their value. */
+ protected final AtomicInteger unwritten;
+ /** The relative increment of the neighbourhood function for the last iteration. */
+ protected double relativeIncrement;
+ /** Whether we should use an update list on disk, instead of computing results in core memory. */
+ protected boolean external;
+ /** If {@link #external} is false, the arrays where results are stored. */
+ protected final long[][] resultBits;
+ /** If {@link #external} is false, {@link #registerSize}-bit views of {@link #resultBits}. */
+ protected final LongBigList resultRegisters[];
+ /** For each counter, whether it has changed its value. We use an array of boolean (instead of a {@link LongArrayBitVector}) just for access speed. */
+ protected boolean[] modifiedCounter;
+ /** For each newly computed counter, whether it has changed its value. {@link #modifiedCounter}
+ * will be updated with the content of this array by the end of the iteration. */
+ protected boolean[] modifiedResultCounter;
+ /** In systolic mode, for each counter, whether it must be checked at the next iteration. We use an array of boolean (instead of a {@link LongArrayBitVector}) just for access speed. */
+ protected boolean[] nextMustBeChecked;
+ /** In systolic mode, for each counter, whether it must be checked at the current iteration. */
+ protected boolean[] mustBeChecked;
+ /** If {@link #local} is true, the sorted list of nodes that should be scanned. */
+ protected int[] localCheckList;
+ /** If {@link #preLocal} is true, the set of nodes that should be scanned on the next iteration. Note that this set is synchronized. */
+ protected final IntSet localNextMustBeChecked;
+ /** One of the throwables thrown by some of the threads, if at least one thread has thrown a throwable. */
+ protected volatile Throwable threadThrowable;
+
+ protected final static int ensureRegisters(final int log2m) {
+ if (log2m < 4) throw new IllegalArgumentException("There must be at least 16 registers per counter");
+ if (log2m > 60) throw new IllegalArgumentException("There can be at most 2^60 registers per counter");
+ return log2m;
+ }
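+
+ /* Illustrative sketch (not part of HyperBall): log2m fixes the number of
+ * registers per counter, m = 2^log2m, and with it the precision of the
+ * estimates. The classic HyperLogLog analysis gives a relative standard
+ * deviation of roughly 1.04 / sqrt(m); the constant actually used by
+ * HyperLogLogCounterArray.relativeStandardDeviation() may differ slightly,
+ * so treat the figures below as approximations.
+ *
+ *   static double approxRelativeStandardDeviation(final int log2m) {
+ *     final int m = 1 << log2m; // registers per counter
+ *     return 1.04 / Math.sqrt(m);
+ *   }
+ *
+ * For instance, log2m = 4 (the minimum accepted by ensureRegisters()) gives
+ * m = 16 and about 26%, while log2m = 10 gives m = 1024 and about 3.25%:
+ * halving the error requires four times as many registers.
+ */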
+
+ /** Computes the number of threads.
+ *
+ * <p>If the specified number of threads is zero, {@link Runtime#availableProcessors()} will be returned.
+ *
+ * @param suggestedNumberOfThreads the suggested number of threads, or 0 for automatic sizing.
+ * @return the actual number of threads.
+ */
+ private final static int numberOfThreads(final int suggestedNumberOfThreads) {
+ if (suggestedNumberOfThreads != 0) return suggestedNumberOfThreads;
+ return Runtime.getRuntime().availableProcessors();
+ }
+
+ /** Creates a new HyperBall instance.
+ *
+ * @param g the graph whose neighbourhood function you want to compute.
+ * @param gt the transpose of <code>g</code> in case you want to perform systolic computations, or <code>null</code>.
+ * @param log2m the logarithm of the number of registers per counter.
+ * @param pl a progress logger, or <code>null</code>.
+ * @param numberOfThreads the number of threads to be used (0 for automatic sizing).
+ * @param bufferSize the size of an I/O buffer in bytes (0 for {@link #DEFAULT_BUFFER_SIZE}).
+ * @param granularity the number of nodes per task in a multicore environment (it will be rounded to the next multiple of 64), or 0 for {@link #DEFAULT_GRANULARITY}.
+ * @param external if true, results of an iteration will be stored on disk.
+ */
+ public HyperBall(final ImmutableGraph g, final ImmutableGraph gt, final int log2m, final ProgressLogger pl, final int numberOfThreads, final int bufferSize, final int granularity, final boolean external) throws IOException {
+ this(g, gt, log2m, pl, numberOfThreads, bufferSize, granularity, external, false, false, null, Util.randomSeed());
+ }
+
+ /** Creates a new HyperBall instance using default values.
+ *
+ * @param g the graph whose neighbourhood function you want to compute.
+ * @param gt the transpose of <code>g</code> in case you want to perform systolic computations, or <code>null</code>.
+ * @param log2m the logarithm of the number of registers per counter.
+ */
+ public HyperBall(final ImmutableGraph g, final ImmutableGraph gt, final int log2m) throws IOException {
+ this(g, gt, log2m, null, 0, 0, 0, false);
+ }
+
+ /** Creates a new HyperBall instance using default values.
+ *
+ * @param g the graph whose neighbourhood function you want to compute.
+ * @param gt the transpose of <code>g</code> in case you want to perform systolic computations, or <code>null</code>.
+ * @param log2m the logarithm of the number of registers per counter.
+ * @param pl a progress logger, or <code>null</code>.
+ */
+ public HyperBall(final ImmutableGraph g, final ImmutableGraph gt, final int log2m, final ProgressLogger pl) throws IOException {
+ this(g, gt, log2m, pl, 0, 0, 0, false);
+ }
+
+ /** Creates a new HyperBall instance using default values and disabling systolic computation.
+ *
+ * @param g the graph whose neighbourhood function you want to compute.
+ * @param log2m the logarithm of the number of registers per counter.
+ */
+ public HyperBall(final ImmutableGraph g, final int log2m) throws IOException {
+ this(g, null, log2m);
+ }
+
+ /** Creates a new HyperBall instance using default values and disabling systolic computation.
+ *
+ * @param g the graph whose neighbourhood function you want to compute.
+ * @param log2m the logarithm of the number of registers per counter.
+ * @param seed the random seed passed to {@link HyperLogLogCounterArray#HyperLogLogCounterArray(long, long, int, long)}.
+ */
+ public HyperBall(final ImmutableGraph g, final int log2m, final long seed) throws IOException {
+ this(g, null, log2m, null, 0, 0, 0, false, false, false, null, seed);
+ }
+
+ /** Creates a new HyperBall instance using default values and disabling systolic computation.
+ *
+ * @param g the graph whose neighbourhood function you want to compute.
+ * @param log2m the logarithm of the number of registers per counter.
+ * @param pl a progress logger, or <code>null</code>.
+ */
+ public HyperBall(final ImmutableGraph g, final int log2m, final ProgressLogger pl) throws IOException {
+ this(g, null, log2m, pl);
+ }
+
+ /** Creates a new HyperBall instance without weights.
+ *
+ * @param g the graph whose neighbourhood function you want to compute.
+ * @param gt the transpose of <code>g</code>, or <code>null</code>.
+ * @param log2m the logarithm of the number of registers per counter.
+ * @param pl a progress logger, or <code>null</code>.
+ * @param numberOfThreads the number of threads to be used (0 for automatic sizing).
+ * @param bufferSize the size of an I/O buffer in bytes (0 for {@link #DEFAULT_BUFFER_SIZE}).
+ * @param granularity the number of nodes per task in a multicore environment (it will be rounded to the next multiple of 64), or 0 for {@link #DEFAULT_GRANULARITY}.
+ * @param external if true, results of an iteration will be stored on disk.
+ * @param doSumOfDistances whether the sum of distances from each node should be computed.
+ * @param doSumOfInverseDistances whether the sum of inverse distances from each node should be computed.
+ * @param discountFunction an array (possibly <code>null</code>) of discount functions.
+ * @param seed the random seed passed to {@link HyperLogLogCounterArray#HyperLogLogCounterArray(long, long, int, long)}.
+ */
+ public HyperBall(final ImmutableGraph g, final ImmutableGraph gt, final int log2m, final ProgressLogger pl,
+ final int numberOfThreads, final int bufferSize, final int granularity, final boolean external,
+ final boolean doSumOfDistances, final boolean doSumOfInverseDistances, final Int2DoubleFunction[] discountFunction, final long seed) throws IOException {
+ this(g, gt, log2m, pl, numberOfThreads, bufferSize, granularity, external, doSumOfDistances, doSumOfInverseDistances, discountFunction, null, seed);
+ }
+
+ private static long totalWeight(final int numNodes, final int[] weight) {
+ if (weight == null) return numNodes;
+ if (weight.length != numNodes) throw new IllegalArgumentException("The weight array length (" + weight.length + ") and the number of nodes of the graph (" + numNodes + ") do not match");
+
+ long totalWeight = 0;
+ for(int w: weight) {
+ if (w < 0) throw new IllegalArgumentException("Negative weight: " + w);
+ totalWeight += w;
+ }
+ return totalWeight;
+ }
+
+ /** Creates a new HyperBall instance.
+ *
+ * @param g the graph whose neighbourhood function you want to compute.
+ * @param gt the transpose of <code>g</code>, or <code>null</code>.
+ * @param log2m the logarithm of the number of registers per counter.
+ * @param pl a progress logger, or <code>null</code>.
+ * @param numberOfThreads the number of threads to be used (0 for automatic sizing).
+ * @param bufferSize the size of an I/O buffer in bytes (0 for {@link #DEFAULT_BUFFER_SIZE}).
+ * @param granularity the number of nodes per task in a multicore environment (it will be rounded to the next multiple of 64), or 0 for {@link #DEFAULT_GRANULARITY}.
+ * @param external if true, results of an iteration will be stored on disk.
+ * @param doSumOfDistances whether the sum of distances from each node should be computed.
+ * @param doSumOfInverseDistances whether the sum of inverse distances from each node should be computed.
+ * @param discountFunction an array (possibly <code>null</code>) of discount functions.
+ * @param weight an array of nonnegative node weights, or <code>null</code> (in which case every node has weight one).
+ * @param seed the random seed passed to {@link HyperLogLogCounterArray#HyperLogLogCounterArray(long, long, int, long)}.
+ */
+ public HyperBall(final ImmutableGraph g, final ImmutableGraph gt, final int log2m, final ProgressLogger pl,
+ final int numberOfThreads, final int bufferSize, final int granularity, final boolean external,
+ final boolean doSumOfDistances, final boolean doSumOfInverseDistances, final Int2DoubleFunction[] discountFunction, final int[] weight, final long seed) throws IOException {
+ super(g.numNodes(), totalWeight(g.numNodes(), weight), ensureRegisters(log2m), seed);
+
+ info("Seed : " + Long.toHexString(seed));
+
+ gotTranspose = gt != null;
+ localNextMustBeChecked = gotTranspose ? IntSets.synchronize(new IntOpenHashSet(Hash.DEFAULT_INITIAL_SIZE, Hash.VERY_FAST_LOAD_FACTOR)) : null;
+ this.weight = weight;
+
+ numNodes = g.numNodes();
+ try {
+ numArcs = g.numArcs();
+ }
+ catch(UnsupportedOperationException e) {
+ // No number of arcs. We have to enumerate.
+ long a = 0;
+ final NodeIterator nodeIterator = g.nodeIterator();
+ for(int i = g.numNodes(); i-- != 0;) {
+ nodeIterator.nextInt();
+ a += nodeIterator.outdegree();
+ }
+ numArcs = a;
+ }
+ squareNumNodes = (double)numNodes * numNodes;
+
+ cumulativeOutdegrees = new EliasFanoCumulativeOutdegreeList(g, numArcs, Math.max(0, 64 / m - 1));
+
+ modifiedCounter = new boolean[numNodes];
+ modifiedResultCounter = external ? null : new boolean[numNodes];
+ if (gt != null) {
+ mustBeChecked = new boolean[numNodes];
+ nextMustBeChecked = new boolean[numNodes];
+ if (gt.numNodes() != g.numNodes()) throw new IllegalArgumentException("The graph and its transpose have a different number of nodes");
+ if (gt.numArcs() != numArcs) throw new IllegalArgumentException("The graph and its transpose have a different number of arcs");
+ }
+
+ this.pl = pl;
+ this.external = external;
+ this.doSumOfDistances = doSumOfDistances;
+ this.doSumOfInverseDistances = doSumOfInverseDistances;
+ this.discountFunction = discountFunction == null ? new Int2DoubleFunction[0] : discountFunction;
+ this.numberOfThreads = numberOfThreads(numberOfThreads);
+ this.granularity = this.numberOfThreads == 1 ? numNodes : granularity == 0 ? DEFAULT_GRANULARITY : ((granularity + Long.SIZE - 1) & ~(Long.SIZE - 1));
+ this.bufferSize = Math.max(1, (bufferSize == 0 ? DEFAULT_BUFFER_SIZE : bufferSize) / ((Long.SIZE / Byte.SIZE) * (counterLongwords + 1)));
+
+ info("Relative standard deviation: " + Util.format(100 * HyperLogLogCounterArray.relativeStandardDeviation(log2m)) + "% (" + m + " registers/counter, " + registerSize + " bits/register, " + Util.format(m * registerSize / 8.) + " bytes/counter)");
+ if (external) info("Running " + this.numberOfThreads + " threads with a buffer of " + Util.formatSize(this.bufferSize) + " counters");
+ else info("Running " + this.numberOfThreads + " threads");
+
+ thread = new IterationThread[this.numberOfThreads];
+
+ if (external) {
+ info("Creating update list...");
+ updateFile = File.createTempFile(HyperBall.class.getName(), "-temp");
+ updateFile.deleteOnExit();
+ fileChannel = (randomAccessFile = new RandomAccessFile(updateFile, "rw")).getChannel();
+ }
+ else {
+ updateFile = null;
+ fileChannel = null;
+ }
+
+ nodes = new AtomicInteger();
+ arcs = new AtomicLong();
+ modified = new AtomicInteger();
+ unwritten = new AtomicInteger();
+
+ neighbourhoodFunction = new DoubleArrayList();
+ sumOfDistances = doSumOfDistances ? new float[numNodes] : null;
+ sumOfInverseDistances = doSumOfInverseDistances ? new float[numNodes] : null;
+ discountedCentrality = new float[this.discountFunction.length][];
+ for (int i = 0; i < this.discountFunction.length; i++) discountedCentrality[i] = new float[numNodes];
+
+ info("HyperBall memory usage: " + Util.formatSize2(usedMemory()) + " [not counting graph(s)]");
+
+ if (! external) {
+ info("Allocating result bit vectors...");
+ // Allocate vectors that will store the result.
+ resultBits = new long[bits.length][];
+ resultRegisters = new LongBigList[bits.length];
+ for(int i = bits.length; i-- != 0;) resultRegisters[i] = (LongArrayBitVector.wrap(resultBits[i] = new long[bits[i].length])).asLongBigList(registerSize);
+ }
+ else {
+ resultBits = null;
+ resultRegisters = null;
+ }
+
+ lock = new ReentrantLock();
+ allWaiting = lock.newCondition();
+ start = lock.newCondition();
+ aliveThreads = this.numberOfThreads;
+
+ if (this.numberOfThreads == 1) (thread[0] = new IterationThread(g, gt, 0)).start();
+ else for(int i = 0; i < this.numberOfThreads; i++) (thread[i] = new IterationThread(g.copy(), gt != null ? gt.copy() : null, i)).start();
+
+ // We wait for all threads to be ready to start.
+ lock.lock();
+ try {
+ while (aliveThreads != 0) allWaiting.await();
+ }
+ catch (InterruptedException e) {
+ throw new RuntimeException(e);
+ }
+ finally {
+ lock.unlock();
+ }
+ }
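+
+ /* Illustrative sketch (not part of HyperBall) of the arithmetic the constructor
+ * applies to granularity and bufferSize. Rounding up to a multiple of 64 uses
+ * the usual power-of-two mask; the buffer size is converted from bytes to a
+ * number of update-list records, each made of one long for the node index
+ * plus counterLongwords longs for the counter.
+ *
+ *   static long roundUpTo64(final long x) {
+ *     return (x + Long.SIZE - 1) & ~(Long.SIZE - 1);
+ *   }
+ *
+ *   static int bufferedCounters(final int bufferBytes, final int counterLongwords) {
+ *     return Math.max(1, bufferBytes / ((Long.SIZE / Byte.SIZE) * (counterLongwords + 1)));
+ *   }
+ *
+ * With these definitions roundUpTo64(100) is 128 and roundUpTo64(64) is 64;
+ * a hypothetical 1 MiB buffer (1048576 bytes) with counterLongwords = 2 holds
+ * bufferedCounters(1048576, 2) = 43690 records of 24 bytes each.
+ */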
+
+ private void info(String s) {
+ if (pl != null) pl.logger().info(s);
+ }
+
+ private long usedMemory() {
+ long bytes = 0;
+ for(long[] a: bits) bytes += a.length * ((long)Long.SIZE / Byte.SIZE);
+ if (! external) bytes *= 2;
+ if (sumOfDistances != null) bytes += sumOfDistances.length * ((long)Float.SIZE / Byte.SIZE);
+ if (sumOfInverseDistances != null) bytes += sumOfInverseDistances.length * ((long)Float.SIZE / Byte.SIZE);
+ for (int i = discountFunction.length; i-- != 0;) bytes += discountedCentrality[i].length * ((long)Float.SIZE / Byte.SIZE);
+ if (modifiedCounter != null) bytes += modifiedCounter.length;
+ if (modifiedResultCounter != null) bytes += modifiedResultCounter.length;
+ if (nextMustBeChecked != null) bytes += nextMustBeChecked.length;
+ if (mustBeChecked != null) bytes += mustBeChecked.length;
+ return bytes;
+ }
+
+ private void ensureOpen() {
+ if (closed) throw new IllegalStateException("This " + HyperBall.class.getSimpleName() + " has been closed.");
+ }
+
+ /** Initialises the approximator.
+ *
+ * <p>This method must be called before a series of {@linkplain #iterate() iterations}.
+ * Note that it will <em>not</em> change the seed used by the underlying {@link HyperLogLogCounterArray}.
+ *
+ * @see #init(long)
+ */
+ public void init() {
+ init(seed);
+ }
+
+ /** Initialises the approximator, providing a new seed to the underlying {@link HyperLogLogCounterArray}.
+ *
+ * <p>This method must be called before a series of {@linkplain #iterate() iterations}.
+ * @param seed passed to {@link #clear(long)}.
+ */
+ public void init(final long seed) {
+ ensureOpen();
+ info("Clearing all registers...");
+ clear(seed);
+
+ if (weight == null) {
+ // We load the counter i with node i.
+ for(int i = numNodes; i-- != 0;) add(i, i);
+ }
+ else {
+ XoRoShiRo128PlusRandomGenerator random = new XoRoShiRo128PlusRandomGenerator(seed);
+ // We load counter i with weight[i] random values.
+ for(int i = numNodes; i-- != 0;)
+ for(int j = weight[i]; j-- != 0;)
+ add(i, random.nextLong());
+ }
+
+ iteration = -1;
+ completed = systolic = local = preLocal = false;
+
+ if (! external) for(long[] a: resultBits) Arrays.fill(a, 0);
+
+ if (sumOfDistances != null) Arrays.fill(sumOfDistances, 0);
+ if (sumOfInverseDistances != null) Arrays.fill(sumOfInverseDistances, 0);
+ for (int i = 0; i < discountFunction.length; i++) Arrays.fill(discountedCentrality[i], 0);
+
+ // The initial value (the iteration for this value does not actually happen).
+ neighbourhoodFunction.add(last = numNodes);
+
+ Arrays.fill(modifiedCounter, true); // Initially, all counters are modified.
+
+ if (pl != null) {
+ pl.displayFreeMemory = true;
+ pl.itemsName = "iterates";
+ pl.start("Iterating...");
+ }
+ }
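+
+ /* Usage sketch (an assumption about typical use, not part of this class):
+ * create an instance, call init() once, then call iterate() until no counter
+ * changes. The basename passed in and the bound of 100 iterations are
+ * hypothetical, and reading neighbourhoodFunction assumes the field is
+ * accessible to the caller.
+ *
+ *   static DoubleArrayList neighbourhoodFunctionOf(final String basename) throws IOException {
+ *     final ImmutableGraph g = ImmutableGraph.load(basename);
+ *     final HyperBall hyperBall = new HyperBall(g, 7); // 2^7 = 128 registers per counter
+ *     try {
+ *       hyperBall.init();
+ *       for (int i = 0; i < 100; i++) {
+ *         hyperBall.iterate();
+ *         if (hyperBall.modified() == 0) break; // no counter changed: the estimate is stable
+ *       }
+ *       return hyperBall.neighbourhoodFunction;
+ *     }
+ *     finally {
+ *       hyperBall.close(); // stops the iteration threads and deletes the update list, if any
+ *     }
+ *   }
+ */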
+
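+ /** Expands a counter into an array containing its <code>m</code> registers, one per long.
+ *
+ * <p>Note that registers are stored in reverse order: the last element of the result
+ * is the register in the lowest {@link #registerSize} bits of <code>bits[0]</code>.
+ *
+ * @param bits the longwords of a counter.
+ * @return an array of <code>m</code> register values.
+ */
+ public long[] expand(final long[] bits) {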
+ int remaining = Long.SIZE;
+ int word = 0;
+ long curr = bits[word];
+ final long[] result = new long[m];
+ final int registerSize = this.registerSize;
+ final int mask = (1 << registerSize) - 1;
+
+ for (int j = m; j-- != 0;) {
+ if (remaining >= registerSize) {
+ result[j] = curr & mask;
+ curr >>>= registerSize;
+ remaining -= registerSize;
+ }
+ else {
+ result[j] = (curr | bits[++word] << remaining) & mask;
+ curr = bits[word] >>> registerSize - remaining;
+ remaining += Long.SIZE - registerSize;
+ }
+ }
+
+ return result;
+ }
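+
+ /* Illustrative sketch (not part of HyperBall) of how w-bit registers can be
+ * packed into, and read back from, an array of longs, with register i
+ * occupying bits [i * w, (i + 1) * w) and possibly straddling two longwords,
+ * the case handled by the else branch of expand(). Note that expand() fills
+ * its result from index m - 1 downwards, so its register order is reversed
+ * with respect to this sketch.
+ *
+ *   static long[] pack(final long[] register, final int w) {
+ *     final long[] word = new long[(int)(((long)register.length * w + 63) >>> 6)];
+ *     for (int i = 0; i < register.length; i++) {
+ *       final long bit = (long)i * w;
+ *       final int k = (int)(bit >>> 6), offset = (int)(bit & 63);
+ *       word[k] |= register[i] << offset; // low part (or all) of the register
+ *       if (offset + w > Long.SIZE) word[k + 1] |= register[i] >>> (Long.SIZE - offset); // spilled high part
+ *     }
+ *     return word;
+ *   }
+ *
+ *   static long[] unpack(final long[] word, final int m, final int w) {
+ *     final long mask = (1L << w) - 1;
+ *     final long[] register = new long[m];
+ *     for (int i = 0; i < m; i++) {
+ *       final long bit = (long)i * w;
+ *       final int k = (int)(bit >>> 6), offset = (int)(bit & 63);
+ *       long v = word[k] >>> offset;
+ *       if (offset + w > Long.SIZE) v |= word[k + 1] << (Long.SIZE - offset);
+ *       register[i] = v & mask;
+ *     }
+ *     return register;
+ *   }
+ *
+ * With m = 16 and w = 5, the 80 bits fit in two longwords and register 12
+ * (bits 60 to 64) straddles them; unpack(pack(r, 5), 16, 5) returns r for
+ * any r of length 16 whose entries are below 32.
+ */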
+
+ @Override
+ public void close() throws IOException {
+ if (closed) return;
+ closed = true;
+
+ lock.lock();
+ try {
+ completed = true;
+ for(IterationThread t: thread) t.threadShouldWait = false;
+ start.signalAll();
+ }
+ finally {
+ lock.unlock();
+ }
+
+ for(Thread t: thread)
+ try {
+ t.join();
+ }
+ catch (InterruptedException e) {
+ throw new RuntimeException(e);
+ }
+
+ if (external) {
+ randomAccessFile.close();
+ fileChannel.close();
+ updateFile.delete();
+ }
+ }
+
+ @Override
+ protected void finalize() throws Throwable {
+ try {
+ if (! closed) {
+ LOGGER.warn("This " + this.getClass().getName() + " [" + toString() + "] should have been closed.");
+ close();
+ }
+ }
+ finally {
+ super.finalize();
+ }
+ }
+
+ private final class IterationThread extends Thread {
+ /** A copy of the graph for this thread only. */
+ private final ImmutableGraph g;
+ /** A copy of the transpose graph for this thread only. */
+ private final ImmutableGraph gt;
+ /** The index of this thread (just used to identify the thread). */
+ private final int index;
+ /** True if we should wait for the end of the current phase. */
+ private boolean threadShouldWait;
+
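+ /** Creates a new iteration thread.
+ * @param g the graph to be used by this thread only (a copy, if there are several threads).
+ * @param gt the transpose graph to be used by this thread only (a copy, if there are several threads), or <code>null</code>.
+ * @param index the index of this thread (just used to identify the thread).
+ */
+ private IterationThread(final ImmutableGraph g, final ImmutableGraph gt, final int index) {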
+ this.g = g;
+ this.gt = gt;
+ this.index = index;
+ }
+
+ private final boolean synchronize(final int phase) throws InterruptedException {
+ lock.lock();
+ try {
+ threadShouldWait = true;
+ if (--aliveThreads == 0) allWaiting.signal();
+ if (aliveThreads < 0) throw new IllegalStateException();
+ while (threadShouldWait) start.await();
+ if (completed) return true;
+ if (phase != HyperBall.this.phase) throw new IllegalStateException("Main thread is in phase " + HyperBall.this.phase + ", but thread " + index + " is heading to phase " + phase);
+ return false;
+ }
+ finally {
+ lock.unlock();
+ }
+ }
+
+
+ @Override
+ public void run() {
+ try {
+ // Lots of local caching.
+ final int registerSize = HyperBall.this.registerSize;
+ final int counterLongwords = HyperBall.this.counterLongwords;
+ final boolean external = HyperBall.this.external;
+ final ImmutableGraph g = this.g;
+ final boolean doSumOfDistances = HyperBall.this.doSumOfDistances;
+ final boolean doSumOfInverseDistances = HyperBall.this.doSumOfInverseDistances;
+ final int numberOfDiscountFunctions = HyperBall.this.discountFunction.length;
+ final boolean doCentrality = doSumOfDistances || doSumOfInverseDistances || numberOfDiscountFunctions != 0;
+
+ final long[] accumulator = new long[counterLongwords];
+ final long[] mask = new long[counterLongwords];
+
+ final long t[] = new long[counterLongwords];
+ final long prevT[] = new long[counterLongwords];
+ final long u[] = new long[counterLongwords];
+
+ final ByteBuffer byteBuffer = external ? ByteBuffer.allocate((Long.SIZE / Byte.SIZE) * bufferSize * (counterLongwords + 1)) : null;
+ if (external) byteBuffer.clear();
+
+ for(;;) {
+
+ if (synchronize(0)) return;
+
+ // These variables might change across executions of the loop body.
+ final long granularity = HyperBall.this.adaptiveGranularity;
+ final long arcGranularity = (long)Math.ceil((double)numArcs * granularity / numNodes);
+ final long bits[][] = HyperBall.this.bits;
+ final long resultBits[][] = HyperBall.this.resultBits;
+ final boolean[] modifiedCounter = HyperBall.this.modifiedCounter;
+ final boolean[] modifiedResultCounter = HyperBall.this.modifiedResultCounter;
+ final boolean[] mustBeChecked = HyperBall.this.mustBeChecked;
+ final boolean[] nextMustBeChecked = HyperBall.this.nextMustBeChecked;
+ final boolean systolic = HyperBall.this.systolic;
+ final boolean local = HyperBall.this.local;
+ final boolean preLocal = HyperBall.this.preLocal;
+ final int localCheckShift = 6 - log2m;
+ final int[] localCheckList = HyperBall.this.localCheckList;
+ final IntSet localNextMustBeChecked = HyperBall.this.localNextMustBeChecked;
+
+ int start = -1;
+ int end = -1;
+ int modified = 0; // The number of counters that have been modified during the computation of the present task.
+ int unwritten = 0; // The number of counters not written to disk.
+
+ // In a local computation tasks are based on the content of localCheckList.
+ int upperLimit = local ? localCheckList.length : numNodes;
+
+ /* During standard iterations, cumulates the neighbourhood function for the nodes scanned
+ * by this thread. During systolic iterations, cumulates the *increase* of the
+ * neighbourhood function for the nodes scanned by this thread. */
+ final KahanSummation neighbourhoodFunctionDelta = new KahanSummation();
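+
+ // Illustrative sketch (assumed behaviour, not the actual KahanSummation source):
+ // Kahan compensated summation keeps in c the low-order bits lost by each
+ // floating-point addition and feeds them back into the next one, so adding
+ // many counter estimates one at a time loses much less precision than a
+ // plain running sum.
+ //
+ //   static double kahanSum(final double[] values) {
+ //     double sum = 0, c = 0;
+ //     for (final double v : values) {
+ //       final double y = v - c;      // next term, corrected by the previous loss
+ //       final double next = sum + y; // low-order bits of y may be lost here
+ //       c = (next - sum) - y;        // what was lost, with opposite sign
+ //       sum = next;
+ //     }
+ //     return sum;
+ //   }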
+
+ for(;;) {
+
+ // Try to get another piece of work.
+ synchronized(HyperBall.this.cumulativeOutdegrees) {
+ if (nextNode == upperLimit) break;
+ start = nextNode;
+ if (local) {
+ nextNode++;
+ if (log2m < 6) {
+ /* We cannot split the list unless the boundary crosses a
+ * multiple of 1 << localCheckShift. Otherwise, we might create
+ * race conditions with other threads. */
+ while(nextNode < upperLimit) {
+ if (((localCheckList[nextNode - 1] ^ localCheckList[nextNode]) >>> localCheckShift) != 0) break;
+ nextNode++;
+ }
+ }
+ }
+ else {
+ final long target = nextArcs + arcGranularity;
+ if (target >= numArcs) nextNode = numNodes;
+ else {
+ nextArcs = cumulativeOutdegrees.skipTo(target);
+ nextNode = cumulativeOutdegrees.currentIndex();
+ }
+ }
+ end = nextNode;
+ }
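+
+ // Illustrative sketch (not part of HyperBall) of the non-local splitting
+ // above, using a plain sorted array instead of the Elias-Fano list and
+ // assuming that skipTo(target) returns the first cumulative outdegree that is
+ // at least target. Here cumulative[i] is the number of arcs leaving nodes
+ // 0, ..., i - 1, so cumulative[0] = 0 and cumulative[numNodes] = numArcs.
+ //
+ //   static int nextTaskEnd(final long[] cumulative, final long nextArcs, final long arcGranularity) {
+ //     final int numNodes = cumulative.length - 1;
+ //     final long target = nextArcs + arcGranularity;
+ //     if (target >= cumulative[numNodes]) return numNodes; // the last task takes the rest
+ //     int i = java.util.Arrays.binarySearch(cumulative, target);
+ //     if (i < 0) i = -i - 1; // no exact match: index of the first entry greater than target
+ //     return i;
+ //   }
+ //
+ // Each task thus covers roughly arcGranularity arcs rather than a fixed
+ // number of nodes, which balances the work when outdegrees are skewed.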
+
+ final NodeIterator nodeIterator = local || systolic ? null : g.nodeIterator(start);
+ long arcs = 0;
+
+ for(int i = start; i < end; i++) {
+ final int node = local ? localCheckList[i] : i;
+ /* The three cases in which we enumerate successors:
+ * 1) A non-systolic computation (we don't know anything, so we enumerate).
+ * 2) A systolic, local computation (the node is by definition to be checked, as it comes from the local check list).
+ * 3) A systolic, non-local computation in which the node should be checked.
+ */
+ if (! systolic || local || mustBeChecked[node]) {
+ int d;
+ int[] successor = null;
+ LazyIntIterator successors = null;
+
+ if (local || systolic) {
+ d = g.outdegree(node);
+ successors = g.successors(node);
+ }
+ else {
+ nodeIterator.nextInt();
+ d = nodeIterator.outdegree();
+ successor = nodeIterator.successorArray();
+ }
+
+ final int chunk = chunk(node);
+ getCounter(bits[chunk], node, t);
+ // Caches t's values into prevT
+ System.arraycopy(t, 0, prevT, 0, counterLongwords);
+
+ boolean counterModified = false;
+
+ for(int j = d; j-- != 0;) {
+ final int s = local || systolic ? successors.nextInt() : successor[j];
+ /* Neither self-loops nor unmodified counters influence the computation. */
+ if (s != node && modifiedCounter[s]) {
+ counterModified = true; // This is just to mark that we entered the loop at least once.
+ getCounter(bits[chunk(s)], s, u);
+ max(t, u, accumulator, mask);
+ }
+ }
+
+ arcs += d;
+
+ if (ASSERTS) {
+ LongBigList test = LongArrayBitVector.wrap(t).asLongBigList(registerSize);
+ for(int rr = 0; rr < m; rr++) {
+ int max = (int)registers[chunk(node)].getLong(((long)node << log2m) + rr);
+ if (local || systolic) successors = g.successors(node);
+ for(int j = d; j-- != 0;) {
+ final int s = local || systolic ? successors.nextInt() : successor[j];
+ max = Math.max(max, (int)registers[chunk(s)].getLong(((long)s << log2m) + rr));
+ }
+ assert max == test.getLong(rr) : max + "!=" + test.getLong(rr) + " [" + rr + "]";
+ }
+ }
+
+ if (counterModified) {
+ /* If we enter this branch, we have maximised with at least one successor.
+ * We must thus check explicitly whether we have modified the counter. */
+ counterModified = false;
+ for(int p = counterLongwords; p-- != 0;)
+ if (prevT[p] != t[p]) {
+ counterModified = true;
+ break;
+ }
+ }
+
+ double post = Double.NaN;
+
+ /* We need the counter value only if the iteration is standard (as we're going to
+ * compute the neighbourhood function cumulating actual values, and not deltas) or
+ * if the counter was actually modified (as we're going to cumulate the neighbourhood
+ * function delta, or at least some centrality). */
+ if (! systolic || counterModified) post = count(t, 0);
+ if (! systolic) neighbourhoodFunctionDelta.add(post);
+
+ // Here counterModified is true only if the counter was *actually* modified.
+ if (counterModified && (systolic || doCentrality)) {
+ final double pre = count(node);
+ if (systolic) {
+ neighbourhoodFunctionDelta.add(-pre);
+ neighbourhoodFunctionDelta.add(post);
+ }
+
+ if (doCentrality) {
+ final double delta = post - pre;
+ // Note that this code is executed only for distances > 0.
+ if (delta > 0) { // Force monotonicity
+ if (doSumOfDistances) sumOfDistances[node] += delta * (iteration + 1);
+ if (doSumOfInverseDistances) sumOfInverseDistances[node] += delta / (iteration + 1);
+ for (int j = numberOfDiscountFunctions; j-- != 0;) discountedCentrality[j][node] += delta * discountFunction[j].get(iteration + 1);
+ }
+ }
+ }
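+
+ // Worked example (illustrative numbers, not from a real graph): suppose the
+ // estimated ball sizes of a node at distances 0, 1 and 2 are 1, 3 and 6. At
+ // iteration 0 (distance 1) delta = 2, and at iteration 1 (distance 2) delta = 3,
+ // so the updates above accumulate
+ //   sumOfDistances        = 2 * 1 + 3 * 2 = 8
+ //   sumOfInverseDistances = 2 / 1 + 3 / 2 = 3.5
+ // that is, every node first reached at distance d contributes d (resp. 1 / d).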
+
+ if (counterModified) {
+ /* We keep track of modified counters in the result if we are
+ * not in external mode (in external mode modified counters are
+ * computed when the update list is reloaded). Note that we must
+ * add the current node to the must-be-checked set for the next
+ * local iteration if it is modified, as it might need a copy to
+ * the result array at the next iteration. */
+ if (preLocal) localNextMustBeChecked.add(node);
+ if (! external) modifiedResultCounter[node] = true;
+
+ if (systolic) {
+ final LazyIntIterator predecessors = gt.successors(node);
+ int p;
+ /* In systolic computations we must keep track of which counters must
+ * be checked on the next iteration. If we are preparing a local computation,
+ * we do this explicitly, by adding the predecessors of the current
+ * node to a set. Otherwise, we do this implicitly, by setting the
+ * corresponding entry in an array. */
+ if (preLocal) while((p = predecessors.nextInt()) != -1) localNextMustBeChecked.add(p);
+ else while((p = predecessors.nextInt()) != -1) nextMustBeChecked[p] = true;
+ }
+
+ modified++;
+ }
+
+ if (external) {
+ if (counterModified) {
+ byteBuffer.putLong(node);
+ for(int p = counterLongwords; p-- != 0;) byteBuffer.putLong(t[p]);
+
+ if (! byteBuffer.hasRemaining()) {
+ byteBuffer.flip();
+ long time = -System.currentTimeMillis();
+ fileChannel.write(byteBuffer);
+ time += System.currentTimeMillis();
+ totalIoMillis += time;
+ numberOfWrites++;
+ byteBuffer.clear();
+ }
+ }
+ else unwritten++;
+ }
+ else {
+ /* This is slightly subtle: if a counter is not modified, and
+ * the present value was not a modified value in the first place,
+ * then we can avoid updating the result altogether. */
+ if (counterModified || modifiedCounter[node]) setCounter(t, resultBits[chunk], node);
+ else unwritten++;
+ }
+ }
+ else if (! external) {
+ /* Even if we cannot possibly have changed our value, still our copy
+ * in the result vector might need to be updated because it does not
+ * reflect our current value. */
+ if (modifiedCounter[node]) {
+ final int chunk = chunk(node);
+ transfer(bits[chunk], resultBits[chunk], node);
+ }
+ else unwritten++;
+ }
+ }
+
+ // Update the global progress counter.
+ HyperBall.this.arcs.addAndGet(arcs);
+ nodes.addAndGet(end - start);
+ }
+
+ if (external) {
+ // Flush the records still in the buffer; FileChannel.write() is called only if there are any.
+ if(byteBuffer.position() != 0) {
+ byteBuffer.flip();
+ long time = -System.currentTimeMillis();
+ fileChannel.write(byteBuffer);
+ time += System.currentTimeMillis();
+ totalIoMillis += time;
+ numberOfWrites++;
+ byteBuffer.clear();
+ }
+ }
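+
+ // Illustrative sketch (not part of HyperBall) of one update-list record as
+ // written above and read back in phase 1: the node index as a long, followed
+ // by the counter longwords from the last to the first.
+ //
+ //   static void putRecord(final java.nio.ByteBuffer b, final int node, final long[] counter) {
+ //     b.putLong(node);
+ //     for (int p = counter.length; p-- != 0;) b.putLong(counter[p]);
+ //   }
+ //
+ //   static int getRecord(final java.nio.ByteBuffer b, final long[] counter) {
+ //     final int node = (int)b.getLong();
+ //     for (int p = counter.length; p-- != 0;) counter[p] = b.getLong();
+ //     return node;
+ //   }
+ //
+ // Since both loops visit the longwords in the same order, a record written
+ // with putRecord() and read back (after flip()) with getRecord() restores the
+ // counter exactly.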
+
+ HyperBall.this.modified.addAndGet(modified);
+ HyperBall.this.unwritten.addAndGet(unwritten);
+
+ synchronized(HyperBall.this) {
+ current += neighbourhoodFunctionDelta.value();
+ }
+
+ if (external) {
+ synchronize(1);
+ /* Read into memory newly computed counters, updating modifiedCounter.
+ * Note that if m is less than 64, setCounter(), being unsynchronised, might
+ * cause race conditions (when maximising each thread writes in a longword-aligned
+ * block of memory, so no race conditions can arise). Since synchronisation would
+ * lead to significant contention (as we cannot synchronise at a level finer than
+ * a bit vector, and update lists might be quite dense and local), we prefer simply
+ * to do the update with thread 0 only. */
+ if (index == 0 || m >= Long.SIZE) for(;;) {
+ byteBuffer.clear();
+ if (fileChannel.read(byteBuffer) <= 0) break;
+ byteBuffer.flip();
+ while(byteBuffer.hasRemaining()) {
+ final int node = (int)byteBuffer.getLong();
+ for(int p = counterLongwords; p-- != 0;) t[p] = byteBuffer.getLong();
+ setCounter(t, bits[chunk(node)], node);
+ modifiedCounter[node] = true;
+ }
+ }
+ }
+
+ }
+ }
+ catch(Throwable t) {
+ t.printStackTrace();
+ threadThrowable = t;
+ lock.lock();
+ try {
+ if (--aliveThreads == 0) allWaiting.signal();
+ }
+ finally {
+ lock.unlock();
+ }
+ }
+ }
+
+ @Override
+ public String toString() {
+ return "Thread " + index;
+ }
+ }
+
+ /** Performs a new iteration of HyperBall. */
+ public void iterate() throws IOException {
+ ensureOpen();
+ try {
+ iteration++;
+
+ // Let us record whether the previous computation was systolic or local.
+ final boolean previousWasSystolic = systolic;
+ final boolean previousWasLocal = local;
+
+ /* If fewer than one fourth of the nodes have been modified, and we have the transpose,
+ * we switch to a systolic computation. */
+ systolic = gotTranspose && iteration > 0 && modified.get() < numNodes / 4;
+
+ /* Non-systolic computations add up the values of all counters.
+ * Systolic computations modify the last value by compensating for each modified counter. */
+ current = systolic ? last : 0;
+
+ // If we completed the last iteration in pre-local mode, we MUST run in local mode.
+ local = preLocal;
+
+ // We run in pre-local mode if we are systolic and few nodes were modified.
+ preLocal = systolic && modified.get() < .1 * numNodes * numNodes / numArcs;
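+
+ /* Worked example with hypothetical sizes: for numNodes = 1,000,000 and
+ * numArcs = 10,000,000, an iteration (after the first) following one that
+ * modified fewer than 1,000,000 / 4 = 250,000 counters is systolic, provided
+ * the transpose is available; it is also pre-local if fewer than
+ * .1 * 10^12 / 10^7 = 10,000 counters were modified. The leading .1 makes
+ * the whole product a double, so numNodes * numNodes cannot overflow an int. */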
+
+ info("Starting " + (systolic ? "systolic iteration (local: " + local + "; pre-local: " + preLocal + ")" : "standard " + (external ? "external " : "") + "iteration"));
+
+ if (! external) {
+ if (previousWasLocal) for(int x: localCheckList) modifiedResultCounter[x] = false;
+ else Arrays.fill(modifiedResultCounter, false);
+ assert ! BooleanUtils.or(modifiedResultCounter);
+ }
+
+ if (local) {
+ /* In case of a local computation, we convert the set of must-be-checked for the
+ * next iteration into a check list. */
+ localCheckList = localNextMustBeChecked.toIntArray();
+ IntArrays.parallelQuickSort(localCheckList);
+ localNextMustBeChecked.clear();
+ }
+ else if (systolic) {
+ // Systolic, non-local computations store the could-be-modified set implicitly into this array.
+ Arrays.fill(nextMustBeChecked, false);
+ // If the previous computation wasn't systolic, we must assume that all registers could have changed.
+ if (! previousWasSystolic) Arrays.fill(mustBeChecked, true);
+ }
+
+ adaptiveGranularity = granularity;
+
+ if (numberOfThreads > 1 && ! local) {
+ if (iteration > 0) {
+ adaptiveGranularity = (long)Math.min(Math.max(1, numNodes / numberOfThreads), granularity * (numNodes / Math.max(1., modified())));
+ adaptiveGranularity = (adaptiveGranularity + Long.SIZE - 1) & ~(Long.SIZE - 1);
+ }
+ info("Adaptive granularity for this iteration: " + adaptiveGranularity);
+ }
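+
+ /* Worked example with hypothetical values: with numNodes = 1,000,000,
+ * numberOfThreads = 8 and granularity = 1024, if the previous iteration
+ * modified 10,000 counters the adaptive granularity is
+ * min(125,000, 1024 * 100) = 102,400, already a multiple of 64; if only 1,000
+ * were modified it is min(125,000, 1,024,000) = 125,000, rounded up to 125,056.
+ * Fewer modified counters thus yield larger tasks, capped at about
+ * numNodes / numberOfThreads nodes. */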
+
+ modified.set(0);
+ totalIoMillis = 0;
+ numberOfWrites = 0;
+ final ProgressLogger npl = pl == null ? null : new ProgressLogger(LOGGER, 1, TimeUnit.MINUTES, "arcs");
+
+ if (npl != null) {
+ arcs.set(0);
+ npl.expectedUpdates = systolic || local ? -1 : numArcs;
+ npl.start("Scanning graph...");
+ }
+
+ nodes.set(0);
+ nextArcs = nextNode = 0;
+ unwritten.set(0);
+ if (external) fileChannel.position(0);
+
+ // Start all threads.
+ lock.lock();
+ try {
+ phase = 0;
+ aliveThreads = numberOfThreads;
+ for(IterationThread t: thread) t.threadShouldWait = false;
+ start.signalAll();
+
+ // Wait for all threads to complete their tasks, logging some statistics in the meantime.
+ while(aliveThreads != 0) {
+ allWaiting.await(1, TimeUnit.MINUTES);
+ if (threadThrowable != null) throw new RuntimeException(threadThrowable);
+ final int aliveThreads = this.aliveThreads;
+ if (npl != null && aliveThreads != 0) {
+ if (arcs.longValue() != 0) npl.set(arcs.longValue());
+ if (external && numberOfWrites > 0) {
+ final long time = npl.millis();
+ info("Writes: " + numberOfWrites + "; per second: " + Util.format(1000.0 * numberOfWrites / time));
+ info("I/O time: " + Util.format((totalIoMillis / 1000.0)) + "s; per write: " + (totalIoMillis / 1000.0) / numberOfWrites + "s");
+ }
+ if (aliveThreads != 0) info("Alive threads: " + aliveThreads + " (" + Util.format(100.0 * aliveThreads / numberOfThreads) + "%)");
+ }
+ }
+ }
+ finally {
+ lock.unlock();
+ }
+
+ if (npl != null) {
+ npl.done(arcs.longValue());
+ if (! external) info("Unwritten counters: " + Util.format(unwritten.intValue()) + " (" + Util.format(100.0 * unwritten.intValue() / numNodes) + "%)");
+ info("Unmodified counters: " + Util.format(numNodes - modified.intValue()) + " (" + Util.format(100.0 * (numNodes - modified.intValue()) / numNodes) + "%)");
+ }
+
+ if (external) {
+ if (npl != null) {
+ npl.itemsName = "counters";
+ npl.start("Updating counters...");
+ }
+
+ // Read into memory the newly computed counters.
+
+ fileChannel.truncate(fileChannel.position());
+ fileChannel.position(0);
+
+ // In pre-local mode, we do not clear modified counters.
+ if (! preLocal) Arrays.fill(modifiedCounter, false);
+
+ lock.lock();
+ try {
+ phase = 1;
+ aliveThreads = numberOfThreads;
+ for(IterationThread t: thread) t.threadShouldWait = false;
+ start.signalAll();
+ // Wait for all threads to complete the counter update.
+ while(aliveThreads != 0) allWaiting.await();
+ if (threadThrowable != null) throw new RuntimeException(threadThrowable);
+ }
+ finally {
+ lock.unlock();
+ }
+
+ if (npl != null) {
+ npl.count = modified();
+ npl.done();
+ }
+ }
+ else {
+ // Switch the bit vectors.
+ for(int i = 0; i < bits.length; i++) {
+ if (npl != null) npl.update(bits[i].length);
+ final LongBigList r = registers[i];
+ registers[i] = resultRegisters[i];
+ resultRegisters[i] = r;
+ final long[] b = bits[i];
+ bits[i] = resultBits[i];
+ resultBits[i] = b;
+ }
+
+ // Switch modifiedCounters and modifiedResultCounters.
+ final boolean[] t = modifiedCounter;
+ modifiedCounter = modifiedResultCounter;
+ modifiedResultCounter = t;
+ }
+
+ if (systolic) {
+ // Switch mustBeChecked and nextMustBeChecked.
+ final boolean[] t = mustBeChecked;
+ mustBeChecked = nextMustBeChecked;
+ nextMustBeChecked = t;
+ }
+
+ last = current;
+ /* We enforce monotonicity. Non-monotonicity can only be caused
+ * by approximation errors. */
+ final double lastOutput = neighbourhoodFunction.getDouble(neighbourhoodFunction.size() - 1);
+ if (current < lastOutput) current = lastOutput;
+ relativeIncrement = current / lastOutput;
+
+ if (pl != null) {
+ pl.logger().info("Pairs: " + current + " (" + current * 100.0 / squareNumNodes + "%)");
+ pl.logger().info("Absolute increment: " + (current - lastOutput));
+ pl.logger().info("Relative increment: " + relativeIncrement);
+ }
+
+ neighbourhoodFunction.add(current);
+
+ if (pl != null) pl.updateAndDisplay();
+ }
+ catch (InterruptedException e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+ /** Returns the number of HyperLogLog counters that were modified by the last call to {@link #iterate()}.
+ *
+ * @return the number of HyperLogLog counters that were modified by the last call to {@link #iterate()}.
+ */
+ public int modified() {
+ return modified.get();
+ }
+
+ /** Runs HyperBall. The computation will stop when {@link #modified()} returns zero. */
+ public void run() throws IOException {
+ run(Long.MAX_VALUE);
+ }
+
+ /** Runs HyperBall.
+ *
+ * @param upperBound an upper bound to the number of iterations.
+ */
+ public void run(final long upperBound) throws IOException {
+ run(upperBound, -1);
+ }
+
+ /** Runs HyperBall.
+ *
+ * @param upperBound an upper bound to the number of iterations.
+ * @param threshold a value that will be used to stop the computation by relative increment if the neighbourhood function is being computed; if you specify -1,
+ * the computation will stop when {@link #modified()} returns zero.
+ */
+ public void run(long upperBound, final double threshold) throws IOException {
+ run(upperBound, threshold, seed);
+ }
+
+ /** Runs HyperBall.
+ *
+ * @param upperBound an upper bound to the number of iterations.
+ * @param threshold a value that will be used to stop the computation by relative increment if the neighbourhood function is being computed; if you specify -1,
+ * the computation will stop when {@link #modified()} returns zero.
+ * @param seed the random seed passed to {@link HyperLogLogCounterArray#HyperLogLogCounterArray(long, long, int, long)}.
+ */
+ public void run(long upperBound, final double threshold, final long seed) throws IOException {
+ upperBound = Math.min(upperBound, numNodes);
+
+ init(seed);
+
+ for(long i = 0; i < upperBound; i++) {
+ iterate();
+
+ if (modified() == 0) {
+ info("Terminating approximation after " + (i + 1) + " iteration(s) by stabilisation");
+ break;
+ }
+
+ if (i > 3 && relativeIncrement < (1 + threshold)) {
+ info("Terminating approximation after " + (i + 1) + " iteration(s) by relative bound on the neighbourhood function");
+ break;
+ }
+ }
+
+ if (pl != null) pl.done();
+ }
+
+ /** Throws a {@link NotSerializableException}: this class implements {@link Serializable}
+ * only because it extends {@link HyperLogLogCounterArray}, but it is not actually serializable. */
+ private void writeObject(@SuppressWarnings("unused") final ObjectOutputStream oos) throws IOException {
+ throw new NotSerializableException();
+ }
+
+
+ public static void main(String arg[]) throws IOException, JSAPException, IllegalArgumentException, ClassNotFoundException, IllegalAccessException, InvocationTargetException, InstantiationException, NoSuchMethodException {
+ SimpleJSAP jsap = new SimpleJSAP(HyperBall.class.getName(), "Runs HyperBall on the given graph, possibly computing positive geometric centralities.\n\nPlease note that to compute negative centralities on directed graphs (which is usually what you want) you have to compute positive centralities on the transpose.",
+ new Parameter[] {
+ new FlaggedOption("log2m", JSAP.INTEGER_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, 'l', "log2m", "The logarithm of the number of registers."),
+ new FlaggedOption("upperBound", JSAP.LONGSIZE_PARSER, Long.toString(Long.MAX_VALUE), JSAP.NOT_REQUIRED, 'u', "upper-bound", "An upper bound to the number of iterations."),
+ new FlaggedOption("threshold", JSAP.DOUBLE_PARSER, "-1", JSAP.NOT_REQUIRED, 't', "threshold", "A threshold that will be used to stop the computation by relative increment. If it is -1, the iteration will stop only when no register changes its value (recommended)."),
+ new FlaggedOption("threads", JSAP.INTSIZE_PARSER, "0", JSAP.NOT_REQUIRED, 'T', "threads", "The number of threads to be used. If 0, the number will be estimated automatically."),
+ new FlaggedOption("granularity", JSAP.INTSIZE_PARSER, Integer.toString(DEFAULT_GRANULARITY), JSAP.NOT_REQUIRED, 'g', "granularity", "The number of nodes per task in a multicore environment."),
+ new FlaggedOption("bufferSize", JSAP.INTSIZE_PARSER, Util.formatBinarySize(DEFAULT_BUFFER_SIZE), JSAP.NOT_REQUIRED, 'b', "buffer-size", "The size of an I/O buffer in bytes."),
+ new FlaggedOption("neighbourhoodFunction", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'n', "neighbourhood-function", "Store an approximation of the neighbourhood function in text format."),
+ new FlaggedOption("sumOfDistances", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'd', "sum-of-distances", "Store an approximation of the sum of distances from each node as a binary list of floats."),
+ new FlaggedOption("harmonicCentrality", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'h', "harmonic-centrality", "Store an approximation of the positive harmonic centrality (the sum of the reciprocals of distances from each node) as a binary list of floats."),
+ new FlaggedOption("discountedGainCentrality", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'z', "discounted-gain-centrality", "A positive discounted gain centrality to be approximated and stored; it is specified as O:F where O is the spec of an object of type Int2DoubleFunction and F is the name of the file where the binary list of floats will be stored. The spec can be either the name of a public field of HyperBall, or a constructor invocation of a class implementing Int2DoubleFunction.").setAllowMultipleDeclarations(true),
+ new FlaggedOption("closenessCentrality", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'c', "closeness-centrality", "Store an approximation of the positive closeness centrality of each node (the reciprocal of the sum of the distances from each node) as a binary list of floats. Terminal nodes will have centrality equal to zero."),
+ new FlaggedOption("linCentrality", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'L', "lin-centrality", "Store an approximation of the positive Lin centrality of each node (the reciprocal of the sum of the distances from each node, multiplied by the square of the number of nodes reachable from the node) as a binary list of floats. Terminal nodes will have centrality equal to one."),
+ new FlaggedOption("nieminenCentrality", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'N', "nieminen-centrality", "Store an approximation of the positive Nieminen centrality of each node (the square of the number of nodes reachable from each node minus the sum of the distances from the node) as a binary list of floats."),
+ new FlaggedOption("reachable", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'r', "reachable", "Store an approximation of the number of nodes reachable from each node as a binary list of floats."),
+ new FlaggedOption("seed", JSAP.LONG_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'S', "seed", "The random seed."),
+ new FlaggedOption("weights", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'W', "weights", "A binary list of nonnegative integers representing the weight of each node."),
+ new Switch("spec", 's', "spec", "The basename is not a basename but rather a specification of the form <ImmutableGraphImplementation>(arg,arg,...)."),
+ new Switch("offline", 'o', "offline", "Do not load the graph in main memory. If this option is used, the graph will be loaded in offline (for one thread) or mapped (for several threads) mode."),
+ new Switch("external", 'e', "external", "Use an external dump file instead of core memory to store new counter values. Note that the file might be very large: you might need to set the Java temporary directory suitably (-Djava.io.tmpdir=DIR)."),
+ new UnflaggedOption("basename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the graph."),
+ new UnflaggedOption("basenamet", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, JSAP.NOT_GREEDY, "The basename of the transpose graph for systolic computations (strongly suggested). If it is equal to <basename>, the graph will be assumed to be symmetric and will be loaded just once."),
+ }
+ );
+
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final boolean spec = jsapResult.getBoolean("spec");
+ final boolean external = jsapResult.getBoolean("external");
+ final boolean offline = jsapResult.getBoolean("offline");
+ final String neighbourhoodFunctionFile = jsapResult.getString("neighbourhoodFunction");
+ final boolean neighbourhoodFunction = jsapResult.userSpecified("neighbourhoodFunction");
+ final String sumOfDistancesFile = jsapResult.getString("sumOfDistances");
+ final boolean sumOfDistances = jsapResult.userSpecified("sumOfDistances");
+ final String harmonicCentralityFile = jsapResult.getString("harmonicCentrality");
+ final boolean harmonicCentrality = jsapResult.userSpecified("harmonicCentrality");
+ final String closenessCentralityFile = jsapResult.getString("closenessCentrality");
+ final boolean closenessCentrality = jsapResult.userSpecified("closenessCentrality");
+ final String linCentralityFile = jsapResult.getString("linCentrality");
+ final boolean linCentrality = jsapResult.userSpecified("linCentrality");
+ final String nieminenCentralityFile = jsapResult.getString("nieminenCentrality");
+ final boolean nieminenCentrality = jsapResult.userSpecified("nieminenCentrality");
+ final String reachableFile = jsapResult.getString("reachable");
+ final boolean reachable = jsapResult.userSpecified("reachable");
+ final String weightFile = jsapResult.getString("weights");
+ final String basename = jsapResult.getString("basename");
+ final String basenamet = jsapResult.getString("basenamet");
+ final ProgressLogger pl = new ProgressLogger(LOGGER);
+ final int log2m = jsapResult.getInt("log2m");
+ final int threads = jsapResult.getInt("threads");
+ final int bufferSize = jsapResult.getInt("bufferSize");
+ final int granularity = jsapResult.getInt("granularity");
+ final long seed = jsapResult.userSpecified("seed") ? jsapResult.getLong("seed") : Util.randomSeed();
+
+ final String[] discountedGainCentralitySpec = jsapResult.getStringArray("discountedGainCentrality");
+ final Int2DoubleFunction[] discountFunction = new Int2DoubleFunction[discountedGainCentralitySpec.length];
+ final String[] discountedGainCentralityFile = new String[discountedGainCentralitySpec.length];
+ for (int i = 0; i < discountedGainCentralitySpec.length; i++) {
+ int pos = discountedGainCentralitySpec[i].indexOf(':');
+ if (pos < 0) throw new IllegalArgumentException("Wrong spec <" + discountedGainCentralitySpec[i] + ">");
+ discountedGainCentralityFile[i] = discountedGainCentralitySpec[i].substring(pos + 1);
+ String gainSpec = discountedGainCentralitySpec[i].substring(0, pos);
+ Int2DoubleFunction candidateFunction;
+ try {
+ candidateFunction = (Int2DoubleFunction)HyperBall.class.getField(gainSpec).get(null);
+ }
+ catch (SecurityException e) {
+ throw new IllegalArgumentException("Field " + gainSpec + " exists but cannot be accessed", e);
+ }
+ catch (ClassCastException e) {
+ throw new IllegalArgumentException("Field " + gainSpec + " exists but it is not of type Int2DoubleFunction", e);
+ }
+ catch (NoSuchFieldException e) {
+ candidateFunction = null;
+ }
+ discountFunction[i] = candidateFunction == null? ObjectParser.fromSpec(gainSpec, Int2DoubleFunction.class) : candidateFunction;
+ }
+
+ final int[] weight = weightFile != null ? BinIO.loadInts(weightFile) : null;
+
+ final ImmutableGraph graph = spec
+ ? ObjectParser.fromSpec(basename, ImmutableGraph.class, GraphClassParser.PACKAGE)
+ : offline
+ ? ((numberOfThreads(threads) == 1 && basenamet == null ? ImmutableGraph.loadOffline(basename) : ImmutableGraph.loadMapped(basename, new ProgressLogger())))
+ : ImmutableGraph.load(basename, new ProgressLogger());
+
+ final ImmutableGraph grapht = basenamet == null ? null : basenamet.equals(basename) ? graph : spec ? ObjectParser.fromSpec(basenamet, ImmutableGraph.class, GraphClassParser.PACKAGE) :
+ offline ? ImmutableGraph.loadMapped(basenamet, new ProgressLogger()) : ImmutableGraph.load(basenamet, new ProgressLogger());
+
+ final HyperBall hyperBall = new HyperBall(graph, grapht, log2m, pl, threads, bufferSize, granularity, external, sumOfDistances || closenessCentrality || linCentrality || nieminenCentrality, harmonicCentrality, discountFunction, weight, seed);
+ hyperBall.run(jsapResult.getLong("upperBound"), jsapResult.getDouble("threshold"));
+ hyperBall.close();
+
+ if (neighbourhoodFunction) {
+ final PrintStream stream = new PrintStream(new FastBufferedOutputStream(new FileOutputStream(neighbourhoodFunctionFile)));
+ for(DoubleIterator i = hyperBall.neighbourhoodFunction.iterator(); i.hasNext();) stream.println(BigDecimal.valueOf(i.nextDouble()).toPlainString());
+ stream.close();
+ }
+
+ if (sumOfDistances) BinIO.storeFloats(hyperBall.sumOfDistances, sumOfDistancesFile);
+ if (harmonicCentrality) BinIO.storeFloats(hyperBall.sumOfInverseDistances, harmonicCentralityFile);
+ for (int i = 0; i < discountedGainCentralitySpec.length; i++) BinIO.storeFloats(hyperBall.discountedCentrality[i], discountedGainCentralityFile[i]);
+ if (closenessCentrality) {
+ final int n = graph.numNodes();
+ final DataOutputStream dos = new DataOutputStream(new FastBufferedOutputStream(new FileOutputStream(closenessCentralityFile)));
+ for (int i = 0; i < n; i++) dos.writeFloat(hyperBall.sumOfDistances[i] == 0 ? 0 : 1 / hyperBall.sumOfDistances[i]);
+ dos.close();
+ }
+ if (linCentrality) {
+ final int n = graph.numNodes();
+ final DataOutputStream dos = new DataOutputStream(new FastBufferedOutputStream(new FileOutputStream(linCentralityFile)));
+ for (int i = 0; i < n; i++) {
+ // Lin's index for terminal nodes (which reach no other node) is, by (our) definition, one: this is smaller than the index of any other node.
+ if (hyperBall.sumOfDistances[i] == 0) dos.writeFloat(1);
+ else {
+ final double count = hyperBall.count(i);
+ dos.writeFloat((float)(count * count / hyperBall.sumOfDistances[i]));
+ }
+ }
+ dos.close();
+ }
+ if (nieminenCentrality) {
+ final int n = graph.numNodes();
+ final DataOutputStream dos = new DataOutputStream(new FastBufferedOutputStream(new FileOutputStream(nieminenCentralityFile)));
+ for (int i = 0; i < n; i++) {
+ final double count = hyperBall.count(i);
+ dos.writeFloat((float)(count * count - hyperBall.sumOfDistances[i]));
+ }
+ dos.close();
+ }
+ if (reachable) {
+ final int n = graph.numNodes();
+ final DataOutputStream dos = new DataOutputStream(new FastBufferedOutputStream(new FileOutputStream(reachableFile)));
+ for (int i = 0; i < n; i++) dos.writeFloat((float)hyperBall.count(i));
+ dos.close();
+ }
+ }
+}
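The arithmetic inside iterate() and run() above is easier to follow on concrete numbers. The following is a minimal standalone sketch, not part of HyperBall or of the WebGraph API: the class HyperBallArithmeticSketch and its methods are hypothetical, and the values HyperBall reads from its own fields (numNodes, numberOfThreads, granularity, modified()) are passed in explicitly. It restates the adaptive-granularity expression, the monotonicity clamp applied to the neighbourhood-function estimate, and the relative-increment stopping test.

```java
/**
 * Hypothetical standalone sketch, not part of HyperBall: restates on plain numbers the
 * adaptive-granularity expression of iterate(), its monotonicity clamp, and the
 * relative-increment stopping test of run(long, double, long).
 */
public class HyperBallArithmeticSketch {

    /** Same expression as in iterate(), with every input passed explicitly. */
    static long adaptiveGranularity(long numNodes, int numberOfThreads, long granularity, int modified) {
        // Cap at numNodes / numberOfThreads; otherwise scale the base granularity
        // by numNodes / modified (modified is clamped to at least 1).
        long g = (long) Math.min(Math.max(1, numNodes / numberOfThreads),
                granularity * (numNodes / Math.max(1., modified)));
        // Round up to the next multiple of Long.SIZE (64), as the original expression does.
        return (g + Long.SIZE - 1) & ~(Long.SIZE - 1);
    }

    /** Clamps the new estimate to the previous one; returns {clamped estimate, relative increment}. */
    static double[] monotoneStep(double current, double lastOutput) {
        if (current < lastOutput) current = lastOutput; // decreases can only come from approximation errors
        return new double[] { current, current / lastOutput };
    }

    /** The test in run(): i counts iterations from 0, so it cannot fire before the fifth iteration. */
    static boolean stopByRelativeIncrement(long i, double relativeIncrement, double threshold) {
        return i > 3 && relativeIncrement < 1 + threshold;
    }

    public static void main(String[] args) {
        // Few modified counters: the per-thread cap wins, then it is rounded up to a multiple of 64.
        System.out.println(adaptiveGranularity(1_000_000, 4, 16_384, 1_000));     // 250048
        // Every counter modified: the base granularity survives unchanged.
        System.out.println(adaptiveGranularity(1_000_000, 4, 16_384, 1_000_000)); // 16384
        // With threshold -1 the right-hand side is 0, so the test never fires.
        System.out.println(stopByRelativeIncrement(10, 1.0, -1));                 // false
    }
}
```

Because the relative increment is a ratio of non-decreasing estimates, it is always at least 1; with the default threshold of -1 the bound 1 + threshold is zero, so only stabilisation (modified() == 0) stops run(), which is the setting the command-line help marks as recommended.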
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/NeighbourhoodFunction.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/NeighbourhoodFunction.java
new file mode 100644
index 0000000..cd57653
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/NeighbourhoodFunction.java
@@ -0,0 +1,311 @@
+package it.unimi.dsi.webgraph.algo;
+
+/*
+ * Copyright (C) 2010-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import it.unimi.dsi.fastutil.io.TextIO;
+import it.unimi.dsi.fastutil.longs.LongArrays;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+
+import java.io.IOException;
+import java.util.concurrent.TimeUnit;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/** Computes the neighbourhood function of a graph by multiple {@linkplain ParallelBreadthFirstVisit parallel breadth-first visits}.
+ *
+ * <p>Note that performing all breadth-first visits requires time <i>O</i>(<var>n</var><var>m</var>), so this class
+ * is usable only on very small graphs.
+ *
+ * <p>Additionally, this class provides several useful static methods such as {@link #distanceCumulativeDistributionFunction(double[])},
+ * {@link #distanceProbabilityMassFunction(double[])}, {@link #averageDistance(double[])}, {@link #medianDistance(int, double[])}
+ * and {@link #spid(double[])} that act on neighbourhood functions.
+ *
+ * <h2>Performance issues</h2>
+ *
+ * <p>This class uses an instance of {@link ParallelBreadthFirstVisit} to ensure a high degree of parallelism (see its
+ * documentation for memory requirements). Note that if the graph is small, a large number of threads will slow down the computation because of synchronization costs.
+ *
+ * @author Paolo Boldi
+ * @author Sebastiano Vigna
+ */
+public class NeighbourhoodFunction {
+ private static final Logger LOGGER = LoggerFactory.getLogger(NeighbourhoodFunction.class);
+
+ /** Computes and returns the neighbourhood function of the specified graph by multiple breadth-first visits.
+ *
+ * <p>This method returns an array of doubles. When some values of the function exceed 2<sup>53</sup>, it
+ * might lose some least-significant digits. If you need exact values,
+ * use {@link #computeExact(ImmutableGraph, int, ProgressLogger)} instead.
+ *
+ * @param g a graph.
+ * @return the neighbourhood function of the specified graph.
+ */
+ public static double[] compute(final ImmutableGraph g) {
+ return compute(g, null);
+ }
+
+ /** Computes and returns the neighbourhood function of the specified graph by multiple breadth-first visits.
+ *
+ * <p>This method returns an array of doubles. When some values of the function exceed 2<sup>53</sup>, it
+ * might lose some least-significant digits. If you need exact values,
+ * use {@link #computeExact(ImmutableGraph, int, ProgressLogger)} instead.
+ *
+ * @param g a graph.
+ * @param pl a progress logger, or <code>null</code>.
+ * @return the neighbourhood function of the specified graph.
+ */
+ public static double[] compute(final ImmutableGraph g, final ProgressLogger pl) {
+ return compute(g, 0, pl);
+ }
+
+ /** Computes and returns the neighbourhood function of the specified graph by multiple breadth-first visits.
+ *
+ * <p>This method returns an array of doubles. When some values of the function exceed 2<sup>53</sup>, it
+ * might lose some least-significant digits. If you need exact values,
+ * use {@link #computeExact(ImmutableGraph, int, ProgressLogger)} instead.
+ *
+ * @param g a graph.
+ * @param threads the requested number of threads (0 for {@link Runtime#availableProcessors()}).
+ * Note that if the graph is small, a large number of threads will slow down the computation because of synchronization costs.
+ * @param pl a progress logger, or <code>null</code>.
+ * @return the neighbourhood function of the specified graph.
+ */
+ public static double[] compute(final ImmutableGraph g, final int threads, final ProgressLogger pl) {
+ final long[] computeExact = computeExact(g, threads, pl);
+ final double[] result = new double[computeExact.length];
+ for(int i = result.length; i-- != 0;) result[i] = computeExact[i];
+ return result;
+ }
+
+ /** Computes and returns the neighbourhood function of the specified graph by multiple breadth-first visits.
+ *
+ * <p>This method returns an array of longs. Unlike {@link #compute(ImmutableGraph, int, ProgressLogger)}, it
+ * provides exact values even when some values of the function exceed 2<sup>53</sup>.
+ *
+ * @param g a graph.
+ * @param threads the requested number of threads (0 for {@link Runtime#availableProcessors()}).
+ * Note that if the graph is small, a large number of threads will slow down the computation because of synchronization costs.
+ * @param pl a progress logger, or <code>null</code>.
+ * @return the neighbourhood function of the specified graph as an array of longs.
+ */
+ public static long[] computeExact(final ImmutableGraph g, final int threads, final ProgressLogger pl) {
+ final int n = g.numNodes();
+ long count[] = LongArrays.EMPTY_ARRAY;
+
+ ParallelBreadthFirstVisit visit = new ParallelBreadthFirstVisit(g, threads, true, null);
+
+ if (pl != null) {
+ pl.itemsName = "nodes";
+ pl.expectedUpdates = n;
+ pl.start();
+ }
+
+ for(int i = 0; i < n; i++) {
+ visit.clear();
+ visit.visit(i);
+ final int maxDistance = visit.maxDistance();
+ if (count.length <= maxDistance) count = LongArrays.grow(count, maxDistance + 1);
+ for(int d = maxDistance + 1; d-- != 0;) count[d] += visit.cutPoints.getInt(d + 1) - visit.cutPoints.getInt(d);
+ if (pl != null) pl.update();
+ }
+
+ if (pl != null) pl.done();
+
+ int last;
+ for(last = count.length; last-- != 0 && count[last] == 0;);
+ last++;
+ final long[] result = new long[last];
+ result[0] = count[0];
+ for(int i = 1; i < last; i++) result[i] = result[i - 1] + count[i];
+ return result;
+ }
+
+ /** Returns the distance cumulative distribution function.
+ *
+ * @param neighbourhoodFunction a neighbourhood function or distance cumulative distribution function.
+ * @return the distance cumulative distribution function.
+ */
+ public static double[] distanceCumulativeDistributionFunction(final double[] neighbourhoodFunction) {
+ final double[] result = neighbourhoodFunction.clone();
+ double lastValue = result[result.length - 1];
+ for(int d = result.length; d-- != 0;) result[d] /= lastValue;
+ return result;
+ }
+
+ /** Returns the probability mass function of the distance distribution.
+ *
+ * @param neighbourhoodFunction a neighbourhood function or distance cumulative distribution function.
+ * @return the probability mass function of the distance distribution.
+ */
+ public static double[] distanceProbabilityMassFunction(final double[] neighbourhoodFunction) {
+ final double[] result = neighbourhoodFunction.clone();
+ double lastValue = result[result.length - 1];
+ // Not necessary, but not harmful.
+ for(int d = result.length; d-- != 0;) result[d] /= lastValue;
+ for(int d = result.length; d-- != 1;) result[d] -= result[d - 1];
+ return result;
+ }
+
+
+
+ /** Returns the effective diameter at a specified fraction.
+ *
+ * @param alpha the desired fraction of reachable pairs of nodes (usually, 0.9).
+ * @param neighbourhoodFunction a neighbourhood function or distance cumulative distribution function.
+ * @return the effective diameter at <code>alpha</code>.
+ */
+ public static double effectiveDiameter(final double alpha, final double[] neighbourhoodFunction) {
+ double finalFraction = neighbourhoodFunction[neighbourhoodFunction.length - 1];
+ int d;
+ for (d = 0; neighbourhoodFunction[d] / finalFraction < alpha; d++);
+
+ if (d == 0) // In this case we assume the previous ordinate to be zero
+ return d + (alpha * finalFraction - neighbourhoodFunction[d]) / (neighbourhoodFunction[d]);
+ else
+ return d + (alpha * finalFraction - neighbourhoodFunction[d]) / (neighbourhoodFunction[d] - neighbourhoodFunction[d - 1]);
+ }
+
+ /** Returns the effective diameter at 0.9.
+ *
+ * @param neighbourhoodFunction a neighbourhood function (or distance cumulative distribution function).
+ * @return the effective diameter at 0.9.
+ */
+ public static double effectiveDiameter(final double[] neighbourhoodFunction) {
+ return effectiveDiameter(.9, neighbourhoodFunction);
+ }
+
+ /** Returns the median of distances between all pairs of nodes.
+ *
+ * @param neighbourhoodFunction a neighbourhood function.
+ * @return the median distance, which might be {@link Double#POSITIVE_INFINITY} if less than half
+ * of the pairs of nodes are reachable.
+ */
+ public static double medianDistance(final double[] neighbourhoodFunction) {
+ return medianDistance((int)Math.round(neighbourhoodFunction[0]), neighbourhoodFunction);
+ }
+
+ /** Returns the median of distances between all pairs of nodes.
+ *
+ * <p>Note that if you have an actual neighbourhood function, you can safely pass its first value
+ * as the first argument; however, having the number of nodes as a separate input
+ * makes it possible to pass this method a distance cumulative distribution
+ * function, too.
+ *
+ * @param n the number of nodes in the graph.
+ * @param neighbourhoodFunction a neighbourhood function (or distance cumulative distribution function).
+ * @return the median distance, which might be {@link Double#POSITIVE_INFINITY} if less than half
+ * of the pairs of nodes are reachable.
+ */
+ public static double medianDistance(final int n, double[] neighbourhoodFunction) {
+ double halfPairs = .5 * n * n;
+ int d;
+ for (d = neighbourhoodFunction.length; d-- != 0 && neighbourhoodFunction[d] > halfPairs;);
+ return d == neighbourhoodFunction.length - 1 ? Double.POSITIVE_INFINITY : d + 1;
+ }
+
+ /** Returns the spid (shortest-paths index of dispersion).
+ *
+ * @param neighbourhoodFunction a neighbourhood function (or distance cumulative distribution function).
+ * @return the spid.
+ */
+ public static double spid(final double[] neighbourhoodFunction) {
+ final double[] distanceProbabilityMassFunction = NeighbourhoodFunction.distanceProbabilityMassFunction(neighbourhoodFunction);
+ double mean = 0, meanOfSquares = 0;
+ for(int i = 0; i < distanceProbabilityMassFunction.length; i++) {
+ mean += distanceProbabilityMassFunction[i] * i;
+ meanOfSquares += distanceProbabilityMassFunction[i] * i * i;
+ }
+
+ return (meanOfSquares - mean * mean) / mean;
+ }
+
+ /** Returns the average of the distances between reachable pairs of nodes.
+ *
+ * @param neighbourhoodFunction a neighbourhood function (or distance cumulative distribution function).
+ * @return the average of the distances between reachable pairs of nodes.
+ */
+ public static double averageDistance(final double[] neighbourhoodFunction) {
+ final double[] distanceProbabilityMassFunction = NeighbourhoodFunction.distanceProbabilityMassFunction(neighbourhoodFunction);
+ double mean = 0;
+ for(int i = 0; i < distanceProbabilityMassFunction.length; i++) mean += distanceProbabilityMassFunction[i] * i;
+ return mean;
+ }
+
+ /** Returns the harmonic diameter, that is, the harmonic mean of all distances.
+ *
+ * @param neighbourhoodFunction a neighbourhood function.
+ * @return the harmonic diameter.
+ */
+ public static double harmonicDiameter(final double[] neighbourhoodFunction) {
+ return harmonicDiameter((int)Math.round(neighbourhoodFunction[0]), neighbourhoodFunction);
+ }
+
+ /** Returns the harmonic diameter, that is, the harmonic mean of all distances.
+ *
+ * <p>Note that if you have an actual neighbourhood function, you can safely pass its first value
+ * as the first argument; however, having the number of nodes as a separate input
+ * makes it possible to pass this method a distance cumulative distribution
+ * function, too.
+ *
+ * @param n the number of nodes in the graph.
+ * @param neighbourhoodFunction a neighbourhood function (or distance cumulative distribution function).
+ * @return the harmonic diameter.
+ */
+ public static double harmonicDiameter(final int n, double[] neighbourhoodFunction) {
+ double t = 0;
+ for(int i = 1; i < neighbourhoodFunction.length; i++) t += (neighbourhoodFunction[i] - neighbourhoodFunction[i - 1]) / i;
+ return (double)n * (n - 1) / t;
+ }
+
+ public static void main(String arg[]) throws IOException, JSAPException {
+ SimpleJSAP jsap = new SimpleJSAP(NeighbourhoodFunction.class.getName(),
+ "Prints the neighbourhood function of a graph, computing it via breadth-first visits.",
+ new Parameter[] {
+ new FlaggedOption("logInterval", JSAP.LONG_PARSER, Long.toString(ProgressLogger.DEFAULT_LOG_INTERVAL), JSAP.NOT_REQUIRED, 'l', "log-interval", "The minimum time interval between activity logs in milliseconds."),
+ new Switch("expand", 'e', "expand", "Expand the graph to increase speed (no compression)."),
+ new FlaggedOption("threads", JSAP.INTSIZE_PARSER, "0", JSAP.NOT_REQUIRED, 'T', "threads", "The number of threads to be used. If 0, the number will be estimated automatically. Note that if the graph is small, a large number of threads will slow down the computation because of synchronization costs."),
+ new UnflaggedOption("basename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the graph."),
+ }
+ );
+
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final String basename = jsapResult.getString("basename");
+ final int threads = jsapResult.getInt("threads");
+ ProgressLogger pl = new ProgressLogger(LOGGER, jsapResult.getLong("logInterval"), TimeUnit.MILLISECONDS);
+ ImmutableGraph g = ImmutableGraph.load(basename);
+ if (jsapResult.userSpecified("expand")) g = new ArrayListMutableGraph(g).immutableView();
+ TextIO.storeLongs(computeExact(g, threads, pl), System.out);
+ }
+}
+
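A minimal, sequential sketch of what `harmonicDiameter(int, double[])` computes, assuming the graph is given as a plain `int[][]` successor array rather than an `ImmutableGraph`. The class `HarmonicDiameterSketch` and its methods are hypothetical and not part of WebGraph. It builds an exact neighbourhood function with one breadth-first visit per node (entry t counts ordered pairs at distance at most t, each node paired with itself included) and then applies the same formula, n(n-1) divided by the sum over t >= 1 of (N(t) - N(t-1))/t. Unreachable pairs never enter the sum, which amounts to treating 1/infinity as 0.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/** Hypothetical sketch, not WebGraph code: exact neighbourhood function and harmonic diameter. */
public class HarmonicDiameterSketch {
    /** Entry t of the result counts ordered pairs (x, y), including x == y, with d(x, y) <= t. */
    public static long[] neighbourhoodFunction(final int[][] successors) {
        final int n = successors.length;
        final long[] countAtDistance = new long[Math.max(n, 1)]; // finite distances are < n
        final int[] dist = new int[n];
        final ArrayDeque<Integer> queue = new ArrayDeque<>();
        int maxDistance = 0;
        for (int source = 0; source < n; source++) {
            Arrays.fill(dist, -1); // -1 marks nodes not reached from this source
            dist[source] = 0;
            queue.add(source);
            while (!queue.isEmpty()) {
                final int x = queue.poll();
                countAtDistance[dist[x]]++;
                maxDistance = Math.max(maxDistance, dist[x]);
                for (final int y : successors[x]) {
                    if (dist[y] == -1) {
                        dist[y] = dist[x] + 1;
                        queue.add(y);
                    }
                }
            }
        }
        // Prefix sums turn "pairs at distance exactly t" into "pairs at distance at most t".
        final long[] nf = new long[maxDistance + 1];
        long cumulative = 0;
        for (int t = 0; t <= maxDistance; t++) {
            cumulative += countAtDistance[t];
            nf[t] = cumulative;
        }
        return nf;
    }

    /** Same formula as NeighbourhoodFunction.harmonicDiameter(int, double[]). */
    public static double harmonicDiameter(final int n, final long[] nf) {
        double t = 0;
        for (int i = 1; i < nf.length; i++) t += (nf[i] - nf[i - 1]) / (double) i;
        return (double) n * (n - 1) / t;
    }

    public static void main(final String[] args) {
        final int[][] path = { {1}, {0, 2}, {1} }; // undirected path 0 - 1 - 2, arcs in both directions
        final long[] nf = neighbourhoodFunction(path);
        System.out.println(Arrays.toString(nf));               // [3, 7, 9]
        System.out.println(harmonicDiameter(path.length, nf)); // 1.2
    }
}
```

On the undirected path 0 - 1 - 2 the neighbourhood function is [3, 7, 9], so the harmonic diameter is 6 / (4/1 + 2/2) = 1.2.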
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/ParallelBreadthFirstVisit.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/ParallelBreadthFirstVisit.java
new file mode 100644
index 0000000..ce40e71
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/ParallelBreadthFirstVisit.java
@@ -0,0 +1,335 @@
+package it.unimi.dsi.webgraph.algo;
+
+/*
+ * Copyright (C) 2011-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.ints.IntArrayList;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.LazyIntIterator;
+
+import java.util.concurrent.CyclicBarrier;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.atomic.AtomicIntegerArray;
+import java.util.concurrent.atomic.AtomicLong;
+
+/** Performs breadth-first visits of a graph exploiting multicore parallelism.
+ *
+ * <p>To use this class you first create an instance, and then invoke {@link #visit(int)}. If you want to perform
+ * more visits preserving the {@link #marker} state, you can invoke {@link #visit(int)} again.
+ * By calling {@link #clear()}, instead, you can reset {@link #marker} (i.e., forget about visited nodes).
+ *
+ * <p>Alternatively, {@link #visitAll()} will start a visit from all the nodes of the graph in a more efficient way.
+ *
+ * <p>After the visit, you can peek at the field {@link #marker} to discover details about the visit.
+ * Depending on the {@link #parent} value provided at construction time, the array {@link #marker}
+ * will be filled with parent information (e.g., with the index
+ * of the parent node in the visit tree) or with a <em>{@linkplain #round round number}</em> increased at each nonempty visit,
+ * which acts as a connected-component index if the graph is symmetric.
+ *
+ * <p>Observe that in the former case (if {@link #parent} is <code>true</code>), the array {@link #marker} will
+ * contain the value -1 for the nodes that have not been reached by the visit, the parent of the node in the BFS tree
+ * if the node was not the root, or the node itself for the root.
+ *
+ * <p>In the case of {@link #visit(int, int)}, {@link #queue} and {@link #cutPoints}, too, provide useful information. In
+ * particular, the nodes in {@link #queue} from the <var>d</var>-th to the (<var>d</var>&nbsp;+1)-th cutpoint
+ * are exactly the nodes at distance <var>d</var> from the source.
+ *
+ * <h2>Performance issues</h2>
+ *
+ * <p>This class needs three integers per node.
+ * If there are several available cores, breadth-first visits will be <em>decomposed</em> into relatively
+ * small tasks (small blocks of nodes in the queue at the same distance from the starting node)
+ * and each task will be assigned to the first available core. Since all tasks are completely
+ * independent, this ensures a very high degree of parallelism. However, on very sparse graphs the cost
+ * of keeping the threads synchronised can be extremely high, and even end up <em>increasing</em> the visit time.
+ *
+ * <p>Note that if the degree distribution is extremely skewed some cores might get stuck
+ * in the enumeration of the successors of some nodes with a very high degree.
+ */
+
+public class ParallelBreadthFirstVisit {
+ /** The graph under examination. */
+ public final ImmutableGraph graph;
+ /** The queue of visited nodes. */
+ public final IntArrayList queue;
+ /** At the end of a visit, the cutpoints of {@link #queue}. The <var>d</var>-th cutpoint is the position in the queue of the first node at distance <var>d</var>. The
+ * last cutpoint is the queue size. */
+ public final IntArrayList cutPoints;
+ /** Whether {@link #marker} contains parent nodes or round numbers. */
+ public final boolean parent;
+ /** The marker array; contains -1 for nodes that have not yet been enqueued, the parent in the visit tree if
+ * {@link #parent} is true, or an index increased at each visit if {@link #parent} is false, which in the symmetric case is the index
+ * of the connected component of the node. */
+ public final AtomicIntegerArray marker;
+ /** The global progress logger. */
+ private final ProgressLogger pl;
+ /** The number of threads. */
+ private final int numberOfThreads;
+ /** The number of nodes visited. */
+ private final AtomicInteger progress;
+ /** The next node position to be picked from the last segment of {@link #queue}. */
+ private final AtomicLong nextPosition;
+ /** If true, the current visit is over. */
+ private volatile boolean completed;
+ /** The barrier used to synchronize visiting threads. */
+ private volatile CyclicBarrier barrier;
+ /** Keeps track of problems in visiting threads. */
+ private volatile Throwable threadThrowable;
+ /** A number increased at each nonempty visit (used to mark {@link #marker} if {@link #parent} is false). */
+ public int round;
+
+ /** Creates a new instance for keeping track of the state of parallel breadth-first visits.
+ *
+ * @param graph a graph.
+ * @param requestedThreads the requested number of threads (0 for {@link Runtime#availableProcessors()}).
+ * @param parent if true, {@link #marker} will contain parent nodes; otherwise, it will contain {@linkplain #round round numbers}.
+ * @param pl a progress logger, or <code>null</code>.
+ */
+ public ParallelBreadthFirstVisit(final ImmutableGraph graph, final int requestedThreads, final boolean parent, final ProgressLogger pl) {
+ this.graph = graph;
+ this.parent = parent;
+ this.pl = pl;
+ this.marker = new AtomicIntegerArray(graph.numNodes());
+ this.queue = new IntArrayList(graph.numNodes());
+ this.progress = new AtomicInteger();
+ this.nextPosition = new AtomicLong();
+ this.cutPoints = new IntArrayList();
+ numberOfThreads = requestedThreads != 0 ? requestedThreads : Runtime.getRuntime().availableProcessors();
+ clear();
+ }
+
+ /** Clears the internal state of the visit, setting all {@link #marker} entries and {@link #round} to -1. */
+ public void clear() {
+ round = -1;
+ for(int i = marker.length(); i-- != 0;) marker.set(i, -1);
+ }
+
+ private final class IterationThread extends Thread {
+ private static final int GRANULARITY = 1000;
+
+ @Override
+ public void run() {
+ try {
+ // We cache frequently used fields.
+ final AtomicIntegerArray marker = ParallelBreadthFirstVisit.this.marker;
+ final ImmutableGraph graph = ParallelBreadthFirstVisit.this.graph.copy();
+ final boolean parent = ParallelBreadthFirstVisit.this.parent;
+
+ for(;;) {
+ barrier.await();
+ if (completed) return;
+ final IntArrayList out = new IntArrayList();
+ final int first = cutPoints.getInt(cutPoints.size() - 2);
+ final int last = cutPoints.getInt(cutPoints.size() - 1);
+ int mark = round;
+ for(;;) {
+ // Try to get another piece of work.
+ final long start = first + nextPosition.getAndAdd(GRANULARITY);
+ if (start >= last) {
+ nextPosition.getAndAdd(-GRANULARITY);
+ break;
+ }
+
+ final int end = (int)(Math.min(last, start + GRANULARITY));
+ out.clear();
+
+ for(int pos = (int)start; pos < end; pos++) {
+ final int curr = queue.getInt(pos);
+ if (parent) mark = curr;
+ final LazyIntIterator successors = graph.successors(curr);
+ for(int s; (s = successors.nextInt()) != -1;)
+ if (marker.compareAndSet(s, -1, mark)) out.add(s);
+ }
+
+ progress.addAndGet(end - (int)start);
+
+ if (! out.isEmpty()) synchronized(queue) {
+ queue.addAll(out);
+ }
+ }
+ }
+ }
+ catch(Throwable t) {
+ threadThrowable = t;
+ }
+ }
+ }
+
+
+ /** Performs a breadth-first visit of the given graph starting from the given node.
+ *
+ * <p>This method will increment {@link #round} if at least one node is visited.
+ *
+ * @param start the starting node.
+ * @return the number of visited nodes.
+ * @see #visit(int,int)
+ */
+ public int visit(final int start) {
+ return visit(start, -1);
+ }
+
+
+ /** Performs a breadth-first visit of the given graph starting from the given node.
+ *
+ * <p>This method will increment {@link #round} if at least one node is visited.
+ *
+ * @param start the starting node.
+ * @param expectedSize the expected size (number of nodes) of the visit (for logging), or -1 to use the number of nodes of the graph.
+ * @return the number of visited nodes.
+ */
+ public int visit(final int start, final int expectedSize) {
+ if (marker.get(start) != -1) return 0;
+ round++;
+ completed = false;
+ queue.clear();
+ cutPoints.clear();
+ queue.add(start);
+ cutPoints.add(0);
+ marker.set(start, parent ? start : round);
+ final IterationThread[] thread = new IterationThread[numberOfThreads];
+ for(int i = thread.length; i-- != 0;) thread[i] = new IterationThread();
+ progress.set(0);
+
+ if (pl != null) {
+ pl.start("Starting visit...");
+ pl.expectedUpdates = expectedSize != -1 ? expectedSize : graph.numNodes();
+ pl.itemsName = "nodes";
+ }
+
+ barrier = new CyclicBarrier(numberOfThreads, new Runnable() {
+ @Override
+ public void run() {
+ if (pl != null) pl.set(progress.get());
+
+ if (queue.size() == cutPoints.getInt(cutPoints.size() - 1)) {
+ completed = true;
+ return;
+ }
+
+ cutPoints.add(queue.size());
+ nextPosition.set(0);
+ }
+ }
+ );
+
+ for(int i = thread.length; i-- != 0;) thread[i].start();
+ for(int i = thread.length; i-- != 0;)
+ try {
+ thread[i].join();
+ }
+ catch (InterruptedException e) {
+ throw new RuntimeException(e);
+ }
+
+ if (threadThrowable != null) throw new RuntimeException(threadThrowable);
+ if (pl != null) pl.done();
+ return queue.size();
+ }
+
+ /** Visits all nodes. Calls {@link #clear()} initially.
+ *
+ * <p>This method is more efficient than invoking {@link #visit(int, int)} on all nodes as threads are created just once.
+ */
+ public void visitAll() {
+ final IterationThread[] thread = new IterationThread[numberOfThreads];
+ for(int i = thread.length; i-- != 0;) thread[i] = new IterationThread();
+ final int n = graph.numNodes();
+ completed = false;
+ clear();
+ queue.clear();
+ cutPoints.clear();
+ progress.set(0);
+
+ if (pl != null) {
+ pl.start("Starting visits...");
+ pl.expectedUpdates = graph.numNodes();
+ pl.displayLocalSpeed = true;
+ pl.itemsName = "nodes";
+ }
+
+ barrier = new CyclicBarrier(numberOfThreads, new Runnable() {
+ int curr = -1;
+ @Override
+ public void run() {
+ if (pl != null) pl.set(progress.get());
+ // Either first call, or queue did not grow from the last call.
+ if (curr == -1 || queue.size() == cutPoints.getInt(cutPoints.size() - 1)) {
+ if (pl != null) pl.set(progress.get());
+ // Look for the first nonterminal node not yet visited.
+ for(;;) {
+ while(++curr < n && marker.get(curr) != -1);
+
+ if (curr == n) {
+ completed = true;
+ return;
+ }
+ else {
+ round++;
+ marker.set(curr, parent ? curr : round);
+
+ final int d = graph.outdegree(curr);
+ if (d != 0 && ! (d == 1 && graph.successors(curr).nextInt() == curr)) {
+ queue.clear();
+ queue.add(curr);
+
+ cutPoints.clear();
+ cutPoints.add(0);
+ break;
+ }
+ }
+ }
+ }
+
+ cutPoints.add(queue.size());
+ nextPosition.set(0);
+ }
+ }
+ );
+
+ for(int i = thread.length; i-- != 0;) thread[i].start();
+ for(int i = thread.length; i-- != 0;)
+ try {
+ thread[i].join();
+ }
+ catch (InterruptedException e) {
+ throw new RuntimeException(e);
+ }
+
+ if (threadThrowable != null) throw new RuntimeException(threadThrowable);
+ if (pl != null) pl.done();
+ }
+
+
+ /** Returns a node at maximum distance during the last visit (e.g., a node realising the positive eccentricity of the starting node).
+ *
+ * @return a node at maximum distance from the source in the last visit.
+ */
+ public int nodeAtMaxDistance() {
+ return queue.getInt(queue.size() - 1);
+ }
+
+ /** Returns the maximum distance computed during the last visit (e.g., the eccentricity of the source).
+ *
+ * @return the maximum distance computed during the last visit.
+ */
+
+ public int maxDistance() {
+ return cutPoints.size() - 2;
+ }
+}
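The level-synchronous scheme described in the class comment can be sketched with standard-library tools only. This is a hypothetical simplification, not WebGraph code: the graph is an `int[][]` successor array, each level is expanded with a parallel `IntStream` on the common fork/join pool instead of dedicated threads synchronised by a `CyclicBarrier`, and newly reached nodes are collected in a `ConcurrentLinkedQueue` rather than in per-thread buffers. The bookkeeping follows `visit(int, int)` with `parent` set to true: markers are claimed with `compareAndSet`, a cutpoint is appended whenever the queue grew during the previous level, and the visit stops as soon as a level adds no nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicIntegerArray;
import java.util.stream.IntStream;

/** Hypothetical sketch, not WebGraph code: level-synchronous parallel BFS with queue cutpoints. */
public class LevelSynchronousBfsSketch {
    public final List<Integer> queue = new ArrayList<>();
    public final List<Integer> cutPoints = new ArrayList<>();
    public AtomicIntegerArray marker; // parent in the BFS tree, the root itself, or -1 if not reached

    public int visit(final int[][] successors, final int start) {
        final int n = successors.length;
        marker = new AtomicIntegerArray(n);
        for (int i = 0; i < n; i++) marker.set(i, -1);
        queue.clear();
        cutPoints.clear();
        queue.add(start);
        cutPoints.add(0);
        marker.set(start, start); // the root is marked with itself

        // Stop when the last level added no nodes (queue size equals the last cutpoint).
        while (queue.size() != cutPoints.get(cutPoints.size() - 1)) {
            cutPoints.add(queue.size()); // close the current level
            final int first = cutPoints.get(cutPoints.size() - 2);
            final int last = cutPoints.get(cutPoints.size() - 1);
            final ConcurrentLinkedQueue<Integer> next = new ConcurrentLinkedQueue<>();
            // Nodes of the current level are expanded in parallel; the queue is only read here.
            IntStream.range(first, last).parallel().forEach(pos -> {
                final int curr = queue.get(pos);
                for (final int s : successors[curr])
                    // Only the thread winning the CAS enqueues s, so each node is enqueued once.
                    if (marker.compareAndSet(s, -1, curr)) next.add(s);
            });
            queue.addAll(next); // append the next level once the parallel phase is over
        }
        return queue.size();
    }

    public int maxDistance() { return cutPoints.size() - 2; }

    public int nodeAtMaxDistance() { return queue.get(queue.size() - 1); }

    public static void main(final String[] args) {
        final int[][] g = { {1, 2}, {3}, {3}, {4}, {}, {0} }; // node 5 is not reachable from 0
        final LevelSynchronousBfsSketch bfs = new LevelSynchronousBfsSketch();
        System.out.println(bfs.visit(g, 0));         // 5
        System.out.println(bfs.cutPoints);           // [0, 1, 3, 4, 5]
        System.out.println(bfs.maxDistance());       // 3
        System.out.println(bfs.nodeAtMaxDistance()); // 4
    }
}
```

For the graph in `main` the levels are {0}, {1, 2}, {3} and {4}, so the queue positions between consecutive cutpoints hold exactly the nodes at each distance. Node 3 receives parent 1 or 2 depending on which compare-and-set wins, and the order of nodes within a level is likewise nondeterministic, as in the original class.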
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/SampleDistanceCumulativeDistributionFunction.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/SampleDistanceCumulativeDistributionFunction.java
new file mode 100644
index 0000000..e603d21
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/SampleDistanceCumulativeDistributionFunction.java
@@ -0,0 +1,167 @@
+package it.unimi.dsi.webgraph.algo;
+
+import java.io.IOException;
+import java.math.RoundingMode;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/*
+ * Copyright (C) 2011-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.objects.AbstractObjectList;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.stat.Jackknife;
+import it.unimi.dsi.util.XoRoShiRo128PlusRandom;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+
+/** Samples a graph via breadth-first visits.
+ *
+ * <h2>Performance issues</h2>
+ *
+ * <p>This class uses an instance of {@link ParallelBreadthFirstVisit} to ensure a high degree of parallelism (see its
+ * documentation for memory requirements).
+ */
+
+public class SampleDistanceCumulativeDistributionFunction {
+ private static final Logger LOGGER = LoggerFactory.getLogger(SampleDistanceCumulativeDistributionFunction.class);
+
+ /** Checks that we are always visiting the same number of nodes: warns the first time a visit reaches fewer nodes than the graph contains, and throws an exception if a later visit reaches a different number of nodes.
+ *
+ * @param visit the current visit.
+ * @param visitedNodes the number of the visited nodes, or -1 if unknown.
+ * @return the number of visited nodes in <code>visit</code>.
+ */
+ private static int visitedNodes(final ParallelBreadthFirstVisit visit, int visitedNodes) {
+ if (visit.queue.size() != visit.graph.numNodes()) {
+ if (visitedNodes == -1) {
+ visitedNodes = visit.queue.size();
+ LOGGER.warn("The graph is not strongly connected: visiting " + visitedNodes + " < " + visit.graph.numNodes() + " nodes");
+ }
+ else if (visitedNodes != visit.queue.size()) throw new IllegalStateException("Queue size (" + visit.queue.size() + ") is different from the number of previously visited nodes (" + visitedNodes + "): maybe the graph is not symmetric.");
+ }
+
+ return visitedNodes;
+ }
+
+ /** Samples a graph via breadth-first visits.
+ *
+ * <p>This method will estimate the cumulative distribution function of distances of
+ * a strongly connected graph (or of the component containing a randomly extracted node).
+ * If there is more than one connected component, a warning will be given, specifying the size of the component. An {@link IllegalStateException}
+ * will be thrown if the algorithm detects that the graph is not strongly connected, but this is not guaranteed to happen.
+ *
+ * @param graph a graph.
+ * @param k a number of samples.
+ * @param threads the requested number of threads (0 for {@link Runtime#availableProcessors()}).
+ * @return an array of samples.
+ */
+ protected static int[][] sample(final ImmutableGraph graph, final int k, final int threads) {
+ return sample(graph, k, false, threads);
+ }
+
+ /** Samples a graph via breadth-first visits.
+ *
+ * <p>This method will estimate the cumulative distribution function of distances of
+ * a strongly connected graph. If there is more than one connected component, a warning will be given, specifying the size of the component. An {@link IllegalStateException}
+ * will be thrown if the algorithm detects that the graph is not strongly connected, but this is not guaranteed to happen.
+ *
+ * @param graph a graph.
+ * @param k a number of samples.
+ * @param naive sample naively: do not stop sampling even when detecting the lack of strong connection.
+ * @param threads the requested number of threads (0 for {@link Runtime#availableProcessors()}).
+ * @return an array of samples.
+ */
+ protected static int[][] sample(final ImmutableGraph graph, final int k, final boolean naive, final int threads) {
+ final ParallelBreadthFirstVisit visit = new ParallelBreadthFirstVisit(graph, threads, false, new ProgressLogger(LOGGER, "nodes"));
+
+ final XoRoShiRo128PlusRandom random = new XoRoShiRo128PlusRandom();
+
+ int componentSize = -1;
+ final int[][] result = new int[k][];
+ for(int i = k; i-- != 0;) {
+ // After the first iteration, we pick a node from the visit queue, unless we are sampling naively.
+ visit.clear();
+ visit.visit(visit.queue.isEmpty() || naive ? random.nextInt(visit.graph.numNodes()) : visit.queue.getInt(random.nextInt(visit.queue.size())), componentSize);
+ if (!naive) componentSize = visitedNodes(visit, componentSize);
+ final int maxDistance = visit.maxDistance();
+ result[i] = new int[maxDistance + 1];
+ for(int d = 0; d <= maxDistance; d++) result[i][d] = visit.cutPoints.getInt(d + 1);
+ }
+
+ return result;
+ }
+
+ public static void main(String arg[]) throws IOException, JSAPException {
+ final SimpleJSAP jsap = new SimpleJSAP(SampleDistanceCumulativeDistributionFunction.class.getName(),
+ "Estimates the neighbourhood function, the distance cumulative distribution function and the distance probability mass function by sampling. " +
+ "The output contains nine columns: for each function, we give the value, the standard error and the relative " +
+ "standard error as a percentage (all estimated by the jackknife).",
+ new Parameter[] {
+ new Switch("mapped", 'm', "mapped", "Do not load the graph in main memory, but rather memory-map it."),
+ new FlaggedOption("threads", JSAP.INTSIZE_PARSER, "0", JSAP.NOT_REQUIRED, 'T', "threads", "The number of threads to be used. If 0, the number will be estimated automatically."),
+ new FlaggedOption("samples", JSAP.INTSIZE_PARSER, "1000", JSAP.NOT_REQUIRED, 's', "samples", "The number of samples (breadth-first visits)."),
+ new Switch("naive", 'n', "naive", "Sample naively: pick nodes at random and do not stop sampling even when detecting the lack of strong connection."),
+ new UnflaggedOption("basename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the graph."),
+ }
+ );
+
+ final JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final String basename = jsapResult.getString("basename");
+ final ImmutableGraph graph = jsapResult.userSpecified("mapped") ? ImmutableGraph.loadMapped(basename) : ImmutableGraph.load(basename);
+ final int[][] sample = sample(graph, jsapResult.getInt("samples"), jsapResult.userSpecified("naive"), jsapResult.getInt("threads"));
+ int l = 0;
+ for(final int[] s: sample) l = Math.max(l, s.length);
+ final int length = l;
+ final AbstractObjectList<double[]> samples = new AbstractObjectList<double[]>() {
+ @Override
+ public double[] get(final int index) {
+ final double[] result = new double[length];
+ final int[] s = sample[index];
+ final double n = graph.numNodes();
+ for(int i = 0; i < length; i++) result[i] = s[Math.min(i, s.length - 1)] * n;
+ return result;
+ }
+
+ @Override
+ public int size() {
+ return sample.length;
+ }
+ };
+ final Jackknife nf = Jackknife.compute(samples, Jackknife.IDENTITY);
+ final Jackknife cdf = Jackknife.compute(samples, ApproximateNeighbourhoodFunctions.CDF);
+ final Jackknife pmf = Jackknife.compute(samples, ApproximateNeighbourhoodFunctions.PMF);
+
+ for(int i = 0; i < pmf.estimate.length; i++) System.out.println(
+ nf.bigEstimate[i].setScale(30, RoundingMode.HALF_EVEN) + "\t" + nf.standardError[i] + "\t" + 100 * nf.standardError[i] / nf.estimate[i] + "\t" +
+ cdf.bigEstimate[i].setScale(30, RoundingMode.HALF_EVEN) + "\t" + cdf.standardError[i] + "\t" + 100 * cdf.standardError[i] / cdf.estimate[i] + "\t" +
+ pmf.bigEstimate[i].setScale(30, RoundingMode.HALF_EVEN) + "\t" + pmf.standardError[i] + "\t" + 100 * pmf.standardError[i] / pmf.estimate[i]);
+
+ }
+}
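The `main` method above pads each sample with its last value, scales it by the number of nodes, and hands the resulting list to the jackknife. The sketch below shows only the plain plug-in estimates (average the padded, scaled samples, then normalise) and leaves out the jackknife bias correction and standard errors. It assumes, without checking `ApproximateNeighbourhoodFunctions`, that the CDF is the neighbourhood function divided by its last value and the PMF is the sequence of successive CDF differences; the class `SampledDistanceStatsSketch` is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch, not WebGraph code: plug-in estimates from sampled cumulative visit counts. */
public class SampledDistanceStatsSketch {
    /** Each sample s holds, at index d, the number of nodes within distance d of a sampled source. */
    public static double[] meanNeighbourhoodFunction(final int[][] samples, final int n) {
        int length = 0;
        for (final int[] s : samples) length = Math.max(length, s.length);
        final double[] nf = new double[length];
        for (final int[] s : samples)
            // Shorter samples are padded with their last value and scaled by n,
            // as in SampleDistanceCumulativeDistributionFunction.main().
            for (int i = 0; i < length; i++) nf[i] += (double) s[Math.min(i, s.length - 1)] * n;
        for (int i = 0; i < length; i++) nf[i] /= samples.length;
        return nf;
    }

    /** Assumed normalisation: the CDF is the neighbourhood function divided by its last value. */
    public static double[] cdf(final double[] nf) {
        final double[] cdf = new double[nf.length];
        for (int i = 0; i < nf.length; i++) cdf[i] = nf[i] / nf[nf.length - 1];
        return cdf;
    }

    /** Assumed definition: the PMF is the sequence of successive differences of the CDF. */
    public static double[] pmf(final double[] cdf) {
        final double[] pmf = new double[cdf.length];
        for (int i = 0; i < cdf.length; i++) pmf[i] = cdf[i] - (i == 0 ? 0 : cdf[i - 1]);
        return pmf;
    }

    public static void main(final String[] args) {
        // Cumulative counts from every source of the undirected path 0 - 1 - 2 (n = 3).
        final int[][] samples = { {1, 2, 3}, {1, 3}, {1, 2, 3} };
        final double[] nf = meanNeighbourhoodFunction(samples, 3);
        System.out.println(Arrays.toString(nf)); // [3.0, 7.0, 9.0]
        System.out.println(Arrays.toString(pmf(cdf(nf))));
    }
}
```

Sampling every source of the path recovers the exact neighbourhood function [3, 7, 9]; with real random samples the averages are only estimates, which is why the original reports jackknife standard errors alongside them.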
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/StronglyConnectedComponents.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/StronglyConnectedComponents.java
new file mode 100644
index 0000000..1ab9f17
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/StronglyConnectedComponents.java
@@ -0,0 +1,442 @@
+package it.unimi.dsi.webgraph.algo;
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import it.unimi.dsi.Util;
+import it.unimi.dsi.bits.LongArrayBitVector;
+import it.unimi.dsi.fastutil.Stack;
+import it.unimi.dsi.fastutil.booleans.BooleanArrayList;
+import it.unimi.dsi.fastutil.booleans.BooleanStack;
+import it.unimi.dsi.fastutil.ints.IntArrayList;
+import it.unimi.dsi.fastutil.ints.IntArrays;
+import it.unimi.dsi.fastutil.ints.IntStack;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.objects.ObjectArrayList;
+import it.unimi.dsi.lang.ObjectParser;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.GraphClassParser;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.LazyIntIterator;
+import it.unimi.dsi.webgraph.Transform.LabelledArcFilter;
+import it.unimi.dsi.webgraph.labelling.ArcLabelledImmutableGraph;
+import it.unimi.dsi.webgraph.labelling.ArcLabelledNodeIterator.LabelledArcIterator;
+
+import java.io.IOException;
+import java.util.concurrent.TimeUnit;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/** Computes the strongly connected components (and optionally the buckets) of an immutable graph.
+ *
+ * <p>The {@link #compute(ImmutableGraph, boolean, ProgressLogger)} method of this class will return
+ * an instance that contains the data computed by running a variant of Tarjan's algorithm on an immutable graph.
+ * The implementation is iterative, rather than recursive, to work around known limitations on the size of
+ * the stack in current JVMs.
+ * Besides the usual strongly connected components, it is possible to compute the <em>buckets</em> of the
+ * graph, that is, nodes belonging to components that are terminal, but not dangling, in the component DAG.
+ *
+ * <p>After getting an instance, it is possible to run the {@link #computeSizes()} and {@link #sortBySize(int[])}
+ * methods to obtain further information. This scheme has been devised to exploit the available memory as much
+ * as possible&mdash;after the components have been computed, the returned instance keeps no track of
+ * the graph, and the related memory can be freed by the garbage collector.
+ */
+
+
+public class StronglyConnectedComponents {
+ @SuppressWarnings("unused")
+ private static final boolean DEBUG = false;
+ private static final Logger LOGGER = LoggerFactory.getLogger(StronglyConnectedComponents.class);
+
+ /** The number of strongly connected components. */
+ final public int numberOfComponents;
+ /** The component of each node. */
+ final public int[] component;
+ /** The bit vector for buckets, or <code>null</code>, in which case buckets have not been computed. */
+ final public LongArrayBitVector buckets;
+
+ protected StronglyConnectedComponents(final int numberOfComponents, final int[] status, final LongArrayBitVector buckets) {
+ this.numberOfComponents = numberOfComponents;
+ this.component = status;
+ this.buckets = buckets;
+ }
+
+ private final static class Visit {
+ /** The graph. */
+ private final ImmutableGraph graph;
+ /** The number of nodes in {@link #graph}. */
+ private final int n;
+ /** A progress logger. */
+ private final ProgressLogger pl;
+ /** Whether we should compute buckets. */
+ private final boolean computeBuckets;
+ /** For nonvisited nodes, 0; for visited, nonemitted nodes, the visit time; for emitted nodes, -c-1, where c is the component number. */
+ private final int[] status;
+ /** The buckets. */
+ private final LongArrayBitVector buckets;
+ /** The component stack. */
+ private final IntArrayList componentStack;
+ /** The first-visit clock (incremented at each visited node). */
+ private int clock;
+ /** The number of components already output. */
+ private int numberOfComponents;
+
+ private Visit(final ImmutableGraph graph, final int[] status, final LongArrayBitVector buckets, ProgressLogger pl) {
+ this.graph = graph;
+ this.buckets = buckets;
+ this.status = status;
+ this.pl = pl;
+ this.computeBuckets = buckets != null;
+ this.n = graph.numNodes();
+ componentStack = new IntArrayList(n);
+ }
+
+ /** Performs a visit starting from a given node.
+ *
+ * @param startNode the first node to visit.
+ */
+ private void visit(final int startNode) {
+ final BooleanStack olderNodeFound = new BooleanArrayList();
+ final IntStack nodeStack = new IntArrayList();
+ final Stack<LazyIntIterator> successorsStack = new ObjectArrayList<>();
+ final int[] status = this.status;
+ // For simplicity, we compute nonbuckets and then flip the values.
+ final LongArrayBitVector nonBuckets = this.buckets;
+
+ status[startNode] = ++clock;
+ componentStack.push(startNode);
+ nodeStack.push(startNode);
+ successorsStack.push(graph.successors(startNode));
+ olderNodeFound.push(false);
+ if (computeBuckets && graph.outdegree(startNode) == 0) nonBuckets.set(startNode);
+
+ main: while(! nodeStack.isEmpty()) {
+ final int currentNode = nodeStack.topInt();
+ final LazyIntIterator successors = successorsStack.top();
+
+ for(int s; (s = successors.nextInt()) != -1;) {
+ final int successorStatus = status[s];
+ if (successorStatus == 0) {
+ status[s] = ++clock;
+ nodeStack.push(s);
+ componentStack.push(s);
+ successorsStack.push(graph.successors(s));
+ olderNodeFound.push(false);
+ if (computeBuckets && graph.outdegree(s) == 0) nonBuckets.set(s);
+ continue main;
+ }
+ else if (successorStatus > 0) {
+ if (successorStatus < status[currentNode]) {
+ status[currentNode] = successorStatus;
+ olderNodeFound.popBoolean();
+ olderNodeFound.push(true);
+ }
+ }
+ else if (computeBuckets) nonBuckets.set(currentNode);
+ }
+
+ nodeStack.popInt();
+ successorsStack.pop();
+ if (pl != null) pl.lightUpdate();
+
+ if (olderNodeFound.popBoolean()) {
+ final int parentNode = nodeStack.topInt();
+ final int currentNodeStatus = status[currentNode];
+ if (currentNodeStatus < status[parentNode]) {
+ status[parentNode] = currentNodeStatus;
+ olderNodeFound.popBoolean();
+ olderNodeFound.push(true);
+ }
+
+ if (computeBuckets && nonBuckets.getBoolean(currentNode)) nonBuckets.set(parentNode);
+ }
+ else {
+ if (computeBuckets && ! nodeStack.isEmpty()) nonBuckets.set(nodeStack.topInt());
+ final boolean notABucket = computeBuckets ? nonBuckets.getBoolean(currentNode) : false;
+ numberOfComponents++;
+ int z;
+ do {
+ z = componentStack.popInt();
+ // Component markers are -c-1, where c is the component number.
+ status[z] = -numberOfComponents;
+ if (notABucket) nonBuckets.set(z);
+ } while(z != currentNode);
+ }
+ }
+ }
+
+
+ public void run() {
+
+ if (pl != null) {
+ pl.itemsName = "nodes";
+ pl.expectedUpdates = n;
+ pl.displayFreeMemory = true;
+ pl.start("Computing strongly connected components...");
+ }
+ for (int x = 0; x < n; x++) if (status[x] == 0) visit(x);
+ if (pl != null) pl.done();
+
+ // Turn component markers into component numbers.
+ for(int i = status.length; i-- != 0;) status[i] = -status[i] - 1;
+
+ if (buckets != null) buckets.flip();
+ }
+ }
+
+ /** Computes the strongly connected components of a given graph.
+ *
+ * @param graph the graph whose strongly connected components are to be computed.
+ * @param computeBuckets if true, buckets will be computed.
+ * @param pl a progress logger, or <code>null</code>.
+ * @return an instance of this class containing the computed components.
+ */
+ public static StronglyConnectedComponents compute(final ImmutableGraph graph, final boolean computeBuckets, final ProgressLogger pl) {
+ final int n = graph.numNodes();
+ final Visit visit = new Visit(graph, new int[n], computeBuckets ? LongArrayBitVector.ofLength(n) : null, pl);
+ visit.run();
+ return new StronglyConnectedComponents(visit.numberOfComponents, visit.status, visit.buckets);
+ }
+
+
+ private final static class FilteredVisit {
+ /** The graph. */
+ private final ArcLabelledImmutableGraph graph;
+ /** The number of nodes in {@link #graph}. */
+ private final int n;
+ /** A progress logger. */
+ private final ProgressLogger pl;
+ /** A filter on arc labels. */
+ private final LabelledArcFilter filter;
+ /** Whether we should compute buckets. */
+ private final boolean computeBuckets;
+ /** For nonvisited nodes, 0; for visited, nonemitted nodes, the visit time; for emitted nodes, -c-1, where c is the component number. */
+ private final int[] status;
+ /** The buckets. */
+ private final LongArrayBitVector buckets;
+ /** The component stack. */
+ private final IntArrayList componentStack;
+ /** The first-visit clock (incremented at each visited node). */
+ private int clock;
+ /** The number of components already output. */
+ private int numberOfComponents;
+
+ private FilteredVisit(final ArcLabelledImmutableGraph graph, final LabelledArcFilter filter, final int[] status, final LongArrayBitVector buckets, ProgressLogger pl) {
+ this.graph = graph;
+ this.filter = filter;
+ this.buckets = buckets;
+ this.status = status;
+ this.pl = pl;
+ this.computeBuckets = buckets != null;
+ this.n = graph.numNodes();
+ componentStack = new IntArrayList(n);
+ }
+
+ private long filteredOutdegree(final int node) {
+ // Definitely not so efficient, but very simple.
+ long filteredOutdegree = 0;
+ final LabelledArcIterator successors = graph.successors(node);
+ for(int s; (s = successors.nextInt()) != -1;) if (filter.accept(node, s, successors.label())) filteredOutdegree++;
+ return filteredOutdegree;
+ }
+
+ /** Performs a visit starting from a given node.
+ *
+ * @param startNode the first node to visit.
+ */
+ private void visit(final int startNode) {
+ final LongArrayBitVector olderNodeFound = LongArrayBitVector.ofLength(n);
+ final IntStack nodeStack = new IntArrayList();
+ final Stack<LabelledArcIterator> successorsStack = new ObjectArrayList<>();
+ final int[] status = this.status;
+ // For simplicity, we compute nonbuckets and then flip the values.
+ final LongArrayBitVector nonBuckets = this.buckets;
+
+ status[startNode] = ++clock;
+ componentStack.push(startNode);
+ nodeStack.push(startNode);
+ successorsStack.push(graph.successors(startNode));
+ if (computeBuckets && filteredOutdegree(startNode) == 0) nonBuckets.set(startNode);
+
+ main: while(! nodeStack.isEmpty()) {
+ final int currentNode = nodeStack.topInt();
+ final LabelledArcIterator successors = successorsStack.top();
+
+ for(int s; (s = successors.nextInt()) != -1;) {
+ if (! filter.accept(currentNode, s, successors.label())) continue;
+ final int successorStatus = status[s];
+ if (successorStatus == 0) {
+ status[s] = ++clock;
+ nodeStack.push(s);
+ componentStack.push(s);
+ successorsStack.push(graph.successors(s));
+ if (computeBuckets && filteredOutdegree(s) == 0) nonBuckets.set(s);
+ continue main;
+ }
+ else if (successorStatus > 0) {
+ if (successorStatus < status[currentNode]) {
+ status[currentNode] = successorStatus;
+ olderNodeFound.set(currentNode);
+ }
+ }
+ else if (computeBuckets) nonBuckets.set(currentNode);
+ }
+
+ nodeStack.popInt();
+ successorsStack.pop();
+ if (pl != null) pl.lightUpdate();
+
+ if (olderNodeFound.getBoolean(currentNode)) {
+ final int parentNode = nodeStack.topInt();
+ final int currentNodeStatus = status[currentNode];
+ if (currentNodeStatus < status[parentNode]) {
+ status[parentNode] = currentNodeStatus;
+ olderNodeFound.set(parentNode);
+ }
+
+ if (computeBuckets && nonBuckets.getBoolean(currentNode)) nonBuckets.set(parentNode);
+ }
+ else {
+ if (computeBuckets && ! nodeStack.isEmpty()) nonBuckets.set(nodeStack.topInt());
+ final boolean notABucket = computeBuckets ? nonBuckets.getBoolean(currentNode) : false;
+ numberOfComponents++;
+ int z;
+ do {
+ z = componentStack.popInt();
+ // Component markers are -c-1, where c is the component number.
+ status[z] = -numberOfComponents;
+ if (notABucket) nonBuckets.set(z);
+ } while(z != currentNode);
+ }
+ }
+ }
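The visit above is an iterative variant of Tarjan's algorithm. `status[]` first holds a node's visit time, which is lowered whenever an older node still on the component stack is reached, and it finally receives the marker -c-1. A node whose status was never lowered (`olderNodeFound` is false) is the root of its component. The following minimal sketch reproduces that scheme on a plain `int[][]` adjacency list. The adjacency list is an assumption standing in for `ArcLabelledImmutableGraph`, and the sketch has no filter, buckets, or progress logging. It is meant for checking the marker encoding and the final flip on small inputs.

```java
import java.util.Arrays;

final class TarjanSketch {
    /** Returns, for each node of the graph given by adjacency lists, the index of its strongly connected component. */
    static int[] components(final int[][] succ) {
        final int n = succ.length;
        // 0 = not visited; > 0 = visit time, possibly lowered; < 0 = marker -c-1 of the emitted component c.
        final int[] status = new int[n];
        final boolean[] olderNodeFound = new boolean[n];
        final int[] componentStack = new int[n], nodeStack = new int[n];
        final int[] nextSucc = new int[n]; // index in succ[v] of the next successor of v to scan
        int componentTop = 0, clock = 0, numberOfComponents = 0;

        for (int start = 0; start < n; start++) {
            if (status[start] != 0) continue;
            int nodeTop = 0;
            status[start] = ++clock;
            componentStack[componentTop++] = start;
            nodeStack[nodeTop++] = start;
            main: while (nodeTop > 0) {
                final int current = nodeStack[nodeTop - 1];
                while (nextSucc[current] < succ[current].length) {
                    final int s = succ[current][nextSucc[current]++];
                    if (status[s] == 0) { // tree arc: descend into s
                        status[s] = ++clock;
                        componentStack[componentTop++] = s;
                        nodeStack[nodeTop++] = s;
                        continue main;
                    }
                    if (status[s] > 0 && status[s] < status[current]) { // s is still on the component stack
                        status[current] = status[s];
                        olderNodeFound[current] = true;
                    }
                }
                nodeTop--;
                if (olderNodeFound[current]) { // not a root: pass the lowered value to the parent
                    final int parent = nodeStack[nodeTop - 1];
                    if (status[current] < status[parent]) {
                        status[parent] = status[current];
                        olderNodeFound[parent] = true;
                    }
                } else { // root: pop its whole component and mark it with -c-1
                    numberOfComponents++;
                    int z;
                    do {
                        z = componentStack[--componentTop];
                        status[z] = -numberOfComponents;
                    } while (z != current);
                }
            }
        }
        // Turn markers -c-1 into component numbers c.
        for (int i = n; i-- != 0;) status[i] = -status[i] - 1;
        return status;
    }

    public static void main(final String[] args) {
        // 0 -> 1 -> 2 -> 0 is a cycle; the arc 2 -> 3 leaves it.
        System.out.println(Arrays.toString(components(new int[][] {{1}, {2}, {0, 3}, {}})));
    }
}
```

On the example in `main`, the sink `{3}` completes first and gets component 0, and the cycle gets component 1. Components are numbered in completion order, so every component reachable from a component C has a smaller number than C.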
+
+
+ public void run() {
+
+ if (pl != null) {
+ pl.itemsName = "nodes";
+ pl.expectedUpdates = n;
+ pl.displayFreeMemory = true;
+ pl.start("Computing strongly connected components...");
+ }
+ for (int x = 0; x < n; x++) if (status[x] == 0) visit(x);
+ if (pl != null) pl.done();
+
+ // Turn component markers into component numbers.
+ for(int i = status.length; i-- != 0;) status[i] = -status[i] - 1;
+
+ if (buckets != null) buckets.flip();
+ }
+ }
+
+ /** Computes the strongly connected components of a given arc-labelled graph, filtering its arcs.
+ *
+ * @param graph the arc-labelled graph whose strongly connected components are to be computed.
+ * @param filter a filter selecting the arcs that must be taken into consideration.
+ * @param computeBuckets if true, buckets will be computed.
+ * @param pl a progress logger, or <code>null</code>.
+ * @return an instance of this class containing the computed components.
+ */
+ public static StronglyConnectedComponents compute(final ArcLabelledImmutableGraph graph, final LabelledArcFilter filter, final boolean computeBuckets, final ProgressLogger pl) {
+ final int n = graph.numNodes();
+ FilteredVisit filteredVisit = new FilteredVisit(graph, filter, new int[n], computeBuckets ? LongArrayBitVector.ofLength(n) : null, pl);
+ filteredVisit.run();
+ return new StronglyConnectedComponents(filteredVisit.numberOfComponents, filteredVisit.status, filteredVisit.buckets);
+ }
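According to the command-line description, a bucket is a terminal nondangling component. No arc leaves such a component, and none of its nodes has (filtered) outdegree zero. Given a component assignment, that condition can be checked directly. The sketch below does so on an `int[][]` adjacency list. The adjacency list is an assumption standing in for the graph classes, and the sketch applies no arc filter. It is intended as a cross-check on small graphs, not as the single-pass computation performed during the visit.

```java
import java.util.Arrays;

final class BucketSketch {
    /**
     * Marks the nodes of bucket components: components with no arc leaving them and
     * no node of outdegree zero (terminal nondangling components).
     */
    static boolean[] buckets(final int[][] succ, final int[] component, final int numberOfComponents) {
        final boolean[] notABucket = new boolean[numberOfComponents];
        for (int v = 0; v < succ.length; v++) {
            if (succ[v].length == 0) notABucket[component[v]] = true; // dangling node
            for (final int w : succ[v])
                if (component[w] != component[v]) notABucket[component[v]] = true; // arc leaving the component
        }
        final boolean[] bucket = new boolean[succ.length];
        for (int v = 0; v < succ.length; v++) bucket[v] = !notABucket[component[v]];
        return bucket;
    }

    public static void main(final String[] args) {
        // {0, 1} is a terminal cycle (component 0); node 2 (component 1) points into it.
        System.out.println(Arrays.toString(buckets(new int[][] {{1}, {0}, {0}}, new int[] {0, 0, 1}, 2)));
    }
}
```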
+
+
+ /** Returns the size array for this set of strongly connected components.
+ *
+ * @return the size array for this set of strongly connected components.
+ */
+ public int[] computeSizes() {
+ final int[] size = new int[numberOfComponents];
+ for(int i = component.length; i-- != 0;) size[component[i]]++;
+ return size;
+ }
+
+ /** Renumbers by decreasing size the components of this set.
+ *
+ * <p>After a call to this method, both the internal status of this class and the argument
+ * array are permuted so that the sizes of strongly connected components are decreasing
+ * in the component index.
+ *
+ * @param size the component sizes, as returned by {@link #computeSizes()}.
+ */
+ public void sortBySize(final int[] size) {
+ final int[] perm = Util.identity(size.length);
+ IntArrays.parallelRadixSortIndirect(perm, size, false);
+ IntArrays.reverse(perm);
+ final int[] copy = size.clone();
+ for (int i = size.length; i-- != 0;) size[i] = copy[perm[i]];
+ Util.invertPermutationInPlace(perm);
+ for(int i = component.length; i-- != 0;) component[i] = perm[component[i]];
+ }
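`computeSizes()` counts the nodes of each component. `sortBySize()` renumbers the components so that sizes decrease with the index, permuting both the size array and the component array. The following sketch does the same with plain JDK sorting instead of fastutil's `IntArrays.parallelRadixSortIndirect` and `Util.invertPermutationInPlace`. Because it uses a stable sort, components of equal size may be ordered differently than in the original.

```java
import java.util.Arrays;
import java.util.Comparator;

final class SortBySizeSketch {
    /** Counts the nodes in each component. */
    static int[] computeSizes(final int[] component, final int numberOfComponents) {
        final int[] size = new int[numberOfComponents];
        for (final int c : component) size[c]++;
        return size;
    }

    /** Renumbers components by decreasing size, permuting size[] and component[] in place. */
    static void sortBySize(final int[] component, final int[] size) {
        // perm[i] = old index of the component that receives the new index i
        final Integer[] perm = new Integer[size.length];
        for (int i = 0; i < perm.length; i++) perm[i] = i;
        Arrays.sort(perm, Comparator.comparingInt((Integer c) -> size[c]).reversed());
        final int[] copy = size.clone();
        for (int i = 0; i < size.length; i++) size[i] = copy[perm[i]];
        // inverse[old] = new index
        final int[] inverse = new int[perm.length];
        for (int i = 0; i < perm.length; i++) inverse[perm[i]] = i;
        for (int v = 0; v < component.length; v++) component[v] = inverse[component[v]];
    }

    public static void main(final String[] args) {
        final int[] component = {0, 1, 1, 2, 1, 2};
        final int[] size = computeSizes(component, 3);
        sortBySize(component, size);
        System.out.println(Arrays.toString(size) + " " + Arrays.toString(component));
    }
}
```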
+
+
+
+ public static void main(String arg[]) throws IOException, JSAPException {
+ SimpleJSAP jsap = new SimpleJSAP(StronglyConnectedComponents.class.getName(),
+ "Computes the strongly connected components (and optionally the buckets) of a graph of given basename. The resulting data is saved " +
+ "in files stemmed from the given basename with extension .scc (a list of binary integers specifying the " +
+ "component of each node), .sccsizes (a list of binary integers specifying the size of each component) and .buckets " +
+ "(a serialised LongArrayBitVector specifying buckets). Please use suitable JVM options to set a large stack size.",
+ new Parameter[] {
+ new Switch("sizes", 's', "sizes", "Compute component sizes."),
+ new Switch("renumber", 'r', "renumber", "Renumber components in decreasing-size order."),
+ new Switch("buckets", 'b', "buckets", "Compute buckets (nodes belonging to a bucket component, i.e., a terminal nondangling component)."),
+ new FlaggedOption("filter", new ObjectParser(LabelledArcFilter.class, GraphClassParser.PACKAGE), JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'f', "filter", "A filter for labelled arcs; requires the provided graph to be arc labelled."),
+ new FlaggedOption("logInterval", JSAP.LONG_PARSER, Long.toString(ProgressLogger.DEFAULT_LOG_INTERVAL), JSAP.NOT_REQUIRED, 'l', "log-interval", "The minimum time interval between activity logs in milliseconds."),
+ new UnflaggedOption("basename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the graph."),
+ new UnflaggedOption("resultsBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, JSAP.NOT_GREEDY, "The basename of the resulting files."),
+ }
+ );
+
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final String basename = jsapResult.getString("basename");
+ final String resultsBasename = jsapResult.getString("resultsBasename", basename);
+ final LabelledArcFilter filter = (LabelledArcFilter)jsapResult.getObject("filter");
+ ProgressLogger pl = new ProgressLogger(LOGGER, jsapResult.getLong("logInterval"), TimeUnit.MILLISECONDS);
+
+ final StronglyConnectedComponents components =
+ filter != null ? StronglyConnectedComponents.compute(ArcLabelledImmutableGraph.load(basename), filter, jsapResult.getBoolean("buckets"), pl)
+ : StronglyConnectedComponents.compute(ImmutableGraph.load(basename), jsapResult.getBoolean("buckets"), pl);
+
+ if (jsapResult.getBoolean("sizes") || jsapResult.getBoolean("renumber")) {
+ final int size[] = components.computeSizes();
+ if (jsapResult.getBoolean("renumber")) components.sortBySize(size);
+ if (jsapResult.getBoolean("sizes")) BinIO.storeInts(size, resultsBasename + ".sccsizes");
+ }
+ BinIO.storeInts(components.component, resultsBasename + ".scc");
+ if (components.buckets != null) BinIO.storeObject(components.buckets, resultsBasename + ".buckets");
+ }
+}
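The command-line tool above stores the component array (and optionally the sizes) with `BinIO.storeInts`. If that method writes each `int` as four big-endian bytes, following the `java.io.DataOutput` convention, then a `.scc` or `.sccsizes` file can be read back with the JDK alone. The reader below is a sketch under that assumption, and `readInts` is a hypothetical helper name.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

final class SccFileSketch {
    /** Reads a file made of 32-bit big-endian integers (assumed layout of .scc and .sccsizes files). */
    static int[] readInts(final String fileName) throws IOException {
        final Path path = Paths.get(fileName);
        final int[] a = new int[(int) (Files.size(path) / Integer.BYTES)];
        try (DataInputStream in = new DataInputStream(new BufferedInputStream(Files.newInputStream(path)))) {
            for (int i = 0; i < a.length; i++) a[i] = in.readInt();
        }
        return a;
    }

    public static void main(final String[] args) throws IOException {
        // Usage: java SccFileSketch basename.scc
        if (args.length > 0) System.out.println(Arrays.toString(readInts(args[0])));
    }
}
```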
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/SumSweepDirectedDiameterRadius.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/SumSweepDirectedDiameterRadius.java
new file mode 100644
index 0000000..b693e16
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/SumSweepDirectedDiameterRadius.java
@@ -0,0 +1,1170 @@
+package it.unimi.dsi.webgraph.algo;
+
+/*
+ * Copyright (C) 2016-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.ints.IntArrayList;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.lang.EnumStringParser;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.LazyIntIterator;
+import it.unimi.dsi.webgraph.Transform;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/**
+ * Computes the radius and/or the diameter and/or all eccentricities of a graph,
+ * using the SumSweep algorithm described by Michele Borassi, Pierluigi
+ * Crescenzi, Michel Habib, Walter A. Kosters, Andrea Marino, and Frank W. Takes
+ * in &ldquo;Fast diameter and radius BFS-based computation in (weakly
+ * connected) real-world graphs&mdash;With an application to the six degrees of
+ * separation games&rdquo;, <i>Theoretical Computer Science</i>,
+ * 586:59&ndash;80, 2015.
+ *
+ * <p>
+ * We define the <em>positive</em>, or <em>forward</em> (resp.,
+ * <em>negative</em>, or <em>backward</em>) <em>eccentricity</em> of a node
+ * <var>v</var> in a graph <var>G</var>=(<var>V</var>,<var>E</var>) as
+ * ecc<sup>+</sup>(<var>v</var>)=max<sub><var>w</var> reachable from <var>v</var></sub> <var>d</var>(<var>v</var>,<var>w</var>)
+ * (resp., ecc<sup>&minus;</sup>(<var>v</var>)=max<sub><var>w</var> reaches <var>v</var></sub> <var>d</var>(<var>w</var>,<var>v</var>)),
+ * where <var>d</var>(<var>v</var>,<var>w</var>) is the number of edges in a shortest path from <var>v</var> to <var>w</var>.
+ * The diameter is max<sub><var>v</var>&isin;<var>V</var></sub> ecc<sup>+</sup>(<var>v</var>),
+ * which is also equal to max<sub><var>v</var>&isin;<var>V</var></sub> ecc<sup>&minus;</sup>(<var>v</var>),
+ * while the radius is min<sub><var>v</var>&isin;<var>V</var>'</sub> ecc<sup>+</sup>(<var>v</var>), where
+ * <var>V</var>' is a set of vertices specified by the user. These definitions
+ * are slightly different from the standard ones due to the restriction to
+ * reachable nodes. In particular, if we simply define the radius as the minimum
+ * eccentricity, the radius of a graph containing a vertex with out-degree 0
+ * would be 0, and this does not make much sense. For this reason, we restrict
+ * our attention only to a subset <var>V</var>' of the set of all vertices: by
+ * choosing a suitable <var>V</var>', we can specialize this definition to all
+ * definitions proposed in the literature. If <var>V</var>' is not specified, we
+ * include in <var>V</var>' all vertices from which it is possible to reach the
+ * largest strongly connected component, as suggested in the aforementioned
+ * paper.
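The definitions above can be made concrete with the textbook algorithm, which performs one BFS per node. In the sketch below, the `int[][]` adjacency lists and the names `EccentricitySketch`, `forwardEccentricity`, `diameter`, and `radius` are assumptions made for illustration. The sketch also shows why the radius is restricted to <var>V</var>'. On the path 0 -> 1 -> 2 -> 3, the unrestricted minimum forward eccentricity is 0, attained by the sink 3.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

final class EccentricitySketch {
    // Forward eccentricity: the largest BFS distance from v to a node reachable from v.
    static int forwardEccentricity(final int[][] succ, final int v) {
        final int[] dist = new int[succ.length];
        Arrays.fill(dist, -1);
        final ArrayDeque<Integer> queue = new ArrayDeque<>();
        dist[v] = 0;
        queue.add(v);
        int ecc = 0;
        while (!queue.isEmpty()) {
            final int x = queue.poll();
            ecc = Math.max(ecc, dist[x]);
            for (final int y : succ[x]) if (dist[y] == -1) { dist[y] = dist[x] + 1; queue.add(y); }
        }
        return ecc;
    }

    // Diameter: the maximum forward eccentricity over all nodes.
    static int diameter(final int[][] succ) {
        int d = 0;
        for (int v = 0; v < succ.length; v++) d = Math.max(d, forwardEccentricity(succ, v));
        return d;
    }

    // Radius: the minimum forward eccentricity over the nodes v with acc[v] true (the set V').
    static int radius(final int[][] succ, final boolean[] acc) {
        int r = Integer.MAX_VALUE;
        for (int v = 0; v < succ.length; v++) if (acc[v]) r = Math.min(r, forwardEccentricity(succ, v));
        return r;
    }

    public static void main(final String[] args) {
        final int[][] path = {{1}, {2}, {3}, {}};
        System.out.println(diameter(path) + " " + radius(path, new boolean[] {true, true, true, false}));
    }
}
```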
+ *
+ * <p>
+ * Our algorithm performs some BFSs from "clever" vertices, and uses these BFSs
+ * to bound the eccentricity of all vertices. More specifically, for each vertex
+ * <var>v</var>, this algorithm keeps a lower and an upper bound on the forward
+ * and backward eccentricity of <var>v</var>, named <var>lF</var>[<var>v</var>],
+ * <var>lB</var>[<var>v</var>], <var>uF</var>[<var>v</var>], and <var>uB</var>[
+ * <var>v</var>]. Furthermore, it keeps a lower bound <var>dL</var> on the
+ * diameter and an upper bound <var>rU</var> on the radius. At each step, the
+ * algorithm performs a BFS, and it updates all these bounds: the radius is
+ * found as soon as <var>rU</var> is smaller than the minimum value of
+ * <var>lF</var>, and the diameter is found as soon as <var>dL</var> is bigger
+ * than <var>uF</var>[<var>v</var>] for each <var>v</var>, or <var>dL</var> is
+ * bigger than <var>uB</var>[<var>v</var>] for each <var>v</var>.
+ *
+ * <p>
+ * More specifically, the upper bound on the radius (resp., lower bound on the
+ * diameter) is defined as the minimum forward (resp., maximum forward or
+ * backward) eccentricity of a vertex from which we performed a BFS. Moreover,
+ * if we perform a forward (resp., backward) BFS from a vertex <var>s</var>, we
+ * update <var>lB</var>[<var>v</var>]=max(<var>lB</var>[<var>v</var>], d(<var>s</var>, <var>v</var>))
+ * (resp., <var>lF</var>[<var>v</var>]=max(<var>lF</var>[<var>v</var>], d(<var>v</var>, <var>s</var>))).
+ * Finally, for the
+ * upper bounds, we use a more complicated procedure that handles different
+ * strongly connected components separately.
+ *
+ * <p>
+ * To use this class, it is enough to create an instance, and then invoke
+ * {@link #compute()}. It is possible to choose between the following stopping
+ * conditions:
+ * <ul>
+ * <li>only the radius is found;</li>
+ * <li>only the diameter is found;</li>
+ * <li>radius and diameter are found;</li>
+ * <li>all forward eccentricities are found;</li>
+ * <li>all eccentricities are found.</li>
+ * </ul>
+ *
+ * <p>
+ * After the method {@link #compute()} is run, the output can be obtained
+ * through the methods {@link #getRadius()} for the radius, {@link #getRadialVertex()} for a
+ * radial vertex, {@link #getDiameter()} for the diameter, {@link #getDiametralVertex()} for a
+ * vertex whose (forward or backward) eccentricity equals the diameter,
+ * {@link #getEccentricity(int, boolean)} for the forward or backward eccentricities.
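+ * Similarly, one can use the methods {@link #getRadiusIterations()},
+ * {@link #getDiameterIterations()}, {@link #getAllForwardIterations()}, and
+ * {@link #getAllIterations()} to obtain the number of iterations performed before
+ * each result was found. An exception is raised if the requested value has not been computed.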
+ *
+ * <h2>Performance issues</h2>
+ *
+ * <p>
+ * Although the running time is <var>O</var>(<var>mn</var>) in the worst case,
+ * the algorithm is usually much more efficient on real-world networks when
+ * only radius and diameter are needed. If all eccentricities are needed, the
+ * algorithm could be faster than <var>O</var>(<var>mn</var>), but on many
+ * networks its performance is similar to that of the textbook algorithm, which
+ * performs a breadth-first search from each node.
+ *
+ * @author Michele Borassi
+ */
+
+public class SumSweepDirectedDiameterRadius {
+ private static final boolean DEBUG = true;
+
+ /**
+ * TODO: find better way to do it
+ * Returns the index <var>i</var> such that
+ * <var>vec</var>[<var>i</var>] is maximum.
+ *
+ * @param vec
+ * the vector of which we want to compute the argMax
+ * @return the value <var>i</var> such that <var>vec</var>[<var>i</var>] is
+ * maximum
+ */
+ public static int argMax(final double[] vec) {
+ double max = -Double.MAX_VALUE;
+ int argMax = -1;
+ for (int i = 0; i < vec.length; i++) {
+ if (vec[i] > max) {
+ argMax = i;
+ max = vec[i];
+ }
+ }
+ return argMax;
+ }
+
+ /**
+ * Returns the index <var>i</var> such that <var>vec</var>[<var>i</var>] is
+ * maximum.
+ *
+ * @param vec
+ * the vector of which we want to compute the argMax
+ * @return the value <var>i</var> such that <var>vec</var>[<var>i</var>] is
+ * maximum
+ */
+ public static int argMax(final int[] vec) {
+ int max = Integer.MIN_VALUE;
+ int argMax = -1;
+ for (int i = 0; i < vec.length; i++) {
+ if (vec[i] > max) {
+ argMax = i;
+ max = vec[i];
+ }
+ }
+ return argMax;
+ }
+
+ /**
+ * Returns the index <var>i</var> such that <var>vec</var>[<var>i</var>] is
+ * maximum, among all indices such that <var>acc</var>[<var>i</var>] is
+ * true. In case of tie, the index maximizing <var>tieBreak</var> is chosen.
+ *
+ * @param vec
+ * the vector of which we want to compute the argMax
+ * @param tieBreak
+ * the tiebreak vector
+ * @param acc
+ * the vector used to decide if an index is acceptable: a
+ * {@code true} value means that the index is acceptable
+ * @return the value <var>i</var> such that <var>vec</var>[<var>i</var>] is
+ * maximum
+ */
+ public static int argMax(final int[] vec, final int[] tieBreak, final boolean acc[]) {
+
+ int max = Integer.MIN_VALUE, maxTieBreak = Integer.MIN_VALUE, argMax = -1;
+ for (int i = 0; i < vec.length; i++) {
+ if (acc[i] && (vec[i] > max || (vec[i] == max && tieBreak[i] > maxTieBreak))) {
+ argMax = i;
+ max = vec[i];
+ maxTieBreak = tieBreak[i];
+ }
+ }
+ return argMax;
+ }
+
+ /**
+ * Returns the index <var>i</var> such that <var>vec</var>[<var>i</var>] is
+ * minimum, among all indices such that <var>acc</var>[<var>i</var>] is
+ * true. In case of tie, the index minimizing <var>tieBreak</var> is chosen.
+ *
+ * @param vec
+ * the vector of which we want to compute the argMin
+ * @param tieBreak
+ * the tiebreak vector
+ * @param acc
+ * the vector used to decide if an index is acceptable: a
+ * {@code true} value means that the index is acceptable
+ * @return the value <var>i</var> such that <var>vec</var>[<var>i</var>] is
+ * minimum
+ */
+ public static int argMin(final int[] vec, final int[] tieBreak, final boolean acc[]) {
+
+ int min = Integer.MAX_VALUE, minTieBreak = Integer.MAX_VALUE, argMin = -1;
+ for (int i = 0; i < vec.length; i++) {
+ if (acc[i] && (vec[i] < min || (vec[i] == min && tieBreak[i] < minTieBreak))) {
+ argMin = i;
+ min = vec[i];
+ minTieBreak = tieBreak[i];
+ }
+ }
+ return argMin;
+ }
+
+ private final static Logger LOGGER = LoggerFactory.getLogger(SumSweepDirectedDiameterRadius.class);
+
+ /**
+ * The type of output requested: radius, diameter, radius and diameter, all
+ * forward eccentricities, or all (forward and backward) eccentricities.
+ */
+ public enum OutputLevel {
+ /**
+ * Computes only the radius of the graph.
+ */
+ RADIUS,
+ /**
+ * Computes only the diameter of the graph.
+ */
+ DIAMETER,
+ /**
+ * Computes both radius and diameter.
+ */
+ RADIUS_DIAMETER,
+ /**
+ * Computes the radius, the diameter, and all the forward
+ * eccentricities.
+ */
+ ALL_FORWARD,
+ /**
+ * Computes the radius, the diameter, and all the (forward and backward)
+ * eccentricities.
+ */
+ ALL
+ };
+
+ /** The graph under examination. */
+ private final ImmutableGraph graph;
+ /** The reversed graph. */
+ private final ImmutableGraph revgraph;
+ /** The number of nodes. */
+ private final int nn;
+ /** The global progress logger. */
+ private final ProgressLogger pl;
+ /** The kind of output requested. */
+ private final OutputLevel output;
+ /** The array of forward eccentricity values. */
+ private final int[] eccF;
+ /** The array of backward eccentricity values. */
+ private final int[] eccB;
+ /**
+ * <var>toCompleteF</var>[<var>v</var>] is {@code true} if and only if
+ * the forward eccentricity of <var>v</var> has not been determined yet.
+ */
+ private final boolean[] toCompleteF;
+ /**
+ * <var>toCompleteB</var>[<var>v</var>] is {@code true} if and only if
+ * the backward eccentricity of <var>v</var> has not been determined yet.
+ */
+ private final boolean[] toCompleteB;
+ /** The set of vertices that can be radial vertices. */
+ private final boolean[] accRadial;
+ /** The queue used for each BFS (it is recycled to save some time). */
+ private final int[] queue;
+ /**
+ * The array of distances, used in each BFS (it is recycled to save some
+ * time).
+ */
+ private final int[] dist;
+ /** Lower bound on the diameter of the graph. */
+ private int dL;
+ /** Upper bound on the radius of the graph. */
+ private int rU;
+ /** A vertex whose eccentricity equals the diameter. */
+ private int dV;
+ /** A vertex whose eccentricity equals the radius. */
+ private int rV;
+ /** Number of iterations performed until now. */
+ private int iter;
+ /** Lower bound on the forward eccentricity. */
+ protected int lF[];
+ /** Upper bound on the forward eccentricity. */
+ protected int uF[];
+ /** Lower bound on the backward eccentricity. */
+ protected int lB[];
+ /** Upper bound on the backward eccentricity. */
+ protected int uB[];
+ /** Number of iterations before the radius is found. */
+ private int iterR;
+ /** Number of iterations before the diameter is found. */
+ private int iterD;
+ /** Number of iterations before all forward eccentricities are found. */
+ private int iterAllF;
+ /** Number of iterations before all eccentricities are found. */
+ private int iterAll;
+ /** Strongly connected components of the graph. */
+ private final StronglyConnectedComponents scc;
+ /** The strongly connected components digraph. */
+ private final int[][] sccGraph;
+ /**
+ * For each edge in the SCC graph, the start vertex of a corresponding edge
+ * in the graph
+ */
+ private final int[][] startBridges;
+ /**
+ * For each edge in the SCC graph, the end vertex of a corresponding edge in
+ * the graph
+ */
+ private final int[][] endBridges;
+
+ /**
+ * Total forward distance from already processed vertices (used as tie-break
+ * for the choice of the next vertex to process).
+ */
+ private final int totDistF[];
+ /**
+ * Total backward distance from already processed vertices (used as
+ * tie-break for the choice of the next vertex to process).
+ */
+ private final int totDistB[];
+
+ /**
+ * Creates a new instance for computing diameter and/or radius and/or all
+ * eccentricities.
+ *
+ * @param graph
+ * a graph.
+ * @param output
+ * which output is requested: radius, diameter, radius and
+ * diameter, all forward eccentricities, or all eccentricities.
+ * @param accRadial
+ * the set of vertices that can be considered radial vertices. If
+ * {@code null}, the set is automatically chosen as the set of vertices
+ * that are in the biggest strongly connected component, or that
+ * are able to reach the biggest strongly connected component.
+ * @param pl
+ * a progress logger, or {@code null}.
+ */
+ public SumSweepDirectedDiameterRadius(final ImmutableGraph graph, final OutputLevel output,
+ final boolean[] accRadial, final ProgressLogger pl) {
+ this.pl = pl;
+ this.graph = graph;
+ this.revgraph = Transform.transpose(graph);
+ this.nn = graph.numNodes();
+ this.eccF = new int[nn];
+ this.eccB = new int[nn];
+ totDistF = new int[nn];
+ totDistB = new int[nn];
+ lF = new int[nn];
+ lB = new int[nn];
+ uF = new int[nn];
+ uB = new int[nn];
+ toCompleteF = new boolean[nn];
+ toCompleteB = new boolean[nn];
+ queue = new int[nn];
+ dist = new int[nn];
+ scc = StronglyConnectedComponents.compute(graph, false, null);
+ startBridges = new int[scc.numberOfComponents][];
+ endBridges = new int[scc.numberOfComponents][];
+ sccGraph = new int[scc.numberOfComponents][];
+
+ Arrays.fill(eccF, -1);
+ Arrays.fill(eccB, -1);
+ Arrays.fill(uF, nn + 1);
+ Arrays.fill(uB, nn + 1);
+ Arrays.fill(toCompleteF, true);
+ Arrays.fill(toCompleteB, true);
+ this.dL = 0;
+ this.rU = Integer.MAX_VALUE;
+ this.output = output;
+ iterR = -1;
+ iterD = -1;
+ iterAllF = -1;
+ iterAll = -1;
+
+ if (accRadial == null) {
+ this.accRadial = new boolean[nn];
+ computeAccRadial();
+ } else if (accRadial.length != nn)
+ throw new IllegalArgumentException(
+ "The size of the array of acceptable vertices must be equal to the number of nodes in the graph.");
+ else {
+ this.accRadial = accRadial;
+ }
+
+ findEdgesThroughSCC();
+ }
+
+ /**
+ * Returns the radius of the graph, if it has already been computed
+ * (otherwise, an exception is raised).
+ *
+ * @return the radius
+ */
+ public int getRadius() {
+ if (iterR == -1) {
+ throw new UnsupportedOperationException("The radius has not been computed yet. "
+ + "Please run the compute method with the correct output.");
+ }
+ return rU;
+ }
+
+ /**
+ * Returns the diameter, if it has already been computed (otherwise, an
+ * exception is raised).
+ *
+ * @return the diameter
+ */
+ public int getDiameter() {
+ if (iterD == -1) {
+ throw new UnsupportedOperationException("The diameter has not been computed yet. "
+ + "Please run the compute method with the correct output.");
+ }
+ return dL;
+ }
+
+ /**
+ * Returns a radial vertex, if it has already been computed (otherwise, an
+ * exception is raised).
+ *
+ * @return a radial vertex
+ */
+ public int getRadialVertex() {
+ if (iterR == -1) {
+ throw new UnsupportedOperationException("The radius has not been computed yet. "
+ + "Please run the compute method with the correct output.");
+ }
+ return rV;
+ }
+
+ /**
+ * Returns a diametral vertex, if it has already been computed (otherwise,
+ * an exception is raised).
+ *
+ * @return a diametral vertex
+ */
+ public int getDiametralVertex() {
+ if (iterD == -1) {
+ throw new UnsupportedOperationException("The diameter has not been computed yet. "
+ + "Please run the compute method with the correct output.");
+ }
+ return dV;
+ }
+
+ /**
+ * Returns the eccentricity of a vertex, if it has already been computed
+ * (otherwise, an exception is raised).
+ *
+ * @param v
+ * the vertex
+ * @param forward
+ * if {@code true}, the forward eccentricity is returned,
+ * otherwise the backward eccentricity
+ * @return the eccentricity of <var>v</var>
+ */
+ public int getEccentricity(int v, boolean forward) {
+ int ecc = forward ? eccF[v] : eccB[v];
+
+ if (ecc == -1) {
+ throw new UnsupportedOperationException("The eccentricity of " + v + " has not been computed yet. "
+ + "Please run the compute method with the correct output.");
+ }
+ return ecc;
+ }
+
+ /**
+ * Returns the number of iterations needed to compute the radius, if it has
+ * already been computed (otherwise, an exception is raised).
+ *
+ * @return the number of iterations before the radius is found
+ */
+ public int getRadiusIterations() {
+ if (iterR == -1) {
+ throw new UnsupportedOperationException("The radius has not been computed yet. "
+ + "Please run the compute method with the correct output.");
+ }
+ return iterR;
+ }
+
+ /**
+ * Returns the number of iterations needed to compute the diameter, if it has
+ * already been computed (otherwise, an exception is raised).
+ *
+ * @return the number of iterations before the diameter is found
+ */
+ public int getDiameterIterations() {
+ if (iterD == -1) {
+ throw new UnsupportedOperationException("The diameter has not been computed yet. "
+ + "Please run the compute method with the correct output.");
+ }
+ return iterD;
+ }
+
+ /**
+ * Returns the number of iterations needed to compute all forward
+ * eccentricities, if they have already been computed (otherwise, an
+ * exception is raised).
+ *
+ * @return the number of iterations before all forward eccentricities are
+ * found
+ */
+ public int getAllForwardIterations() {
+ if (iterAllF == -1) {
+ throw new UnsupportedOperationException("The forward eccentricities have not been computed yet. "
+ + "Please run the compute method with the correct output.");
+ }
+ return iterAllF;
+ }
+
+ /**
+ * Returns the number of iterations needed to compute all eccentricities, if
+ * they have already been computed (otherwise, an exception is raised).
+ *
+ * @return the number of iterations before all eccentricities are found
+ */
+ public int getAllIterations() {
+ if (iterAll == -1) {
+ throw new UnsupportedOperationException("The eccentricities have not been computed yet. "
+ + "Please run the compute method with the correct output.");
+ }
+ return iterAll;
+ }
+
+ /**
+ * Uses a heuristic to decide which is the best pivot to choose in each
+ * strongly connected component, in order to perform the
+ * {@link #allCCUpperBound(int[])} function.
+ *
+ * @return an array containing in position <var>i</var> the pivot of the
+ * <var>i</var>th component.
+ */
+ private int[] findBestPivot() {
+ int lF[] = this.lF;
+ int lB[] = this.lB;
+ int totDistF[] = this.totDistF;
+ int totDistB[] = this.totDistB;
+ int nn = this.nn;
+ boolean toCompleteF[] = this.toCompleteF;
+ boolean toCompleteB[] = this.toCompleteB;
+ int pivot[] = new int[this.scc.numberOfComponents];
+ Arrays.fill(pivot, -1);
+ int sccs[] = scc.component;
+ int p;
+ long best, current;
+
+ for (int v = nn - 1; v >= 0; v--) {
+ p = pivot[sccs[v]];
+ if (p == -1) {
+ pivot[sccs[v]] = v;
+ continue;
+ }
+ current = (long) lF[v] + lB[v] + (toCompleteF[v] ? 0 : 1) * nn + (toCompleteB[v] ? 0 : 1) * nn;
+ best = (long) lF[p] + lB[p] + (toCompleteF[p] ? 0 : 1) * nn + (toCompleteB[p] ? 0 : 1) * nn;
+
+ if (current < best || (current == best && totDistF[v] + totDistB[v] <= totDistF[p] + totDistB[p])) {
+ pivot[sccs[v]] = v;
+ }
+ }
+ return pivot;
+ }
+
+ /**
+ * Computes and stores in variable <var>accRadial</var> the set of vertices
+ * that are either in the biggest strongly connected component, or that are
+ * able to reach vertices in the biggest strongly connected component.
+ */
+ private void computeAccRadial() {
+ if (nn == 0) {
+ return;
+ }
+ boolean accRadial[] = this.accRadial;
+ int sccs[] = scc.component;
+
+ int sccSizes[] = scc.computeSizes();
+ int maxSizeSCC = argMax(sccSizes);
+ int v = 0;
+
+ for (v = nn; v-- > 0;) {
+ if (sccs[v] == maxSizeSCC) {
+ break;
+ }
+ }
+ ParallelBreadthFirstVisit bfs = new ParallelBreadthFirstVisit(revgraph, 0, false, null);
+ bfs.visit(v);
+ for (int i = nn; i-- > 0;) {
+ accRadial[i] = bfs.marker.get(i) >= 0;
+ }
+ }
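When `accRadial` is `null`, `computeAccRadial()` selects the vertices that can reach the largest strongly connected component. It does so with a single BFS on the transposed graph, started from one node of that component. This is sufficient because every node of the component reaches all the others. Below is a minimal sequential sketch. It assumes `int[][]` adjacency lists and a precomputed component array, and its names are hypothetical. As with `argMax`, ties between equally large components go to the smallest component index.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

final class AccRadialSketch {
    /** Marks every node that belongs to, or can reach, the component with the most nodes. */
    static boolean[] accRadial(final int[][] succ, final int[] component, final int numberOfComponents) {
        final int n = succ.length;
        final boolean[] acc = new boolean[n];
        if (n == 0) return acc;
        final int[] size = new int[numberOfComponents];
        for (final int c : component) size[c]++;
        int largest = 0;
        for (int c = 1; c < numberOfComponents; c++) if (size[c] > size[largest]) largest = c;
        // Build predecessor lists so that a BFS on them follows arcs backwards.
        final int[] indegree = new int[n];
        for (final int[] s : succ) for (final int w : s) indegree[w]++;
        final int[][] pred = new int[n][];
        for (int v = 0; v < n; v++) pred[v] = new int[indegree[v]];
        final int[] fill = new int[n];
        for (int v = 0; v < n; v++) for (final int w : succ[v]) pred[w][fill[w]++] = v;
        // One backward BFS from any node of the largest component.
        int start = 0;
        while (component[start] != largest) start++;
        final ArrayDeque<Integer> queue = new ArrayDeque<>();
        acc[start] = true;
        queue.add(start);
        while (!queue.isEmpty()) {
            final int v = queue.poll();
            for (final int u : pred[v]) if (!acc[u]) { acc[u] = true; queue.add(u); }
        }
        return acc;
    }

    public static void main(final String[] args) {
        // Components: {1, 2} -> 0, {0} -> 1, {3} -> 2, {4} -> 3.
        System.out.println(Arrays.toString(accRadial(new int[][] {{1}, {2}, {1}, {4}, {}}, new int[] {1, 0, 0, 2, 3}, 4)));
    }
}
```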
+
+ /**
+ * Performs a (forward or backward) BFS, updating lower bounds on the
+ * eccentricities of all visited vertices.
+ *
+ * @param start
+ * the starting vertex of the BFS
+ * @param forward
+ * if {@code true}, the BFS is performed following the
+ * direction of edges, otherwise it is performed in the opposite
+ * direction
+ */
+ private void stepSumSweep(final int start, final boolean forward) {
+ if (start == -1) {
+ return;
+ }
+ int queue[] = this.queue;
+ int dist[] = this.dist;
+ int startQ = 0, endQ = 0;
+ int v, w, eccStart;
+ int[] l, lOther, u, uOther, totDistOther, ecc, eccOther;
+ boolean[] toComplete, toCompleteOther;
+
+ Arrays.fill(dist, -1);
+
+ ImmutableGraph g;
+
+ if (forward) {
+ l = lF;
+ lOther = lB;
+ u = uF;
+ uOther = uB;
+ totDistOther = totDistB;
+ g = graph;
+ ecc = eccF;
+ eccOther = eccB;
+ toComplete = toCompleteF;
+ toCompleteOther = toCompleteB;
+ } else {
+ l = lB;
+ lOther = lF;
+ u = uB;
+ uOther = uF;
+ totDistOther = totDistF;
+ g = revgraph;
+ ecc = eccB;
+ eccOther = eccF;
+ toComplete = toCompleteB;
+ toCompleteOther = toCompleteF;
+ }
+
+ LazyIntIterator iter;
+
+ queue[endQ++] = start;
+ dist[start] = 0;
+
+ while (startQ < endQ) {
+ v = queue[startQ++];
+ iter = g.successors(v);
+
+ while ((w = iter.nextInt()) != -1) {
+ if (dist[w] == -1) {
+ dist[w] = dist[v] + 1;
+ queue[endQ++] = w;
+ }
+ }
+ }
+
+ eccStart = dist[queue[endQ - 1]];
+
+ l[start] = eccStart;
+ u[start] = eccStart;
+ ecc[start] = eccStart;
+ toComplete[start] = false;
+
+ if (dL < eccStart) {
+ dL = eccStart;
+ dV = start;
+ }
+ if (forward) {
+ if (this.accRadial[start] && rU > eccStart) {
+ rU = eccStart;
+ rV = start;
+ }
+ }
+
+ for (v = nn - 1; v >= 0; v--) {
+
+ if (dist[v] == -1)
+ continue;
+
+ totDistOther[v] += dist[v];
+
+ if (toCompleteOther[v]) {
+ if (lOther[v] < dist[v]) {
+ lOther[v] = dist[v];
+ if (lOther[v] == uOther[v]) {
+ toCompleteOther[v] = false;
+ eccOther[v] = lOther[v];
+
+ if (!forward && this.accRadial[v] && eccOther[v] < rU) {
+ rU = eccOther[v];
+ rV = v;
+ }
+ }
+ }
+ }
+ }
+ this.iter++;
+ if (pl != null)
+ pl.update();
+ }
+
+ /**
+ * Performs <var>iter</var> steps of the SumSweep heuristic, starting from
+ * vertex <var>start</var>.
+ *
+ * @param start
+ * the starting vertex
+ * @param iter
+ * the number of iterations
+ */
+ public void sumSweepHeuristic(final int start, final int iter) {
+
+ if (DEBUG)
+ LOGGER.debug("Performing initial SumSweep visit from " + start + ".");
+ this.stepSumSweep(start, true);
+
+ for (int i = 2; i < iter; i++) {
+ if (i % 2 == 0) {
+ int v = argMax(totDistB, lB, toCompleteB);
+ if (DEBUG)
+ LOGGER.debug("Performing initial SumSweep visit from " + v + ".");
+ this.stepSumSweep(v, false);
+ } else {
+ int v = argMax(totDistF, lF, toCompleteF);
+ if (DEBUG)
+ LOGGER.debug("Performing initial SumSweep visit from " + v + ".");
+ this.stepSumSweep(v, true);
+ }
+ }
+ }
+
+ /**
+ * For each edge in the DAG of strongly connected components, finds a
+ * corresponding edge in the graph. This edge is used in the
+ * {@link #allCCUpperBound(int[])} function.
+ */
+ private void findEdgesThroughSCC() {
+ final int sccs[] = scc.component;
+ final int nscc = scc.numberOfComponents;
+ final int bestStart[] = new int[nscc];
+ final int bestEnd[] = new int[nscc];
+ int nSons;
+ final ImmutableGraph graph = this.graph;
+ final ImmutableGraph revgraph = this.revgraph;
+ final int[][] sccGraph = this.sccGraph;
+ final int[][] startBridges = this.startBridges;
+ final int[][] endBridges = this.endBridges;
+
+ IntArrayList childComponents = new IntArrayList();
+ LazyIntIterator iter;
+ int w, cw;
+
+ Arrays.fill(bestStart, -1);
+ Arrays.fill(bestEnd, -1);
+
+ IntArrayList vertInSCC[] = new IntArrayList[nscc];
+ for (int i = vertInSCC.length; i-- > 0;) {
+ vertInSCC[i] = new IntArrayList();
+ }
+
+ for (int v = 0; v < nn; v++) {
+ vertInSCC[sccs[v]].add(v);
+ }
+
+ for (int c = 0; c < nscc; c++) {
+ IntArrayList component = vertInSCC[c];
+ childComponents = new IntArrayList();
+ for (int v : component) {
+ iter = graph.successors(v);
+ while ((w = iter.nextInt()) != -1) {
+ cw = sccs[w];
+ if (sccs[v] != sccs[w]) {
+ if (bestStart[cw] == -1) {
+ bestStart[cw] = v;
+ bestEnd[cw] = w;
+ childComponents.add(cw);
+ } else if (graph.outdegree(v) + revgraph.outdegree(w) > graph.outdegree(bestStart[cw])
+ + revgraph.outdegree(bestEnd[cw])) {
+ bestStart[cw] = v;
+ bestEnd[cw] = w;
+ }
+ }
+ }
+ }
+ nSons = childComponents.size();
+ sccGraph[c] = new int[nSons];
+ startBridges[c] = new int[nSons];
+ endBridges[c] = new int[nSons];
+ for (int i = 0; i < nSons; i++) {
+ cw = childComponents.getInt(i);
+ sccGraph[c][i] = cw;
+ startBridges[c][i] = bestStart[cw];
+ endBridges[c][i] = bestEnd[cw];
+ bestStart[cw] = -1;
+ }
+ }
+ }
+
+ /**
+ * Performs a (forward or backward) BFS inside each strongly connected
+ * component, starting from its pivot.
+ *
+ * @param pivot
+ * an array containing in position <var>i</var> the pivot of
+ * the <var>i</var>th strongly connected component
+ * @param forward
+ * if {@code true}, a forward visit is performed, otherwise a
+ * backward visit
+ * @return two arrays of <var>int</var>, returned as a two-dimensional
+ * array <var>a</var>[][]. The array <var>a</var>[0] contains the
+ * distance of each vertex from (or to) the pivot of its strongly
+ * connected component, while <var>a</var>[1] contains in position
+ * <var>i</var> the eccentricity, restricted to the component, of the
+ * pivot of the <var>i</var>th strongly connected component.
+ */
+ private int[][] computeDistPivot(final int[] pivot, final boolean forward) {
+ int nn = this.nn;
+ int scc[] = this.scc.component;
+ int eccPivot[] = new int[this.scc.numberOfComponents];
+ int queue[] = this.queue;
+ int startQ, endQ, v, w;
+ LazyIntIterator iter;
+
+ int distPivot[] = new int[nn];
+ Arrays.fill(distPivot, -1);
+
+ ImmutableGraph g;
+
+ if (forward)
+ g = graph;
+ else
+ g = revgraph;
+
+ for (int p : pivot) {
+ startQ = 0;
+ endQ = 0;
+ queue[endQ++] = p;
+ distPivot[p] = 0;
+
+ while (startQ < endQ) {
+ v = queue[startQ++];
+ iter = g.successors(v);
+
+ while ((w = iter.nextInt()) != -1) {
+ if (scc[w] == scc[p] && distPivot[w] == -1) {
+ distPivot[w] = distPivot[v] + 1;
+ eccPivot[scc[p]] = distPivot[w];
+ queue[endQ++] = w;
+ }
+ }
+ }
+ }
+ return new int[][] { distPivot, eccPivot };
+ }
+
+ /**
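+ * Performs the AllCCUpperBound step of the ExactSumSweep algorithm: it runs
+ * a forward and a backward BFS from the pivot inside each strongly connected
+ * component, propagates the resulting bounds along the DAG of strongly
+ * connected components, and uses them to update the upper bounds on the
+ * forward and backward eccentricities of all vertices (see the paper for
+ * more details).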
+ *
+ * @param pivot
+ * an array containing in position <var>i</var> the pivot of the
+ * <var>i</var>th strongly connected component.
+ */
+ private void allCCUpperBound(final int[] pivot) {
+ final int[][] distEccF = computeDistPivot(pivot, true);
+ final int[][] distEccB = computeDistPivot(pivot, false);
+ final int distPivotF[] = distEccF[0];
+ final int eccPivotF[] = distEccF[1];
+ final int distPivotB[] = distEccB[0];
+ final int eccPivotB[] = distEccB[1];
+ final int[][] sccGraph = this.sccGraph, startBridges = this.startBridges, endBridges = this.endBridges;
+ final int[] uF = this.uF;
+ final int[] uB = this.uB;
+ final int[] lF = this.lF;
+ final int[] lB = this.lB;
+ final int[] eccF = this.eccF;
+ final int[] eccB = this.eccB;
+ final int[] sccs = scc.component;
+ final boolean[] accRadial = this.accRadial;
+ final boolean[] toCompleteB = this.toCompleteB;
+ final boolean[] toCompleteF = this.toCompleteF;
+ final int nscc = scc.numberOfComponents;
+
+ int p;
+
+ for (int c = 0; c < nscc; c++) {
+ p = pivot[c];
+ for (int i = 0; i < sccGraph[c].length; i++) {
+ int nextC = sccGraph[c][i];
+ int start = startBridges[c][i];
+ int end = endBridges[c][i];
+ eccPivotF[c] = Math.max(eccPivotF[c], distPivotF[start] + 1 + distPivotB[end] + eccPivotF[nextC]);
+ if (eccPivotF[c] >= uF[p]) {
+ eccPivotF[c] = uF[p];
+ break;
+ }
+ }
+ }
+ for (int c = nscc; c-- > 0;) {
+ for (int i = 0; i < sccGraph[c].length; i++) {
+ int nextC = sccGraph[c][i];
+ int start = startBridges[c][i];
+ int end = endBridges[c][i];
+ eccPivotB[nextC] = Math.max(eccPivotB[nextC], distPivotF[start] + 1 + distPivotB[end] + eccPivotB[c]);
+ if (eccPivotB[nextC] >= uB[pivot[nextC]]) {
+ eccPivotB[nextC] = uB[pivot[nextC]];
+ }
+ }
+ }
+ for (int v = 0; v < nn; v++) {
+ uF[v] = Math.min(uF[v], distPivotB[v] + eccPivotF[sccs[v]]);
+ if (uF[v] == lF[v]) {
+ // We do not have to check whether eccF(v)=D, because lF[v]=d(v,w)
+ // for some w from which we have already performed a (backward) BFS.
+ toCompleteF[v] = false;
+ eccF[v] = uF[v];
+ if (accRadial[v]) {
+ if (uF[v] < rU) {
+ rU = uF[v];
+ rV = v;
+ }
+ }
+ }
+
+ uB[v] = Math.min(uB[v], distPivotF[v] + eccPivotB[sccs[v]]);
+ if (uB[v] == lB[v]) {
+ toCompleteB[v] = false;
+ eccB[v] = uB[v];
+ // We do not have to check whether eccB(v)=D, because lB[v]=d(w,v)
+ // for some w from which we have already performed a (forward) BFS.
+ }
+ }
+ this.iter += 3;
+ }
+
+ /**
+ * Computes how many nodes still have to be processed before the requested
+ * result can be output.
+ *
+ * @return the number of nodes to be processed
+ */
+ private int findMissingNodes() {
+ int missingR = 0, missingDF = 0, missingDB = 0, missingAllF = 0, missingAllB = 0;
+ final boolean toCompleteF[] = this.toCompleteF;
+ final boolean toCompleteB[] = this.toCompleteB;
+ final boolean accRadial[] = this.accRadial;
+ final int[] uF = this.uF;
+ final int[] uB = this.uB;
+ final int[] lF = this.lF;
+ final int dL = this.dL;
+ final int rU = this.rU;
+
+ for (int v = nn; v-- > 0;) {
+ if (toCompleteF[v]) {
+ missingAllF++;
+ if (uF[v] > dL) {
+ missingDF++;
+ }
+ if (accRadial[v] && lF[v] < rU) {
+ missingR++;
+ }
+ }
+ if (toCompleteB[v]) {
+ missingAllB++;
+ if (uB[v] > dL) {
+ missingDB++;
+ }
+ }
+ }
+ if (missingR == 0 && iterR == -1) {
+ iterR = iter;
+ }
+ if ((missingDF == 0 || missingDB == 0) && iterD == -1) {
+ iterD = iter;
+ }
+ if (missingAllF == 0 && iterAllF == -1)
+ iterAllF = iter;
+ if (missingAllF == 0 && missingAllB == 0)
+ iterAll = iter;
+
+ switch (output) {
+ case RADIUS:
+ return missingR;
+ case DIAMETER:
+ return Math.min(missingDF, missingDB);
+ case RADIUS_DIAMETER:
+ return missingR + Math.min(missingDF, missingDB);
+ case ALL_FORWARD:
+ return missingAllF;
+ default:
+ return missingAllF + missingAllB;
+ }
+ }
+
+ /**
+ * Computes diameter, radius, and/or all eccentricities. Results can be
+ * accessed by methods such as {@link #getDiameter()},
+ * {@link #getRadialVertex()} and
+ * {@link #getEccentricity(int, boolean)}.
+ */
+ public void compute() {
+ if (pl != null) {
+ pl.start("Starting visits...");
+ pl.itemsName = "nodes";
+ pl.displayLocalSpeed = true;
+ }
+ int maxDeg = Integer.MIN_VALUE, maxDegVert = -1;
+ for (int v = 0; v < nn; v++) {
+ if (graph.outdegree(v) > maxDeg) {
+ maxDeg = graph.outdegree(v);
+ maxDegVert = v;
+ }
+ }
+
+ sumSweepHeuristic(maxDegVert, 6);
+
+ double points[] = new double[6];
+ int missingNodes = findMissingNodes(), oldMissingNodes = missingNodes;
+
+ Arrays.fill(points, graph.numNodes());
+
+ while (missingNodes > 0) {
+
+ int stepToPerform = argMax(points);
+
+ switch (stepToPerform) {
+ case 0:
+ if (DEBUG)
+ LOGGER.debug("Performing AllCCUpperBound.");
+ this.allCCUpperBound(findBestPivot());
+ break;
+ case 1:
+ if (DEBUG)
+ LOGGER.debug("Performing a forward BFS, from a vertex maximizing the upper bound.");
+ this.stepSumSweep(argMax(uF, totDistF, toCompleteF), true);
+ break;
+ case 2:
+ if (DEBUG)
+ LOGGER.debug("Performing a forward BFS, from a vertex minimizing the lower bound.");
+ this.stepSumSweep(argMin(lF, totDistF, accRadial), true);
+ break;
+ case 3:
+ if (DEBUG)
+ LOGGER.debug("Performing a backward BFS, from a vertex maximizing the upper bound.");
+ this.stepSumSweep(argMax(uB, totDistB, toCompleteB), false);
+ break;
+ case 4:
+ if (DEBUG)
+ LOGGER.debug("Performing a backward BFS, from a vertex maximizing the distance sum.");
+ this.stepSumSweep(argMax(totDistB, uB, toCompleteB), false);
+ break;
+ case 5:
+ if (DEBUG)
+ LOGGER.debug("Performing a forward BFS, from a vertex maximizing the distance sum.");
+ this.stepSumSweep(argMax(totDistF, uF, toCompleteF), true);
+ break;
+ }
+ oldMissingNodes = missingNodes;
+ missingNodes = this.findMissingNodes();
+ points[stepToPerform] = oldMissingNodes - missingNodes;
+
+ for (int j = 0; j < points.length; j++) {
+ if (j != stepToPerform && points[j] >= 0) {
+ points[j] = points[j] + 2.0 / iter;
+ }
+ }
+ if (DEBUG)
+ LOGGER.debug(" Missing nodes: " + missingNodes + "/" + 2 * nn + ".");
+ }
+ if (DEBUG) {
+ if (this.output == OutputLevel.RADIUS || this.output == OutputLevel.RADIUS_DIAMETER)
+ LOGGER.debug("Radius: " + rU + " (" + iterR + " iterations).");
+ if (this.output == OutputLevel.DIAMETER || this.output == OutputLevel.RADIUS_DIAMETER)
+ LOGGER.debug("Diameter: " + dL + " (" + iterD + " iterations).");
+ }
+ if (pl != null)
+ pl.done();
+ }
+
+ public static void main(final String[] arg) throws IOException, JSAPException {
+
+ SimpleJSAP jsap = new SimpleJSAP(SumSweepDirectedDiameterRadius.class.getName(),
+ "Computes the diameter, radius, diameter and radius, or all eccentricities in a graph, using the ExactSumSweep algorithm.",
+ new Parameter[] {
+ new Switch("expand", 'e', "expand", "Expand the graph to increase speed (no compression)."),
+ new Switch("mapped", 'm', "mapped", "Use loadMapped() to load the graph."),
+ new UnflaggedOption("graphBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED,
+ JSAP.NOT_GREEDY, "The basename of the graph."),
+ new UnflaggedOption("forwardOutputFilename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT,
+ JSAP.NOT_REQUIRED, JSAP.NOT_GREEDY,
+ "The filename where the resulting forward eccentricities (integers in binary form) are stored. If not available, the output file is not produced."),
+ new UnflaggedOption("backwardOutputFilename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT,
+ JSAP.NOT_REQUIRED, JSAP.NOT_GREEDY,
+ "The filename where the resulting backward eccentricities (integers in binary form) are stored. If not available, the output file is not produced."),
+ new FlaggedOption("level", EnumStringParser.getParser(OutputLevel.class),
+ OutputLevel.ALL.name(), JSAP.REQUIRED, 'l', "level",
+ Arrays.toString(OutputLevel.values())) });
+
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted())
+ System.exit(1);
+
+ final boolean mapped = jsapResult.getBoolean("mapped", false);
+ final String graphBasename = jsapResult.getString("graphBasename");
+ final ProgressLogger progressLogger = new ProgressLogger(LOGGER, "nodes");
+ final OutputLevel level = Enum.valueOf(OutputLevel.class,
+ jsapResult.getObject("level").toString().toUpperCase());
+ final String forwardOutputFilename = jsapResult.getString("forwardOutputFilename");
+ final String backwardOutputFilename = jsapResult.getString("backwardOutputFilename");
+
+ progressLogger.displayFreeMemory = true;
+ progressLogger.displayLocalSpeed = true;
+
+ ImmutableGraph graph = mapped ? ImmutableGraph.loadMapped(graphBasename, progressLogger)
+ : ImmutableGraph.load(graphBasename, progressLogger);
+ if (jsapResult.userSpecified("expand"))
+ graph = new ArrayListMutableGraph(graph).immutableView();
+
+ SumSweepDirectedDiameterRadius ss = new SumSweepDirectedDiameterRadius(graph, level, null, progressLogger);
+ ss.compute();
+ if (level != OutputLevel.DIAMETER)
+ System.out.println("Radius: " + ss.rU + " (" + ss.iterR + " iterations).");
+ if (level != OutputLevel.RADIUS)
+ System.out.println("Diameter: " + ss.dL + " (" + ss.iterD + " iterations).");
+
+ if (forwardOutputFilename != null && (level == OutputLevel.ALL || level == OutputLevel.ALL_FORWARD)) {
+ BinIO.storeInts(ss.eccF, forwardOutputFilename);
+ }
+ if (backwardOutputFilename != null && level == OutputLevel.ALL) {
+ BinIO.storeInts(ss.eccB, backwardOutputFilename);
+ }
+ }
+}
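Both SumSweep classes rest on the same principle. BFS distances give a lower bound and an upper bound on every vertex's eccentricity, and a vertex is finished once its two bounds meet. The code below is a minimal sketch of that principle for a small connected undirected graph. It is not the WebGraph implementation: the class `EccentricityBoundsSketch` is hypothetical, the graph is assumed to be symmetric, connected and given as plain adjacency lists, and the next source is simply the unfinished vertex with the largest upper bound. The real classes instead choose the next source adaptively in `compute()`. The sketch uses only the basic bounds ecc(v) >= max(d(s,v), ecc(s) - d(s,v)) and ecc(v) <= d(v,s) + ecc(s). `SumSweepUndirectedDiameterRadius` refines the upper bound further.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/**
 * Hypothetical sketch: exact eccentricities of a connected undirected graph
 * via BFS-based lower/upper bounds (not the WebGraph implementation).
 */
class EccentricityBoundsSketch {
    /** Number of BFSs performed by the last call to eccentricities(). */
    static int lastBfsCount;

    /** Plain BFS from s; unreachable vertices get distance -1. */
    static int[] bfs(int[][] adj, int s) {
        int[] dist = new int[adj.length];
        Arrays.fill(dist, -1);
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        dist[s] = 0;
        queue.add(s);
        while (!queue.isEmpty()) {
            int v = queue.poll();
            for (int w : adj[v]) {
                if (dist[w] == -1) {
                    dist[w] = dist[v] + 1;
                    queue.add(w);
                }
            }
        }
        return dist;
    }

    /** Returns the eccentricity of every vertex; adj must be symmetric and connected. */
    static int[] eccentricities(int[][] adj) {
        int n = adj.length;
        int[] l = new int[n]; // lower bounds, start at 0
        int[] u = new int[n]; // upper bounds, start at "infinity"
        Arrays.fill(u, Integer.MAX_VALUE);
        boolean[] done = new boolean[n];
        int remaining = n;
        lastBfsCount = 0;

        while (remaining > 0) {
            // Next source: the unfinished vertex with the largest upper bound.
            int s = -1;
            for (int v = 0; v < n; v++) {
                if (!done[v] && (s == -1 || u[v] > u[s])) s = v;
            }
            int[] d = bfs(adj, s);
            lastBfsCount++;
            int eccS = 0;
            for (int x : d) {
                if (x == -1) throw new IllegalArgumentException("graph is not connected");
                eccS = Math.max(eccS, x);
            }
            for (int v = 0; v < n; v++) {
                if (done[v]) continue;
                // Lower bounds: ecc(v) >= d(s,v) and ecc(v) >= ecc(s) - d(s,v).
                l[v] = Math.max(l[v], Math.max(d[v], eccS - d[v]));
                // Upper bound (triangle inequality): ecc(v) <= d(v,s) + ecc(s).
                u[v] = Math.min(u[v], d[v] + eccS);
                if (l[v] == u[v]) { // bounds meet: ecc(v) is exact
                    done[v] = true;
                    remaining--;
                }
            }
        }
        return l;
    }
}
```

Each BFS finishes at least its own source, because the source's bounds are both set to ecc(s). The loop therefore always terminates, and in the worst case it performs n BFSs. On a path with five vertices the sketch needs three BFSs (from vertices 0, 4 and 2) instead of five. The diameter and radius are the maximum and minimum of the returned array.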
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/SumSweepUndirectedDiameterRadius.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/SumSweepUndirectedDiameterRadius.java
new file mode 100644
index 0000000..6e4348d
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/SumSweepUndirectedDiameterRadius.java
@@ -0,0 +1,666 @@
+package it.unimi.dsi.webgraph.algo;
+
+/*
+ * Copyright (C) 2016-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.lang.EnumStringParser;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.Check;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.LazyIntIterator;
+import it.unimi.dsi.webgraph.Transform;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/**
+ *
+ * Computes the radius and/or the diameter and/or all eccentricities of an
+ * undirected graph, using the SumSweep algorithm described by Michele Borassi,
+ * Pierluigi Crescenzi, Michel Habib, Walter A. Kosters, Andrea Marino, and
+ * Frank W. Takes in &ldquo;Fast diameter and radius BFS-based computation in
+ * (weakly connected) real-world graphs&mdash;With an application to the six
+ * degrees of separation games&rdquo;, <i>Theoretical Computer Science</i>,
+ * 586:59&ndash;80, 2015.
+ *
+ * <p>
+ * We define the <em>eccentricity</em> of a node <var>v</var> in a graph
+ * <var>G</var>=(<var>V</var>,<var>E</var>) as
+ * <var>ecc</var>(<var>v</var>)=max<sub><var>w</var> reachable from
+ * <var>v</var></sub> <var>d</var>(<var>v</var>,<var>w</var>), where
+ * <var>d</var>(<var>v</var>,<var>w</var>) is the number of edges in a
+ * shortest path from <var>v</var> to <var>w</var>. The <em>diameter</em> is
+ * max<sub><var>v</var>&isin;<var>V</var></sub> <var>ecc</var>(<var>v</var>),
+ * and the <em>radius</em> is min<sub><var>v</var>&isin;<var>V</var></sub>
+ * <var>ecc</var>(<var>v</var>).
+ *
+ * <p>
+ * This algorithm performs some BFSs from "clever" vertices, and uses these BFSs
+ * to bound the eccentricity of all vertices. More specifically, for each vertex
+ * <var>v</var>, this algorithm keeps a lower bound <var>l</var>[<var>v</var>]
+ * and an upper bound <var>u</var>[<var>v</var>] on the eccentricity of
+ * <var>v</var>. Furthermore, it keeps a lower bound <var>dL</var> on the
+ * diameter and an upper bound <var>rU</var> on the radius. At each step, the
+ * algorithm performs a BFS and updates all these bounds: the radius is
+ * found as soon as <var>rU</var> is at most the minimum value of
+ * <var>l</var>, and the diameter is found as soon as <var>dL</var> is at
+ * least the maximum value of <var>u</var>.
+ *
+ * <p>
+ * More specifically, the upper bound on the radius (resp., lower bound on the
+ * diameter) is defined as the minimum (resp., maximum) eccentricity of a vertex
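+ * from which we performed a BFS. Moreover, if we perform a BFS from a vertex
+ * <var>s</var>, we update <var>l</var>[<var>v</var>]=max(<var>l</var>[<var>v</var>],
+ * <var>d</var>(<var>s</var>,<var>v</var>)) and
+ * <var>u</var>[<var>v</var>]=min(<var>u</var>[<var>v</var>],
+ * <var>d</var>(<var>v</var>,<var>s</var>)+<var>ecc</var>(<var>s</var>)).
+ * The implementation also uses the lower bound
+ * <var>ecc</var>(<var>s</var>)&minus;<var>d</var>(<var>s</var>,<var>v</var>),
+ * and it refines the upper bound by looking at the first branch of the BFS tree.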
+ *
+ * <p>
+ * To use this class, it is enough to create an instance, and then invoke
+ * {@link #compute()}. It is possible to choose between the following stopping
+ * conditions:
+ * <ul>
+ * <li>only the radius is found;</li>
+ * <li>only the diameter is found;</li>
+ * <li>radius and diameter are found;</li>
+ * <li>all eccentricities are found.</li>
+ * </ul>
+ *
+ * <p>
+ * After the method {@link #compute()} is run, the output can be obtained
+ * through the methods {@link #getRadius()} for the radius, {@link #getRadialVertex()} for a
+ * radial vertex, {@link #getDiameter()} for the diameter, {@link #getDiametralVertex()} for a
+ * vertex whose (forward or backward) eccentricity equals the diameter,
+ * {@link #getEccentricity(int)} for the eccentricity of a vertex. An exception is raised
+ * if the field has not been computed.
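+ *
+ * <p>
+ * For instance, here is a minimal usage sketch. It assumes that {@code graph}
+ * is an already loaded symmetric {@link ImmutableGraph}; the variable names
+ * are illustrative only:
+ *
+ * <pre>
+ * SumSweepUndirectedDiameterRadius ss = new SumSweepUndirectedDiameterRadius(graph,
+ *     SumSweepUndirectedDiameterRadius.OutputLevel.RADIUSDIAMETER, null); // null: no progress logger
+ * ss.compute();
+ * int radius = ss.getRadius();
+ * int diameter = ss.getDiameter();
+ * int center = ss.getRadialVertex();
+ * </pre>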
+ *
+ * <h2>Performance issues</h2>
+ *
+ * <p>
+ * The algorithm is exact and, although its running time is
+ * <var>O</var>(<var>mn</var>) in the worst case, it is usually much faster on
+ * real-world networks.
+ *
+ *
+ * @author Michele Borassi
+ */
+
+public class SumSweepUndirectedDiameterRadius {
+ private static final boolean DEBUG = true;
+ private final static Logger LOGGER = LoggerFactory.getLogger(SumSweepUndirectedDiameterRadius.class);
+
+ /**
+ * The type of output requested: radius, diameter, radius and diameter, or
+ * all eccentricities.
+ */
+ public enum OutputLevel {
+ /**
+ * Computes only the radius of the graph.
+ */
+ RADIUS,
+ /**
+ * Computes only the diameter of the graph.
+ */
+ DIAMETER,
+ /**
+ * Computes both radius and diameter.
+ */
+ RADIUSDIAMETER,
+ /**
+ * Computes the radius, the diameter, and all the eccentricities.
+ */
+ ALL
+ };
+
+ /** The graph under examination. */
+ private final ImmutableGraph graph;
+ /** The number of nodes. */
+ private final int nn;
+ /** The global progress logger. */
+ private final ProgressLogger pl;
+ /** The kind of output requested. */
+ private final OutputLevel output;
+ /** The array of eccentricity values. */
+ protected final int[] ecc;
+ /**
+ * <var>toComplete</var>[<var>v</var>] is true if and only if the
+ * eccentricity of <var>v</var> has not been exactly computed, yet.
+ */
+ private final boolean[] toComplete;
+ /** Saves which vertices are in the first branch of a BFS. */
+ private final boolean[] firstBranch;
+ /** The queue used for each BFS (it is recycled to save some time). */
+ private final int[] queue;
+ /**
+ * The array of distances, used in each BFS (it is recycled to save some
+ * time).
+ */
+ private final int[] dist;
+ /** Lower bound on the diameter of the graph. */
+ private int dL;
+ /** Upper bound on the radius of the graph. */
+ private int rU;
+ /** A vertex whose eccentricity equals the diameter. */
+ private int dV;
+ /** A vertex whose eccentricity equals the radius. */
+ private int rV;
+ /** Number of iterations performed until now. */
+ private int iter;
+ /** Lower bound on the eccentricity. */
+ protected int l[];
+ /** Upper bound on the eccentricity. */
+ protected int u[];
+ /** Number of iterations before the radius is found. */
+ private int iterR;
+ /** Number of iterations before the diameter is found. */
+ private int iterD;
+ /** Number of iterations before all eccentricities are found. */
+ private int iterAll;
+ /**
+ * Total distance from already processed vertices (used as tie-break
+ * for the choice of the next vertex to process).
+ */
+ private final int totDist[];
+
+ /**
+ * Creates a new instance for computing diameter and/or radius and/or all
+ * eccentricities.
+ *
+ * @param graph
+ * a symmetric (i.e., undirected) graph.
+ * @param output
+ * which output is requested: radius, diameter, radius and
+ * diameter, or all eccentricities.
+ * @param pl
+ * a progress logger, or {@code null}.
+ * @throws IllegalArgumentException
+ * if the graph is not symmetric.
+ */
+ public SumSweepUndirectedDiameterRadius(final ImmutableGraph graph, final OutputLevel output,
+ final ProgressLogger pl) {
+ if (!Check.symmetry(graph)) {
+ throw new IllegalArgumentException("The graph is not undirected.");
+ }
+ this.pl = pl;
+ this.graph = graph;
+ this.nn = graph.numNodes();
+ this.ecc = new int[nn];
+ totDist = new int[nn];
+ l = new int[nn];
+ u = new int[nn];
+ toComplete = new boolean[nn];
+ queue = new int[nn];
+ dist = new int[nn];
+ firstBranch = new boolean[nn];
+
+ Arrays.fill(ecc, -1);
+ Arrays.fill(u, nn + 1);
+ Arrays.fill(l, 0);
+ Arrays.fill(toComplete, true);
+ this.dL = 0;
+ this.rU = Integer.MAX_VALUE;
+ this.output = output;
+ iterR = -1;
+ iterD = -1;
+ iterAll = -1;
+ }
+
+ /**
+ * Returns the radius of the graph, if it has already been computed
+ * (otherwise, an exception is raised).
+ *
+ * @return the radius
+ */
+ public int getRadius() {
+ if (iterR == -1) {
+ throw new UnsupportedOperationException("The radius has not been "
+ + "computed, yet. Please, run the compute method with " + "the correct output.");
+ }
+ return rU;
+ }
+
+ /**
+ * Returns the diameter, if it has already been computed (otherwise, an
+ * exception is raised).
+ *
+ * @return the diameter
+ */
+ public int getDiameter() {
+ if (iterD == -1) {
+ throw new UnsupportedOperationException("The diameter has not been "
+ + "computed, yet. Please, run the compute method with " + "the correct output.");
+ }
+ return dL;
+ }
+
+ /**
+ * Returns a radial vertex, if it has already been computed (otherwise, an
+ * exception is raised).
+ *
+ * @return a radial vertex
+ */
+ public int getRadialVertex() {
+ if (iterR == -1) {
+ throw new UnsupportedOperationException("The radius has not been "
+ + "computed, yet. Please, run the compute method with " + "the correct output.");
+ }
+ return rV;
+ }
+
+ /**
+ * Returns a diametral vertex, if it has already been computed (otherwise,
+ * an exception is raised).
+ *
+ * @return a diametral vertex
+ */
+ public int getDiametralVertex() {
+ if (iterD == -1) {
+ throw new UnsupportedOperationException("The diameter has not been "
+ + "computed, yet. Please, run the compute method with " + "the correct output.");
+ }
+ return dV;
+ }
+
+ /**
+ * Returns the eccentricity of a vertex, if it has already been computed
+ * (otherwise, an exception is raised).
+ *
+ * @param v
+ * the vertex
+ * @return the eccentricity of <var>v</var>
+ */
+ public int getEccentricity(int v) {
+ if (ecc[v] == -1) {
+ throw new UnsupportedOperationException("The eccentricity of vertex " + v + " has not been "
+ + "computed, yet. Please, use the compute method with " + "the correct output.");
+ }
+ return ecc[v];
+ }
+
+ /**
+ * Returns the number of iterations needed to compute the radius, if it has
+ * already been computed (otherwise, an exception is raised).
+ *
+ * @return the number of iterations before the radius is found
+ */
+ public int getRadiusIterations() {
+ if (iterR == -1) {
+ throw new UnsupportedOperationException("The radius has not been "
+ + "computed, yet. Please, run the compute method with " + "the correct output.");
+ }
+ return iterR;
+ }
+
+ /**
+ * Returns the number of iterations needed to compute the diameter, if it has
+ * already been computed (otherwise, an exception is raised).
+ *
+ * @return the number of iterations before the diameter is found
+ */
+ public int getDiameterIterations() {
+ if (iterD == -1) {
+ throw new UnsupportedOperationException("The diameter has not been "
+ + "computed, yet. Please, run the compute method with the correct output.");
+ }
+ return iterD;
+ }
+
+ /**
+ * Returns the number of iterations needed to compute all eccentricities, if
+ * they have already been computed (otherwise, an exception is raised).
+ *
+ * @return the number of iterations before all eccentricities are found
+ */
+ public int getAllIterations() {
+ if (iterAll == -1) {
+ throw new UnsupportedOperationException("The eccentricities have not all been "
+ + "computed, yet. Please, run the compute method with the correct output.");
+ }
+ return iterAll;
+ }
+
+ /**
+ * Performs a BFS from <var>start</var>, updating upper and lower bounds on
+ * the eccentricities of all visited vertices.
+ *
+ * @param start
+ * the starting vertex of the BFS
+ */
+ private void stepSumSweep(final int start) {
+ if (start == -1) {
+ return;
+ }
+ if (graph.outdegree(start) == 0) {
+ rU = 0;
+ rV = start;
+ ecc[start] = 0;
+ toComplete[start] = false;
+ return;
+ }
+ final ImmutableGraph g = graph;
+ final int queue[] = this.queue;
+ final int dist[] = this.dist;
+ int startQ = 0, endQ = 0;
+ int v = start, w, eccStart, eccNotFirstBranch = 0;
+ final int[] l = this.l, u = this.u, totDist = this.totDist, ecc = this.ecc;
+ final boolean[] firstBranch = this.firstBranch, toComplete = this.toComplete;
+ int startingPathL = 0;
+
+ Arrays.fill(dist, -1);
+ dist[start] = 0;
+
+ // If the BFS tree starts with a path, we consider the path separately.
+ if (g.outdegree(start) == 1) {
+ int old = start;
+ w = start;
+ v = g.successors(start).nextInt();
+ dist[v] = 1;
+ startingPathL++;
+ while (g.outdegree(v) == 2) {
+ int successors[] = g.successorArray(v);
+ old = w;
+ w = v;
+ if (successors[0] == old)
+ v = successors[1];
+ else
+ v = successors[0];
+ dist[v] = dist[w] + 1;
+ startingPathL++;
+ }
+ }
+
+ Arrays.fill(firstBranch, false);
+ LazyIntIterator iter;
+
+ queue[endQ++] = v;
+ int successors[] = g.successorArray(v);
+
+ // We want to compute which is the first branch of the BFS tree.
+ // Obviously, we want to exclude the initial path.
+ if (g.outdegree(v) != 1) {
+ if (dist[successors[0]] == -1) {
+ firstBranch[successors[0]] = true;
+ } else {
+ firstBranch[successors[1]] = true;
+ }
+ }
+
+ // We run the BFS.
+ while (startQ < endQ) {
+ v = queue[startQ++];
+
+ iter = g.successors(v);
+
+ if (!firstBranch[v]) {
+ eccNotFirstBranch = dist[v];
+ }
+
+ while ((w = iter.nextInt()) != -1) {
+ if (dist[w] == -1) {
+ dist[w] = dist[v] + 1;
+ queue[endQ++] = w;
+ firstBranch[w] = firstBranch[w] || firstBranch[v];
+ }
+ }
+ }
+
+ eccStart = dist[queue[endQ - 1]];
+
+ // We update all bounds.
+ for (v = nn; v-- > 0;) {
+ if (dist[v] == -1) {
+ continue;
+ }
+ totDist[v] += dist[v];
+ if (toComplete[v]) {
+ int distv = dist[v];
+
+ l[v] = Math.max(l[v], Math.max(eccStart - distv, distv));
+
+ if (firstBranch[v]) {
+ u[v] = Math.min(u[v], Math.max(eccStart - 2 - 2 * (startingPathL) + distv,
+ distv + Math.max(0, eccNotFirstBranch - 2 * startingPathL)));
+ } else if (distv < startingPathL) {
+ u[v] = Math.min(u[v], Math.max(distv, eccStart - distv));
+ } else {
+ u[v] = Math.min(u[v], Math.max(eccStart - 2 * startingPathL + distv, eccStart));
+ }
+
+ if (l[v] == u[v]) {
+ toComplete[v] = false;
+ ecc[v] = l[v];
+ if (dL < ecc[v]) {
+ dL = ecc[v];
+ dV = v;
+ }
+ if (rU > ecc[v]) {
+ rU = ecc[v];
+ rV = v;
+ }
+ }
+ }
+ }
+
+ this.iter++;
+ if (pl != null)
+ pl.update();
+ }
+
+ /**
+ * Performs <var>iter</var> steps of the SumSweep heuristic, starting from
+ * vertex <var>start</var>. The SumSweep heuristic performs BFSes from
+ * vertices maximizing the sum of the distances from the starting vertices of
+ * previous BFSes; such vertices are likely to be "peripheral". This way,
+ * after a few iterations, most lower bounds on the eccentricities are
+ * usually tight.
+ *
+ * @param start
+ * the starting vertex
+ * @param iter
+ * the number of iterations
+ */
+ public void sumSweepHeuristic(final int start, final int iter) {
+
+ this.stepSumSweep(start);
+
+ for (int i = 2; i < iter; i++) {
+ this.stepSumSweep(SumSweepDirectedDiameterRadius.argMax(totDist, l, toComplete));
+ }
+ }
+
+ /**
+ * Computes how many nodes still have to be processed before the requested
+ * result can be output.
+ *
+ * @return the number of nodes to be processed
+ */
+ private int findMissingNodes() {
+ int missingR = 0, missingD = 0, missingAll = 0;
+ final boolean toComplete[] = this.toComplete;
+ final int u[] = this.u;
+ final int l[] = this.l;
+
+ for (int v = nn; v-- > 0;) {
+ if (toComplete[v]) {
+ missingAll++;
+ if (u[v] > dL) {
+ missingD++;
+ }
+ if (l[v] < rU) {
+ missingR++;
+ }
+ }
+ }
+ if (missingR == 0 && iterR == -1) {
+ iterR = iter;
+ }
+ if ((missingD == 0) && iterD == -1) {
+ iterD = iter;
+ }
+ if (missingAll == 0)
+ iterAll = iter;
+
+ switch (output) {
+ case RADIUS:
+ return missingR;
+ case DIAMETER:
+ return missingD;
+ case RADIUSDIAMETER:
+ return missingR + missingD;
+ default:
+ return missingAll;
+ }
+ }
+
+ /**
+ * Computes diameter, radius, and/or all eccentricities. Results can be
+ * accessed by methods such as {@link #getDiameter()},
+ * {@link #getRadialVertex()} and
+ * {@link #getEccentricity(int)}.
+ */
+ public void compute() {
+ if (pl != null) {
+ pl.start("Starting visits...");
+ pl.itemsName = "nodes";
+ pl.displayLocalSpeed = true;
+ }
+ int maxDeg = Integer.MIN_VALUE, maxDegVert = -1;
+ for (int v = 0; v < nn; v++) {
+ if (graph.outdegree(v) > maxDeg) {
+ maxDeg = graph.outdegree(v);
+ maxDegVert = v;
+ }
+ }
+ sumSweepHeuristic(maxDegVert, 3);
+
+ double points[] = new double[3];
+ int missingNodes = findMissingNodes(), oldMissingNodes = missingNodes;
+
+ Arrays.fill(points, graph.numNodes());
+
+ while (missingNodes > 0) {
+
+ int stepToPerform = SumSweepDirectedDiameterRadius.argMax(points);
+
+ switch (stepToPerform) {
+ case 0:
+ if (DEBUG)
+ LOGGER.debug("Performing a BFS from a vertex maximizing the upper bound.");
+ this.stepSumSweep(SumSweepDirectedDiameterRadius.argMax(u, totDist, toComplete));
+ break;
+ case 1:
+ if (DEBUG)
+ LOGGER.debug("Performing a BFS from a vertex minimizing the lower bound.");
+ this.stepSumSweep(SumSweepDirectedDiameterRadius.argMin(l, totDist, toComplete));
+ break;
+ case 2:
+ if (DEBUG)
+ LOGGER.debug("Performing a BFS from a vertex maximizing the distance sum.");
+ this.stepSumSweep(SumSweepDirectedDiameterRadius.argMax(totDist, u, toComplete));
+ break;
+ }
+ oldMissingNodes = missingNodes;
+ missingNodes = this.findMissingNodes();
+ points[stepToPerform] = oldMissingNodes - missingNodes;
+
+ for (int j = 0; j < points.length; j++) {
+ if (j != stepToPerform) {
+ points[j] = points[j] + 2.0 / iter;
+ }
+ }
+ if (DEBUG)
+ LOGGER.debug(" Missing nodes: " + missingNodes + "/" + 2 * nn + ".");
+ }
+ if (DEBUG) {
+ if (this.output == OutputLevel.RADIUS || this.output == OutputLevel.RADIUSDIAMETER)
+ LOGGER.debug("Radius: " + rU + " (" + iterR + " iterations).");
+ if (this.output == OutputLevel.DIAMETER || this.output == OutputLevel.RADIUSDIAMETER)
+ LOGGER.debug("Diameter: " + dL + " (" + iterD + " iterations).");
+ }
+ if (pl != null)
+ pl.done();
+ }
+
+ public static void main(final String[] arg) throws IOException, JSAPException {
+
+ SimpleJSAP jsap = new SimpleJSAP(SumSweepUndirectedDiameterRadius.class.getName(),
+ "Computes the diameter, radius, diameter and radius, or all eccentricities in a graph, using the ExactSumSweep algorithm.",
+ new Parameter[] {
+ new Switch("expand", 'e', "expand", "Expand the graph to increase speed (no compression)."),
+ new Switch("onlyGiant", 'g', "onlyGiant",
+ "Performs the computation only for the biggest component of the input graph."),
+ new Switch("symmetrize", 's', "symmetrize",
+ "Symmetrizes the graph (so that directed graphs can also be input)."),
+ new Switch("mapped", 'm', "mapped", "Use loadMapped() to load the graph."),
+ new UnflaggedOption("graphBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED,
+ JSAP.NOT_GREEDY, "The basename of the graph."),
+ new UnflaggedOption("outputFilename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED,
+ JSAP.NOT_GREEDY,
+ "The filename where the resulting eccentricities (integers in binary form) are stored when the level is ALL. If not specified, no output file is produced."),
+ new FlaggedOption("level", EnumStringParser.getParser(OutputLevel.class, true),
+ OutputLevel.ALL.name(), JSAP.NOT_REQUIRED, 'l', "level",
+ Arrays.toString(OutputLevel.values())), });
+
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted())
+ System.exit(1);
+
+ final boolean mapped = jsapResult.getBoolean("mapped", false);
+ final boolean onlyGiant = jsapResult.getBoolean("onlyGiant", false);
+ final boolean symmetrize = jsapResult.getBoolean("symmetrize", false);
+ final String graphBasename = jsapResult.getString("graphBasename");
+ final ProgressLogger progressLogger = new ProgressLogger(LOGGER, "nodes");
+ final OutputLevel level = Enum.valueOf(OutputLevel.class,
+ jsapResult.getObject("level").toString().toUpperCase());
+ final String forwardOutputFilename = jsapResult.getString("outputFilename");
+
+ progressLogger.displayFreeMemory = true;
+ progressLogger.displayLocalSpeed = true;
+
+ ImmutableGraph graph = mapped ? ImmutableGraph.loadMapped(graphBasename, progressLogger)
+ : ImmutableGraph.load(graphBasename, progressLogger);
+ if (jsapResult.userSpecified("expand"))
+ graph = new ArrayListMutableGraph(graph).immutableView();
+ if (symmetrize)
+ graph = Transform.symmetrize(graph);
+ if (onlyGiant)
+ graph = ConnectedComponents.getLargestComponent(graph, 0, null);
+
+ SumSweepUndirectedDiameterRadius ss = new SumSweepUndirectedDiameterRadius(graph, level, progressLogger);
+ ss.compute();
+ if (level != OutputLevel.DIAMETER)
+ System.out.println("Radius: " + ss.rU + " (" + ss.iterR + " iterations).");
+ if (level != OutputLevel.RADIUS)
+ System.out.println("Diameter: " + ss.dL + " (" + ss.iterD + " iterations).");
+ System.out.println("Total number of iterations: " + ss.iter + ".");
+
+ if (forwardOutputFilename != null && (level == OutputLevel.ALL)) {
+ BinIO.storeInts(ss.ecc, forwardOutputFilename);
+ }
+ }
+}
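The loop in `compute()` above chooses adaptively among three kinds of BFS step. Each step type keeps a score in the `points` array, and the highest score wins. After a step runs, its score is reset to the number of nodes that step completed, and every other score grows by `2.0 / iter`.

The Java sketch below isolates only that bookkeeping, under stated assumptions:
- `StepSelector`, `next()` and `update()` are hypothetical names.
- Ties go to the lowest index. This is an assumption, because the tie-breaking of `SumSweepDirectedDiameterRadius.argMax` is not shown here.
- The iteration counter is incremented once per update. In the original, `iter` is a field maintained elsewhere in the class.

```java
import java.util.Arrays;

/** Hypothetical sketch of the adaptive step selection used by compute(). */
class StepSelector {
    private final double[] points;
    private int iter = 0; // assumption: one iteration per update

    StepSelector(int strategies, double initialPoints) {
        points = new double[strategies];
        // All strategies start with the same large score, like Arrays.fill(points, graph.numNodes()).
        Arrays.fill(points, initialPoints);
    }

    /** Returns the index of the strategy with the highest score (lowest index on ties). */
    int next() {
        int best = 0;
        for (int i = 1; i < points.length; i++)
            if (points[i] > points[best]) best = i;
        return best;
    }

    /**
     * Records that strategy {@code chosen} completed {@code progress} nodes; the strategies
     * that were not chosen receive a bonus that shrinks as iterations accumulate.
     */
    void update(int chosen, int progress) {
        iter++;
        points[chosen] = progress;
        for (int j = 0; j < points.length; j++)
            if (j != chosen) points[j] += 2.0 / iter;
    }

    public static void main(String[] args) {
        // Toy run: strategy j always completes gains[j] nodes (made-up numbers).
        int[] gains = { 5, 1, 3 };
        StepSelector selector = new StepSelector(3, 100);
        int missing = 40;
        while (missing > 0) {
            int step = selector.next();
            missing -= gains[step];
            selector.update(step, gains[step]);
            System.out.println("step " + step + ", missing " + Math.max(missing, 0));
        }
    }
}
```

The harmonic series diverges, so the accumulated bonus of a strategy that is never chosen grows without bound. Such a strategy eventually overtakes the bounded scores of the others and is retried. Meanwhile, the shrinking per-step bonus lets measured progress dominate the choice.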
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/TopKGeometricCentrality.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/TopKGeometricCentrality.java
new file mode 100644
index 0000000..b168255
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/TopKGeometricCentrality.java
@@ -0,0 +1,657 @@
+package it.unimi.dsi.webgraph.algo;
+
+import java.io.DataOutputStream;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.PrintStream;
+import java.util.Arrays;
+import java.util.concurrent.TimeUnit;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+import it.unimi.dsi.fastutil.ints.IntArrayList;
+import it.unimi.dsi.fastutil.ints.IntComparator;
+import it.unimi.dsi.fastutil.ints.IntHeapPriorityQueue;
+import it.unimi.dsi.lang.EnumStringParser;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.LazyIntIterator;
+
+/**
+ * Computes the <var>k</var> most central vertices according to a <em>positive</em> {@linkplain Centrality geometric centrality}.
+ * A survey of geometric centralities can be found in
+ * &ldquo;<a href="http://vigna.di.unimi.it/BoVAC">Axioms for centrality</a>&rdquo;,
+ * by Paolo Boldi and Sebastiano Vigna, <i>Internet Math.</i>, 10(3-4):222&ndash;262, 2014.
+ *
+ * <p>Note that usually one is interested in the <em>negative</em> version of a centrality measure, that is, the version
+ * that depends on the <em>incoming</em> arcs. This class can compute only <em>positive</em> centralities: if you are
+ * interested (as it usually happens) in the negative version, you must pass to this class the <em>transpose</em> of the graph.
+ *
+ * <p>In more detail, this class can compute the top <var>k</var> nodes for any centrality
+ * listed in {@link Centrality}. You must build a suitable instance using one of the
+ * static factory methods (e.g., {@link #newHarmonicCentrality(ImmutableGraph, int, int)}) and
+ * then invoke {@link #compute()} on the instance. After the computation, the results will be available
+ * in the public arrays {@link #centrality} and {@link #topK}.
+ *
+ * <p>The algorithm implemented in this class is the CutClos algorithm proposed by Michele
+ * Borassi, Pierluigi Crescenzi and Andrea Marino in &ldquo;Fast and Simple
+ * Computation of Top-<var>k</var> Closeness Centralities&rdquo;,
+ * <i>CoRR</i>, abs/1507.01490, 2015.
+ * The implementation performs a number of parallel breadth-first visits.
+ *
+ * <p>If <var>k</var> is small, the algorithm is much faster than the standard algorithm,
+ * which {@linkplain GeometricCentralities computes all centralities}: for example, if <var>k</var> is 1,
+ * the speedup can
+ * be of several orders of magnitude. For larger values of <var>k</var>, the performance
+ * improvement decreases, and when <var>k</var> equals the number of nodes the performance matches that of the
+ * trivial algorithm that computes all centralities. In that case, you might consider using
+ * an approximate algorithm like {@link HyperBall}.
+ *
+ * @author Michele Borassi
+ */
+
+public class TopKGeometricCentrality {
+ private final static Logger LOGGER = LoggerFactory.getLogger(TopKGeometricCentrality.class);
+ private static final boolean DEBUG = false;
+
+ /**
+ * A thread that performs BFSes from some nodes, in order to compute their
+ * centralities, or to prove that these vertices are not in the top-k.
+ */
+ @SuppressWarnings("hiding")
+ private class GeometricCentralityThread extends Thread {
+ private final int nn = TopKGeometricCentrality.this.nn;
+ private final Centrality centralityType = TopKGeometricCentrality.this.centralityType;
+
+ // These variables are used in a BFS, and we have to recycle them for
+ // performance reasons.
+ private final int dist[];
+ private final int queue[];
+ private final ImmutableGraph graph = TopKGeometricCentrality.this.graph.copy();
+ private int neVis;
+ private int nnVis;
+
+ GeometricCentralityThread() {
+ dist = new int[nn];
+ queue = new int[nn];
+ Arrays.fill(dist, -1);
+ }
+
+ /**
+ * A BFS from a vertex v, that is cut as soon as we can prove that v is
+ * not one of the <var>k</var> most central vertices.
+ *
+ * @param v
+ * the starting vertex.
+ * @return the Lin/harmonic/exponential centrality of v if v can
+ * be in the top-k, -1 otherwise.
+ */
+ private double BFSCut(final int v) {
+ if (this.graph.outdegree(v) == 0) {
+ if (centralityType == Centrality.LIN) return 1;
+ else return 0;
+ }
+ int x, y;
+ int startQ = 0;
+ int d = -1, gamma = this.graph.outdegree(v);
+ double sumDist = 0, tildefL = 0, tildefU = 0;
+ double reachL = TopKGeometricCentrality.this.reachL[v],
+ reachU = TopKGeometricCentrality.this.reachU[v];
+ int[] queue = this.queue;
+ int[] dist = this.dist;
+ int[] degs = TopKGeometricCentrality.this.degs;
+ Centrality centrality = this.centralityType;
+ double kth = TopKGeometricCentrality.this.kth;
+ double alpha = TopKGeometricCentrality.this.alpha;
+
+ LazyIntIterator iter;
+
+ // We reset variables that were modified in previous BFSes.
+ for (int i = 0; i < nnVis; i++) dist[queue[i]] = -1;
+ nnVis = 0;
+
+ dist[v] = 0;
+ queue[nnVis++] = v;
+
+ while (startQ < nnVis) {
+ x = queue[startQ++];
+ iter = this.graph.successors(x);
+ if (dist[x] > d) {
+ d++;
+ if (centrality == Centrality.LIN) {
+ tildefL = ((sumDist - gamma + (d + 2) * (reachL - nnVis))) / (reachL * reachL);
+ tildefU = ((sumDist - gamma + (d + 2) * (reachU - nnVis))) / (reachU * reachU);
+ if (kth > 0 && tildefL >= 1.0 / kth && tildefU >= 1.0 / kth) {
+ return -1;
+ }
+ } else if (centrality == Centrality.HARMONIC) {
+ tildefL = sumDist + ((double) gamma) / (d + 1) + (reachU - gamma - nnVis) / (d + 2);
+ if (tildefL <= kth) {
+ return -1;
+ }
+ } else {
+ tildefL = sumDist + gamma * Math.pow(alpha, d + 1) + (reachU - gamma - nnVis) * Math.pow(alpha, d + 2);
+ if (tildefL <= kth) {
+ return -1;
+ }
+ }
+ gamma = 0;
+ }
+ while ((y = iter.nextInt()) != -1) {
+ neVis++;
+ if (dist[y] == -1) {
+ dist[y] = dist[x] + 1;
+ if (centrality == Centrality.LIN) {
+ sumDist += dist[y];
+ } else if (centrality == Centrality.HARMONIC) {
+ sumDist += 1.0 / dist[y];
+ } else {
+ sumDist += Math.pow(alpha, dist[y]);
+ }
+ queue[nnVis++] = y;
+ gamma += degs[y];
+ } else {
+ if (centrality == Centrality.LIN) {
+ tildefL += 1 / (reachL * reachL);
+ tildefU += 1 / (reachU * reachU);
+ if (kth > 0 && tildefL >= 1.0 / kth && tildefU >= 1.0 / kth) {
+ return -1;
+ }
+ } else if (centrality == Centrality.HARMONIC) {
+ tildefL += 1.0 / (d + 2) - 1.0 / (d + 1);
+ if (tildefL <= kth) {
+ return -1;
+ }
+ } else {
+ tildefL += Math.pow(alpha, d + 2) - Math.pow(alpha, d + 1);
+ if (tildefL <= kth) {
+ return -1;
+ }
+ }
+ }
+ }
+ }
+ if (centrality == Centrality.LIN) {
+ return ((double) nnVis) * nnVis / sumDist;
+ }
+ return sumDist;
+ }
+
+ /*
+ * The main function run by each thread: it performs a BFSCut from each
+ * vertex returned by nextVert(), until no vertex is left.
+ */
+ @Override
+ public void run() {
+ int v = nextVert();
+ double centrality;
+ while (v != -1) {
+ neVis = 0;
+ centrality = this.BFSCut(v);
+ endBFS(v, centrality, neVis);
+ v = nextVert();
+ }
+ }
+ }
+
+ /**
+ * The centralities with respect to which it is possible to find the top <var>k</var> nodes.
+ */
+ public enum Centrality {
+ /**
+ * Lin's Centrality: &ell;(<var>x</var>) = |<var>R</var>(<var>x</var>)|<sup>2</sup>&nbsp;&frasl;&nbsp;&nbsp;<big>&Sigma;</big><sub><var>y</var>&isin;<var>R</var>(<var>x</var>)</sub>
+ * d(<var>x</var>,<var>y</var>), where <var>R</var>(<var>x</var>) is the set of nodes reachable
+ * from <var>x</var>. Note that for a strongly connected graph Lin's centrality is exactly Bavelas's
+ * closeness centrality, that is, 1&nbsp;&frasl;&nbsp;&nbsp;<big>&Sigma;</big><sub><var>y</var></sub>d(<var>x</var>, <var>y</var>),
+ * multiplied by the square of the number of nodes.
+ */
+ LIN,
+ /**
+ * Harmonic Centrality: <var>h</var>(<var>x</var>) = <big>&Sigma;</big> <sub><var>y</var> &ne; <var>x</var></sub>
+ * 1&nbsp;&frasl;&nbsp;d(<var>x</var>,<var>y</var>).
+ */
+ HARMONIC,
+ /**
+ * Exponential Centrality: <var>e</var><sub>&alpha;</sub>(<var>x</var>) = <big>&Sigma;</big><sub><var>y</var></sub> &alpha;<sup>d(<var>x</var>,<var>y</var>)</sup> for some real number &alpha; &isin; (0,&nbsp;1).
+ */
+ EXPONENTIAL
+ };
+
+ /** The graph under examination. */
+ private final ImmutableGraph graph;
+ /** The number of nodes. */
+ private final int nn;
+ /** The global progress logger. */
+ private final ProgressLogger pl;
+ /** The number of most central vertices to be computed. */
+ private final int k;
+ /** The number of threads. */
+ private final int threads;
+ /** The kind of centrality to be computed. */
+ private final Centrality centralityType;
+ /** The base of the exponential centrality (used only if {@code centralityType == Centrality.EXPONENTIAL}). */
+ private final double alpha;
+ /** Lower and upper bound on the number of reachable vertices. */
+ private final int reachL[], reachU[];
+ /** The degree of all vertices. */
+ private final int degs[];
+ /** List of all vertices sorted by degree. */
+ private final int sortedVertDeg[];
+ /** Number of vertices already processed. */
+ private int finishedVisits;
+ /** Position in {@link #sortedVertDeg} of the next vertex to be processed (-1 when all vertices have been handed out). */
+ private int currentV;
+ /** The <var>k</var>-th largest centrality found so far (0 until <var>k</var> centralities are known). */
+ private double kth;
+ /** The number of visited edges. */
+ private long neVis;
+ /**
+ * If <var>x</var> is one of the <var>k</var> most central vertices, {@code centrality[x]}
+ * will contain its centrality. On all other nodes, this array contains either -1 or
+ * the centrality of the node.
+ */
+ public final double centrality[];
+ /** The <var>k</var> most central vertices, from the most central to the least central. */
+ public int topK[];
+
+ /** A priority queue containing the <var>k</var> most central vertices found so far; its head is the least central of them. */
+ private IntHeapPriorityQueue topKQueue = new IntHeapPriorityQueue(new IntComparator() {
+ @Override
+ public int compare(int x, int y) {
+ return (int) Math.signum(centrality[x] - centrality[y]);
+ }
+ });
+
+ /**
+ * Creates a new instance to compute the <var>k</var> most central vertices according
+ * to {@linkplain Centrality#LIN positive Lin's centrality}, logging every 10 seconds.
+ *
+ * @param g
+ * the input graph.
+ * @param k
+ * the number of vertices to be output.
+ * @param threads
+ * the number of threads, or 0 for {@link Runtime#availableProcessors()}.
+ * @return the new instance.
+ */
+ public static TopKGeometricCentrality newLinCentrality(final ImmutableGraph g, final int k, final int threads) throws IllegalArgumentException {
+ return new TopKGeometricCentrality(g, k, Centrality.LIN, threads, 0.5, new ProgressLogger());
+ }
+
+ /**
+ * Creates a new instance to compute the <var>k</var> most central vertices according
+ * to {@linkplain Centrality#HARMONIC positive harmonic centrality}, logging every 10 seconds.
+ *
+ * @param g
+ * the input graph.
+ * @param k
+ * the number of vertices to be output.
+ * @param threads
+ * the number of threads, or 0 for {@link Runtime#availableProcessors()}.
+ * @return the new instance.
+ */
+ public static TopKGeometricCentrality newHarmonicCentrality(final ImmutableGraph g, final int k, final int threads) {
+ return new TopKGeometricCentrality(g, k, Centrality.HARMONIC, threads, 0.5, new ProgressLogger());
+ }
+
+ /**
+ * Creates a new instance to compute the <var>k</var> most central vertices according
+ * to {@linkplain Centrality#EXPONENTIAL positive exponential centrality}, logging every 10 seconds.
+ *
+ * @param g
+ * the input graph.
+ * @param k
+ * the number of vertices to be output.
+ * @param threads
+ * the number of threads, or 0 for {@link Runtime#availableProcessors()}.
+ * @param alpha
+ * the base used for the exponential centrality.
+ * @return the new instance.
+ */
+ public static TopKGeometricCentrality newExponentialCentrality(final ImmutableGraph g, final int k, final double alpha, final int threads) {
+ return new TopKGeometricCentrality(g, k, Centrality.EXPONENTIAL, threads, alpha, new ProgressLogger());
+ }
+
+ /**
+ * Creates a new instance.
+ *
+ * @param g
+ * the input graph.
+ * @param k
+ * the number of vertices required.
+ * @param centralityType
+ * the type of centrality.
+ * @param threads
+ * the number of threads, or 0 for {@link Runtime#availableProcessors()}.
+ * @param alpha
+ * the base of the exponential centrality (used only if {@code centralityType} is {@link Centrality#EXPONENTIAL}).
+ * @param pl
+ * a progress logger, or {@code null}.
+ */
+ public TopKGeometricCentrality(final ImmutableGraph g, final int k, final Centrality centralityType,
+ final int threads, final double alpha, final ProgressLogger pl) {
+ this.alpha = alpha;
+ this.centralityType = centralityType;
+ this.neVis = 0;
+ this.finishedVisits = 0;
+ this.kth = 0;
+ this.pl = pl;
+ graph = g;
+ nn = graph.numNodes();
+ this.k = Math.min(k, nn);
+ centrality = new double[nn];
+ reachL = new int[nn];
+ reachU = new int[nn];
+
+ if (centralityType == Centrality.EXPONENTIAL && (alpha <= 0 || alpha >= 1))
+ throw new IllegalArgumentException("The value alpha must be strictly between 0 and 1.");
+
+ if (k <= 0) throw new IllegalArgumentException("k must be positive.");
+ if (threads < 0) throw new IllegalArgumentException("The number of threads must not be negative.");
+ else if (threads == 0) this.threads = Runtime.getRuntime().availableProcessors();
+ else this.threads = threads;
+
+ LOGGER.debug("Nodes: " + nn);
+ LOGGER.debug("Arcs: " + graph.numArcs());
+
+ computeReach();
+ degs = new int[nn];
+
+ for (int v = 0; v < nn; v++)
+ degs[v] = graph.outdegree(v);
+
+ sortedVertDeg = countingSort(degs);
+ currentV = nn - 1;
+ }
+
+ /**
+ * Uses counting sort to compute the permutation of {0,...,values.length-1} that sorts
+ * the indices by their values.
+ *
+ * @param values
+ * an array containing in position i the value of vertex i (which
+ * must be non-negative).
+ * @return a permutation sorted[] of {0,...,values.length-1}, such that
+ * values[sorted[i]] is non-decreasing with respect to i.
+ */
+ private static int[] countingSort(final int[] values) {
+ int[] sorted = new int[values.length];
+ // Size the counters on the largest value: an outdegree can reach values.length
+ // (a node with a self-loop and an arc to every other node).
+ int maxValue = 0;
+ for (int i : values)
+ maxValue = Math.max(maxValue, i);
+ int numValues[] = new int[maxValue + 2];
+ for (int i : values)
+ numValues[i + 1]++;
+
+ for (int i = 1; i < numValues.length; i++)
+ numValues[i] += numValues[i - 1];
+
+ for (int i = 0; i < values.length; i++)
+ sorted[numValues[values[i]]++] = i;
+
+ return sorted;
+ }
+
+ /**
+ * Computes a lower and an upper bound on the number of vertices reachable
+ * from each vertex v.
+ */
+ private void computeReach() {
+ StronglyConnectedComponents scc = StronglyConnectedComponents.compute(graph, false, pl);
+ int nscc = scc.numberOfComponents;
+ int sortedVerts[];
+ int v, vComp, w, wComp, i, maxSCC = 0;
+ int sccSizes[] = new int[nscc];
+ boolean visited[] = new boolean[nscc];
+ boolean reachMaxSCC[] = new boolean[nscc];
+ long lReachSCC[] = new long[nscc];
+ long uReachSCC[] = new long[nscc];
+ long uReachSCCWithoutMax[] = new long[nscc];
+ LazyIntIterator iter;
+
+ LOGGER.debug("There are " + nscc + " strongly connected components.");
+ IntArrayList sccGraph[] = new IntArrayList[nscc];
+
+ for (i = 0; i < nscc; i++)
+ sccGraph[i] = new IntArrayList();
+
+ sortedVerts = countingSort(scc.component);
+ i = 0;
+ v = sortedVerts[i++];
+ for (int contSCC = 0; contSCC < nscc; contSCC++) {
+ while (scc.component[v] == contSCC) {
+ sccSizes[contSCC]++;
+ iter = graph.successors(v);
+ while ((w = iter.nextInt()) != -1) {
+ wComp = scc.component[w];
+ if (!visited[wComp] && contSCC != wComp) {
+ visited[wComp] = true;
+ sccGraph[contSCC].add(wComp);
+ }
+ }
+ if (i >= nn) break;
+ v = sortedVerts[i++];
+ }
+ for (int xComp : sccGraph[contSCC])
+ visited[xComp] = false;
+
+ if (sccSizes[contSCC] > sccSizes[maxSCC])
+ maxSCC = contSCC;
+ }
+
+ // BFS from maxSCC to compute reachL[maxSCC], reachU[maxSCC] exactly
+ int[] queue = new int[nscc];
+ int startQ = 0, endQ = 0;
+ queue[endQ++] = maxSCC;
+ visited[maxSCC] = true;
+ while (startQ < endQ) {
+ wComp = queue[startQ++];
+ lReachSCC[maxSCC] += sccSizes[wComp];
+ for (int xComp : sccGraph[wComp]) {
+ if (!visited[xComp]) {
+ visited[xComp] = true;
+ queue[endQ++] = xComp;
+ }
+ }
+ }
+ uReachSCC[maxSCC] = lReachSCC[maxSCC];
+ reachMaxSCC[maxSCC] = true;
+
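+ // Dynamic programming over the DAG of strongly connected components to compute bounds on the
+ // number of reachable vertices. Components are processed in increasing index order, so this
+ // relies on the successors of a component having smaller indices (reverse topological order).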
+ for (vComp = 0; vComp < nscc; vComp++) {
+ if (vComp != maxSCC) {
+ for (int xComp : sccGraph[vComp]) {
+ lReachSCC[vComp] = Math.max(lReachSCC[vComp], lReachSCC[xComp]);
+ if (!visited[xComp]) uReachSCCWithoutMax[vComp] += uReachSCCWithoutMax[xComp];
+ uReachSCC[vComp] += uReachSCC[xComp];
+ uReachSCC[vComp] = Math.min(uReachSCC[vComp], nn);
+ reachMaxSCC[vComp] = reachMaxSCC[vComp] || reachMaxSCC[xComp];
+ }
+ lReachSCC[vComp] += sccSizes[vComp];
+ uReachSCC[vComp] += sccSizes[vComp];
+ if (!visited[vComp]) uReachSCCWithoutMax[vComp] += sccSizes[vComp];
+ if (reachMaxSCC[vComp]) uReachSCC[vComp] = uReachSCC[maxSCC] + uReachSCCWithoutMax[vComp];
+ uReachSCC[vComp] = Math.min(uReachSCC[vComp], nn);
+ }
+ }
+
+ // Store all results obtained in reachL, reachU.
+ for (v = 0; v < nn; v++) {
+ vComp = scc.component[v];
+ reachL[v] = (int) Math.min(lReachSCC[vComp], nn);
+ reachU[v] = (int) Math.min(uReachSCC[vComp], nn);
+ }
+ }
+
+ /**
+ * Checks that the bounds reachL and reachU are correct.
+ */
+ void checkReachLU() {
+ int queue[] = new int[nn];
+ for (int v = 0; v < nn; v++) {
+ int startQ = 0, endQ = 0, x, y;
+ LazyIntIterator iter;
+ boolean visited[] = new boolean[nn];
+ visited[v] = true;
+ queue[endQ++] = v;
+
+ while (startQ < endQ) {
+ x = queue[startQ++];
+ iter = this.graph.successors(x);
+
+ while ((y = iter.nextInt()) != -1) {
+ if (!visited[y]) {
+ visited[y] = true;
+ queue[endQ++] = y;
+ }
+ }
+ }
+ assert reachL[v] <= startQ;
+ assert reachU[v] >= startQ;
+ }
+ }
+
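+ /**
+ * Returns the next vertex to be analyzed; vertices are handed out to the threads
+ * in non-increasing order of outdegree.
+ *
+ * @return a vertex, or -1 if all vertices have already been handed out.
+ */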
+ private synchronized int nextVert() {
+ if (currentV >= 0) {
+ return this.sortedVertDeg[currentV--];
+ }
+ return -1;
+ }
+
+ /**
+ * Updates the top-<var>k</var> queue and the statistics when a BFS terminates, and, if
+ * debugging is enabled, logs some data.
+ *
+ * @param v
+ * the starting vertex
+ * @param centrality
+ * the centrality of v, or -1 if v is not in the top-k.
+ * @param neVis
+ * the number of arcs visited by the BFS.
+ */
+ private synchronized void endBFS(final int v, final double centrality, final int neVis) {
+ this.neVis += neVis;
+ if (pl != null) pl.update();
+ this.centrality[v] = centrality;
+ if (centrality >= 0) {
+ topKQueue.enqueue(v);
+ if (topKQueue.size() > k) topKQueue.dequeueInt();
+ if (topKQueue.size() == k) kth = this.centrality[topKQueue.firstInt()];
+ }
+
+ if (DEBUG) {
+ LOGGER.debug("Finished visit " + ++finishedVisits + "/" + nn + ":");
+ LOGGER.debug("Vertex: " + v + " (degree: " + degs[v] + ")");
+ LOGGER.debug("Current " + k + "-th centrality: " + kth);
+ LOGGER.debug("Current vertex centrality: " + centrality);
+ LOGGER.debug("Current improvement (approx): " + (((double) graph.numArcs()) * finishedVisits / neVis));
+ }
+ }
+
+ /**
+ * Computes the top-<var>k</var> geometric centralities.
+ */
+ public void compute() {
+ if (pl != null) {
+ pl.start("Starting visits...");
+ pl.itemsName = "nodes";
+ pl.displayLocalSpeed = true;
+ }
+
+ final GeometricCentralityThread[] threads = new GeometricCentralityThread[this.threads];
+
+ for (int i = 0; i < this.threads; i++) {
+ threads[i] = new GeometricCentralityThread();
+ threads[i].start();
+ }
+ for (int i = 0; i < this.threads; i++) {
+ try {
+ threads[i].join();
+ } catch (InterruptedException e) {
+ e.printStackTrace();
+ }
+ }
+ topK = new int[this.topKQueue.size()];
+
+ for (int i = this.topKQueue.size()-1; i >= 0; i--)
+ topK[i] = this.topKQueue.dequeueInt();
+
+ if (pl != null) pl.done();
+ }
+
+
+
+ public static void main(String args[]) throws JSAPException, IOException {
+
+ SimpleJSAP jsap = new SimpleJSAP(TopKGeometricCentrality.class.getName(), "Computes top-k central vertices according to different positive geometric centrality measures. Outputs a file with extension .nodes containing the top k nodes (most central nodes first), and a file with extension .values containing the corresponding centralities.\nPlease note that to compute negative centralities on directed graphs (which is usually what you want) you have to compute positive centralities on the transpose.",
+ new Parameter[] {
+ new Switch("expand", 'e', "expand", "Expand the graph to increase speed (no compression)."),
+ new Switch("text", 't', "text", "If set, two human-readable text files are produced; otherwise, two binary files containing nodes and centralities."),
+ new FlaggedOption("k", JSAP.INTSIZE_PARSER, "1", JSAP.NOT_REQUIRED, 'k', "k", "The number of vertices to be output"),
+ new FlaggedOption("centrality", EnumStringParser.getParser(Centrality.class, true), Centrality.HARMONIC.name(), JSAP.REQUIRED, 'c', "centrality", Arrays.toString(Centrality.values())),
+ new FlaggedOption("threads", JSAP.INTSIZE_PARSER, "0", JSAP.NOT_REQUIRED, 'T', "threads", "The number of threads to be used. If 0, the number of available processors is used."),
+ new FlaggedOption("logInterval", JSAP.LONG_PARSER, Long.toString(ProgressLogger.DEFAULT_LOG_INTERVAL), JSAP.NOT_REQUIRED, 'l', "log-interval", "The minimum time interval between activity logs in milliseconds."),
+ new FlaggedOption("alpha", JSAP.DOUBLE_PARSER, "0.5", JSAP.NOT_REQUIRED, 'a', "alpha", "The value of alpha for exponential centrality (ignored otherwise)."),
+ new UnflaggedOption("graphBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the graph."),
+ new UnflaggedOption("outputBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The output basename."),
+ });
+
+ JSAPResult jsapResult = jsap.parse(args);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final String basename = jsapResult.getString("graphBasename");
+ final int k = jsapResult.getInt("k");
+ final String outputBasename = jsapResult.getString("outputBasename");
+ final Centrality centrality = Enum.valueOf(Centrality.class, jsapResult.getObject("centrality").toString().toUpperCase());
+ int threads = jsapResult.getInt("threads");
+ final long logInterval = jsapResult.getLong("logInterval");
+ final double alpha = jsapResult.getDouble("alpha");
+
+ if (centrality == Centrality.EXPONENTIAL && (alpha <= 0 || alpha >= 1))
+ throw new IllegalArgumentException("The value alpha must be strictly between 0 and 1.");
+
+ if (threads == 0) threads = Runtime.getRuntime().availableProcessors();
+
+ TopKGeometricCentrality c;
+ ImmutableGraph g = ImmutableGraph.load(basename);
+ if (jsapResult.userSpecified("expand")) g = new ArrayListMutableGraph(g).immutableView();
+
+ final ProgressLogger pl = new ProgressLogger(LOGGER, logInterval, TimeUnit.MILLISECONDS, "nodes");
+
+ c = new TopKGeometricCentrality(g, k, centrality, threads, alpha, pl);
+ c.compute();
+
+ if (jsapResult.getBoolean("text")) {
+ // try-with-resources closes both streams even if writing fails.
+ try (PrintStream outputNodes = new PrintStream(outputBasename + ".nodes");
+ PrintStream outputValues = new PrintStream(outputBasename + ".values")) {
+ for (int v : c.topK) {
+ outputNodes.println(v);
+ outputValues.println(c.centrality[v]);
+ }
+ }
+ } else {
+ try (DataOutputStream outputNodes = new DataOutputStream(new FileOutputStream(outputBasename + ".nodes"));
+ DataOutputStream outputValues = new DataOutputStream(new FileOutputStream(outputBasename + ".values"))) {
+ for (int v : c.topK) {
+ outputNodes.writeInt(v);
+ outputValues.writeDouble(c.centrality[v]);
+ }
+ }
+ }
+
+ LOGGER.info("\nFinal improvement: " + ((double) c.nn) * c.graph.numArcs() / c.neVis);
+ }
+}
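The following sequential Java sketch shows the pruning rule that `BFSCut()` applies to harmonic centrality, in isolation. It is not the class above. It simplifies in several ways:
- It uses plain `int[][]` successor lists and a single thread.
- It uses the trivial bound `n` instead of `reachU`.
- It applies only the per-level bound and omits the per-arc refinement used when an arc hits an already visited node.

When the visit starts expanding level L, all vertices at distance at most L are known. At most `gamma` vertices (the sum of the outdegrees of the level-L vertices) lie at distance L+1, and every other vertex is at distance at least L+2. If the resulting upper bound does not exceed the current k-th largest value, the visit is cut. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

/** Hypothetical, sequential sketch of top-k harmonic centrality with cut BFSes. */
class TopKHarmonicSketch {

    /**
     * BFS from v over successor lists; returns the harmonic centrality of v, or -1 as soon as
     * an upper bound on it is at most kth (so v cannot improve the current top k).
     */
    static double bfsCut(int[][] succ, int v, double kth) {
        final int n = succ.length;
        final int[] dist = new int[n];
        Arrays.fill(dist, -1);
        final int[] queue = new int[n];
        int head = 0, tail = 0;
        dist[v] = 0;
        queue[tail++] = v;
        double sum = 0;             // sum of 1/d(v, y) over the vertices y != v found so far
        int level = -1;             // distance of the level being expanded
        int gamma = succ[v].length; // sum of the outdegrees of the next level to expand
        while (head < tail) {
            final int x = queue[head++];
            if (dist[x] > level) {
                level = dist[x];
                // The tail vertices at distance <= level are known: at most gamma of the others
                // are at distance level + 1, the remaining ones at distance >= level + 2.
                final double bound = sum + (double) gamma / (level + 1) + (double) (n - gamma - tail) / (level + 2);
                if (bound <= kth) return -1;
                gamma = 0;
            }
            for (int y : succ[x]) {
                if (dist[y] == -1) {
                    dist[y] = dist[x] + 1;
                    sum += 1.0 / dist[y];
                    gamma += succ[y].length;
                    queue[tail++] = y;
                }
            }
        }
        return sum;
    }

    /** Returns the k largest harmonic centralities (largest first), visiting vertices by non-increasing outdegree. */
    static double[] topK(int[][] succ, int k, boolean prune) {
        final int n = succ.length;
        final Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Integer.compare(succ[b].length, succ[a].length));
        final PriorityQueue<Double> best = new PriorityQueue<>(); // min-heap of the k best values so far
        for (int v : order) {
            final double kth = prune && best.size() == k ? best.peek() : Double.NEGATIVE_INFINITY;
            final double c = bfsCut(succ, v, kth);
            if (c < 0) continue; // cut: v cannot enter the top k
            best.add(c);
            if (best.size() > k) best.poll();
        }
        final double[] result = new double[best.size()];
        for (int i = result.length - 1; i >= 0; i--) result[i] = best.poll();
        return result;
    }

    public static void main(String[] args) {
        // Made-up directed graph: succ[x] lists the successors of x.
        int[][] succ = { { 1, 2, 3 }, { 2, 4 }, { 0, 5 }, { 4 }, { 5 }, { 0 }, {} };
        System.out.println("pruned: " + Arrays.toString(topK(succ, 2, true)));
        System.out.println("exact:  " + Arrays.toString(topK(succ, 2, false)));
    }
}
```

On this toy graph both lines should report the same two values: 4 for vertex 0, then 10/3, which vertices 1 and 2 share. With pruning, the visits from vertices 3, 4, 5 and 6 are cut after at most two levels. A vertex is cut only when its centrality cannot exceed the current k-th value, so the multiset of top-k values is preserved. However, a vertex that ties with the k-th value may be dropped.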
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/package.html b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/package.html
new file mode 100644
index 0000000..8c07b3e
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/algo/package.html
@@ -0,0 +1,10 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html>
+ <head>
+ <title>Webgraph</title>
+ </head>
+
+ <body>
+ <P>Classes implementing useful algorithms on graphs.
+ </body>
+</html>
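The dynamic programming in `computeReach()` of `TopKGeometricCentrality`, earlier in this diff, can be illustrated on its own. The Java sketch below is a simplification with hypothetical inputs: component sizes and successor lists of a condensation DAG. It omits the refinement that computes the reach of the largest component exactly. Like the loop in `computeReach()`, it assumes that components are numbered in reverse topological order, so every DAG arc goes to a component with a smaller index.

The two bounds work as follows:
- **Lower bound:** it takes the best single successor, because everything reachable from that successor is certainly reachable.
- **Upper bound:** it adds up all successors, which may count shared descendants more than once, and is capped at n.

```java
import java.util.Arrays;

/** Hypothetical sketch of reachability bounds computed over a condensation DAG. */
class ReachBounds {

    /**
     * Returns {lower, upper}: bounds, indexed by component, on the number of vertices reachable
     * from any vertex of that component (the component itself included).
     *
     * @param sizes   the number of vertices in each component
     * @param dagSucc the successor components of each component, all with smaller indices
     * @param n       the number of vertices of the graph
     */
    static long[][] bounds(int[] sizes, int[][] dagSucc, long n) {
        final int c = sizes.length;
        final long[] lower = new long[c], upper = new long[c];
        for (int comp = 0; comp < c; comp++) {
            for (int s : dagSucc[comp]) {
                if (s >= comp) throw new IllegalArgumentException("components must be in reverse topological order");
                lower[comp] = Math.max(lower[comp], lower[s]); // everything reachable from s is reachable
                upper[comp] += upper[s];                       // may count shared descendants twice
            }
            lower[comp] += sizes[comp];
            upper[comp] = Math.min(upper[comp] + sizes[comp], n);
        }
        return new long[][] { lower, upper };
    }

    public static void main(String[] args) {
        // Made-up diamond: 3 -> {1, 2}, 1 -> 0, 2 -> 0; component sizes 2, 1, 3, 1 (n = 7).
        int[] sizes = { 2, 1, 3, 1 };
        int[][] dagSucc = { {}, { 0 }, { 0 }, { 1, 2 } };
        long[][] b = bounds(sizes, dagSucc, 7);
        System.out.println("lower: " + Arrays.toString(b[0]));
        System.out.println("upper: " + Arrays.toString(b[1]));
    }
}
```

In the diamond example, component 3 actually reaches all 7 vertices. Its lower bound stops at 6, and its upper bound is exact only because the uncapped sum (9) is cut down to n.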
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/examples/BreadthFirst.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/examples/BreadthFirst.java
new file mode 100644
index 0000000..cd16b33
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/examples/BreadthFirst.java
@@ -0,0 +1,125 @@
+package it.unimi.dsi.webgraph.examples;
+
+
+import java.io.IOException;
+import java.lang.reflect.InvocationTargetException;
+import java.util.Arrays;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/*
+ * Copyright (C) 2003-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import it.unimi.dsi.fastutil.ints.IntArrayFIFOQueue;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.GraphClassParser;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.LazyIntIterator;
+
+/** The main method of this class loads an arbitrary {@link it.unimi.dsi.webgraph.ImmutableGraph}
+ * and performs a breadth-first visit of the graph (optionally starting just from a given node,
+ * in which case it prints the eccentricity of the node, i.e., the maximum distance from the node to any node reachable from it).
+ */
+
+public class BreadthFirst {
+
+ private BreadthFirst() {}
+
+ static public void main(String arg[]) throws IllegalArgumentException, SecurityException, IllegalAccessException, InvocationTargetException, NoSuchMethodException, JSAPException, IOException {
+ final SimpleJSAP jsap = new SimpleJSAP(BreadthFirst.class.getName(), "Visits a graph in breadth-first fashion, possibly starting just from a given node.",
+ new Parameter[] {
+ new FlaggedOption("graphClass", GraphClassParser.getParser(), null, JSAP.NOT_REQUIRED, 'g', "graph-class", "Forces a Java class for the source graph."),
+ new FlaggedOption("logInterval", JSAP.LONG_PARSER, Long.toString(ProgressLogger.DEFAULT_LOG_INTERVAL), JSAP.NOT_REQUIRED, 'l', "log-interval", "The minimum time interval between activity logs in milliseconds."),
+ new FlaggedOption("start", JSAP.INTEGER_PARSER, Integer.toString(-1), JSAP.NOT_REQUIRED, 's', "start", "The starting node; if missing or -1, the visit will be complete."),
+ new FlaggedOption("maxDist", JSAP.INTEGER_PARSER, Integer.toString(Integer.MAX_VALUE), JSAP.NOT_REQUIRED, 'm', "maxDist", "Maximum distance (nodes at a larger distance from the root are not enqueued)."),
+ new Switch("print", 'p', "print", "Print nodes as they are enqueued. If set, ordinary output is suppressed."),
+ new UnflaggedOption("basename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the graph."),
+ }
+ );
+
+ final JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final ProgressLogger pl = new ProgressLogger();
+ pl.logInterval = jsapResult.getLong("logInterval");
+ final String basename = jsapResult.getString("basename");
+ final ImmutableGraph graph;
+ if (jsapResult.userSpecified("graphClass")) graph = (ImmutableGraph)(jsapResult.getClass("graphClass")).getMethod("load", CharSequence.class, ProgressLogger.class).invoke(null, basename, pl);
+ else graph = ImmutableGraph.load(basename, pl);
+
+ final int maxDist = jsapResult.getInt("maxDist");
+ final boolean print = jsapResult.getBoolean("print");
+ // We parse the starting node.
+ final int start = jsapResult.getInt("start");
+ final IntArrayFIFOQueue queue = new IntArrayFIFOQueue();
+ final int n = graph.numNodes();
+ final int[] dist = new int[n];
+
+ Arrays.fill(dist, Integer.MAX_VALUE); // Initially, all distances are infinity.
+ final int lo = start == -1 ? 0 : start;
+ final int hi = start == -1 ? n : start + 1;
+
+ int curr = lo, succ, ecc = 0, reachable = 0;
+
+ pl.start("Starting visit...");
+ pl.expectedUpdates = hi - lo;
+ pl.itemsName = "nodes";
+
+ for(int i = lo; i < hi; i++) {
+ if (dist[i] == Integer.MAX_VALUE) { // Not already visited
+ queue.enqueue(i);
+ if (print) System.out.println(i);
+ dist[i] = 0;
+
+ LazyIntIterator successors;
+
+ while(! queue.isEmpty()) {
+ curr = queue.dequeueInt();
+ successors = graph.successors(curr);
+ int d = graph.outdegree(curr);
+ while(d-- != 0) {
+ succ = successors.nextInt();
+ if (dist[succ] == Integer.MAX_VALUE && dist[curr] + 1 <= maxDist) {
+ reachable++;
+ dist[succ] = dist[curr] + 1;
+ ecc = Math.max(ecc, dist[succ]);
+ queue.enqueue(succ);
+ if (print) System.out.println(succ);
+ }
+ }
+ }
+ }
+ pl.update();
+ }
+ pl.done();
+
+ if (!print) {
+ if (start == -1) System.out.println("The maximum depth of a tree in the breadth-first spanning forest is " + ecc);
+ else System.out.println("The eccentricity of node " + start + " is " + ecc + " (" + reachable + " reachable nodes)");
+ }
+ }
+}
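The visit performed by BreadthFirst can be illustrated without WebGraph. The following is a minimal, dependency-free sketch, assuming a graph given as a plain int[][] successor array instead of an ImmutableGraph: distances start at Integer.MAX_VALUE ("infinity"), a FIFO queue drives the visit, nodes beyond maxDist are not enqueued, and the eccentricity of the start node is the largest finite distance. The class and method names are hypothetical and are not part of WebGraph.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/** Hypothetical, dependency-free sketch of the visit performed by BreadthFirst. */
public class BfsEccentricitySketch {
    /** BFS distances from start, not enqueuing nodes farther than maxDist; unreached nodes keep Integer.MAX_VALUE. */
    static int[] distances(final int[][] successors, final int start, final int maxDist) {
        final int[] dist = new int[successors.length];
        Arrays.fill(dist, Integer.MAX_VALUE); // initially, all distances are "infinity"
        final ArrayDeque<Integer> queue = new ArrayDeque<>();
        dist[start] = 0;
        queue.add(start);
        while (!queue.isEmpty()) {
            final int curr = queue.poll();
            for (final int succ : successors[curr]) {
                // Enqueue succ the first time it is seen, unless it would exceed maxDist.
                if (dist[succ] == Integer.MAX_VALUE && dist[curr] + 1 <= maxDist) {
                    dist[succ] = dist[curr] + 1;
                    queue.add(succ);
                }
            }
        }
        return dist;
    }

    /** Eccentricity of start: the largest finite distance from it. */
    static int eccentricity(final int[][] successors, final int start, final int maxDist) {
        int ecc = 0;
        for (final int d : distances(successors, start, maxDist)) {
            if (d != Integer.MAX_VALUE) ecc = Math.max(ecc, d);
        }
        return ecc;
    }

    public static void main(final String[] args) {
        // Arcs 0->1, 0->2, 1->2, 2->3; node 4 is isolated.
        final int[][] g = { { 1, 2 }, { 2 }, { 3 }, {}, {} };
        System.out.println(Arrays.toString(distances(g, 0, Integer.MAX_VALUE))); // [0, 1, 1, 2, 2147483647]
        System.out.println(eccentricity(g, 0, Integer.MAX_VALUE)); // 2
        System.out.println(eccentricity(g, 0, 1)); // 1
    }
}
```

As in BreadthFirst, a node receives its distance when it is first enqueued, so each node is enqueued at most once and the visit takes time linear in the number of arcs explored.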
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/examples/ErdosRenyiGraph.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/examples/ErdosRenyiGraph.java
new file mode 100644
index 0000000..691d828
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/examples/ErdosRenyiGraph.java
@@ -0,0 +1,335 @@
+package it.unimi.dsi.webgraph.examples;
+
+/*
+ * Copyright (C) 2010-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import java.io.IOException;
+
+import org.apache.commons.math3.distribution.BinomialDistribution;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+import it.unimi.dsi.Util;
+import it.unimi.dsi.fastutil.ints.IntArrays;
+import it.unimi.dsi.fastutil.ints.IntOpenHashSet;
+import it.unimi.dsi.lang.ObjectParser;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.util.XoRoShiRo128PlusRandomGenerator;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.BVGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.ImmutableSequentialGraph;
+import it.unimi.dsi.webgraph.NodeIterator;
+
+/** An Erd&#x151;s&ndash;R&eacute;nyi random graph: the number of nodes
+ * is fixed, and there is a fixed probability that an arc is put
+ * between any two nodes (independently for every ordered pair; loops only if requested).
+ *
+ * <p>Note that an instance of this class is not {@linkplain ImmutableGraph#randomAccess() random-access}:
+ * you can, however, {@linkplain ArrayListMutableGraph#ArrayListMutableGraph(ImmutableGraph) make a mutable copy of the returned graph}
+ * and then {@linkplain ArrayListMutableGraph#immutableView() take its immutable view}.
+ *
+ * <p><strong>Warning</strong>: From version 3.5.2, this class uses {@link BinomialDistribution}
+ * instead of the previous COLT-based {@code Binomial} class. As a side effect, graphs generated
+ * with the same parameters and seed differ from those generated by earlier versions.
+ */
+public class ErdosRenyiGraph extends ImmutableSequentialGraph {
+
+ private final static Logger LOGGER = LoggerFactory.getLogger(ErdosRenyiGraph.class);
+
+ /** Number of nodes. */
+ private final int n;
+ /** Probability to put an arc between each pair of nodes. */
+ private final double p;
+ /** Whether loops should also be generated. */
+ private final boolean loops;
+ /** The random seed. */
+ private final long seed;
+
+ /** Creates an Erd&#x151;s&ndash;R&eacute;nyi graph with given parameters and random seed.
+ *
+ * @param n the number of nodes.
+ * @param p the probability of generating an arc.
+ * @param seed a seed for pseudorandom number generation.
+ * @param loops whether loops are allowed or not.
+ */
+ public ErdosRenyiGraph(final int n, final double p, final long seed, final boolean loops) {
+ this.n = n;
+ this.p = p;
+ this.loops = loops;
+ this.seed = seed;
+ }
+
+ /** Creates an Erd&#x151;s&ndash;R&eacute;nyi graph with given parameters.
+ *
+ * @param n the number of nodes.
+ * @param p the probability of generating an arc.
+ * @param loops whether loops are allowed or not.
+ */
+ public ErdosRenyiGraph(final int n, final double p, final boolean loops) {
+ this(n, p, Util.randomSeed(), loops);
+ }
+
+ /** Creates an Erd&#x151;s&ndash;R&eacute;nyi graph with given parameters and no loops.
+ *
+ * @param n the number of nodes.
+ * @param p the probability of generating an arc.
+ */
+ public ErdosRenyiGraph(final int n, final double p) {
+ this(n, p, false);
+ }
+
+ /** Creates an Erd&#x151;s&ndash;R&eacute;nyi graph with given parameters and random seed.
+ *
+ * <p>This constructor can be used with an {@link ObjectParser}.
+ *
+ * @param n the number of nodes.
+ * @param p the probability of generating an arc.
+ * @param seed a seed for pseudorandom number generation.
+ * @param loops whether loops are allowed or not.
+ */
+ public ErdosRenyiGraph(final String n, final String p, final String seed, final String loops) {
+ this(Integer.parseInt(n), Double.parseDouble(p), Long.parseLong(seed), Boolean.parseBoolean(loops));
+ }
+
+ /** Creates an Erd&#x151;s&ndash;R&eacute;nyi graph with given parameters and no loops.
+ *
+ * <p>This constructor can be used with an {@link ObjectParser}.
+ *
+ * @param n the number of nodes.
+ * @param p the probability of generating an arc.
+ */
+ public ErdosRenyiGraph(final String n, final String p) {
+ this(Integer.parseInt(n), Double.parseDouble(p));
+ }
+
+ /** Creates an Erd&#x151;s&ndash;R&eacute;nyi graph with given parameters.
+ *
+ * <p>This constructor can be used with an {@link ObjectParser}.
+ *
+ * @param n the number of nodes.
+ * @param p the probability of generating an arc.
+ * @param loops whether loops are allowed or not.
+ */
+ public ErdosRenyiGraph(final String n, final String p, final String loops) {
+ this(Integer.parseInt(n), Double.parseDouble(p), Boolean.parseBoolean(loops));
+ }
+
+ /** Creates an Erd&#x151;s&ndash;R&eacute;nyi graph with given parameters and random seed.
+ *
+ * @param n the number of nodes.
+ * @param m the expected number of arcs.
+ * @param seed a seed for pseudorandom number generation.
+ * @param loops whether loops are allowed or not.
+ */
+ public ErdosRenyiGraph(final int n, final long m, final long seed, final boolean loops) {
+ this(n, (double)m / (loops? (long)n * n : (long)n * (n - 1)), seed, loops);
+ }
+
+ /** Creates an Erd&#x151;s&ndash;R&eacute;nyi graph with given parameters.
+ *
+ * @param n the number of nodes.
+ * @param m the expected number of arcs.
+ * @param loops whether loops are allowed or not.
+ */
+ public ErdosRenyiGraph(final int n, final long m, final boolean loops) {
+ this(n, m, Util.randomSeed(), loops);
+ }
+
+ @Override
+ public int numNodes() {
+ return n;
+ }
+
+ @Override
+ public ErdosRenyiGraph copy() {
+ return this;
+ }
+
+ @Override
+ public NodeIterator nodeIterator() {
+ return new NodeIterator() {
+ private final XoRoShiRo128PlusRandomGenerator random = new XoRoShiRo128PlusRandomGenerator(seed);
+
+ private final BinomialDistribution bg = new BinomialDistribution(random, n - (loops ? 0 : 1), p);
+
+ private int outdegree;
+ private int curr = -1;
+ private final IntOpenHashSet successors = new IntOpenHashSet();
+ private int[] successorArray = new int[1024];
+
+ @Override
+ public boolean hasNext() {
+ return curr < n - 1;
+ }
+
+ @Override
+ public int nextInt() {
+ curr++;
+ outdegree = bg.sample();
+ successors.clear();
+ if (! loops) successors.add(curr);
+ for(int i = 0; i < outdegree; i++) while(! successors.add(random.nextInt(n)));
+ if (! loops) successors.remove(curr);
+ successorArray = IntArrays.grow(successorArray, outdegree);
+ successors.toArray(successorArray);
+ IntArrays.quickSort(successorArray, 0, outdegree);
+ return curr;
+ }
+
+ @Override
+ public int outdegree() {
+ return outdegree;
+ }
+
+ @Override
+ public int[] successorArray() {
+ return successorArray;
+ }
+
+ @Override
+ public NodeIterator copy(int upperBound) {
+ throw new UnsupportedOperationException();
+ }
+ };
+ }
+
+ /** Generates an Erd&#x151;s&ndash;R&eacute;nyi graph with the specified seed.
+ *
+ * <p>This method exists only for backward compatibility.
+ *
+ * @param seed the seed for random generation.
+ * @return the generated graph.
+ * @deprecated An instance of this class is already an {@link ImmutableSequentialGraph}.
+ */
+ @Deprecated
+ public ImmutableSequentialGraph generate(final long seed) {
+ LOGGER.debug("Generating with probability " + p);
+
+ return new ImmutableSequentialGraph() {
+ @Override
+ public int numNodes() {
+ return n;
+ }
+
+ @Override
+ public ImmutableSequentialGraph copy() {
+ return this;
+ }
+
+ @Override
+ public NodeIterator nodeIterator() {
+ return new NodeIterator() {
+ private final XoRoShiRo128PlusRandomGenerator random = new XoRoShiRo128PlusRandomGenerator(seed);
+
+ private final BinomialDistribution bg = new BinomialDistribution(random, n - (loops ? 0 : 1), p);
+
+ private int outdegree;
+ private int curr = -1;
+ private final IntOpenHashSet successors = new IntOpenHashSet();
+ private int[] successorArray = new int[1024];
+
+ @Override
+ public boolean hasNext() {
+ return curr < n - 1;
+ }
+
+ @Override
+ public int nextInt() {
+ curr++;
+ outdegree = bg.sample();
+ successors.clear();
+ if (! loops) successors.add(curr);
+ for(int i = 0; i < outdegree; i++) while(! successors.add(random.nextInt(n)));
+ if (! loops) successors.remove(curr);
+ successorArray = IntArrays.grow(successorArray, outdegree);
+ successors.toIntArray(successorArray);
+ IntArrays.quickSort(successorArray, 0, outdegree);
+ return curr;
+ }
+
+ @Override
+ public int outdegree() {
+ return outdegree;
+ }
+
+ @Override
+ public int[] successorArray() {
+ return successorArray;
+ }
+
+ @Override
+ public NodeIterator copy(int upperBound) {
+ throw new UnsupportedOperationException();
+ }
+ };
+ }
+ };
+ }
+
+ /** Generates an Erd&#x151;s&ndash;R&eacute;nyi graph.
+ *
+ * <p>This method exists only for backward compatibility.
+ *
+ * @return the generated graph.
+ * @deprecated An instance of this class is already an {@link ImmutableSequentialGraph}.
+ */
+ @Deprecated
+ public ImmutableGraph generate() {
+ return generate(Util.randomSeed());
+ }
+
+
+ public static void main(String arg[]) throws IOException, JSAPException {
+ SimpleJSAP jsap = new SimpleJSAP(ErdosRenyiGraph.class.getName(), "Generates an Erd\u0151s-R\u00E9nyi random graph and stores it as a BVGraph.",
+ new Parameter[] {
+ new Switch("loops", 'l', "loops", "Whether the graph should include self-loops."),
+ new FlaggedOption("p", JSAP.DOUBLE_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'p', JSAP.NO_LONGFLAG, "The probability of generating an arc."),
+ new FlaggedOption("m", JSAP.LONGSIZE_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'm', JSAP.NO_LONGFLAG, "The expected number of arcs."),
+ new UnflaggedOption("basename", JSAP.STRING_PARSER, JSAP.REQUIRED, "The basename of the output graph file."),
+ new UnflaggedOption("n", JSAP.INTEGER_PARSER, JSAP.REQUIRED, "The number of nodes."),
+ });
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final String baseName = jsapResult.getString("basename");
+ final int n = jsapResult.getInt("n");
+ final boolean loops = jsapResult.getBoolean("loops");
+
+ if (jsapResult.userSpecified("p") && jsapResult.userSpecified("m")) {
+ System.err.println("Options p and m cannot be specified together");
+ System.exit(1);
+ }
+ if (! jsapResult.userSpecified("p") && ! jsapResult.userSpecified("m")) {
+ System.err.println("Exactly one of the options p and m must be specified");
+ System.exit(1);
+ }
+
+ BVGraph.store((jsapResult.userSpecified("p") ? new ErdosRenyiGraph(n, jsapResult.getDouble("p"), loops) : new ErdosRenyiGraph(n, jsapResult.getLong("m"), loops)), baseName, new ProgressLogger());
+ }
+}
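The generator above does not flip a coin for every ordered pair: each node first draws its outdegree from a binomial distribution with n - 1 trials (n when loops are allowed) and probability p, and then picks that many distinct uniform targets by rejection, pre-inserting the node itself when loops are not allowed. A minimal sketch of the same scheme follows, assuming java.util.Random in place of XoRoShiRo128PlusRandomGenerator and a naive Bernoulli-sum sampler in place of Commons Math's BinomialDistribution, so it will not reproduce WebGraph's graphs for a given seed. The conversion from an expected number of arcs m to p mirrors the (n, m, ...) constructor. All names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

/** Hypothetical sketch of the Erdos-Renyi generation scheme used by ErdosRenyiGraph. */
public class ErdosRenyiSketch {
    /** Arc probability yielding m expected arcs, as in the (n, m, ...) constructor. */
    static double probabilityForExpectedArcs(final int n, final long m, final boolean loops) {
        return (double)m / (loops ? (long)n * n : (long)n * (n - 1));
    }

    /** Naive Binomial(trials, p) sampler (an assumption): counts successful Bernoulli trials. */
    static int binomial(final Random random, final int trials, final double p) {
        int successes = 0;
        for (int i = 0; i < trials; i++) if (random.nextDouble() < p) successes++;
        return successes;
    }

    /** Sorted successors of node x: draw the outdegree, then distinct uniform targets. */
    static int[] successors(final Random random, final int n, final int x, final double p, final boolean loops) {
        final int outdegree = binomial(random, n - (loops ? 0 : 1), p);
        final Set<Integer> targets = new HashSet<>();
        if (!loops) targets.add(x); // x is pre-inserted, so it can never be drawn as a new target
        for (int i = 0; i < outdegree; i++) {
            while (!targets.add(random.nextInt(n))) {
                // Rejection sampling: the target was already present, draw again.
            }
        }
        if (!loops) targets.remove(x);
        final int[] result = targets.stream().mapToInt(Integer::intValue).toArray();
        Arrays.sort(result);
        return result;
    }

    public static void main(final String[] args) {
        final Random random = new Random(42); // fixed seed: the same graph on every run
        final int n = 10;
        final double p = probabilityForExpectedArcs(n, 30, false); // 30 / (10 * 9)
        for (int x = 0; x < n; x++) {
            System.out.println(x + " -> " + Arrays.toString(successors(random, n, x, p, false)));
        }
    }
}
```

Drawing a binomial outdegree k and then a uniformly random k-subset of the eligible targets gives each specific successor set probability p^k (1 - p)^(N - k), where N is the number of eligible targets, so the arcs are independent Bernoulli(p) events, as the class definition requires.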
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/examples/IntegerListImmutableGraph.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/examples/IntegerListImmutableGraph.java
new file mode 100644
index 0000000..516381b
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/examples/IntegerListImmutableGraph.java
@@ -0,0 +1,170 @@
+package it.unimi.dsi.webgraph.examples;
+
+/*
+ * Copyright (C) 2006-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+import it.unimi.dsi.fastutil.ints.IntArrays;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.ImmutableSequentialGraph;
+import it.unimi.dsi.webgraph.NodeIterator;
+
+import java.io.DataInputStream;
+import java.io.FileInputStream;
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.util.NoSuchElementException;
+
+
+/** Exposes a graph in a simple binary format as an (offline-only) {@link ImmutableGraph}.
+ *
+ * <P>This class is a simple example that should help in understanding how to interface
+ * WebGraph with external data. We have a graph contained in a file and represented by a list of binary
+ * 32-bit big-endian integers (as read by {@link java.io.DataInputStream#readInt()}) as follows:
+ * first we have the number of nodes, then the number of successors of node 0, then the list in increasing
+ * order of successors of node 0, then the number of successors of node 1, then the list in increasing
+ * order of successors of node 1, and so on.
+ *
+ * <P>If we want to transform this graph into, say, a {@link it.unimi.dsi.webgraph.BVGraph},
+ * we must create a class that exposes the file as an {@link it.unimi.dsi.webgraph.ImmutableGraph}
+ * and then save it using {@link it.unimi.dsi.webgraph.BVGraph#store(ImmutableGraph,CharSequence)} or by calling
+ * the main method of {@link it.unimi.dsi.webgraph.BVGraph}.
+ * A complete implementation is not necessary, as {@link it.unimi.dsi.webgraph.BVGraph} uses
+ * just {@link #nodeIterator()}. Since we are just interested in importing data, we do not
+ * implement efficient random access methods, and the only loading method we implement is {@link #loadOffline(CharSequence)}.
+ */
+
+public class IntegerListImmutableGraph extends ImmutableSequentialGraph {
+
+ /** The filename of the graph. */
+ final private String filename;
+ /** The number of nodes, read at creation time and cached. */
+ final private int numNodes;
+
+ private IntegerListImmutableGraph(final CharSequence filename) throws IOException {
+ this.filename = filename.toString();
+ final DataInputStream dis = new DataInputStream(new FileInputStream(this.filename));
+ numNodes = dis.readInt();
+ dis.close();
+ }
+
+ @Override
+ public int numNodes() {
+ return numNodes;
+ }
+
+ @Override
+ public NodeIterator nodeIterator() {
+ try {
+ return new NodeIterator() {
+ final int n = numNodes();
+ final DataInputStream dis = new DataInputStream(new FileInputStream(IntegerListImmutableGraph.this.filename));
+ int curr = - 1, outdegree;
+ int successorsArray[] = IntArrays.EMPTY_ARRAY;
+
+ {
+ try {
+ dis.readInt(); // Skip number of nodes
+ }
+ catch(IOException e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+ @Override
+ public int nextInt() {
+ if (! hasNext()) throw new NoSuchElementException();
+ try {
+ outdegree = dis.readInt();
+ // Read the successors eagerly, so that the stream stays aligned even if
+ // successorArray() is never called, or is called more than once.
+ successorsArray = IntArrays.ensureCapacity(successorsArray, outdegree, 0);
+ for(int i = 0; i < outdegree; i++) successorsArray[i] = dis.readInt();
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ return ++curr;
+ }
+
+ @Override
+ public boolean hasNext() {
+ return (curr < n - 1);
+ }
+
+ @Override
+ public int[] successorArray() {
+ if (curr == - 1) throw new IllegalStateException();
+ return successorsArray;
+ }
+
+ @Override
+ public int outdegree() {
+ if (curr == - 1) throw new IllegalStateException();
+ return outdegree;
+ }
+
+ @Override
+ protected void finalize() throws Throwable {
+ try {
+ dis.close();
+ }
+ finally {
+ super.finalize();
+ }
+ }
+
+ @Override
+ public NodeIterator copy(int upperBound) {
+ throw new UnsupportedOperationException();
+ }
+ };
+ }
+ catch (FileNotFoundException e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+ public static ImmutableGraph load(final CharSequence basename, final ProgressLogger pl) {
+ throw new UnsupportedOperationException("Graphs may be loaded offline only");
+ }
+
+ public static ImmutableGraph load(final CharSequence basename) {
+ return load(basename, (ProgressLogger)null);
+ }
+
+ @Deprecated
+ public static ImmutableGraph loadSequential(final CharSequence basename, final ProgressLogger pl) {
+ return load(basename, pl);
+ }
+
+ @Deprecated
+ public static ImmutableGraph loadSequential(final CharSequence basename) {
+ return load(basename, (ProgressLogger)null);
+ }
+
+ public static ImmutableGraph loadOffline(final CharSequence basename, final ProgressLogger pl) throws IOException {
+ return new IntegerListImmutableGraph(basename);
+ }
+
+ public static ImmutableGraph loadOffline(final CharSequence basename) throws IOException {
+ return loadOffline(basename, (ProgressLogger)null);
+ }
+}
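To produce a file that IntegerListImmutableGraph can read, one can write the layout described in its class comment with a DataOutputStream. The sketch below uses hypothetical encode/decode helpers that work on in-memory byte arrays (an assumption, to keep it self-contained; a FileOutputStream would be used for a real file). It writes the node count followed by each node's outdegree and sorted successors, and reads the result back. The reader consumes every successor list; otherwise the next outdegree would be read from the middle of a list.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

/** Hypothetical round trip through the binary format read by IntegerListImmutableGraph. */
public class IntegerListFormatSketch {
    /** Writes the number of nodes, then, for each node, its outdegree followed by its sorted successors. */
    static byte[] encode(final int[][] successors) throws IOException {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(successors.length);
            for (final int[] s : successors) {
                out.writeInt(s.length);
                for (final int t : s) out.writeInt(t); // big-endian, as DataInputStream.readInt() expects
            }
        }
        return bytes.toByteArray();
    }

    /** Reads the format back; every successor list is consumed so the stream stays aligned. */
    static int[][] decode(final byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            final int[][] successors = new int[in.readInt()][];
            for (int x = 0; x < successors.length; x++) {
                successors[x] = new int[in.readInt()];
                for (int i = 0; i < successors[x].length; i++) successors[x][i] = in.readInt();
            }
            return successors;
        }
    }

    public static void main(final String[] args) throws IOException {
        final int[][] graph = { { 1, 2 }, { 2 }, {} };
        final byte[] data = encode(graph);
        System.out.println(data.length); // 28: seven 4-byte integers (1 + (1 + 2) + (1 + 1) + (1 + 0))
        System.out.println(Arrays.deepToString(decode(data))); // [[1, 2], [2], []]
    }
}
```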
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/examples/IntegerTriplesArcLabelledImmutableGraph.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/examples/IntegerTriplesArcLabelledImmutableGraph.java
new file mode 100644
index 0000000..e01bf25
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/examples/IntegerTriplesArcLabelledImmutableGraph.java
@@ -0,0 +1,234 @@
+package it.unimi.dsi.webgraph.examples;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.objects.ObjectArrayList;
+import it.unimi.dsi.webgraph.AbstractLazyIntIterator;
+import it.unimi.dsi.webgraph.BVGraph;
+import it.unimi.dsi.webgraph.labelling.ArcLabelledImmutableGraph;
+import it.unimi.dsi.webgraph.labelling.ArcLabelledImmutableSequentialGraph;
+import it.unimi.dsi.webgraph.labelling.ArcLabelledNodeIterator;
+import it.unimi.dsi.webgraph.labelling.ArcLabelledNodeIterator.LabelledArcIterator;
+import it.unimi.dsi.webgraph.labelling.BitStreamArcLabelledImmutableGraph;
+import it.unimi.dsi.webgraph.labelling.GammaCodedIntLabel;
+import it.unimi.dsi.webgraph.labelling.Label;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.NoSuchElementException;
+
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/** A class exposing a list of triples as an {@link ArcLabelledImmutableGraph}. The triples are
+ * interpreted as labelled arcs: the first element is the source, the second element is the target,
+ * and the third element must be a nonnegative integer that will be saved using a {@link GammaCodedIntLabel}.
+ *
+ * <p>This class is mainly a useful example of how to expose your data <i>via</i> an {@link ArcLabelledImmutableGraph}, and
+ * it is also used to build test cases, but it is not efficient or particularly refined.
+ *
+ * <p>A main method reads from standard input a list of TAB-separated triples and writes the corresponding graph
+ * using {@link BVGraph} and {@link BitStreamArcLabelledImmutableGraph}.
+ */
+
+public class IntegerTriplesArcLabelledImmutableGraph extends ArcLabelledImmutableSequentialGraph {
+ /** The list of triples. */
+ final private int[][] triple;
+ /** The prototype of the labels used by this class. */
+ final private GammaCodedIntLabel prototype;
+ /** The number of nodes, computed at construction time by triple inspection. */
+ final private int n;
+
+ /** Creates a new arc-labelled immutable graph using a specified list of triples.
+ *
+ * <p>Note that it is impossible to specify isolated nodes with indices larger than
+ * the largest node with positive indegree or outdegree, as the number of nodes is computed
+ * by maximising over all indices in <code>triple</code>.
+ *
+ * @param triple a list of triples specifying labelled arcs (see the {@linkplain IntegerTriplesArcLabelledImmutableGraph class documentation});
+ * order is not relevant, but multiple arcs are not allowed.
+ */
+ public IntegerTriplesArcLabelledImmutableGraph(int[][] triple) {
+ this.triple = triple;
+ prototype = new GammaCodedIntLabel("FOO");
+ int m = 0;
+ for(int i = 0; i < triple.length; i++) m = Math.max(m, Math.max(triple[i][0], triple[i][1]));
+ Arrays.sort(triple, new Comparator<int[]>() {
+ @Override
+ public int compare(int[] p, int[] q) {
+ final int t = p[0] - q[0]; // Compare by source
+ if (t != 0) return t;
+ final int u = p[1] - q[1]; // Compare by destination
+ if (u == 0) throw new IllegalArgumentException("Duplicate arc <" + p[0] + "," + p[1] + ">");
+ return u;
+ }
+ });
+
+ n = m + 1;
+ }
+
+ @Override
+ public Label prototype() {
+ return prototype;
+ }
+
+ @Override
+ public int numNodes() {
+ return n;
+ }
+
+ @Override
+ public boolean hasCopiableIterators() {
+ return true;
+ }
+
+ @Override
+ public ArcLabelledNodeIterator nodeIterator(int from) {
+ ArcLabelledNodeIterator result = nodeIterator();
+ for (int i = 0; i < from; i++) result.nextInt();
+ return result;
+ }
+
+ @Override
+ public LabelledArcIterator successors(int from) {
+ ArcLabelledNodeIterator nodeIterator = nodeIterator(from);
+ nodeIterator.nextInt(); // nodeIterator(from) has not returned from yet: move onto it
+ return nodeIterator.successors();
+ }
+
+ private final class ArcIterator extends AbstractLazyIntIterator implements LabelledArcIterator {
+ private final int d;
+ private int k = 0; // Index of the last returned triple is pos+k
+ private final int pos;
+ private final GammaCodedIntLabel label;
+
+ private ArcIterator(int d, int pos, GammaCodedIntLabel label) {
+ this.d = d;
+ this.pos = pos;
+ this.label = label;
+ }
+
+ @Override
+ public Label label() {
+ if (k == 0) throw new IllegalStateException();
+ label.value = triple[pos + k][2];
+ return label;
+ }
+
+ @Override
+ public int nextInt() {
+ if (k >= d) return -1;
+ return triple[pos + ++k][1];
+ }
+ }
+
+ class InternalArcLabelledNodeIterator extends ArcLabelledNodeIterator {
+ /** Last node returned by this iterator. */
+ private int last = -1;
+ /** Last triple examined by this iterator. */
+ private int pos = -1;
+ /** A local copy of the prototype. */
+ private GammaCodedIntLabel label = prototype.copy();
+ /** No node &ge; this will be returned. */
+ private final int upperBound;
+
+ public InternalArcLabelledNodeIterator(final int upperBound) {
+ this.upperBound = upperBound;
+ }
+
+ @Override
+ public LabelledArcIterator successors() {
+ if (last < 0) throw new IllegalStateException();
+ final int d = outdegree(); // Triples to be returned are pos+1,pos+2,...,pos+d
+ return new ArcIterator(d, pos, label);
+ }
+
+ @Override
+ public int outdegree() {
+ if (last < 0) throw new IllegalStateException();
+ int p;
+ for (p = pos + 1; p < triple.length && triple[p][0] == last; p++);
+ return p - pos - 1;
+ }
+
+ @Override
+ public boolean hasNext() {
+ return last < Math.min(n - 1, upperBound - 1);
+ }
+
+ @Override
+ public int nextInt() {
+ if (!hasNext()) throw new NoSuchElementException();
+ if (last >= 0) pos += outdegree();
+ return ++last;
+ }
+
+ @Override
+ public ArcLabelledNodeIterator copy(final int upperBound) {
+ InternalArcLabelledNodeIterator result = new InternalArcLabelledNodeIterator(upperBound);
+ result.last = last;
+ result.pos = pos;
+ result.label = prototype.copy();
+ return result;
+ }
+
+ }
+
+ @Override
+ public ArcLabelledNodeIterator nodeIterator() {
+ return new InternalArcLabelledNodeIterator(Integer.MAX_VALUE);
+ }
+
+ public static void main(String arg[]) throws JSAPException, IOException {
+ final SimpleJSAP jsap = new SimpleJSAP(IntegerTriplesArcLabelledImmutableGraph.class.getName(),
+ "Reads from standard input a list of triples <source,dest,label>, where the three " +
+ "components are separated by a TAB, and saves the " +
+ "corresponding arc-labelled graph using a BVGraph and a BitStreamArcLabelledImmutableGraph. " +
+ "Labels are represented using GammaCodedIntLabel.",
+ new Parameter[] {
+ //new FlaggedOption("graphClass", GraphClassParser.getParser(), null, JSAP.NOT_REQUIRED, 'g', "graph-class", "Forces a Java class for the source graph."),
+ new UnflaggedOption("basename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the resulting arc-labelled graph."),
+ }
+ );
+
+ final JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+ final String basename = jsapResult.getString("basename");
+
+ // We read triples from stdin, parse them and feed them to the constructor.
+ BufferedReader br = new BufferedReader(new InputStreamReader(System.in, "ASCII"));
+ ObjectArrayList<int[]> list = new ObjectArrayList<>();
+
+ String line;
+ while((line = br.readLine()) != null) {
+ final String p[] = line.split("\t");
+ list.add(new int[] { Integer.parseInt(p[0]),Integer.parseInt(p[1]), Integer.parseInt(p[2]) });
+ }
+
+ final ArcLabelledImmutableGraph g = new IntegerTriplesArcLabelledImmutableGraph(list.toArray(new int[0][]));
+ BVGraph.store(g, basename + ArcLabelledImmutableGraph.UNDERLYINGGRAPH_SUFFIX);
+ BitStreamArcLabelledImmutableGraph.store(g, basename, basename + ArcLabelledImmutableGraph.UNDERLYINGGRAPH_SUFFIX);
+ }
+}
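The constructor of IntegerTriplesArcLabelledImmutableGraph sorts the triples by source and then by target, rejects duplicate arcs and sets the number of nodes to one plus the largest index; node iteration then walks contiguous runs of triples sharing the same source. A minimal sketch of those steps on plain arrays follows, parsing TAB-separated lines as the main method does. The class and method names are hypothetical, and, unlike the original, duplicates are detected by a scan after sorting rather than by throwing from inside the comparator, which keeps the comparator a plain ordering.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch of how the triples are organised into a labelled graph. */
public class TriplesSketch {
    /** Parses "source TAB target TAB label" lines into triples. */
    static int[][] parse(final String... lines) {
        final int[][] triples = new int[lines.length][];
        for (int i = 0; i < lines.length; i++) {
            final String[] p = lines[i].split("\t");
            triples[i] = new int[] { Integer.parseInt(p[0]), Integer.parseInt(p[1]), Integer.parseInt(p[2]) };
        }
        return triples;
    }

    /** Sorts by (source, target), rejects duplicate arcs and returns the number of nodes. */
    static int sortAndCount(final int[][] triples) {
        Arrays.sort(triples, Comparator.<int[]>comparingInt(t -> t[0]).thenComparingInt(t -> t[1]));
        int max = -1;
        for (int i = 0; i < triples.length; i++) {
            // After sorting, duplicate arcs are adjacent.
            if (i > 0 && triples[i][0] == triples[i - 1][0] && triples[i][1] == triples[i - 1][1])
                throw new IllegalArgumentException("Duplicate arc <" + triples[i][0] + "," + triples[i][1] + ">");
            max = Math.max(max, Math.max(triples[i][0], triples[i][1]));
        }
        return max + 1; // isolated nodes above the largest index cannot be represented
    }

    /** Outdegrees: after sorting, the arcs of each source form one contiguous run. */
    static int[] outdegrees(final int[][] sortedTriples, final int n) {
        final int[] d = new int[n];
        for (final int[] t : sortedTriples) d[t[0]]++;
        return d;
    }

    public static void main(final String[] args) {
        final int[][] triples = parse("2\t0\t7", "0\t2\t5", "0\t1\t3");
        final int n = sortAndCount(triples);
        System.out.println(n); // 3
        System.out.println(Arrays.deepToString(triples)); // [[0, 1, 3], [0, 2, 5], [2, 0, 7]]
        System.out.println(Arrays.toString(outdegrees(triples, n))); // [2, 0, 1]
    }
}
```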
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/examples/OutdegreeStats.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/examples/OutdegreeStats.java
new file mode 100644
index 0000000..d5460e5
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/examples/OutdegreeStats.java
@@ -0,0 +1,108 @@
+package it.unimi.dsi.webgraph.examples;
+
+/*
+ * Copyright (C) 2003-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import it.unimi.dsi.fastutil.ints.IntArrays;
+import it.unimi.dsi.fastutil.io.TextIO;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.GraphClassParser;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.NodeIterator;
+
+import java.io.IOException;
+import java.lang.reflect.InvocationTargetException;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/** The main method of this class loads an arbitrary {@link it.unimi.dsi.webgraph.ImmutableGraph}
+ * and performs a sequential scan to establish the minimum, maximum and average outdegree, and the number of nodes with each outdegree.
+ */
+
+public class OutdegreeStats {
+
+ private OutdegreeStats() {}
+
+ static public void main(String arg[]) throws IllegalArgumentException, SecurityException, IllegalAccessException, InvocationTargetException, NoSuchMethodException, JSAPException, IOException {
+ SimpleJSAP jsap = new SimpleJSAP(OutdegreeStats.class.getName(), "Prints on standard error the minimum, maximum and average outdegree of a graph, and outputs on standard output the number of nodes with each outdegree value (the first line is the number of nodes with outdegree 0).",
+ new Parameter[] {
+ new FlaggedOption("graphClass", GraphClassParser.getParser(), null, JSAP.NOT_REQUIRED, 'g', "graph-class", "Forces a Java class for the source graph."),
+ new FlaggedOption("logInterval", JSAP.LONG_PARSER, Long.toString(ProgressLogger.DEFAULT_LOG_INTERVAL), JSAP.NOT_REQUIRED, 'l', "log-interval", "The minimum time interval between activity logs in milliseconds."),
+ new UnflaggedOption("basename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the graph."),
+ }
+ );
+
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final Class<?> graphClass = jsapResult.getClass("graphClass");
+ final String basename = jsapResult.getString("basename");
+
+ final ProgressLogger pl = new ProgressLogger();
+ pl.logInterval = jsapResult.getLong("logInterval");
+ final ImmutableGraph graph;
+ // If the user specified a graph class, we invoke its static loadOffline(CharSequence) method by reflection
+ if (graphClass != null) graph = (ImmutableGraph)graphClass.getMethod("loadOffline", CharSequence.class).invoke(null, basename);
+ else graph = ImmutableGraph.loadOffline(basename, pl);
+
+ final NodeIterator nodeIterator = graph.nodeIterator();
+ int count[] = IntArrays.EMPTY_ARRAY;
+ int curr, d, maxd = 0, maxNode = 0, mind = Integer.MAX_VALUE, minNode = 0;
+ long totd = 0;
+
+ pl.expectedUpdates = graph.numNodes();
+ pl.start("Scanning...");
+
+ for(int i = graph.numNodes(); i-- != 0;) {
+ curr = nodeIterator.nextInt();
+ d = nodeIterator.outdegree();
+
+ if (d < mind) {
+ mind = d;
+ minNode = curr;
+ }
+
+ if (d > maxd){
+ maxd = d;
+ maxNode = curr;
+ }
+
+ totd += d;
+
+ if (d >= count.length) count = IntArrays.grow(count, d + 1);
+ count[d]++;
+
+ pl.lightUpdate();
+ }
+
+ pl.done();
+
+ System.err.println("The minimum outdegree is " + mind + ", attained by node " + minNode);
+ System.err.println("The maximum outdegree is " + maxd + ", attained by node " + maxNode);
+ System.err.println("The average outdegree is " + (double)totd / graph.numNodes());
+
+ TextIO.storeInts(count, 0, maxd + 1, System.out);
+ }
+}
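The single sequential pass in `OutdegreeStats.main()` can be shown without the WebGraph dependency. This is a hedged, dependency-free sketch and not WebGraph code. It assumes the graph is modelled as a hypothetical `int[][]` of successor lists instead of an `ImmutableGraph` with its `NodeIterator`. It replaces fastutil's `IntArrays.grow` with `Arrays.copyOf`, using an assumed doubling growth policy. The class name `OutdegreeStatsSketch` is made up for illustration. The pass tracks the minimum and maximum outdegree together with the node attaining each, the total degree used for the average, and a histogram that grows on demand.

```java
import java.util.Arrays;

// Hypothetical, dependency-free sketch of the OutdegreeStats scan (not WebGraph code).
// The graph is modelled as successors[x] = successor list of node x.
final class OutdegreeStatsSketch {
    int minDegree = Integer.MAX_VALUE, minNode = 0;
    int maxDegree = 0, maxNode = 0;
    long totalDegree = 0;
    int[] count = new int[0]; // count[d] = number of nodes with outdegree d

    static OutdegreeStatsSketch scan(final int[][] successors) {
        final OutdegreeStatsSketch s = new OutdegreeStatsSketch();
        for (int node = 0; node < successors.length; node++) {
            final int d = successors[node].length;
            if (d < s.minDegree) { s.minDegree = d; s.minNode = node; }
            if (d > s.maxDegree) { s.maxDegree = d; s.maxNode = node; }
            s.totalDegree += d;
            // Grow the histogram so that index d is valid (assumed policy: at least double).
            if (d >= s.count.length) s.count = Arrays.copyOf(s.count, Math.max(d + 1, 2 * s.count.length));
            s.count[d]++;
        }
        return s;
    }

    double averageDegree(final int numNodes) {
        return (double) totalDegree / numNodes;
    }

    // Histogram truncated to the maximum outdegree, like storeInts(count, 0, maxd + 1, ...).
    int[] histogram() {
        return Arrays.copyOf(count, maxDegree + 1);
    }

    public static void main(final String[] args) {
        final int[][] g = { {1, 2}, {2}, {}, {0, 1, 2} };
        final OutdegreeStatsSketch s = scan(g);
        System.out.println("min " + s.minDegree + " at " + s.minNode); // min 0 at 2
        System.out.println("max " + s.maxDegree + " at " + s.maxNode); // max 3 at 3
        System.out.println("avg " + s.averageDegree(g.length));        // avg 1.5
        System.out.println(Arrays.toString(s.histogram()));            // [1, 1, 1, 1]
    }
}
```

Growing the histogram lazily keeps memory proportional to the largest outdegree seen rather than to the number of nodes. That is also why the original prints only the first `maxd + 1` entries.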
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/examples/package.html b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/examples/package.html
new file mode 100644
index 0000000..f78c861
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/examples/package.html
@@ -0,0 +1,12 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html>
+ <head>
+ <title>WebGraph Usage Examples</title>
+ </head>
+
+ <body>
+
+ <P>Example classes showing how to perform common tasks with the WebGraph framework.
+
+ </body>
+</html>
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/jung/JungAdapter.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/jung/JungAdapter.java
new file mode 100644
index 0000000..dc49da6
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/jung/JungAdapter.java
@@ -0,0 +1,412 @@
+package it.unimi.dsi.webgraph.jung;
+
+/*
+ * Copyright (C) 2012-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.objects.AbstractObjectList;
+import it.unimi.dsi.fastutil.objects.ObjectArrayList;
+import it.unimi.dsi.fastutil.objects.ObjectLists;
+import it.unimi.dsi.fastutil.objects.ObjectOpenHashSet;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.LazyIntIterator;
+import it.unimi.dsi.webgraph.NodeIterator;
+import it.unimi.dsi.webgraph.Transform;
+
+import java.io.IOException;
+import java.io.PrintWriter;
+import java.util.Collection;
+
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+import edu.uci.ics.jung.graph.DirectedGraph;
+import edu.uci.ics.jung.graph.util.EdgeType;
+import edu.uci.ics.jung.graph.util.Pair;
+import edu.uci.ics.jung.io.PajekNetWriter;
+
+/** An adapter exposing an {@link ImmutableGraph} as a <a href="http://jung.sourceforge.net/">Jung</a>
+ * {@link DirectedGraph}.
+ *
+ * <p>Using this adapter it is easy to apply Jung's analysis and visualisation code to {@linkplain ImmutableGraph immutable graphs}.
+ *
+ * <p>Edges are just {@link Long}s, and their values are the index of the source node, shifted to the left by
+ * 32 bits, OR'd with the index of the target node.
+ *
+ * <p>The main method of this class provides a simple way to translate any immutable graph into {@linkplain PajekNetWriter Pajek format}.
+ */
+
+public class JungAdapter implements DirectedGraph<Integer, Long> {
+ /** The immutable graph to be exposed. */
+ private final ImmutableGraph graph;
+ /** The transpose of {@link #graph}. */
+ private final ImmutableGraph transpose;
+ /** The number of nodes of {@link #graph}. */
+ private final int n;
+
+ /** Creates a Jung adapter.
+ *
+ * @param graph a graph.
+ * @param transpose its transpose (look at {@link Transform#transpose(ImmutableGraph)}
+ * and {@link Transform#transposeOffline(ImmutableGraph, int, java.io.File)} for ways
+ * to generate the transpose of a graph).
+ * @throws IllegalArgumentException if <code>graph</code> has more than {@link Integer#MAX_VALUE} arcs (as
+ * {@link #getEdgeCount()} returns an integer).
+ */
+
+ public JungAdapter(final ImmutableGraph graph, final ImmutableGraph transpose) {
+ this.graph = graph;
+ this.transpose = transpose;
+ this.n = graph.numNodes();
+ if (graph.numArcs() > Integer.MAX_VALUE) throw new IllegalArgumentException("Too many arcs for a Jung graph: " + graph.numArcs());
+ }
+
+ @Override
+ public Integer getSource(final Long e) {
+ return Integer.valueOf((int)(e.longValue() >>> 32));
+ }
+
+ @Override
+ public Integer getDest(final Long e) {
+ return Integer.valueOf(e.intValue());
+ }
+
+ @Override
+ public Pair<Integer> getEndpoints(final Long e) {
+ return new Pair<>(getSource(e), getDest(e));
+ }
+
+ @Override
+ public Collection<Long> getInEdges(final Integer x) {
+ final int v = x.intValue();
+ final ObjectArrayList<Long> list = new ObjectArrayList<>(transpose.outdegree(v));
+ final LazyIntIterator pred = transpose.successors(v);
+ for(long p; (p = pred.nextInt()) != -1;) list.add(Long.valueOf(p << 32 | v));
+ return list;
+ }
+
+ @Override
+ public Integer getOpposite(final Integer v, final Long e) {
+ final int x = e.intValue();
+ if (x != v.intValue()) return Integer.valueOf(x);
+ else return getSource(e);
+ }
+
+ @Override
+ public Collection<Long> getOutEdges(final Integer x) {
+ final int v = x.intValue();
+ final ObjectArrayList<Long> list = new ObjectArrayList<>(graph.outdegree(v));
+ final LazyIntIterator succ = graph.successors(v);
+ final long nodeShifted = (long)v << 32;
+ for(int s; (s = succ.nextInt()) != -1;) list.add(Long.valueOf(nodeShifted | s));
+ return list;
+ }
+
+ @Override
+ public int getPredecessorCount(final Integer x) {
+ return transpose.outdegree(x.intValue());
+ }
+
+ private static Collection<Integer> getSuccessors(final ImmutableGraph g, final int x) {
+ final ObjectArrayList<Integer> list = new ObjectArrayList<>(g.outdegree(x));
+ final LazyIntIterator succ = g.successors(x);
+ for(int s; (s = succ.nextInt()) != -1;) list.add(Integer.valueOf(s));
+ return list;
+ }
+
+ @Override
+ public Collection<Integer> getPredecessors(final Integer x) {
+ return getSuccessors(transpose, x.intValue());
+ }
+
+ @Override
+ public int getSuccessorCount(final Integer x) {
+ return graph.outdegree(x.intValue());
+ }
+
+ @Override
+ public Collection<Integer> getSuccessors(Integer x) {
+ return getSuccessors(graph, x.intValue());
+ }
+
+ @Override
+ public int inDegree(Integer x) {
+ return getPredecessorCount(x);
+ }
+
+ @Override
+ public boolean isDest(final Integer v, final Long e) {
+ return e.intValue() == v.intValue();
+ }
+
+ public boolean isArc(final int x, final int y) {
+ final LazyIntIterator succ = graph.successors(x);
+ for(int s; (s = succ.nextInt()) != -1;) if (s == y) return true;
+ return false;
+ }
+
+ @Override
+ public boolean isPredecessor(final Integer x, final Integer y) {
+ return isArc(x.intValue(), y.intValue());
+ }
+
+ @Override
+ public boolean isSource(final Integer v, final Long e) {
+ return e.longValue() >>> 32 == v.intValue();
+ }
+
+ @Override
+ public boolean isSuccessor(final Integer x, final Integer y) {
+ return isArc(y.intValue(), x.intValue());
+ }
+
+ @Override
+ public int outDegree(final Integer x) {
+ return graph.outdegree(x.intValue());
+ }
+
+ @Override
+ public boolean containsEdge(final Long e) {
+ return isArc((int)(e.longValue() >>> 32), e.intValue());
+ }
+
+ @Override
+ public boolean containsVertex(final Integer x) {
+ int v = x.intValue();
+ return v >= 0 && v < n;
+ }
+
+ @Override
+ public int degree(final Integer x) {
+ final int v = x.intValue();
+ int self = 0;
+
+ final LazyIntIterator succ = graph.successors(v);
+ for(int s; (s = succ.nextInt()) != -1;) if (s == v) self++;
+
+ return graph.outdegree(v) + (transpose.outdegree(v) - self);
+ }
+
+ @Override
+ public Long findEdge(final Integer x, final Integer y) {
+ if (! containsVertex(x) || ! containsVertex(y)) return null;
+ final Long l = Long.valueOf(x.longValue() << 32 | y.intValue());
+ return containsEdge(l) ? l : null;
+ }
+
+ @Override
+ public Collection<Long> findEdgeSet(final Integer x, final Integer y) {
+ final Long e = findEdge(x, y);
+ return e != null ? ObjectLists.singleton(e) : ObjectLists.EMPTY_LIST;
+ }
+
+ @Override
+ public EdgeType getDefaultEdgeType() {
+ return EdgeType.DIRECTED;
+ }
+
+ @Override
+ public int getEdgeCount() {
+ return (int)graph.numArcs();
+ }
+
+ @Override
+ public int getEdgeCount(final EdgeType x) {
+ return EdgeType.DIRECTED.equals(x) ? getEdgeCount() : 0;
+ }
+
+ @Override
+ public EdgeType getEdgeType(final Long e) {
+ return EdgeType.DIRECTED;
+ }
+
+ @Override
+ public Collection<Long> getEdges() {
+ final ObjectArrayList<Long> edges = new ObjectArrayList<>();
+ final NodeIterator iterator = graph.nodeIterator();
+ for(int i = n; i-- != 0;) {
+ final int x = iterator.nextInt();
+ final long xShifted = (long)x << 32;
+ final int d = iterator.outdegree();
+ final int[] s = iterator.successorArray();
+ for(int j = 0; j < d; j++) edges.add(Long.valueOf(xShifted | s[j]));
+ }
+ return edges;
+ }
+
+ @Override
+ public Collection<Long> getEdges(final EdgeType x) {
+ return EdgeType.DIRECTED.equals(x) ? getEdges() : ObjectLists.EMPTY_LIST;
+ }
+
+ @Override
+ public int getIncidentCount(final Long e) {
+ return e.intValue() != e.longValue() >>> 32 ? 2 : 1;
+ }
+
+ @Override
+ public Collection<Long> getIncidentEdges(final Integer x) {
+ final int v = x.intValue();
+ final long vShifted = (long)v << 32;
+ final int outdegree = graph.outdegree(v);
+ final int indegree = transpose.outdegree(v);
+ final LazyIntIterator succ = graph.successors(v);
+ final LazyIntIterator pred = transpose.successors(v);
+
+ final ObjectArrayList<Long> res = new ObjectArrayList<>(outdegree + indegree);
+ for(int s; (s = succ.nextInt()) != -1;) res.add(Long.valueOf(vShifted | s));
+ // We do not add loops again.
+ for(int p; (p = pred.nextInt()) != -1;) if (p != v) res.add(Long.valueOf((long)p << 32 | v));
+ return res;
+ }
+
+ @Override
+ public Collection<Integer> getIncidentVertices(final Long e) {
+ final int x = (int)(e.longValue() >>> 32);
+ final int y = e.intValue();
+ if (x == y) return ObjectLists.singleton(Integer.valueOf(x));
+ final ObjectArrayList<Integer> res = new ObjectArrayList<>();
+ res.add(Integer.valueOf(x));
+ res.add(Integer.valueOf(y));
+ return res;
+ }
+
+ @Override
+ public int getNeighborCount(final Integer x) {
+ return getNeighbors(x).size();
+ }
+
+ @Override
+ public Collection<Integer> getNeighbors(final Integer x) {
+ final int v = x.intValue();
+ final int outdegree = graph.outdegree(v);
+ final int indegree = transpose.outdegree(v);
+ final LazyIntIterator succ = graph.successors(v);
+ final LazyIntIterator pred = transpose.successors(v);
+
+ final ObjectOpenHashSet<Integer> res = new ObjectOpenHashSet<>(outdegree + indegree);
+ for(int s; (s = succ.nextInt()) != -1;) res.add(Integer.valueOf(s));
+ for(int p; (p = pred.nextInt()) != -1;) res.add(Integer.valueOf(p));
+ return res;
+ }
+
+ @Override
+ public int getVertexCount() {
+ return n;
+ }
+
+ @Override
+ public Collection<Integer> getVertices() {
+ return new AbstractObjectList<Integer>() {
+ @Override
+ public Integer get(final int x) {
+ return Integer.valueOf(x);
+ }
+ @Override
+ public int size() {
+ return n;
+ }
+ };
+ }
+
+ @Override
+ public boolean isIncident(final Integer x, final Long e) {
+ final int v = x.intValue();
+ return e.intValue() == v || e.longValue() >>> 32 == v;
+ }
+
+ @Override
+ public boolean isNeighbor(final Integer x, final Integer y) {
+ final int v = x.intValue();
+ final int w = y.intValue();
+
+ final LazyIntIterator succ = graph.successors(v);
+ for(int s; (s = succ.nextInt()) != -1;) if (s == w) return true;
+ final LazyIntIterator pred = transpose.successors(v);
+ for(int p; (p = pred.nextInt()) != -1;) if (p == w) return true;
+ return false;
+ }
+
+
+ /** @throws UnsupportedOperationException */
+ @Override
+ public boolean removeEdge(final Long e) {
+ throw new UnsupportedOperationException();
+ }
+
+ /** @throws UnsupportedOperationException */
+ @Override
+ public boolean removeVertex(final Integer x) {
+ throw new UnsupportedOperationException();
+ }
+
+
+ /** @throws UnsupportedOperationException */
+ @Override
+ public boolean addEdge(final Long e, final Integer y, final Integer arg2) {
+ throw new UnsupportedOperationException();
+ }
+
+ /** @throws UnsupportedOperationException */
+ @Override
+ public boolean addEdge(final Long e, final Integer y, final Integer arg2, final EdgeType arg3) {
+ throw new UnsupportedOperationException();
+ }
+
+ /** @throws UnsupportedOperationException */
+ @Override
+ public boolean addEdge(final Long e, final Collection<? extends Integer> y) {
+ throw new UnsupportedOperationException();
+ }
+
+ /** @throws UnsupportedOperationException */
+ @Override
+ public boolean addEdge(final Long e, final Collection<? extends Integer> y, final EdgeType arg2) {
+ throw new UnsupportedOperationException();
+ }
+
+ /** @throws UnsupportedOperationException */
+ @Override
+ public boolean addVertex(final Integer x) {
+ throw new UnsupportedOperationException();
+ }
+
+ public static void main(final String[] arg) throws IOException, JSAPException {
+ final SimpleJSAP simpleJSAP = new SimpleJSAP(JungAdapter.class.getName(), "Reads a graph with a given basename, optionally its transpose, and writes it on standard output in Pajek format.",
+ new Parameter[] {
+ new Switch("offline", 'o', "offline", "Use the offline load method to reduce memory consumption. It usually works, but your mileage may vary."),
+ new UnflaggedOption("basename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the source graph."),
+ new UnflaggedOption("transpose", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, JSAP.NOT_GREEDY, "The basename of the transpose. If unspecified, null is passed to the JungAdapter constructor, so methods that need the transpose (e.g., predecessor and in-degree queries) will throw a NullPointerException. This is usually not a problem for Pajek output, but your mileage may vary.")
+ });
+
+ final JSAPResult jsapResult = simpleJSAP.parse(arg);
+ if (simpleJSAP.messagePrinted()) System.exit(1);
+ final boolean offline = jsapResult.userSpecified("offline");
+
+ final ImmutableGraph graph = offline ? ImmutableGraph.loadOffline(jsapResult.getString("basename")) : ImmutableGraph.load(jsapResult.getString("basename"));
+ final ImmutableGraph transpose = jsapResult.userSpecified("transpose") ? (offline ? ImmutableGraph.loadOffline(jsapResult.getString("transpose")) : ImmutableGraph.load(jsapResult.getString("transpose"))) : null;
+
+ final PrintWriter printWriter = new PrintWriter(System.out);
+ new PajekNetWriter<Integer, Long>().save(new JungAdapter(graph, transpose), printWriter);
+ printWriter.flush();
+ }
+}
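`JungAdapter` represents every arc as a single `Long`. The source index is shifted left by 32 bits and OR'd with the target index. The endpoints are recovered with `>>> 32` and `intValue()`. The standalone sketch below isolates that encoding. The class and method names are hypothetical and belong to neither WebGraph nor Jung. It also shows why the adapter widens to `long` before shifting (`(long)v << 32`, `x.longValue() << 32`). On an `int`, Java masks the shift distance to its low 5 bits, so `v << 32` does not shift at all.

```java
// Hypothetical standalone sketch of the arc encoding used by JungAdapter.
final class ArcCodec {
    // Pack (source, target) into one long: source in the high 32 bits, target in the low 32 bits.
    // Both indices are assumed to be non-negative node ids.
    static long encode(final int source, final int target) {
        return (long) source << 32 | target;
    }

    // High 32 bits: unsigned shift, then narrow to int.
    static int source(final long arc) {
        return (int) (arc >>> 32);
    }

    // Low 32 bits: narrowing keeps exactly the target index.
    static int target(final long arc) {
        return (int) arc;
    }

    public static void main(final String[] args) {
        final long arc = encode(7, 42);
        System.out.println(source(arc) + " -> " + target(arc)); // 7 -> 42

        // Pitfall: without widening, the shift happens on an int and its distance
        // is masked to 5 bits, so 7 << 32 == 7 and the source index is lost.
        System.out.println(7 << 32);              // 7
        System.out.println(source(7 << 32 | 42)); // 0
    }
}
```

The same reasoning explains the non-negativity assumption. A negative target would be sign-extended when OR'd into the `long`, overwriting the source bits. Valid node indices are never negative, so this matters only for out-of-range input.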
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/AbstractIntLabel.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/AbstractIntLabel.java
new file mode 100644
index 0000000..a654df4
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/AbstractIntLabel.java
@@ -0,0 +1,128 @@
+package it.unimi.dsi.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+/** An abstract (single-attribute) integer label.
+ *
+ * <p>This class provides basic methods for a label holding an integer.
+ * Concrete implementations may impose further requirements on the integer.
+ *
+ * <p>Implementing subclasses must provide constructors, {@link Label#copy()},
+ * {@link Label#fromBitStream(it.unimi.dsi.io.InputBitStream, int)}, {@link Label#toBitStream(it.unimi.dsi.io.OutputBitStream, int)}
+ * and possibly override {@link #toString()}.
+ */
+
+public abstract class AbstractIntLabel extends AbstractLabel implements Label {
+ /** The key of the attribute represented by this label. */
+ protected final String key;
+ /** The value of the attribute represented by this label. */
+ public int value;
+
+ /** Creates an int label with given key and value.
+ *
+ * @param key the (only) key of this label.
+ * @param value the value of this label.
+ */
+ public AbstractIntLabel(String key, int value) {
+ this.key = key;
+ this.value = value;
+ }
+
+ @Override
+ public String wellKnownAttributeKey() {
+ return key;
+ }
+
+ @Override
+ public String[] attributeKeys() {
+ return new String[] { key };
+ }
+
+ @Override
+ public Class<?>[] attributeTypes() {
+ return new Class[] { int.class };
+ }
+
+ @Override
+ public Object get(String key) {
+ return Integer.valueOf(getInt(key));
+ }
+
+ @Override
+ public int getInt(String key) {
+ if (this.key.equals(key)) return value;
+ throw new IllegalArgumentException("Unknown key " + key);
+ }
+
+ @Override
+ public long getLong(String key) {
+ return getInt(key);
+ }
+
+ @Override
+ public float getFloat(String key) {
+ return getInt(key);
+ }
+
+ @Override
+ public double getDouble(String key) {
+ return getInt(key);
+ }
+
+ @Override
+ public Object get() {
+ return Integer.valueOf(getInt());
+ }
+
+ @Override
+ public int getInt() {
+ return value;
+ }
+
+ @Override
+ public long getLong() {
+ return value;
+ }
+
+ @Override
+ public float getFloat() {
+ return value;
+ }
+
+ @Override
+ public double getDouble() {
+ return value;
+ }
+
+ @Override
+ public String toString() {
+ return key + ":" + value;
+ }
+
+ @Override
+ public boolean equals(Object x) {
+ if (x instanceof AbstractIntLabel) return (value == ((AbstractIntLabel)x).value);
+ else return false;
+ }
+
+ @Override
+ public int hashCode() {
+ return value;
+ }
+}
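`AbstractIntLabel` accepts only its single key in the keyed accessors and lets the wider accessors delegate to `getInt`. The dependency-free mimic below reproduces that behaviour. `IntLabelSketch` is a hypothetical name and does not implement WebGraph's `Label`. It also illustrates a consequence that the source does not mention but that follows from Java's `int`-to-`float` conversion. A `float` has a 24-bit significand, so `getFloat` may round values whose magnitude exceeds 2^24. `getLong` and `getDouble` are exact.

```java
// Hypothetical, dependency-free mimic of AbstractIntLabel's accessors (not WebGraph code).
final class IntLabelSketch {
    final String key;
    int value;

    IntLabelSketch(final String key, final int value) {
        this.key = key;
        this.value = value;
    }

    // Key-checked accessor: only the label's single key is accepted.
    int getInt(final String key) {
        if (this.key.equals(key)) return value;
        throw new IllegalArgumentException("Unknown key " + key);
    }

    // Wider accessors delegate to getInt(String); int -> float may round above 2^24.
    long getLong(final String key) { return getInt(key); }
    float getFloat(final String key) { return getInt(key); }
    double getDouble(final String key) { return getInt(key); }

    public static void main(final String[] args) {
        final IntLabelSketch label = new IntLabelSketch("weight", 16_777_217); // 2^24 + 1
        System.out.println(label.getLong("weight"));         // 16777217
        System.out.println(label.getDouble("weight"));       // 1.6777217E7
        System.out.println((long) label.getFloat("weight")); // 16777216
        try {
            label.getInt("color");
        } catch (final IllegalArgumentException e) {
            System.out.println(e.getMessage());              // Unknown key color
        }
    }
}
```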
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/AbstractIntListLabel.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/AbstractIntListLabel.java
new file mode 100644
index 0000000..58b5529
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/AbstractIntListLabel.java
@@ -0,0 +1,90 @@
+package it.unimi.dsi.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import java.util.Arrays;
+
+/** An abstract (single-attribute) list-of-integers label.
+ *
+ * <p>This class provides basic methods for a label holding a list of integers.
+ * Concrete implementations may impose further requirements on the integers.
+ *
+ * <p>Implementing subclasses must provide constructors, {@link Label#copy()},
+ * {@link Label#fromBitStream(it.unimi.dsi.io.InputBitStream, int)}, {@link Label#toBitStream(it.unimi.dsi.io.OutputBitStream, int)}
+ * and possibly override {@link #toString()}.
+ */
+
+public abstract class AbstractIntListLabel extends AbstractLabel implements Label {
+ /** The key of the attribute represented by this label. */
+ protected final String key;
+ /** The values of the attribute represented by this label. */
+ public int[] value;
+
+ /** Creates an int-list label with given key and value.
+ *
+ * @param key the (only) key of this label.
+ * @param value the value of this label.
+ */
+ public AbstractIntListLabel(String key, int[] value) {
+ this.key = key;
+ this.value = value;
+ }
+
+ @Override
+ public String wellKnownAttributeKey() {
+ return key;
+ }
+
+ @Override
+ public String[] attributeKeys() {
+ return new String[] { key };
+ }
+
+ @Override
+ public Class<?>[] attributeTypes() {
+ return new Class[] { int[].class };
+ }
+
+ @Override
+ public Object get(String key) {
+ if (this.key.equals(key)) return value;
+ throw new IllegalArgumentException("Unknown key " + key);
+ }
+
+ @Override
+ public Object get() {
+ return value;
+ }
+
+ @Override
+ public String toString() {
+ return key + ":" + Arrays.toString(value);
+ }
+
+ @Override
+ public boolean equals(Object x) {
+ if (x instanceof AbstractIntListLabel) return Arrays.equals(value, ((AbstractIntListLabel)x).value);
+ else return false;
+ }
+
+ @Override
+ public int hashCode() {
+ return Arrays.hashCode(value);
+ }
+}
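`AbstractIntListLabel` compares and hashes its `int[]` value with `Arrays.equals` and `Arrays.hashCode`. Java arrays inherit identity-based `equals` and `hashCode` from `Object`, so content equality needs these helpers. The sketch below is hypothetical and independent of WebGraph (`IntListLabelSketch` is an illustrative name). It shows the difference and why `hashCode` must be content-based as well, so that hash-based collections treat equal labels as duplicates.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch (not WebGraph code): a list-of-int label whose equality is by content.
final class IntListLabelSketch {
    final String key;
    final int[] value;

    IntListLabelSketch(final String key, final int[] value) {
        this.key = key;
        this.value = value;
    }

    @Override
    public boolean equals(final Object o) {
        // Arrays.equals compares contents; value.equals(...) would compare references.
        return o instanceof IntListLabelSketch && Arrays.equals(value, ((IntListLabelSketch) o).value);
    }

    @Override
    public int hashCode() {
        // Content-based, to stay consistent with equals().
        return Arrays.hashCode(value);
    }

    @Override
    public String toString() {
        return key + ":" + Arrays.toString(value);
    }

    public static void main(final String[] args) {
        final IntListLabelSketch a = new IntListLabelSketch("l", new int[] {1, 2, 3});
        final IntListLabelSketch b = new IntListLabelSketch("l", new int[] {1, 2, 3});
        System.out.println(a.value.equals(b.value)); // false: distinct array objects
        System.out.println(a.equals(b));             // true: same contents
        final Set<IntListLabelSketch> set = new HashSet<>();
        set.add(a);
        set.add(b);
        System.out.println(set.size());              // 1
        System.out.println(a);                       // l:[1, 2, 3]
    }
}
```

Because the value array is a public, mutable field in the original class, mutating it after inserting a label into a hash-based collection would change its hash code. This sketch does not guard against that either.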
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/AbstractLabel.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/AbstractLabel.java
new file mode 100644
index 0000000..8714948
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/AbstractLabel.java
@@ -0,0 +1,104 @@
+package it.unimi.dsi.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+/** An abstract implementation of {@link Label} throwing an {@link IllegalArgumentException} on all primitive-type methods; subclasses override the accessors they support. */
+
+public abstract class AbstractLabel implements Label {
+
+ @Override
+ public byte getByte() throws IllegalArgumentException {
+ throw new IllegalArgumentException();
+ }
+
+ @Override
+ public short getShort(String key) throws IllegalArgumentException {
+ throw new IllegalArgumentException();
+ }
+
+ @Override
+ public int getInt(String key) throws IllegalArgumentException {
+ throw new IllegalArgumentException();
+ }
+
+ @Override
+ public long getLong(String key) throws IllegalArgumentException {
+ throw new IllegalArgumentException();
+ }
+
+ @Override
+ public float getFloat(String key) throws IllegalArgumentException {
+ throw new IllegalArgumentException();
+ }
+
+ @Override
+ public double getDouble(String key) throws IllegalArgumentException {
+ throw new IllegalArgumentException();
+ }
+
+ @Override
+ public char getChar(String key) throws IllegalArgumentException {
+ throw new IllegalArgumentException();
+ }
+
+ @Override
+ public boolean getBoolean(String key) throws IllegalArgumentException {
+ throw new IllegalArgumentException();
+ }
+
+ @Override
+ public byte getByte(String key) throws IllegalArgumentException {
+ throw new IllegalArgumentException();
+ }
+
+ @Override
+ public short getShort() throws IllegalArgumentException {
+ throw new IllegalArgumentException();
+ }
+
+ @Override
+ public int getInt() throws IllegalArgumentException {
+ throw new IllegalArgumentException();
+ }
+
+ @Override
+ public long getLong() throws IllegalArgumentException {
+ throw new IllegalArgumentException();
+ }
+
+ @Override
+ public float getFloat() throws IllegalArgumentException {
+ throw new IllegalArgumentException();
+ }
+
+ @Override
+ public double getDouble() throws IllegalArgumentException {
+ throw new IllegalArgumentException();
+ }
+
+ @Override
+ public char getChar() throws IllegalArgumentException {
+ throw new IllegalArgumentException();
+ }
+
+ @Override
+ public boolean getBoolean() throws IllegalArgumentException {
+ throw new IllegalArgumentException();
+ }
+}
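`AbstractLabel` gives every primitive-type accessor a default body that throws an `IllegalArgumentException`. A concrete label then overrides only the accessors meaningful for its type, as `AbstractIntLabel` does for the numeric ones. The sketch below reproduces the pattern with a tiny hypothetical interface, `TypedValue`, that stands in for `Label`. It is not WebGraph's interface, and all names are illustrative.

```java
// Hypothetical sketch of the "unsupported by default" pattern used by AbstractLabel.
// TypedValue is a tiny stand-in, not WebGraph's Label interface.
final class TypedValueSketch {
    interface TypedValue {
        int getInt();
        long getLong();
        double getDouble();
        boolean getBoolean();
    }

    // Base class: every typed accessor throws unless a subclass overrides it.
    abstract static class AbstractTypedValue implements TypedValue {
        @Override public int getInt() { throw new IllegalArgumentException(); }
        @Override public long getLong() { throw new IllegalArgumentException(); }
        @Override public double getDouble() { throw new IllegalArgumentException(); }
        @Override public boolean getBoolean() { throw new IllegalArgumentException(); }
    }

    // A concrete value overrides only the accessors that make sense for an int.
    static final class IntValue extends AbstractTypedValue {
        private final int value;
        IntValue(final int value) { this.value = value; }
        @Override public int getInt() { return value; }
        @Override public long getLong() { return value; }
        @Override public double getDouble() { return value; }
    }

    public static void main(final String[] args) {
        final TypedValue v = new IntValue(5);
        System.out.println(v.getLong()); // 5
        try {
            v.getBoolean(); // not overridden, so the inherited default throws
        } catch (final IllegalArgumentException e) {
            System.out.println("getBoolean() is not supported");
        }
    }
}
```

The design keeps the interface wide, with one accessor per primitive type. Each concrete label stays short, at the cost of turning an unsupported access into a runtime exception rather than a compile-time error.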
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/ArcLabelledImmutableGraph.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/ArcLabelledImmutableGraph.java
new file mode 100644
index 0000000..e11c78f
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/ArcLabelledImmutableGraph.java
@@ -0,0 +1,249 @@
+package it.unimi.dsi.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.BVGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.labelling.ArcLabelledNodeIterator.LabelledArcIterator;
+
+import java.io.IOException;
+import java.io.InputStream;
+
+/** An abstract implementation of a graph labelled on its arcs.
+ *
+ * <p>The main purpose of this class is to covariantly override the return
+ * type of {@link #nodeIterator()} and {@link #nodeIterator(int)} so that
+ * it is an {@link ArcLabelledNodeIterator}, and the return type of
+ * all static load methods and of {@link #copy()} so that it is an {@link ArcLabelledImmutableGraph} (the
+ * methods themselves just delegate to the corresponding method in {@link ImmutableGraph}).
+ *
+ * <p>The only additional instance methods are {@link #labelArray(int)} and {@link #prototype()}.
+ *
+ * <h2>Saving labels</h2>
+ *
+ * <P>A subclass of this class <strong>may</strong> implement
+ * <UL>
+ * <LI><code>store(ArcLabelledImmutableGraph, CharSequence, CharSequence, ProgressLogger)</code>;
+ * <LI><code>store(ArcLabelledImmutableGraph, CharSequence, CharSequence)</code>.
+ * </UL>
+ *
+ * <p>These methods must save the labels of the given arc-labelled graph using the first given character
+ * sequence as a basename, together with a suitable property file referring to the underlying graph by the second given basename. Note that the graph
+ * will <strong>not</strong> be saved&mdash;use the <code>store()</code>
+ * method of an {@link ImmutableGraph} implementation for that purpose.
+ *
+ * <p>For instance, assuming <code>g</code> is an arc-labelled graph, the idiomatic way
+ * of storing it on disk using {@link BVGraph} for the underlying graph and
+ * {@link BitStreamArcLabelledImmutableGraph} for the labels is
+ * <pre>
+ * BVGraph.store(g, "foo");
+ * BitStreamArcLabelledImmutableGraph.store(g, "bar", "foo");
+ * </pre>
+ *
+ * <h2>Underlying graphs</h2>
+ *
+ * <p>Often, implementations of this class will just wrap an <em>underlying graph</em> (i.e.,
+ * an instance of {@link ImmutableGraph}). In that case, we suggest that, if the implementation
+ * uses property files, the basename of the underlying graph be specified using the property
+ * key {@link #UNDERLYINGGRAPH_PROPERTY_KEY}. If the basename must be generated starting
+ * from the arc-labelled graph basename, we suggest simply appending the string
+ * {@link #UNDERLYINGGRAPH_SUFFIX}.
+ */
+
+public abstract class ArcLabelledImmutableGraph extends ImmutableGraph {
+
+ /** The standard property key for the underlying graph. All implementations that decorate
+ * an underlying graph with labels are strongly encouraged to use this property
+ * name to specify the basename of the underlying graph. */
+ public static final String UNDERLYINGGRAPH_PROPERTY_KEY = "underlyinggraph";
+ /** The standard suffix added to basenames in order to give a basename
+ * to the underlying graph, when needed. */
+ public static final String UNDERLYINGGRAPH_SUFFIX = "-underlying";
+
+
+ @Override
+ public abstract ArcLabelledImmutableGraph copy();
+
+ @Override
+ public ArcLabelledNodeIterator nodeIterator() {
+ return nodeIterator(0);
+ }
+
+ /** Returns a node iterator for scanning the graph sequentially, starting from the given node.
+ *
+ * <P>This implementation strengthens that provided in {@link ImmutableGraph}, but
+ * calls the labelled random-access method {@link #successors(int)}.
+ *
+ * @param from the node from which the iterator will iterate.
+ * @return an {@link ArcLabelledNodeIterator} for accessing nodes, successors and their labels sequentially.
+ *
+ * @see ImmutableGraph#nodeIterator()
+ */
+ @Override
+ public ArcLabelledNodeIterator nodeIterator(final int from) {
+ class InternalArcLabelledNodeIterator extends ArcLabelledNodeIterator {
+ int curr = from - 1;
+ final int n = numNodes();
+ final int upperBound;
+
+ public InternalArcLabelledNodeIterator(final int upperBound) {
+ this.upperBound = upperBound;
+ }
+
+ @Override
+ public int nextInt() {
+ if (! hasNext()) throw new java.util.NoSuchElementException();
+ return ++curr;
+ }
+
+ @Override
+ public boolean hasNext() {
+ return curr < Math.min(n - 1, upperBound - 1);
+ }
+
+ @Override
+ public LabelledArcIterator successors() {
+ if (curr == from - 1) throw new IllegalStateException();
+ return ArcLabelledImmutableGraph.this.successors(curr);
+ }
+
+ @Override
+ public int outdegree() {
+ if (curr == from - 1) throw new IllegalStateException();
+ return ArcLabelledImmutableGraph.this.outdegree(curr);
+ }
+
+ @Override
+ public ArcLabelledNodeIterator copy(final int upperBound) {
+ InternalArcLabelledNodeIterator result = new InternalArcLabelledNodeIterator(upperBound);
+ result.curr = curr;
+ return result;
+ }
+ }
+
+
+ return new InternalArcLabelledNodeIterator(Integer.MAX_VALUE);
+ }
+
+ @Override
+ public abstract ArcLabelledNodeIterator.LabelledArcIterator successors(int x);
+
+ /** Returns a prototype of the labels used by this graph. The prototype can be
+ * used to produce new copies, but must not be modified by the caller.
+ *
+ * @return a prototype for the labels of this graph.
+ */
+ public abstract Label prototype();
+
+ /** Returns a reference to an array containing the labels of the arcs going out of a given node
+ * in the same order as the order in which the corresponding successors are returned by {@link #successors(int)}.
+ *
+ * <P>The returned array may contain more entries than the outdegree of <code>x</code>.
+ * However, only those with indices from 0 (inclusive) to the outdegree of <code>x</code> (exclusive)
+ * contain valid data.
+ *
+ * <P>This implementation just unwraps the iterator returned by {@link #successors(int)} and
+ * writes copies of the labels returned by {@link LabelledArcIterator#label()} into a newly allocated array.
+ *
+ * @param x a node.
+ * @return an array whose first elements are the labels of the arcs going
+ * out of <code>x</code>; the array must not be modified by the caller.
+ */
+
+ public Label[] labelArray(int x) {
+ return ArcLabelledNodeIterator.unwrap(successors(x), outdegree(x));
+ }
+
+ @Deprecated
+ public static ArcLabelledImmutableGraph loadSequential(CharSequence basename) throws IOException {
+ return (ArcLabelledImmutableGraph)ImmutableGraph.loadSequential(basename);
+ }
+
+ @Deprecated
+ public static ArcLabelledImmutableGraph loadSequential(CharSequence basename, ProgressLogger pl) throws IOException {
+ return (ArcLabelledImmutableGraph)ImmutableGraph.loadSequential(basename, pl);
+ }
+
+ public static ArcLabelledImmutableGraph loadOffline(CharSequence basename) throws IOException {
+ return (ArcLabelledImmutableGraph)ImmutableGraph.loadOffline(basename);
+ }
+
+ public static ArcLabelledImmutableGraph loadOffline(CharSequence basename, ProgressLogger pl) throws IOException {
+ return (ArcLabelledImmutableGraph)ImmutableGraph.loadOffline(basename, pl);
+ }
+
+ public static ArcLabelledImmutableGraph load(CharSequence basename) throws IOException {
+ return (ArcLabelledImmutableGraph)ImmutableGraph.load(basename);
+ }
+
+ public static ArcLabelledImmutableGraph load(CharSequence basename, ProgressLogger pl) throws IOException {
+ return (ArcLabelledImmutableGraph)ImmutableGraph.load(basename, pl);
+ }
+
+ public static ArcLabelledImmutableGraph loadOnce(InputStream is) throws IOException {
+ return (ArcLabelledImmutableGraph)ImmutableGraph.loadOnce(is);
+ }
+
+ @Override
+ public String toString() {
+ final StringBuilder s = new StringBuilder();
+
+ long numArcs = -1;
+ try {
+ numArcs = numArcs();
+ }
+ catch(UnsupportedOperationException ignore) {}
+
+ s.append("Nodes: " + numNodes() + "\nArcs: " + (numArcs == -1 ? "unknown" : Long.toString(numArcs)) + "\n");
+
+ final ArcLabelledNodeIterator nodeIterator = nodeIterator();
+ ArcLabelledNodeIterator.LabelledArcIterator successors;
+ int curr;
+ for (int i = numNodes(); i-- != 0;) {
+ curr = nodeIterator.nextInt();
+ s.append("Successors of " + curr + " (degree " + nodeIterator.outdegree() + "):");
+ successors = nodeIterator.successors();
+ int d = nodeIterator.outdegree();
+ while (d-- != 0) s.append(" " + successors.nextInt() + " [" + successors.label() + "]");
+ s.append('\n');
+ }
+ return s.toString();
+ }
+
+ @Override
+ public boolean equals(Object x) {
+ if (! (x instanceof ArcLabelledImmutableGraph)) return false;
+ ArcLabelledImmutableGraph g = (ArcLabelledImmutableGraph)x;
+ if (g.numNodes() != numNodes()) return false;
+ ArcLabelledNodeIterator nodeIterator = nodeIterator();
+ ArcLabelledNodeIterator gNodeIterator = g.nodeIterator();
+ while (nodeIterator.hasNext()) {
+ nodeIterator.nextInt(); gNodeIterator.nextInt();
+ if (nodeIterator.outdegree() != gNodeIterator.outdegree()) return false;
+ LabelledArcIterator arcIterator = nodeIterator.successors();
+ LabelledArcIterator gArcIterator = gNodeIterator.successors();
+ int d = nodeIterator.outdegree();
+ while (d-- != 0) {
+ if (arcIterator.nextInt() != gArcIterator.nextInt()
+ || ! arcIterator.label().equals(gArcIterator.label())) return false;
+ }
+ }
+ return true;
+ }
+}
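The contract of `nodeIterator(from)` in `ArcLabelledImmutableGraph` above is easy to misread. Nodes start at `from` and stop before `min(numNodes(), upperBound)`, and `copy(upperBound)` resumes from the current node with a new bound. The following standalone sketch models only that bookkeeping, without labels and without WebGraph. `BoundedNodeIteratorSketch`, `BoundedNodeIterator` and `drain` are hypothetical names, not part of the library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

/** Standalone model of the bounded node iterator; hypothetical names, no WebGraph dependency. */
public class BoundedNodeIteratorSketch {
    static final class BoundedNodeIterator {
        private final int numNodes;
        private final int upperBound;
        /** Last node returned; equals from - 1 before the first call to nextInt(). */
        private int curr;

        BoundedNodeIterator(final int numNodes, final int from, final int upperBound) {
            this.numNodes = numNodes;
            this.upperBound = upperBound;
            this.curr = from - 1;
        }

        boolean hasNext() {
            // Same test as nodeIterator(from): stop at the last node or just before upperBound.
            return curr < Math.min(numNodes - 1, upperBound - 1);
        }

        int nextInt() {
            if (!hasNext()) throw new NoSuchElementException();
            return ++curr;
        }

        /** Resumes from the current position but never returns nodes >= upperBound. */
        BoundedNodeIterator copy(final int upperBound) {
            final BoundedNodeIterator result = new BoundedNodeIterator(numNodes, 0, upperBound);
            result.curr = curr;
            return result;
        }
    }

    /** Collects every node still returned by the iterator. */
    static List<Integer> drain(final BoundedNodeIterator it) {
        final List<Integer> nodes = new ArrayList<>();
        while (it.hasNext()) nodes.add(it.nextInt());
        return nodes;
    }

    public static void main(final String[] args) {
        // A 10-node graph scanned from node 3.
        System.out.println(drain(new BoundedNodeIterator(10, 3, Integer.MAX_VALUE))); // [3, 4, 5, 6, 7, 8, 9]
        // A copy taken before the first nextInt() covers only [0, 5).
        final BoundedNodeIterator full = new BoundedNodeIterator(10, 0, Integer.MAX_VALUE);
        System.out.println(drain(full.copy(5))); // [0, 1, 2, 3, 4]
        // Draining the copy does not advance the original.
        System.out.println(full.nextInt()); // 0
    }
}
```

Handing out copies with different bounds is presumably what lets several threads scan disjoint node ranges when `hasCopiableIterators()` returns true.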
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/ArcLabelledImmutableSequentialGraph.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/ArcLabelledImmutableSequentialGraph.java
new file mode 100644
index 0000000..be75196
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/ArcLabelledImmutableSequentialGraph.java
@@ -0,0 +1,58 @@
+package it.unimi.dsi.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.webgraph.labelling.ArcLabelledNodeIterator.LabelledArcIterator;
+
+/** An abstract arc-labelled immutable graph that throws an {@link java.lang.UnsupportedOperationException}
+ * on all random-access methods.
+ *
+ * <p>The main purpose of this class is to be used as a base for the numerous anonymous
+ * classes that do not support random access.
+ */
+
+public abstract class ArcLabelledImmutableSequentialGraph extends ArcLabelledImmutableGraph {
+ /** Throws an {@link java.lang.UnsupportedOperationException}. */
+ @Override
+ public int[] successorArray(final int x) { throw new UnsupportedOperationException(); }
+ /** Throws an {@link java.lang.UnsupportedOperationException}. */
+ @Override
+ public Label[] labelArray(final int x) { throw new UnsupportedOperationException(); }
+ /** Throws an {@link java.lang.UnsupportedOperationException}. */
+ @Override
+ public int outdegree(final int x) { throw new UnsupportedOperationException(); }
+ /** Returns {@link #nodeIterator()} if <code>x</code> is zero; throws an {@link java.lang.UnsupportedOperationException} otherwise. */
+ @Override
+ public ArcLabelledNodeIterator nodeIterator(int x) {
+ if (x == 0) return nodeIterator();
+ throw new UnsupportedOperationException();
+ }
+ /** Throws an {@link java.lang.UnsupportedOperationException}. */
+ @Override
+ public LabelledArcIterator successors(int x) { throw new UnsupportedOperationException(); }
+ /** Returns false.
+ * @return false.
+ */
+ @Override
+ public boolean randomAccess() { return false; }
+
+ /** Throws an {@link UnsupportedOperationException}. */
+ @Override
+ public ArcLabelledImmutableGraph copy() { throw new UnsupportedOperationException(); }
+}
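Code that receives a graph whose `randomAccess()` returns `false`, such as a subclass of `ArcLabelledImmutableSequentialGraph` above, has to fall back to a single scan starting at node 0. The sketch below models that pattern with a hypothetical `SimpleGraph` interface rather than the WebGraph API. Only the idea is taken from the class above: random-access methods throw `UnsupportedOperationException`, while sequential scanning still works.

```java
import java.util.Arrays;
import java.util.Iterator;

/** Hypothetical model of a sequential-only graph; no WebGraph dependency. */
public class SequentialGraphSketch {
    /** Minimal graph interface (an assumption, not WebGraph's API). */
    interface SimpleGraph {
        int numNodes();
        boolean randomAccess();
        /** Random-access outdegree; may throw UnsupportedOperationException. */
        int outdegree(int x);
        /** Sequential scan returning the successor array of each node, in node order. */
        Iterator<int[]> scan();
    }

    /** Like ArcLabelledImmutableSequentialGraph: every random-access method throws. */
    static final class SequentialGraph implements SimpleGraph {
        private final int[][] adjacency;

        SequentialGraph(final int[][] adjacency) { this.adjacency = adjacency; }

        @Override public int numNodes() { return adjacency.length; }
        @Override public boolean randomAccess() { return false; }
        @Override public int outdegree(final int x) { throw new UnsupportedOperationException(); }
        @Override public Iterator<int[]> scan() { return Arrays.asList(adjacency).iterator(); }
    }

    /** Counts arcs, falling back to one sequential scan when random access is unsupported. */
    static long countArcs(final SimpleGraph g) {
        long arcs = 0;
        if (g.randomAccess()) {
            for (int x = 0; x < g.numNodes(); x++) arcs += g.outdegree(x);
        } else {
            for (final Iterator<int[]> it = g.scan(); it.hasNext();) arcs += it.next().length;
        }
        return arcs;
    }

    public static void main(final String[] args) {
        final SimpleGraph g = new SequentialGraph(new int[][] { { 1, 2 }, { 2 }, {} });
        System.out.println(countArcs(g)); // 3
    }
}
```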
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/ArcLabelledNodeIterator.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/ArcLabelledNodeIterator.java
new file mode 100644
index 0000000..a984667
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/ArcLabelledNodeIterator.java
@@ -0,0 +1,103 @@
+package it.unimi.dsi.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.LazyIntIterator;
+import it.unimi.dsi.webgraph.NodeIterator;
+
+/** An iterator returning nodes, their successors and labels on the arcs.
+ *
+ * <p>The purpose of this abstract implementation is to covariantly override
+ * the return type of {@link NodeIterator#successors()}, so that
+ * it must be an {@link ArcLabelledNodeIterator.LabelledArcIterator}, and to provide a general
+ * implementation of a new {@link #labelArray()} method that returns
+ * the labels of the arcs going out of the current node as an array.
+ */
+public abstract class ArcLabelledNodeIterator extends NodeIterator {
+
+ /** An iterator returning successors and the labels of the arcs toward them.
+ * The label can be accessed through {@link #label()}, which must be called just after
+ * advancing the iterator.
+ *
+ * <p><strong>Warning</strong>: the returned label can be the same object
+ * upon several calls to {@link #label()}; if you need to store it,
+ * you should {@linkplain Label#copy() copy it}.
+ */
+ public interface LabelledArcIterator extends LazyIntIterator {
+ /** The label of the arc leading to the last returned successor.
+ *
+ * @return the label of the arc leading to the last returned successor.
+ */
+ public Label label();
+ }
+
+ @Override
+ public abstract ArcLabelledNodeIterator.LabelledArcIterator successors();
+
+ /** Returns a reference to an array containing the labels of the arcs going out of the current node
+ * in the same order as the order in which the corresponding successors are returned by {@link #successors()}.
+ *
+ * <P>The returned array may contain more entries than the outdegree of the current node.
+ * However, only those with indices from 0 (inclusive) to the outdegree of the current node (exclusive)
+ * contain valid data.
+ *
+ * <P>This implementation just unwraps the iterator returned by {@link #successors()} and
+ * writes copies of the labels returned by {@link LabelledArcIterator#label()} into a newly allocated array.
+ *
+ * @return an array whose first elements are the labels of the arcs going
+ * out of the current node; the array must not be modified by the caller.
+ */
+
+ public Label[] labelArray() {
+ return unwrap(successors(), outdegree());
+ }
+
+ /** Returns a new array filled with copies of exactly <code>howMany</code> labels from the given iterator.
+ * Note that the iterator must be able to return at least <code>howMany</code> labels.
+ *
+ * @param iterator the iterator.
+ * @param howMany the number of labels.
+ * @return the new array where labels are copied.
+ */
+ protected static Label[] unwrap(final ArcLabelledNodeIterator.LabelledArcIterator iterator, final int howMany) {
+ final Label[] result = new Label[howMany];
+ for (int i = 0; i < howMany; i++) {
+ iterator.nextInt();
+ result[i] = iterator.label().copy();
+ }
+ return result;
+ }
+
+ /** Creates a copy of this iterator that will never return nodes &ge; the specified bound; the copy
+ * must be accessible by a different thread. Optional operation (it should be implemented by all classes that allow
+ * the graph to be scanned more than once).
+ *
+ * <p>This implementation just throws an {@link UnsupportedOperationException}. It should be kept
+ * in sync with the result of {@link ImmutableGraph#hasCopiableIterators()}.
+ *
+ * @param upperBound the upper bound.
+ * @return a copy of this iterator, with the given upper bound.
+ */
+ @Override
+ public ArcLabelledNodeIterator copy(int upperBound) {
+ throw new UnsupportedOperationException();
+ }
+
+}
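The warning on `LabelledArcIterator` and the copying done by `unwrap()` both follow from the fact that an iterator may return the same mutable label object on every call to `label()`. This standalone sketch, with hypothetical `IntLabel` and `ReusingArcIterator` classes standing in for WebGraph's types, shows what goes wrong when labels are stored without copying them.

```java
import java.util.Arrays;

/** Hypothetical model of label reuse in labelled arc iterators; no WebGraph dependency. */
public class LabelCopySketch {
    /** A mutable int label with a copy() method, standing in for a WebGraph Label. */
    static final class IntLabel {
        int value;
        IntLabel(final int value) { this.value = value; }
        IntLabel copy() { return new IntLabel(value); }
    }

    /** Returns the same IntLabel object after every call to nextInt(), updating its value in place. */
    static final class ReusingArcIterator {
        private final int[] successors;
        private final int[] labelValues;
        private final IntLabel label = new IntLabel(0);
        private int pos = -1;

        ReusingArcIterator(final int[] successors, final int[] labelValues) {
            this.successors = successors;
            this.labelValues = labelValues;
        }

        /** Returns the next successor, or -1 if there are no more. */
        int nextInt() {
            if (++pos >= successors.length) return -1;
            label.value = labelValues[pos];
            return successors[pos];
        }

        IntLabel label() { return label; }
    }

    /** Like unwrap(): advances the iterator howMany times, storing a copy of each label. */
    static IntLabel[] unwrap(final ReusingArcIterator it, final int howMany) {
        final IntLabel[] result = new IntLabel[howMany];
        for (int i = 0; i < howMany; i++) {
            it.nextInt();
            result[i] = it.label().copy();
        }
        return result;
    }

    /** The pitfall: every slot ends up referencing the same, last-updated object. */
    static IntLabel[] unwrapWithoutCopy(final ReusingArcIterator it, final int howMany) {
        final IntLabel[] result = new IntLabel[howMany];
        for (int i = 0; i < howMany; i++) {
            it.nextInt();
            result[i] = it.label();
        }
        return result;
    }

    /** Extracts the int values of an array of labels. */
    static int[] values(final IntLabel[] labels) {
        final int[] v = new int[labels.length];
        for (int i = 0; i < labels.length; i++) v[i] = labels[i].value;
        return v;
    }

    public static void main(final String[] args) {
        final int[] successors = { 4, 7, 9 };
        final int[] labelValues = { 10, 20, 30 };
        System.out.println(Arrays.toString(values(unwrap(new ReusingArcIterator(successors, labelValues), 3)))); // [10, 20, 30]
        System.out.println(Arrays.toString(values(unwrapWithoutCopy(new ReusingArcIterator(successors, labelValues), 3)))); // [30, 30, 30]
    }
}
```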
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/ArcRelabelledImmutableGraph.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/ArcRelabelledImmutableGraph.java
new file mode 100644
index 0000000..e81e5f3
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/ArcRelabelledImmutableGraph.java
@@ -0,0 +1,248 @@
+package it.unimi.dsi.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.lang.ObjectParser;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.AbstractLazyIntIterator;
+import it.unimi.dsi.webgraph.BVGraph;
+import it.unimi.dsi.webgraph.GraphClassParser;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.labelling.ArcLabelledNodeIterator.LabelledArcIterator;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.lang.reflect.InvocationTargetException;
+import java.util.Properties;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/** Exhibits an arc-labelled immutable graph as another arc-labelled immutable graph, changing only
+ * the kind of labels. Labels of the source graph are mapped to labels
+ * of the exhibited graph via a suitable strategy provided at construction time.
+ */
+public class ArcRelabelledImmutableGraph extends ArcLabelledImmutableGraph {
+
+ private static final Logger LOGGER = LoggerFactory.getLogger(ArcRelabelledImmutableGraph.class);
+
+ /** A way to convert a label into another label.
+ */
+ public static interface LabelConversionStrategy {
+ /** Takes a label <code>from</code> and writes its content into another label <code>to</code>.
+ * If the types of labels are incompatible, or unsuitable for this strategy, an {@link IllegalArgumentException}
+ * or a {@link ClassCastException} will be thrown.
+ *
+ * @param from source label.
+ * @param to target label.
+ * @param source the source node of the arc labelled by the two labels.
+ * @param target the target node of the arc labelled by the two labels.
+ */
+ public void convert(Label from, Label to, int source, int target);
+ }
+
+ /** A conversion strategy that converts between any two classes extending {@link AbstractIntLabel}.
+ */
+ public static final LabelConversionStrategy INT_LABEL_CONVERSION_STRATEGY = new LabelConversionStrategy() {
+ @Override
+ public void convert(final Label from, final Label to, final int source, final int target) {
+ ((AbstractIntLabel)to).value = ((AbstractIntLabel)from).value;
+ }
+
+ };
+
+ /** The wrapped graph. */
+ private final ArcLabelledImmutableGraph wrappedGraph;
+ /** The new type of labels. */
+ private final Label newLabelPrototype;
+ /** The conversion strategy to be used. */
+ private final LabelConversionStrategy conversionStrategy;
+
+ /** Creates a relabelled graph with given label prototype.
+ *
+ * @param wrappedGraph the graph we are going to relabel.
+ * @param newLabelPrototype the prototype for the new type of labels.
+ * @param conversionStrategy the strategy to convert the labels of the wrapped graph into the new labels.
+ */
+ public ArcRelabelledImmutableGraph(final ArcLabelledImmutableGraph wrappedGraph, final Label newLabelPrototype, final LabelConversionStrategy conversionStrategy) {
+ this.wrappedGraph = wrappedGraph;
+ this.newLabelPrototype = newLabelPrototype;
+ this.conversionStrategy = conversionStrategy;
+ }
+
+ @Override
+ public ArcRelabelledImmutableGraph copy() {
+ return new ArcRelabelledImmutableGraph(wrappedGraph.copy(), newLabelPrototype.copy(), conversionStrategy);
+ }
+
+ private final class RelabelledArcIterator extends AbstractLazyIntIterator implements LabelledArcIterator {
+ /** The wrapped arc iterator. */
+ private final LabelledArcIterator wrappedArcIterator;
+ /** The source node of the current {@link #wrappedArcIterator}. */
+ private final int source;
+ /** The target of the current arc. */
+ private int target;
+
+ public RelabelledArcIterator(final LabelledArcIterator wrappedArcIterator, final int source) {
+ this.wrappedArcIterator = wrappedArcIterator;
+ this.source = source;
+ }
+
+ @Override
+ public Label label() {
+ conversionStrategy.convert(wrappedArcIterator.label(), newLabelPrototype, source, target);
+ return newLabelPrototype;
+ }
+
+ @Override
+ public int nextInt() {
+ return target = wrappedArcIterator.nextInt();
+ }
+ }
+
+ @Override
+ public ArcLabelledNodeIterator nodeIterator(final int from) {
+ class InternalArcLabelledNodeIterator extends ArcLabelledNodeIterator {
+ /** The current node. */
+ private int current;
+ private final int upperBound;
+ ArcLabelledNodeIterator wrappedNodeIterator = wrappedGraph.nodeIterator(from);
+
+ public InternalArcLabelledNodeIterator(final int upperBound) {
+ current = -1;
+ this.upperBound = upperBound;
+ }
+
+ @Override
+ public LabelledArcIterator successors() {
+ return new RelabelledArcIterator(wrappedNodeIterator.successors(), current);
+ }
+
+ @Override
+ public int outdegree() {
+ return wrappedNodeIterator.outdegree();
+ }
+
+ @Override
+ public boolean hasNext() {
+ return current + 1 < upperBound && wrappedNodeIterator.hasNext();
+ }
+
+ @Override
+ public int nextInt() {
+ return current = wrappedNodeIterator.nextInt();
+ }
+
+ @Override
+ public ArcLabelledNodeIterator copy(final int upperBound) {
+ InternalArcLabelledNodeIterator result = new InternalArcLabelledNodeIterator(upperBound);
+ result.current = current;
+ result.wrappedNodeIterator = wrappedNodeIterator.copy(upperBound);
+ return result;
+ }
+
+ }
+ return new InternalArcLabelledNodeIterator(Integer.MAX_VALUE);
+ }
+
+ @Override
+ public LabelledArcIterator successors(int x) {
+ return new RelabelledArcIterator(wrappedGraph.successors(x), x);
+ }
+
+ @Override
+ public Label prototype() {
+ return newLabelPrototype;
+ }
+
+ @Override
+ public int numNodes() {
+ return wrappedGraph.numNodes();
+ }
+
+ @Override
+ public boolean randomAccess() {
+ return wrappedGraph.randomAccess();
+ }
+
+ @Override
+ public boolean hasCopiableIterators() {
+ return wrappedGraph.hasCopiableIterators();
+ }
+
+ @Override
+ public int outdegree(int x) {
+ return wrappedGraph.outdegree(x);
+ }
+
+ public static void main(String arg[]) throws JSAPException, IOException, IllegalArgumentException, SecurityException, IllegalAccessException, InvocationTargetException, NoSuchMethodException, ClassNotFoundException, InstantiationException {
+ final SimpleJSAP jsap = new SimpleJSAP(ArcRelabelledImmutableGraph.class.getName(),
+ "Relabels an integer-labelled graph with the given basename, saving it under a different basename " +
+ "with another (typically different) type of integer labels, specified via a spec, and possibly using " +
+ "a different graph class.",
+ new Parameter[] {
+ new FlaggedOption("underlyingGraphClass", GraphClassParser.getParser(), BVGraph.class.getName(), JSAP.NOT_REQUIRED, 'u', "underlying-graph-class", "Forces a Java immutable graph class to be used for saving the underlying graph (if the latter did not exist before)."),
+ new FlaggedOption("graphClass", GraphClassParser.getParser(), BitStreamArcLabelledImmutableGraph.class.getName(), JSAP.NOT_REQUIRED, 'g', "graph-class", "Forces a Java arc-labelled graph class to be used for saving."),
+ new UnflaggedOption("spec", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The label spec (e.g. FixedWidthIntLabel(FOO,10))."),
+ new UnflaggedOption("source", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the source arc-labelled graph."),
+ new UnflaggedOption("target", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the target arc-labelled graph."),
+ }
+ );
+
+ final JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+ final Class<?> destClass = jsapResult.getClass("graphClass");
+ final Class<?> underlyingDestClass = jsapResult.getClass("underlyingGraphClass");
+ final String sourceBasename = jsapResult.getString("source");
+ final String targetBasename = jsapResult.getString("target");
+ final String spec = jsapResult.getString("spec");
+ final Label label = ObjectParser.fromSpec(new File(sourceBasename).getParent(), spec, Label.class);
+
+ ImmutableGraph source = ImmutableGraph.loadOffline(sourceBasename);
+ if (! (source instanceof ArcLabelledImmutableGraph)) throw new IllegalArgumentException("The graph " + sourceBasename + " of class " + source.getClass().getName() + " is not arc-labelled");
+ ArcLabelledImmutableGraph labSource = (ArcLabelledImmutableGraph)source;
+
+ if (! (labSource.prototype() instanceof AbstractIntLabel && label instanceof AbstractIntLabel)) throw new IllegalArgumentException("Relabelling from command line is only allowed for int labels, not for " + labSource.prototype().getClass().getName() + " -> " + label.getClass().getName());
+ ArcLabelledImmutableGraph labTarget = new ArcRelabelledImmutableGraph(labSource, label, ArcRelabelledImmutableGraph.INT_LABEL_CONVERSION_STRATEGY);
+
+ ProgressLogger pl = new ProgressLogger(LOGGER);
+
+ Properties prop = new Properties();
+ try (FileInputStream propertyStream = new FileInputStream(sourceBasename + ImmutableGraph.PROPERTIES_EXTENSION)) {
+ prop.load(propertyStream);
+ }
+ String underlyingBasename = prop.getProperty(ArcLabelledImmutableGraph.UNDERLYINGGRAPH_PROPERTY_KEY); // Tries to get the underlying basename
+ if (underlyingBasename == null)
+ // If the underlying did not exist, we store it with a fixed basename variant
+ underlyingDestClass.getMethod("store", ImmutableGraph.class, CharSequence.class, ProgressLogger.class)
+ .invoke(null, labTarget, underlyingBasename = targetBasename + ArcLabelledImmutableGraph.UNDERLYINGGRAPH_SUFFIX, pl);
+
+ destClass.getMethod("store", ArcLabelledImmutableGraph.class, CharSequence.class, CharSequence.class, ProgressLogger.class)
+ .invoke(null, labTarget, targetBasename, underlyingBasename, pl);
+
+ }
+
+
+}
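`ArcRelabelledImmutableGraph` converts each label lazily. The strategy writes into a single prototype label, which is returned for every arc, and it also receives the arc's source and target. The sketch below imitates that design with hypothetical classes (`IntLabel`, `WideIntLabel`, `NarrowIntLabel`, `GAP_STRATEGY`) instead of the real `AbstractIntLabel` hierarchy; `INT_STRATEGY` plays the role of `INT_LABEL_CONVERSION_STRATEGY`.

```java
import java.util.Arrays;

/** Hypothetical model of label conversion strategies; no WebGraph dependency. */
public class LabelConversionSketch {
    /** Marker for labels (stands in for WebGraph's Label). */
    interface Label { }

    /** Stand-in for AbstractIntLabel: a label holding a single int. */
    static abstract class IntLabel implements Label { int value; }
    static final class WideIntLabel extends IntLabel { }
    static final class NarrowIntLabel extends IntLabel { }

    /** Same shape as ArcRelabelledImmutableGraph.LabelConversionStrategy. */
    interface LabelConversionStrategy {
        void convert(Label from, Label to, int source, int target);
    }

    /** Copies the int value; a non-int label causes a ClassCastException. */
    static final LabelConversionStrategy INT_STRATEGY =
        (from, to, source, target) -> ((IntLabel) to).value = ((IntLabel) from).value;

    /** Ignores the old label and stores target - source, to show that the endpoints are available. */
    static final LabelConversionStrategy GAP_STRATEGY =
        (from, to, source, target) -> ((IntLabel) to).value = target - source;

    /** Builds WideIntLabel instances with the given values. */
    static IntLabel[] wideLabels(final int... values) {
        final IntLabel[] labels = new IntLabel[values.length];
        for (int i = 0; i < values.length; i++) {
            labels[i] = new WideIntLabel();
            labels[i].value = values[i];
        }
        return labels;
    }

    /** Converts the labels of the arcs leaving source, reusing one prototype as RelabelledArcIterator does. */
    static int[] relabel(final int source, final int[] targets, final IntLabel[] fromLabels,
                         final IntLabel prototype, final LabelConversionStrategy strategy) {
        final int[] converted = new int[targets.length];
        for (int i = 0; i < targets.length; i++) {
            strategy.convert(fromLabels[i], prototype, source, targets[i]);
            converted[i] = prototype.value; // read now: the next arc overwrites the prototype
        }
        return converted;
    }

    public static void main(final String[] args) {
        final int[] targets = { 2, 4, 8 };
        final IntLabel[] old = wideLabels(5, 10, 15);
        System.out.println(Arrays.toString(relabel(1, targets, old, new NarrowIntLabel(), INT_STRATEGY))); // [5, 10, 15]
        System.out.println(Arrays.toString(relabel(1, targets, old, new NarrowIntLabel(), GAP_STRATEGY))); // [1, 3, 7]
    }
}
```

Because the prototype is shared, a caller that keeps converted labels beyond the current arc would need to copy them, just as with any other `LabelledArcIterator`.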
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/BitStreamArcLabelledImmutableGraph.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/BitStreamArcLabelledImmutableGraph.java
new file mode 100644
index 0000000..54e71d3
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/BitStreamArcLabelledImmutableGraph.java
@@ -0,0 +1,532 @@
+package it.unimi.dsi.webgraph.labelling;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.FileNotFoundException;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.PrintWriter;
+import java.util.Properties;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.io.FastMultiByteArrayInputStream;
+import it.unimi.dsi.fastutil.longs.LongIterator;
+import it.unimi.dsi.fastutil.objects.ObjectArrays;
+import it.unimi.dsi.io.InputBitStream;
+import it.unimi.dsi.io.OutputBitStream;
+import it.unimi.dsi.lang.ObjectParser;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.sux4j.util.EliasFanoMonotoneLongBigList;
+import it.unimi.dsi.webgraph.AbstractLazyIntIterator;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.LazyIntIterator;
+import it.unimi.dsi.webgraph.NodeIterator;
+import it.unimi.dsi.webgraph.labelling.ArcLabelledNodeIterator.LabelledArcIterator;
+
+/** A labelled graph storing its labels as a bit stream.
+ *
+ * <p>Instances of this class wrap a given {@linkplain ImmutableGraph immutable graph} and a bit stream.
+ * Given a prototype {@link Label}, the bit stream is then considered as containing all labels of all arcs
+ * as returned by a complete enumeration (made using {@link #nodeIterator()}). The overall graph is described
+ * by a <em>label file</em> (with extension
+ * <code>.labels</code>), an <em>offset file</em> (with extension
+ * <code>.labeloffsets</code>) and a <em>property file</em> (with extension
+ * <code>.properties</code>). The latter, not surprisingly, is a Java property file.
+ *
+ * <H2>The Label and Offset Files</H2>
+ *
+ * <P>Since the labels are stored as a bit stream, we must have some way to know where the labels
+ * related to the successors of each node start.
+ * This information is stored in the offset file, which contains the bit offset of the list of labels
+ * of the arcs going out of each node (in particular,
+ * the offset of the first list will be zero). As a convenience, the offset file contains an additional
+ * offset pointing just after the last list (providing, as a side-effect, the actual bit length of the label file).
+ * Each offset (except for the first one) is stored as a {@linkplain OutputBitStream#writeGamma(int) &gamma;-coded} difference from the previous offset.
+ *
+ * <H2>The Property File</H2>
+ *
+ * <p>The property file for an instance of this class must contain the following entries:
+ *
+ * <dl>
+ * <dt>graphclass
+ * <dd>the name of this class; it is necessary so that load methods in
+ * {@link ImmutableGraph} can identify this class;
+ * <dt>underlyinggraph
+ * <dd>the basename (relative to the name of the property file, unless it is absolute) of the underlying {@link ImmutableGraph};
+ * <dt>labelspec
+ * <dd>a string describing a constructor call for a label class; an example is
+ * <div style="margin:1em; text-align: center">
+ * <code>it.unimi.dsi.webgraph.labelling.FixedWidthIntLabel(FOO,10)</code>
+ * </div>
+ * parameters
+ * are separated by a comma, and no quoting or escaping is allowed (see {@link Label} for details
+ * about string-based constructors).
+ * </dl>
+ *
+ * <p>The {@link #load(it.unimi.dsi.webgraph.ImmutableGraph.LoadMethod, CharSequence, java.io.InputStream, ProgressLogger) load()}
+ * method of this class takes care of looking at the property file, loading the underlying immutable graph,
+ * and setting up either sequential or random access to the bit stream containing the labels. If
+ * only sequential access is required, the offsets are not loaded into memory, and if only offline
+ * access is required, the bit stream is never loaded into memory.
+ *
+ * <h2>Saving labels</h2>
+ *
+ * <p>The {@link #store(ArcLabelledImmutableGraph, CharSequence, CharSequence)}
+ * and {@link #store(ArcLabelledImmutableGraph, CharSequence, CharSequence, ProgressLogger)}
+ * methods will save the labels of an instance of this graph as expected, that is,
+ * the bitstream and its offsets will be saved with the extensions described above.
+ */
+
+public class BitStreamArcLabelledImmutableGraph extends ArcLabelledImmutableGraph {
+ /** The standard extension for the labels bit stream. */
+ public static final String LABELS_EXTENSION = ".labels";
+ /** The standard extension for the label offsets bit stream. */
+ public static final String LABEL_OFFSETS_EXTENSION = ".labeloffsets";
+ /** The standard property key for a label specification. */
+ public static final String LABELSPEC_PROPERTY_KEY = "labelspec";
+
+ /** The buffer size we use for most operations. */
+ private static final int STD_BUFFER_SIZE = 1024 * 1024;
+ /** The underlying immutable graph. */
+ public final ImmutableGraph g;
+ /** A prototype label, used to deserialise labels and create copies. */
+ protected final Label prototype;
+
+ /** A byte array containing the label bit stream, or <code>null</code> for offline processing or for streams longer than {@link Integer#MAX_VALUE} bytes (see {@link #labelStream}). */
+ private final byte[] byteArray;
+ /** A multi-byte array input stream that replaces {@link #byteArray} for streams longer than {@link Integer#MAX_VALUE} bytes. */
+ private final FastMultiByteArrayInputStream labelStream;
+ /** The basename of this graph (required for offline access). */
+ protected final CharSequence basename;
+ /** The offset array, or <code>null</code> for sequential access. */
+ protected final EliasFanoMonotoneLongBigList offset;
+
+ /** Builds a new labelled graph using a bit stream of labels.
+ *
+ * @param basename the basename of the graph (mandatory for offline access).
+ * @param g the underlying immutable graph.
+ * @param prototype a label instance.
+ * @param byteArray a byte array containing the bit stream of labels, or <code>null</code> for offline access
+ * or large file access.
+ * @param labelStream if <code>byteArray</code> is <code>null</code>, this stream is used as the bit stream of labels.
+ * @param offset the offset array for random access, or <code>null</code>.
+ */
+ protected BitStreamArcLabelledImmutableGraph(CharSequence basename, ImmutableGraph g, Label prototype, byte[] byteArray, FastMultiByteArrayInputStream labelStream, EliasFanoMonotoneLongBigList offset) {
+ this.g = g;
+ this.byteArray = byteArray;
+ this.labelStream = labelStream;
+ this.prototype = prototype;
+ this.basename = basename;
+ this.offset = offset;
+ }
+
+ @Override
+ public BitStreamArcLabelledImmutableGraph copy() {
+ return new BitStreamArcLabelledImmutableGraph(basename, g.copy(), prototype.copy(), byteArray, labelStream, offset);
+ }
+
+ /** Returns the label bit stream.
+ *
+ * <p>This method takes care of creating the bit stream from the right source&mdash;the byte array,
+ * the stream of multiple byte arrays or the label file itself.
+ *
+ * @return the label bit stream.
+ */
+ protected InputBitStream newInputBitStream() throws FileNotFoundException {
+ return byteArray != null ? new InputBitStream(byteArray) :
+ labelStream != null ? new InputBitStream(new FastMultiByteArrayInputStream(labelStream)) :
+ new InputBitStream(basename + LABELS_EXTENSION);
+ }
+
+ @Override
+ public CharSequence basename() {
+ return basename;
+ }
+
+ /** Returns the actual offset of the labels of the arcs going out of a given node.
+ *
+ * @param x a node.
+ * @return the offset of the labels of the arcs going out of <code>x</code>.
+ */
+ protected long offset(final int x) {
+ // Without offsets, we just give up.
+ return offset.getLong(x);
+ }
+
+ protected static class BitStreamLabelledArcIterator extends AbstractLazyIntIterator implements ArcLabelledNodeIterator.LabelledArcIterator {
+ final protected LazyIntIterator underlyingIterator;
+ final protected InputBitStream ibs;
+ final protected Label label;
+ final protected int from;
+
+ public BitStreamLabelledArcIterator(final BitStreamArcLabelledImmutableGraph alg, final int x) {
+ this.underlyingIterator = alg.g.successors(from = x);
+ try {
+ ibs = alg.newInputBitStream();
+ ibs.position(alg.offset(x));
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ label = alg.prototype.copy();
+ }
+
+ @Override
+ public Label label() {
+ return label;
+ }
+
+ @Override
+ public int nextInt() {
+ final int successor = underlyingIterator.nextInt();
+ if (successor == -1) return -1;
+ try {
+ label.fromBitStream(ibs, from);
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ return successor;
+ }
+ }
+
+ @Override
+ public ArcLabelledNodeIterator.LabelledArcIterator successors(final int x) {
+ return new BitStreamLabelledArcIterator(this, x);
+ }
+
+ @Override
+ public int[] successorArray(final int x) {
+ return g.successorArray(x);
+ }
+
+ @Override
+ public int numNodes() {
+ return g.numNodes();
+ }
+
+ @Override
+ public long numArcs() {
+ return g.numArcs();
+ }
+
+ @Override
+ public boolean randomAccess() {
+ return g.randomAccess() && offset != null;
+ }
+
+ @Override
+ public boolean hasCopiableIterators() {
+ return false;
+ }
+
+ @Override
+ public int outdegree(int x) {
+ return g.outdegree(x);
+ }
+
+ @Deprecated
+ public static BitStreamArcLabelledImmutableGraph loadSequential(CharSequence basename) throws IOException {
+ return load(LoadMethod.SEQUENTIAL, basename, null);
+ }
+
+ @Deprecated
+ public static BitStreamArcLabelledImmutableGraph loadSequential(CharSequence basename, ProgressLogger pl) throws IOException {
+ return load(LoadMethod.SEQUENTIAL, basename, pl);
+ }
+
+ public static BitStreamArcLabelledImmutableGraph loadOffline(CharSequence basename) throws IOException {
+ return load(LoadMethod.OFFLINE, basename, null);
+ }
+
+ public static BitStreamArcLabelledImmutableGraph loadOffline(CharSequence basename, ProgressLogger pl) throws IOException {
+ return load(LoadMethod.OFFLINE, basename, pl);
+ }
+
+ public static BitStreamArcLabelledImmutableGraph load(CharSequence basename) throws IOException {
+ return load(LoadMethod.STANDARD, basename, null);
+ }
+
+ public static BitStreamArcLabelledImmutableGraph load(CharSequence basename, ProgressLogger pl) throws IOException {
+ return load(LoadMethod.STANDARD, basename, pl);
+ }
+
+ /** Loads a labelled graph using the given method.
+ *
+ * @param method a load method.
+ * @param basename the basename of the graph.
+ * @param pl a progress logger.
+ * @return a graph labelled using a bit stream.
+ */
+
+ @SuppressWarnings("deprecation")
+ protected static BitStreamArcLabelledImmutableGraph load(LoadMethod method, CharSequence basename, ProgressLogger pl) throws IOException {
+ final FileInputStream propertyFile = new FileInputStream(basename + PROPERTIES_EXTENSION);
+ final Properties properties = new Properties();
+ properties.load(propertyFile);
+ propertyFile.close();
+
+ if (properties.getProperty(UNDERLYINGGRAPH_PROPERTY_KEY) == null) throw new IOException("The property file for " + basename + " does not contain an underlying graph basename");
+ // We resolve the underlying graph basename relatively to our basename
+ String graphName = properties.getProperty(UNDERLYINGGRAPH_PROPERTY_KEY);
+ // This is a workaround because absolute filenames are not correctly relativised
+ if (! (new File(graphName).isAbsolute())) graphName = new File(new File(basename.toString()).getParentFile(), properties.getProperty(UNDERLYINGGRAPH_PROPERTY_KEY)).toString();
+
+ final ImmutableGraph g;
+
+ // A kluge to pass the offset step down to a BVGraph
+
+ final FileInputStream graphPropertyFile = new FileInputStream(graphName + PROPERTIES_EXTENSION);
+ final Properties graphProperties = new Properties();
+ graphProperties.load(graphPropertyFile);
+ graphPropertyFile.close();
+
+ g = ImmutableGraph.load(method, graphName, null, pl);
+
+ // We parse the label spec and build a prototype
+ if (properties.getProperty(LABELSPEC_PROPERTY_KEY) == null) throw new IOException("The property file for " + basename + " does not contain a label specification");
+ Label prototype;
+ try {
+ try {
+ prototype = ObjectParser.fromSpec(new File(basename.toString()).getParentFile(), properties.getProperty(LABELSPEC_PROPERTY_KEY), Label.class);
+ }
+ catch(NoSuchMethodException e) {
+ prototype = ObjectParser.fromSpec(properties.getProperty(LABELSPEC_PROPERTY_KEY), Label.class);
+ }
+ }
+ catch (RuntimeException e) {
+ throw new RuntimeException(e);
+ }
+ catch (Exception e) {
+ throw new RuntimeException(e);
+ }
+
+ byte[] byteArray = null;
+ FastMultiByteArrayInputStream labelStream = null;
+ EliasFanoMonotoneLongBigList offsets = null;
+
+ if (method != LoadMethod.OFFLINE) {
+ if (pl != null) {
+ pl.itemsName = "bytes";
+ pl.start("Loading labels...");
+ }
+
+ final FileInputStream fis = new FileInputStream(basename + LABELS_EXTENSION);
+ final long size = fis.getChannel().size();
+ if (size <= Integer.MAX_VALUE) byteArray = BinIO.loadBytes(basename + LABELS_EXTENSION);
+ else labelStream = new FastMultiByteArrayInputStream(fis, size);
+
+ if (pl != null) {
+ pl.count = size;
+ pl.done();
+ }
+ // We do not load offsets if only sequential access is required.
+ if (method != LoadMethod.SEQUENTIAL) {
+ if (pl != null) {
+ pl.itemsName = "deltas";
+ pl.expectedUpdates = g.numNodes() + 1;
+ pl.start("Loading label offsets...");
+ }
+ final InputBitStream offsetStream = new InputBitStream(basename + LABEL_OFFSETS_EXTENSION);
+
+ offsets = new EliasFanoMonotoneLongBigList(g.numNodes() + 1, size * Byte.SIZE + 1, new LongIterator() {
+ private long off;
+ private int i;
+
+ @Override
+ public boolean hasNext() {
+ return i <= g.numNodes();
+ }
+ @Override
+ public long nextLong() {
+ i++;
+ try {
+ return off = offsetStream.readLongGamma() + off;
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ }
+ });
+
+ offsetStream.close();
+ if (pl != null) {
+ pl.count = g.numNodes() + 1;
+ pl.done();
+ pl.logger().info("Label pointer bits per node: " + offsets.numBits() / (g.numNodes() + 1.0));
+ }
+ }
+
+ fis.close();
+ }
+
+ return new BitStreamArcLabelledImmutableGraph(basename, g, prototype, byteArray, labelStream, offsets);
+
+ }
+
+ private final static class BitStreamArcLabelledNodeIterator extends ArcLabelledNodeIterator {
+ final private static Label[] EMPTY_ARRAY = new Label[0];
+ final private NodeIterator underlyingNodeIterator;
+ final private InputBitStream ibs;
+ final private Label prototype;
+ private Label[] label = EMPTY_ARRAY;
+
+ public BitStreamArcLabelledNodeIterator(final int from, final ImmutableGraph g, final Label prototype, final InputBitStream ibs) {
+ this.prototype = prototype;
+ this.ibs = ibs;
+ underlyingNodeIterator = g.nodeIterator();
+ // Skip nodes up to from. This is necessary to skip labels, too.
+ for(int i = from; i-- != 0;) nextInt();
+ }
+
+ private final static class BitStreamArcLabelledNodeIteratorArcIterator extends AbstractLazyIntIterator implements ArcLabelledNodeIterator.LabelledArcIterator {
+ private final Label[] label;
+ private final int[] successor;
+ private final int outdegree;
+ private int curr;
+
+ public BitStreamArcLabelledNodeIteratorArcIterator(final int outdegree, final int[] successor, final Label[] label) {
+ this.outdegree = outdegree;
+ this.successor = successor;
+ this.label = label;
+ curr = -1;
+ }
+
+ @Override
+ public Label label() {
+ if (curr == -1) throw new IllegalStateException("This successor iterator is currently not valid");
+ return label[curr];
+ }
+
+ @Override
+ public int nextInt() {
+ if (curr == outdegree - 1) return -1;
+ return successor[++curr];
+ }
+
+ @Override
+ public int skip(final int n) {
+ final int toSkip = Math.min(n, outdegree - 1 - curr);
+ curr += toSkip;
+ return toSkip;
+ }
+ }
+
+
+ @Override
+ public ArcLabelledNodeIterator.LabelledArcIterator successors() {
+ return new BitStreamArcLabelledNodeIteratorArcIterator(underlyingNodeIterator.outdegree(), underlyingNodeIterator.successorArray(), label);
+ }
+
+ @Override
+ public int[] successorArray() {
+ return underlyingNodeIterator.successorArray();
+ }
+
+ @Override
+ public Label[] labelArray() {
+ return label;
+ }
+
+ @Override
+ public int outdegree() {
+ return underlyingNodeIterator.outdegree();
+ }
+
+ @Override
+ public int nextInt() {
+ final int curr = underlyingNodeIterator.nextInt();
+ final int d = underlyingNodeIterator.outdegree();
+ // Store all labels of arcs going out of the current node
+ if (label.length < d) {
+ label = ObjectArrays.grow(label, d);
+ for(int i = label.length; i-- != 0 && label[i] == null;) label[i] = prototype.copy();
+ }
+ try {
+ for(int i = 0; i < d; i++) label[i].fromBitStream(ibs, curr);
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ return curr;
+ }
+
+ @Override
+ public boolean hasNext() {
+ return underlyingNodeIterator.hasNext();
+ }
+ }
+
+ @Override
+ public ArcLabelledNodeIterator nodeIterator(final int from) {
+ try {
+ return new BitStreamArcLabelledNodeIterator(from, g, prototype, newInputBitStream());
+ }
+ catch (FileNotFoundException e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+ @Override
+ public Label prototype() {
+ return prototype;
+ }
+
+ public static void store(final ArcLabelledImmutableGraph graph, final CharSequence basename, final CharSequence underlyingBasename) throws IOException {
+ store(graph, basename, underlyingBasename, null);
+ }
+
+ public static void store(final ArcLabelledImmutableGraph graph, final CharSequence basename, final CharSequence underlyingBasename, final ProgressLogger pl) throws IOException {
+ final OutputBitStream labels = new OutputBitStream(basename + LABELS_EXTENSION, STD_BUFFER_SIZE);
+ final OutputBitStream offsets = new OutputBitStream(basename + LABEL_OFFSETS_EXTENSION, STD_BUFFER_SIZE);
+
+ if (pl != null) {
+ pl.itemsName = "nodes";
+ pl.expectedUpdates = graph.numNodes();
+ pl.start("Saving labels...");
+ }
+
+ final ArcLabelledNodeIterator nodeIterator = graph.nodeIterator();
+ offsets.writeGamma(0);
+ int curr;
+ long count;
+ LabelledArcIterator successors;
+
+ while(nodeIterator.hasNext()) {
+ curr = nodeIterator.nextInt();
+ successors = nodeIterator.successors();
+ count = 0;
+ while(successors.nextInt() != -1) count += successors.label().toBitStream(labels, curr);
+ offsets.writeLongGamma(count);
+ if (pl != null) pl.lightUpdate();
+ }
+
+ if (pl != null) pl.done();
+ labels.close();
+ offsets.close();
+
+ final PrintWriter properties = new PrintWriter(new FileOutputStream(basename + ImmutableGraph.PROPERTIES_EXTENSION));
+ properties.println(ImmutableGraph.GRAPHCLASS_PROPERTY_KEY + " = " + BitStreamArcLabelledImmutableGraph.class.getName());
+ properties.println(ArcLabelledImmutableGraph.UNDERLYINGGRAPH_PROPERTY_KEY + " = " + underlyingBasename);
+ properties.println(BitStreamArcLabelledImmutableGraph.LABELSPEC_PROPERTY_KEY + " = " + graph.prototype().toSpec());
+ properties.close();
+ }
+}
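The `store()` method above writes a γ-coded 0 followed by the number of label bits emitted for each node, and `load()` turns those deltas back into absolute bit positions with a running sum before handing them to `EliasFanoMonotoneLongBigList`. The self-contained sketch below reproduces only that prefix-sum step on a plain `long[]`. The class and method names are hypothetical, and the Elias–Fano compression and the bit-stream I/O are deliberately left out.

```java
import java.util.Arrays;

/** Hypothetical helper mirroring how label offsets are derived from per-node bit counts. */
final class LabelOffsetsSketch {
    private LabelOffsetsSketch() {}

    /**
     * Given the number of label bits written for each node (in node order),
     * returns numNodes + 1 absolute offsets: offsets[x] is the bit position where
     * the labels of node x start, and offsets[numNodes] is the total length.
     */
    static long[] offsetsFromDeltas(final long[] labelBitsPerNode) {
        final long[] offsets = new long[labelBitsPerNode.length + 1];
        // offsets[0] plays the role of the initial gamma-coded 0 written by store()
        for (int x = 0; x < labelBitsPerNode.length; x++) {
            offsets[x + 1] = offsets[x] + labelBitsPerNode[x];
        }
        return offsets;
    }

    /** Number of label bits of node x, recovered from two consecutive offsets. */
    static long labelBits(final long[] offsets, final int x) {
        return offsets[x + 1] - offsets[x];
    }

    public static void main(String[] args) {
        // Node 0 has two arcs with 3-bit labels, node 1 has no arcs, node 2 has one arc.
        final long[] offsets = offsetsFromDeltas(new long[] { 6, 0, 3 });
        System.out.println(Arrays.toString(offsets)); // [0, 6, 6, 9]
    }
}
```

Storing deltas keeps the offset file small (γ codes favour small numbers), while the monotone prefix sums are what random access via `offset(x)` needs.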
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/FixedWidthIntLabel.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/FixedWidthIntLabel.java
new file mode 100644
index 0000000..6391c8b
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/FixedWidthIntLabel.java
@@ -0,0 +1,99 @@
+package it.unimi.dsi.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import it.unimi.dsi.io.InputBitStream;
+import it.unimi.dsi.io.OutputBitStream;
+
+import java.io.IOException;
+
+/** An integer represented in fixed width. The provided width must
+ * be smaller than 32.
+ */
+
+public class FixedWidthIntLabel extends AbstractIntLabel {
+ /** The bit width used to represent the value of this label. */
+ protected final int width;
+
+ /** Creates a new fixed-width int label.
+ *
+ * @param key the (only) key of this label.
+ * @param width the label width (in bits).
+ * @param value the value of this label.
+ */
+ public FixedWidthIntLabel(String key, int width, int value) {
+ super(key, value);
+ if (width < 0 || width > 31) throw new IllegalArgumentException("Width out of range: " + width);
+ if (value < 0 || value >= 1L << width) throw new IllegalArgumentException("Value out of range: " + Integer.toString(value));
+ this.width = width;
+ }
+
+ /** Creates a new fixed-width int label of value 0.
+ *
+ * @param key the (only) key of this label.
+ * @param width the label width (in bits).
+ */
+ public FixedWidthIntLabel(String key, int width) {
+ this(key, width, 0);
+ }
+
+ /** Creates a new fixed-width integer label using the given key and width
+ * with value 0.
+ *
+ * @param arg two strings containing the key and the width of this label.
+ */
+ public FixedWidthIntLabel(String... arg) {
+ this(arg[0], Integer.parseInt(arg[1]));
+ }
+
+ @Override
+ public Label copy() {
+ return new FixedWidthIntLabel(key, width, value);
+ }
+
+ @Override
+ public int fromBitStream(final InputBitStream inputBitStream, final int sourceUnused) throws IOException {
+ value = inputBitStream.readInt(width);
+ return width;
+ }
+
+ @Override
+ public int toBitStream(final OutputBitStream outputBitStream, final int sourceUnused) throws IOException {
+ return outputBitStream.writeInt(value, width);
+ }
+
+ /** Returns the width of this label (as provided at construction time).
+ * @return the width of this label.
+ */
+ @Override
+ public int fixedWidth() {
+ return width;
+ }
+
+ @Override
+ public String toString() {
+ return key + ":" + value + " (width:" + width + ")";
+ }
+
+ @Override
+ public String toSpec() {
+ return this.getClass().getName() + "(" + key + "," + width + ")";
+ }
+}
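`FixedWidthIntLabel` accepts widths from 0 to 31 and values in [0, 2^width), and always reads and writes exactly `width` bits, which is what makes it self-delimiting. The sketch below imitates that round trip with a toy bit buffer backed by a `StringBuilder` of '0'/'1' characters. It assumes most-significant-bit-first order purely as a reading aid; the class name is hypothetical and it does not use the real `InputBitStream`/`OutputBitStream` classes.

```java
/** Toy fixed-width codec; a hypothetical stand-in for writeInt/readInt on bit streams. */
final class FixedWidthSketch {
    private FixedWidthSketch() {}

    /** Appends value using exactly width bits, most significant bit first; returns the bits written. */
    static int write(final StringBuilder bits, final int value, final int width) {
        // Same validation as the FixedWidthIntLabel constructor.
        if (width < 0 || width > 31) throw new IllegalArgumentException("Width out of range: " + width);
        if (value < 0 || value >= 1L << width) throw new IllegalArgumentException("Value out of range: " + value);
        for (int i = width; i-- != 0;) bits.append(((value >>> i) & 1) == 0 ? '0' : '1');
        return width;
    }

    /** Reads width bits starting at position pos, most significant bit first. */
    static int read(final CharSequence bits, final int pos, final int width) {
        int value = 0;
        for (int i = 0; i < width; i++) value = (value << 1) | (bits.charAt(pos + i) - '0');
        return value;
    }

    public static void main(String[] args) {
        final StringBuilder bits = new StringBuilder();
        // Three labels of width 3 stored back to back: no separators are needed.
        for (int v : new int[] { 5, 0, 7 }) write(bits, v, 3);
        System.out.println(bits); // 101000111
        System.out.println(read(bits, 3, 3)); // second label: 0
    }
}
```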
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/FixedWidthIntListLabel.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/FixedWidthIntListLabel.java
new file mode 100644
index 0000000..b2eb7cc
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/FixedWidthIntListLabel.java
@@ -0,0 +1,106 @@
+package it.unimi.dsi.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import it.unimi.dsi.fastutil.ints.IntArrays;
+import it.unimi.dsi.io.InputBitStream;
+import it.unimi.dsi.io.OutputBitStream;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+/** A list of integers represented in fixed width. The provided width must
+ * be smaller than 32. Each list is prefixed by its length written
+ * in {@linkplain OutputBitStream#writeGamma(int) &gamma; coding}.
+ */
+
+public class FixedWidthIntListLabel extends AbstractIntListLabel {
+ /** The bit width used to represent the value of this label. */
+ private final int width;
+
+ /** Creates a new fixed-width int list label.
+ *
+ * @param key the (only) key of this label.
+ * @param width the label width (in bits).
+ * @param value the list of values of this label (each must fit in <code>width</code> bits).
+ */
+ public FixedWidthIntListLabel(String key, int width, int[] value) {
+ super(key, value);
+ if (width < 0 || width > 31) throw new IllegalArgumentException("Width out of range: " + width);
+ for(int i = value.length; i-- != 0;) if (value[i] < 0 || value[i] >= 1L << width) throw new IllegalArgumentException("Value out of range: " + Integer.toString(value[i]));
+ this.width = width;
+ }
+
+ /** Creates a new fixed-width label with an empty list.
+ *
+ * @param key the (only) key of this label.
+ * @param width the label width (in bits).
+ */
+ public FixedWidthIntListLabel(String key, int width) {
+ this(key, width, IntArrays.EMPTY_ARRAY);
+ }
+
+ /** Creates a new fixed-width integer list label using the given key and width
+ * with an empty list.
+ *
+ * @param arg two strings containing the key and the width of this label.
+ */
+ public FixedWidthIntListLabel(String... arg) {
+ this(arg[0], Integer.parseInt(arg[1]));
+ }
+
+ @Override
+ public Label copy() {
+ return new FixedWidthIntListLabel(key, width, value.clone());
+ }
+
+ @Override
+ public int fromBitStream(InputBitStream inputBitStream, final int sourceUnused) throws IOException {
+ long readBits = inputBitStream.readBits();
+ value = new int[inputBitStream.readGamma()];
+ for(int i = 0; i < value.length; i++) value[i] = inputBitStream.readInt(width);
+ return (int)(inputBitStream.readBits() - readBits);
+ }
+
+ @Override
+ public int toBitStream(OutputBitStream outputBitStream, final int sourceUnused) throws IOException {
+ int bits = outputBitStream.writeGamma(value.length);
+ for(int i = 0; i < value.length; i++) bits += outputBitStream.writeInt(value[i], width);
+ return bits;
+ }
+
+ /** Returns -1 (the fixed width refers to a single integer, not to the entire list).
+ * @return -1.
+ */
+ @Override
+ public int fixedWidth() {
+ return -1;
+ }
+
+ @Override
+ public String toString() {
+ return key + ":" + Arrays.toString(value) + " (width:" + width + ")";
+ }
+
+ @Override
+ public String toSpec() {
+ return this.getClass().getName() + "(" + key + "," + width + ")";
+ }
+}
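Because each list is prefixed by its γ-coded length, the serialised size of a `FixedWidthIntListLabel` is the γ length of `value.length` plus `value.length * width` bits; the size depends on the list length, which is why `fixedWidth()` returns -1. The helper below computes that size. It assumes the dsiutils convention that `writeGamma(x)` codes the natural number x as the Elias γ code of x + 1 (so 0 takes a single bit); the class and method names are hypothetical.

```java
/** Hypothetical size calculator for a gamma-prefixed list of fixed-width integers. */
final class IntListSizeSketch {
    private IntListSizeSketch() {}

    /** Length in bits of the Elias gamma code of x + 1, for a natural number x. */
    static int gammaLength(final int x) {
        if (x < 0) throw new IllegalArgumentException("Negative: " + x);
        // floor(log2(x + 1)), computed on a long so that x = Integer.MAX_VALUE does not overflow
        final int msb = 63 - Long.numberOfLeadingZeros(x + 1L);
        return 2 * msb + 1;
    }

    /** Bits needed to store a list of the given length with the given per-element width. */
    static long serialisedBits(final int length, final int width) {
        return gammaLength(length) + (long) length * width;
    }

    public static void main(String[] args) {
        System.out.println(serialisedBits(0, 5)); // empty list: 1
        System.out.println(serialisedBits(3, 5)); // 5 + 15 = 20
    }
}
```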
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/GammaCodedIntLabel.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/GammaCodedIntLabel.java
new file mode 100644
index 0000000..a9e5896
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/GammaCodedIntLabel.java
@@ -0,0 +1,99 @@
+package it.unimi.dsi.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import it.unimi.dsi.io.InputBitStream;
+import it.unimi.dsi.io.OutputBitStream;
+
+import java.io.IOException;
+
+/** A natural number represented in {@linkplain OutputBitStream#writeGamma(int) &gamma; coding}. */
+
+public class GammaCodedIntLabel extends AbstractIntLabel {
+
+ /** Creates a new label with given key and value.
+ *
+ * @param key the (only) key.
+ * @param value the value of this label.
+ */
+ public GammaCodedIntLabel(String key, int value) {
+ super(key, value);
+ if (value < 0) throw new IllegalArgumentException("Value cannot be negative: " + value);
+ }
+
+ /** Creates a new &gamma;-coded label using the given key and value 0.
+ *
+ * @param key one string containing the key of this label.
+ */
+ public GammaCodedIntLabel(String... key) {
+ super(key[0], 0);
+ }
+
+ @Override
+ public GammaCodedIntLabel copy() {
+ return new GammaCodedIntLabel(key, value);
+ }
+
+ /** Fills this label {@linkplain InputBitStream#readGamma() reading a &gamma;-coded natural number}
+ * from the given input bit stream.
+ *
+ * @param inputBitStream an input bit stream.
+ * @return the number of bits read to fill this label.
+ */
+
+ @Override
+ public int fromBitStream(InputBitStream inputBitStream, final int sourceUnused) throws IOException {
+ long prevRead = inputBitStream.readBits();
+ value = inputBitStream.readGamma();
+ return (int)(inputBitStream.readBits() - prevRead);
+ }
+
+ /** Writes this label {@linkplain OutputBitStream#writeGamma(int) as a &gamma;-coded natural number}
+ * to the given output bit stream.
+ *
+ * @param outputBitStream an output bit stream.
+ * @return the number of bits written.
+ */
+
+ @Override
+ public int toBitStream(OutputBitStream outputBitStream, final int sourceUnused) throws IOException {
+ return outputBitStream.writeGamma(value);
+ }
+
+ /** Returns -1 (as this label does not have a fixed width).
+ * @return -1.
+ */
+
+ @Override
+ public int fixedWidth() {
+ return -1;
+ }
+
+ @Override
+ public String toString() {
+ return key + ":" + value + " (gamma)";
+ }
+
+ @Override
+ public String toSpec() {
+ return this.getClass().getName() + "(" + key + ")";
+ }
+
+}
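`GammaCodedIntLabel` stores a natural number in γ coding, so small values take few bits and the code is self-delimiting without any width parameter. The round trip below is a toy re-implementation on a string of '0'/'1' characters. It assumes the same convention as `OutputBitStream.writeGamma(int)` (x + 1 is coded as msb zeros, a one, then the msb low-order bits of x + 1); the class and method names are hypothetical.

```java
/** Toy Elias gamma codec for natural numbers (x >= 0), coding x + 1. */
final class GammaSketch {
    private GammaSketch() {}

    /** Appends the gamma code of x + 1 to bits and returns the number of bits written. */
    static int writeGamma(final StringBuilder bits, final int x) {
        if (x < 0) throw new IllegalArgumentException("Value cannot be negative: " + x);
        final long v = x + 1L;
        final int msb = 63 - Long.numberOfLeadingZeros(v);
        // Unary part: msb zeros followed by a one (the leading bit of v).
        for (int i = 0; i < msb; i++) bits.append('0');
        bits.append('1');
        // Binary part: the msb low-order bits of v, most significant first.
        for (int i = msb; i-- != 0;) bits.append(((v >>> i) & 1) == 0 ? '0' : '1');
        return 2 * msb + 1;
    }

    /** Decodes one gamma code starting at pos[0], advancing pos[0] past it. */
    static int readGamma(final CharSequence bits, final int[] pos) {
        int msb = 0;
        while (bits.charAt(pos[0]++) == '0') msb++; // also consumes the terminating one
        long v = 1;
        for (int i = 0; i < msb; i++) v = (v << 1) | (bits.charAt(pos[0]++) - '0');
        return (int) (v - 1);
    }

    public static void main(String[] args) {
        final StringBuilder bits = new StringBuilder();
        for (int x : new int[] { 0, 1, 4 }) writeGamma(bits, x);
        System.out.println(bits); // 101000101 (codes 1, 010, 00101)
        final int[] pos = { 0 };
        System.out.println(readGamma(bits, pos) + " " + readGamma(bits, pos) + " " + readGamma(bits, pos)); // 0 1 4
    }
}
```

A value x costs 2⌊log₂(x + 1)⌋ + 1 bits, so γ coding pays off when most labels are small.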
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/IntegerLabelFilter.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/IntegerLabelFilter.java
new file mode 100644
index 0000000..c8922e6
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/IntegerLabelFilter.java
@@ -0,0 +1,63 @@
+package it.unimi.dsi.webgraph.labelling;
+
+/*
+ * Copyright (C) 2008-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.ints.IntOpenHashSet;
+import it.unimi.dsi.webgraph.Transform.LabelledArcFilter;
+
+
+/** A filter for labelled graphs preserving those arcs whose integer labels are in a specified set.
+ *
+ * @author Sebastiano Vigna
+ *
+ */
+public class IntegerLabelFilter implements LabelledArcFilter {
+ /** The values of the label that will be preserved. */
+ private final IntOpenHashSet values;
+ /** The key to retrieve labels. If <code>null</code>, the well-known attribute will be retrieved. */
+ private final String key;
+
+ /** Creates a new integer-label filter.
+ *
+ * @param key the key to be queried to filter an arc, or the empty string to query the well-known attribute.
+ * @param value a list of values that will be preserved.
+ */
+
+ public IntegerLabelFilter(final String key, int... value) {
+ this.key = key == null || key.length() == 0 ? null : key;
+ values = new IntOpenHashSet(value);
+ }
+
+ /** Creates a new integer-label filter.
+ *
+ * @param keyAndvalues the key to be queried to filter an arc,
+ * or the empty string to query the well-known attribute, followed by a list of values that will be preserved.
+ */
+ public IntegerLabelFilter(final String... keyAndvalues) {
+ if (keyAndvalues.length == 0) throw new IllegalArgumentException("You must specify a key name");
+ this.key = keyAndvalues[0].length() == 0 ? null : keyAndvalues[0];
+ values = new IntOpenHashSet(keyAndvalues.length);
+ for(int i = 1; i < keyAndvalues.length; i++) values.add(Integer.parseInt(keyAndvalues[i]));
+ }
+
+ @Override
+ public boolean accept(int i, int j, Label label) {
+ return values.contains(key == null ? label.getInt() : label.getInt(key));
+ }
+}
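`IntegerLabelFilter` keeps an arc exactly when the integer value of the chosen attribute (or of the well-known attribute, when the key is empty) belongs to a fixed set. The sketch below reproduces that predicate without WebGraph: labels are modelled as a hypothetical `Map<String, Integer>` plus a well-known key, and a `java.util.HashSet` replaces fastutil's `IntOpenHashSet`.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for IntegerLabelFilter working on map-based labels. */
final class IntLabelFilterSketch {
    private final Set<Integer> values = new HashSet<>();
    /** null means "use the well-known attribute", as in IntegerLabelFilter. */
    private final String key;

    IntLabelFilterSketch(final String key, final int... values) {
        // Map the empty string to null, matching the string-based constructor.
        this.key = key == null || key.isEmpty() ? null : key;
        for (int v : values) this.values.add(v);
    }

    /** Accepts the arc i -> j if the selected attribute value is in the set. */
    boolean accept(final int i, final int j, final Map<String, Integer> label, final String wellKnownKey) {
        final Integer value = label.get(key == null ? wellKnownKey : key);
        if (value == null) throw new IllegalArgumentException("Unknown attribute");
        return values.contains(value);
    }

    public static void main(String[] args) {
        final Map<String, Integer> label = Map.of("type", 3, "weight", 10);
        System.out.println(new IntLabelFilterSketch("", 1, 3).accept(0, 1, label, "type")); // true
        System.out.println(new IntLabelFilterSketch("weight", 1, 3).accept(0, 1, label, "type")); // false
    }
}
```

Using a hash set keeps each `accept` call at expected constant time regardless of how many values are preserved.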
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/Label.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/Label.java
new file mode 100644
index 0000000..5125129
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/Label.java
@@ -0,0 +1,290 @@
+package it.unimi.dsi.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import it.unimi.dsi.io.InputBitStream;
+import it.unimi.dsi.io.OutputBitStream;
+import it.unimi.dsi.lang.FlyweightPrototype;
+import it.unimi.dsi.lang.ObjectParser;
+
+import java.io.IOException;
+import java.util.NoSuchElementException;
+
+/** A set of attributes that can be used to decorate a node or
+ * an arc of a graph. Attributes appear in the form of &lt;<var>key</var>,<var>value</var>&gt;
+ * pairs, where keys are of type {@link String}. Among attributes,
+ * one (called the <em>well-known attribute</em>) has a special status:
+ * its key can be obtained by using the {@link #wellKnownAttributeKey()} method.
+ *
+ * <p>Values associated with attributes can be anything: the value can be
+ * obtained (in the form of an object) with {@link #get(String)}.
+ * If the value is of primitive type, the alternative type-specific method
+ * (e.g., {@link #getInt(String)}, or {@link #getChar(String)}) can be
+ * called, with the proviso that such methods may throw an {@link java.lang.IllegalArgumentException}
+ * if the attribute type cannot be converted to the one specified without loss of information.
+ *
+ * <p>The value of the well-known attribute can be obtained with {@link #get()},
+ * or with the appropriate type-specific version of the method.
+ *
+ * <h2>Serialisation</h2>
+ *
+ * <p>Implementations must provide {@link #toBitStream(OutputBitStream, int)} and {@link #fromBitStream(InputBitStream, int)}
+ * methods that serialise a label to a bit stream and deserialise it from a bit stream, respectively. Since
+ * {@link #fromBitStream(InputBitStream, int)} has no length information, the label format must
+ * be self-delimiting. This can be obtained with a fixed length scheme (see, e.g., {@link FixedWidthIntLabel}),
+ * or using self-delimiting codes (see, e.g., {@link GammaCodedIntLabel}).
+ *
+ * <p>The methods {@link #toBitStream(OutputBitStream,int)}
+ * and {@link #fromBitStream(InputBitStream,int)} receive, as additional information, the number of the source
+ * node of the arc carrying this label. They may use this information to decide how the
+ * label should be stored (typically, to do a more clever compression job).
+ *
+ * <p>The advantage of fixed-width labels (i.e., those for which {@link #fixedWidth()} does not return -1)
+ * is that when loading a {@link BitStreamArcLabelledImmutableGraph} with an offset step larger than 1 the position in the bitstream
+ * for the labels of a node can be calculated more quickly, as the computation just requires the outdegree
+ * of the nodes, whereas in general one has to skip in-between labels with an explicit deserialisation.
+ *
+ * <h2>String-based constructors</h2>
+ *
+ * <p>By convention, all concrete classes implementing this interface must follow the {@link ObjectParser} conventions:
+ * in particular, they must provide a constructor accepting strings (either in fixed or variable number) where the first string is the key.
+ * The constructor must perform data validation and build an instance with a default value (e.g., 0 for numerical labels). The
+ * constructor is used, for instance, by {@link BitStreamArcLabelledImmutableGraph} to instantiate a label prototype.
+ * Finally, the method {@link #toSpec()} must return a string that is accepted by {@link ObjectParser}.
+ */
+
+
+public interface Label extends FlyweightPrototype<Label> {
+ /** Returns the well-known attribute key.
+ *
+ * @return the well-known attribute key.
+ */
+ public String wellKnownAttributeKey();
+
+ /** All attribute keys (in arbitrary order).
+ *
+ * @return the keys of all attributes.
+ */
+ public String[] attributeKeys();
+
+ /** The types of all attributes in the same order as they are returned by {@link #attributeKeys()}.
+ *
+ * @return the types of all attributes.
+ */
+ public Class<?>[] attributeTypes();
+
+ /** The value associated to the attribute with given key.
+ *
+ * @param key the attribute key.
+ * @return the attribute value; if the attribute type is primitive, it is wrapped suitably.
+ * @throws NoSuchElementException if the attribute key is not one of the attributes of this label.
+ */
+ public Object get(String key) throws NoSuchElementException;
+
+ /** The value associated to the attribute with given key, provided that the latter has a type that fits a byte.
+ * Otherwise, an {@link IllegalArgumentException} is thrown.
+ *
+ * @param key the attribute key.
+ * @return the attribute value as a byte.
+ * @throws IllegalArgumentException if the attribute key is not known, or it has the wrong type.
+ */
+ public byte getByte(String key) throws IllegalArgumentException;
+
+ /** The value associated to the attribute with given key, provided that the latter has a type that fits a short.
+ * Otherwise, an {@link IllegalArgumentException} is thrown.
+ *
+ * @param key the attribute key.
+ * @return the attribute value as a short.
+ * @throws IllegalArgumentException if the attribute key is not known, or it has the wrong type.
+ */
+ public short getShort(String key) throws IllegalArgumentException;
+
+ /** The value associated to the attribute with given key, provided that the latter has a type that fits an int.
+ * Otherwise, an {@link IllegalArgumentException} is thrown.
+ *
+ * @param key the attribute key.
+ * @return the attribute value as an int.
+ * @throws IllegalArgumentException if the attribute key is not known, or it has the wrong type.
+ */
+ public int getInt(String key) throws IllegalArgumentException;
+
+ /** The value associated to the attribute with given key, provided that the latter has a type that fits a long.
+ * Otherwise, an {@link IllegalArgumentException} is thrown.
+ *
+ * @param key the attribute key.
+ * @return the attribute value as a long.
+ * @throws IllegalArgumentException if the attribute key is not known, or it has the wrong type.
+ */
+ public long getLong(String key) throws IllegalArgumentException;
+
+ /** The value associated to the attribute with given key, provided that the latter has a type that fits a float.
+ * Otherwise, an {@link IllegalArgumentException} is thrown.
+ *
+ * @param key the attribute key.
+ * @return the attribute value as a float.
+ * @throws IllegalArgumentException if the attribute key is not known, or it has the wrong type.
+ */
+ public float getFloat(String key) throws IllegalArgumentException;
+
+ /** The value associated to the attribute with given key, provided that the latter has a type that fits a double.
+ * Otherwise, an {@link IllegalArgumentException} is thrown.
+ *
+ * @param key the attribute key.
+ * @return the attribute value as a double.
+ * @throws IllegalArgumentException if the attribute key is not known, or it has the wrong type.
+ */
+ public double getDouble(String key) throws IllegalArgumentException;
+
+ /** The value associated to the attribute with given key, provided that the latter has a type that fits a char.
+ * Otherwise, an {@link IllegalArgumentException} is thrown.
+ *
+ * @param key the attribute key.
+ * @return the attribute value; if the attribute type is primitive, it is wrapped suitably.
+ * @throws IllegalArgumentException if the attribute key is not known, or it has the wrong type.
+ */
+ public char getChar(String key) throws IllegalArgumentException;
+
+ /** The value associated to the attribute with given key, provided that the latter has a type that fits a boolean.
+ * Otherwise, an {@link IllegalArgumentException} is thrown.
+ *
+ * @param key the attribute key.
+ * @return the attribute value; if the attribute type is primitive, it is wrapped suitably.
+ * @throws IllegalArgumentException if the attribute key is not known, or it has the wrong type.
+ */
+ public boolean getBoolean(String key) throws IllegalArgumentException;
+
+ /** The value associated to the well-known attribute.
+ *
+ * @return the attribute value; if the attribute type is primitive, it is wrapped suitably.
+ */
+ public Object get() throws NoSuchElementException;
+
+ /** The value associated to the well-known attribute, provided that the latter has a type that fits a byte.
+ * Otherwise, an {@link IllegalArgumentException} is thrown.
+ *
+ * @return the attribute value; if the attribute type is primitive, it is wrapped suitably.
+ * @throws IllegalArgumentException if this label has no well-known attribute, or it has the wrong type.
+ */
+ public byte getByte() throws IllegalArgumentException;
+
+ /** The value associated to the well-known attribute, provided that the latter has a type that fits a short.
+ * Otherwise, an {@link IllegalArgumentException} is thrown.
+ *
+ * @return the attribute value; if the attribute type is primitive, it is wrapped suitably.
+ * @throws IllegalArgumentException if this label has no well-known attribute, or it has the wrong type.
+ */
+ public short getShort() throws IllegalArgumentException;
+
+ /** The value associated to the well-known attribute, provided that the latter has a type that fits an int.
+ * Otherwise, an {@link IllegalArgumentException} is thrown.
+ *
+ * @return the attribute value; if the attribute type is primitive, it is wrapped suitably.
+ * @throws IllegalArgumentException if this label has no well-known attribute, or it has the wrong type.
+ */
+ public int getInt() throws IllegalArgumentException;
+
+ /** The value associated to the well-known attribute, provided that the latter has a type that fits a long.
+ * Otherwise, an {@link IllegalArgumentException} is thrown.
+ *
+ * @return the attribute value; if the attribute type is primitive, it is wrapped suitably.
+ * @throws IllegalArgumentException if this label has no well-known attribute, or it has the wrong type.
+ */
+ public long getLong() throws IllegalArgumentException;
+
+ /** The value associated to the well-known attribute, provided that the latter has a type that fits a float.
+ * Otherwise, an {@link IllegalArgumentException} is thrown.
+ *
+ * @return the attribute value; if the attribute type is primitive, it is wrapped suitably.
+ * @throws IllegalArgumentException if this label has no well-known attribute, or it has the wrong type.
+ */
+ public float getFloat() throws IllegalArgumentException;
+
+ /** The value associated to the well-known attribute, provided that the latter has a type that fits a double.
+ * Otherwise, an {@link IllegalArgumentException} is thrown.
+ *
+ * @return the attribute value; if the attribute type is primitive, it is wrapped suitably.
+ * @throws IllegalArgumentException if this label has no well-known attribute, or it has the wrong type.
+ */
+ public double getDouble() throws IllegalArgumentException;
+
+ /** The value associated to the well-known attribute, provided that the latter has a type that fits a char.
+ * Otherwise, an {@link IllegalArgumentException} is thrown.
+ *
+ * @return the attribute value; if the attribute type is primitive, it is wrapped suitably.
+ * @throws IllegalArgumentException if this label has no well-known attribute, or it has the wrong type.
+ */
+ public char getChar() throws IllegalArgumentException;
+
+ /** The value associated to the well-known attribute, provided that the latter has a type that fits a boolean.
+ * Otherwise, an {@link IllegalArgumentException} is thrown.
+ *
+ * @return the attribute value; if the attribute type is primitive, it is wrapped suitably.
+ * @throws IllegalArgumentException if this label has no well-known attribute, or it has the wrong type.
+ */
+ public boolean getBoolean() throws IllegalArgumentException;
+
+ /** Returns a copy of this label.
+ *
+ * @return a new label that copies this one.
+ */
+ @Override
+ public Label copy();
+
+ /** Returns a string representing the specification of this label.
+ *
+ * <p>Each label class can be instantiated in several ways (e.g., {@link FixedWidthIntLabel}
+ * requires a name for the well-known attribute and a number of bits). This method must return
+ * a representation that can be used by {@link ObjectParser} to instantiate the class, and
+ * consequently there <strong>must</strong> exist a matching constructor whose arguments are strings.
+ *
+ * <p>The following equation must always be satisfied:
+ * <pre style="text-align:center; padding: .5em">
+ * ObjectParser.fromSpec(x.toSpec()).toSpec().equals(x.toSpec())
+ * </pre>
+ * @return a string representing the specification of this label.
+ * @see ObjectParser#fromSpec(String, Class)
+ */
+ public String toSpec();
+
+ /** Fills this label with data from the given input bit stream, knowing the source node of the arc.
+ * If {@link #fixedWidth()} is not negative, the value returned must coincide with {@link #fixedWidth()}.
+ * This method is optional.
+ *
+ * @param inputBitStream an input bit stream offering a label.
+ * @param source the source node.
+ * @return the number of bits read to fill this label.
+ */
+ public int fromBitStream(InputBitStream inputBitStream, int source) throws IOException, UnsupportedOperationException;
+
+ /** Writes out this label to the given output bit stream, in self-delimiting form, knowing the source node of the arc.
+ * If {@link #fixedWidth()} is not negative, the value returned must coincide with {@link #fixedWidth()}.
+ * This method is optional.
+ *
+ * @param outputBitStream an output bit stream where the label will be written.
+ * @param source the source node.
+ * @return the number of bits written.
+ */
+ public int toBitStream(OutputBitStream outputBitStream, int source) throws IOException, UnsupportedOperationException;
+
+ /** Returns the fixed length of this label, in bits, if this label has a fixed width.
+ *
+ * @return the fixed length of this label, or -1 if this label does not have a fixed width.
+ */
+ public int fixedWidth();
+}
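The typed getters declared above can be illustrated with a small sketch that does not depend on WebGraph. KeyValueLabelSketch is a hypothetical class: it keeps attributes in a java.util.Map and assumes that a value fits an int when it is stored as a Byte, Short or Integer; actual WebGraph label classes may apply different conversion rules.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the typed-getter contract of Label; NOT a WebGraph class.
final class KeyValueLabelSketch {
    private final Map<String, Object> attributes = new HashMap<>();
    private final String wellKnownKey;

    KeyValueLabelSketch(String wellKnownKey) {
        this.wellKnownKey = wellKnownKey;
    }

    // Adds (or replaces) an attribute; returns this label to allow chaining.
    KeyValueLabelSketch put(String key, Object value) {
        attributes.put(key, value);
        return this;
    }

    // Untyped access: an unknown key is reported with an IllegalArgumentException.
    Object get(String key) {
        if (!attributes.containsKey(key)) throw new IllegalArgumentException("Unknown attribute key: " + key);
        return attributes.get(key);
    }

    // Assumption: a value "fits an int" if it is stored as a Byte, Short or Integer.
    int getInt(String key) {
        Object value = get(key);
        if (value instanceof Byte || value instanceof Short || value instanceof Integer) return ((Number) value).intValue();
        throw new IllegalArgumentException("Attribute " + key + " does not fit an int");
    }

    // The no-argument variant reads the well-known attribute.
    int getInt() {
        return getInt(wellKnownKey);
    }
}
```

Both failure modes named in the Javadoc (unknown key, wrong type) surface as the same IllegalArgumentException, so callers need a single catch clause.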
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/LabelMergeStrategy.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/LabelMergeStrategy.java
new file mode 100644
index 0000000..9495b57
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/LabelMergeStrategy.java
@@ -0,0 +1,44 @@
+package it.unimi.dsi.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+/** A way to merge two labels into one; the actual merge is performed by the {@link #merge(Label, Label)}
+ * method. Usually, strategies require that the two labels provided are of
+ * the same kind (i.e., instances of the same {@link it.unimi.dsi.webgraph.labelling.Label}
+ * class). Moreover, some strategies only accept labels of a certain type,
+ * and throw an {@link java.lang.IllegalArgumentException} if the type
+ * is wrong.
+ *
+ */
+public interface LabelMergeStrategy {
+
+ /** Merges two given labels; either label may be <code>null</code>, but not
+ * both. Implementing classes may decide to throw an {@link IllegalArgumentException}
+ * if the labels provided are not of the same type, or not of a
+ * specific type.
+ *
+ * @param first the first label to be merged.
+ * @param second the second label to be merged.
+ * @return the resulting label (note that the returned label may be reused by the
+ * implementing class, so users are invited to make a {@link Label#copy()}
+ * of it if they need to keep the label in between calls).
+ */
+ public Label merge(Label first, Label second);
+
+}
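The contract above (either argument may be null, and the returned label may be reused) can be illustrated with a hypothetical, WebGraph-independent sketch in which labels carry a single int. IntLabelSketch, MergeStrategySketch and MergeStrategies are illustrative names only; KEEP_FIRST mirrors the behaviour of Labels.KEEP_FIRST_MERGE_STRATEGY, while the summing strategy is an invented example that reuses one result object across calls.

```java
// Hypothetical stand-in for a Label carrying a single int value.
final class IntLabelSketch {
    int value;

    IntLabelSketch(int value) { this.value = value; }

    IntLabelSketch copy() { return new IntLabelSketch(value); }
}

// Hypothetical stand-in for LabelMergeStrategy.
interface MergeStrategySketch {
    /** Either argument may be null, but not both. */
    IntLabelSketch merge(IntLabelSketch first, IntLabelSketch second);
}

final class MergeStrategies {
    private MergeStrategies() {}

    // Same null handling as Labels.KEEP_FIRST_MERGE_STRATEGY.
    static final MergeStrategySketch KEEP_FIRST = (first, second) -> first != null ? first : second;

    // Sums the two values (a missing label counts as 0). A single result object is
    // reused across calls, so callers must copy() it if they need to keep it.
    static MergeStrategySketch summing() {
        final IntLabelSketch result = new IntLabelSketch(0);
        return (first, second) -> {
            // This sketch chooses to reject the forbidden "both null" case explicitly.
            if (first == null && second == null) throw new IllegalArgumentException("At least one label must be non-null");
            result.value = (first == null ? 0 : first.value) + (second == null ? 0 : second.value);
            return result;
        };
    }
}
```

Because summing() returns the same object from every call, a caller that wants to keep a merged label must copy() it before the next merge, which is the advice given in the Javadoc of merge().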
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/LabelSemiring.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/LabelSemiring.java
new file mode 100644
index 0000000..c4c607b
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/LabelSemiring.java
@@ -0,0 +1,80 @@
+package it.unimi.dsi.webgraph.labelling;
+
+/*
+ * Copyright (C) 2008-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.webgraph.Transform;
+
+/** A semiring used to compose labels.
+ * <p>When {@linkplain Transform#compose(it.unimi.dsi.webgraph.ImmutableGraph, it.unimi.dsi.webgraph.ImmutableGraph) composing}
+ * two labelled graphs, we need a way to combine labels along a path, and a way to combine labels from different
+ * paths connecting two nodes. These two operations are implemented by
+ * {@link #multiply(Label, Label)} and {@link #add(Label, Label)}. The names of the two
+ * methods reflect the fact that their operations must define a <em>semiring</em>
+ * for which you must also provide a {@link #zero()} and a {@link #one()}. For instance,
+ * if a graph is labelled with weights, a semiring implementing {@link #multiply(Label, Label)} by
+ * a standard sum and {@link #add(Label, Label)} using the minimum operator will give a composition
+ * strategy that computes the shortest path connecting two nodes.
+ *
+ * <p>Usually, strategies require that the two labels provided are of
+ * the same kind (i.e., instances of the same {@link it.unimi.dsi.webgraph.labelling.Label}
+ * class). Moreover, some strategies only accept labels of a certain type,
+ * and throw an {@link java.lang.IllegalArgumentException} if the type
+ * is wrong.
+ */
+public interface LabelSemiring {
+
+ /** Multiplies two given labels; either label may be <code>null</code>, but not
+ * both. Implementing classes may decide to throw an {@link IllegalArgumentException}
+ * if the labels provided are not of the same type, or not of a
+ * specific type.
+ *
+ * @param first the first label to be multiplied.
+ * @param second the second label to be multiplied.
+ * @return the resulting label (note that the returned label may be reused by the
+ * implementing class, so users are invited to make a {@link Label#copy()}
+ * of it if they need to keep the label in between calls).
+ */
+ public Label multiply(Label first, Label second);
+
+ /** Adds two given labels; either label may be <code>null</code>, but not
+ * both. Implementing classes may decide to throw an {@link IllegalArgumentException}
+ * if the labels provided are not of the same type, or not of a
+ * specific type.
+ *
+ * @param first the first label to be added.
+ * @param second the second label to be added.
+ * @return the resulting label (note that the returned label may be reused by the
+ * implementing class, so users are invited to make a {@link Label#copy()}
+ * of it if they need to keep the label in between calls).
+ */
+ public Label add(Label first, Label second);
+
+ /** Returns the zero of {@link #add(Label, Label)}.
+ *
+ * @return the zero of {@link #add(Label, Label)}.
+ */
+ public Label zero();
+
+ /** Returns the one of {@link #multiply(Label, Label)}.
+ *
+ * @return the one of {@link #multiply(Label, Label)}.
+ */
+ public Label one();
+
+}
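The shortest-path example in the comment above can be made concrete with a (min, +) semiring on plain long weights. This is a sketch, not WebGraph code: MinPlusSketch is a hypothetical class that works on longs rather than Label objects, uses Long.MAX_VALUE as the zero of add ("no path") and 0 as the one of multiply ("empty path"), and composes two graphs given as dense adjacency matrices of the same size.

```java
import java.util.Arrays;

// Hypothetical (min, +) semiring on long weights; WebGraph's LabelSemiring works on Label objects.
final class MinPlusSketch {
    static final long INF = Long.MAX_VALUE; // absence of an arc/path

    // Zero of add: neutral for min.
    static long zero() { return INF; }

    // One of multiply: neutral for +.
    static long one() { return 0; }

    // multiply: concatenate two path segments by summing weights (INF absorbs).
    static long multiply(long a, long b) { return a == INF || b == INF ? INF : a + b; }

    // add: choose between two alternative paths by keeping the lighter one.
    static long add(long a, long b) { return Math.min(a, b); }

    // Composition: c[i][k] = add over all j of multiply(a[i][j], b[j][k]).
    // Assumes a and b are square matrices of the same size.
    static long[][] compose(long[][] a, long[][] b) {
        int n = a.length;
        long[][] c = new long[n][n];
        for (long[] row : c) Arrays.fill(row, zero());
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                for (int k = 0; k < n; k++)
                    c[i][k] = add(c[i][k], multiply(a[i][j], b[j][k]));
        return c;
    }
}
```

Composing a graph with arcs 0->1 (weight 2) and 0->2 (weight 5) with a graph with arcs 1->3 (weight 4) and 2->3 (weight 0) yields c[0][3] = min(2 + 4, 5 + 0) = 5, the length of the shortest two-step path.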
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/Labels.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/Labels.java
new file mode 100644
index 0000000..78900b0
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/Labels.java
@@ -0,0 +1,32 @@
+package it.unimi.dsi.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+public class Labels {
+
+ /** A strategy that keeps the first label, if present, and the second only
+ * if the first is not present.
+ */
+ public static final LabelMergeStrategy KEEP_FIRST_MERGE_STRATEGY = new LabelMergeStrategy() {
+ @Override
+ public Label merge(Label first, Label second) {
+ return first != null? first : second;
+ }
+ };
+}
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/UnionArcLabelledImmutableGraph.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/UnionArcLabelledImmutableGraph.java
new file mode 100644
index 0000000..595e803
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/UnionArcLabelledImmutableGraph.java
@@ -0,0 +1,350 @@
+package it.unimi.dsi.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import java.util.Arrays;
+
+import it.unimi.dsi.fastutil.ints.IntArrays;
+import it.unimi.dsi.fastutil.objects.ObjectArrays;
+import it.unimi.dsi.fastutil.objects.ObjectIterators;
+import it.unimi.dsi.webgraph.Transform;
+import it.unimi.dsi.webgraph.UnionImmutableGraph;
+import it.unimi.dsi.webgraph.labelling.ArcLabelledNodeIterator.LabelledArcIterator;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** An arc-labelled immutable graph representing the union of two given such graphs.
+ * Here by &ldquo;union&rdquo; we mean that an arc will belong to the union iff it belongs to at least one of the two graphs (the number of
+ * nodes of the union is the maximum of the numbers of nodes of the two graphs). Labels are assumed to have the same
+ * prototype in both graphs, and are treated as follows: if an arc is present in only one graph, its label in the resulting
+ * graph is going to be the label of the arc in the graph where it comes from; if an arc is present in both graphs, the labels
+ * are combined using a provided {@link LabelMergeStrategy}.
+ *
+ * <h2>Remarks about the implementation</h2>
+ *
+ * <p>Due to the lack of multiple inheritance, we could not extend both {@link UnionImmutableGraph}
+ * and {@link ArcLabelledImmutableGraph}, so we chose to extend the latter. The possibility of using delegation
+ * on the former was also discarded because the code for reading and merging labels is so tightly coupled with the rest that it
+ * would have been essentially useless (and even dangerous) to delegate the iteration methods. As a result, some of the code of this
+ * class is actually almost a duplicate of the code of {@link UnionImmutableGraph}.
+ */
+public class UnionArcLabelledImmutableGraph extends ArcLabelledImmutableGraph {
+ @SuppressWarnings("unused")
+ private static final Logger LOGGER = LoggerFactory.getLogger(UnionArcLabelledImmutableGraph.class);
+ @SuppressWarnings("unused")
+ private static final boolean DEBUG = false;
+ private static final boolean ASSERTS = false;
+
+ private static final int INITIAL_ARRAY_SIZE = 16;
+
+ private final ArcLabelledImmutableGraph g0, g1;
+ private final int n0, n1, numNodes;
+
+ /** The strategy used to merge labels when the same arc is present in both graphs. */
+ private final LabelMergeStrategy labelMergeStrategy;
+
+ /** The node whose successors are cached, or -1 if no successors are currently cached. */
+ private int cachedNode = -1;
+
+ /** The outdegree of the cached node, if any. */
+ private int outdegree;
+
+ /** The successors of the cached node, if any; note that the array might be larger. */
+ private int cache[];
+
+ /** The labels on the arcs going out of the cached node, if any; note that the array might be larger. */
+ private Label labelCache[];
+ /** The prototype for the labels of this graph. */
+ private final Label prototype;
+
+ @Override
+ public UnionArcLabelledImmutableGraph copy() {
+ return new UnionArcLabelledImmutableGraph(g0.copy(), g1.copy(), labelMergeStrategy);
+ }
+
+ /** Creates the union of two given graphs.
+ *
+ * @param g0 the first graph.
+ * @param g1 the second graph.
+ * @param labelMergeStrategy the strategy used to merge labels when the same arc is present in both graphs.
+ */
+ public UnionArcLabelledImmutableGraph(ArcLabelledImmutableGraph g0, ArcLabelledImmutableGraph g1, LabelMergeStrategy labelMergeStrategy) {
+ this.g0 = g0;
+ this.g1 = g1;
+ this.labelMergeStrategy = labelMergeStrategy;
+ n0 = g0.numNodes();
+ n1 = g1.numNodes();
+ numNodes = Math.max(n0, n1);
+ if (g0.prototype().getClass() != g1.prototype().getClass()) throw new IllegalArgumentException("The two graphs have different label classes (" + g0.prototype().getClass().getSimpleName() + ", " +g1.prototype().getClass().getSimpleName() + ")");
+ prototype = g0.prototype();
+ }
+
+
+ private static class InternalNodeIterator extends ArcLabelledNodeIterator {
+ private final static Label[] EMPTY_LABEL_ARRAY = new Label[0];
+ /** If outdegree is nonnegative, the successors of the current node (this array may be, however, larger). */
+ private int cache[];
+ /** If outdegree is nonnegative, the labels on the arcs going out of the current node (this array may be, however, larger). */
+ private Label labelCache[];
+ /** The outdegree of the current node, or -1 if the successor array for the current node has not been computed yet. */
+ private int outdegree = -1;
+ private ArcLabelledNodeIterator i0;
+ private ArcLabelledNodeIterator i1;
+ private final LabelMergeStrategy labelMergeStrategy;
+
+ public InternalNodeIterator(final ArcLabelledNodeIterator i0, final ArcLabelledNodeIterator i1, final LabelMergeStrategy labelMergeStrategy) {
+ this(i0, i1, labelMergeStrategy, -1, IntArrays.EMPTY_ARRAY, EMPTY_LABEL_ARRAY);
+ }
+
+ public InternalNodeIterator(final ArcLabelledNodeIterator i0, final ArcLabelledNodeIterator i1, final LabelMergeStrategy labelMergeStrategy, final int outdegree, final int[] cache, final Label[] labelCache) {
+ this.i0 = i0;
+ this.i1 = i1;
+ this.labelMergeStrategy = labelMergeStrategy;
+ this.outdegree = outdegree;
+ this.cache = cache;
+ this.labelCache = labelCache;
+ }
+
+ @Override
+ public boolean hasNext() {
+ return i0 != null && i0.hasNext() || i1 != null && i1.hasNext();
+ }
+
+ @Override
+ public int nextInt() {
+ if (! hasNext()) throw new java.util.NoSuchElementException();
+ outdegree = -1;
+ int result = -1;
+ if (i0 != null) {
+ if (i0.hasNext()) result = i0.nextInt();
+ else i0 = null;
+ }
+ if (i1 != null) {
+ if (i1.hasNext()) result = i1.nextInt();
+ else i1 = null;
+ }
+ return result;
+ }
+
+ @Override
+ public int[] successorArray() {
+ if (outdegree != -1) return cache;
+ if (i0 == null) {
+ outdegree = i1.outdegree();
+ cache = i1.successorArray();
+ labelCache = i1.labelArray();
+ return cache;
+ }
+ if (i1 == null) {
+ outdegree = i0.outdegree();
+ cache = i0.successorArray();
+ labelCache = i0.labelArray();
+ return cache;
+ }
+ // We need to perform a manual merge
+ ArcLabelledNodeIterator.LabelledArcIterator succ0 = i0.successors();
+ ArcLabelledNodeIterator.LabelledArcIterator succ1 = i1.successors();
+ int s0 = -1, s1 = -1;
+ Label l0 = null, l1 = null;
+ outdegree = 0;
+ // Note that the non-short-circuit OR (|) is necessary, so that both iterators are advanced.
+ while ((s0 != -1 || (s0 = succ0.nextInt()) != -1) | (s1 != -1 || (s1 = succ1.nextInt()) != -1)) {
+ if (s0 != -1) l0 = succ0.label().copy();
+ if (s1 != -1) l1 = succ1.label().copy();
+ if (ASSERTS) assert s0 >= 0 || s1 >= 0;
+ cache = IntArrays.grow(cache, outdegree + 1);
+ labelCache = ObjectArrays.grow(labelCache, outdegree + 1);
+ if (s1 < 0 || 0 <= s0 && s0 < s1) {
+ cache[outdegree] = s0;
+ labelCache[outdegree] = l0;
+ s0 = -1;
+ } else if (s0 < 0 || 0 <= s1 && s1 < s0) {
+ cache[outdegree] = s1;
+ labelCache[outdegree] = l1;
+ s1 = -1;
+ } else {
+ if (ASSERTS) assert s0 == s1 && s0 >= 0;
+ cache[outdegree] = s0;
+ labelCache[outdegree] = labelMergeStrategy.merge(l0, l1);
+ s0 = s1 = -1;
+ }
+ outdegree++;
+ }
+ return cache;
+ }
+
+ @Override
+ public int outdegree() {
+ successorArray(); // So that the cache is filled up
+ return outdegree;
+ }
+
+ @Override
+ public Label[] labelArray() {
+ successorArray(); // So that the cache is filled up
+ return labelCache;
+ }
+
+ @Override
+ public LabelledArcIterator successors() {
+ successorArray(); // So that the cache is filled up
+ return new LabelledArcIterator() {
+ int nextToBeReturned = 0;
+
+ @Override
+ public Label label() {
+ return labelCache[nextToBeReturned - 1];
+ }
+
+ @Override
+ public int nextInt() {
+ if (nextToBeReturned == outdegree) return -1;
+ return cache[nextToBeReturned++];
+ }
+
+ @Override
+ public int skip(int x) {
+ int skipped = Math.min(x, outdegree - nextToBeReturned);
+ nextToBeReturned += skipped;
+ return skipped;
+ }
+ };
+ }
+
+ @Override
+ public ArcLabelledNodeIterator copy(final int upperBound) {
+ return new InternalNodeIterator(i0 == null? i0 : i0.copy(upperBound), i1 == null? i1 : i1.copy(upperBound), labelMergeStrategy, outdegree, Arrays.copyOf(cache, Math.max(0, outdegree)), Arrays.copyOf(labelCache, Math.max(0, outdegree)));
+ }
+ }
+
+ @Override
+ public ArcLabelledNodeIterator nodeIterator(final int from) {
+ return new InternalNodeIterator(from < n0? g0.nodeIterator(from) : null, from < n1? g1.nodeIterator(from) : null, labelMergeStrategy);
+ }
+
+ @Override
+ public int numNodes() {
+ return numNodes;
+ }
+
+ @Override
+ public boolean randomAccess() {
+ return g0.randomAccess() && g1.randomAccess();
+ }
+
+ @Override
+ public boolean hasCopiableIterators() {
+ return g0.hasCopiableIterators() && g1.hasCopiableIterators();
+ }
+
+ @Override
+ public int[] successorArray(final int x) {
+ fillCache(x);
+ return cache;
+ }
+
+ private void fillCache(final int x) {
+ if (x == cachedNode) return;
+ // We need to perform a manual merge
+ ArcLabelledNodeIterator.LabelledArcIterator succ0 = (LabelledArcIterator) (x < n0? g0.successors(x) : ObjectIterators.EMPTY_ITERATOR);
+ ArcLabelledNodeIterator.LabelledArcIterator succ1 = (LabelledArcIterator) (x < n1? g1.successors(x) : ObjectIterators.EMPTY_ITERATOR);
+ int outdegree = 0;
+ int s0 = -1, s1 = -1;
+ Label l0 = null, l1 = null;
+ int[] cache = new int[INITIAL_ARRAY_SIZE];
+ Label[] labelCache = new Label[INITIAL_ARRAY_SIZE];
+ while ((s0 != -1 || (s0 = succ0.nextInt()) != -1) | (s1 != -1 || (s1 = succ1.nextInt()) != -1)) {
+ if (s0 != -1) l0 = succ0.label().copy();
+ if (s1 != -1) l1 = succ1.label().copy();
+ if (ASSERTS) assert s0 >= 0 || s1 >= 0;
+ cache = IntArrays.grow(cache, outdegree + 1);
+ labelCache = ObjectArrays.grow(labelCache, outdegree + 1);
+ if (s1 < 0 || 0 <= s0 && s0 < s1) {
+ cache[outdegree] = s0;
+ labelCache[outdegree] = l0;
+ s0 = -1;
+ } else if (s0 < 0 || 0 <= s1 && s1 < s0) {
+ cache[outdegree] = s1;
+ labelCache[outdegree] = l1;
+ s1 = -1;
+ } else {
+ if (ASSERTS) assert s0 == s1 && s0 >= 0;
+ cache[outdegree] = s0;
+ labelCache[outdegree] = labelMergeStrategy.merge(l0, l1);
+ s0 = s1 = -1;
+ }
+ outdegree++;
+ }
+
+ this.cache = cache;
+ this.labelCache = labelCache;
+ this.outdegree = outdegree;
+ cachedNode = x;
+ }
+
+ @Override
+ public int outdegree(int x) {
+ fillCache(x);
+ return outdegree;
+ }
+
+ @Override
+ public Label[] labelArray(int x) {
+ fillCache(x);
+ return labelCache;
+ }
+
+ @Override
+ public LabelledArcIterator successors(int x) {
+ fillCache(x);
+ final int outdegree = this.outdegree;
+ final int[] cache = this.cache;
+ final Label[] labelCache = this.labelCache;
+
+ return new LabelledArcIterator() {
+ int nextToBeReturned = -1;
+
+ @Override
+ public Label label() {
+ return labelCache[nextToBeReturned];
+ }
+
+ @Override
+ public int nextInt() {
+ if (++nextToBeReturned >= outdegree) return -1;
+ return cache[nextToBeReturned];
+ }
+
+ @Override
+ public int skip(int n) {
+ int skipped = Math.min(n, outdegree - nextToBeReturned - 1);
+ if (skipped < 0) return 0;
+ nextToBeReturned += skipped;
+ return skipped;
+ }
+ };
+ }
+
+ @Override
+ public Label prototype() {
+ return prototype;
+ }
+
+}
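The merge loop in fillCache() (and in InternalNodeIterator.successorArray()) is a standard merge of two sorted successor lists. The following sketch reproduces it on plain arrays so that it can be run without WebGraph; UnionMergeSketch is a hypothetical name, labels are modelled as Integer values, and a java.util.function.BinaryOperator stands in for the LabelMergeStrategy. Unlike the original, it walks array indices instead of using -1 as an end-of-list sentinel.

```java
import java.util.Arrays;
import java.util.function.BinaryOperator;

// Hypothetical array-based sketch of the union merge; succ0 and succ1 must be strictly increasing.
final class UnionMergeSketch {
    int outdegree;
    int[] successors;
    Integer[] labels;

    UnionMergeSketch(int[] succ0, Integer[] lab0, int[] succ1, Integer[] lab1, BinaryOperator<Integer> merge) {
        successors = new int[succ0.length + succ1.length]; // upper bound on the size of the union
        labels = new Integer[successors.length];
        int p0 = 0, p1 = 0;
        while (p0 < succ0.length || p1 < succ1.length) {
            if (p1 == succ1.length || p0 < succ0.length && succ0[p0] < succ1[p1]) {
                // Arc only in the first graph: keep its label.
                successors[outdegree] = succ0[p0];
                labels[outdegree] = lab0[p0++];
            } else if (p0 == succ0.length || succ1[p1] < succ0[p0]) {
                // Arc only in the second graph: keep its label.
                successors[outdegree] = succ1[p1];
                labels[outdegree] = lab1[p1++];
            } else {
                // Same arc in both graphs: combine the two labels.
                successors[outdegree] = succ0[p0];
                labels[outdegree] = merge.apply(lab0[p0++], lab1[p1++]);
            }
            outdegree++;
        }
        // Trim to the actual outdegree.
        successors = Arrays.copyOf(successors, outdegree);
        labels = Arrays.copyOf(labels, outdegree);
    }
}
```

For example, merging successors {1, 3, 5} (labels 10, 30, 50) with {3, 4} (labels 300, 400) using Integer::sum gives successors {1, 3, 4, 5} with labels {10, 330, 400, 50}.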
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/package.html b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/package.html
new file mode 100644
index 0000000..01b0daa
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/labelling/package.html
@@ -0,0 +1,49 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html>
+ <head>
+ <title>Webgraph</title>
+ </head>
+
+ <body>
+
+ <P>Main classes implementing labelling for {@linkplain it.unimi.dsi.webgraph.ImmutableGraph immutable graphs}.
+ A labelled immutable graph is a graph endowed with labels, on its arcs and/or on its nodes; currently, only
+ arc labelling is implemented (node labelling can easily be handled outside the WebGraph framework anyway).
+
+ <p><strong>Warning</strong>: this package is experimental.
+
+ <H1>Labels</H1>
+
+ <P>A label is just an instance of a class implementing the {@link it.unimi.dsi.webgraph.labelling.Label} interface: for maximum versatility,
+ it is essentially a set of key/value pairs, where keys are strings and values can be almost anything; in most simple cases,
+ though, a label will consist of a single key/value pair (and the key will be, of course, irrelevant).
+ All arcs of the same graph will have labels of the same class, and for this reason labels offer a {@link it.unimi.dsi.webgraph.labelling.Label#copy()}
+ method that allows the prototype design pattern to be used.
+
+ <P>The only requirement for the serialisation of labels is that every label can be written as a self-delimiting bit sequence (via the
+ {@link it.unimi.dsi.webgraph.labelling.Label#toBitStream(it.unimi.dsi.io.OutputBitStream,int)} method); essentially,
+ two kinds of labels exist: fixed-width labels (which always write themselves using the same, fixed number of bits) and
+ variable-width labels; you can find out whether a label has a fixed width by calling {@link it.unimi.dsi.webgraph.labelling.Label#fixedWidth()}
+ (this method will return -1 if the width is variable).
+
+ <P>As an example, single-attribute integer label classes are implemented, {@linkplain it.unimi.dsi.webgraph.labelling.FixedWidthIntLabel one using fixed width}
+ and {@linkplain it.unimi.dsi.webgraph.labelling.GammaCodedIntLabel another using &gamma;-coding}.
+
+ <H2>Labelled graphs</H2>
+
+ <P>An {@linkplain it.unimi.dsi.webgraph.labelling.ArcLabelledImmutableGraph arc-labelled immutable graph} is an
+ {@linkplain it.unimi.dsi.webgraph.ImmutableGraph immutable graph} with labels on its arcs; it overrides the immutable-graph methods
+ covariantly so that, for example, when one iterates on the successors of a node using {@linkplain it.unimi.dsi.webgraph.labelling.ArcLabelledImmutableGraph#successors(int)}
+ the result is not a simple {@link it.unimi.dsi.fastutil.ints.IntIterator} (iterating over the nodes that are successors of the given node), but rather
+ a {@link it.unimi.dsi.webgraph.labelling.ArcLabelledNodeIterator.LabelledArcIterator} (which returns a node/label pair at each step).
+
+ <P>Even though different implementations of {@linkplain it.unimi.dsi.webgraph.labelling.ArcLabelledImmutableGraph arc-labelled immutable graphs}
+ may exist, we provide one ({@link it.unimi.dsi.webgraph.labelling.BitStreamArcLabelledImmutableGraph}) that assumes that an immutable graph has been
+ provided and that labels have been written onto a label file in the same order as the arcs of the immutable graph would be returned by the
+ {@link it.unimi.dsi.webgraph.labelling.ArcLabelledImmutableGraph#nodeIterator()} method. An additional offset file must be provided that
+ allows one to know the offset (in bits) within the label file where the labels of the arcs going out of a given node start.
+ These data are generated using the <code>store()</code> methods whose implementation is suggested in
+ the class documentation of {@linkplain it.unimi.dsi.webgraph.labelling.ArcLabelledImmutableGraph}.
+
+ </body>
+</html>
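The relation between label widths and the offset file described above can be sketched as a prefix sum over the number of bits used by each node's labels. This is a conceptual sketch under stated assumptions, not WebGraph's on-disk format: LabelOffsetsSketch is a hypothetical class, a negative width means variable width (as with Label.fixedWidth()), and variable-width labels are assumed to be natural numbers v written as the Elias gamma code of v + 1. How the offsets themselves are encoded in the offset file is not shown.

```java
// Hypothetical sketch: per-node bit offsets into a label file, computed from label widths.
final class LabelOffsetsSketch {
    // Length in bits of the Elias gamma code of x >= 1: 2 * floor(log2 x) + 1.
    static int gammaLength(int x) {
        if (x < 1) throw new IllegalArgumentException("Gamma coding is defined for positive integers only");
        return 2 * (31 - Integer.numberOfLeadingZeros(x)) + 1;
    }

    // Assumption: a natural-number label v >= 0 is written as the gamma code of v + 1.
    static int gammaLabelWidth(int v) {
        return gammaLength(v + 1);
    }

    // labels[x] holds the label values of the arcs of node x, in successor order.
    // Returns offsets with offsets[x] = first bit of node x's labels and offsets[n] = total length.
    // A negative fixedWidth means variable width (gamma-coded), mirroring Label.fixedWidth().
    static long[] offsets(int[][] labels, int fixedWidth) {
        long[] offsets = new long[labels.length + 1];
        for (int x = 0; x < labels.length; x++) {
            long bits = 0;
            for (int v : labels[x]) bits += fixedWidth >= 0 ? fixedWidth : gammaLabelWidth(v);
            offsets[x + 1] = offsets[x] + bits;
        }
        return offsets;
    }
}
```

With labels {0, 1}, {} and {3} on nodes 0, 1 and 2, an 8-bit fixed-width label gives offsets {0, 16, 16, 24}, while the gamma-coded variant gives {0, 4, 4, 9}; a node without arcs simply repeats the previous offset.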
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/package.html b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/package.html
new file mode 100644
index 0000000..ff5e29c
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/package.html
@@ -0,0 +1,13 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html>
+ <head>
+ <title>Webgraph</title>
+ </head>
+
+ <body>
+
+ <P>Main classes implementing the WebGraph algorithms.
+
+
+ </body>
+</html>
diff --git a/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/test/SpeedTest.java b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/test/SpeedTest.java
new file mode 100644
index 0000000..7c64294
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/it/unimi/dsi/webgraph/test/SpeedTest.java
@@ -0,0 +1,188 @@
+package it.unimi.dsi.webgraph.test;
+
+import java.io.IOException;
+import java.lang.reflect.InvocationTargetException;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/*
+ * Copyright (C) 2003-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.Util;
+import it.unimi.dsi.lang.ObjectParser;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.util.XoRoShiRo128PlusRandom;
+import it.unimi.dsi.webgraph.GraphClassParser;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph.LoadMethod;
+import it.unimi.dsi.webgraph.LazyIntIterator;
+import it.unimi.dsi.webgraph.LazyIntSkippableIterator;
+import it.unimi.dsi.webgraph.NodeIterator;
+
+
+public class SpeedTest {
+ private final static int WARMUP = 3;
+ private final static int REPEAT = 10;
+ private SpeedTest() {}
+
+ @SuppressWarnings("boxing")
+ static public void main(String arg[]) throws IllegalArgumentException, SecurityException, JSAPException, IOException, IllegalAccessException, InvocationTargetException, NoSuchMethodException, ClassNotFoundException, InstantiationException {
+ final SimpleJSAP jsap = new SimpleJSAP(SpeedTest.class.getName(), "Tests the access speed of an ImmutableGraph. By default, the graph is enumerated sequentially, but you can specify a number of nodes to be accessed randomly.\n\nThis class executes " + WARMUP + " warmup iterations, and then averages the timings of the following " + REPEAT + " iterations.",
+ new Parameter[] {
+ new FlaggedOption("graphClass", GraphClassParser.getParser(), JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'g', "graphClass", "Forces a Java class for the source graph."),
+ new Switch("spec", 's', "spec", "The basename is a specification of the form <ImmutableGraphImplementation>(arg,arg,...)."),
+ new FlaggedOption("seed", JSAP.LONG_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'S', "seed", "A seed for the pseudorandom number generator."),
+ new FlaggedOption("random", JSAP.LONGSIZE_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'r', "random", "Perform a random-access test on this number of nodes instead of sequentially enumerating the whole graph."),
+ new FlaggedOption("adjacency", JSAP.LONGSIZE_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'a', "adjacency", "Perform an adjacency test on this number of random pairs instead of sequentially enumerating the whole graph."),
+ new Switch("first", 'f', "first", "Just enumerate the first successor of each tested node."),
+ new UnflaggedOption("basename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the graph."),
+ }
+ );
+
+ final JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final boolean random = jsapResult.userSpecified("random");
+ final boolean adjacency = jsapResult.userSpecified("adjacency");
+ if (random && adjacency) throw new IllegalArgumentException("You cannot specify a random and an adjacency test at the same time");
+ final boolean spec = jsapResult.getBoolean("spec");
+ final boolean first = jsapResult.userSpecified("first");
+ final Class<?> graphClass = jsapResult.getClass("graphClass");
+ final String basename = jsapResult.getString("basename");
+ if (graphClass != null && spec) throw new IllegalArgumentException("Options --graphClass and --spec are incompatible.");
+
+ final ProgressLogger pl = new ProgressLogger();
+ final long seed = jsapResult.userSpecified("seed") ? jsapResult.getLong("seed") : Util.randomSeed();
+ final XoRoShiRo128PlusRandom r = new XoRoShiRo128PlusRandom();
+
+ System.err.println("Seed: " + seed);
+
+ // The number of overall links, unless first is true, in which case the number of tested nodes.
+ long totLinks = 0;
+ long cumulativeTime = 0;
+ long z = -1;
+ final long samples;
+ final ImmutableGraph graph;
+
+ if (random) {
+ if (jsapResult.userSpecified("graphClass")) graph = (ImmutableGraph)graphClass.getMethod(LoadMethod.STANDARD.toMethod(), CharSequence.class, ProgressLogger.class).invoke(null, basename, pl);
+ else if (spec) graph = ObjectParser.fromSpec(basename, ImmutableGraph.class, GraphClassParser.PACKAGE);
+ else graph = ImmutableGraph.load(basename, pl);
+
+ final int n = graph.numNodes();
+ samples = jsapResult.getLong("random");
+
+ r.setSeed(seed);
+ if (first) totLinks = samples;
+ else for(long i = samples; i-- != 0;) totLinks += graph.outdegree(r.nextInt(n));
+
+ System.err.println(first ? "Accessing the first link on " + samples + " random nodes using ImmutableGraph.successors()..." : "Accessing links on " + samples + " random nodes using ImmutableGraph.successors()...");
+
+ for(int k = WARMUP + REPEAT; k-- != 0;) {
+ r.setSeed(seed);
+ long time = -System.nanoTime();
+ if (first)
+ for(long i = samples; i-- != 0;) z ^= graph.successors(r.nextInt(n)).nextInt();
+ else
+ for(long i = samples; i-- != 0;)
+ for(final LazyIntIterator links = graph.successors(r.nextInt(n)); links.nextInt() != -1;);
+
+ time += System.nanoTime();
+
+ if (k < REPEAT) cumulativeTime += time;
+ System.err.printf("Intermediate time: %.3fs nodes: %d; arcs %d; nodes/s: %.3f arcs/s: %.3f ns/node: %.3f, ns/link: %.3f\n",
+ time / 1E9, samples, totLinks, (samples * 1E9) / time, (totLinks * 1E9) / time, time / (double)samples, time / (double)totLinks);
+ }
+ final double averageTime = cumulativeTime / (double)REPEAT;
+ System.out.printf("Time: %.3fs nodes: %d; arcs %d; nodes/s: %.3f arcs/s: %.3f ns/node: %.3f, ns/link: %.3f\n",
+ averageTime / 1E9, samples, totLinks, (samples * 1E9) / averageTime, (totLinks * 1E9) / averageTime, averageTime / samples, averageTime / totLinks);
+ }
+ else if (adjacency) {
+ if (jsapResult.userSpecified("graphClass")) graph = (ImmutableGraph)graphClass.getMethod(LoadMethod.STANDARD.toMethod(), CharSequence.class, ProgressLogger.class).invoke(null, basename, pl);
+ else if (spec) graph = ObjectParser.fromSpec(basename, ImmutableGraph.class, GraphClassParser.PACKAGE);
+ else graph = ImmutableGraph.load(basename, pl);
+
+ final int n = graph.numNodes();
+ samples = jsapResult.getLong("adjacency");
+
+ r.setSeed(seed);
+
+ System.err.println("Testing adjacency on " + samples + " random pairs...");
+
+ for(int k = WARMUP + REPEAT; k-- != 0;) {
+ r.setSeed(seed);
+ long time = -System.nanoTime();
+ for(long i = samples; i-- != 0;) {
+ final LazyIntIterator iterator = graph.successors(r.nextInt(n));
+ final int other = r.nextInt(n);
+ if (iterator instanceof LazyIntSkippableIterator) z ^= ((LazyIntSkippableIterator)iterator).skipTo(other);
+ else for(;;) {
+ final int s = iterator.nextInt();
+ if (s == -1 || s >= other) break;
+ }
+ }
+
+ time += System.nanoTime();
+
+ if (k < REPEAT) cumulativeTime += time;
+ System.err.printf("Intermediate time: %.3fs nodes: %d; nodes/s: %.3f ns/node: %.3f\n",
+ time / 1E9, samples, (samples * 1E9) / time, time / (double)samples);
+ }
+ final double averageTime = cumulativeTime / (double)REPEAT;
+ System.out.printf("Time: %.3fs nodes: %d; nodes/s: %.3f ns/node: %.3f\n",
+ averageTime / 1E9, samples, (samples * 1E9) / averageTime, averageTime / samples);
+ } else {
+ if (first) throw new IllegalArgumentException("Option --first requires --random.");
+ if (jsapResult.userSpecified("graphClass")) graph = (ImmutableGraph)graphClass.getMethod(LoadMethod.STANDARD.toMethod(), CharSequence.class, ProgressLogger.class).invoke(null, basename, pl);
+ else if (spec) graph = ObjectParser.fromSpec(basename, ImmutableGraph.class, GraphClassParser.PACKAGE);
+ else graph = ImmutableGraph.load(basename, pl);
+
+ samples = graph.numNodes();
+
+ System.err.println("Accessing links sequentially using NodeIterator.successorArray()...");
+
+ for(int k = WARMUP + REPEAT; k-- != 0;) {
+ long time = -System.nanoTime();
+ final NodeIterator nodeIterator = graph.nodeIterator();
+ totLinks = 0;
+ for(long i = samples; i-- != 0;) {
+ nodeIterator.nextInt();
+ totLinks += nodeIterator.outdegree();
+ nodeIterator.successorArray();
+ }
+ time += System.nanoTime();
+
+ if (k < REPEAT) cumulativeTime += time;
+ System.err.printf("Intermediate time: %.3fs nodes: %d; arcs %d; nodes/s: %.3f arcs/s: %.3f ns/node: %.3f, ns/link: %.3f\n",
+ time / 1E9, samples, totLinks, (samples * 1E9) / time, (totLinks * 1E9) / time, time / (double)samples, time / (double)totLinks);
+ }
+ final double averageTime = cumulativeTime / (double)REPEAT;
+ System.out.printf("Time: %.3fs nodes: %d; arcs %d; nodes/s: %.3f arcs/s: %.3f ns/node: %.3f, ns/link: %.3f\n",
+ averageTime / 1E9, samples, totLinks, (samples * 1E9) / averageTime, (totLinks * 1E9) / averageTime, averageTime / samples, averageTime / totLinks);
+ }
+
+ // Consume z so that the JIT compiler cannot eliminate the loops that update it as dead code.
+ if (z == 0) System.err.println((char)0);
+ }
+}
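SpeedTest runs each test WARMUP + REPEAT times and times every run with `System.nanoTime()`. It averages only the last REPEAT runs and XORs results into `z`, so that the measured work is observable. The same pattern can be written as a small reusable harness. `WarmupBenchmark` and `averageNanos` are hypothetical names. For rigorous measurements, a dedicated tool such as JMH is usually preferable to a hand-rolled loop like this one.

```java
import java.util.Random;
import java.util.function.LongSupplier;

/** Hypothetical warmup-then-average timing harness in the style of SpeedTest. */
public final class WarmupBenchmark {
    private WarmupBenchmark() {}

    /** Runs task warmup + repeat times; returns the mean duration of the last repeat runs, in ns. */
    public static double averageNanos(final int warmup, final int repeat, final LongSupplier task, final long[] sink) {
        long cumulative = 0;
        for (int k = warmup + repeat; k-- != 0;) {
            long time = -System.nanoTime();
            sink[0] ^= task.getAsLong();        // fold the result into a sink so the work is observable
            time += System.nanoTime();
            if (k < repeat) cumulative += time; // the first `warmup` runs are discarded
        }
        return cumulative / (double) repeat;
    }

    public static void main(String[] args) {
        final int[] data = new int[1 << 20];
        final Random r = new Random(0);
        for (int i = 0; i < data.length; i++) data[i] = r.nextInt();
        final long[] sink = new long[1];
        final double avg = averageNanos(3, 10, () -> { long s = 0; for (final int v : data) s += v; return s; }, sink);
        System.out.printf("Average: %.3f ms%n", avg / 1E6);
        if (sink[0] == 0) System.err.println((char) 0); // consume the sink, as SpeedTest does with z
    }
}
```

Passing the sink in from the caller keeps the harness generic. The caller then decides how to consume the accumulated value once timing is over.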
diff --git a/third_party/webgraph-3.6.1/src/overview.html b/third_party/webgraph-3.6.1/src/overview.html
new file mode 100644
index 0000000..693608a
--- /dev/null
+++ b/third_party/webgraph-3.6.1/src/overview.html
@@ -0,0 +1,137 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html>
+ <head>
+ <title>WebGraph</title>
+ </head>
+
+ <body>
+
+ <P>WebGraph is a framework to study the web graph. It provides simple ways to manage
+ very large graphs, exploiting modern compression techniques. More precisely,
+ it is currently made of:
+ <OL>
+
+ <LI>A set of simple codes, called <em>&zeta; codes</em>, which are
+ particularly suitable for storing web graphs (or, in general, integers
+ with a power-law distribution in a certain exponent range).
+
+ <LI>Algorithms for compressing web graphs that exploit gap compression and
+ differential compression (<i>&agrave; la</i> <A
+ HREF="http://citeseer.nj.nec.com/randall01link.html">LINK</A>),
+ intervalisation and &zeta; codes to provide a high compression ratio (see <A HREF="http://law.dsi.unimi.it/datasets.php">our datasets</A>). The
+ algorithms are controlled by several parameters, which provide
+ different tradeoffs between access speed and compression ratio.
+
+ <LI>Algorithms for accessing a compressed graph without actually decompressing it,
+ using lazy techniques that delay the decompression until it is actually necessary.
+
+ <LI>Algorithms for analysing very large graphs, such as
+ {@link it.unimi.dsi.webgraph.algo.HyperBall}, which
+ has been used to show that Facebook has just <a href="http://vigna.dsi.unimi.it/papers.php#BBRFDS">four degrees of separation</a>.
+
+ <LI>This package, providing a complete, documented implementation of
+ the algorithms above in Java. It is <A
+ HREF="http://www.gnu.org/philosophy/free-sw.html">free software</A>
+ distributed under the <A
+ HREF="http://www.gnu.org/copyleft/gpl.html"><ACRONYM TITLE="GNU's not
+ Unix">GNU</ACRONYM> General Public License</A>.
+
+ <LI>Data sets for very large graphs (e.g., a billion links). These are either
+ gathered from public sources (such as <A HREF="http://www-diglib.stanford.edu/~testbed/doc2/WebBase/">WebBase</A>),
+ or gathered by <A HREF="http://law.dsi.unimi.it/software.php#ubicrawler">UbiCrawler</A>.
+
+ </OL>
+
+ <P>In the end, with WebGraph you can access and analyse very large web graphs. Using WebGraph is as easy as installing a few
+ jar files and downloading a data set.
+
+ <p>You are welcome to use and improve WebGraph! If you find our software useful for your research, please cite
+ our paper &ldquo;<a href="http://vigna.dsi.unimi.it/papers.php#BoVWFI">The WebGraph Framework I: Compression Techniques</a>&rdquo;, by Paolo Boldi and
+ Sebastiano Vigna, in <i>Proc&#46; of the Thirteenth World&ndash;Wide Web
+ Conference</i>, pages 595&ndash;601, 2004, ACM Press.
+
+ <h2>Looking around</h2>
+
+ <P>For in-depth information on the WebGraph framework, you should have
+ a look at its <A HREF="http://webgraph.dsi.unimi.it/">home page</A>,
+ where you can find some papers about the compression techniques it uses.
+ Datasets are available at the <a href="http://law.di.unimi.it/">
+ <acronym title="Laboratory for Web Algorithmics">LAW</acronym> web site</a>.
+
+ <P>The classes of interest for the casual WebGraph user are {@link
+ it.unimi.dsi.webgraph.ImmutableGraph}, which specifies the access
+ methods for an immutable graph, {@link it.unimi.dsi.webgraph.BVGraph},
+ which makes it possible to retrieve or recompress a graph stored in the format
+ described in <a
+ href="http://vigna.dsi.unimi.it/papers.php#BoVWFI"><i>The WebGraph
+ Framework I: Compression Techniques</i></a>, and {@link it.unimi.dsi.webgraph.Transform}, which
+ provides several ways to transform an {@link it.unimi.dsi.webgraph.ImmutableGraph}.
+
+ <p>If you plan on building your graphs dynamically, the class
+ {@link it.unimi.dsi.webgraph.ArrayListMutableGraph} makes it possible
+ to create a graph incrementally and then extract an {@linkplain
+ it.unimi.dsi.webgraph.ArrayListMutableGraph#immutableView() immutable view}.
+
+ <P>The package {@link it.unimi.dsi.webgraph.examples} contains useful
+ examples that show how to access an immutable graph sequentially and
+ randomly.
+
+ <h2>Exporting to other formats</h2>
+
+ <p>{@link it.unimi.dsi.webgraph.ASCIIGraph} and {@link it.unimi.dsi.webgraph.ArcListASCIIGraph}
+ have main methods that can be used to save in ASCII form any immutable graph that you can load.
+ With data in {@link it.unimi.dsi.webgraph.BVGraph} or {@link it.unimi.dsi.webgraph.EFGraph} format this is as simple as
+<pre>
+java -server it.unimi.dsi.webgraph.ASCIIGraph <var>sourcebasename</var> <var>dest</var>
+</pre>
+ or
+<pre>
+java -server it.unimi.dsi.webgraph.ArcListASCIIGraph <var>sourcebasename</var> <var>dest</var>
+</pre>
+
+ <p>Please consult the documentation and the command-line help of these two classes to get more information.
+
+ <h2>Importing your data</h2>
+
+ <p>If you want to import your own data into WebGraph, you must write
+ an implementation of {@link it.unimi.dsi.webgraph.ImmutableGraph} that
+ exposes your data. A simple example is given in {@link it.unimi.dsi.webgraph.examples.IntegerListImmutableGraph},
+ a stub class exposing a simple, noncompressed binary format as an {@link it.unimi.dsi.webgraph.ImmutableGraph}.
+ Once your data is exposed in that way, you can get a compressed version
+ using the <code>store()</code> method of your class of interest. Often, there
+ is a main method (see, e.g., {@link it.unimi.dsi.webgraph.BVGraph}) that
+ will load your class and invoke <code>store()</code> for you.
+
+ <p>Conversely, you can use an immutable graph inside the <a href="http://jung.sourceforge.net/">Jung</a> framework through our
+ {@link it.unimi.dsi.webgraph.jung.JungAdapter}.
+
+
+ <p>As an alternative, the class {@link it.unimi.dsi.webgraph.ASCIIGraph}
+ can be used to read graphs specified in a very simple ASCII format. The class
+ implements {@link it.unimi.dsi.webgraph.ASCIIGraph#loadOnce(java.io.InputStream)} so
+ that the file can simply be piped into a class offering a main method that supports
+ <code>loadOnce()</code> (e.g., {@link it.unimi.dsi.webgraph.BVGraph}).
+ You can also generate a graph in ASCII format and read it using
+ {@link it.unimi.dsi.webgraph.ASCIIGraph#loadOffline(CharSequence)}&mdash;the
+ graph will not be loaded into main memory.
+
+ <p>{@link it.unimi.dsi.webgraph.ASCIIGraph} requires listing the successors of each
+ node on a separate line. If your graph is specified arc by arc (one arc per line) you
+ can use {@link it.unimi.dsi.webgraph.ArcListASCIIGraph} instead.
+ {@link it.unimi.dsi.webgraph.ShiftedByOneArcListASCIIGraph} can be used if your input
+ data (rather unwisely) numbers nodes starting from one.
+
+ <p>Another possibility is to specify your graph <em>{@linkplain it.unimi.dsi.webgraph.IncrementalImmutableSequentialGraph incrementally}</em>,
+ which just involves enumerating arrays of successors for each node.
+
+ <h2>Importing your <em>labelled</em> data</h2>
+
+ <p>Arc-labelled graphs are represented using implementations of {@link it.unimi.dsi.webgraph.labelling.ArcLabelledImmutableGraph}.
+ Most arc-labelled graphs are based on an underlying {@link it.unimi.dsi.webgraph.ImmutableGraph}, and
+ the {@link it.unimi.dsi.webgraph.labelling.ArcLabelledImmutableGraph} implementation just provides
+ label handling. The example {@link it.unimi.dsi.webgraph.examples.IntegerTriplesArcLabelledImmutableGraph}
+ shows how to expose your data as an instance of {@link it.unimi.dsi.webgraph.labelling.ArcLabelledImmutableGraph},
+ so you can save your data using your preferred combination of implementations.
+
+ </body>
+</html>
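The overview above mentions gap compression. The following minimal sketch assumes strictly increasing successor lists. The first successor is stored as its difference from the source node, mapped to a natural number by a zigzag map. This is the role `Fast.int2nat` plays in BVGraphTest. Each later successor is stored as its distance from the previous one, minus one. `GapSketch` and its methods are hypothetical. The real BVGraph format also uses reference copying, intervals, and instantaneous codes such as ζ codes. Its exact gap conventions are not claimed here.

```java
import java.util.Arrays;

/** Hypothetical illustration of gap encoding for a sorted successor list. */
public final class GapSketch {
    private GapSketch() {}

    /** Zigzag map: 0, -1, 1, -2, 2, ... become 0, 1, 2, 3, 4, ... */
    static int int2nat(final int x) {
        return (x << 1) ^ (x >> 31);
    }

    /** Inverse of int2nat. */
    static int nat2int(final int n) {
        return (n >>> 1) ^ -(n & 1);
    }

    /** First successor relative to node x (zigzag-mapped), then each successor relative to the previous one, minus one. */
    static int[] encode(final int x, final int[] successors) {
        final int[] gaps = new int[successors.length];
        for (int j = 0; j < successors.length; j++)
            gaps[j] = j == 0 ? int2nat(successors[0] - x) : successors[j] - successors[j - 1] - 1;
        return gaps;
    }

    /** Reverses encode(). */
    static int[] decode(final int x, final int[] gaps) {
        final int[] successors = new int[gaps.length];
        for (int j = 0; j < gaps.length; j++)
            successors[j] = j == 0 ? x + nat2int(gaps[0]) : successors[j - 1] + gaps[j] + 1;
        return successors;
    }

    public static void main(String[] args) {
        final int[] succ = {13, 15, 16, 17, 18, 19, 23, 24, 203};
        System.out.println(Arrays.toString(encode(15, succ))); // [3, 1, 0, 0, 0, 0, 3, 0, 178]
    }
}
```

When successors are clustered, as they often are in web graphs, most gaps are tiny. That is why codes that favour small integers pay off.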
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/ArcListASCIIGraphTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/ArcListASCIIGraphTest.java
new file mode 100644
index 0000000..006f896
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/ArcListASCIIGraphTest.java
@@ -0,0 +1,57 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.fastutil.io.FastByteArrayInputStream;
+
+import java.io.IOException;
+import java.io.UnsupportedEncodingException;
+
+import org.junit.Test;
+
+public class ArcListASCIIGraphTest extends WebGraphTestCase {
+
+ @Test
+ public void testLoadOnce() throws UnsupportedEncodingException, IOException {
+
+ ArcListASCIIGraph g = ArcListASCIIGraph.loadOnce(new FastByteArrayInputStream("0 2\n0 1\n1 0\n1 2\n2 0\n2 1".getBytes("ASCII")));
+ assertEquals(ArrayListMutableGraph.newCompleteGraph(3, false).immutableView(), new ArrayListMutableGraph(g).immutableView());
+
+ g = ArcListASCIIGraph.loadOnce(new FastByteArrayInputStream("0 1\n0 2\n1 0\n1 \t 2\n2 0\n2 1".getBytes("ASCII")));
+ assertEquals(ArrayListMutableGraph.newCompleteGraph(3, false).immutableView(), new ArrayListMutableGraph(g).immutableView());
+
+ g = ArcListASCIIGraph.loadOnce(new FastByteArrayInputStream("2 0\n2 1".getBytes("ASCII")));
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{2,0},{2,1}}).immutableView(), new ArrayListMutableGraph(g).immutableView());
+
+ g = ArcListASCIIGraph.loadOnce(new FastByteArrayInputStream("1 2".getBytes("ASCII")));
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{1,2}}).immutableView(), new ArrayListMutableGraph(g).immutableView());
+
+ g = ArcListASCIIGraph.loadOnce(new FastByteArrayInputStream("2 1".getBytes("ASCII")));
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{2,1}}).immutableView(), new ArrayListMutableGraph(g).immutableView());
+
+ g = ArcListASCIIGraph.loadOnce(new FastByteArrayInputStream("0 1\n2 1".getBytes("ASCII")));
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{0,1},{2,1}}).immutableView(), new ArrayListMutableGraph(g).immutableView());
+
+ g = ArcListASCIIGraph.loadOnce(new FastByteArrayInputStream("\n\n0 1\n2 1\n\n".getBytes("ASCII")));
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{0,1},{2,1}}).immutableView(), new ArrayListMutableGraph(g).immutableView());
+ }
+
+
+}
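ArcListASCIIGraphTest pins down the behaviour expected from arc-list input. The two node ids are separated by arbitrary whitespace, tabs included. Blank lines are ignored, and arcs may come in any order. The graph has one more node than the largest id mentioned. The following plain-Java sketch reproduces that parsing, with hypothetical names. The use of `TreeSet`, which sorts successors and silently merges duplicate arcs, is a choice of this sketch. It is not documented behaviour of `ArcListASCIIGraph`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical arc-list parser mirroring the cases exercised by ArcListASCIIGraphTest. */
public final class ArcListSketch {
    private ArcListSketch() {}

    /** Parses "source target" lines into sorted successor sets; blank lines are skipped. */
    public static List<TreeSet<Integer>> parse(final String text) {
        final List<TreeSet<Integer>> successors = new ArrayList<>();
        for (final String raw : text.split("\n")) {
            final String line = raw.trim();
            if (line.isEmpty()) continue;
            final String[] field = line.split("\\s+"); // spaces and tabs both separate fields
            final int source = Integer.parseInt(field[0]);
            final int target = Integer.parseInt(field[1]);
            // Grow the node list so that it covers the largest node id seen so far.
            while (successors.size() <= Math.max(source, target)) successors.add(new TreeSet<>());
            successors.get(source).add(target);
        }
        return successors;
    }

    public static void main(String[] args) {
        System.out.println(parse("\n\n0 1\n2 1\n\n")); // [[1], [], [1]]
    }
}
```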
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/ArrayListMutableGraphTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/ArrayListMutableGraphTest.java
new file mode 100644
index 0000000..23b4338
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/ArrayListMutableGraphTest.java
@@ -0,0 +1,55 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.fastutil.ints.IntIterator;
+
+import org.junit.Test;
+
+public class ArrayListMutableGraphTest extends WebGraphTestCase {
+
+ @Test
+ public void testConstructor() throws IllegalArgumentException, SecurityException {
+ for(int n = 1; n < 8; n++) {
+ for(int type = 0; type < 3; type++) {
+ System.err.println("Testing type " + type + ", n=" + n + "...");
+ ArrayListMutableGraph g = type == 0 ? ArrayListMutableGraph.newCompleteGraph(n, false) :
+ type == 1 ? ArrayListMutableGraph.newCompleteBinaryIntree(n) :
+ ArrayListMutableGraph.newCompleteBinaryOuttree(n);
+ final ImmutableGraph immutableView = g.immutableView();
+ assertGraph(immutableView);
+ assertEquals(g, new ArrayListMutableGraph(immutableView));
+ int[][] arc = new int[(int)g.numArcs()][2];
+ for(int i = 0, k = 0; i < g.numNodes(); i++)
+ for(IntIterator successors = g.successors(i); successors.hasNext();)
+ arc[k++] = new int[] { i, successors.nextInt() };
+
+ assertEquals(g, new ArrayListMutableGraph(g.numNodes(), arc));
+ }
+ }
+ }
+
+ @Test
+ public void testHashCode() {
+ ArrayListMutableGraph g = ArrayListMutableGraph.newCompleteGraph(10, false);
+ assertEquals(g.immutableView().hashCode(), g.hashCode());
+
+ }
+}
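`testConstructor` checks a round trip. Flattening a graph into an array of {source, target} pairs and rebuilding it must give back the same graph. The round trip can be sketched with plain arrays. `ArcRoundTrip` is hypothetical. `completeGraph` assumes that the `false` argument to `newCompleteGraph` excludes self-loops, as the method name suggests.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical successor-array / arc-array round trip, in the spirit of ArrayListMutableGraphTest. */
public final class ArcRoundTrip {
    private ArcRoundTrip() {}

    /** Complete graph on n nodes without self-loops, as sorted successor arrays. */
    static int[][] completeGraph(final int n) {
        final int[][] succ = new int[n][];
        for (int x = 0; x < n; x++) {
            succ[x] = new int[n - 1];
            for (int y = 0, k = 0; y < n; y++) if (y != x) succ[x][k++] = y;
        }
        return succ;
    }

    /** Flattens successor arrays into {source, target} pairs, in node order. */
    static int[][] toArcs(final int[][] succ) {
        final List<int[]> arcs = new ArrayList<>();
        for (int x = 0; x < succ.length; x++)
            for (final int y : succ[x]) arcs.add(new int[] { x, y });
        return arcs.toArray(new int[0][]);
    }

    /** Rebuilds sorted successor arrays for numNodes nodes from {source, target} pairs. */
    static int[][] fromArcs(final int numNodes, final int[][] arcs) {
        final int[] outdegree = new int[numNodes];
        for (final int[] a : arcs) outdegree[a[0]]++;
        final int[][] succ = new int[numNodes][];
        for (int x = 0; x < numNodes; x++) succ[x] = new int[outdegree[x]];
        final int[] fill = new int[numNodes];
        for (final int[] a : arcs) succ[a[0]][fill[a[0]]++] = a[1];
        for (final int[] s : succ) Arrays.sort(s); // arcs may arrive in any order
        return succ;
    }

    public static void main(String[] args) {
        final int[][] g = completeGraph(4);
        System.out.println(toArcs(g).length);                             // 12
        System.out.println(Arrays.deepEquals(g, fromArcs(4, toArcs(g)))); // true
    }
}
```

Counting outdegrees first lets `fromArcs` allocate each successor array at its exact size.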
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/BVGraphTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/BVGraphTest.java
new file mode 100644
index 0000000..8cd542d
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/BVGraphTest.java
@@ -0,0 +1,216 @@
+package it.unimi.dsi.webgraph;
+
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.bits.Fast;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Properties;
+import java.util.zip.GZIPInputStream;
+
+import org.junit.Test;
+
+public class BVGraphTest extends WebGraphTestCase {
+
+ public static File storeTempGraph(final ImmutableGraph g) throws IOException, IllegalArgumentException, SecurityException {
+ File basename = File.createTempFile(BVGraphTest.class.getSimpleName(), "test");
+ BVGraph.store(g, basename.toString());
+ return basename;
+ }
+
+ public static File storeTempGraph(final ImmutableGraph g, int windowSize, int maxRefCount, int minIntervalLength, int flags) throws IOException, IllegalArgumentException, SecurityException {
+ File basename = File.createTempFile(BVGraphTest.class.getSimpleName(), "test");
+ BVGraph.store(g, basename.toString(), windowSize, maxRefCount, minIntervalLength, 3, flags);
+ return basename;
+ }
+
+ @Test
+ public void testCompression() throws IOException, IllegalArgumentException, SecurityException {
+ for(int n = 1; n < 8; n++) { // Graph construction parameter
+ for(int type = 1; type < 3; type++) {
+ final ImmutableGraph g = type == 0 ? ArrayListMutableGraph.newCompleteGraph(n, false).immutableView() :
+ type == 1 ? ArrayListMutableGraph.newCompleteBinaryIntree(n).immutableView() :
+ ArrayListMutableGraph.newCompleteBinaryOuttree(n).immutableView();
+ for(int w = 0; w < 3; w++) { // Window size
+ for(int r = 0; r < (w == 0 ? 1 : 3); r++) { // Max backward references
+ for(int i = 0; i < 4; i++) { // Minimum interval length; 0 is NO_INTERVALS
+ System.err.println("Testing type " + type + ", n=" + n + ", w=" + w + ", r=" + r + ", i=" + i + "...");
+ final File basename = BVGraphTest.storeTempGraph(g, w, r, i, 0);
+ final Properties properties = new Properties();
+ final FileInputStream propertyFile = new FileInputStream(basename + BVGraph.PROPERTIES_EXTENSION);
+ properties.load(propertyFile);
+ propertyFile.close();
+ assertEquals(new File(basename + BVGraph.GRAPH_EXTENSION).length(),
+ (Long.parseLong(properties.getProperty("bitsforoutdegrees"))+
+ Long.parseLong(properties.getProperty("bitsforreferences"))+
+ Long.parseLong(properties.getProperty("bitsforblocks"))+
+ Long.parseLong(properties.getProperty("bitsforintervals"))+
+ Long.parseLong(properties.getProperty("bitsforresiduals")) + 7) / 8
+ );
+
+ assertEquals(g.numArcs(), Long.parseLong(properties.getProperty("copiedarcs")) + Long.parseLong(properties.getProperty("intervalisedarcs")) + Long.parseLong(properties.getProperty("residualarcs")));
+ ImmutableGraph h;
+
+ System.err.println("Testing offline...");
+ h = BVGraph.loadOffline(basename.toString());
+ assertGraph(h);
+ assertEquals(g, h);
+
+ // We try to force deallocation of memory-mapped graphs
+ System.gc();
+
+ System.err.println("Testing mapped...");
+ h = BVGraph.loadMapped(basename.toString());
+ assertGraph(h);
+ assertEquals(g, h);
+
+ System.err.println("Testing standard...");
+ h = BVGraph.load(basename.toString());
+ assertGraph(h);
+ assertEquals(g, h);
+
+ basename.delete();
+ deleteGraph(basename);
+ }
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testLarge() throws IOException {
+ ASCIIGraph asciiGraph = ASCIIGraph.loadOnce(new GZIPInputStream(getClass().getResourceAsStream("cnr-2000.graph-txt.gz")));
+ String path = getGraphPath("cnr-2000");
+ ImmutableGraph g = ImmutableGraph.load(path);
+ assertEquals(asciiGraph, g);
+
+ asciiGraph = ASCIIGraph.loadOnce(new GZIPInputStream(getClass().getResourceAsStream("cnr-2000.graph-txt.gz")));
+ NodeIterator nodeIterator = asciiGraph.nodeIterator();
+ for(int i = 0; i < g.numNodes(); i++) {
+ nodeIterator.nextInt();
+ int d = nodeIterator.outdegree();
+ assertEquals(d, g.outdegree(i));
+ LazyIntIterator asciiSuccessors = nodeIterator.successors(), successors = g.successors(i);
+ for(int j = 0; j <= d; j++) assertEquals(asciiSuccessors.nextInt(), successors.nextInt());
+ }
+
+ deleteGraph(path);
+ }
+
+ @Test
+ public void testStats() throws IOException {
+ String path = getGraphPath("cnr-2000");
+ ImmutableGraph g = ImmutableGraph.load(path);
+ System.err.println("*******"+ g.getClass());
+ // We overwrite the previously created temporary graph
+ BVGraph.store(g, path + "2");
+ System.err.println("*******");
+
+ // Test statistics
+ final int[] bin = new int[32];
+ NodeIterator nodeIterator = g.nodeIterator();
+ for(int i = 0; i < g.numNodes(); i++) {
+ nodeIterator.nextInt();
+ int d = nodeIterator.outdegree();
+ int[] a = nodeIterator.successorArray();
+ if (d > 0) {
+ for(int j = d - 1; j-- != 0;) bin[Fast.mostSignificantBit(a[j + 1] - a[j])]++;
+ final int msb = Fast.mostSignificantBit(Fast.int2nat(a[0] - i));
+ if (msb >= 0) bin[msb]++;
+ }
+ }
+
+
+ Properties properties = new Properties();
+ final FileInputStream inStream = new FileInputStream(path + "2" + BVGraph.PROPERTIES_EXTENSION);
+ properties.load(inStream);
+ inStream.close();
+ String stats = properties.getProperty("successorexpstats");
+ String[] s = stats.split(",");
+ for(int i = s.length; i-- != 0;) assertEquals(bin[i], Integer.parseInt(s[i]));
+
+ long gap = 1, totGap = 0, tot = 0;
+ double totLogGap = 0;
+ for(int i = 0; i < s.length; i++) {
+ totGap += (gap * 2 + gap - 1) * Integer.parseInt(s[i]);
+ totLogGap += (Fast.log2(gap * 2 + gap + 1) - 1) * Integer.parseInt(s[i]);
+ tot += Integer.parseInt(s[i]);
+ gap *= 2;
+ }
+
+ assertEquals((double)totGap / (tot * 2), Double.parseDouble(properties.getProperty("successoravggap")), 1E-3);
+ assertEquals(totLogGap / tot, Double.parseDouble(properties.getProperty("successoravgloggap")), 1E-3);
+
+ assertEquals(new File(path + "2" + BVGraph.GRAPH_EXTENSION).length(),
+ (Long.parseLong(properties.getProperty("bitsforoutdegrees"))+
+ Long.parseLong(properties.getProperty("bitsforreferences"))+
+ Long.parseLong(properties.getProperty("bitsforblocks"))+
+ Long.parseLong(properties.getProperty("bitsforintervals"))+
+ Long.parseLong(properties.getProperty("bitsforresiduals")) + 7) / 8
+ );
+
+ assertEquals(g.numArcs(), Long.parseLong(properties.getProperty("copiedarcs")) + Long.parseLong(properties.getProperty("intervalisedarcs")) + Long.parseLong(properties.getProperty("residualarcs")));
+
+ // To test residual stats, we compress with no intervalisation etc.
+ BVGraph.store(g, path + "2", 0, 0, 0, 3, 0);
+
+ // Test statistics
+ Arrays.fill(bin, 0);
+ nodeIterator = g.nodeIterator();
+ for(int i = 0; i < g.numNodes(); i++) {
+ nodeIterator.nextInt();
+ int d = nodeIterator.outdegree();
+ int[] a = nodeIterator.successorArray();
+ if (d > 0) {
+ for(int j = d - 1; j-- != 0;) bin[Fast.mostSignificantBit(a[j + 1] - a[j])]++;
+ final int msb = Fast.mostSignificantBit(Fast.int2nat(a[0] - i));
+ if (msb >= 0) bin[msb]++;
+ }
+ }
+
+ /* TODO: write test for residuals
+ stats = properties.getProperty("residualexpstats");
+ s = stats.split(",");
+ for(int i = s.length; i-- != 0;) assertEquals(bin[i], Integer.parseInt(s[i]));
+
+
+ gap = 1;
+ totGap = 0;
+ tot = 0;
+ totLogGap = 0;
+ for(int i = 0; i < s.length; i++) {
+ totGap += (gap * 2 + gap - 1) * Integer.parseInt(s[i]);
+ totLogGap += (Fast.log2(gap * 2 + gap + 1) - 1) * Integer.parseInt(s[i]);
+ tot += Integer.parseInt(s[i]);
+ gap *= 2;
+ }
+ assertEquals((double)totGap / (tot * 2), Double.parseDouble(properties.getProperty("residualavggap")), 1E-3);
+ assertEquals(totLogGap / tot, Double.parseDouble(properties.getProperty("residualavgloggap")), 1E-3);
+ */
+
+ deleteGraph(path);
+ deleteGraph(path + "2");
+ }
+}
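`testStats` recomputes the `successorexpstats` property by bucketing every successor gap by its most significant bit. The first gap of each node is taken relative to the source node through `Fast.int2nat`. The following standalone sketch performs the same computation. `GapStats` is hypothetical. Its `msb` assumes that `Fast.mostSignificantBit` returns -1 on zero, which the test's `msb >= 0` guard suggests. Its `int2nat` is a zigzag mapping that is assumed to match `Fast.int2nat`.

```java
/** Hypothetical re-implementation of the gap-exponent histogram checked in BVGraphTest.testStats. */
public final class GapStats {
    private GapStats() {}

    /** floor(log2(x)) for x > 0, and -1 for x == 0. */
    static int msb(final int x) {
        return 31 - Integer.numberOfLeadingZeros(x);
    }

    /** Zigzag map from integers to naturals. */
    static int int2nat(final int x) {
        return (x << 1) ^ (x >> 31);
    }

    /** bin[e] counts the gaps whose most significant bit is e. */
    static int[] gapExponentHistogram(final int[][] successors) {
        final int[] bin = new int[32];
        for (int x = 0; x < successors.length; x++) {
            final int[] a = successors[x];
            if (a.length == 0) continue;
            for (int j = 0; j < a.length - 1; j++) bin[msb(a[j + 1] - a[j])]++; // gaps between successors
            final int first = msb(int2nat(a[0] - x));                            // first gap, relative to x
            if (first >= 0) bin[first]++;                                        // a zero first gap has no bucket
        }
        return bin;
    }

    public static void main(String[] args) {
        final int[] bin = gapExponentHistogram(new int[][] { {1, 2, 6}, {}, {2} });
        System.out.println(bin[0] + " " + bin[1] + " " + bin[2]); // 1 1 1
    }
}
```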
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/BuildHostMapTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/BuildHostMapTest.java
new file mode 100644
index 0000000..ada598e
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/BuildHostMapTest.java
@@ -0,0 +1,145 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2010-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertArrayEquals;
+import it.unimi.dsi.fastutil.ints.IntIterators;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.io.FastByteArrayInputStream;
+import it.unimi.dsi.fastutil.io.FastByteArrayOutputStream;
+import it.unimi.dsi.logging.ProgressLogger;
+
+import java.io.BufferedReader;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.io.PrintStream;
+import java.io.StringReader;
+import java.net.URISyntaxException;
+
+import org.junit.Test;
+
+public class BuildHostMapTest extends WebGraphTestCase {
+
+ @Test
+ public void testSimpleNoLogger() throws IOException, URISyntaxException {
+ BufferedReader br = new BufferedReader(new StringReader("http://a/b\nhttp://c\nhttp://a.b:81/\nhttp://c/c\nhttp://a:80/\nhttps://a/\nhttps://a.b\nhttp://159.149.130.49/"));
+ FastByteArrayOutputStream mapFbaos = new FastByteArrayOutputStream();
+ FastByteArrayOutputStream countFbaos = new FastByteArrayOutputStream();
+ FastByteArrayOutputStream hostsStream = new FastByteArrayOutputStream();
+ PrintStream hosts = new PrintStream(hostsStream);
+ DataOutputStream mapDos = new DataOutputStream(mapFbaos);
+ DataOutputStream countDos = new DataOutputStream(countFbaos);
+ BuildHostMap.run(br, hosts, mapDos, countDos, false, null);
+ mapDos.close();
+ hosts.close();
+ DataInputStream dis = new DataInputStream(new FastByteArrayInputStream(mapFbaos.array, 0, mapFbaos.length));
+ assertEquals(0, dis.readInt());
+ assertEquals(1, dis.readInt());
+ assertEquals(2, dis.readInt());
+ assertEquals(1, dis.readInt());
+ assertEquals(0, dis.readInt());
+ assertEquals(0, dis.readInt());
+ assertEquals(2, dis.readInt());
+ assertEquals(3, dis.readInt());
+ assertEquals(0, dis.available());
+ dis.close();
+ BufferedReader hostsIn = new BufferedReader(new InputStreamReader(new FastByteArrayInputStream(hostsStream.array, 0, hostsStream.length)));
+ assertEquals("a", hostsIn.readLine());
+ assertEquals("c", hostsIn.readLine());
+ assertEquals("a.b", hostsIn.readLine());
+ assertEquals("159.149.130.49", hostsIn.readLine());
+ assertEquals(null, hostsIn.readLine());
+ hostsIn.close();
+ assertArrayEquals(new int[] { 3, 2, 2, 1 }, IntIterators.unwrap(BinIO.asIntIterator(new DataInputStream(new FastByteArrayInputStream(countFbaos.array, 0, countFbaos.length)))));
+ }
+
+ @Test
+ public void testSimpleLogger() throws IOException, URISyntaxException {
+ BufferedReader br = new BufferedReader(new StringReader("http://a/b\nhttp://c\nhttp://a.b/\nhttp://c/c\nhttp://a/\nhttps://a/\nhttps://a.b"));
+ FastByteArrayOutputStream mapFbaos = new FastByteArrayOutputStream();
+ FastByteArrayOutputStream countFbaos = new FastByteArrayOutputStream();
+ FastByteArrayOutputStream hostsStream = new FastByteArrayOutputStream();
+ PrintStream hosts = new PrintStream(hostsStream);
+ DataOutputStream mapDos = new DataOutputStream(mapFbaos);
+ DataOutputStream countDos = new DataOutputStream(countFbaos);
+ BuildHostMap.run(br, hosts, mapDos, countDos, false, new ProgressLogger());
+ mapDos.close();
+ hosts.close();
+ DataInputStream dis = new DataInputStream(new FastByteArrayInputStream(mapFbaos.array, 0, mapFbaos.length));
+ assertEquals(0, dis.readInt());
+ assertEquals(1, dis.readInt());
+ assertEquals(2, dis.readInt());
+ assertEquals(1, dis.readInt());
+ assertEquals(0, dis.readInt());
+ assertEquals(0, dis.readInt());
+ assertEquals(2, dis.readInt());
+ assertEquals(0, dis.available());
+ dis.close();
+ BufferedReader hostsIn = new BufferedReader(new InputStreamReader(new FastByteArrayInputStream(hostsStream.array, 0, hostsStream.length)));
+ assertEquals("a", hostsIn.readLine());
+ assertEquals("c", hostsIn.readLine());
+ assertEquals("a.b", hostsIn.readLine());
+ assertEquals(null, hostsIn.readLine());
+ hostsIn.close();
+ assertArrayEquals(new int[] { 3, 2, 2 }, IntIterators.unwrap(BinIO.asIntIterator(new DataInputStream(new FastByteArrayInputStream(countFbaos.array, 0, countFbaos.length)))));
+ }
+
+ @Test
+ public void testTopPrivateDomainNoLogger() throws IOException, URISyntaxException {
+ BufferedReader br = new BufferedReader(new StringReader("http://b.a.co.uk/b\nhttp://c.a.co.uk\nhttp://a.b.co.uk\nhttp://159.149.130.49/"));
+ FastByteArrayOutputStream mapFbaos = new FastByteArrayOutputStream();
+ FastByteArrayOutputStream countFbaos = new FastByteArrayOutputStream();
+ FastByteArrayOutputStream hostsStream = new FastByteArrayOutputStream();
+ PrintStream hosts = new PrintStream(hostsStream);
+ DataOutputStream mapDos = new DataOutputStream(mapFbaos);
+ DataOutputStream countDos = new DataOutputStream(countFbaos);
+ BuildHostMap.run(br, hosts, mapDos, countDos, true, null);
+ mapDos.close();
+ hosts.close();
+ DataInputStream dis = new DataInputStream(new FastByteArrayInputStream(mapFbaos.array, 0, mapFbaos.length));
+ assertEquals(0, dis.readInt());
+ assertEquals(0, dis.readInt());
+ assertEquals(1, dis.readInt());
+ assertEquals(2, dis.readInt());
+ assertEquals(0, dis.available());
+ dis.close();
+ BufferedReader hostsIn = new BufferedReader(new InputStreamReader(new FastByteArrayInputStream(hostsStream.array, 0, hostsStream.length)));
+ assertEquals("a.co.uk", hostsIn.readLine());
+ assertEquals("b.co.uk", hostsIn.readLine());
+ assertEquals("159.149.130.49", hostsIn.readLine());
+ assertEquals(null, hostsIn.readLine());
+ hostsIn.close();
+ assertArrayEquals(new int[] { 2, 1, 1 }, IntIterators.unwrap(BinIO.asIntIterator(new DataInputStream(new FastByteArrayInputStream(countFbaos.array, 0, countFbaos.length)))));
+ }
+
+ @Test(expected=IllegalArgumentException.class)
+ public void testMalformed() throws IOException, URISyntaxException {
+ BufferedReader br = new BufferedReader(new StringReader("http://a/b\nhttp://c\nhttp//a.b/\nhttp://c/c\nhttp://a/\nhttps://a/\nhttps://a.b"));
+ FastByteArrayOutputStream mapFbaos = new FastByteArrayOutputStream();
+ FastByteArrayOutputStream countFbaos = new FastByteArrayOutputStream();
+ FastByteArrayOutputStream hostsStream = new FastByteArrayOutputStream();
+ PrintStream hosts = new PrintStream(hostsStream);
+ DataOutputStream mapDos = new DataOutputStream(mapFbaos);
+ DataOutputStream countDos = new DataOutputStream(countFbaos);
+ BuildHostMap.run(br, hosts, mapDos, countDos, false, null);
+ }
+}
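Three of the tests above, all except `testMalformed` and `testTopPrivateDomainNoLogger`, pin down what `BuildHostMap.run` should emit when the top-private-domain option is off:

- one host id per input URL, with ids handed out in first-seen order;
- scheme and port are ignored;
- the host names are listed in id order;
- each host gets a count of its URLs.

The following is a minimal, hypothetical model of that contract. It assumes that `java.net.URI` parsing is an adequate stand-in for whatever URL parsing `BuildHostMap` really does. The class name and fields are illustrative only, and the top-private-domain mode is not modelled.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the host map expected by BuildHostMapTest (no top-private-domain mode). */
public class HostMapSketch {
    final Map<String, Integer> ids = new HashMap<>(); // host name -> id
    final List<String> hosts = new ArrayList<>();     // host names, indexed by id
    final List<Integer> map = new ArrayList<>();      // one host id per input URL
    final List<Integer> counts = new ArrayList<>();   // number of URLs per host id

    void add(final String url) {
        final String host;
        try {
            host = new URI(url).getHost(); // scheme, port and path are discarded
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(url, e);
        }
        if (host == null) throw new IllegalArgumentException("No host in " + url);
        Integer id = ids.get(host);
        if (id == null) { // first occurrence: assign the next id
            id = hosts.size();
            ids.put(host, id);
            hosts.add(host);
            counts.add(0);
        }
        map.add(id);
        counts.set(id, counts.get(id) + 1);
    }

    public static void main(String[] args) {
        final HostMapSketch s = new HostMapSketch();
        for (String url : "http://a/b\nhttp://c\nhttp://a.b:81/\nhttp://c/c\nhttp://a:80/\nhttps://a/\nhttps://a.b\nhttp://159.149.130.49/".split("\n"))
            s.add(url);
        System.out.println(s.map);    // [0, 1, 2, 1, 0, 0, 2, 3]
        System.out.println(s.hosts);  // [a, c, a.b, 159.149.130.49]
        System.out.println(s.counts); // [3, 2, 2, 1]
    }
}
```

Replaying the input of `testSimpleNoLogger` through this model gives the same map, host list and counts that the test asserts. The malformed `http//a.b/` parses as a relative URI with no host, so the model rejects it with an `IllegalArgumentException`, which is what `testMalformed` expects of the real class.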
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/CliqueGraph.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/CliqueGraph.java
new file mode 100644
index 0000000..b2511d3
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/CliqueGraph.java
@@ -0,0 +1,111 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ Copyright (C) 2010-2017 Sebastiano Vigna
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+
+ */
+
+
+/** A bidirectional chain of cliques.
+ *
+ * @author Sebastiano Vigna
+ */
+
+public final class CliqueGraph extends ImmutableGraph {
+ /** The number of nodes in the graph. */
+ private final int n;
+ /** The number of elements per clique. */
+ private final int c;
+
+ /** Creates a new bidirectional chain of cliques of given size.
+ *
+ * @param n the overall number of nodes (will be rounded down to the nearest multiple of <code>c</code>).
+ * @param c the size of each clique.
+ */
+ public CliqueGraph(int n, int c) {
+ this.n = n - n % c;
+ this.c = c;
+
+ }
+
+ /** Creates a new clique of given size.
+ *
+ * @param n the size of the clique.
+ */
+ public CliqueGraph(int n) {
+ this(n, n);
+ }
+
+ /** Creates a new bidirectional chain of cliques of given size.
+ *
+ * @param n the overall number of nodes (will be rounded down to the nearest multiple of <code>c</code>).
+ * @param c the size of each clique.
+ */
+ public CliqueGraph(String n, String c) {
+ this(Integer.parseInt(n), Integer.parseInt(c));
+ }
+
+ /** Creates a new clique of given size.
+ *
+ * @param n the size of the clique.
+ */
+ public CliqueGraph(String n) {
+ this(Integer.parseInt(n));
+ }
+
+ @Override
+ public ImmutableGraph copy() {
+ return this;
+ }
+
+ @Override
+ public int numNodes() {
+ return n;
+ }
+
+ @Override
+ public long numArcs() {
+ return (long)n * c - n + (n != c ? 2 * (n / c) : 0);
+ }
+
+ @Override
+ public int outdegree(int x) {
+ return c - 1 + (x % c == 0 && n != c ? 2 : 0);
+ }
+
+ @Override
+ public boolean randomAccess() {
+ return true;
+ }
+
+ @Override
+ public int[] successorArray(final int x) {
+ int[] succ = new int[outdegree(x)];
+ final int start = x - x % c;
+ if (succ.length == c - 1) {
+ for(int i = 0, j = 0; i < c; i++) if (start + i != x) succ[j++] = start + i;
+ }
+ else {
+ succ[0] = (x - c + n) % n;
+ for(int i = 0, j = 1; i < c; i++) if (start + i != x) succ[j++] = start + i;
+ succ[c] = (x + c) % n;
+ }
+
+ return succ;
+ }
+
+}
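`CliqueGraph.numArcs()` relies on a closed form. Each of the n nodes has c - 1 neighbours inside its own clique. When there is more than one clique, the first node of each clique also gets two extra chain arcs. The hypothetical helper below re-derives that total by summing the same per-node formula that `outdegree()` uses. This is a cheap way to sanity-check the closed form over many (n, c) pairs.

```java
/** Hypothetical consistency check for the arc count of CliqueGraph (n nodes, cliques of size c). */
public class CliqueArcCount {
    /** Closed form used by CliqueGraph.numArcs(). */
    static long numArcs(int n, int c) {
        n -= n % c; // round down to a multiple of c, as the CliqueGraph constructor does
        // n(c - 1) arcs inside the cliques, plus 2 chain arcs per clique if there are several
        return (long)n * c - n + (n != c ? 2 * (n / c) : 0);
    }

    /** Sum over all nodes of the per-node formula used by CliqueGraph.outdegree(). */
    static long sumOfOutdegrees(int n, int c) {
        n -= n % c;
        long sum = 0;
        for (int x = 0; x < n; x++) sum += c - 1 + (x % c == 0 && n != c ? 2 : 0);
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(numArcs(12, 4) + " " + sumOfOutdegrees(12, 4)); // 42 42
        System.out.println(numArcs(5, 5) + " " + sumOfOutdegrees(5, 5));   // 20 20
    }
}
```

The two counts agree for every (n, c), but agreement with `outdegree()` does not guarantee distinct arcs. With exactly two cliques (n = 2c), the first node x of each clique has (x - c + n) % n == (x + c) % n. As a result, `successorArray()` lists the same chain neighbour twice.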
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/DegreeRangeImmutableSubgraphTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/DegreeRangeImmutableSubgraphTest.java
new file mode 100644
index 0000000..f1449f7
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/DegreeRangeImmutableSubgraphTest.java
@@ -0,0 +1,46 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2003-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.webgraph.examples.ErdosRenyiGraph;
+
+import org.junit.Test;
+
+public class DegreeRangeImmutableSubgraphTest extends WebGraphTestCase {
+
+ @Test
+ public void test() {
+ for(int i = 10; i < 100000; i *= 10) {
+ final double p = 5. / i;
+ ImmutableGraph g = new ArrayListMutableGraph(new ErdosRenyiGraph(i, p, 0, false)).immutableView();
+ final int[] map = new int[g.numNodes()];
+ final int min = 2;
+ final int max = 4;
+ for(int j = 0, k = 0; j < g.numNodes(); j++)
+ map[j] = g.outdegree(j) >= min && g.outdegree(j) < max ? k++ : -1;
+ DegreeRangeImmutableSubgraph s = new DegreeRangeImmutableSubgraph(g, min, max);
+ assertGraph(s);
+ assertEquals(Transform.map(g, map), s);
+ s = new DegreeRangeImmutableSubgraph(g, 0, i);
+ assertGraph(s);
+ assertEquals(g, s);
+ }
+ }
+}
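`DegreeRangeImmutableSubgraphTest` builds its expected result by hand. Nodes whose outdegree lies in the half-open range [min, max) are renumbered consecutively, and every other node maps to -1. `Transform.map` then applies that map.

A self-contained sketch of the renumbering step follows. To avoid depending on WebGraph, it takes a plain array of outdegrees instead of an `ImmutableGraph`; that substitution is an assumption of this sketch, and `degreeRangeMap` is a hypothetical name.

```java
import java.util.Arrays;

/** Hypothetical sketch of the node renumbering built in DegreeRangeImmutableSubgraphTest. */
public class DegreeRangeMap {
    // Nodes whose outdegree lies in [min, max) receive consecutive new ids,
    // in increasing order of their original ids; all other nodes map to -1.
    static int[] degreeRangeMap(int[] outdegree, int min, int max) {
        final int[] map = new int[outdegree.length];
        for (int j = 0, k = 0; j < outdegree.length; j++)
            map[j] = outdegree[j] >= min && outdegree[j] < max ? k++ : -1;
        return map;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(degreeRangeMap(new int[] { 1, 2, 3, 4, 2 }, 2, 4)));
        // [-1, 0, 1, -1, 2]
    }
}
```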
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/EFGraphTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/EFGraphTest.java
new file mode 100644
index 0000000..3870804
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/EFGraphTest.java
@@ -0,0 +1,173 @@
+package it.unimi.dsi.webgraph;
+
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.fail;
+import it.unimi.dsi.fastutil.longs.LongOpenHashSet;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.examples.ErdosRenyiGraph;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.nio.ByteOrder;
+import java.util.Properties;
+
+import org.junit.Test;
+
+public class EFGraphTest extends WebGraphTestCase {
+
+ public static File storeTempGraph(final ImmutableGraph g) throws IOException, IllegalArgumentException, SecurityException {
+ File basename = File.createTempFile(EFGraphTest.class.getSimpleName(), "test");
+ EFGraph.store(g, basename.toString());
+ return basename;
+ }
+
+ public static File storeTempGraph(final ImmutableGraph g, final int log2Quantum, final int cacheSize, final ByteOrder byteOrder) throws IOException, IllegalArgumentException, SecurityException {
+ File basename = File.createTempFile(EFGraphTest.class.getSimpleName(), "test");
+ EFGraph.store(g, basename.toString(), log2Quantum, cacheSize, byteOrder, null);
+ return basename;
+ }
+
+ @Test
+ public void testCompression() throws IOException, IllegalArgumentException, SecurityException {
+ for(int n = 1; n < 10; n++) { // Graph construction parameter
+ for(int type = 1; type < 3; type++) {
+ final ImmutableGraph g = type == 0 ? ArrayListMutableGraph.newCompleteGraph(n, false).immutableView() :
+ type == 1 ? ArrayListMutableGraph.newCompleteBinaryIntree(n).immutableView() :
+ ArrayListMutableGraph.newCompleteBinaryOuttree(n).immutableView();
+
+ for(ByteOrder byteOrder: new ByteOrder[] { ByteOrder.LITTLE_ENDIAN, ByteOrder.BIG_ENDIAN }) {
+ for(int cacheSize = 1; cacheSize < 128 * 1024; cacheSize *= 2) {
+ for(int log2Quantum = 0; log2Quantum < 8; log2Quantum++) {
+ System.err.println("Testing type " + type + ", n=" + n + ", byteOrder=" + byteOrder + ", cacheSize=" + cacheSize + ", log2Quantum=" + log2Quantum + "...");
+ final File basename = EFGraphTest.storeTempGraph(g, log2Quantum, cacheSize, byteOrder);
+ final Properties properties = new Properties();
+ final FileInputStream propertyFile = new FileInputStream(basename + EFGraph.PROPERTIES_EXTENSION);
+ properties.load(propertyFile);
+ propertyFile.close();
+
+ //System.err.println(properties);
+
+ ImmutableGraph h;
+
+ System.err.println("Testing standard...");
+ h = EFGraph.load(basename.toString());
+ WebGraphTestCase.assertGraph(h);
+ assertEquals(g, h);
+
+ System.err.println("Testing mapped...");
+ h = EFGraph.loadMapped(basename.toString());
+ WebGraphTestCase.assertGraph(h);
+ assertEquals(g, h);
+
+ basename.delete();
+ deleteGraph(basename);
+ }
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testErdosRenyi() throws IOException {
+ for(int size: new int[] { 10, 100, 1000, 10000 }) {
+ for(boolean upperBound: new boolean[] { false, true }) {
+ final String basename = File.createTempFile(getClass().getSimpleName(), "test").toString();
+ final ImmutableGraph g = new ArrayListMutableGraph(new ErdosRenyiGraph(size, .001, 0, false)).immutableView();
+ EFGraph.store(g, upperBound ? size * size : size, basename, 3, 1024, ByteOrder.nativeOrder(), null);
+ final EFGraph efGraph = (EFGraph)ImmutableGraph.load(basename);
+ assertEquals(g, efGraph);
+
+ for(int i = 0; i < size; i++) {
+ for(int j = i + 1; j < size; j++) {
+ LongOpenHashSet a = new LongOpenHashSet();
+ LongOpenHashSet b = new LongOpenHashSet();
+ LazyIntIterator sa = g.successors(i);
+ LazyIntIterator sb = g.successors(j);
+ for(long s; (s = sa.nextInt()) != -1;) a.add(s);
+ for(long t; (t = sb.nextInt()) != -1;) b.add(t);
+
+ a.retainAll(b);
+ b.clear();
+ final LazyIntSkippableIterator sx = efGraph.successors(i);
+ final LazyIntSkippableIterator sy = efGraph.successors(j);
+
+ int x = sx.nextInt();
+ int y = sy.nextInt();
+
+ while(x != -1 && x != LazyIntSkippableIterator.END_OF_LIST && y != -1 && y != LazyIntSkippableIterator.END_OF_LIST) {
+ if (x == y) {
+ b.add(x);
+ x = sx.nextInt();
+ }
+ else if(x < y) x = sx.skipTo(y);
+ else y = sy.skipTo(x);
+ }
+
+ assertEquals(a, b);
+ }
+ }
+
+
+ new File(basename).delete();
+ new File(basename + EFGraph.GRAPH_EXTENSION).delete();
+ new File(basename + EFGraph.OFFSETS_EXTENSION).delete();
+ new File(basename + EFGraph.PROPERTIES_EXTENSION).delete();
+ }
+ }
+ }
+
+ @Test
+ public void testSkipFirst() throws IOException {
+ final String basename = File.createTempFile(getClass().getSimpleName(), "test").toString();
+ final ImmutableGraph g = new ArrayListMutableGraph(new ErdosRenyiGraph(1000, .01, 0, false)).immutableView();
+ EFGraph.store(g, 1000, basename, 3, 1024, ByteOrder.nativeOrder(), null);
+ final EFGraph efGraph = (EFGraph)ImmutableGraph.load(basename);
+ assertEquals(g, efGraph);
+
+ for(int i = 0; i < 1000; i++) {
+ for(int j = 0; j < 1000; j++) {
+ LazyIntSkippableIterator sa = efGraph.successors(i);
+ final int x = sa.skipTo(j);
+ sa = efGraph.successors(i);
+ for(;;) {
+ final int y = sa.nextInt();
+ if (y >= j) {
+ assertEquals(y, x);
+ break;
+ }
+ else if (y == -1) {
+ if (x != LazyIntSkippableIterator.END_OF_LIST) fail();
+ break;
+ }
+ }
+ }
+ }
+ new File(basename).delete();
+ new File(basename + EFGraph.GRAPH_EXTENSION).delete();
+ new File(basename + EFGraph.OFFSETS_EXTENSION).delete();
+ new File(basename + EFGraph.PROPERTIES_EXTENSION).delete();
+
+ }
+
+}
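The inner loop of `testErdosRenyi` intersects two sorted successor lists. Instead of advancing one element at a time, it calls `skipTo` on whichever side is behind. The sketch below reproduces that merge over plain sorted arrays.

Two parts of the sketch are its own choices rather than WebGraph's:

- `Cursor` is a hypothetical stand-in for `LazyIntSkippableIterator`, implemented with `Arrays.binarySearch`.
- The `Integer.MAX_VALUE` end sentinel is an assumption, not a claim about the value of WebGraph's `END_OF_LIST`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of intersecting two sorted successor lists with skipTo(). */
public class SkipIntersection {
    static final int END_OF_LIST = Integer.MAX_VALUE; // sentinel chosen for this sketch

    /** A minimal skippable cursor over a strictly increasing int array. */
    static final class Cursor {
        private final int[] a;
        private int pos;

        Cursor(int[] a) { this.a = a; }

        int nextInt() { return pos < a.length ? a[pos++] : END_OF_LIST; }

        /** Returns the first remaining element >= lowerBound, or END_OF_LIST. */
        int skipTo(int lowerBound) {
            int i = Arrays.binarySearch(a, pos, a.length, lowerBound);
            if (i < 0) i = -i - 1; // not found: use the insertion point
            pos = i;
            return nextInt();
        }
    }

    static List<Integer> intersect(int[] s, int[] t) {
        final List<Integer> result = new ArrayList<>();
        final Cursor sx = new Cursor(s), sy = new Cursor(t);
        int x = sx.nextInt(), y = sy.nextInt();
        while (x != END_OF_LIST && y != END_OF_LIST) {
            if (x == y) {
                result.add(x);
                x = sx.nextInt();
            }
            else if (x < y) x = sx.skipTo(y); // jump the lagging side forward
            else y = sy.skipTo(x);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(intersect(new int[] { 1, 3, 5, 7, 9 }, new int[] { 2, 3, 4, 9, 10 })); // [3, 9]
    }
}
```

Because `skipTo` can jump over long runs of non-matching successors, the merge does roughly one skip per comparison instead of one step per element. That is the kind of access the skippable iterators of `EFGraph` are meant to make cheap.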
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/ImmutableGraphTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/ImmutableGraphTest.java
new file mode 100644
index 0000000..6eb5622
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/ImmutableGraphTest.java
@@ -0,0 +1,139 @@
+package it.unimi.dsi.webgraph;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.junit.Test;
+
+import it.unimi.dsi.fastutil.ints.IntOpenHashSet;
+import it.unimi.dsi.fastutil.ints.IntSet;
+import it.unimi.dsi.util.XoRoShiRo128PlusRandomGenerator;
+import it.unimi.dsi.webgraph.Transform.ArcFilter;
+import it.unimi.dsi.webgraph.examples.ErdosRenyiGraph;
+
+/*
+ Copyright (C) 2010-2017 Paolo Boldi, Sebastiano Vigna
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+
+ */
+
+
+
+public class ImmutableGraphTest {
+
+ public final static boolean DEBUG = false;
+
+ public void assertSplitIterators(final String graphFilename) throws IOException {
+ XoRoShiRo128PlusRandomGenerator r = new XoRoShiRo128PlusRandomGenerator(0);
+ int i = 0;
+ for (;;) {
+ ImmutableGraph g = ImmutableGraph.loadOffline(graphFilename);
+ switch (i) {
+ case 0:
+ WebGraphTestCase.assertSplitIterator(g, 4);
+ i++; break;
+ case 1:
+ WebGraphTestCase.assertSplitIterator(g, 1);
+ i++; break;
+ case 2:
+ if (g.numNodes() / 4 > 0) WebGraphTestCase.assertSplitIterator(g, Math.max(1, g.numNodes() / 4));
+ i++; break;
+ case 3:
+ WebGraphTestCase.assertSplitIterator(g, g.numNodes());
+ i++; break;
+ case 4:
+ WebGraphTestCase.assertSplitIterator(g, Math.max(1, g.numNodes() / (r.nextInt(10) + 1)));
+ i++; break;
+ case 5:
+ WebGraphTestCase.assertSplitIterator(g, r.nextInt(10) + 1);
+ i++; break;
+ default:
+ return;
+ }
+ }
+ }
+
+ @Test
+ public void testBVGraphSplitIteratorsOffline() throws IllegalArgumentException, SecurityException, IOException {
+ for (int size: new int[] { 5, 10, 100 })
+ for (double p: new double[] { .1, .3, .5, .9 }) {
+ ErdosRenyiGraph eg = new ErdosRenyiGraph(size, p, true);
+ File graphFile = BVGraphTest.storeTempGraph(eg);
+ ImmutableGraph graph;
+ graph = ImmutableGraph.load(graphFile.getAbsolutePath());
+ WebGraphTestCase.assertGraph(graph);
+ graph = ImmutableGraph.loadOffline(graphFile.getAbsolutePath());
+ WebGraphTestCase.assertGraph(graph);
+ assertSplitIterators(graphFile.getAbsolutePath());
+ graphFile.delete();
+ }
+ }
+
+ @Test
+ public void testTransformFilterSplitIterators() throws IllegalArgumentException, SecurityException, IOException {
+ XoRoShiRo128PlusRandomGenerator r = new XoRoShiRo128PlusRandomGenerator(0);
+ for (int size: new int[] { 5, 10, 100 })
+ for (double p: new double[] { .1, .3, .5, .9 }) {
+ ErdosRenyiGraph eg = new ErdosRenyiGraph(size, p, true);
+ File graphFile = BVGraphTest.storeTempGraph(eg);
+ ImmutableGraph graph;
+ graph = ImmutableGraph.load(graphFile.getAbsolutePath());
+ ImmutableGraph filteredArcs = Transform.filterArcs(graph, new ArcFilter() {
+ @Override
+ public boolean accept(int i, int j) {
+ return i % 3 == 1 && j % 5 > 3;
+ }
+ });
+ WebGraphTestCase.assertSplitIterator(filteredArcs, Math.max(1, r.nextInt(size)));
+ }
+ }
+
+ @Test
+ public void testImmutableSubgraphSplitIterators() throws IllegalArgumentException, SecurityException, IOException {
+ XoRoShiRo128PlusRandomGenerator r = new XoRoShiRo128PlusRandomGenerator(2);
+ for (int size: new int[] { 5, 10, 100 })
+ for (double p: new double[] { .1, .3, .5, .9 }) {
+ ErdosRenyiGraph eg = new ErdosRenyiGraph(size, p, true);
+ File graphFile = BVGraphTest.storeTempGraph(eg);
+ ImmutableGraph graph;
+ graph = ImmutableGraph.load(graphFile.getAbsolutePath());
+
+ IntSet nodeSet = new IntOpenHashSet();
+ for (int i = 0; i < size; i++) if (r.nextBoolean()) nodeSet.add(i);
+ int[] nodeArray = nodeSet.toIntArray();
+ Arrays.sort(nodeArray);
+ WebGraphTestCase.assertSplitIterator(new ImmutableSubgraph(graph, nodeArray), Math.max(1, r.nextInt(nodeArray.length)));
+ }
+ }
+
+ @Test
+ public void testUnionImmutableGraphSplitIterators() throws IllegalArgumentException, SecurityException, IOException {
+ XoRoShiRo128PlusRandomGenerator r = new XoRoShiRo128PlusRandomGenerator(0);
+ for (int size: new int[] { 5, 10, 100 })
+ for (double p: new double[] { .1, .3, .5, .9 }) {
+ ErdosRenyiGraph eg0 = new ErdosRenyiGraph(size, p, true);
+ File graphFile0 = BVGraphTest.storeTempGraph(eg0);
+ ImmutableGraph graph0;
+ graph0 = ImmutableGraph.load(graphFile0.getAbsolutePath());
+ ErdosRenyiGraph eg1 = new ErdosRenyiGraph(size, p, true);
+ File graphFile1 = BVGraphTest.storeTempGraph(eg1);
+ ImmutableGraph graph1;
+ graph1 = ImmutableGraph.load(graphFile1.getAbsolutePath());
+ WebGraphTestCase.assertSplitIterator(new UnionImmutableGraph(graph0, graph1), Math.max(1, r.nextInt(graph0.numNodes() + graph1.numNodes())));
+ }
+ }
+}
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/ImmutableSubgraphTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/ImmutableSubgraphTest.java
new file mode 100644
index 0000000..feb4e43
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/ImmutableSubgraphTest.java
@@ -0,0 +1,88 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2003-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.fastutil.ints.IntArrayList;
+
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.Random;
+
+import org.junit.Test;
+
+public class ImmutableSubgraphTest extends WebGraphTestCase {
+
+ @Test
+ public void testSubgraphs() {
+ ImmutableGraph g, sg;
+ final long seed = System.currentTimeMillis();
+ System.err.println("Seed: " + seed);
+ final Random random = new Random(seed);
+
+ for(int n = 1; n < 10; n++) { // Graph construction parameter
+ g = ArrayListMutableGraph.newCompleteGraph(n, false).immutableView();
+ int[] randPerm = new int[n];
+ for(int i = n; i-- != 0;) randPerm[i] = i;
+ Collections.shuffle(IntArrayList.wrap(randPerm), random);
+
+ for(int s = 1; s <= n; s++) {
+ Arrays.sort(randPerm, 0, s);
+ int[] nodes = new int[s];
+ System.arraycopy(randPerm, 0, nodes, 0, s);
+ sg = new ImmutableSubgraph(g, nodes);
+ assertGraph(sg);
+ final ArrayListMutableGraph completeGraph = ArrayListMutableGraph.newCompleteGraph(s, false);
+ assertEquals(completeGraph.immutableView(), sg);
+ assertEquals(sg, ImmutableSubgraph.asImmutableSubgraph(completeGraph.immutableView()));
+ assertEquals(sg.hashCode(), completeGraph.hashCode());
+ }
+
+ g = ArrayListMutableGraph.newCompleteBinaryIntree(n).immutableView();
+ for(int s = 1; s <= n; s++) {
+ int[] nodes = new int[(1 << s) - 1];
+ for(int j = (1 << s) - 1; j-- != 0;) nodes[j] = j;
+ sg = new ImmutableSubgraph(g, nodes);
+ assertGraph(sg);
+ final ArrayListMutableGraph completeBinaryIntree = ArrayListMutableGraph.newCompleteBinaryIntree(s - 1);
+ final ImmutableGraph immutableView = completeBinaryIntree.immutableView();
+ assertEquals(immutableView, sg);
+ assertEquals(sg, ImmutableSubgraph.asImmutableSubgraph(immutableView));
+ assertEquals(sg.hashCode(), completeBinaryIntree.hashCode());
+ }
+
+ g = ArrayListMutableGraph.newCompleteBinaryOuttree(n).immutableView();
+ for(int s = 1; s <= n; s++) {
+ int[] nodes = new int[(1 << s) - 1];
+ for(int j = (1 << s) - 1; j-- != 0;) nodes[j] = j;
+ sg = new ImmutableSubgraph(g, nodes);
+ assertGraph(sg);
+
+ final ArrayListMutableGraph completeBinaryOuttree = ArrayListMutableGraph.newCompleteBinaryOuttree(s - 1);
+ final ImmutableGraph immutableView = completeBinaryOuttree.immutableView();
+ assertEquals(immutableView, sg);
+ assertEquals(sg, ImmutableSubgraph.asImmutableSubgraph(immutableView));
+ assertEquals(sg.hashCode(), completeBinaryOuttree.hashCode());
+ }
+
+ }
+ }
+
+
+}
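`testSubgraphs` relies on a simple fact: take a loopless complete graph and any sorted set of its nodes, and the induced subgraph is again complete once nodes are renumbered by their position in the node array. The following is a minimal sketch of that induced-subgraph rule over adjacency arrays. It assumes sorted successor lists and is an illustration only, not the implementation of `ImmutableSubgraph`. `completeGraph` and `successors` are hypothetical helpers.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

/** Hypothetical sketch of an induced subgraph over a sorted node array. */
public class InducedSubgraph {
    /** Adjacency arrays of the loopless complete graph on n nodes. */
    static int[][] completeGraph(int n) {
        final int[][] g = new int[n][];
        for (int u = 0; u < n; u++) {
            final int self = u;
            g[u] = IntStream.range(0, n).filter(v -> v != self).toArray();
        }
        return g;
    }

    /**
     * Successors of subgraph node x: the successors of supernode nodes[x] that
     * belong to nodes, renumbered by their index in nodes (which must be sorted).
     */
    static int[] successors(int[][] graph, int[] nodes, int x) {
        return Arrays.stream(graph[nodes[x]])
                .map(v -> Arrays.binarySearch(nodes, v)) // negative if v is not in nodes
                .filter(i -> i >= 0)
                .toArray();
    }

    public static void main(String[] args) {
        final int[][] g = completeGraph(5);
        final int[] nodes = { 1, 3, 4 };
        for (int x = 0; x < nodes.length; x++)
            System.out.println(x + " -> " + Arrays.toString(successors(g, nodes, x)));
        // 0 -> [1, 2]
        // 1 -> [0, 2]
        // 2 -> [0, 1]
    }
}
```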
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/IncrementalImmutableSequentialGraphTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/IncrementalImmutableSequentialGraphTest.java
new file mode 100644
index 0000000..ddcdfb9
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/IncrementalImmutableSequentialGraphTest.java
@@ -0,0 +1,63 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2013-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.webgraph.examples.ErdosRenyiGraph;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.concurrent.Callable;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+
+import org.junit.Test;
+
+public class IncrementalImmutableSequentialGraphTest extends WebGraphTestCase {
+
+ @Test
+ public void testErdosRenyi() throws IOException, InterruptedException, ExecutionException {
+ final String basename = File.createTempFile(IncrementalImmutableSequentialGraph.class.getSimpleName() + "-", "-temp").toString();
+ for(int size: new int[] { 10, 100, 1000, 10000 }) {
+ final ImmutableGraph g = new ArrayListMutableGraph(new ErdosRenyiGraph(size, .001, 0, false)).immutableView();
+ final IncrementalImmutableSequentialGraph incrementalImmutableSequentialGraph = new IncrementalImmutableSequentialGraph();
+ final Future<Void> future = Executors.newSingleThreadExecutor().submit(new Callable<Void>() {
+ @Override
+ public Void call() throws IOException {
+ BVGraph.store(incrementalImmutableSequentialGraph, basename);
+ return null;
+ }
+ });
+
+ for(NodeIterator nodeIterator = g.nodeIterator(); nodeIterator.hasNext();) {
+ nodeIterator.nextInt();
+ incrementalImmutableSequentialGraph.add(nodeIterator.successorArray(), 0, nodeIterator.outdegree());
+ }
+
+ incrementalImmutableSequentialGraph.add(IncrementalImmutableSequentialGraph.END_OF_GRAPH);
+
+ future.get();
+ assertEquals(g, ImmutableGraph.load(basename));
+ }
+
+ deleteGraph(basename);
+ }
+
+}
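In `IncrementalImmutableSequentialGraphTest`, `BVGraph.store` runs on a background thread and consumes the graph. Meanwhile the test thread feeds it successor arrays and finally `END_OF_GRAPH`. The same hand-off can be sketched with a `BlockingQueue`.

This sketch assumes only the general producer/consumer pattern. It does not describe how `IncrementalImmutableSequentialGraph` is built internally, and `IncrementalHandOff`, `END` and `run` are hypothetical names. Counting arcs stands in for compression.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch of a producer/consumer hand-off of successor lists. */
public class IncrementalHandOff {
    // Sentinel compared by reference, playing the role of END_OF_GRAPH.
    static final int[] END = new int[0];

    static long run(int[][] successorLists) throws InterruptedException, ExecutionException {
        final BlockingQueue<int[]> queue = new ArrayBlockingQueue<>(16);
        final ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // Consumer: stands in for BVGraph.store(), draining one node's successors at a time.
            final Future<Long> arcs = executor.submit(() -> {
                long count = 0;
                for (int[] s; (s = queue.take()) != END;) count += s.length;
                return count;
            });
            // Producer: the calling thread feeds the lists in node order, then the sentinel.
            for (int[] s : successorLists) queue.put(s);
            queue.put(END);
            return arcs.get();
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(new int[][] { { 1, 2 }, {}, { 0 } })); // 3
    }
}
```

Unlike the test, the sketch shuts its executor down in a `finally` block, so the worker thread does not outlive the call.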
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/MaskedIntIteratorTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/MaskedIntIteratorTest.java
new file mode 100644
index 0000000..798f848
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/MaskedIntIteratorTest.java
@@ -0,0 +1,110 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.fastutil.ints.IntArrayList;
+import it.unimi.dsi.fastutil.ints.IntIterator;
+import it.unimi.dsi.fastutil.ints.IntIterators;
+
+import java.util.Random;
+
+import org.junit.Test;
+
+public class MaskedIntIteratorTest {
+
+ public void test(final int length, final int numberOfZeroes) {
+ long seed = System.currentTimeMillis();
+ Random random = new Random(seed);
+ System.err.println("Seed: " + seed);
+ // Allocate the values, the keep mask, and the lists for the expected result and the blocks
+ final int x[] = new int[length];
+ boolean keep[] = new boolean[length];
+ IntArrayList res = new IntArrayList();
+ IntArrayList blocks = new IntArrayList();
+ int i, j, p = 0;
+ boolean dep;
+
+ // Generate
+ for (i = 0; i < length; i++) p = x[i] = p + random.nextInt(1000);
+ for (i = 0; i < length-numberOfZeroes; i++) keep[i] = true;
+ for (i = 0; i < length; i++) {
+ j = i + random.nextInt(length - i); // use the seeded generator so that failures are reproducible
+ dep = keep[i]; keep[i] = keep[j]; keep[j] = dep;
+ }
+
+ // Compute result
+ for (i = 0; i < length; i++) if (keep[i]) res.add(x[i]);
+ res.trim();
+ int result[] = res.elements();
+
+ // Prepare blocks
+ boolean lookAt = true;
+ int curr = 0;
+ for (i = 0; i < length; i++) {
+ if (keep[i] == lookAt) curr++;
+ else {
+ blocks.add(curr);
+ lookAt = !lookAt;
+ curr = 1;
+ }
+ }
+ blocks.trim(); // The trailing block is left implicit, as MaskedIntIterator expects
+ final int bs[] = blocks.elements();
+
+ // Output
+ System.out.println("GENERATED:");
+ for (i = 0; i < length; i++) {
+ if (keep[i]) System.out.print('*');
+ System.out.print(x[i] + " ");
+ }
+ System.out.println("\nBLOCKS:");
+ for (i = 0; i < bs.length; i++)
+ System.out.print(bs[i] + " ");
+ System.out.println("\nEXPECTED RESULT:");
+ for (i = 0; i < result.length; i++)
+ System.out.print(result[i] + " ");
+ System.out.println();
+
+ LazyIntIterator maskedIterator = new MaskedIntIterator(bs, LazyIntIterators.lazy(new IntArrayList(x).iterator()));
+
+ for (i = 0; i < result.length; i++) assertEquals(i + ": ", result[i], maskedIterator.nextInt());
+ assertEquals(-1, maskedIterator.nextInt());
+
+ // Test skips
+ maskedIterator = new MaskedIntIterator(bs, LazyIntIterators.lazy(new IntArrayList(x).iterator()));
+ IntIterator results = IntIterators.wrap(result);
+
+ for (i = 0; i < result.length; i++) {
+ int toSkip = random.nextInt(5);
+ assertEquals(results.skip(toSkip), maskedIterator.skip(toSkip));
+ if (results.hasNext()) assertEquals(i + ": ", results.nextInt(), maskedIterator.nextInt());
+ }
+ assertEquals(-1, maskedIterator.nextInt());
+
+ }
+
+ @Test
+ public void test() {
+ for(int i = 0; i < 20; i++)
+ for(int j = 0; j < 20; j++)
+ test(i, j);
+ }
+
+}
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/MergedIntIteratorTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/MergedIntIteratorTest.java
new file mode 100644
index 0000000..c5fc5ac
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/MergedIntIteratorTest.java
@@ -0,0 +1,63 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2003-2017 Paolo Boldi
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.fastutil.ints.IntAVLTreeSet;
+import it.unimi.dsi.fastutil.ints.IntIterator;
+
+import java.util.Random;
+
+import org.junit.Test;
+
+public class MergedIntIteratorTest {
+
+ public void testMerge(int n0, int n1) {
+ Random r = new Random();
+ int x0[] = new int[n0];
+ int x1[] = new int[n1];
+ int i, p = 0;
+
+ // Generate
+ for (i = 0; i < n0; i++) p = x0[i] = p + r.nextInt(10);
+ p = 0;
+ for (i = 0; i < n1; i++) p = x1[i] = p + r.nextInt(10);
+
+ IntAVLTreeSet s0 = new IntAVLTreeSet(x0);
+ IntAVLTreeSet s1 = new IntAVLTreeSet(x1);
+ IntAVLTreeSet res = new IntAVLTreeSet(s0);
+ res.addAll(s1);
+
+ MergedIntIterator m = new MergedIntIterator(LazyIntIterators.lazy(s0.iterator()), LazyIntIterators.lazy(s1.iterator()));
+ IntIterator it = res.iterator();
+
+ int x;
+ while ((x = m.nextInt()) != -1) assertEquals(it.nextInt(), x);
+ assertEquals(Boolean.valueOf(it.hasNext()), Boolean.valueOf(m.nextInt() != -1));
+ }
+
+ @Test
+ public void testMerge() {
+ for(int i = 0; i < 10; i++) {
+ testMerge(i, i);
+ testMerge(i, i + 1);
+ testMerge(i, i * 2);
+ }
+ }
+}
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/ScatteredArcsASCIIGraphTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/ScatteredArcsASCIIGraphTest.java
new file mode 100644
index 0000000..8012f26
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/ScatteredArcsASCIIGraphTest.java
@@ -0,0 +1,210 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertArrayEquals;
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.fastutil.io.FastByteArrayInputStream;
+import it.unimi.dsi.fastutil.objects.Object2LongArrayMap;
+import it.unimi.dsi.fastutil.objects.Object2LongFunction;
+
+import java.io.IOException;
+import java.io.UnsupportedEncodingException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Iterator;
+import java.util.List;
+
+import org.junit.Test;
+
+public class ScatteredArcsASCIIGraphTest extends WebGraphTestCase {
+
+ @Test
+ public void testConstructor() throws UnsupportedEncodingException, IOException {
+
+ ScatteredArcsASCIIGraph g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("0 1\n0 2\n1 0\n1 2\n2 0\n2 1".getBytes("ASCII")));
+ assertEquals(ArrayListMutableGraph.newCompleteGraph(3, false).immutableView(), new ArrayListMutableGraph(g).immutableView());
+ assertArrayEquals(new long[] { 0, 1, 2 }, g.ids);
+
+ g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("-1 15\n15 2\n2 -1\nOOPS!\n-1 2".getBytes("ASCII")));
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{0,1},{0,2},{1,2},{2,0}}).immutableView(), new ArrayListMutableGraph(g).immutableView());
+ assertArrayEquals(new long[] { -1, 15, 2 }, g.ids);
+
+ g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("2 0\n2 1".getBytes("ASCII")));
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{0,1},{0,2}}).immutableView(), new ArrayListMutableGraph(g).immutableView());
+ assertArrayEquals(new long[] { 2, 0, 1 }, g.ids);
+
+ g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("1 2".getBytes("ASCII")));
+ assertEquals(new ArrayListMutableGraph(2, new int[][] {{0,1}}).immutableView(), new ArrayListMutableGraph(g).immutableView());
+ assertArrayEquals(new long[] { 1, 2 }, g.ids);
+
+ g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("2 1".getBytes("ASCII")));
+ assertEquals(new ArrayListMutableGraph(2, new int[][] {{0,1}}).immutableView(), new ArrayListMutableGraph(g).immutableView());
+ assertArrayEquals(new long[] { 2, 1 }, g.ids);
+
+ g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("0 1\n2 1".getBytes("ASCII")));
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{0,1},{2,1}}).immutableView(), new ArrayListMutableGraph(g).immutableView());
+ assertArrayEquals(new long[] { 0, 1, 2 }, g.ids);
+
+ g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("\n0 1\n\n2 1".getBytes("ASCII")));
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{0,1},{2,1}}).immutableView(), new ArrayListMutableGraph(g).immutableView());
+ assertArrayEquals(new long[] { 0, 1, 2 }, g.ids);
+
+ g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("\n0 1\n# comment\n2\n2 1\n2 X".getBytes("ASCII")));
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{0,1},{2,1}}).immutableView(), new ArrayListMutableGraph(g).immutableView());
+ assertArrayEquals(new long[] { 0, 1, 2 }, g.ids);
+
+ g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("0 1\n0 2\n1 0\n1 2\n2 0\n2 1".getBytes("ASCII")), true, false, 1);
+ assertEquals(ArrayListMutableGraph.newCompleteGraph(3, false).immutableView(), new ArrayListMutableGraph(g).immutableView());
+ assertArrayEquals(new long[] { 0, 1, 2 }, g.ids);
+
+ g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("0 1\n0 2\n1 0\n1 \t 2\n2 0\n2 1".getBytes("ASCII")), true, false, 1);
+ assertEquals(ArrayListMutableGraph.newCompleteGraph(3, false).immutableView(), new ArrayListMutableGraph(g).immutableView());
+ assertArrayEquals(new long[] { 0, 1, 2 }, g.ids);
+
+ g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("2 0\n2 1".getBytes("ASCII")), true, false, 1);
+ assertEquals(Transform.symmetrize(new ArrayListMutableGraph(3, new int[][] {{0,1},{0,2}}).immutableView()), new ArrayListMutableGraph(g).immutableView());
+ assertArrayEquals(new long[] { 2, 0, 1 }, g.ids);
+
+ g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("1 2".getBytes("ASCII")), true, false, 1);
+ assertEquals(Transform.symmetrize(new ArrayListMutableGraph(2, new int[][] {{0,1}}).immutableView()), new ArrayListMutableGraph(g).immutableView());
+ assertArrayEquals(new long[] { 1, 2 }, g.ids);
+
+ g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("2 1".getBytes("ASCII")), true, false, 1);
+ assertEquals(Transform.symmetrize(new ArrayListMutableGraph(2, new int[][] {{0,1}}).immutableView()), new ArrayListMutableGraph(g).immutableView());
+ assertArrayEquals(new long[] { 2, 1 }, g.ids);
+
+ g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("0 1\n2 1".getBytes("ASCII")), true, false, 1);
+ assertEquals(Transform.symmetrize(new ArrayListMutableGraph(3, new int[][] {{0,1},{2,1}}).immutableView()), new ArrayListMutableGraph(g).immutableView());
+ assertArrayEquals(new long[] { 0, 1, 2 }, g.ids);
+
+ g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("\n0 1\n\n2 1".getBytes("ASCII")), true, false, 1);
+ assertEquals(Transform.symmetrize(new ArrayListMutableGraph(3, new int[][] {{0,1},{2,1}}).immutableView()), new ArrayListMutableGraph(g).immutableView());
+ assertArrayEquals(new long[] { 0, 1, 2 }, g.ids);
+
+ g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("\n0 1\n# comment\n2\n2 1\n2 X".getBytes("ASCII")), true, false, 1);
+ assertEquals(Transform.symmetrize(new ArrayListMutableGraph(3, new int[][] {{0,1},{2,1}}).immutableView()), new ArrayListMutableGraph(g).immutableView());
+ assertArrayEquals(new long[] { 0, 1, 2 }, g.ids);
+
+ g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("0 0\n0 1\n0 2\n2 2\n1 0\n1 2\n2 0\n2 1".getBytes("ASCII")), true, true, 2);
+ assertEquals(ArrayListMutableGraph.newCompleteGraph(3, false).immutableView(), new ArrayListMutableGraph(g).immutableView());
+ assertArrayEquals(new long[] { 0, 1, 2 }, g.ids);
+
+ }
+
+
+ @Test
+ public void testConstructorWithStrings() throws UnsupportedEncodingException, IOException {
+ Object2LongFunction<String> map = new Object2LongArrayMap<>();
+ map.defaultReturnValue(-1);
+
+ map.clear();
+ map.put("0", 0);
+ map.put("1", 1);
+ map.put("2", 2);
+ assertEquals(ArrayListMutableGraph.newCompleteGraph(3, false).immutableView(), new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("0 1\n0 2\n1 0\n1 2\n2 0\n2 1".getBytes("ASCII")), map, null, 3));
+
+ map.clear();
+ map.put("-1", 1);
+ map.put("15", 0);
+ map.put("2", 2);
+ final ImmutableGraph g = new ArrayListMutableGraph(3, new int[][] {{0,2},{1,0},{1,2},{2,1}}).immutableView();
+ assertEquals(g, new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("-1 15\n15 2\n2 -1\nOOPS!\n-1 2".getBytes("ASCII")), map, null, 3));
+ assertEquals(g, new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("-1 15\n15 2\n2 -1\nOOPS!\n-1 2\n32 2\n2 32".getBytes("ASCII")), map, null, 3));
+
+ map.clear();
+ map.put("topo", 0);
+ map.put("cane", 1);
+ map.put("topocane", 2);
+ assertEquals(g, new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("topocane cane\ncane topo\ncane topocane\ntopo topocane\n".getBytes("ASCII")), map, null, 3));
+ }
+
+ @Test(expected=IllegalArgumentException.class)
+ public void testTargetOutOfRange() throws UnsupportedEncodingException, IOException {
+ Object2LongFunction<String> map = new Object2LongArrayMap<>();
+ map.defaultReturnValue(-1);
+ map.put("0", 0);
+ map.put("1", 1);
+ map.put("2", 2);
+ assertEquals(ArrayListMutableGraph.newCompleteGraph(3, false).immutableView(), new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("0 1\n0 2".getBytes("ASCII")), map, null, 2));
+ }
+
+ @Test(expected=IllegalArgumentException.class)
+ public void testSourceOutOfRange() throws UnsupportedEncodingException, IOException {
+ Object2LongFunction<String> map = new Object2LongArrayMap<>();
+ map.defaultReturnValue(-1);
+ map.put("0", 0);
+ map.put("1", 1);
+ map.put("2", 2);
+ assertEquals(ArrayListMutableGraph.newCompleteGraph(3, false).immutableView(), new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("0 1\n2 0".getBytes("ASCII")), map, null, 2));
+ }
+
+ private static Iterator<long[]> toIterator(final String s) {
+ String[] arcs = s.split("\n");
+ List<long[]> arcSet = new ArrayList<>();
+ for (String arc: arcs) {
+ String[] parts = arc.split(" ");
+ arcSet.add(new long[] { Long.parseLong(parts[0]), Long.parseLong(parts[1]) });
+ }
+ return arcSet.iterator();
+ }
+
+ @Test
+ public void testConstructorWithArray() throws IOException {
+ ScatteredArcsASCIIGraph g = new ScatteredArcsASCIIGraph(toIterator("0 1\n0 2\n1 0\n1 2\n2 0\n2 1"), false, false, 100, null, null);
+ assertEquals(ArrayListMutableGraph.newCompleteGraph(3, false).immutableView(), new ArrayListMutableGraph(g).immutableView());
+ System.out.println(Arrays.toString(g.ids));
+ assertArrayEquals(new long[] { 0, 1, 2 }, g.ids);
+
+ g = new ScatteredArcsASCIIGraph(toIterator("-1 15\n15 2\n2 -1\n-1 2"), false, false, 100, null, null);
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{0,1},{0,2},{1,2},{2,0}}).immutableView(), new ArrayListMutableGraph(g).immutableView());
+ assertArrayEquals(new long[] { -1, 15, 2 }, g.ids);
+
+ g = new ScatteredArcsASCIIGraph(toIterator("2 0\n2 1"), false, false, 100, null, null);
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{0,1},{0,2}}).immutableView(), new ArrayListMutableGraph(g).immutableView());
+ assertArrayEquals(new long[] { 2, 0, 1 }, g.ids);
+
+ g = new ScatteredArcsASCIIGraph(toIterator("1 2"), false, false, 100, null, null);
+ assertEquals(new ArrayListMutableGraph(2, new int[][] {{0,1}}).immutableView(), new ArrayListMutableGraph(g).immutableView());
+ assertArrayEquals(new long[] { 1, 2 }, g.ids);
+
+ g = new ScatteredArcsASCIIGraph(toIterator("2 1"), false, false, 100, null, null);
+ assertEquals(new ArrayListMutableGraph(2, new int[][] {{0,1}}).immutableView(), new ArrayListMutableGraph(g).immutableView());
+ assertArrayEquals(new long[] { 2, 1 }, g.ids);
+
+ g = new ScatteredArcsASCIIGraph(toIterator("0 1\n2 1"), false, false, 100, null, null);
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{0,1},{2,1}}).immutableView(), new ArrayListMutableGraph(g).immutableView());
+ assertArrayEquals(new long[] { 0, 1, 2 }, g.ids);
+
+ g = new ScatteredArcsASCIIGraph(toIterator("0 1\n0 2\n1 0\n1 2\n2 0\n2 1"), true, false, 1, null, null);
+ assertEquals(ArrayListMutableGraph.newCompleteGraph(3, false).immutableView(), new ArrayListMutableGraph(g).immutableView());
+ assertArrayEquals(new long[] { 0, 1, 2 }, g.ids);
+
+ g = new ScatteredArcsASCIIGraph(toIterator("2 0\n2 1"), true, false, 1, null, null);
+ assertEquals(Transform.symmetrize(new ArrayListMutableGraph(3, new int[][] {{0,1},{0,2}}).immutableView()), new ArrayListMutableGraph(g).immutableView());
+ assertArrayEquals(new long[] { 2, 0, 1 }, g.ids);
+
+ g = new ScatteredArcsASCIIGraph(toIterator("0 0\n0 1\n0 2\n2 2\n1 0\n1 2\n2 0\n2 1"), true, true, 2, null, null);
+ assertEquals(ArrayListMutableGraph.newCompleteGraph(3, false).immutableView(), new ArrayListMutableGraph(g).immutableView());
+ assertArrayEquals(new long[] { 0, 1, 2 }, g.ids);
+
+
+ }
+
+}
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/ShiftedByOneArcListASCIIGraphTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/ShiftedByOneArcListASCIIGraphTest.java
new file mode 100644
index 0000000..54fd756
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/ShiftedByOneArcListASCIIGraphTest.java
@@ -0,0 +1,103 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.UnsupportedEncodingException;
+import java.nio.charset.StandardCharsets;
+
+import org.apache.commons.io.FileUtils;
+import org.junit.Test;
+
+import it.unimi.dsi.fastutil.io.FastByteArrayInputStream;
+
+public class ShiftedByOneArcListASCIIGraphTest extends WebGraphTestCase {
+
+ @Test
+ public void testLoadOnce() throws UnsupportedEncodingException, IOException {
+
+ ArcListASCIIGraph g = ShiftedByOneArcListASCIIGraph.loadOnce(new FastByteArrayInputStream("1 3\n1 2\n2 1\n2 3\n3 1\n3 2".getBytes("ASCII")));
+ assertEquals(ArrayListMutableGraph.newCompleteGraph(3, false).immutableView(), new ArrayListMutableGraph(g).immutableView());
+
+ g = ShiftedByOneArcListASCIIGraph.loadOnce(new FastByteArrayInputStream("3 1\n3 2".getBytes("ASCII")));
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{2,0},{2,1}}).immutableView(), new ArrayListMutableGraph(g).immutableView());
+
+ g = ShiftedByOneArcListASCIIGraph.loadOnce(new FastByteArrayInputStream("2 3".getBytes("ASCII")));
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{1,2}}).immutableView(), new ArrayListMutableGraph(g).immutableView());
+
+ g = ShiftedByOneArcListASCIIGraph.loadOnce(new FastByteArrayInputStream("3 2".getBytes("ASCII")));
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{2,1}}).immutableView(), new ArrayListMutableGraph(g).immutableView());
+
+ g = ShiftedByOneArcListASCIIGraph.loadOnce(new FastByteArrayInputStream("1 2\n3 2".getBytes("ASCII")));
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{0,1},{2,1}}).immutableView(), new ArrayListMutableGraph(g).immutableView());
+ }
+
+ @Test
+ public void testLoad() throws UnsupportedEncodingException, IOException {
+ File file = File.createTempFile(ShiftedByOneArcListASCIIGraphTest.class.getSimpleName(), ".txt");
+ file.deleteOnExit();
+ FileUtils.writeStringToFile(file, "1 3\n1 2\n2 1\n2 3\n3 1\n3 2", StandardCharsets.US_ASCII);
+ ImmutableGraph g = ShiftedByOneArcListASCIIGraph.load(file.toString());
+ assertEquals(ArrayListMutableGraph.newCompleteGraph(3, false).immutableView(), new ArrayListMutableGraph(g).immutableView());
+
+ FileUtils.writeStringToFile(file, "3 1\n3 2", StandardCharsets.US_ASCII);
+ g = ShiftedByOneArcListASCIIGraph.load(file.toString());
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{2,0},{2,1}}).immutableView(), new ArrayListMutableGraph(g).immutableView());
+
+ FileUtils.writeStringToFile(file, "2 3", StandardCharsets.US_ASCII);
+ g = ShiftedByOneArcListASCIIGraph.load(file.toString());
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{1,2}}).immutableView(), new ArrayListMutableGraph(g).immutableView());
+
+ FileUtils.writeStringToFile(file, "3 2", StandardCharsets.US_ASCII);
+ g = ShiftedByOneArcListASCIIGraph.load(file.toString());
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{2,1}}).immutableView(), new ArrayListMutableGraph(g).immutableView());
+
+ FileUtils.writeStringToFile(file, "1 2\n3 2", StandardCharsets.US_ASCII);
+ g = ShiftedByOneArcListASCIIGraph.load(file.toString());
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{0,1},{2,1}}).immutableView(), new ArrayListMutableGraph(g).immutableView());
+ }
+
+ @Test
+ public void testLoadMapped() throws IOException {
+ File file = File.createTempFile(ShiftedByOneArcListASCIIGraphTest.class.getSimpleName(), ".txt");
+ file.deleteOnExit();
+ FileUtils.writeStringToFile(file, "1 3\n1 2\n2 1\n2 3\n3 1\n3 2", StandardCharsets.US_ASCII);
+ ImmutableGraph g = ShiftedByOneArcListASCIIGraph.loadMapped(file.toString());
+ assertEquals(ArrayListMutableGraph.newCompleteGraph(3, false).immutableView(), new ArrayListMutableGraph(g).immutableView());
+
+ FileUtils.writeStringToFile(file, "3 1\n3 2", StandardCharsets.US_ASCII);
+ g = ShiftedByOneArcListASCIIGraph.loadMapped(file.toString());
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{2,0},{2,1}}).immutableView(), new ArrayListMutableGraph(g).immutableView());
+
+ FileUtils.writeStringToFile(file, "2 3", StandardCharsets.US_ASCII);
+ g = ShiftedByOneArcListASCIIGraph.loadMapped(file.toString());
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{1,2}}).immutableView(), new ArrayListMutableGraph(g).immutableView());
+
+ FileUtils.writeStringToFile(file, "3 2", StandardCharsets.US_ASCII);
+ g = ShiftedByOneArcListASCIIGraph.loadMapped(file.toString());
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{2,1}}).immutableView(), new ArrayListMutableGraph(g).immutableView());
+
+ FileUtils.writeStringToFile(file, "1 2\n3 2", StandardCharsets.US_ASCII);
+ g = ShiftedByOneArcListASCIIGraph.loadMapped(file.toString());
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{0,1},{2,1}}).immutableView(), new ArrayListMutableGraph(g).immutableView());
+ }
+}
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/TransformTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/TransformTest.java
new file mode 100644
index 0000000..2fd1e3e
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/TransformTest.java
@@ -0,0 +1,647 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertArrayEquals;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+import it.unimi.dsi.Util;
+import it.unimi.dsi.fastutil.ints.IntArrayList;
+import it.unimi.dsi.util.XoRoShiRo128PlusRandom;
+import it.unimi.dsi.webgraph.Transform.LabelledArcFilter;
+import it.unimi.dsi.webgraph.examples.ErdosRenyiGraph;
+import it.unimi.dsi.webgraph.examples.IntegerTriplesArcLabelledImmutableGraph;
+import it.unimi.dsi.webgraph.labelling.ArcLabelledImmutableGraph;
+import it.unimi.dsi.webgraph.labelling.ArcLabelledNodeIterator;
+import it.unimi.dsi.webgraph.labelling.ArcLabelledNodeIterator.LabelledArcIterator;
+import it.unimi.dsi.webgraph.labelling.BitStreamArcLabelledGraphTest;
+import it.unimi.dsi.webgraph.labelling.GammaCodedIntLabel;
+import it.unimi.dsi.webgraph.labelling.Label;
+import it.unimi.dsi.webgraph.labelling.LabelSemiring;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+
+import org.junit.Test;
+
+public class TransformTest extends WebGraphTestCase {
+
+ public static File storeTempGraph(final ImmutableGraph g) throws IOException, IllegalArgumentException, SecurityException {
+ File basename = File.createTempFile(TransformTest.class.getSimpleName(), "-test");
+ BVGraph.store(g, basename.toString());
+ return basename;
+ }
+
+ @Test
+ public void testMapExpand() {
+ ImmutableGraph g;
+ ImmutableGraph g2;
+
+ g = ArrayListMutableGraph.newCompleteGraph(4, false).immutableView();
+ g2 = Transform.map(g, new int[] { 0, 2, 4, 6 });
+ assertGraph(g2);
+ assertEquals(new ArrayListMutableGraph(7, new Transform.ArcFilter() {
+ @Override
+ public boolean accept(int i, int j) {
+ return i % 2 == 0 && j % 2 == 0 && i != j;
+ }
+
+ }).immutableView(), g2);
+
+ g = ArrayListMutableGraph.newDirectedCycle(3).immutableView();
+ g2 = Transform.map(g, new int[] { 0, 3, 3 });
+ assertGraph(g2);
+ assertEquals(new ArrayListMutableGraph(4, new int[][] { { 0, 3 }, { 3, 0 }, { 3, 3 } }).immutableView(), g2);
+
+ g = ArrayListMutableGraph.newDirectedCycle(3).immutableView();
+ g2 = Transform.map(g, new int[] { 4, 4, 4 });
+ assertGraph(g2);
+ assertEquals(new ArrayListMutableGraph(5, new int[][] { { 4, 4 } }).immutableView(), g2);
+
+ g = ArrayListMutableGraph.newDirectedCycle(3).immutableView();
+ g2 = Transform.map(g, new int[] { 6, 5, 4 });
+ assertGraph(g2);
+ assertEquals(new ArrayListMutableGraph(7, new int[][] { { 6, 5 }, { 5, 4 }, { 4, 6 } }).immutableView(), g2);
+
+ }
+
+ @Test
+ public void testMapPermutation() {
+ ImmutableGraph g;
+ ImmutableGraph g2;
+
+ g = ArrayListMutableGraph.newDirectedCycle(3).immutableView();
+ g2 = Transform.map(g, new int[] { 2, 1, 0 });
+ assertGraph(g2);
+ assertEquals(new ArrayListMutableGraph(3, new int[][] { { 0, 2 }, { 2, 1 }, { 1, 0 } }).immutableView(), g2);
+ }
+
+ @Test
+ public void testInjective() {
+ ImmutableGraph g;
+ ImmutableGraph g2;
+
+ g = new ArrayListMutableGraph(3, new int[][] { { 0, 1 }, { 1, 2 }, { 0, 2 } }).immutableView();
+ g2 = Transform.map(g, new int[] { 2, -1, 0 });
+ assertGraph(g2);
+ assertEquals(new ArrayListMutableGraph(3, new int[][] { { 2, 0 } }).immutableView(), g2);
+ }
+
+ @Test
+ public void testMapCollapse() {
+ ImmutableGraph g;
+ ImmutableGraph g2;
+
+ g = ArrayListMutableGraph.newDirectedCycle(3).immutableView();
+ g2 = Transform.map(g, new int[] { 0, 0, 0 });
+ assertGraph(g2);
+ assertEquals(1, g2.numNodes());
+ }
+
+ @Test
+ public void testMapClear() {
+ ImmutableGraph g;
+ ImmutableGraph g2;
+
+ g = ArrayListMutableGraph.newDirectedCycle(3).immutableView();
+ g2 = Transform.map(g, new int[] { -1, -1, -1 });
+ assertGraph(g2);
+ assertEquals(0, g2.numNodes());
+ }
+
+ @Test
+ public void testMapKeepMiddle() {
+ ImmutableGraph g;
+ ImmutableGraph g2;
+
+ g = ArrayListMutableGraph.newDirectedCycle(3).immutableView();
+ g2 = Transform.map(g, new int[] { -1, 0, -1 });
+ assertGraph(g2);
+ assertEquals(ArrayListMutableGraph.newCompleteGraph(1, false).immutableView(), g2);
+
+ g = ArrayListMutableGraph.newDirectedCycle(3).immutableView();
+ g2 = Transform.map(g, new int[] { -1, 2, -1 });
+ assertGraph(g2);
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {}).immutableView(), g2);
+ }
+
+ /** Test introduced after finding a bug in remapping: accessing two successors in parallel gave wrong results. */
+ @Test
+ public void testMapTwoSuccessors() {
+ ImmutableGraph g;
+ ImmutableGraph g2;
+
+ g = new ArrayListMutableGraph(5, new int[][] {{0,1},{0,2},{0,4},{1,2},{1,3},{1,4},{4,2}}).immutableView();
+ // Now we test in parallel (on g) the successor lists of nodes 0 (1,2,4) and 1 (2,3,4)
+ int[] expected0 = { 1, 2, 4 };
+ int[] expected1 = { 2, 3, 4 };
+ LazyIntIterator it0 = g.successors(0);
+ LazyIntIterator it1 = g.successors(1);
+ for (int i = 0; i < 3; i++) {
+ assertEquals(expected0[i], it0.nextInt());
+ assertEquals(expected1[i], it1.nextInt());
+ }
+ assertEquals(-1, it0.nextInt());
+ assertEquals(-1, it1.nextInt());
+ g2 = Transform.map(g, new int[] { 0, 1, 2, 3, 4 }); // The permutation is immaterial: we use the identity
+ assertGraph(g2);
+ // Now we test in parallel (on g2) the successor lists of nodes 0 (1,2,4) and 1 (2,3,4)
+ it0 = g2.successors(0);
+ it1 = g2.successors(1);
+ for (int i = 0; i < 3; i++) {
+ assertEquals(expected0[i], it0.nextInt());
+ assertEquals(expected1[i], it1.nextInt());
+ }
+ assertEquals(-1, it0.nextInt());
+ assertEquals(-1, it1.nextInt());
+ }
+
+
+ @Test
+ public void testLex() {
+ ImmutableGraph g = new ArrayListMutableGraph(3, new int[][] {{ 0, 2 }, { 1, 1 }, { 1, 2 }, { 2, 0 }, { 2, 1 }, { 2, 2 } }).immutableView();
+ int p[] = Transform.lexicographicalPermutation(g);
+ assertArrayEquals(new int[] { 0, 1, 2 }, p);
+
+ g = new ArrayListMutableGraph(3, new int[][] {{ 0, 0 }, { 0, 1 }, { 0, 2 }, { 1, 1 }, { 1, 2 }, { 2, 2 } }).immutableView();
+ p = Transform.lexicographicalPermutation(g);
+ assertArrayEquals(new int[] { 2, 1, 0 }, p);
+ }
+
+
+ @Test
+ public void testFilters() throws IllegalArgumentException, SecurityException {
+ ImmutableGraph graph = new ArrayListMutableGraph(6,
+ new int[][] {
+ { 0, 1 },
+ { 0, 2 },
+ { 1, 1 },
+ { 1, 3 },
+ { 2, 1 },
+ { 4, 5 },
+ }
+ ).immutableView();
+
+ ImmutableGraph filtered = Transform.filterArcs(graph, new Transform.ArcFilter() {
+ @Override
+ public boolean accept(int i, int j) {
+ return i < j;
+ }
+ }, null);
+
+ assertGraph(filtered);
+
+ NodeIterator nodeIterator = filtered.nodeIterator();
+ LazyIntIterator iterator;
+ assertTrue(nodeIterator.hasNext());
+ assertEquals(0, nodeIterator.nextInt());
+ iterator = nodeIterator.successors();
+ assertEquals(1, iterator.nextInt());
+ assertEquals(2, iterator.nextInt());
+ assertEquals(-1, iterator.nextInt());
+ assertTrue(nodeIterator.hasNext());
+ assertEquals(1, nodeIterator.nextInt());
+ iterator = nodeIterator.successors();
+ assertEquals(3, iterator.nextInt());
+ assertEquals(-1, iterator.nextInt());
+ assertTrue(nodeIterator.hasNext());
+ assertEquals(2, nodeIterator.nextInt());
+ iterator = nodeIterator.successors();
+ assertEquals(-1, iterator.nextInt());
+ assertTrue(nodeIterator.hasNext());
+ assertEquals(3, nodeIterator.nextInt());
+ iterator = nodeIterator.successors();
+ assertEquals(-1, iterator.nextInt());
+ assertEquals(4, nodeIterator.nextInt());
+ iterator = nodeIterator.successors();
+ assertEquals(5, iterator.nextInt());
+ assertEquals(-1, iterator.nextInt());
+ assertTrue(nodeIterator.hasNext());
+ assertEquals(5, nodeIterator.nextInt());
+ iterator = nodeIterator.successors();
+ assertEquals(-1, iterator.nextInt());
+ assertFalse(nodeIterator.hasNext());
+ }
+
+
+ @Test
+ public void testLabelledFilters() throws IllegalArgumentException, SecurityException, IOException {
+ IntegerTriplesArcLabelledImmutableGraph graph = new IntegerTriplesArcLabelledImmutableGraph(
+ new int[][] {
+ { 0, 1, 2 },
+ { 0, 2, 3 },
+ { 1, 1, 4 },
+ { 1, 3, 5 },
+ { 2, 1, 6 },
+ { 4, 5, 7 },
+ }
+ );
+
+ ArcLabelledImmutableGraph filtered = Transform.filterArcs(graph, new Transform.LabelledArcFilter() {
+ @Override
+ public boolean accept(int i, int j, Label label) {
+ return i < j;
+ }
+ }, null);
+
+ ArcLabelledNodeIterator nodeIterator = filtered.nodeIterator();
+ LabelledArcIterator iterator;
+ assertTrue(nodeIterator.hasNext());
+ assertEquals(0, nodeIterator.nextInt());
+ iterator = nodeIterator.successors();
+ assertEquals(1, iterator.nextInt());
+ assertEquals(2, iterator.label().getInt());
+ assertEquals(2, iterator.nextInt());
+ assertEquals(3, iterator.label().getInt());
+ assertEquals(-1, iterator.nextInt());
+ assertTrue(nodeIterator.hasNext());
+ assertEquals(1, nodeIterator.nextInt());
+ iterator = nodeIterator.successors();
+ assertEquals(3, iterator.nextInt());
+ assertEquals(5, iterator.label().getInt());
+ assertEquals(-1, iterator.nextInt());
+ assertTrue(nodeIterator.hasNext());
+ assertEquals(2, nodeIterator.nextInt());
+ iterator = nodeIterator.successors();
+ assertEquals(-1, iterator.nextInt());
+ assertTrue(nodeIterator.hasNext());
+ assertEquals(3, nodeIterator.nextInt());
+ iterator = nodeIterator.successors();
+ assertEquals(-1, iterator.nextInt());
+ assertEquals(4, nodeIterator.nextInt());
+ iterator = nodeIterator.successors();
+ assertEquals(5, iterator.nextInt());
+ assertEquals(7, iterator.label().getInt());
+ assertEquals(-1, iterator.nextInt());
+ assertTrue(nodeIterator.hasNext());
+ assertEquals(5, nodeIterator.nextInt());
+ iterator = nodeIterator.successors();
+ assertEquals(-1, iterator.nextInt());
+ assertFalse(nodeIterator.hasNext());
+
+ File file = BitStreamArcLabelledGraphTest.storeTempGraph(graph);
+ ArcLabelledImmutableGraph graph2 = ArcLabelledImmutableGraph.load(file.toString());
+
+ filtered = Transform.filterArcs(graph2, new Transform.LabelledArcFilter() {
+ @Override
+ public boolean accept(int i, int j, Label label) {
+ return i < j;
+ }
+ }, null);
+
+ iterator = filtered.successors(0);
+ assertEquals(1, iterator.nextInt());
+ assertEquals(2, iterator.label().getInt());
+ assertEquals(2, iterator.nextInt());
+ assertEquals(3, iterator.label().getInt());
+ assertEquals(-1, iterator.nextInt());
+ iterator = filtered.successors(1);
+ assertEquals(3, iterator.nextInt());
+ assertEquals(5, iterator.label().getInt());
+ assertEquals(-1, iterator.nextInt());
+ iterator = filtered.successors(2);
+ assertEquals(-1, iterator.nextInt());
+ iterator = filtered.successors(3);
+ assertEquals(-1, iterator.nextInt());
+ iterator = filtered.successors(4);
+ assertEquals(5, iterator.nextInt());
+ assertEquals(7, iterator.label().getInt());
+ assertEquals(-1, iterator.nextInt());
+ iterator = filtered.successors(5);
+ assertEquals(-1, iterator.nextInt());
+
+ }
+
+ @Test
+ public void testCompose() {
+ ImmutableGraph g0 = new ArrayListMutableGraph(3, new int[][] { { 0, 1 }, { 0, 2 } }).immutableView();
+ ImmutableGraph g1 = new ArrayListMutableGraph(3, new int[][] { { 1, 0 }, { 2, 1 } }).immutableView();
+
+ ImmutableGraph c = Transform.compose(g0, g1);
+
+ NodeIterator n = c.nodeIterator();
+ assertTrue(n.hasNext());
+ assertEquals(0, n.nextInt());
+ LazyIntIterator i = n.successors();
+ assertEquals(0, i.nextInt());
+ assertEquals(1, i.nextInt());
+ assertEquals(-1, i.nextInt());
+ assertEquals(1, n.nextInt());
+ i = n.successors();
+ assertEquals(-1, i.nextInt());
+ assertTrue(n.hasNext());
+ assertEquals(2, n.nextInt());
+ i = n.successors();
+ assertEquals(-1, i.nextInt());
+ assertFalse(n.hasNext());
+
+ assertEquals(c, c.copy());
+ assertEquals(c.copy(), c);
+
+ assertGraph(c);
+ }
+
+ @Test
+ public void testLabelledCompose() throws IllegalArgumentException, SecurityException, IOException {
+ IntegerTriplesArcLabelledImmutableGraph integerTriplesArcLabelledImmutableGraph = new IntegerTriplesArcLabelledImmutableGraph(
+ new int[][] {
+ { 0, 1, 2 },
+ { 0, 2, 10 },
+ { 0, 3, 1 },
+ { 1, 2, 4 },
+ { 3, 2, 1 },
+ }
+ );
+ File file = BitStreamArcLabelledGraphTest.storeTempGraph(integerTriplesArcLabelledImmutableGraph);
+ ArcLabelledImmutableGraph graph = ArcLabelledImmutableGraph.load(file.toString());
+
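+ // (min, +) semiring: add is min, multiply is +, one = 0 and zero = Integer.MAX_VALUE (standing in for infinity),
+ // so each composed arc is labelled with the length of the shortest two-step path between its endpoints.
+ ArcLabelledImmutableGraph composed = Transform.compose(graph, graph, new LabelSemiring() {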
+ private final GammaCodedIntLabel one = new GammaCodedIntLabel("FOO");
+ private final GammaCodedIntLabel zero = new GammaCodedIntLabel("FOO");
+ {
+ one.value = 0;
+ zero.value = Integer.MAX_VALUE;
+ }
+
+ @Override
+ public Label add(Label first, Label second) {
+ GammaCodedIntLabel result = new GammaCodedIntLabel("FOO");
+ result.value = Math.min(first.getInt(), second.getInt());
+ return result;
+ }
+
+ @Override
+ public Label multiply(Label first, Label second) {
+ GammaCodedIntLabel result = new GammaCodedIntLabel("FOO");
+ result.value = first.getInt() + second.getInt();
+ return result;
+ }
+
+ @Override
+ public Label one() {
+ return one;
+ }
+
+ @Override
+ public Label zero() {
+ return zero;
+ }
+ });
+
+ ArcLabelledNodeIterator n = composed.nodeIterator();
+ assertTrue(n.hasNext());
+ assertEquals(0, n.nextInt());
+ LabelledArcIterator i = n.successors();
+ assertEquals(2, i.nextInt());
+ assertEquals(2, i.label().getInt());
+ assertEquals(-1, i.nextInt());
+ assertEquals(1, n.nextInt());
+ i = n.successors();
+ assertEquals(-1, i.nextInt());
+ assertTrue(n.hasNext());
+ assertEquals(2, n.nextInt());
+ i = n.successors();
+ assertEquals(-1, i.nextInt());
+ assertTrue(n.hasNext());
+ assertEquals(3, n.nextInt());
+ i = n.successors();
+ assertEquals(-1, i.nextInt());
+ assertFalse(n.hasNext());
+ assertGraph(composed);
+
+ ArcLabelledImmutableGraph composed2 = Transform.compose(integerTriplesArcLabelledImmutableGraph, integerTriplesArcLabelledImmutableGraph, new LabelSemiring() {
+ private final GammaCodedIntLabel one = new GammaCodedIntLabel("FOO");
+ private final GammaCodedIntLabel zero = new GammaCodedIntLabel("FOO");
+ {
+ one.value = 0;
+ zero.value = Integer.MAX_VALUE;
+ }
+
+ @Override
+ public Label add(Label first, Label second) {
+ GammaCodedIntLabel result = new GammaCodedIntLabel("FOO");
+ result.value = Math.min(first.getInt(), second.getInt());
+ return result;
+ }
+
+ @Override
+ public Label multiply(Label first, Label second) {
+ GammaCodedIntLabel result = new GammaCodedIntLabel("FOO");
+ result.value = first.getInt() + second.getInt();
+ return result;
+ }
+
+ @Override
+ public Label one() {
+ return one;
+ }
+
+ @Override
+ public Label zero() {
+ return zero;
+ }
+ });
+ assertGraph(composed2);
+
+ }
+
+ @Test
+ public void testTranspose() throws IOException {
+ ImmutableGraph g = new ErdosRenyiGraph(5, .5, 0, false);
+ ImmutableGraph gt = Transform.transpose(g);
+ assertEquals(gt, Transform.transposeOffline(g, 5));
+ assertEquals(g, Transform.transposeOffline(Transform.transposeOffline(g, 100), 5));
+
+ g = new ErdosRenyiGraph(100, .50, 0, false);
+ gt = Transform.transpose(g);
+ assertEquals(gt, Transform.transposeOffline(g, 100));
+ assertEquals(g, Transform.transposeOffline(Transform.transposeOffline(g, 100), 100));
+
+ g = new ErdosRenyiGraph(1000, .20, 0, false);
+ gt = Transform.transpose(g);
+ assertEquals(gt, Transform.transposeOffline(g, 10000));
+ assertEquals(g, Transform.transposeOffline(Transform.transposeOffline(g, 10000), 10000));
+ }
+
+ @Test
+ public void testLabelledTranspose() throws IllegalArgumentException, SecurityException, IOException {
+ IntegerTriplesArcLabelledImmutableGraph integerTriplesArcLabelledImmutableGraph = new IntegerTriplesArcLabelledImmutableGraph(
+ new int[][] {
+ { 0, 1, 2 },
+ { 0, 2, 10 },
+ { 0, 3, 1 },
+ { 1, 2, 4 },
+ { 3, 2, 1 },
+ }
+ );
+ File file = BitStreamArcLabelledGraphTest.storeTempGraph(integerTriplesArcLabelledImmutableGraph);
+ ArcLabelledImmutableGraph graph = ArcLabelledImmutableGraph.load(file.toString());
+
+ assertEquals(graph, Transform.transposeOffline(Transform.transposeOffline(graph, 2), 2));
+ assertGraph(Transform.transposeOffline(graph, 2));
+ assertEquals(graph, Transform.transposeOffline(Transform.transposeOffline(integerTriplesArcLabelledImmutableGraph, 2), 2));
+ assertGraph(Transform.transposeOffline(integerTriplesArcLabelledImmutableGraph, 2));
+ }
+
+
+ @Test
+ public void testMapOffline() throws IOException {
+ ImmutableGraph g = new ErdosRenyiGraph(10, .50, 0, false);
+ int[] perm = Util.identity(g.numNodes());
+ Collections.shuffle(IntArrayList.wrap(perm), new XoRoShiRo128PlusRandom(0));
+ int[] inv = Util.invertPermutation(perm);
+ ImmutableGraph gm = Transform.map(new ArrayListMutableGraph(g).immutableView(), perm);
+ assertEquals(gm, Transform.mapOffline(g, perm, 100));
+ assertEquals(g, Transform.mapOffline(Transform.mapOffline(g, perm, 100), inv, 100));
+ assertEquals(gm, gm.copy());
+
+ perm = Util.identity(g.numNodes());
+ perm[perm.length -1] = -1;
+ gm = Transform.map(new ArrayListMutableGraph(g).immutableView(), perm);
+ assertEquals(gm, Transform.mapOffline(g, perm, 100));
+ assertEquals(gm, gm.copy());
+
+ perm = Util.identity(g.numNodes());
+ Collections.shuffle(IntArrayList.wrap(perm), new XoRoShiRo128PlusRandom(0));
+ perm[0] = -1; perm[perm.length / 2] = -1;
+ gm = Transform.map(new ArrayListMutableGraph(g).immutableView(), perm);
+ assertEquals(gm, Transform.mapOffline(g, perm, 100));
+ assertEquals(gm, gm.copy());
+
+ perm = Util.identity(g.numNodes());
+ perm[1] = 0; perm[perm.length - 2] = perm.length - 1;
+ gm = Transform.map(new ArrayListMutableGraph(g).immutableView(), perm);
+ assertEquals(gm, Transform.mapOffline(g, perm, 100));
+ assertEquals(gm, gm.copy());
+
+ g = new ErdosRenyiGraph(1000, .20, 0, false);
+ perm = Util.identity(g.numNodes());
+ Collections.shuffle(IntArrayList.wrap(perm), new XoRoShiRo128PlusRandom(0));
+ inv = Util.invertPermutation(perm);
+ gm = Transform.map(new ArrayListMutableGraph(g).immutableView(), perm);
+ assertEquals(gm, Transform.mapOffline(g, perm, 10000));
+ assertEquals(g, Transform.mapOffline(Transform.mapOffline(g, perm, 10000), inv, 10000));
+ assertEquals(gm, gm.copy());
+
+ perm = Util.identity(g.numNodes());
+ Collections.shuffle(IntArrayList.wrap(perm), new XoRoShiRo128PlusRandom(0));
+ perm[0] = -1; perm[perm.length / 2] = -1; perm[perm.length / 4] = -1; perm[3 * perm.length / 4] = -1;
+ gm = Transform.map(new ArrayListMutableGraph(g).immutableView(), perm);
+ assertEquals(gm, Transform.mapOffline(g, perm, 10000));
+ assertEquals(gm, gm.copy());
+ }
+
+ @Test
+ public void testSymmetrize() throws IOException {
+ ImmutableGraph g = new ErdosRenyiGraph(5, .50, 0, false);
+ ImmutableGraph gs = Transform.symmetrize(g);
+ assertEquals(gs, Transform.symmetrizeOffline(g, 5));
+ assertEquals(gs, Transform.symmetrizeOffline(Transform.symmetrizeOffline(g, 100), 5));
+ assertEquals(gs, gs.copy());
+
+ g = new ErdosRenyiGraph(100, .50, 0, false);
+ gs = Transform.symmetrize(g);
+ assertEquals(gs, Transform.symmetrizeOffline(g, 100));
+ assertEquals(gs, Transform.symmetrizeOffline(Transform.symmetrizeOffline(g, 100), 100));
+ assertEquals(gs, gs.copy());
+
+ g = new ErdosRenyiGraph(1000, .20, 0, false);
+ gs = Transform.symmetrize(g);
+ assertEquals(gs, Transform.symmetrizeOffline(g, 10000));
+ assertEquals(gs, Transform.symmetrizeOffline(Transform.symmetrizeOffline(g, 10000), 10000));
+ assertEquals(gs, gs.copy());
+ }
+
+
+ @Test
+ public void testBatchGraphSplit() throws IOException {
+ ImmutableGraph g;
+ File tempGraph;
+ g = new ErdosRenyiGraph(5, .5, 0, false);
+ tempGraph = storeTempGraph(Transform.transposeOffline(g, 5));
+ assertEquals(Transform.transpose(g), ImmutableGraph.load(tempGraph.toString()));
+ deleteGraph(tempGraph);
+
+ g = new ErdosRenyiGraph(100, .5, 0, false);
+ tempGraph = storeTempGraph(Transform.transposeOffline(g, 100));
+ assertEquals(Transform.transpose(g), ImmutableGraph.load(tempGraph.toString()));
+ deleteGraph(tempGraph);
+
+ g = new ErdosRenyiGraph(1000, .20, 0, false);
+ tempGraph = storeTempGraph(Transform.transposeOffline(g, 10000));
+ assertEquals(Transform.transpose(g), ImmutableGraph.load(tempGraph.toString()));
+ deleteGraph(tempGraph);
+ }
+
+ @Test
+ public void testFilteredGraphSplit() throws IOException {
+ ImmutableGraph g;
+ File tempGraph;
+ g = new ArrayListMutableGraph(new ErdosRenyiGraph(5, .5, 0, false)).immutableView();
+ tempGraph = storeTempGraph(Transform.filterArcs(g, Transform.NO_LOOPS));
+ assertEquals(Transform.filterArcs(g, Transform.NO_LOOPS), ImmutableGraph.load(tempGraph.toString()));
+ deleteGraph(tempGraph);
+
+ g = new ArrayListMutableGraph(new ErdosRenyiGraph(100, .5, 0, false)).immutableView();
+ tempGraph = storeTempGraph(Transform.filterArcs(g, Transform.NO_LOOPS));
+ assertEquals(Transform.filterArcs(g, Transform.NO_LOOPS), ImmutableGraph.load(tempGraph.toString()));
+ deleteGraph(tempGraph);
+
+ g = new ArrayListMutableGraph(new ErdosRenyiGraph(1000, .20, 0, false)).immutableView();
+ tempGraph = storeTempGraph(Transform.filterArcs(g, Transform.NO_LOOPS));
+ assertEquals(Transform.filterArcs(g, Transform.NO_LOOPS), ImmutableGraph.load(tempGraph.toString()));
+ deleteGraph(tempGraph);
+ }
+
+ @Test
+ public void testArcLabelledFilteredGraphSplit() throws IOException {
+ // Arcs (x, x + i) labelled i mod 4, for x = 0..99 and i = 1..min(5, 99 - x)
+ ArrayList<int[]> arcs = new ArrayList<>();
+ for (int x = 0; x < 100; x++)
+ for (int i = 1; i <= Math.min(5, 99 - x); i++)
+ arcs.add(new int[] { x, x + i, i % 4 });
+ arcs.add(new int[] { 99, 98, 1 }); // Needed to avoid cutting the graph short
+ ArrayList<int[]> arcsFiltered = new ArrayList<>();
+ for (int x = 0; x < 100; x++)
+ for (int i = 1; i <= Math.min(5, 99 - x); i++)
+ if (i % 4 != 3 && x % 5 == 4 && (x + i) % 3 == 2)
+ arcsFiltered.add(new int[] { x, x + i, i % 4 });
+ arcsFiltered.add(new int[] { 99, 98, 1 }); // Needed to avoid cutting the graph short
+ int[][] result = new int[arcs.size()][];
+ arcs.toArray(result);
+ int[][] resultFiltered = new int[arcsFiltered.size()][];
+ arcsFiltered.toArray(resultFiltered);
+ File file = BitStreamArcLabelledGraphTest.storeTempGraph(new IntegerTriplesArcLabelledImmutableGraph(result));
+ ArcLabelledImmutableGraph graph = ArcLabelledImmutableGraph.load(file.toString());
+ File filteredFile = BitStreamArcLabelledGraphTest.storeTempGraph(new IntegerTriplesArcLabelledImmutableGraph(resultFiltered));
+ ArcLabelledImmutableGraph filteredGraph = ArcLabelledImmutableGraph.load(filteredFile.toString());
+ ArcLabelledImmutableGraph transformed = Transform.filterArcs(graph, new LabelledArcFilter() {
+ @Override
+ public boolean accept(int i, int j, Label label) {
+ return label.getInt() % 4 != 3 && i % 5 == 4 && j % 3 == 2;
+ }
+ });
+ assertEquals(filteredGraph, transformed);
+ assertGraph(transformed);
+ }
+
+}
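In `testLabelledCompose`, the anonymous `LabelSemiring` uses `min` as addition and `+` as multiplication (with `one` = 0 and `zero` = `Integer.MAX_VALUE`). On that reading, each arc of the composed graph should carry the length of the shortest two-step path between its endpoints, which is why the only expected arc is 0 -> 2 with label 2. The standalone Java sketch below recomputes that result on plain adjacency maps. The class `MinPlusComposeSketch` and its methods are hypothetical and only illustrate the semiring arithmetic; they are not how `Transform.compose` is implemented.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical standalone sketch: composes a weighted graph with itself over the (min, +) semiring. */
public class MinPlusComposeSketch {

	/** Builds a successor map node -> (successor -> label) from {source, target, label} triples. */
	public static Map<Integer, TreeMap<Integer, Integer>> fromTriples(final int[][] triples) {
		final Map<Integer, TreeMap<Integer, Integer>> g = new TreeMap<>();
		for (final int[] t : triples) g.computeIfAbsent(t[0], k -> new TreeMap<>()).put(t[1], t[2]);
		return g;
	}

	/** Arc x -> z exists iff x -> y in g0 and y -> z in g1 for some y; its label is the min over y of label(x, y) + label(y, z). */
	public static Map<Integer, TreeMap<Integer, Integer>> compose(final Map<Integer, TreeMap<Integer, Integer>> g0,
			final Map<Integer, TreeMap<Integer, Integer>> g1) {
		final Map<Integer, TreeMap<Integer, Integer>> result = new TreeMap<>();
		for (final Map.Entry<Integer, TreeMap<Integer, Integer>> x : g0.entrySet())
			for (final Map.Entry<Integer, Integer> y : x.getValue().entrySet()) {
				final TreeMap<Integer, Integer> next = g1.get(y.getKey());
				if (next == null) continue; // y has no successors in g1
				for (final Map.Entry<Integer, Integer> z : next.entrySet()) {
					final int product = y.getValue() + z.getValue(); // semiring multiplication: +
					// Semiring addition: min. A missing entry plays the role of zero (infinity).
					result.computeIfAbsent(x.getKey(), k -> new TreeMap<>()).merge(z.getKey(), product, Math::min);
				}
			}
		return result;
	}

	public static void main(final String[] args) {
		final int[][] triples = { { 0, 1, 2 }, { 0, 2, 10 }, { 0, 3, 1 }, { 1, 2, 4 }, { 3, 2, 1 } };
		final Map<Integer, TreeMap<Integer, Integer>> g = fromTriples(triples);
		// Paths 0->1->2 (2 + 4 = 6) and 0->3->2 (1 + 1 = 2): the minimum is 2.
		System.out.println(compose(g, g)); // prints {0={2=2}}
	}
}
```

Unlike the WebGraph node iterator used in the test, the sketch lists only nodes that have at least one composed successor, so nodes 1, 2 and 3 do not appear in the printed map.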
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/WebGraphTestCase.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/WebGraphTestCase.java
new file mode 100644
index 0000000..3307acb
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/WebGraphTestCase.java
@@ -0,0 +1,247 @@
+package it.unimi.dsi.webgraph;
+
+/*
+ * Copyright (C) 2003-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import it.unimi.dsi.fastutil.ints.IntOpenHashSet;
+import it.unimi.dsi.fastutil.ints.IntSet;
+import it.unimi.dsi.webgraph.labelling.ArcLabelledImmutableGraph;
+import it.unimi.dsi.webgraph.labelling.ArcLabelledNodeIterator;
+import it.unimi.dsi.webgraph.labelling.ArcLabelledNodeIterator.LabelledArcIterator;
+import it.unimi.dsi.webgraph.labelling.Label;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+
+/** A base test class providing additional assertions
+ * for {@linkplain it.unimi.dsi.webgraph.ImmutableGraph immutable graphs}.
+ */
+
+public abstract class WebGraphTestCase {
+
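+ private static void copy(InputStream in, OutputStream out) throws IOException {
+ // Closes both streams when done, even if reading or writing fails.
+ try (InputStream i = in; OutputStream o = out) {
+ int c;
+ while((c = i.read()) != -1) o.write(c);
+ }
+ }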
+
+ /** Returns the basename of a temporary graph that copies the resource graph with the given basename.
+ *
+ * @param basename the basename of the resource graph.
+ * @return the basename of the temporary copy.
+ * @throws IOException if an I/O error occurs while copying.
+ */
+ public String getGraphPath(final String basename) throws IOException {
+ File file = File.createTempFile(getClass().getSimpleName(), "graph");
+ file.delete();
+
+ copy(BVGraphTest.class.getResourceAsStream(basename + BVGraph.GRAPH_EXTENSION), new FileOutputStream(file.getCanonicalPath() + BVGraph.GRAPH_EXTENSION));
+ copy(BVGraphTest.class.getResourceAsStream(basename + BVGraph.OFFSETS_EXTENSION), new FileOutputStream(file.getCanonicalPath() + BVGraph.OFFSETS_EXTENSION));
+ copy(BVGraphTest.class.getResourceAsStream(basename + BVGraph.PROPERTIES_EXTENSION), new FileOutputStream(file.getCanonicalPath() + BVGraph.PROPERTIES_EXTENSION));
+
+ return file.getCanonicalPath();
+ }
+
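+ /** Checks that the iterators returned by {@link ImmutableGraph#splitNodeIterators(int)} return every node
+ * exactly once, each with the same successors as a sequential scan.
+ *
+ * @param g the immutable graph to be tested.
+ * @param howMany the number of split iterators to request.
+ */
+ public static void assertSplitIterator(final ImmutableGraph g, final int howMany) {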
+ int n = g.numNodes();
+
+ // Get successors
+ IntSet[] successors = new IntSet[n];
+ boolean[] read = new boolean[n];
+ int howManyRead = 0;
+ NodeIterator iterator = g.nodeIterator();
+ while (iterator.hasNext()) {
+ int x = iterator.nextInt();
+ successors[x] = new IntOpenHashSet();
+ LazyIntIterator succ = iterator.successors();
+ int y;
+ while ((y = succ.nextInt()) != -1) successors[x].add(y);
+ }
+
+ // Get a split
+ NodeIterator[] splitNodeIterators = g.splitNodeIterators(howMany);
+ if (ImmutableGraphTest.DEBUG) System.out.println("Iteration started");
+ for (NodeIterator it: splitNodeIterators) {
+ if (ImmutableGraphTest.DEBUG) System.out.println("One iterator");
+ if (it == null) break;
+ if (ImmutableGraphTest.DEBUG) System.out.println("Non-void iterator");
+ while (it.hasNext()) {
+ int x = it.nextInt();
+ if (ImmutableGraphTest.DEBUG) System.out.println("Returns " + x);
+ assertFalse("Node " + x + " already returned", read[x]);
+ IntSet returned = new IntOpenHashSet();
+ int y;
+ LazyIntIterator succ = it.successors();
+ while ((y = succ.nextInt()) != -1) returned.add(y);
+ assertEquals("Successors of node " + x, successors[x], returned);
+ read[x] = true;
+ howManyRead++;
+ }
+ }
+ assertEquals(n, howManyRead);
+ }
+
+ /** Cleans up a temporary graph.
+ *
+ * @param basename the basename.
+ */
+
+ public static void deleteGraph(final String basename) {
+ deleteGraph(new File(basename));
+ }
+
+
+ /** Cleans up a temporary graph.
+ *
+ * @param basename the basename.
+ */
+ public static void deleteGraph(final File basename) {
+ new File(basename + BVGraph.GRAPH_EXTENSION).delete();
+ new File(basename + BVGraph.OFFSETS_EXTENSION).delete();
+ new File(basename + BVGraph.OFFSETS_BIG_LIST_EXTENSION).delete();
+ new File(basename + ImmutableGraph.PROPERTIES_EXTENSION).delete();
+ }
+
+ /** Performs a stress-test of an immutable graph. All available methods
+ * for accessing outdegrees and successors are cross-checked, including
+ * {@linkplain ImmutableGraph#splitNodeIterators(int) split iterators}.
+ *
+ * @param g the immutable graph to be tested.
+ */
+
+ public static void assertGraph(ImmutableGraph g) {
+ assertGraph(g, true);
+ }
+
+ /** Performs a stress-test of an immutable graph. All available methods
+ * for accessing outdegrees and successors are cross-checked.
+ *
+ * @param g the immutable graph to be tested.
+ * @param doSplitIterators whether to test {@linkplain ImmutableGraph#splitNodeIterators(int) split iterators}.
+ */
+ public static void assertGraph(ImmutableGraph g, final boolean doSplitIterators) {
+
+ NodeIterator nodeIterator0 = g.nodeIterator(), nodeIterator1 = g.nodeIterator();
+ int d, s0[];
+ Label l0[];
+ LazyIntIterator s1;
+ int m = 0;
+ int curr;
+ // Check that iterator and array methods return the same values in sequential scans.
+ for(int i = g.numNodes(); i-- != 0;) {
+ curr = nodeIterator0.nextInt();
+ assertEquals(curr, nodeIterator1.nextInt());
+ d = nodeIterator0.outdegree();
+ m += d;
+ assertEquals(d, nodeIterator1.outdegree());
+
+ s0 = nodeIterator0.successorArray();
+ s1 = nodeIterator1.successors();
+ for(int k = 0; k < d; k++) assertEquals(s0[k], s1.nextInt());
+ assertEquals(-1, s1.nextInt());
+
+ if (g instanceof ArcLabelledImmutableGraph) {
+ l0 = ((ArcLabelledNodeIterator)nodeIterator0).labelArray();
+ s1 = ((ArcLabelledNodeIterator)nodeIterator1).successors();
+ for(int k = 0; k < d; k++) {
+ s1.nextInt();
+ assertEquals(l0[k], ((LabelledArcIterator)s1).label());
+ }
+ }
+
+ assertEquals(-1, s1.nextInt());
+ }
+
+ try {
+ assertEquals(m, g.numArcs());
+ }
+ catch(UnsupportedOperationException ignore) {} // A graph might not support numArcs().
+ assertFalse(nodeIterator0.hasNext());
+ assertFalse(nodeIterator1.hasNext());
+
+ // Check split iterator
+ if (doSplitIterators) {
+ assertSplitIterator(g, g.numNodes());
+ assertSplitIterator(g, 1);
+ if (g.numNodes() / 4 > 0) assertSplitIterator(g, g.numNodes() / 4);
+ assertSplitIterator(g, 4);
+ }
+
+ if (! g.randomAccess()) return;
+
+ // Check that sequential iterator methods and random methods do coincide.
+ String msg;
+
+ for(int s = 0; s < g.numNodes() - 1; s++) {
+ nodeIterator1 = g.nodeIterator(s);
+ for(int i = g.numNodes() - s; i-- != 0;) {
+ curr = nodeIterator1.nextInt();
+ msg = "Node " + curr + ", starting from " + s + ":";
+ d = g.outdegree(curr);
+ assertEquals(msg, d, nodeIterator1.outdegree());
+ s0 = g.successorArray(curr);
+ s1 = nodeIterator1.successors();
+ for(int k = 0; k < d; k++) assertEquals(msg, s0[k], s1.nextInt());
+ s1 = g.successors(curr);
+ for(int k = 0; k < d; k++) assertEquals(msg, s0[k], s1.nextInt());
+ assertEquals(msg, -1, s1.nextInt());
+
+ if (g instanceof ArcLabelledImmutableGraph) {
+ l0 = ((ArcLabelledImmutableGraph)g).labelArray(curr);
+ s1 = ((ArcLabelledNodeIterator)nodeIterator1).successors();
+ for(int k = 0; k < d; k++) {
+ s1.nextInt();
+ assertEquals(msg, l0[k], ((LabelledArcIterator)s1).label());
+ }
+ s1 = g.successors(curr);
+ for(int k = 0; k < d; k++) {
+ s1.nextInt();
+ assertEquals(msg, l0[k], ((LabelledArcIterator)s1).label());
+ }
+ assertEquals(msg, -1, s1.nextInt());
+ }
+ }
+ }
+
+ // Check that random-access calls interleaved with iteration do not disturb successor enumeration.
+
+ nodeIterator0 = g.nodeIterator();
+ for(int s = 0; s < g.numNodes(); s++) {
+ d = g.outdegree(s);
+ nodeIterator0.nextInt();
+ LazyIntIterator successors = g.successors(s);
+ int[] succ = nodeIterator0.successorArray();
+ for(int i = 0; i < d; i++) {
+ final int t = successors.nextInt();
+ assertEquals(succ[i], t);
+ g.outdegree(t);
+ }
+
+ }
+ // Check copies
+ assertEquals(g, g.copy());
+
+ }
+
+
+}
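`assertSplitIterator` accepts a split only if the iterators returned by `splitNodeIterators(howMany)` jointly return each node exactly once (stopping at the first `null` entry), each with the same successors as a sequential scan. A minimal sketch of the coverage half of that check follows. It assumes plain `int[]` blocks in place of `NodeIterator`s and a contiguous-range splitting scheme chosen only for illustration; `SplitCoverageSketch`, `splitNodes` and `coversExactlyOnce` are hypothetical names, not WebGraph API.

```java
/** Hypothetical sketch of the coverage check performed by assertSplitIterator (successor comparison omitted). */
public class SplitCoverageSketch {

	/** Splits nodes 0..n-1 into at most howMany contiguous, non-empty blocks (an assumed scheme, for illustration only). */
	public static int[][] splitNodes(final int n, final int howMany) {
		final int parts = Math.max(1, Math.min(howMany, n));
		final int[][] blocks = new int[parts][];
		for (int i = 0; i < parts; i++) {
			final int start = (int)((long)n * i / parts), end = (int)((long)n * (i + 1) / parts);
			blocks[i] = new int[end - start];
			for (int x = start; x < end; x++) blocks[i][x - start] = x;
		}
		return blocks;
	}

	/** Returns true iff the blocks jointly return every node in [0, n) exactly once. */
	public static boolean coversExactlyOnce(final int n, final int[][] blocks) {
		final boolean[] read = new boolean[n];
		int howManyRead = 0;
		for (final int[] block : blocks) {
			if (block == null) break; // mirrors the test: a null entry ends the array
			for (final int x : block) {
				if (x < 0 || x >= n || read[x]) return false; // out of range or returned twice
				read[x] = true;
				howManyRead++;
			}
		}
		return howManyRead == n; // every node must have been returned
	}

	public static void main(final String[] args) {
		System.out.println(coversExactlyOnce(10, splitNodes(10, 4))); // true
		System.out.println(coversExactlyOnce(3, new int[][] { { 0, 1 }, { 1, 2 } })); // false: node 1 twice
	}
}
```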
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/ApproximateNeighbourhoodFunctionsTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/ApproximateNeighbourhoodFunctionsTest.java
new file mode 100644
index 0000000..8fbbe2b
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/ApproximateNeighbourhoodFunctionsTest.java
@@ -0,0 +1,78 @@
+package it.unimi.dsi.webgraph.algo;
+
+/*
+ * Copyright (C) 2010-2017 Paolo Boldi & Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import static org.junit.Assert.assertArrayEquals;
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.fastutil.objects.ObjectArrayList;
+import it.unimi.dsi.fastutil.objects.ObjectList;
+import it.unimi.dsi.stat.Jackknife;
+import it.unimi.dsi.webgraph.WebGraphTestCase;
+
+import java.util.Arrays;
+
+import org.junit.Test;
+
+public class ApproximateNeighbourhoodFunctionsTest extends WebGraphTestCase {
+
+ @Test
+ public void testCombine() {
+ final double[] a = { 1.0, 2.0 };
+ final double[] b = { 0.5 };
+
+ double[] combine = ApproximateNeighbourhoodFunctions.combine(ObjectArrayList.wrap(new double[][] { a, b }));
+ assertEquals(2, combine.length);
+ assertEquals(1.5 / 2, combine[0], 0);
+ assertEquals(2.5 / 2, combine[1], 0);
+
+ final double[] c = { 1, 0.5 };
+
+ combine = ApproximateNeighbourhoodFunctions.combine(ObjectArrayList.wrap(new double[][] { c, c, c }));
+ assertEquals(2, combine.length);
+ assertEquals(1, combine[0], 0);
+ assertEquals(1, combine[1], 0);
+ }
+
+ @Test
+ public void testEvenOut() {
+ final double[] a = { 1, 2 };
+ final double[] b = { .5 };
+ final double[] c = { .4, .4, .6 };
+
+ ObjectList<double[]> evenOut = ApproximateNeighbourhoodFunctions.evenOut(ObjectArrayList.wrap(new double[][] { a, b, c }));
+ assertArrayEquals(evenOut.get(0), new double[] { 1, 2, 2 }, 0);
+ assertArrayEquals(evenOut.get(1), new double[] { .5, .5, .5 }, 0);
+ assertArrayEquals(evenOut.get(2), new double[] { .4, .4, .6 }, 0);
+ }
+
+ @Test
+ public void testStatistics() {
+ final double[][] s = { { 1, 2, 3 }, { 1, 2, 3 } };
+
+ Jackknife jackknife = Jackknife.compute(Arrays.asList(s), ApproximateNeighbourhoodFunctions.CDF);
+ assertArrayEquals(new double[] { 1./3, 2./3, 1 }, jackknife.estimate, 1E-50);
+ jackknife = Jackknife.compute(Arrays.asList(s), ApproximateNeighbourhoodFunctions.PMF);
+ assertArrayEquals(new double[] { 1./3, 1./3, 1./3 }, jackknife.estimate, 1E-50);
+ jackknife = Jackknife.compute(Arrays.asList(s), ApproximateNeighbourhoodFunctions.AVERAGE_DISTANCE);
+ assertArrayEquals(new double[] { 1 }, jackknife.estimate, 1E-50);
+ jackknife = Jackknife.compute(Arrays.asList(s), ApproximateNeighbourhoodFunctions.SPID);
+ assertArrayEquals(new double[] { 2./3 }, jackknife.estimate, 1E-15);
+ }
+}
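Two of the behaviours asserted in `ApproximateNeighbourhoodFunctionsTest` can be reproduced with a few lines of arithmetic. `testEvenOut` is consistent with padding each neighbourhood function to the longest length by repeating its last value. The `testStatistics` values follow if the CDF is N(t) / N(last), the PMF is its successive differences, the average distance is the sum of t * PMF(t), and the spid is the variance-to-mean ratio of that distance distribution. The sketch below is an illustration inferred from those assertions, with hypothetical names. It is not the library's implementation, and it does not model `combine`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch reproducing values asserted in ApproximateNeighbourhoodFunctionsTest (not the library code). */
public class NeighbourhoodFunctionSketch {

	/** Pads every array to the maximum length by repeating its last element (arrays assumed non-empty). */
	public static List<double[]> evenOut(final List<double[]> functions) {
		int max = 0;
		for (final double[] f : functions) max = Math.max(max, f.length);
		final List<double[]> result = new ArrayList<>();
		for (final double[] f : functions) {
			final double[] padded = Arrays.copyOf(f, max);
			Arrays.fill(padded, f.length, max, f[f.length - 1]);
			result.add(padded);
		}
		return result;
	}

	/** Cumulative distribution of distances: N(t) / N(last). */
	public static double[] cdf(final double[] nf) {
		final double[] cdf = new double[nf.length];
		for (int t = 0; t < nf.length; t++) cdf[t] = nf[t] / nf[nf.length - 1];
		return cdf;
	}

	/** Probability mass function of distances: successive differences of the CDF. */
	public static double[] pmf(final double[] nf) {
		final double[] cdf = cdf(nf), pmf = new double[nf.length];
		for (int t = 0; t < nf.length; t++) pmf[t] = cdf[t] - (t == 0 ? 0 : cdf[t - 1]);
		return pmf;
	}

	/** Mean of the distance distribution: sum over t of t * pmf(t). */
	public static double averageDistance(final double[] nf) {
		final double[] pmf = pmf(nf);
		double mean = 0;
		for (int t = 0; t < pmf.length; t++) mean += t * pmf[t];
		return mean;
	}

	/** Spid (shortest-paths index of dispersion): variance-to-mean ratio of the distance distribution. */
	public static double spid(final double[] nf) {
		final double[] pmf = pmf(nf);
		final double mean = averageDistance(nf);
		double variance = 0;
		for (int t = 0; t < pmf.length; t++) variance += (t - mean) * (t - mean) * pmf[t];
		return variance / mean;
	}

	public static void main(final String[] args) {
		final double[] nf = { 1, 2, 3 };
		System.out.println(Arrays.toString(pmf(nf))); // three entries, each about 1/3
		System.out.println(averageDistance(nf) + " " + spid(nf)); // about 1 and 2/3
	}
}
```

For N = {1, 2, 3} this gives a PMF of 1/3 at distances 0, 1 and 2, an average distance of 1 and a spid of 2/3, matching the assertions in `testStatistics`.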
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/BetweennessCentralityTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/BetweennessCentralityTest.java
new file mode 100644
index 0000000..fe13486
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/BetweennessCentralityTest.java
@@ -0,0 +1,240 @@
+package it.unimi.dsi.webgraph.algo;
+
+/*
+ * Copyright (C) 2011-2017 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertArrayEquals;
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.examples.ErdosRenyiGraph;
+
+import java.util.Arrays;
+
+import org.junit.Test;
+
+
+
+//RELEASE-STATUS: DIST
+
+public class BetweennessCentralityTest {
+
+ @Test
+ public void testPath() throws InterruptedException {
+ final ImmutableGraph graph = new ArrayListMutableGraph(3, new int[][] { { 0, 1 }, { 1, 2 } }).immutableView();
+
+ final BetweennessCentrality betweennessCentrality = new BetweennessCentrality(graph);
+ betweennessCentrality.compute();
+
+ assertEquals(0, betweennessCentrality.betweenness[0], 1E-5);
+ assertEquals(1, betweennessCentrality.betweenness[1], 1E-5);
+ assertEquals(0, betweennessCentrality.betweenness[2], 1E-5);
+ }
+
+ @Test
+ public void testLozenge() throws InterruptedException {
+ final ImmutableGraph graph = new ArrayListMutableGraph(4, new int[][] { { 0, 1 }, { 0, 2 }, { 1, 3 }, { 2, 3 } }).immutableView();
+
+ final BetweennessCentrality betweennessCentrality = new BetweennessCentrality(graph);
+ betweennessCentrality.compute();
+
+ assertEquals(0, betweennessCentrality.betweenness[0], 1E-5);
+ assertEquals(0.5, betweennessCentrality.betweenness[1], 1E-5);
+ assertEquals(0.5, betweennessCentrality.betweenness[2], 1E-5);
+ assertEquals(0, betweennessCentrality.betweenness[3], 1E-5);
+ }
+
+ @Test
+ public void testCycle() throws InterruptedException {
+ for(int size: new int[] { 10, 50, 100 }) {
+ final ImmutableGraph graph = ArrayListMutableGraph.newDirectedCycle(size).immutableView();
+ final BetweennessCentrality betweennessCentrality = new BetweennessCentrality(graph);
+ betweennessCentrality.compute();
+
+ final double[] expected = new double[size];
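+ // In a directed cycle each ordered pair has a unique shortest path. A source reaches the other nodes
+ // along paths of length L = 1..n - 1 with L - 1 interior nodes each, i.e., (n - 1)(n - 2) / 2 interior
+ // positions per source; by symmetry this is also the betweenness of every node.
+ Arrays.fill(expected, (size - 1) * (size - 2) / 2.0);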
+ for(int i = size; i-- != 0;) assertEquals(expected[i], betweennessCentrality.betweenness[i], 1E-12);
+ }
+ }
+
+ @Test
+ public void testClique() throws InterruptedException {
+ for(int size: new int[] { 10, 50, 100 }) {
+ final ImmutableGraph graph = ArrayListMutableGraph.newCompleteGraph(size, false).immutableView();
+ final BetweennessCentrality betweennessCentrality = new BetweennessCentrality(graph);
+ betweennessCentrality.compute();
+
+ final double[] expected = new double[size];
+ Arrays.fill(expected, 0);
+ for(int i = size; i-- != 0;) assertEquals(expected[i], betweennessCentrality.betweenness[i], 1E-12);
+ }
+ }
+
+ @Test
+ public void testCliqueNobridgeCycle() throws InterruptedException {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++)
+ if (i != j) mg.addArc(i, j);
+ for(int i = 0; i < p; i++) mg.addArc(k + i, k + (i + 1) % p);
+ ImmutableGraph g = mg.immutableView();
+
+ final BetweennessCentrality betweennessCentrality = new BetweennessCentrality(g);
+ betweennessCentrality.compute();
+
+ final double[] expected = new double[k + p];
+
+ for (int i = 0; i < k; i++) expected[i] = 0;
+ for (int i = k; i < k + p; i++) expected[i] = (p - 1) * (p - 2) / 2.0;
+
+ for (int i = 0; i < k + p; i++) assertEquals(expected[i], betweennessCentrality.betweenness[i], 1E-12);
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueForwardbridgeCycle() throws InterruptedException {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++)
+ if (i != j) mg.addArc(i, j);
+ for(int i = 0; i < p; i++) mg.addArc(k + i, k + (i + 1) % p);
+ mg.addArc(k - 1, k);
+ ImmutableGraph g = mg.immutableView();
+
+ final BetweennessCentrality betweennessCentrality = new BetweennessCentrality(g);
+ betweennessCentrality.compute();
+
+ final double[] expected = new double[k + p];
+
+ for (int i = 0; i < k - 1; i++) expected[i] = 0;
+ expected[k - 1] = p * (k - 1);
+ for (int d = 0; d < p; d++) expected[k + d] = k * (p - d - 1) + (p - 1) * (p - 2) / 2.0;
+
+ for (int i = 0; i < k + p; i++) assertEquals(expected[i], betweennessCentrality.betweenness[i], 1E-12);
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueBackbridgeCycle() throws InterruptedException {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++)
+ if (i != j) mg.addArc(i, j);
+ for(int i = 0; i < p; i++) mg.addArc(k + i, k + (i + 1) % p);
+ mg.addArc(k, k - 1);
+ ImmutableGraph g = mg.immutableView();
+
+ final BetweennessCentrality betweennessCentrality = new BetweennessCentrality(g);
+ betweennessCentrality.compute();
+
+ final double[] expected = new double[k + p];
+
+ for (int i = 0; i < k - 1; i++) expected[i] = 0;
+ expected[k - 1] = p * (k - 1);
+ for (int d = 0; d < p; d++) expected[k + d] = k * (d - 1 + (d == 0? p : 0)) + (p - 1) * (p - 2) / 2.0;
+
+ for (int i = 0; i < k + p; i++) assertEquals(expected[i], betweennessCentrality.betweenness[i], 1E-12);
+ }
+ }
+ }
+
+ @Test
+ public void testCliqueBibridgeCycle() throws InterruptedException {
+ for(int p: new int[] { 10, 50, 100 }) {
+ for(int k: new int[] { 10, 50, 100 }) {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(p + k);
+ for(int i = 0; i < k; i++)
+ for(int j = 0; j < k; j++)
+ if (i != j) mg.addArc(i, j);
+ for(int i = 0; i < p; i++) mg.addArc(k + i, k + (i + 1) % p);
+ mg.addArc(k, k - 1);
+ mg.addArc(k - 1, k);
+ ImmutableGraph g = mg.immutableView();
+
+ final BetweennessCentrality betweennessCentrality = new BetweennessCentrality(g);
+ betweennessCentrality.compute();
+
+ final double[] expected = new double[k + p];
+
+ for (int i = 0; i < k - 1; i++) expected[i] = 0;
+ expected[k - 1] = 2 * p * (k - 1);
+ expected[k] = 2 * k * (p - 1) + (p - 1) * (p - 2) / 2.0;
+ for (int d = 1; d < p; d++) expected[k + d] = k * (p - 2) + (p - 1) * (p - 2) / 2.0;
+
+ for (int i = 0; i < k + p; i++) assertEquals(expected[i], betweennessCentrality.betweenness[i], 1E-12);
+ }
+ }
+ }
+
+ @Test
+ public void testRandom() throws InterruptedException {
+ for (double p: new double[] { .1, .2, .5, .7 })
+ for(int size: new int[] { 10, 50, 100 }) {
+ final ImmutableGraph graph = new ArrayListMutableGraph(new ErdosRenyiGraph(size, p, 0, false)).immutableView();
+ final BetweennessCentrality betweennessCentralityMultipleVisits = new BetweennessCentrality(graph);
+ betweennessCentralityMultipleVisits.compute();
+
+ final BetweennessCentrality betweennessCentrality = new BetweennessCentrality(graph);
+ betweennessCentrality.compute();
+
+ assertArrayEquals(betweennessCentrality.betweenness, betweennessCentralityMultipleVisits.betweenness, 1E-15);
+ }
+ }
+
+ @Test
+ public void testOverflowOK() throws InterruptedException {
+ final int blocks = 20;
+ final int blockSize = 10;
+ final int n = blocks * blockSize;
+
+ final ArrayListMutableGraph arrayListMutableGraph = new ArrayListMutableGraph(n);
+
+ for(int i = blocks; i-- != 0;)
+ for(int j = blockSize - 1; j-- != 0;) {
+ arrayListMutableGraph.addArc(i * blockSize, i * blockSize + j + 1);
+ arrayListMutableGraph.addArc(i * blockSize + j + 1, (i + 1) * blockSize % n);
+ }
+
+ new BetweennessCentrality(arrayListMutableGraph.immutableView()).compute();
+ }
+
+ @Test(expected=BetweennessCentrality.PathCountOverflowException.class)
+ public void testOverflowNotOK() throws InterruptedException {
+ final int blocks = 40;
+ final int blockSize = 10;
+ final int n = blocks * blockSize;
+
+ final ArrayListMutableGraph arrayListMutableGraph = new ArrayListMutableGraph(n);
+
+ for(int i = blocks; i-- != 0;)
+ for(int j = blockSize - 1; j-- != 0;) {
+ arrayListMutableGraph.addArc(i * blockSize, i * blockSize + j + 1);
+ arrayListMutableGraph.addArc(i * blockSize + j + 1, (i + 1) * blockSize % n);
+ }
+
+ new BetweennessCentrality(arrayListMutableGraph.immutableView()).compute();
+ }
+}
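The expected betweenness values in the clique-and-cycle tests above are derived by hand. For instance, each node of a directed p-cycle lies on (p - 1)(p - 2) / 2 shortest paths between other cycle nodes. As a cross-check, here is a minimal sketch of Brandes' algorithm for unweighted directed graphs. It uses plain `int[][]` adjacency lists rather than WebGraph's `ImmutableGraph`, and `BrandesBetweenness` and its graph builder are hypothetical names, not part of WebGraph. It rebuilds the graph of `testCliqueForwardbridgeCycle` and reproduces the expected values for the bridge node and the first cycle node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Brute-force betweenness (Brandes' algorithm) for small unweighted directed graphs. */
final class BrandesBetweenness {
    private BrandesBetweenness() {}

    /** succ[v] lists the successors of node v; returns the betweenness of every node. */
    static double[] betweenness(final int[][] succ) {
        final int n = succ.length;
        final double[] result = new double[n];
        final int[] dist = new int[n];
        final long[] sigma = new long[n];     // number of shortest paths from the source
        final double[] delta = new double[n]; // dependency of the source on each node
        final int[] order = new int[n];       // visited nodes by non-decreasing distance
        for (int s = 0; s < n; s++) {
            Arrays.fill(dist, -1);
            Arrays.fill(sigma, 0);
            Arrays.fill(delta, 0);
            int head = 0, tail = 0;
            dist[s] = 0;
            sigma[s] = 1;
            order[tail++] = s;
            while (head < tail) { // BFS from s, counting shortest paths
                final int v = order[head++];
                for (final int w : succ[v]) {
                    if (dist[w] == -1) { dist[w] = dist[v] + 1; order[tail++] = w; }
                    if (dist[w] == dist[v] + 1) sigma[w] += sigma[v];
                }
            }
            for (int i = tail; i-- != 0;) { // accumulate dependencies by decreasing distance
                final int v = order[i];
                for (final int w : succ[v])
                    if (dist[w] == dist[v] + 1) delta[v] += (double) sigma[v] / sigma[w] * (1 + delta[w]);
                if (v != s) result[v] += delta[v];
            }
        }
        return result;
    }

    /** k-clique on 0..k-1, directed p-cycle on k..k+p-1, plus the bridge arc k-1 -> k. */
    static int[][] cliqueForwardBridgeCycle(final int k, final int p) {
        final List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < k + p; i++) adj.add(new ArrayList<>());
        for (int i = 0; i < k; i++)
            for (int j = 0; j < k; j++)
                if (i != j) adj.get(i).add(j);
        for (int i = 0; i < p; i++) adj.get(k + i).add(k + (i + 1) % p);
        adj.get(k - 1).add(k);
        final int[][] succ = new int[k + p][];
        for (int i = 0; i < k + p; i++) succ[i] = adj.get(i).stream().mapToInt(Integer::intValue).toArray();
        return succ;
    }

    public static void main(final String[] args) {
        final int k = 10, p = 10;
        final double[] b = betweenness(cliqueForwardBridgeCycle(k, p));
        System.out.println(b[k - 1]); // p * (k - 1) = 90.0
        System.out.println(b[k]);     // k * (p - 1) + (p - 1) * (p - 2) / 2.0 = 126.0
    }
}
```

Path counts are kept in a `long`. On graphs with exponentially many shortest paths this sketch can therefore overflow silently, whereas the tests above expect WebGraph to signal that case with `PathCountOverflowException`.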
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/ConnectedComponentsTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/ConnectedComponentsTest.java
new file mode 100644
index 0000000..ddd9b7d
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/ConnectedComponentsTest.java
@@ -0,0 +1,64 @@
+package it.unimi.dsi.webgraph.algo;
+
+/*
+ * Copyright (C) 2011-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.Transform;
+import it.unimi.dsi.webgraph.WebGraphTestCase;
+import it.unimi.dsi.webgraph.examples.ErdosRenyiGraph;
+
+import org.junit.Test;
+
+
+public class ConnectedComponentsTest extends WebGraphTestCase {
+ public static void sameComponents(ImmutableGraph g) {
+ StronglyConnectedComponentsTarjan stronglyConnectedComponents = StronglyConnectedComponentsTarjan.compute(g, false, new ProgressLogger());
+ int[] size2 = stronglyConnectedComponents.computeSizes();
+ stronglyConnectedComponents.sortBySize(size2);
+
+ for(int t = 0; t < 3; t++) {
+ ConnectedComponents connectedComponents = ConnectedComponents.compute(g, t, new ProgressLogger());
+ int[] size = connectedComponents.computeSizes();
+ connectedComponents.sortBySize(size);
+ for(int i = g.numNodes(); i-- != 0;)
+ for(int j = i; j-- != 0;)
+ org.junit.Assert.assertTrue("Nodes " + i + " and " + j, (connectedComponents.component[i] == connectedComponents.component[j]) == (stronglyConnectedComponents.component[i] == stronglyConnectedComponents.component[j]));
+ }
+ }
+
+ @Test
+ public void testSmall() {
+ sameComponents(ArrayListMutableGraph.newBidirectionalCycle(40).immutableView());
+ }
+
+ @Test
+ public void testBinaryTree() {
+ sameComponents(Transform.symmetrize(ArrayListMutableGraph.newCompleteBinaryIntree(10).immutableView()));
+ }
+
+ @Test
+ public void testErdosRenyi() {
+ for(int size: new int[] { 10, 100, 1000 })
+ for(int attempt = 0; attempt < 5; attempt++)
+ sameComponents(Transform.symmetrize(new ArrayListMutableGraph(new ErdosRenyiGraph(size, .001, attempt + 1, true)).immutableView()));
+ }
+}
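`sameComponents` relies on the fact that, on a symmetric graph, strongly connected components and connected components coincide, and it compares the two labellings pair by pair in quadratic time. The sketch below uses hypothetical names and plain arrays instead of WebGraph types. It computes the components of an undirected edge list with union-find, and checks in linear time that two labellings describe the same partition by verifying that the label-to-label correspondence is a bijection.

```java
import java.util.HashMap;
import java.util.Map;

/** Union-find components of an undirected graph, and a linear-time partition comparison. */
final class ComponentCheck {
    private ComponentCheck() {}

    private static int find(final int[] parent, int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]]; // path halving keeps the trees shallow
            x = parent[x];
        }
        return x;
    }

    /** Labels every node with the root of its component; edges are unordered pairs {u, v}. */
    static int[] components(final int n, final int[][] edges) {
        final int[] parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
        for (final int[] e : edges) parent[find(parent, e[0])] = find(parent, e[1]);
        final int[] label = new int[n];
        for (int i = 0; i < n; i++) label[i] = find(parent, i);
        return label;
    }

    /** True iff a[i] == a[j] exactly when b[i] == b[j], i.e., the labellings agree up to renaming. */
    static boolean samePartition(final int[] a, final int[] b) {
        if (a.length != b.length) return false;
        final Map<Integer, Integer> aToB = new HashMap<>(), bToA = new HashMap<>();
        for (int i = 0; i < a.length; i++) {
            final Integer x = aToB.putIfAbsent(a[i], b[i]);
            final Integer y = bToA.putIfAbsent(b[i], a[i]);
            if (x != null && x.intValue() != b[i] || y != null && y.intValue() != a[i]) return false;
        }
        return true;
    }

    public static void main(final String[] args) {
        final int[] label = components(6, new int[][] { { 0, 1 }, { 1, 2 }, { 3, 4 } });
        System.out.println(samePartition(label, new int[] { 7, 7, 7, 2, 2, 9 })); // true
        System.out.println(samePartition(label, new int[] { 0, 0, 1, 2, 2, 3 })); // false
    }
}
```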
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/EliasFanoCumulativeOutdegreeListTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/EliasFanoCumulativeOutdegreeListTest.java
new file mode 100644
index 0000000..49196a3
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/EliasFanoCumulativeOutdegreeListTest.java
@@ -0,0 +1,87 @@
+package it.unimi.dsi.webgraph.algo;
+
+/*
+ * Copyright (C) 2010-2017 Paolo Boldi & Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.WebGraphTestCase;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.examples.ErdosRenyiGraph;
+
+import org.junit.Test;
+
+
+public class EliasFanoCumulativeOutdegreeListTest extends WebGraphTestCase {
+
+ @Test
+ public void testEliasFano() {
+ final ImmutableGraph graph = new ArrayListMutableGraph(new ErdosRenyiGraph(10000, .001, 0, false)).immutableView();
+ for(int mask: new int[] { 0, 1, 3 }) {
+ final EliasFanoCumulativeOutdegreeList eliasFanoMonotoneLongBigList = new EliasFanoCumulativeOutdegreeList(graph, graph.numArcs(), mask);
+ final int n = graph.numNodes();
+ final long m = graph.numArcs();
+
+ for(long i = 1; i < m;) {
+ final long s = eliasFanoMonotoneLongBigList.skipTo(i);
+ assertEquals(0, eliasFanoMonotoneLongBigList.currentIndex() & mask);
+ int j = 0;
+ long c = 0;
+ while(j < n) if ((c += graph.outdegree(j++)) >= i && (j & mask) == 0) break;
+ assertEquals(j, eliasFanoMonotoneLongBigList.currentIndex());
+ assertEquals(c, s);
+ i = c + 1;
+ }
+
+ for(long i = 1; i < m;) {
+ final long s = eliasFanoMonotoneLongBigList.skipTo(i);
+ assertEquals(0, eliasFanoMonotoneLongBigList.currentIndex() & mask);
+ int j = 0;
+ long c = 0;
+ while(j < n) if ((c += graph.outdegree(j++)) >= i && (j & mask) == 0) break;
+ assertEquals(j, eliasFanoMonotoneLongBigList.currentIndex());
+ assertEquals(c, s);
+ i = c + (m - c) / 2;
+ }
+
+ if (mask == 0) {
+ long c = 0;
+ for(int i = 0; i < n - 1; i++) {
+ c += graph.outdegree(i);
+ if (graph.outdegree(i) != 0) {
+ long s = eliasFanoMonotoneLongBigList.skipTo(c);
+ assertEquals(i + 1, eliasFanoMonotoneLongBigList.currentIndex());
+ assertEquals(c, s);
+ if (graph.outdegree(i + 1) != 0) {
+ s = eliasFanoMonotoneLongBigList.skipTo(c + 1);
+ assertEquals(i + 2, eliasFanoMonotoneLongBigList.currentIndex());
+ }
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testZeroLength() {
+ final ImmutableGraph graph = new ArrayListMutableGraph().immutableView();
+ final EliasFanoCumulativeOutdegreeList eliasFanoMonotoneLongBigList = new EliasFanoCumulativeOutdegreeList(graph, graph.numArcs(), 0);
+ assertEquals(-1, eliasFanoMonotoneLongBigList.currentIndex());
+ }
+}
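The inner `while` loop of `testEliasFano` is effectively a linear-scan specification of `skipTo(i)`. It returns the first cumulative outdegree that is at least `i` among the indices that are multiples of `mask + 1`. The sketch below models that specification with plain prefix sums and a binary search. It is a hypothetical reference model for testing, not WebGraph's `EliasFanoCumulativeOutdegreeList`, which stores the same sequence in compressed Elias-Fano form. As in the test, it assumes that `mask` has the form 2^k - 1.

```java
/** Hypothetical reference model of skipTo() over cumulative outdegrees (not WebGraph's API). */
final class CumulativeOutdegrees {
    private final long[] sums; // sums[j] = outdegree[0] + ... + outdegree[j - 1]
    private final int mask;    // valid indices j satisfy (j & mask) == 0; assumes mask = 2^k - 1

    CumulativeOutdegrees(final int[] outdegree, final int mask) {
        sums = new long[outdegree.length + 1];
        for (int i = 0; i < outdegree.length; i++) sums[i + 1] = sums[i] + outdegree[i];
        this.mask = mask;
    }

    /** Smallest j in [1, n] with prefix(j) >= lower and (j & mask) == 0, or -1 if there is none. */
    int skipIndex(final long lower) {
        int lo = 1, hi = sums.length; // binary search on the non-decreasing prefix sums
        while (lo < hi) {
            final int mid = (lo + hi) >>> 1;
            if (sums[mid] >= lower) hi = mid;
            else lo = mid + 1;
        }
        final int j = (lo + mask) & ~mask; // round up to the next multiple of mask + 1
        return j < sums.length ? j : -1;
    }

    /** The cumulative outdegree of the first j nodes. */
    long prefix(final int j) {
        return sums[j];
    }

    public static void main(final String[] args) {
        final int[] outdegree = { 2, 0, 3, 1, 0, 4 }; // prefix sums: 0 2 2 5 6 6 10
        final CumulativeOutdegrees list = new CumulativeOutdegrees(outdegree, 1);
        final int j = list.skipIndex(3);
        System.out.println(j + " " + list.prefix(j)); // 4 6
    }
}
```

Because the prefix sums are non-decreasing, rounding the first qualifying index up to the next multiple of `mask + 1` yields the first index that satisfies both conditions.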
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/EstimateEffectiveDiameterTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/EstimateEffectiveDiameterTest.java
new file mode 100644
index 0000000..0a2eb51
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/EstimateEffectiveDiameterTest.java
@@ -0,0 +1,128 @@
+package it.unimi.dsi.webgraph.algo;
+
+/*
+ * Copyright (C) 2010-2017 Paolo Boldi & Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.CliqueGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.WebGraphTestCase;
+
+import java.io.IOException;
+
+import org.junit.Test;
+
+
+public class EstimateEffectiveDiameterTest extends WebGraphTestCase {
+
+ @Test
+ public void testSmall() throws IOException {
+ final ImmutableGraph g = ArrayListMutableGraph.newBidirectionalCycle(40).immutableView();
+
+ final HyperBall hyperBall = new HyperBall(g, 8, 0);
+ hyperBall.run(Integer.MAX_VALUE, -1);
+ assertEquals(17, NeighbourhoodFunction.effectiveDiameter(.9, hyperBall.neighbourhoodFunction.toDoubleArray()), 1);
+ hyperBall.close();
+ }
+
+ @Test
+ public void testCycleOfCliques() throws IOException {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph();
+ // Creates a bidirectional cycle of k n-cliques; consecutive cliques are joined by 2 * b arcs (b in each direction), with 2 * b < n
+ // Expected diameter: k + 1
+ final int n = 20, k = 100, b = 6;
+ mg.addNodes(n * k);
+ for (int i = 0; i < k; i++)
+ for (int j = 0; j < n; j++)
+ for (int h = 0; h < n; h++)
+ mg.addArc(n * i + j, n * i + h);
+ for (int i = 0; i < k; i++)
+ for (int j = 0; j < b; j++) {
+ mg.addArc(n * i + j, n * ((i + 1) % k) + n - 1 - j);
+ mg.addArc(n * ((i + 1) % k) + n - 1 - j, n * i + j);
+ }
+ ImmutableGraph g = mg.immutableView();
+
+ final HyperBall hyperBall = new HyperBall(g, 8, 0);
+ hyperBall.run(Integer.MAX_VALUE, -1);
+ double estimation = NeighbourhoodFunction.effectiveDiameter(1, hyperBall.neighbourhoodFunction.toDoubleArray());
+ double expected = k + 1;
+ double relativeError = Math.abs(estimation - expected) / expected;
+ System.err.println("Estimate: " + estimation);
+ System.err.println("Relative error in estimate (should be <0.05): " + relativeError);
+ assertTrue(relativeError < 0.05); // Accept error within 5%
+
+ hyperBall.close();
+ }
+
+ @Test
+ public void testTwoCyclesOfCliques() throws IOException {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph();
+ // Creates two bidirectional cycles of k n-cliques (kx nx-cliques, resp.), each connected with the next by 2*b<n (2*bx<nx, resp.) arcs
+ // We expect that more than 90% of the pairs are within distance k or kx, so the effective diameter should be k
+ final int n = 16, k = 10, b = 6;
+ final int firstNodeSecondClique = n * k;
+ final int nx = 3, kx = 5, bx = 1;
+ mg.addNodes(n * k + nx * kx);
+ for (int i = 0; i < k; i++)
+ for (int j = 0; j < n; j++)
+ for (int h = 0; h < n; h++)
+ mg.addArc(n * i + j, n * i + h);
+ for (int i = 0; i < k; i++)
+ for (int j = 0; j < b; j++) {
+ mg.addArc(n * i + j, n * ((i + 1) % k) + n - 1 - j);
+ mg.addArc(n * ((i + 1) % k) + n - 1 - j, n * i + j);
+ }
+ for (int i = 0; i < kx; i++)
+ for (int j = 0; j < nx; j++)
+ for (int h = 0; h < nx; h++)
+ mg.addArc(firstNodeSecondClique + nx * i + j, firstNodeSecondClique + nx * i + h);
+ for (int i = 0; i < kx; i++)
+ for (int j = 0; j < bx; j++) {
+ mg.addArc(firstNodeSecondClique + nx * i + j, firstNodeSecondClique + nx * ((i + 1) % kx) + nx - 1 - j);
+ mg.addArc(firstNodeSecondClique + nx * ((i + 1) % kx) + nx - 1 - j, firstNodeSecondClique + nx * i + j);
+ }
+ final HyperBall hyperBall = new HyperBall(mg.immutableView(), 8, 0);
+ hyperBall.run(Integer.MAX_VALUE, -1);
+
+ assertEquals(k, NeighbourhoodFunction.effectiveDiameter(.99, hyperBall.neighbourhoodFunction.toDoubleArray()), 1);
+
+ hyperBall.close();
+ }
+
+
+ @Test
+ public void testCliqueGraph() throws IOException {
+ HyperBall hyperBall = new HyperBall(new CliqueGraph(100, 5), 8, 0);
+ hyperBall.run(1000, 1E-3);
+ hyperBall.close();
+ }
+
+ @Test
+ public void testLarge() throws IOException {
+ String path = getGraphPath("cnr-2000");
+ ImmutableGraph g = ImmutableGraph.load(path);
+ final HyperBall hyperBall = new HyperBall(g, 8, 0);
+ hyperBall.run(Integer.MAX_VALUE, -1);
+ assertEquals(NeighbourhoodFunction.effectiveDiameter(.9, HyperBallSlowTest.cnr2000NF), NeighbourhoodFunction.effectiveDiameter(.9, hyperBall.neighbourhoodFunction.toDoubleArray()), 1);
+ hyperBall.close();
+ }
+}
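The effective-diameter assertions apply a threshold to the neighbourhood function N(t), the number of pairs at distance at most t. The sketch below uses one common definition: linear interpolation between the two radii where N(t) crosses alpha times its final value. Whether `NeighbourhoodFunction.effectiveDiameter` interpolates in exactly this way is an assumption, and the class name is hypothetical. On the 40-node bidirectional cycle of `testSmall`, N(t) = 40(1 + 2t) for t < 20 and N(20) = 1600, so the 90% effective diameter under this definition is 17.5, within the tolerance of the asserted 17 ± 1.

```java
/** Interpolated effective diameter of a neighbourhood function (one common definition, assumed). */
final class EffectiveDiameter {
    private EffectiveDiameter() {}

    /**
     * nf[t] = number of pairs (x, y) with d(x, y) <= t (non-decreasing); alpha in (0, 1].
     * Returns the linearly interpolated radius at which nf reaches alpha * nf[last].
     */
    static double of(final double alpha, final double[] nf) {
        final double target = alpha * nf[nf.length - 1];
        int d = 0;
        while (nf[d] < target) d++; // terminates because nf[last] >= target
        if (d == 0) return 0;
        return d - 1 + (target - nf[d - 1]) / (nf[d] - nf[d - 1]);
    }

    public static void main(final String[] args) {
        final int n = 40; // bidirectional cycle of 40 nodes, as in testSmall
        final double[] nf = new double[n / 2 + 1];
        for (int t = 0; t <= n / 2; t++) nf[t] = (double) n * Math.min(1 + 2 * t, n);
        System.out.println(of(.9, nf)); // 17.5
    }
}
```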
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/ExactNeighbourhoodFunction.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/ExactNeighbourhoodFunction.java
new file mode 100644
index 0000000..7afa1ba
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/ExactNeighbourhoodFunction.java
@@ -0,0 +1,119 @@
+package it.unimi.dsi.webgraph.algo;
+
+/*
+ * Copyright (C) 2010-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+
+import it.unimi.dsi.bits.LongArrayBitVector;
+import it.unimi.dsi.fastutil.doubles.DoubleArrayList;
+import it.unimi.dsi.fastutil.io.TextIO;
+import it.unimi.dsi.lang.ObjectParser;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.GraphClassParser;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.NodeIterator;
+
+import java.io.IOException;
+import java.lang.reflect.InvocationTargetException;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+public class ExactNeighbourhoodFunction {
+ private static final Logger LOGGER = LoggerFactory.getLogger(ExactNeighbourhoodFunction.class);
+ private final ImmutableGraph graph;
+ private final ProgressLogger pl;
+ private int n;
+ private LongArrayBitVector[] neighbours;
+
+ public ExactNeighbourhoodFunction(ImmutableGraph graph, ProgressLogger pl) {
+ this.graph = graph;
+ this.pl = pl;
+ this.n = graph.numNodes();
+ this.neighbours = new LongArrayBitVector[n];
+ // Ball of radius 0: every node reaches only itself.
+ for(int i = n; i-- != 0;) (neighbours[i] = LongArrayBitVector.ofLength(n)).set(i);
+ }
+
+ public double[] neighbourhoodFunction() {
+ DoubleArrayList neighbourhoodFunction = new DoubleArrayList();
+ neighbourhoodFunction.add(n);
+ long prevCount = -1, count;
+ for(;;) {
+ final LongArrayBitVector[] newNeighbours = new LongArrayBitVector[n];
+ for(int i = n; i-- != 0;) newNeighbours[i] = LongArrayBitVector.copy(neighbours[i]);
+ NodeIterator nodeIterator = graph.nodeIterator();
+ for(int i = 0; i < n; i++) {
+ nodeIterator.nextInt();
+ final int d = nodeIterator.outdegree();
+ final int[] successor = nodeIterator.successorArray();
+
+ for(int j = 0; j < d; j++) newNeighbours[i].or(neighbours[successor[j]]);
+
+ }
+
+ neighbours = newNeighbours;
+
+ count = 0;
+ for(int j = 0; j < n; j++) count += newNeighbours[j].count();
+ if (prevCount == count) break;
+ prevCount = count;
+
+ neighbourhoodFunction.add(count);
+
+ if (pl != null) {
+ pl.update();
+ pl.logger().info("Pairs: " + count);
+ }
+ }
+
+ return neighbourhoodFunction.toDoubleArray();
+ }
+
+ public static void main(String arg[]) throws IOException, JSAPException, IllegalArgumentException, ClassNotFoundException, IllegalAccessException, InvocationTargetException, InstantiationException, NoSuchMethodException {
+ SimpleJSAP jsap = new SimpleJSAP(ExactNeighbourhoodFunction.class.getName(), "Prints the neighbourhood function.",
+ new Parameter[] {
+ new Switch("spec", 's', "spec", "The source is not a basename but rather a specification of the form <ImmutableGraphImplementation>(arg,arg,...)."),
+ new UnflaggedOption("basename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the graph."),
+ }
+ );
+
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final boolean spec = jsapResult.getBoolean("spec");
+ final String basename = jsapResult.getString("basename");
+ final ProgressLogger pl = new ProgressLogger(LOGGER);
+
+ final ImmutableGraph graph = spec ? ObjectParser.fromSpec(basename, ImmutableGraph.class, GraphClassParser.PACKAGE) : ImmutableGraph.loadOffline(basename);
+
+ final ExactNeighbourhoodFunction neighbourhoodFunction = new ExactNeighbourhoodFunction(graph, pl);
+ pl.start("Computing...");
+ TextIO.storeDoubles(neighbourhoodFunction.neighbourhoodFunction(), System.out);
+ pl.done();
+ }
+}
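`ExactNeighbourhoodFunction` computes N(t) exactly by growing one bit vector per node. The ball of radius t + 1 around a node is the union of its own ball and the radius-t balls of its successors, and iteration stops once the total number of pairs stops increasing. The same idea with `java.util.BitSet` in place of dsiutils' `LongArrayBitVector` is sketched below. The class name is hypothetical, and memory is quadratic in the number of nodes, so this is for small graphs only.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Exact neighbourhood function of a small graph by iterated unions of balls. */
final class BallUnionNeighbourhoodFunction {
    private BallUnionNeighbourhoodFunction() {}

    /** succ[v] lists the successors of v; element t of the result counts pairs (x, y) with d(x, y) <= t. */
    static List<Long> compute(final int[][] succ) {
        final int n = succ.length;
        BitSet[] ball = new BitSet[n];
        for (int i = 0; i < n; i++) {
            ball[i] = new BitSet(n);
            ball[i].set(i); // ball of radius 0: the node itself
        }
        final List<Long> nf = new ArrayList<>();
        long previous = n;
        nf.add(previous);
        for (;;) {
            final BitSet[] next = new BitSet[n];
            for (int i = 0; i < n; i++) {
                next[i] = (BitSet) ball[i].clone();
                // B(i, t + 1) = B(i, t) united with B(s, t) for every successor s (old balls only)
                for (final int s : succ[i]) next[i].or(ball[s]);
            }
            ball = next;
            long count = 0;
            for (final BitSet b : ball) count += b.cardinality();
            if (count == previous) return nf; // no ball grew: every reachable pair is counted
            nf.add(count);
            previous = count;
        }
    }

    public static void main(final String[] args) {
        final int[][] cycle = { { 1 }, { 2 }, { 3 }, { 0 } }; // directed 4-cycle
        System.out.println(compute(cycle)); // [4, 8, 12, 16]
    }
}
```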
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/FourSweepIterativeFringeDiameterTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/FourSweepIterativeFringeDiameterTest.java
new file mode 100644
index 0000000..4f53103
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/FourSweepIterativeFringeDiameterTest.java
@@ -0,0 +1,86 @@
+package it.unimi.dsi.webgraph.algo;
+
+/*
+ * Copyright (C) 2011-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.CliqueGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.Transform;
+import it.unimi.dsi.webgraph.WebGraphTestCase;
+import it.unimi.dsi.webgraph.examples.ErdosRenyiGraph;
+
+import java.io.IOException;
+
+import org.junit.Test;
+
+@Deprecated
+public class FourSweepIterativeFringeDiameterTest extends WebGraphTestCase {
+
+ @Test
+ public void testSmall() {
+ final ImmutableGraph g = ArrayListMutableGraph.newBidirectionalCycle(40).immutableView();
+ assertEquals(20, FourSweepIterativeFringeDiameter.run(g, 0, new ProgressLogger(), 0));
+ }
+
+ @Test
+ public void testCycleOfCliques() {
+ ArrayListMutableGraph mg = new ArrayListMutableGraph();
+ // Creates a bidirectional cycle of k n-cliques; consecutive cliques are joined by 2 * b arcs (b in each direction), with 2 * b < n
+ // Expected diameter: k + 1
+ final int n = 20, k = 100, b = 6;
+ mg.addNodes(n * k);
+ for (int i = 0; i < k; i++)
+ for (int j = 0; j < n; j++)
+ for (int h = 0; h < n; h++)
+ mg.addArc(n * i + j, n * i + h);
+ for (int i = 0; i < k; i++)
+ for (int j = 0; j < b; j++) {
+ mg.addArc(n * i + j, n * ((i + 1) % k) + n - 1 - j);
+ mg.addArc(n * ((i + 1) % k) + n - 1 - j, n * i + j);
+ }
+ ImmutableGraph g = mg.immutableView();
+
+ assertEquals(k + 1, FourSweepIterativeFringeDiameter.run(g, 0, null, 0));
+ }
+
+ @Test
+ public void testCliqueGraph() {
+ assertEquals(12, FourSweepIterativeFringeDiameter.run(new CliqueGraph(100, 5), 0, new ProgressLogger(), 0));
+ }
+
+ @Test
+ public void testBinaryTree() {
+ assertEquals(20, FourSweepIterativeFringeDiameter.run(Transform.symmetrize(ArrayListMutableGraph.newCompleteBinaryIntree(10).immutableView()), 0, new ProgressLogger(), 0));
+ }
+
+ @Test
+ public void testErdosRenyi() {
+ assertEquals(2, FourSweepIterativeFringeDiameter.run(Transform.symmetrize(new ArrayListMutableGraph(new ErdosRenyiGraph(1000, .5, 0, false)).immutableView()), 0, new ProgressLogger(), 0));
+ }
+
+ @Test
+ public void testLarge() throws IOException {
+ String path = getGraphPath("cnr-2000");
+ ImmutableGraph g = Transform.symmetrize(ImmutableGraph.load(path));
+ assertEquals(34, FourSweepIterativeFringeDiameter.run(g, 0, new ProgressLogger(), 0));
+ }
+}
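The expected values in `testSmall` and `testBinaryTree` can be confirmed by brute force: run a breadth-first visit from every node and keep the largest finite distance. This O(nm) sketch is only a reference for small graphs. It is not the fringe-based algorithm under test, which aims to need far fewer visits. The helper graphs are hypothetical stand-ins, assumed to mirror `ArrayListMutableGraph.newBidirectionalCycle(40)` and a symmetrized complete binary tree of height 10 (the height implied by the expected diameter 20).

```java
import java.util.Arrays;

/** Exact diameter of a small graph by a BFS from every node (O(nm) time). */
final class BfsDiameter {
    private BfsDiameter() {}

    /** Largest finite distance between two nodes; succ[v] lists the successors of v. */
    static int diameter(final int[][] succ) {
        final int n = succ.length;
        final int[] dist = new int[n];
        final int[] queue = new int[n];
        int diameter = 0;
        for (int s = 0; s < n; s++) {
            Arrays.fill(dist, -1);
            int head = 0, tail = 0;
            dist[s] = 0;
            queue[tail++] = s;
            while (head < tail) {
                final int v = queue[head++];
                diameter = Math.max(diameter, dist[v]); // eccentricity of s so far
                for (final int w : succ[v])
                    if (dist[w] == -1) { dist[w] = dist[v] + 1; queue[tail++] = w; }
            }
        }
        return diameter;
    }

    /** Cycle on n >= 3 nodes with arcs in both directions. */
    static int[][] bidirectionalCycle(final int n) {
        final int[][] succ = new int[n][];
        for (int v = 0; v < n; v++) succ[v] = new int[] { (v + 1) % n, (v + n - 1) % n };
        return succ;
    }

    /** Complete binary tree of the given height (root 0, children 2v + 1 and 2v + 2), arcs both ways. */
    static int[][] symmetricBinaryTree(final int height) {
        final int n = (1 << (height + 1)) - 1;
        final int[][] succ = new int[n][];
        for (int v = 0; v < n; v++) {
            final int[] neighbours = new int[3];
            int d = 0;
            if (v > 0) neighbours[d++] = (v - 1) / 2; // parent
            if (2 * v + 2 < n) { neighbours[d++] = 2 * v + 1; neighbours[d++] = 2 * v + 2; } // children
            succ[v] = Arrays.copyOf(neighbours, d);
        }
        return succ;
    }

    public static void main(final String[] args) {
        System.out.println(diameter(bidirectionalCycle(40)));  // 20
        System.out.println(diameter(symmetricBinaryTree(10))); // 20
    }
}
```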
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/GeometricCentralitiesTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/GeometricCentralitiesTest.java
new file mode 100644
index 0000000..fa4655b
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/GeometricCentralitiesTest.java
@@ -0,0 +1,98 @@
+package it.unimi.dsi.webgraph.algo;
+
+/*
+ * Copyright (C) 2011-2017 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.Transform;
+import it.unimi.dsi.webgraph.examples.ErdosRenyiGraph;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.junit.Test;
+
+
+
+//RELEASE-STATUS: DIST
+
+public class GeometricCentralitiesTest {
+
+ @Test
+ public void testPath() throws InterruptedException {
+ final ImmutableGraph graph = Transform.transpose(new ArrayListMutableGraph(3, new int[][] { { 0, 1 }, { 1, 2 } }).immutableView());
+
+ final GeometricCentralities centralities = new GeometricCentralities(graph);
+ centralities.compute();
+
+ assertEquals(0, centralities.closeness[0], 0);
+ assertEquals(1, centralities.closeness[1], 0);
+ assertEquals(1./3, centralities.closeness[2], 0);
+
+ assertEquals(1, centralities.lin[0], 0);
+ assertEquals(4, centralities.lin[1], 0);
+ assertEquals(3, centralities.lin[2], 0);
+
+ assertEquals(0, centralities.harmonic[0], 0);
+ assertEquals(1, centralities.harmonic[1], 0);
+ assertEquals(3./2, centralities.harmonic[2], 0);
+ }
+
+ @Test
+ public void testCycle() throws InterruptedException {
+ for(int size: new int[] { 10, 50, 100 }) {
+ final ImmutableGraph graph = ArrayListMutableGraph.newDirectedCycle(size).immutableView();
+ final GeometricCentralities centralities = new GeometricCentralities(graph);
+ centralities.compute();
+
+ final double[] expected = new double[size];
+ Arrays.fill(expected, 2. / (size * (size - 1.)));
+ for(int i = size; i-- != 0;) assertEquals(expected[i], centralities.closeness[i], 1E-15);
+ Arrays.fill(expected, size * 2. / (size - 1.));
+ for(int i = size; i-- != 0;) assertEquals(expected[i], centralities.lin[i], 1E-15);
+ double s = 0;
+ for(int i = size; i-- != 1;) s += 1. / i;
+ Arrays.fill(expected, s);
+ for(int i = size; i-- != 0;) assertEquals(expected[i], centralities.harmonic[i], 1E-14);
+ }
+ }
+
+ @Test
+ public void testErdosRenyi() throws IOException, InterruptedException {
+ for(int size: new int[] { 10, 100 }) {
+ for(double density: new double[] { 0.0001, 0.001, 0.01 }) {
+ final ImmutableGraph g = new ArrayListMutableGraph(new ErdosRenyiGraph(size, density, 0, false)).immutableView();
+ final HyperBall hanf = new HyperBall(g, Transform.transpose(g), 20, null, 0, 0, 0, false, true, true, null, 0);
+ hanf.init();
+ do hanf.iterate(); while(hanf.modified() != 0);
+ final GeometricCentralities centralities = new GeometricCentralities(g);
+ centralities.compute();
+
+ for(int i = 0; i < size; i++)
+ assertEquals(hanf.sumOfInverseDistances[i], centralities.harmonic[i], 1E-3);
+ for(int i = 0; i < size; i++)
+ assertEquals(hanf.sumOfDistances[i] == 0 ? 0 : 1 / hanf.sumOfDistances[i], centralities.closeness[i], 1E-5);
+ for(int i = 0; i < size; i++)
+ assertEquals(hanf.sumOfDistances[i] == 0 ? 1 : hanf.count(i) * hanf.count(i) / hanf.sumOfDistances[i], centralities.lin[i], 1E-3);
+ hanf.close();
+ }
+ }
+ }
+}
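The expected values in `testPath` and `testCycle` follow directly from the definitions of the three centralities. Distances are measured along the arcs of the graph passed in; the test passes the transpose of the path 0 -> 1 -> 2.

- Closeness is the reciprocal of the sum of distances to reachable nodes.
- Lin's centrality is the square of the number of reachable nodes (the node itself included) divided by that sum.
- Harmonic centrality is the sum of reciprocal distances.

A node that reaches no other node gets closeness 0 and Lin centrality 1, the conventions asserted at node 0 of `testPath`. The one-BFS-per-node sketch below implements those definitions (hypothetical class, plain adjacency lists).

```java
import java.util.Arrays;

/** Closeness, Lin and harmonic centrality by one BFS per node (small graphs only). */
final class GeometricCentralitiesSketch {
    final double[] closeness, lin, harmonic;

    /** succ[v] lists the successors of v; distances are measured along the arcs. */
    GeometricCentralitiesSketch(final int[][] succ) {
        final int n = succ.length;
        closeness = new double[n];
        lin = new double[n];
        harmonic = new double[n];
        final int[] dist = new int[n], queue = new int[n];
        for (int s = 0; s < n; s++) {
            Arrays.fill(dist, -1);
            int head = 0, tail = 0;
            dist[s] = 0;
            queue[tail++] = s;
            long sum = 0;       // sum of distances to the other reachable nodes
            double inverse = 0; // sum of reciprocal distances
            while (head < tail) {
                final int v = queue[head++];
                if (v != s) { sum += dist[v]; inverse += 1.0 / dist[v]; }
                for (final int w : succ[v])
                    if (dist[w] == -1) { dist[w] = dist[v] + 1; queue[tail++] = w; }
            }
            final long count = tail; // reachable nodes, s included
            closeness[s] = sum == 0 ? 0 : 1.0 / sum;
            lin[s] = sum == 0 ? 1 : (double) count * count / sum;
            harmonic[s] = inverse;
        }
    }

    public static void main(final String[] args) {
        // Transpose of the path 0 -> 1 -> 2, as in testPath.
        final GeometricCentralitiesSketch c = new GeometricCentralitiesSketch(new int[][] { {}, { 0 }, { 1 } });
        System.out.println(Arrays.toString(c.lin)); // [1.0, 4.0, 3.0]
    }
}
```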
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/HyperBallTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/HyperBallTest.java
new file mode 100644
index 0000000..79da7e3
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/HyperBallTest.java
@@ -0,0 +1,466 @@
+package it.unimi.dsi.webgraph.algo;
+
+/*
+ * Copyright (C) 2010-2017 Paolo Boldi & Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+import it.unimi.dsi.fastutil.ints.Int2DoubleFunction;
+import it.unimi.dsi.fastutil.ints.IntArrayFIFOQueue;
+import it.unimi.dsi.fastutil.longs.LongBigList;
+import it.unimi.dsi.util.HyperLogLogCounterArray;
+import it.unimi.dsi.util.XoRoShiRo128PlusRandom;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.LazyIntIterator;
+import it.unimi.dsi.webgraph.Transform;
+import it.unimi.dsi.webgraph.WebGraphTestCase;
+import it.unimi.dsi.webgraph.examples.ErdosRenyiGraph;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.junit.Test;
+
+
+public class HyperBallTest extends WebGraphTestCase {
+ // Below this threshold errors due to block-by-block summing start to appear.
+ protected static final double THRESHOLD = 1E-9;
+
+ /** Checks that the states of two HyperBall implementations (as
+ * returned by {@link HyperLogLogCounterArray#registers()}) are exactly the same. */
+ public final static void assertState(final int size, final int log2m, final LongBigList[] a, final LongBigList[] b) {
+ final int m = 1 << log2m;
+ for(int i = 0; i < size; i++) {
+ for(int j = 0; j < m; j++) {
+ final long index = ((long)i << log2m) + j;
+ final int chunk = (int)(index >>> HyperLogLogCounterArray.CHUNK_SHIFT);
+ final long offset = index & HyperLogLogCounterArray.CHUNK_MASK;
+ assertEquals("Counter " + i + ", register " + j + ": ", a[chunk].getLong(offset), b[chunk].getLong(offset));
+ }
+ }
+ }
+
+ @Test
+ public void testTrivial() throws IOException {
+ ImmutableGraph g = ArrayListMutableGraph.newCompleteBinaryIntree(10).immutableView();
+ HyperBall hyperBall = new HyperBall(g, g, 7, null, 0, 0, 0, false, false, false, null, 0);
+ hyperBall.run(Long.MAX_VALUE, -1);
+ hyperBall.run(Long.MAX_VALUE, -1);
+ hyperBall.close();
+
+ hyperBall = new HyperBall(g, g, 7, null, 0, 0, 0, true, false, false, null, 0);
+ hyperBall.run(Long.MAX_VALUE, -1);
+ hyperBall.run(Long.MAX_VALUE, -1);
+ hyperBall.close();
+
+ }
+
+ protected static void assertRelativeError(double sequentialCurrent, double current, double threshold) {
+ assertTrue(sequentialCurrent + " != " + current + ", " + Math.abs(current - sequentialCurrent) / current + " > " + threshold, Math.abs(current - sequentialCurrent) / current <= threshold);
+ }
+
+ /* The accuracy tests in this class check that the estimate lies within 2 times the theoretical
+ * relative standard deviation in at least 9 trials out of 10. For a unimodal estimator, the Vysochanskii-Petunin
+ * inequality bounds the probability of a larger deviation by 4/(9 * 2^2) = 1/9, so this should happen about 90% of the time. */
+
+ @Test
+ public void testClique() throws IOException {
+ for(int log2m: new int[] { 4, 5, 6, 8 }) {
+ final double rsd = HyperBall.relativeStandardDeviation(log2m);
+ for(int size: new int[] { 10, 100, 500 }) {
+ int correct = 0;
+ for(int attempt = 0; attempt < 10; attempt++) {
+ System.err.println("log2m: " + log2m + " size: " + size + " attempt: " + attempt);
+ ImmutableGraph g = ArrayListMutableGraph.newCompleteGraph(size, false).immutableView();
+ HyperBall hyperBall = new HyperBall(g, attempt % 3 == 0 ? null : Transform.transpose(g), log2m, null, 0, 10, 10, attempt % 2 == 0, false, false, null, attempt);
+ SequentialHyperBall sequentialHyperBall = new SequentialHyperBall(g, log2m, null, attempt);
+ hyperBall.init();
+ sequentialHyperBall.init();
+ hyperBall.iterate();
+ final double current = hyperBall.neighbourhoodFunction.getDouble(1);
+ final double sequentialCurrent = sequentialHyperBall.iterate();
+
+ assertState(size, log2m, sequentialHyperBall.registers(), hyperBall.registers());
+
+ if (Math.abs(size * size - current) <= 2 * rsd * size * size) correct++;
+
+ assertRelativeError(sequentialCurrent, current, THRESHOLD);
+
+ hyperBall.close();
+ sequentialHyperBall.close();
+ }
+ assertTrue(size + ":" + rsd + " " + correct + " < " + 9, correct >= 9);
+ }
+ }
+ }
+
+ @Test
+ public void testErdosRenyi() throws IOException {
+ for(int log2m: new int[] { 4, 5, 6, 8 }) {
+ for(int size: new int[] { 10, 100, 500 }) {
+ for(int attempt = 0; attempt < 10; attempt++) {
+ System.err.println("log2m: " + log2m + " size: " + size + " attempt: " + attempt);
+ ImmutableGraph g = new ArrayListMutableGraph(new ErdosRenyiGraph(size, .1, attempt, false)).immutableView();
+ HyperBall hyperBall = new HyperBall(g, attempt % 3 == 0 ? null : Transform.transpose(g), log2m, null, 0, 10 * (attempt % 3), 10, attempt % 2 == 0, false, false, null, attempt);
+ SequentialHyperBall sequentialHyperBall = new SequentialHyperBall(g, log2m, null, attempt);
+ hyperBall.init();
+ sequentialHyperBall.init();
+ do {
+ hyperBall.iterate();
+ final double current = hyperBall.neighbourhoodFunction.getDouble(hyperBall.neighbourhoodFunction.size() - 1);
+ final double sequentialCurrent = sequentialHyperBall.iterate();
+ assertState(size, log2m, sequentialHyperBall.registers(), hyperBall.registers());
+ assertRelativeError(sequentialCurrent, current, THRESHOLD);
+ } while(hyperBall.modified() != 0);
+
+ hyperBall.init();
+ sequentialHyperBall.init();
+ do {
+ hyperBall.iterate();
+ final double current = hyperBall.neighbourhoodFunction.getDouble(hyperBall.neighbourhoodFunction.size() - 1);
+ final double sequentialCurrent = sequentialHyperBall.iterate();
+ assertState(size, log2m, sequentialHyperBall.registers(), hyperBall.registers());
+ assertRelativeError(sequentialCurrent, current, THRESHOLD);
+ } while(hyperBall.modified() != 0);
+
+ hyperBall.close();
+ sequentialHyperBall.close();
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testCycle() throws IOException {
+ for(int log2m: new int[] { 4, 5, 6 }) {
+ final double rsd = HyperBall.relativeStandardDeviation(log2m);
+ for(int size: new int[] { 100, 500, 1000 }) {
+ final int[] correct = new int[size + 1];
+ for(int attempt = 0; attempt < 10; attempt++) {
+ System.err.println("log2m: " + log2m + " size: " + size + " attempt: " + attempt);
+ ImmutableGraph g = ArrayListMutableGraph.newDirectedCycle(size).immutableView();
+ HyperBall hyperBall = new HyperBall(g, attempt % 3 == 0 ? null : Transform.transpose(g), log2m, null, 0, 10 * (attempt % 3), 10, attempt % 2 == 0, false, false, null, attempt);
+ SequentialHyperBall sequentialHyperBall = new SequentialHyperBall(g, log2m, null, attempt);
+ hyperBall.init();
+ sequentialHyperBall.init();
+ for(int i = 2; i <= size; i++) {
+ hyperBall.iterate();
+ final double current = hyperBall.neighbourhoodFunction.getDouble(hyperBall.neighbourhoodFunction.size() - 1);
+ final double sequentialCurrent = sequentialHyperBall.iterate();
+ assertState(size, log2m, sequentialHyperBall.registers(), hyperBall.registers());
+ assertRelativeError(sequentialCurrent, current, THRESHOLD);
+ if (Math.abs(size * i - current) <= 2 * rsd * size * i) correct[i]++;
+ }
+ hyperBall.close();
+ sequentialHyperBall.close();
+ }
+ for(int i = 2; i <= size; i++) assertTrue(size + ":" + rsd + " " + correct[i] + " < " + 9, correct[i] >= 9);
+ }
+ }
+
+ }
+
+ @Test
+ public void testLine() throws IOException {
+ for(int log2m: new int[] { 4, 5, 6 }) {
+ final double rsd = HyperBall.relativeStandardDeviation(log2m);
+ for(int size: new int[] { 100, 500, 1000 }) {
+ final int[] correct = new int[size + 1];
+ for(int attempt = 0; attempt < 10; attempt++) {
+ System.err.println("log2m: " + log2m + " size: " + size + " attempt: " + attempt);
+ ArrayListMutableGraph directedCycle = ArrayListMutableGraph.newDirectedCycle(size);
+ directedCycle.removeArc(0, 1);
+ ImmutableGraph g = directedCycle.immutableView();
+ HyperBall hyperBall = new HyperBall(g, attempt % 3 == 0 ? null : Transform.transpose(g), log2m, null, 0, 10 * (attempt % 3), 10, attempt % 2 == 0, false, false, null, attempt);
+ SequentialHyperBall sequentialHyperBall = new SequentialHyperBall(g, log2m, null, attempt);
+ hyperBall.init();
+ sequentialHyperBall.init();
+ for(int i = 2; i <= size; i++) {
+ hyperBall.iterate();
+ final double current = hyperBall.neighbourhoodFunction.getDouble(hyperBall.neighbourhoodFunction.size() - 1);
+ final double sequentialCurrent = sequentialHyperBall.iterate();
+ assertState(size, log2m, sequentialHyperBall.registers(), hyperBall.registers());
+ assertRelativeError(sequentialCurrent, current, THRESHOLD);
+ long result = 0;
+ for(int j = 0; j < i; j++) result += (size - j);
+ if (Math.abs(result - current) <= 2 * rsd * size * i) correct[i]++;
+ }
+ hyperBall.close();
+ sequentialHyperBall.close();
+ }
+ for(int i = 2; i <= size; i++) assertTrue(size + ":" + rsd + " " + correct[i] + " < " + 9, correct[i] >= 9);
+ }
+ }
+
+ }
+
+ @Test
+ public void testOutdirectedStar() throws IOException {
+ for(int log2m: new int[] { 4, 5, 6 }) {
+ final double rsd = HyperBall.relativeStandardDeviation(log2m);
+ for(int size: new int[] { 100, 500, 1000 }) {
+ int correct = 0;
+ for(int attempt = 0; attempt < 10; attempt++) {
+ System.err.println("log2m: " + log2m + " size: " + size + " attempt: " + attempt);
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(size);
+ for(int i = 1; i < size; i++) mg.addArc(0, i);
+ ImmutableGraph g = mg.immutableView();
+ HyperBall hyperBall = new HyperBall(g, attempt % 3 == 0 ? null : Transform.transpose(g), log2m, null, 0, 10 * (attempt % 3), 10, attempt % 2 == 0, false, false, null, attempt);
+ SequentialHyperBall sequentialHyperBall = new SequentialHyperBall(g, log2m, null, attempt);
+ hyperBall.init();
+ sequentialHyperBall.init();
+ hyperBall.iterate();
+ final double current = hyperBall.neighbourhoodFunction.getDouble(hyperBall.neighbourhoodFunction.size() - 1);
+ final double sequentialCurrent = sequentialHyperBall.iterate();
+ assertState(size, log2m, sequentialHyperBall.registers(), hyperBall.registers());
+ assertRelativeError(sequentialCurrent, current, THRESHOLD);
+ if (Math.abs(size * 2 - 1 - current) <= 2 * rsd * (size * 2 - 1)) correct++;
+ hyperBall.close();
+ sequentialHyperBall.close();
+ }
+ assertTrue(size + ":" + rsd + " " + correct + " < " + 9, correct >= 9);
+ }
+ }
+ }
+
+ @Test
+ public void testTree() throws IOException {
+ for(int log2m: new int[] { 4, 5, 6, 7, 8, 10, 12 }) {
+ double rsd = HyperBall.relativeStandardDeviation(log2m);
+ ImmutableGraph g = ArrayListMutableGraph.newCompleteBinaryIntree(3).immutableView();
+ final int[] correct = new int[3];
+ for(int attempt = 0; attempt < 10; attempt++) {
+ System.err.println("log2m: " + log2m + " attempt: " + attempt);
+ HyperBall hyperBall = new HyperBall(g, attempt % 3 == 0 ? null : Transform.transpose(g), log2m, null, 0, 10 * (attempt % 3), 10, attempt % 2 == 0, false, false, null, attempt);
+ SequentialHyperBall sequentialHyperBall = new SequentialHyperBall(g, log2m, null, attempt);
+ hyperBall.init();
+ sequentialHyperBall.init();
+
+ hyperBall.iterate();
+ if (Math.abs(hyperBall.neighbourhoodFunction.getDouble(hyperBall.neighbourhoodFunction.size() - 1) - 29) <= 2 * rsd * 29) correct[0]++;
+ sequentialHyperBall.iterate();
+ assertState(g.numNodes(), log2m, sequentialHyperBall.registers(), hyperBall.registers());
+
+ hyperBall.iterate();
+ if (Math.abs(hyperBall.neighbourhoodFunction.getDouble(hyperBall.neighbourhoodFunction.size() - 1) - 41) <= 2 * rsd * 41) correct[1]++;
+ sequentialHyperBall.iterate();
+ assertState(g.numNodes(), log2m, sequentialHyperBall.registers(), hyperBall.registers());
+
+ hyperBall.iterate();
+ if (Math.abs(hyperBall.neighbourhoodFunction.getDouble(hyperBall.neighbourhoodFunction.size() - 1) - 49) <= 2 * rsd * 49) correct[2]++;
+ sequentialHyperBall.iterate();
+ assertState(g.numNodes(), log2m, sequentialHyperBall.registers(), hyperBall.registers());
+
+ // Test that you can reuse the object
+
+ hyperBall.init();
+ sequentialHyperBall.init();
+
+ hyperBall.iterate();
+ if (Math.abs(hyperBall.neighbourhoodFunction.getDouble(hyperBall.neighbourhoodFunction.size() - 1) - 29) <= 2 * rsd * 29) correct[0]++;
+ sequentialHyperBall.iterate();
+ assertState(g.numNodes(), log2m, sequentialHyperBall.registers(), hyperBall.registers());
+
+ hyperBall.iterate();
+ if (Math.abs(hyperBall.neighbourhoodFunction.getDouble(hyperBall.neighbourhoodFunction.size() - 1) - 41) <= 2 * rsd * 41) correct[1]++;
+ sequentialHyperBall.iterate();
+ assertState(g.numNodes(), log2m, sequentialHyperBall.registers(), hyperBall.registers());
+
+ hyperBall.iterate();
+ if (Math.abs(hyperBall.neighbourhoodFunction.getDouble(hyperBall.neighbourhoodFunction.size() - 1) - 49) <= 2 * rsd * 49) correct[2]++;
+ sequentialHyperBall.iterate();
+ assertState(g.numNodes(), log2m, sequentialHyperBall.registers(), hyperBall.registers());
+
+ hyperBall.close();
+ sequentialHyperBall.close();
+ }
+ //System.err.println(Arrays.toString(correct));
+ for(int i = 0; i < 3; i++) assertTrue(rsd + " " + correct[i] + " < " + 9, correct[i] >= 9);
+ }
+ }
+
+ @Test(expected=IllegalStateException.class)
+ public void testInitClosed() throws IOException {
+ ImmutableGraph g = ArrayListMutableGraph.newCompleteBinaryIntree(3).immutableView();
+ HyperBall hyperBall = new HyperBall(g, 8);
+ hyperBall.close();
+ hyperBall.init();
+ }
+
+ @Test(expected=IllegalStateException.class)
+ public void testInitIterate() throws IOException {
+ ImmutableGraph g = ArrayListMutableGraph.newCompleteBinaryIntree(3).immutableView();
+ HyperBall hyperBall = new HyperBall(g, 8);
+ hyperBall.close();
+ hyperBall.iterate();
+ }
+
+ private int[] distancesFrom(final ImmutableGraph graph, final int from) {
+ final IntArrayFIFOQueue queue = new IntArrayFIFOQueue();
+ final int n = graph.numNodes();
+ final int[] dist = new int[n];
+ Arrays.fill(dist, Integer.MAX_VALUE); // Initially, all distances are infinity.
+
+ queue.enqueue(from);
+ dist[from] = 0;
+
+ LazyIntIterator successors;
+
+ while(! queue.isEmpty()) {
+ int curr = queue.dequeueInt();
+ successors = graph.successors(curr);
+ int d = graph.outdegree(curr);
+ while(d-- != 0) {
+ int succ = successors.nextInt();
+ if (dist[succ] == Integer.MAX_VALUE) {
+ dist[succ] = dist[curr] + 1;
+ queue.enqueue(succ);
+ }
+ }
+ }
+
+ return dist;
+ }
+
+ @Test
+ public void testErdosRenyiEccentricity() throws IOException {
+ XoRoShiRo128PlusRandom rand = new XoRoShiRo128PlusRandom(1);
+ for(int log2m: new int[] { 15 }) {
+ for(int size: new int[] { 10, 100, 500 }) {
+ for(int attempt = 0; attempt < 5; attempt++) {
+ System.err.println("log2m: " + log2m + " size: " + size + " attempt: " + attempt);
+ ImmutableGraph g = new ArrayListMutableGraph(new ErdosRenyiGraph(size, .1, attempt + 1, false)).immutableView();
+ HyperBall hyperBall =
+ new HyperBall(g, attempt % 3 == 0 ? null : Transform.transpose(g), log2m, null, 0, 10 * (attempt % 3), 10, attempt % 2 == 0, true, false, null, attempt);
+ hyperBall.init();
+ do {
+ hyperBall.iterate();
+ } while(hyperBall.modified() != 0);
+
+ int n = g.numNodes();
+ for (int i = 0; i < 10; i++) {
+ int from = rand.nextInt(n);
+ int dist[] = distancesFrom(g, from);
+ long totDist = 0;
+ int reachable = 0;
+ for (int k = 0; k < n; k++)
+ if (dist[k] < Integer.MAX_VALUE) {
+ reachable++;
+ totDist += dist[k];
+ }
+ assertEquals(1.0, reachable / hyperBall.count(from), 0.20);
+
+ double expEcc = (double)totDist / reachable;
+ double computedEcc = hyperBall.sumOfDistances[from] / hyperBall.count(from);
+ if (expEcc == 0) assertEquals(0.0, computedEcc, 1E-3);
+ else assertEquals(1.0, expEcc / computedEcc, 0.15);
+ }
+
+ hyperBall.close();
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testErdosRenyiHarmonic() throws IOException {
+ XoRoShiRo128PlusRandom rand = new XoRoShiRo128PlusRandom(1);
+ for(int log2m: new int[] { 15 }) {
+ for(int size: new int[] { 10, 100, 500 }) {
+ for(boolean weights: new boolean[] { false, true }) {
+ for(int attempt = 0; attempt < 5; attempt++) {
+ System.err.println("log2m: " + log2m + " size: " + size + " attempt: " + attempt);
+ ImmutableGraph g = new ArrayListMutableGraph(new ErdosRenyiGraph(size, .1, attempt, false)).immutableView();
+ final int[] weight;
+ if (weights) {
+ weight = new int[size];
+ for(int i = size; i-- != 0;) weight[i] = i % 4;
+ }
+ else weight = null;
+
+ HyperBall hyperBall =
+ new HyperBall(g, attempt % 3 == 0 ? null : Transform.transpose(g), log2m, null, 0, 10 * (attempt % 3), 10, attempt % 2 == 0, true, true, null, weight, attempt);
+ hyperBall.init();
+ do {
+ hyperBall.iterate();
+ } while(hyperBall.modified() != 0);
+
+ int n = g.numNodes();
+ for (int i = 0; i < 10; i++) {
+ int from = rand.nextInt(n);
+ int dist[] = distancesFrom(g, from);
+ double totDist = 0;
+ for (int k = 0; k < n; k++)
+ if (dist[k] < Integer.MAX_VALUE && dist[k] > 0)
+ totDist += (double)(weight == null ? 1 : weight[k]) / dist[k];
+ double expHarm = n / totDist;
+ double computedHarm = n / hyperBall.sumOfInverseDistances[from];
+ if (totDist != 0) assertEquals(1.0, expHarm / computedHarm, 0.1);
+ }
+
+ hyperBall.close();
+ }
+ }
+ }
+ }
+ }
+
+
+ @Test
+ public void testErdosRenyiGain() throws IOException {
+ for(int log2m: new int[] { 15 }) {
+ for(int size: new int[] { 10, 100, 500 }) {
+ for(int attempt = 0; attempt < 5; attempt++) {
+ System.err.println("log2m: " + log2m + " size: " + size + " attempt: " + attempt);
+ ImmutableGraph g = new ArrayListMutableGraph(new ErdosRenyiGraph(size, .1, attempt, false)).immutableView();
+ HyperBall hyperBall =
+ new HyperBall(g, attempt % 3 == 0 ? null : Transform.transpose(g), log2m, null, 0, 10 * (attempt % 3), 10, attempt % 2 == 0, true, true, new Int2DoubleFunction[] {
+ new HyperBall.AbstractDiscountFunction() {
+ private static final long serialVersionUID = 1L;
+ @Override
+ public double get(int distance) {
+ return distance;
+ }
+ },
+ new HyperBall.AbstractDiscountFunction() {
+ private static final long serialVersionUID = 1L;
+ @Override
+ public double get(int distance) {
+ return 1. / distance;
+ }
+ }
+ },
+ attempt);
+ hyperBall.init();
+ do {
+ hyperBall.iterate();
+ } while(hyperBall.modified() != 0);
+
+ int n = g.numNodes();
+ for (int i = 0; i < n; i++) {
+ assertEquals(hyperBall.sumOfDistances[i], hyperBall.discountedCentrality[0][i], 1E-5);
+ assertEquals(hyperBall.sumOfInverseDistances[i], hyperBall.discountedCentrality[1][i], 1E-5);
+ }
+ hyperBall.close();
+ }
+ }
+ }
+ }
+ }
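The HyperBall tests above accept an estimate when it lies within twice the theoretical relative standard deviation of the exact value, and they justify the 9-out-of-10 threshold with the Vysochanskii-Petunin inequality, which bounds P(|X - mu| >= lambda * sigma) by 4 / (9 * lambda^2) for a unimodal X and lambda > sqrt(8/3). The following minimal sketch evaluates that bound for lambda = 2 and restates the acceptance check; the class and method names are hypothetical and not part of WebGraph.

```java
// Hypothetical helper, not part of WebGraph: restates the acceptance criterion of the HyperBall tests.
public final class VysochanskiiPetuninCheck {
    private VysochanskiiPetuninCheck() {}

    /** Vysochanskii-Petunin bound on P(|X - mu| >= lambda * sigma) for unimodal X; requires lambda > sqrt(8/3). */
    static double tailBound(double lambda) {
        if (lambda <= Math.sqrt(8.0 / 3.0)) throw new IllegalArgumentException("lambda must exceed sqrt(8/3)");
        return 4.0 / (9.0 * lambda * lambda);
    }

    /** Same shape as the checks in the tests: |exact - estimate| <= 2 * rsd * exact. */
    static boolean withinTwoRsd(double exact, double estimate, double rsd) {
        return Math.abs(exact - estimate) <= 2 * rsd * exact;
    }

    public static void main(String[] args) {
        double miss = tailBound(2); // 4 / 36 = 1/9
        System.out.println("P(miss) <= " + miss + ", P(hit) >= " + (1 - miss));
        System.out.println(withinTwoRsd(100, 110, 0.1)); // prints true: 10 <= 2 * 0.1 * 100
    }
}
```

With lambda = 2 the bound gives a per-trial success probability of at least 8/9, which is the "roughly 90%" the test comment refers to.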
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/NeighbourhoodFunctionTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/NeighbourhoodFunctionTest.java
new file mode 100644
index 0000000..5217648
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/NeighbourhoodFunctionTest.java
@@ -0,0 +1,79 @@
+package it.unimi.dsi.webgraph.algo;
+
+/*
+ * Copyright (C) 2010-2017 Paolo Boldi & Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.WebGraphTestCase;
+
+import org.junit.Test;
+
+
+public class NeighbourhoodFunctionTest extends WebGraphTestCase {
+
+ @Test
+ public void testClique() {
+ ImmutableGraph g = ArrayListMutableGraph.newCompleteGraph(10, false).immutableView();
+ double[] computeNeighbourhoodFunction = NeighbourhoodFunction.compute(g);
+ assertEquals(2, computeNeighbourhoodFunction.length);
+ assertEquals(10, computeNeighbourhoodFunction[0], Double.MIN_VALUE);
+ assertEquals(100, computeNeighbourhoodFunction[1], Double.MIN_VALUE);
+ }
+
+ @Test
+ public void testCycle() {
+ ImmutableGraph g = ArrayListMutableGraph.newDirectedCycle(5).immutableView();
+ double[] computeNeighbourhoodFunction = NeighbourhoodFunction.compute(g);
+ assertEquals(5, computeNeighbourhoodFunction.length);
+ assertEquals(5, computeNeighbourhoodFunction[0], Double.MIN_VALUE);
+ assertEquals(10, computeNeighbourhoodFunction[1], Double.MIN_VALUE);
+ assertEquals(15, computeNeighbourhoodFunction[2], Double.MIN_VALUE);
+ assertEquals(20, computeNeighbourhoodFunction[3], Double.MIN_VALUE);
+ assertEquals(25, computeNeighbourhoodFunction[4], Double.MIN_VALUE);
+ }
+
+ @Test
+ public void testTree() {
+ ImmutableGraph g = ArrayListMutableGraph.newCompleteBinaryIntree(1).immutableView();
+ double[] computeNeighbourhoodFunction = NeighbourhoodFunction.compute(g);
+ assertEquals(2, computeNeighbourhoodFunction.length);
+ assertEquals(3, computeNeighbourhoodFunction[0], Double.MIN_VALUE);
+ assertEquals(5, computeNeighbourhoodFunction[1], Double.MIN_VALUE);
+ g = ArrayListMutableGraph.newCompleteBinaryIntree(3).immutableView();
+ computeNeighbourhoodFunction = NeighbourhoodFunction.compute(g);
+ assertEquals(4, computeNeighbourhoodFunction.length);
+ assertEquals(15, computeNeighbourhoodFunction[0], Double.MIN_VALUE);
+ assertEquals(29, computeNeighbourhoodFunction[1], Double.MIN_VALUE);
+ assertEquals(41, computeNeighbourhoodFunction[2], Double.MIN_VALUE);
+ assertEquals(49, computeNeighbourhoodFunction[3], Double.MIN_VALUE);
+ }
+
+ @Test
+ public void testMedian() {
+ assertEquals(1, NeighbourhoodFunction.medianDistance(2, new double[] { 2, 4 }), 0);
+ assertEquals(Double.POSITIVE_INFINITY, NeighbourhoodFunction.medianDistance(3, new double[] { 3, 4 }), 0);
+ assertEquals(1, NeighbourhoodFunction.medianDistance(3, new double[] { 3, 6, 8 }), 0);
+ assertEquals(2, NeighbourhoodFunction.medianDistance(3, new double[] { 3, 4, 5, 6 }), 0);
+ assertEquals(0, NeighbourhoodFunction.medianDistance(1, new double[] { 1 }), 0);
+ assertEquals(Double.POSITIVE_INFINITY, NeighbourhoodFunction.medianDistance(2, new double[] { 2 }), 0);
+ assertEquals(Double.POSITIVE_INFINITY, NeighbourhoodFunction.medianDistance(3, new double[] { 3 }), 0);
+ }
+}
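The expected values in NeighbourhoodFunctionTest can be reproduced by brute force: run a breadth-first visit from every node and count, for each t, the pairs (x, y) with d(x, y) <= t. The sketch below does this on plain adjacency arrays rather than WebGraph's ImmutableGraph, so it illustrates the definition and is not the library's NeighbourhoodFunction.compute. Its medianDistance returns the smallest t with N(t) > n^2 / 2, and +infinity otherwise; this rule is inferred from the expectations in testMedian, and the library's implementation may differ (for instance, it might interpolate). All names here are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

// Hypothetical brute-force reference, not WebGraph's NeighbourhoodFunction.
public final class ExactNeighbourhoodFunction {
    private ExactNeighbourhoodFunction() {}

    /** BFS distances from source over adjacency arrays; unreachable nodes get -1. */
    static int[] distances(int[][] succ, int source) {
        int[] dist = new int[succ.length];
        Arrays.fill(dist, -1);
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        dist[source] = 0;
        queue.add(source);
        while (!queue.isEmpty()) {
            int x = queue.poll();
            for (int y : succ[x]) {
                if (dist[y] == -1) {
                    dist[y] = dist[x] + 1;
                    queue.add(y);
                }
            }
        }
        return dist;
    }

    /** nf[t] = number of pairs (x, y) with d(x, y) <= t, for t = 0 .. largest finite distance. */
    static double[] compute(int[][] succ) {
        int n = succ.length;
        long[] countAtDistance = new long[Math.max(n, 1)]; // finite distances are < n
        int maxDist = 0;
        for (int x = 0; x < n; x++) {
            for (int d : distances(succ, x)) {
                if (d >= 0) {
                    countAtDistance[d]++;
                    maxDist = Math.max(maxDist, d);
                }
            }
        }
        double[] nf = new double[maxDist + 1];
        long cumulative = 0;
        for (int t = 0; t <= maxDist; t++) {
            cumulative += countAtDistance[t];
            nf[t] = cumulative;
        }
        return nf;
    }

    /** Smallest t with nf[t] > n^2 / 2, or +infinity: a rule consistent with testMedian. */
    static double medianDistance(int n, double[] nf) {
        double half = (double) n * n / 2;
        for (int t = 0; t < nf.length; t++) if (nf[t] > half) return t;
        return Double.POSITIVE_INFINITY;
    }

    /** Directed cycle 0 -> 1 -> ... -> n - 1 -> 0. */
    static int[][] directedCycle(int n) {
        int[][] succ = new int[n][];
        for (int i = 0; i < n; i++) succ[i] = new int[] { (i + 1) % n };
        return succ;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(compute(directedCycle(5)))); // [5.0, 10.0, 15.0, 20.0, 25.0]
    }
}
```

On a directed cycle each node reaches exactly one new node per step, so N(t) = 5(t + 1), matching testCycle.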
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/ParallelBreadthFirstVisitTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/ParallelBreadthFirstVisitTest.java
new file mode 100644
index 0000000..302dbde
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/ParallelBreadthFirstVisitTest.java
@@ -0,0 +1,67 @@
+package it.unimi.dsi.webgraph.algo;
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.bits.Fast;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+
+import org.junit.Test;
+import org.slf4j.helpers.NOPLogger;
+
+/*
+ * Copyright (C) 2011-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+public class ParallelBreadthFirstVisitTest {
+ private final ProgressLogger pl = new ProgressLogger(NOPLogger.NOP_LOGGER);
+
+ @Test
+ public void testTree() {
+ ImmutableGraph graph = ArrayListMutableGraph.newCompleteBinaryOuttree(10).immutableView();
+ ParallelBreadthFirstVisit visit = new ParallelBreadthFirstVisit(graph, 0, false, pl);
+ visit.visit(0);
+ final int d[] = new int[graph.numNodes()];
+ for(int i = 0; i < visit.cutPoints.size() - 1; i++)
+ for(int j = visit.cutPoints.getInt(i); j < visit.cutPoints.getInt(i + 1); j++) d[visit.queue.getInt(j)] = i;
+ for(int i = 0; i < graph.numNodes(); i++) assertEquals(Integer.toString(i), Fast.mostSignificantBit(i + 1), d[i]);
+ }
+
+ @Test
+ public void testStar() {
+ ArrayListMutableGraph graph = new ArrayListMutableGraph(1 + 10 + 100 + 1000);
+ for(int i = 1; i <= 10; i++) {
+ graph.addArc(0, i);
+ graph.addArc(i, 0);
+ for(int j = 1; j <= 10; j++) {
+ graph.addArc(i, i * 10 + j);
+ graph.addArc(i * 10 + j, i);
+ for(int k = 1; k <= 10; k++) {
+ graph.addArc(i * 10 + j, (i * 10 + j) * 10 + k);
+ graph.addArc((i * 10 + j) * 10 + k, i * 10 + j);
+ }
+ }
+ }
+
+ ParallelBreadthFirstVisit visit = new ParallelBreadthFirstVisit(graph.immutableView(), 0, false, pl);
+ int componentSize = visit.visit(0);
+ for(int i = 1; i < graph.numNodes(); i++) {
+ visit.clear();
+ assertEquals("Source: " + i, componentSize, visit.visit(i));
+ }
+ }
+}
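ParallelBreadthFirstVisitTest reads the result of a visit as a queue of nodes in visit order plus a list of cut points, where the nodes from queue[cutPoints[k]] up to queue[cutPoints[k + 1]] (exclusive) lie at distance k from the source. The sequential sketch below produces the same layout for illustration only; it is not WebGraph's parallel implementation. The heap numbering of the out-tree (children of i are 2i + 1 and 2i + 2) is an assumption, chosen so that node i sits at level floor(log2(i + 1)), which is what the test's Fast.mostSignificantBit(i + 1) check expects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sequential analogue of the queue/cutPoints layout used by the test.
public final class LevelBfs {
    private LevelBfs() {}

    /** Visits from source; queue receives nodes in visit order, cutPoints receives 0 and then
     *  the exclusive end of each level in queue. Returns the number of nodes reached. */
    static int visit(int[][] succ, int source, int[] queue, List<Integer> cutPoints) {
        boolean[] seen = new boolean[succ.length];
        int end = 0;
        queue[end++] = source;
        seen[source] = true;
        cutPoints.add(0);
        int start = 0;
        while (start < end) {
            int levelEnd = end; // queue[start, levelEnd) is the current level
            for (int i = start; i < levelEnd; i++) {
                for (int y : succ[queue[i]]) {
                    if (!seen[y]) {
                        seen[y] = true;
                        queue[end++] = y;
                    }
                }
            }
            cutPoints.add(levelEnd);
            start = levelEnd;
        }
        return end;
    }

    /** Recovers distances from the source the same way the test does; unreached nodes get -1. */
    static int[] levels(int n, int[] queue, List<Integer> cutPoints) {
        int[] d = new int[n];
        Arrays.fill(d, -1);
        for (int i = 0; i < cutPoints.size() - 1; i++)
            for (int j = cutPoints.get(i); j < cutPoints.get(i + 1); j++) d[queue[j]] = i;
        return d;
    }

    /** Complete binary out-tree in (assumed) heap numbering: i -> 2i + 1, 2i + 2. */
    static int[][] completeBinaryOuttree(int height) {
        int n = (1 << (height + 1)) - 1;
        int[][] succ = new int[n][];
        for (int i = 0; i < n; i++)
            succ[i] = 2 * i + 2 < n ? new int[] { 2 * i + 1, 2 * i + 2 } : new int[0];
        return succ;
    }

    public static void main(String[] args) {
        int[][] tree = completeBinaryOuttree(3);
        int[] queue = new int[tree.length];
        List<Integer> cutPoints = new ArrayList<>();
        visit(tree, 0, queue, cutPoints);
        System.out.println(cutPoints); // [0, 1, 3, 7, 15]
        System.out.println(Arrays.toString(levels(tree.length, queue, cutPoints)));
    }
}
```

Keeping levels contiguous in a single queue array means the per-level boundaries are the only extra state; a parallel visit can then hand out slices of the current level to worker threads.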
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/SequentialHyperBall.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/SequentialHyperBall.java
new file mode 100644
index 0000000..57881f9
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/SequentialHyperBall.java
@@ -0,0 +1,414 @@
+package it.unimi.dsi.webgraph.algo;
+
+/*
+ * Copyright (C) 2010-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import it.unimi.dsi.Util;
+import it.unimi.dsi.bits.LongArrayBitVector;
+import it.unimi.dsi.fastutil.doubles.DoubleArrayList;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.io.FastBufferedInputStream;
+import it.unimi.dsi.fastutil.io.FastBufferedOutputStream;
+import it.unimi.dsi.fastutil.io.TextIO;
+import it.unimi.dsi.fastutil.longs.LongBigList;
+import it.unimi.dsi.io.SafelyCloseable;
+import it.unimi.dsi.lang.ObjectParser;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.util.HyperLogLogCounterArray;
+import it.unimi.dsi.webgraph.GraphClassParser;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.NodeIterator;
+
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.lang.reflect.InvocationTargetException;
+import java.util.Arrays;
+import java.util.concurrent.TimeUnit;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/** <p>Computes the approximate neighbourhood function of a graph using a sequential version of HyperBall.
+ *
+ * @author Paolo Boldi and Sebastiano Vigna
+ */
+
+public class SequentialHyperBall extends HyperLogLogCounterArray implements SafelyCloseable {
+ private static final Logger LOGGER = LoggerFactory.getLogger(SequentialHyperBall.class);
+ private static final boolean ASSERTS = true;
+
+ private static final long serialVersionUID = 1L;
+
+ protected final static int NODE_MASK = (int)(CHUNK_MASK >>> 6);
+
+ /** The graph whose neighbourhood function we are going to approximate. */
+ private final ImmutableGraph g;
+ /** The number of nodes of {@link #g}, cached. */
+ private final int numNodes;
+ /** The square of {@link #numNodes}, cached. */
+ private final double squareNumNodes;
+ /** The name of the temporary file that will be used to dump the new set of counters. */
+ private final File tempFile;
+ /** The file output stream on {@link #tempFile} for writing newly computed registers. */
+ private final FileOutputStream fos;
+ /** A data output stream wrapping {@link #fos}. */
+ private final DataOutputStream dos;
+ /** An input stream on {@link #tempFile} for reading newly computed registers. */
+ private final FastBufferedInputStream fbis;
+ /** A progress logger, or <code>null</code>. */
+ private final ProgressLogger pl;
+ /** A temporary array used by {@link #max(long[], long[], int)}. */
+ private final long accumulator[];
+ /** A temporary array used by {@link #max(long[], long[], int)}. */
+ private final long mask[];
+ /** The value computed by the last call to {@link #iterate()}. */
+ private double last;
+ /** Whether this approximator has been already closed. */
+ private boolean closed;
+
+ private final static int ensureEnoughRegisters(final int log2m) {
+ if (log2m < 4) throw new IllegalArgumentException("There must be at least 16 registers per counter");
+ return log2m;
+ }
+
+ /** Creates a new approximator for the neighbourhood function.
+ *
+ * @param g the graph whose neighbourhood function you want to compute.
+ * @param log2m the logarithm of the number of registers per counter.
+ * @param pl a progress logger, or <code>null</code>.
+ */
+ public SequentialHyperBall(final ImmutableGraph g, final int log2m, final ProgressLogger pl, final long seed) throws IOException {
+ super(g.numNodes(), g.numNodes(), ensureEnoughRegisters(log2m), seed);
+
+ if (pl != null) pl.logger().info("Precision: " + Util.format(100 * HyperLogLogCounterArray.relativeStandardDeviation(log2m)) + "% (" + m + " registers/counter, " + registerSize + " bits/counter)");
+
+ this.g = g;
+ this.pl = pl;
+
+ numNodes = g.numNodes();
+ squareNumNodes = (double)numNodes * numNodes;
+
+ tempFile = File.createTempFile(SequentialHyperBall.class.getName(), "temp");
+ tempFile.deleteOnExit();
+ dos = new DataOutputStream(new FastBufferedOutputStream(fos = new FileOutputStream(tempFile)));
+ fbis = new FastBufferedInputStream(new FileInputStream(tempFile));
+
+ accumulator = new long[counterLongwords];
+ mask = new long[counterLongwords];
+ }
+
+ /** Initialises the approximator.
+ *
+ * <p>This method must be called before a series of {@linkplain #iterate() iterations}.
+ */
+ public void init() {
+ if (pl != null) {
+ pl.itemsName = "iterates";
+ pl.start("Iterating...");
+ }
+ for(long[] a: bits) Arrays.fill(a, 0);
+ for(int i = numNodes; i-- != 0;) add(i, i);
+ last = numNodes;
+ }
+
+ @Override
+ public void close() throws IOException {
+ if (closed) return;
+ closed = true;
+ dos.close();
+ fbis.close();
+ tempFile.delete();
+ }
+
+ @Override
+ protected void finalize() throws Throwable {
+ try {
+ if (! closed) {
+ LOGGER.warn("This " + this.getClass().getName() + " [" + toString() + "] should have been closed.");
+ close();
+ }
+ }
+ finally {
+ super.finalize();
+ }
+ }
+
+
+ /** Performs a multiple precision subtraction, leaving the result in the first operand.
+ *
+ * @param x a vector of longs.
+ * @param y a vector of longs that will be subtracted from <code>x</code>.
+ * @param l the length of <code>x</code> and <code>y</code>.
+ */
+ private final static void subtract(final long[] x, final long[] y, final int l) {
+ boolean borrow = false;
+
+ for(int i = 0; i < l; i++) {
+ if (! borrow || x[i]-- != 0) borrow = x[i] < y[i] ^ x[i] < 0 ^ y[i] < 0; // This expression returns the result of an unsigned strict comparison.
+ x[i] -= y[i];
+ }
+ }
+
+ /** Computes the register-by-register maximum of two bit vectors.
+ *
+ * @param x first vector of longs, representing a bit vector in {@link LongArrayBitVector} format, where the result will be stored.
+ * @param y a second vector of longs, representing a bit vector in {@link LongArrayBitVector} format, that will be maximised with <code>x</code>.
+ * @param r the register size.
+ */
+
+ private final void max(final long[] x, final long[] y, final int r) {
+ final int l = x.length;
+
+ // Local copies of vectors used to store intermediate results.
+ final long[] accumulator = this.accumulator;
+ final long[] mask = this.mask;
+ final long[] msbMask = this.msbMask;
+
+ /* We work in two phases. Let H_r (msbMask) be the mask with the
+ * highest bit of each register (of size r) set, and L_r (lsbMask)
+ * be the mask with the lowest bit of each register set.
+ * We describe the algorithm on a single word.
+ *
+ * In the first phase, we perform an unsigned strict register-by-register
+ * comparison of x and y (z has the top bit of a register set iff x < y there):
+ *
+ * z = ((((x | H_r) - (y & ~H_r)) | (x ^ y)) ^ (x | ~y)) & H_r
+ *
+ * Then, we generate a register-by-register mask of all ones or
+ * all zeroes (ones where x >= y, zeroes where x < y), using the
+ * formula
+ *
+ * (((z >>> (r - 1) | H_r) - L_r) | H_r) ^ z
+ *
+ * At that point, it is trivial to select from x and y the right values.
+ */
+
+ // We load x | H_r into the accumulator
+ for(int i = l; i-- != 0;) accumulator[i] = x[i] | msbMask[i];
+ // We subtract y & ~H_r, using mask as temporary storage
+ for(int i = l; i-- != 0;) mask[i] = y[i] & ~msbMask[i];
+ subtract(accumulator, mask, l);
+
+ // We OR with x ^ y, XOR with (x | ~y), and finally AND with H_r.
+ for(int i = l; i-- != 0;) accumulator[i] = ((accumulator[i] | (x[i] ^ y[i])) ^ (x[i] | ~y[i])) & msbMask[i];
+
+ if (ASSERTS) {
+ final LongBigList a = LongArrayBitVector.wrap(x).asLongBigList(r);
+ final LongBigList b = LongArrayBitVector.wrap(y).asLongBigList(r);
+ for(int i = 0; i < a.size(); i++) {
+ long pos = (i + 1) * (long)r - 1;
+ assert (a.getLong(i) < b.getLong(i)) == ((accumulator[(int)(pos / Long.SIZE)] & 1L << pos % Long.SIZE) != 0);
+ }
+ }
+
+ // We shift by r - 1 places and put the result into mask
+ final int rMinus1 = r - 1;
+ for(int i = l - 1; i-- != 0;) mask[i] = accumulator[i] >>> rMinus1 | accumulator[i + 1] << (Long.SIZE - rMinus1) | msbMask[i];
+ mask[l - 1] = accumulator[l - 1] >>> rMinus1 | msbMask[l - 1];
+
+ // We subtract L_r from mask
+ subtract(mask, lsbMask, l);
+
+ // We OR with H_r and XOR with the accumulator
+ for(int i = l; i-- != 0;) mask[i] = (mask[i] | msbMask[i]) ^ accumulator[i];
+
+ if (ASSERTS) {
+ final long[] t = x.clone();
+ LongBigList a = LongArrayBitVector.wrap(t).asLongBigList(r);
+ LongBigList b = LongArrayBitVector.wrap(y).asLongBigList(r);
+ for(int i = 0; i < Long.SIZE * l / r; i++) a.set(i, Math.max(a.getLong(i), b.getLong(i)));
+ // Note: this must be kept in sync with the line computing the result.
+ for(int i = l; i-- != 0;) assert t[i] == (mask[i] & x[i] | ~mask[i] & y[i]);
+ }
+
+ // Finally, we use mask to select the right bits from x and y and store the result.
+ for(int i = l; i-- != 0;) x[i] = mask[i] & x[i] | ~mask[i] & y[i];
+
+ }
+
+ private final void copyToLocal(final LongArrayBitVector chunk, final long[] t, final int node) {
+ // Offset in bits
+ final long counterLongwords = t.length;
+ long offset = (node << log2m & CHUNK_MASK) * registerSize;
+ // Note that we might copy a few bits in excess, but they will not be used anyway.
+ for(int i = 0; i < counterLongwords; i++, offset += Long.SIZE) t[i] = chunk.getLong(offset, Math.min(offset + Long.SIZE, chunk.length()));
+ }
+
+ /** Performs a new iteration of HyperBall.
+ *
+ * @return an approximation of the following value of the neighbourhood function (the
+ * first returned value is for distance one).
+ */
+ public double iterate() throws IOException {
+ final LongArrayBitVector bitVector[] = new LongArrayBitVector[bits.length];
+ for(int i = bits.length; i-- != 0;) bitVector[i] = LongArrayBitVector.wrap(bits[i]);
+
+ final NodeIterator nodeIterator = g.nodeIterator();
+ final int counterBits = registerSize << log2m;
+ final int nodeShift = this.counterShift;
+
+ final long t[] = new long[counterLongwords];
+ final long u[] = new long[counterLongwords];
+
+ final ProgressLogger nodeProgressLogger = pl == null ? null : new ProgressLogger(LOGGER, 10, TimeUnit.MINUTES, "nodes");
+
+ fbis.flush();
+ dos.flush();
+ fos.getChannel().position(0);
+
+ if (nodeProgressLogger != null) {
+ nodeProgressLogger.expectedUpdates = numNodes;
+ nodeProgressLogger.start("Scanning graph...");
+ }
+
+ for(int i = 0; i < numNodes; i++) {
+ nodeIterator.nextInt();
+ int d = nodeIterator.outdegree();
+ final int[] successor = nodeIterator.successorArray();
+ copyToLocal(bitVector[i >>> nodeShift], t, i);
+ while(d-- != 0) {
+ final int s = successor[d];
+ if (s != i) { // Self-loops do not influence the computation
+ copyToLocal(bitVector[s >>> nodeShift], u, s);
+ max(t, u, registerSize);
+ }
+ }
+
+ if (ASSERTS) {
+ LongBigList test = LongArrayBitVector.wrap(t).asLongBigList(registerSize);
+ for(int rr = 0; rr < m; rr++) {
+ int max = (int)registers[(int)((((long)i << log2m) + rr) >> CHUNK_SHIFT)].getLong((((long)i << log2m) + rr) & CHUNK_MASK);
+ for(int j = nodeIterator.outdegree(); j-- != 0;) max = Math.max(max, (int)registers[(int)((((long)successor[j] << log2m) + rr) >> CHUNK_SHIFT)].getLong((((long)successor[j] << log2m) + rr) & CHUNK_MASK));
+ assert max == test.getLong(rr) : max + "!=" + test.getLong(rr) + " [" + rr + "]";
+ }
+ }
+
+ // We store long-size padded bits.
+ BinIO.storeLongs(t, dos);
+
+ if (nodeProgressLogger != null) nodeProgressLogger.lightUpdate();
+ }
+
+ if (nodeProgressLogger != null) nodeProgressLogger.done();
+
+ dos.flush();
+ fbis.position(0);
+ final DataInputStream dis = new DataInputStream(fbis);
+
+ for(int i = 0; i < bitVector.length; i++) {
+ final int numCounters = (int)(registers[i].size64() >> log2m);
+ bitVector[i].clear();
+ for(int j = 0; j < numCounters; j++) {
+ // We read long-size padded bits and store just the useful part.
+ BinIO.loadLongs(dis, t);
+ bitVector[i].append(LongArrayBitVector.wrap(t).subVector(0, counterBits));
+ }
+ }
+
+ double result = 0, c = 0, y, z;
+ // Kahan summation
+ for(int i = numNodes; i-- != 0;) {
+ y = count(i) - c;
+ z = result + y;
+ c = (z - result) - y;
+ result = z;
+ }
+
+ if (pl != null) {
+ pl.update();
+ pl.logger().info("Pairs: " + result + " (" + 100.0 * result / squareNumNodes + "%)");
+ }
+
+ if (result < last) result = last;
+ last = result;
+ return result;
+ }
+
+ /** Returns an approximation of the neighbourhood function.
+ *
+ * @param upperBound an upper bound to the number of iterations.
+ * @param threshold a value that will be used to stop the computation either by absolute or relative increment.
+ * @return an approximation of the neighbourhood function.
+ */
+ public double[] approximateNeighbourhoodFunction(long upperBound, double threshold) throws IOException {
+ DoubleArrayList approximateNeighbourhoodFunction = new DoubleArrayList();
+ upperBound = Math.min(upperBound, numNodes);
+ double last;
+ approximateNeighbourhoodFunction.add(last = numNodes);
+ init();
+
+ for(long i = 0; i < upperBound; i++) {
+ final double current = iterate();
+ LOGGER.info("Absolute increment: " + (current - last));
+ if (current - last <= threshold) {
+ LOGGER.info("Terminating approximation after " + i + " iteration(s) by absolute bound");
+ break;
+ }
+
+ LOGGER.info("Relative increment: " + (current / last));
+ if (i > 3 && current / last < (1 + threshold)) {
+ LOGGER.info("Terminating approximation after " + i + " iteration(s) by relative bound");
+ break;
+ }
+ approximateNeighbourhoodFunction.add(last = current);
+ }
+
+ if (pl != null) pl.done();
+ return approximateNeighbourhoodFunction.toDoubleArray();
+ }
+
+ public static void main(String arg[]) throws IOException, JSAPException, IllegalArgumentException, ClassNotFoundException, IllegalAccessException, InvocationTargetException, InstantiationException, NoSuchMethodException {
+ SimpleJSAP jsap = new SimpleJSAP(SequentialHyperBall.class.getName(), "Prints an approximation of the neighbourhood function.",
+ new Parameter[] {
+ new FlaggedOption("log2m", JSAP.INTEGER_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, 'l', "log2m", "The logarithm of the number of registers."),
+ new FlaggedOption("upperBound", JSAP.LONGSIZE_PARSER, Long.toString(Long.MAX_VALUE), JSAP.NOT_REQUIRED, 'u', "upper-bound", "An upper bound to the number of iterations (default: the graph size)."),
+ new FlaggedOption("threshold", JSAP.DOUBLE_PARSER, Double.toString(1E-3), JSAP.NOT_REQUIRED, 't', "threshould", "A threshold that will be used to stop the computation by absolute or relative increment."),
+ new Switch("spec", 's', "spec", "The source is not a basename but rather a specification of the form <ImmutableGraphImplementation>(arg,arg,...)."),
+ new UnflaggedOption("basename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the graph."),
+ }
+ );
+
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final boolean spec = jsapResult.getBoolean("spec");
+ final String basename = jsapResult.getString("basename");
+ final ProgressLogger pl = new ProgressLogger(LOGGER);
+ final int log2m = jsapResult.getInt("log2m");
+
+ final ImmutableGraph graph = spec ? ObjectParser.fromSpec(basename, ImmutableGraph.class, GraphClassParser.PACKAGE) : ImmutableGraph.loadOffline(basename);
+
+ SequentialHyperBall shb = new SequentialHyperBall(graph, log2m, pl, Util.randomSeed());
+ TextIO.storeDoubles(shb.approximateNeighbourhoodFunction(jsapResult.getLong("upperBound"), jsapResult.getDouble("threshold")), System.out);
+ shb.close();
+ }
+}
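The broadword `max()` method above replaces every r-bit register of `x` with the larger of it and the matching register of `y`, and its `ASSERTS` block checks the outcome register by register. The following is a minimal, non-broadword sketch of that reference semantics, useful as an oracle when testing a word-parallel version. It assumes registers are unsigned, 1 ≤ r ≤ 63 bits wide, and packed little-endian (register i occupies bits i·r through (i+1)·r - 1, possibly straddling two longs), which is the layout the assertions in `max()` appear to rely on. `RegisterMax` and its methods are hypothetical names, not part of WebGraph.

```java
/**
 * Hypothetical reference helper (not part of WebGraph): register-wise maximum of
 * r-bit unsigned registers packed little-endian into long arrays.
 */
final class RegisterMax {
    private RegisterMax() {}

    /** Returns register i (width r, 1 <= r <= 63) of the packed array a. */
    static long get(final long[] a, final int i, final int r) {
        final long start = (long) i * r;
        final int word = (int) (start >>> 6);
        final int bit = (int) (start & 63);
        long value = a[word] >>> bit;
        // The register straddles a word boundary: fetch its high bits from the next word.
        if (bit + r > 64) value |= a[word + 1] << (64 - bit);
        return value & ((1L << r) - 1);
    }

    /** Stores the low r bits of v into register i of the packed array a. */
    static void set(final long[] a, final int i, final int r, final long v) {
        final long mask = (1L << r) - 1;
        final long start = (long) i * r;
        final int word = (int) (start >>> 6);
        final int bit = (int) (start & 63);
        // Low part: bits shifted past position 63 are dropped here and written below.
        a[word] = a[word] & ~(mask << bit) | (v & mask) << bit;
        if (bit + r > 64) {
            final long highMask = (1L << (bit + r - 64)) - 1;
            a[word + 1] = a[word + 1] & ~highMask | (v & mask) >>> (64 - bit);
        }
    }

    /** Replaces each register of x with the maximum of itself and the matching register of y. */
    static void max(final long[] x, final long[] y, final int r) {
        // Only complete registers are considered, as in the ASSERTS block of max().
        final int registers = Long.SIZE * x.length / r;
        for (int i = 0; i < registers; i++) set(x, i, r, Math.max(get(x, i, r), get(y, i, r)));
    }
}
```

Because `get()` and `set()` handle registers that cross a word boundary, the loop covers exactly the `Long.SIZE * l / r` complete registers that the `ASSERTS` block checks; the broadword version replaces this per-register loop with a few passes over the l words.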
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/StronglyConnectedComponentsTarjan.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/StronglyConnectedComponentsTarjan.java
new file mode 100644
index 0000000..ed4f6d6
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/StronglyConnectedComponentsTarjan.java
@@ -0,0 +1,318 @@
+package it.unimi.dsi.webgraph.algo;
+
+import java.util.BitSet;
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import it.unimi.dsi.Util;
+import it.unimi.dsi.fastutil.ints.IntArrayList;
+import it.unimi.dsi.fastutil.ints.IntArrays;
+import it.unimi.dsi.fastutil.ints.IntStack;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.LazyIntIterator;
+import it.unimi.dsi.webgraph.Transform.LabelledArcFilter;
+import it.unimi.dsi.webgraph.labelling.ArcLabelledImmutableGraph;
+import it.unimi.dsi.webgraph.labelling.ArcLabelledNodeIterator.LabelledArcIterator;
+
+/** Computes the strongly connected components (and optionally the buckets) of an immutable graph.
+ *
+ * <p>This class is a second, recursive implementation kept for debugging purposes.
+ *
+ * <p>The {@link #compute(ImmutableGraph, boolean, ProgressLogger)} method of this class will return
+ * an instance that contains the data computed by running a variant of Tarjan's algorithm on an immutable graph.
+ * The implementation is recursive, rather than iterative, so on large graphs it may exceed the
+ * default stack size of current JVMs (see the section on stack size below).
+ * Besides the usual strongly connected components, it is possible to compute the <em>buckets</em> of the
+ * graph, that is, nodes belonging to components that are terminal, but not dangling, in the component DAG.
+ *
+ * <p>After getting an instance, it is possible to run the {@link #computeSizes()} and {@link #sortBySize(int[])}
+ * methods to obtain further information. This scheme has been devised to exploit the available memory as much
+ * as possible&mdash;after the components have been computed, the returned instance keeps no track of
+ * the graph, and the related memory can be freed by the garbage collector.
+ *
+ * <h2>Stack size</h2>
+ *
+ * <p>The method {@link #compute(ImmutableGraph, boolean, ProgressLogger)} might require a large stack size,
+ * which should be set using suitable JVM options (e.g., <code>-Xss</code>). Note, however,
+ * that the stack size must also be enlarged on the operating-system side&mdash;for instance, using <code>ulimit -s unlimited</code>.
+ */
+
+
+public class StronglyConnectedComponentsTarjan {
+ /** The number of strongly connected components. */
+ final public int numberOfComponents;
+ /** The component of each node. */
+ final public int component[];
+ /** The bit set for buckets, or <code>null</code>, in which case buckets have not been computed. */
+ final public BitSet buckets;
+
+ protected StronglyConnectedComponentsTarjan(final int numberOfComponents, final int[] component, final BitSet buckets) {
+ this.numberOfComponents = numberOfComponents;
+ this.component = component;
+ this.buckets = buckets;
+ }
+
+ private final static class Visit {
+ /** The graph. */
+ private final ImmutableGraph graph;
+ /** The number of nodes in {@link #graph}. */
+ private final int n;
+ /** A progress logger. */
+ private final ProgressLogger pl;
+ /** Whether we should compute buckets. */
+ private final boolean computeBuckets;
+ /** For nodes not yet visited, 0; for visited nodes not yet emitted, the visit time; for emitted nodes, -c-1, where c is the component number. */
+ private final int status[];
+ /** The buckets. */
+ private final BitSet buckets;
+ /** The component stack. */
+ private final IntStack stack;
+
+ /** The first-visit clock (incremented at each visited node). */
+ private int clock;
+ /** The number of components already output. */
+ private int numberOfComponents;
+
+ private Visit(final ImmutableGraph graph, final int[] status, final BitSet buckets, ProgressLogger pl) {
+ this.graph = graph;
+ this.buckets = buckets;
+ this.status = status;
+ this.pl = pl;
+ this.computeBuckets = buckets != null;
+ this.n = graph.numNodes();
+ stack = new IntArrayList(n);
+ }
+
+ /** Visits a node.
+ *
+ * @param x the node to visit.
+ * @return true if <code>x</code> is a bucket.
+ */
+ private boolean visit(final int x) {
+ final int[] status = this.status;
+ if (pl != null) pl.lightUpdate();
+ status[x] = ++clock;
+ stack.push(x);
+
+ int d = graph.outdegree(x);
+ boolean noOlderNodeFound = true, isBucket = d != 0; // If we're dangling we're certainly not a bucket.
+
+ if (d != 0) {
+ final LazyIntIterator successors = graph.successors(x);
+ while(d-- != 0) {
+ final int s = successors.nextInt();
+ // If we can reach a non-bucket or another component we are not a bucket.
+ if (status[s] == 0 && ! visit(s) || status[s] < 0) isBucket = false;
+ if (status[s] > 0 && status[s] < status[x]) {
+ status[x] = status[s];
+ noOlderNodeFound = false;
+ }
+ }
+ }
+
+ if (noOlderNodeFound) {
+ numberOfComponents++;
+ int z;
+ do {
+ z = stack.popInt();
+ // Component markers are -c-1, where c is the component number.
+ status[z] = -numberOfComponents;
+ if (isBucket && computeBuckets) buckets.set(z);
+ } while(z != x);
+ }
+
+ return isBucket;
+ }
+
+
+ public void run() {
+ if (pl != null) {
+ pl.itemsName = "nodes";
+ pl.expectedUpdates = n;
+ pl.displayFreeMemory = true;
+ pl.start("Computing strongly connected components...");
+ }
+ for (int x = 0; x < n; x++) if (status[x] == 0) visit(x);
+ if (pl != null) pl.done();
+
+ // Turn component markers into component numbers.
+ for (int x = n; x-- != 0;) status[x] = -status[x] - 1;
+
+ stack.push(numberOfComponents); // Horrible kluge to return the number of components.
+ }
+ }
+
+ /** Computes the strongly connected components of a given graph.
+ *
+ * @param graph the graph whose strongly connected components are to be computed.
+ * @param computeBuckets if true, buckets will be computed.
+ * @param pl a progress logger, or <code>null</code>.
+ * @return an instance of this class containing the computed components.
+ */
+ public static StronglyConnectedComponentsTarjan compute(final ImmutableGraph graph, final boolean computeBuckets, final ProgressLogger pl) {
+ final int n = graph.numNodes();
+ final Visit visit = new Visit(graph, new int[n], computeBuckets ? new BitSet(n) : null, pl);
+ visit.run();
+ return new StronglyConnectedComponentsTarjan(visit.numberOfComponents, visit.status, visit.buckets);
+ }
+
+
+ private final static class FilteredVisit {
+ /** The graph. */
+ private final ArcLabelledImmutableGraph graph;
+ /** The number of nodes in {@link #graph}. */
+ private final int n;
+ /** A progress logger. */
+ private final ProgressLogger pl;
+ /** A filter on arc labels. */
+ private final LabelledArcFilter filter;
+ /** Whether we should compute buckets. */
+ private final boolean computeBuckets;
+ /** For nodes not yet visited, 0; for visited nodes not yet emitted, the visit time; for emitted nodes, -c-1, where c is the component number. */
+ private final int status[];
+ /** The buckets. */
+ private final BitSet buckets;
+ /** The component stack. */
+ private final IntStack stack;
+
+
+ /** The first-visit clock (incremented at each visited node). */
+ private int clock;
+ /** The number of components already output. */
+ private int numberOfComponents;
+
+ private FilteredVisit(final ArcLabelledImmutableGraph graph, final LabelledArcFilter filter, final int[] status, final BitSet buckets, ProgressLogger pl) {
+ this.graph = graph;
+ this.filter = filter;
+ this.buckets = buckets;
+ this.status = status;
+ this.pl = pl;
+ this.computeBuckets = buckets != null;
+ this.n = graph.numNodes();
+ stack = new IntArrayList(n);
+ }
+
+ /** Visits a node.
+ *
+ * @param x the node to visit.
+ * @return true if <code>x</code> is a bucket.
+ */
+ private boolean visit(final int x) {
+ final int[] status = this.status;
+ if (pl != null) pl.lightUpdate();
+ status[x] = ++clock;
+ stack.push(x);
+
+ int d = graph.outdegree(x), filteredDegree = 0;
+ boolean noOlderNodeFound = true, isBucket = true;
+
+ if (d != 0) {
+ final LabelledArcIterator successors = graph.successors(x);
+ while(d-- != 0) {
+ final int s = successors.nextInt();
+ if (! filter.accept(x, s, successors.label())) continue;
+ filteredDegree++;
+ // If we can reach a non-bucket or another component we are not a bucket.
+ if (status[s] == 0 && ! visit(s) || status[s] < 0) isBucket = false;
+ if (status[s] > 0 && status[s] < status[x]) {
+ status[x] = status[s];
+ noOlderNodeFound = false;
+ }
+ }
+ }
+
+ if (filteredDegree == 0) isBucket = false;
+
+ if (noOlderNodeFound) {
+ numberOfComponents++;
+ int z;
+ do {
+ z = stack.popInt();
+ // Component markers are -c-1, where c is the component number.
+ status[z] = -numberOfComponents;
+ if (isBucket && computeBuckets) buckets.set(z);
+ } while(z != x);
+ }
+
+ return isBucket;
+ }
+
+
+ public void run() {
+ if (pl != null) {
+ pl.itemsName = "nodes";
+ pl.expectedUpdates = n;
+ pl.displayFreeMemory = true;
+ pl.start("Computing strongly connected components...");
+ }
+ for (int x = 0; x < n; x++) if (status[x] == 0) visit(x);
+ if (pl != null) pl.done();
+
+ // Turn component markers into component numbers.
+ for (int x = n; x-- != 0;) status[x] = -status[x] - 1;
+
+ stack.push(numberOfComponents); // Horrible kluge to return the number of components.
+ }
+ }
+
+ /** Computes the strongly connected components of a given arc-labelled graph, filtering its arcs.
+ *
+ * @param graph the arc-labelled graph whose strongly connected components are to be computed.
+ * @param filter a filter selecting the arcs that must be taken into consideration.
+ * @param computeBuckets if true, buckets will be computed.
+ * @param pl a progress logger, or <code>null</code>.
+ * @return an instance of this class containing the computed components.
+ */
+ public static StronglyConnectedComponentsTarjan compute(final ArcLabelledImmutableGraph graph, final LabelledArcFilter filter, final boolean computeBuckets, final ProgressLogger pl) {
+ final int n = graph.numNodes();
+ FilteredVisit filteredVisit = new FilteredVisit(graph, filter, new int[n], computeBuckets ? new BitSet(n) : null, pl);
+ filteredVisit.run();
+ return new StronglyConnectedComponentsTarjan(filteredVisit.numberOfComponents, filteredVisit.status, filteredVisit.buckets);
+ }
+
+
+ /** Returns the size array for this set of strongly connected components.
+ *
+ * @return the size array for this set of strongly connected components.
+ */
+ public int[] computeSizes() {
+ final int[] size = new int[numberOfComponents];
+ for(int i = component.length; i-- != 0;) size[component[i]]++;
+ return size;
+ }
+
+ /** Renumbers the components of this set by decreasing size.
+ *
+ * <p>After a call to this method, both the internal status of this class and the argument
+ * array are permuted so that the sizes of strongly connected components are decreasing
+ * in the component index.
+ *
+ * @param size the component sizes, as returned by {@link #computeSizes()}.
+ */
+ public void sortBySize(final int[] size) {
+ final int[] perm = Util.identity(size.length);
+ IntArrays.quickSort(perm, 0, perm.length, (x,y) -> Integer.compare(size[y], size[x]));
+ final int[] copy = size.clone();
+ for (int i = size.length; i-- != 0;) size[i] = copy[perm[i]];
+ Util.invertPermutationInPlace(perm);
+ for(int i = component.length; i-- != 0;) component[i] = perm[component[i]];
+ }
+}
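The "Stack size" section of the class documentation warns that the recursive visit may need a large stack. Besides raising the default with the `-Xss` JVM option, one common workaround is to run the computation in a dedicated thread created with the `Thread(ThreadGroup, Runnable, String, long stackSize)` constructor. The sketch below assumes that approach; the Java documentation describes `stackSize` as a request that some platforms may ignore, so it is not guaranteed to help everywhere. `BigStackRunner` and `depth()` are hypothetical: `depth()` merely stands in for a deeply recursive call such as `visit()`.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical helper (not part of WebGraph) that runs deeply recursive code on a big stack. */
final class BigStackRunner {
    private BigStackRunner() {}

    /** Deliberately deep, non-tail recursion standing in for a recursive graph visit. */
    static long depth(final int n) {
        return n == 0 ? 0 : 1 + depth(n - 1);
    }

    /**
     * Runs task in a new thread that requests stackBytes bytes of stack, waits for it,
     * and rethrows any error (e.g., StackOverflowError) raised by the task.
     */
    static void runWithStack(final Runnable task, final long stackBytes) {
        final AtomicReference<Throwable> failure = new AtomicReference<>();
        final Thread thread = new Thread(null, () -> {
            try {
                task.run();
            } catch (Throwable caught) {
                failure.set(caught); // Hand the failure back to the calling thread.
            }
        }, "big-stack", stackBytes);
        thread.start();
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for the computation", e);
        }
        final Throwable error = failure.get();
        if (error instanceof RuntimeException) throw (RuntimeException) error;
        if (error instanceof Error) throw (Error) error;
        if (error != null) throw new IllegalStateException(error);
    }
}
```

For instance, with `final StronglyConnectedComponentsTarjan[] holder = new StronglyConnectedComponentsTarjan[1];`, calling `BigStackRunner.runWithStack(() -> holder[0] = StronglyConnectedComponentsTarjan.compute(graph, true, null), 1L << 30)` would request a 1 GiB stack for the visit.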
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/StronglyConnectedComponentsTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/StronglyConnectedComponentsTest.java
new file mode 100644
index 0000000..b82cb82
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/StronglyConnectedComponentsTest.java
@@ -0,0 +1,120 @@
+package it.unimi.dsi.webgraph.algo;
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.bits.LongArrayBitVector;
+import it.unimi.dsi.fastutil.longs.LongOpenHashSet;
+import it.unimi.dsi.fastutil.objects.ObjectOpenHashSet;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.WebGraphTestCase;
+import it.unimi.dsi.webgraph.examples.ErdosRenyiGraph;
+
+import org.junit.Test;
+
+public class StronglyConnectedComponentsTest extends WebGraphTestCase {
+
+ public static void sameComponents(final int l, final StronglyConnectedComponentsTarjan componentsRecursive, final StronglyConnectedComponents componentsIterative) {
+ final LongOpenHashSet[] recursiveComponentsSet = new LongOpenHashSet[componentsRecursive.numberOfComponents];
+ final LongOpenHashSet[] iterativeComponentsSet = new LongOpenHashSet[componentsIterative.numberOfComponents];
+
+ for(int i = recursiveComponentsSet.length; i-- != 0;) {
+ recursiveComponentsSet[i] = new LongOpenHashSet();
+ iterativeComponentsSet[i] = new LongOpenHashSet();
+ }
+
+ for(int i = l; i-- != 0;) {
+ recursiveComponentsSet[componentsRecursive.component[i]].add(i);
+ iterativeComponentsSet[componentsIterative.component[i]].add(i);
+ }
+
+ assertEquals(new ObjectOpenHashSet<>(recursiveComponentsSet), new ObjectOpenHashSet<>(iterativeComponentsSet));
+ }
+
+ @Test
+ public void testBuckets() {
+ final ImmutableGraph g = new ArrayListMutableGraph(9,
+ new int[][] { { 0, 0 }, { 1, 0 }, { 1, 2 },
+ { 2, 1 }, { 2, 3 }, { 2, 4 }, { 2, 5 },
+ { 3, 4 }, { 4, 3 },
+ { 5, 5 }, { 5, 6 }, { 5, 7 }, { 5, 8 },
+ { 6, 7 },
+ { 8, 7 } }
+ ).immutableView();
+
+ StronglyConnectedComponents components = StronglyConnectedComponents.compute(g, true, null);
+
+ LongArrayBitVector buckets = LongArrayBitVector.ofLength(g.numNodes());
+ buckets.set(0, true);
+ buckets.set(3, true);
+ buckets.set(4, true);
+ assertEquals(buckets, components.buckets);
+ assertEquals(3, buckets.count());
+
+ final int[] size = components.computeSizes();
+ components.sortBySize(size);
+
+ assertEquals(2, size[0]);
+ assertEquals(2, size[1]);
+ assertEquals(1, size[2]);
+ assertEquals(1, size[3]);
+ assertEquals(1, size[4]);
+ assertEquals(1, size[5]);
+ assertEquals(1, size[6]);
+
+ StronglyConnectedComponents.compute(g, false, null); // To increase coverage
+ }
+
+
+ @Test
+ public void testBuckets2() {
+ final ImmutableGraph g = new ArrayListMutableGraph(4,
+ new int[][] { { 0, 1 }, { 1, 2 }, { 2, 0 }, { 1, 3 }, { 3, 3 } }
+ ).immutableView();
+
+ StronglyConnectedComponents components = StronglyConnectedComponents.compute(g, true, null);
+
+ LongArrayBitVector buckets = LongArrayBitVector.ofLength(g.numNodes());
+ buckets.set(3);
+ assertEquals(buckets, components.buckets);
+ assertEquals(1, buckets.count());
+ }
+
+ @Test
+ public void testCompleteGraph() {
+ StronglyConnectedComponents components = StronglyConnectedComponents.compute(ArrayListMutableGraph.newCompleteGraph(5, false).immutableView(), true, null);
+ assertEquals(5, components.buckets.count());
+ for(int i = 5; i-- != 0;) assertEquals(0, components.component[i]);
+ assertEquals(5, components.computeSizes()[0]);
+ }
+
+ @Test
+ public void testNoBuckets() {
+ StronglyConnectedComponents.compute(ArrayListMutableGraph.newCompleteGraph(5, false).immutableView(), false, null);
+ }
+
+ @Test
+ public void testWithProgressLogger() {
+ StronglyConnectedComponents.compute(ArrayListMutableGraph.newCompleteGraph(5, false).immutableView(), true, new ProgressLogger());
+ }
+
+ @Test
+ public void testTree() {
+ StronglyConnectedComponents components = StronglyConnectedComponents.compute(ArrayListMutableGraph.newCompleteBinaryIntree(3).immutableView(), true, null);
+ assertEquals(0, components.buckets.count());
+ assertEquals(15, components.numberOfComponents);
+ }
+
+ @Test
+ public void testErdosRrenyi() {
+ for(int size: new int[] { 10, 100, 1000 }) {
+ for(int attempt = 0; attempt < 5; attempt++) {
+ final ImmutableGraph view = new ArrayListMutableGraph(new ErdosRenyiGraph(size, .05, attempt + 1, false)).immutableView();
+ final StronglyConnectedComponentsTarjan componentsRecursive = StronglyConnectedComponentsTarjan.compute(view, true, null);
+ final StronglyConnectedComponents componentsIterative = StronglyConnectedComponents.compute(view, true, null);
+ assertEquals(componentsRecursive.numberOfComponents, componentsIterative.numberOfComponents);
+ sameComponents(size, componentsRecursive, componentsIterative);
+ }
+ }
+ }
+}
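The bucket tests above rely on the definition in the class documentation: buckets are the nodes of components that are terminal, but not dangling, in the component DAG. Given a component assignment, that definition can be checked directly: a component is a bucket if no arc leaves it and it contains at least one arc (possibly a self-loop, as for node 0 in `testBuckets()`). The following is a minimal, definition-based sketch that assumes plain successor arrays instead of an `ImmutableGraph`; `BucketSketch` is a hypothetical name, not WebGraph API, and it does not reproduce how the recursive visit propagates bucket information.

```java
import java.util.BitSet;

/** Hypothetical sketch (not part of WebGraph): buckets computed straight from the definition. */
final class BucketSketch {
    private BucketSketch() {}

    /**
     * @param succ succ[x] lists the successors of node x.
     * @param component component[x] is the strongly connected component of x.
     * @param numComponents the number of components.
     * @return the nodes lying in terminal, non-dangling components.
     */
    static BitSet buckets(final int[][] succ, final int[] component, final int numComponents) {
        final boolean[] leaves = new boolean[numComponents]; // some arc exits the component
        final boolean[] hasArc = new boolean[numComponents]; // the component contains an arc
        for (int x = 0; x < succ.length; x++) {
            for (final int s : succ[x]) {
                if (component[s] != component[x]) leaves[component[x]] = true;
                else hasArc[component[x]] = true;
            }
        }
        final BitSet result = new BitSet(succ.length);
        for (int x = 0; x < succ.length; x++) {
            final int c = component[x];
            if (hasArc[c] && !leaves[c]) result.set(x);
        }
        return result;
    }
}
```

For the graph of `testBuckets()`, with components {0}, {1, 2}, {3, 4}, {5}, {6}, {7} and {8}, this returns {0, 3, 4}: node 7 forms a terminal but dangling component, so it is excluded, matching the expectation in that test.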
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/SumSweepDirectedDiameterRadiusTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/SumSweepDirectedDiameterRadiusTest.java
new file mode 100644
index 0000000..9d2adeb
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/SumSweepDirectedDiameterRadiusTest.java
@@ -0,0 +1,339 @@
+package it.unimi.dsi.webgraph.algo;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.Transform;
+import it.unimi.dsi.webgraph.algo.SumSweepDirectedDiameterRadius.OutputLevel;
+import it.unimi.dsi.webgraph.examples.ErdosRenyiGraph;
+
+import org.junit.Test;
+
+//RELEASE-STATUS: DIST
+
+/**
+ * This class tests the {@link SumSweepDirectedDiameterRadius} class.
+ *
+ * @author Michele Borassi
+ */
+public class SumSweepDirectedDiameterRadiusTest {
+
+ @Test
+ public void testPath() {
+ final ImmutableGraph graph = new ArrayListMutableGraph(3,
+ new int[][] { { 0, 1 }, { 1, 2 }, { 2, 1 }, { 1, 0 } }).immutableView();
+
+ final SumSweepDirectedDiameterRadius ss = new SumSweepDirectedDiameterRadius(graph, OutputLevel.ALL, null,
+ new ProgressLogger());
+ ss.compute();
+
+ assertEquals(ss.getEccentricity(0, true), 2);
+ assertEquals(ss.getEccentricity(1, true), 1);
+ assertEquals(ss.getEccentricity(2, true), 2);
+ assertEquals(ss.getEccentricity(0, false), 2);
+ assertEquals(ss.getEccentricity(1, false), 1);
+ assertEquals(ss.getEccentricity(2, false), 2);
+ assertEquals(ss.getDiameter(), 2);
+ assertEquals(ss.getRadius(), 1);
+ assertEquals(ss.getRadialVertex(), 1);
+ assertTrue(ss.getDiametralVertex() == 2 || ss.getDiametralVertex() == 0);
+ }
+
+ @Test
+ public void testManySCC() {
+ final ImmutableGraph graph = new ArrayListMutableGraph(7,
+ new int[][] { { 0, 1 }, { 1, 0 }, { 1, 2 }, { 2, 1 }, { 6, 2 }, { 2, 6 }, { 3, 4 }, { 4, 3 }, { 4, 5 },
+ { 5, 4 }, { 0, 3 }, { 0, 4 }, { 1, 5 }, { 1, 4 }, { 2, 5 } }).immutableView();
+ final SumSweepDirectedDiameterRadius ss = new SumSweepDirectedDiameterRadius(graph, OutputLevel.RADIUS, null,
+ new ProgressLogger());
+
+ ss.compute();
+ assertEquals(ss.getRadius(), 2);
+ assertEquals(ss.getRadialVertex(), 1);
+ }
+
+ @Test
+ public void testLozenge() {
+ final ImmutableGraph graph = new ArrayListMutableGraph(4,
+ new int[][] { { 0, 1 }, { 1, 0 }, { 0, 2 }, { 1, 3 }, { 2, 3 } }).immutableView();
+ final SumSweepDirectedDiameterRadius ss = new SumSweepDirectedDiameterRadius(graph, OutputLevel.RADIUS, null,
+ new ProgressLogger());
+
+ ss.compute();
+ assertEquals(ss.getRadius(), 2);
+ assertTrue(ss.getRadialVertex() == 0 || ss.getRadialVertex() == 1);
+ assertTrue(ss.getEccentricity(ss.getRadialVertex(), true) == ss.getRadius());
+ }
+
+ @Test
+ public void testManyDirPath() {
+ final ImmutableGraph graph = new ArrayListMutableGraph(19,
+ new int[][] { { 0, 1 }, { 1, 2 }, { 2, 3 }, { 3, 4 }, { 5, 6 }, { 6, 7 }, { 7, 8 }, { 8, 9 }, { 9, 10 },
+ { 10, 18 }, { 11, 12 }, { 13, 14 }, { 14, 15 }, { 15, 16 }, { 16, 17 } }).immutableView();
+
+ boolean accRadial[] = new boolean[19];
+ accRadial[16] = true;
+ accRadial[8] = true;
+ final SumSweepDirectedDiameterRadius ss = new SumSweepDirectedDiameterRadius(graph, OutputLevel.ALL, accRadial,
+ new ProgressLogger());
+ ss.compute();
+ assertEquals(ss.getDiameter(), 6);
+ assertEquals(ss.getRadius(), 1);
+
+ assertTrue(ss.getRadialVertex() == 16);
+ assertTrue(ss.getDiametralVertex() == 5 || ss.getDiametralVertex() == 18);
+
+ }
+
+ @Test
+ public void testCycle() {
+ for (int size : new int[] { 3, 5, 7 }) {
+ final ImmutableGraph graph = ArrayListMutableGraph.newDirectedCycle(size).immutableView();
+ final SumSweepDirectedDiameterRadius ss = new SumSweepDirectedDiameterRadius(graph,
+ OutputLevel.RADIUS_DIAMETER, null, new ProgressLogger());
+ ss.compute();
+
+ assertEquals(ss.getDiameter(), size - 1);
+ assertEquals(ss.getRadius(), size - 1);
+
+ assertTrue(ss.getEccentricity(ss.getRadialVertex(), true) == ss.getRadius());
+ assertTrue(ss.getEccentricity(ss.getDiametralVertex(), true) == ss.getDiameter());
+ }
+ }
+
+ @Test
+ public void testClique() {
+ for (int size : new int[] { 10, 50, 100 }) {
+ final ImmutableGraph graph = ArrayListMutableGraph.newCompleteGraph(size, false).immutableView();
+ boolean accRadius[] = new boolean[graph.numNodes()];
+ accRadius[(int) (Math.random() * size)] = true;
+ accRadius[(int) (Math.random() * size)] = true;
+ accRadius[(int) (Math.random() * size)] = true;
+
+ final SumSweepDirectedDiameterRadius ss = new SumSweepDirectedDiameterRadius(graph, OutputLevel.ALL,
+ accRadius, new ProgressLogger());
+ ss.compute();
+
+ for (int i = 0; i < size; i++) {
+ assertEquals(ss.getEccentricity(i, true), 1);
+ }
+ assertTrue(accRadius[ss.getRadialVertex()]);
+ }
+ }
+
+ @Test
+ public void testSparse() {
+ // Tests the behavior when the centrality of only a few vertices
+ // is defined (in the extreme case, when the graph has no arcs).
+ final ImmutableGraph emptygraph = new ArrayListMutableGraph(100, new int[][] {}).immutableView();
+
+ final SumSweepDirectedDiameterRadius ss = new SumSweepDirectedDiameterRadius(emptygraph, OutputLevel.ALL, null,
+ new ProgressLogger());
+ ss.compute();
+ assertEquals(ss.getRadius(), 0);
+ assertEquals(ss.getDiameter(), 0);
+
+ final ImmutableGraph graphfewedges = new ArrayListMutableGraph(100,
+ new int[][] { { 10, 32 }, { 10, 65 }, { 65, 10 }, { 21, 44 } }).immutableView();
+ final SumSweepDirectedDiameterRadius ss1 = new SumSweepDirectedDiameterRadius(graphfewedges, OutputLevel.RADIUS,
+ null, null);
+ ss1.compute();
+ assertEquals(ss1.getRadius(), 1);
+ assertEquals(ss1.getRadialVertex(), 10);
+ }
+
+ public int[] computeAllEccentricities(ImmutableGraph g) {
+ int[] ecc = new int[g.numNodes()];
+ for (int v = 0; v < g.numNodes(); v++) {
+ ParallelBreadthFirstVisit bfs = new ParallelBreadthFirstVisit(g, v, true, null);
+ bfs.visit(v, -1);
+ ecc[v] = bfs.cutPoints.size() - 2;
+ }
+ return ecc;
+ }
+
+ @Test
+ public void testRandomSumSweepHeuristic() {
+ for (double p : new double[] { .1, .2, .5, .7 }) {
+ for (int size : new int[] { 10, 30, 50 }) {
+ final boolean[] accRadial = new boolean[size];
+ for (int i = 0; i < size / 2; i++) {
+ accRadial[(int) (Math.random() * size)] = true;
+ }
+ final ImmutableGraph graph = new ArrayListMutableGraph(new ErdosRenyiGraph(size, p, 0, false))
+ .immutableView();
+ final SumSweepDirectedDiameterRadius ss = new SumSweepDirectedDiameterRadius(graph, OutputLevel.ALL,
+ accRadial, new ProgressLogger());
+ ss.sumSweepHeuristic((int) (Math.random() * size), Math.min(10, (int) (2 + Math.random() * size)));
+
+ int[] ecc = computeAllEccentricities(graph);
+ int[] eccRev = computeAllEccentricities(Transform.transpose(graph));
+
+ for (int v = 0; v < size; v++) {
+ assertTrue(ss.lF[v] <= ecc[v]);
+ assertTrue(ss.lB[v] <= eccRev[v]);
+ assertTrue(ss.uF[v] >= ecc[v]);
+ assertTrue(ss.uB[v] >= eccRev[v]);
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testRandom() {
+ for (int t = 0; t < 100; t++) {
+ double p = Math.random();
+ if (p < 1.0E-12) {
+ continue;
+ }
+ int size = (int) (Math.random() * 50) + 2;
+ final boolean[] accRadial = new boolean[size];
+ for (int i = (int) (Math.random() * size + 1); i >= 0; i--) {
+ accRadial[(int) (Math.random() * size)] = true;
+ }
+ final ImmutableGraph graph = new ArrayListMutableGraph(new ErdosRenyiGraph(size, p, 0, false))
+ .immutableView();
+
+ OutputLevel output;
+ switch ((int) (Math.random() * 5)) {
+ case 0:
+ output = OutputLevel.RADIUS;
+ break;
+ case 1:
+ output = OutputLevel.DIAMETER;
+ break;
+ case 2:
+ output = OutputLevel.RADIUS_DIAMETER;
+ break;
+ case 3:
+ output = OutputLevel.ALL_FORWARD;
+ break;
+ default:
+ output = OutputLevel.ALL;
+ break;
+ }
+
+ final SumSweepDirectedDiameterRadius ss = new SumSweepDirectedDiameterRadius(graph, output, accRadial,
+ null);
+ ss.compute();
+ int[] ecc = computeAllEccentricities(graph);
+ int D = 0, R = size;
+ for (int v = 0; v < size; v++) {
+ D = Math.max(D, ecc[v]);
+ if (accRadial[v]) {
+ R = Math.min(R, ecc[v]);
+ }
+ }
+ if (output == OutputLevel.RADIUS || output == OutputLevel.RADIUS_DIAMETER || output == OutputLevel.ALL_FORWARD
+ || output == OutputLevel.ALL) {
+ assertEquals(ss.getRadius(), R);
+ assertEquals(ss.getEccentricity(ss.getRadialVertex(), true), R);
+ }
+ if (output == OutputLevel.DIAMETER || output == OutputLevel.RADIUS_DIAMETER || output == OutputLevel.ALL_FORWARD
+ || output == OutputLevel.ALL) {
+ assertEquals(ss.getDiameter(), D);
+ int maxEcc = -1;
+ try {
+ maxEcc = ss.getEccentricity(ss.getDiametralVertex(), true);
+ } catch (UnsupportedOperationException e) {
+ }
+ try {
+ maxEcc = Math.max(maxEcc, ss.getEccentricity(ss.getDiametralVertex(), false));
+ } catch (UnsupportedOperationException e) {
+ }
+ assertEquals(maxEcc, D);
+ }
+
+ if (output == OutputLevel.ALL_FORWARD) {
+ for (int v = 0; v < size; v++) {
+ assertEquals(ss.getEccentricity(v, true), ecc[v]);
+ }
+ }
+ }
+ }
+
+ @Test(expected = IllegalArgumentException.class)
+ public void testInvalidAccRadial() {
+ final ImmutableGraph g = new ArrayListMutableGraph(2, new int[][] { { 0, 1 } }).immutableView();
+ final boolean accRadial[] = new boolean[4];
+ @SuppressWarnings("unused")
+ final SumSweepDirectedDiameterRadius ss = new SumSweepDirectedDiameterRadius(g, OutputLevel.RADIUS, accRadial,
+ new ProgressLogger());
+ }
+
+ @Test(expected = IllegalArgumentException.class)
+ public void testInvalidAccRadial1() {
+ final ImmutableGraph g = new ArrayListMutableGraph(2, new int[][] { { 0, 1 } }).immutableView();
+ final boolean accRadial[] = new boolean[1];
+ @SuppressWarnings("unused")
+ final SumSweepDirectedDiameterRadius ss = new SumSweepDirectedDiameterRadius(g, OutputLevel.RADIUS_DIAMETER,
+ accRadial, new ProgressLogger());
+ }
+
+ @Test(expected = IllegalArgumentException.class)
+ public void testInvalidAccRadial2() {
+ final ImmutableGraph g = new ArrayListMutableGraph(2, new int[][] { { 0, 1 } }).immutableView();
+ final boolean accRadial[] = new boolean[3];
+ @SuppressWarnings("unused")
+ final SumSweepDirectedDiameterRadius ss = new SumSweepDirectedDiameterRadius(g, OutputLevel.ALL, accRadial,
+ new ProgressLogger());
+ }
+
+ @Test
+ public void testEmptyAccRadial() {
+ final ImmutableGraph g = new ArrayListMutableGraph(2, new int[][] { { 0, 1 } }).immutableView();
+ final boolean accRadial[] = new boolean[2];
+ // @SuppressWarnings("unused")
+ final SumSweepDirectedDiameterRadius ss = new SumSweepDirectedDiameterRadius(g, OutputLevel.ALL, accRadial,
+ new ProgressLogger());
+ ss.compute();
+ assertEquals(ss.getRadius(), Integer.MAX_VALUE);
+ }
+
+ @Test
+ public void testEmptyGraph() {
+ final ImmutableGraph g = new ArrayListMutableGraph(0, new int[][] {}).immutableView();
+ // @SuppressWarnings("unused")
+ final SumSweepDirectedDiameterRadius ss = new SumSweepDirectedDiameterRadius(g, OutputLevel.ALL, null,
+ new ProgressLogger());
+ ss.compute();
+ assertEquals(ss.getRadius(), Integer.MAX_VALUE);
+ assertEquals(ss.getDiameter(), 0);
+
+ final ImmutableGraph h = new ArrayListMutableGraph(2, new int[][] {}).immutableView();
+ // @SuppressWarnings("unused")
+ final SumSweepDirectedDiameterRadius ssh = new SumSweepDirectedDiameterRadius(h, OutputLevel.ALL, null,
+ new ProgressLogger());
+ ssh.compute();
+ assertEquals(ssh.getRadius(), 0);
+ assertEquals(ssh.getDiameter(), 0);
+ }
+
+ @Test(expected = UnsupportedOperationException.class)
+ public void testNotComputedR() {
+ final ImmutableGraph g = new ArrayListMutableGraph(3, new int[][] { { 0, 1 }, { 1, 2 } }).immutableView();
+ final SumSweepDirectedDiameterRadius ss = new SumSweepDirectedDiameterRadius(g, OutputLevel.RADIUS, null,
+ new ProgressLogger());
+ ss.sumSweepHeuristic(0, 1);
+ ss.getRadius();
+ }
+
+ @Test(expected = UnsupportedOperationException.class)
+ public void testNotComputedD() {
+ final ImmutableGraph g = new ArrayListMutableGraph(3, new int[][] { { 0, 1 }, { 1, 2 } }).immutableView();
+ final SumSweepDirectedDiameterRadius ss = new SumSweepDirectedDiameterRadius(g, OutputLevel.RADIUS, null,
+ new ProgressLogger());
+ ss.sumSweepHeuristic(0, 1);
+ ss.getDiameter();
+ }
+
+ @Test(expected = UnsupportedOperationException.class)
+ public void testNotComputedEcc() {
+ final ImmutableGraph g = new ArrayListMutableGraph(3, new int[][] { { 0, 1 }, { 1, 2 } }).immutableView();
+ final SumSweepDirectedDiameterRadius ss = new SumSweepDirectedDiameterRadius(g, OutputLevel.RADIUS, null,
+ new ProgressLogger());
+ ss.getEccentricity(2, true);
+ }
+}
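The directed tests above compare `SumSweepDirectedDiameterRadius` against a brute-force oracle (`computeAllEccentricities`) that runs one breadth-first visit per vertex. The following is a minimal, self-contained sketch of such an oracle in plain Java. It assumes the graph is given as successor arrays rather than a WebGraph `ImmutableGraph`, and the class and method names are hypothetical. As in `testRandom`, the diameter is the largest forward eccentricity over all vertices, and the radius is the smallest forward eccentricity over the vertices flagged in `accRadial`. When no vertex is flagged, the sketch returns `Integer.MAX_VALUE`, which mirrors what `testEmptyAccRadial` expects from the library.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/** Hypothetical brute-force oracle: one BFS per vertex on a directed graph. */
class DirectedEccentricityOracle {

    /** Forward eccentricity of every vertex: the largest BFS distance to a reachable vertex. */
    static int[] forwardEccentricities(int[][] succ) {
        final int n = succ.length;
        final int[] ecc = new int[n];
        final int[] dist = new int[n];
        final ArrayDeque<Integer> queue = new ArrayDeque<>();
        for (int s = 0; s < n; s++) {
            Arrays.fill(dist, -1); // -1 marks vertices not reached yet
            dist[s] = 0;
            queue.add(s);
            int max = 0;
            while (!queue.isEmpty()) {
                final int u = queue.poll();
                max = Math.max(max, dist[u]);
                for (int v : succ[u]) {
                    if (dist[v] == -1) {
                        dist[v] = dist[u] + 1;
                        queue.add(v);
                    }
                }
            }
            ecc[s] = max; // vertices unreachable from s do not contribute
        }
        return ecc;
    }

    /** Largest eccentricity; 0 for the empty graph. */
    static int diameter(int[] ecc) {
        int d = 0;
        for (int e : ecc) d = Math.max(d, e);
        return d;
    }

    /** Smallest eccentricity among accepted vertices; Integer.MAX_VALUE if none is accepted. */
    static int radius(int[] ecc, boolean[] accRadial) {
        int r = Integer.MAX_VALUE;
        for (int v = 0; v < ecc.length; v++) if (accRadial[v]) r = Math.min(r, ecc[v]);
        return r;
    }

    public static void main(String[] args) {
        // Directed path 0 -> 1 -> 2.
        final int[] ecc = forwardEccentricities(new int[][] { { 1 }, { 2 }, {} });
        System.out.println(Arrays.toString(ecc));                             // [2, 1, 0]
        System.out.println(diameter(ecc));                                    // 2
        System.out.println(radius(ecc, new boolean[] { true, true, false })); // 1
        System.out.println(radius(ecc, new boolean[3]));                      // 2147483647
    }
}
```

Computing the eccentricity of every vertex this way costs one BFS per vertex, which is exactly the cost that the SumSweep approach tries to avoid; here it only serves as a reference.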
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/SumSweepUndirectedDiameterRadiusTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/SumSweepUndirectedDiameterRadiusTest.java
new file mode 100644
index 0000000..0a6e639
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/SumSweepUndirectedDiameterRadiusTest.java
@@ -0,0 +1,271 @@
+package it.unimi.dsi.webgraph.algo;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.Transform;
+import it.unimi.dsi.webgraph.algo.SumSweepUndirectedDiameterRadius.OutputLevel;
+import it.unimi.dsi.webgraph.examples.ErdosRenyiGraph;
+
+import org.junit.Test;
+
+//RELEASE-STATUS: DIST
+
+/**
+ * This class tests the {@link SumSweepUndirectedDiameterRadius} class.
+ *
+ * @author Michele Borassi
+ */
+public class SumSweepUndirectedDiameterRadiusTest {
+
+ @Test
+ public void testPath() {
+ final ImmutableGraph graph = new ArrayListMutableGraph(3,
+ new int[][] { { 0, 1 }, { 1, 2 }, { 2, 1 }, { 1, 0 } }).immutableView();
+
+ final SumSweepUndirectedDiameterRadius ss = new SumSweepUndirectedDiameterRadius(graph, OutputLevel.ALL,
+ new ProgressLogger());
+ ss.compute();
+
+ assertEquals(ss.getEccentricity(0), 2);
+ assertEquals(ss.getEccentricity(1), 1);
+ assertEquals(ss.getEccentricity(2), 2);
+ assertEquals(ss.getDiameter(), 2);
+ assertEquals(ss.getRadius(), 1);
+ assertEquals(ss.getRadialVertex(), 1);
+ assertTrue(ss.getDiametralVertex() == 2 || ss.getDiametralVertex() == 0);
+ }
+
+ @Test
+ public void testStar() {
+ final ImmutableGraph graph = Transform.symmetrize(new ArrayListMutableGraph(9,
+ new int[][] { { 0, 1 }, { 1, 2 }, { 0, 3 }, { 3, 4 }, { 0, 5 }, { 5, 6 }, { 0, 7 }, { 7, 8 } })
+ .immutableView());
+
+ final SumSweepUndirectedDiameterRadius ss = new SumSweepUndirectedDiameterRadius(graph, OutputLevel.ALL,
+ new ProgressLogger());
+ ss.compute();
+
+ assertEquals(ss.getEccentricity(0), 2);
+ assertEquals(ss.getEccentricity(1), 3);
+ assertEquals(ss.getEccentricity(2), 4);
+ assertEquals(ss.getEccentricity(3), 3);
+ assertEquals(ss.getEccentricity(4), 4);
+ assertEquals(ss.getEccentricity(5), 3);
+ assertEquals(ss.getEccentricity(6), 4);
+ assertEquals(ss.getEccentricity(7), 3);
+ assertEquals(ss.getEccentricity(8), 4);
+
+ assertEquals(ss.getDiameter(), 4);
+ assertEquals(ss.getRadius(), 2);
+ assertEquals(ss.getRadialVertex(), 0);
+ }
+
+ @Test
+ public void testLozenge() {
+ final ImmutableGraph graph = Transform.symmetrize(
+ new ArrayListMutableGraph(4, new int[][] { { 0, 1 }, { 1, 0 }, { 0, 2 }, { 1, 3 }, { 2, 3 } })
+ .immutableView());
+
+ final SumSweepUndirectedDiameterRadius ss = new SumSweepUndirectedDiameterRadius(graph, OutputLevel.RADIUS,
+ new ProgressLogger());
+
+ ss.compute();
+ assertEquals(ss.getRadius(), 2);
+ assertTrue(ss.getEccentricity(ss.getRadialVertex()) == ss.getRadius());
+ }
+
+ @Test
+ public void testCycle() {
+ for (int size : new int[] { 3, 5, 7 }) {
+ final ImmutableGraph graph = ArrayListMutableGraph.newBidirectionalCycle(size).immutableView();
+ final SumSweepUndirectedDiameterRadius ss = new SumSweepUndirectedDiameterRadius(graph,
+ OutputLevel.RADIUSDIAMETER, new ProgressLogger());
+ ss.compute();
+
+ assertEquals(ss.getDiameter(), size / 2);
+ assertEquals(ss.getRadius(), size / 2);
+
+ assertTrue(ss.getEccentricity(ss.getRadialVertex()) == ss.getRadius());
+ assertTrue(ss.getEccentricity(ss.getDiametralVertex()) == ss.getDiameter());
+ }
+ }
+
+ @Test
+ public void testClique() {
+ for (int size : new int[] { 10, 50, 100 }) {
+ final ImmutableGraph graph = ArrayListMutableGraph.newCompleteGraph(size, false).immutableView();
+
+ final SumSweepUndirectedDiameterRadius ss = new SumSweepUndirectedDiameterRadius(graph, OutputLevel.ALL,
+ new ProgressLogger());
+ ss.compute();
+
+ for (int i = 0; i < size; i++) {
+ assertEquals(ss.getEccentricity(i), 1);
+ }
+ assertEquals(ss.getDiameter(), 1);
+ assertEquals(ss.getRadius(), 1);
+ }
+ }
+
+ @Test
+ public void testSparse() {
+ final ImmutableGraph emptygraph = new ArrayListMutableGraph(100, new int[][] {}).immutableView();
+
+ final SumSweepUndirectedDiameterRadius ss = new SumSweepUndirectedDiameterRadius(emptygraph, OutputLevel.ALL,
+ null);
+ ss.compute();
+ assertEquals(ss.getRadius(), 0);
+ assertEquals(ss.getDiameter(), 0);
+
+ final ImmutableGraph graphfewedges = Transform.symmetrize(
+ new ArrayListMutableGraph(100, new int[][] { { 10, 32 }, { 10, 65 }, { 65, 10 }, { 21, 44 } })
+ .immutableView());
+ final SumSweepUndirectedDiameterRadius ss1 = new SumSweepUndirectedDiameterRadius(graphfewedges,
+ OutputLevel.RADIUS, null);
+ ss1.compute();
+ assertEquals(ss1.getRadius(), 0);
+ }
+
+ public int[] computeAllEccentricities(ImmutableGraph g) {
+ int[] ecc = new int[g.numNodes()];
+ for (int v = 0; v < g.numNodes(); v++) {
+ ParallelBreadthFirstVisit bfs = new ParallelBreadthFirstVisit(g, v, true, null);
+ bfs.visit(v, -1);
+ ecc[v] = bfs.cutPoints.size() - 2;
+ }
+ return ecc;
+ }
+
+ @Test
+ public void testRandomSumSweepHeuristic() {
+ for (double p : new double[] { .1, .2, .5, .7 }) {
+ for (int size : new int[] { 10, 30, 50 }) {
+ final ImmutableGraph graph = Transform
+ .symmetrize(new ArrayListMutableGraph(new ErdosRenyiGraph(size, p, 0, false)).immutableView());
+ final SumSweepUndirectedDiameterRadius ss = new SumSweepUndirectedDiameterRadius(graph, OutputLevel.ALL,
+ new ProgressLogger());
+ ss.sumSweepHeuristic((int) (Math.random() * size), Math.min(10, (int) (2 + Math.random() * size)));
+
+ int[] ecc = computeAllEccentricities(graph);
+
+ for (int v = 0; v < size; v++) {
+ assertTrue(ss.l[v] <= ecc[v]);
+ assertTrue(ss.u[v] >= ecc[v]);
+ assertTrue(ss.ecc[v] == -1 || ss.ecc[v] == ecc[v]);
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testRandom() {
+ for (int t = 0; t < 100; t++) {
+ double p = Math.random();
+ if (p < 1.0E-12) {
+ continue;
+ }
+ int size = (int) (Math.random() * 50) + 2;
+
+ final ImmutableGraph graph = Transform
+ .symmetrize(new ArrayListMutableGraph(new ErdosRenyiGraph(size, p, 0, false)).immutableView());
+
+ OutputLevel output;
+ switch ((int) (Math.random() * 4)) {
+ case 0:
+ output = OutputLevel.RADIUS;
+ break;
+ case 1:
+ output = OutputLevel.DIAMETER;
+ break;
+ case 2:
+ output = OutputLevel.RADIUSDIAMETER;
+ break;
+ default:
+ output = OutputLevel.ALL;
+ break;
+ }
+
+ final SumSweepUndirectedDiameterRadius ss = new SumSweepUndirectedDiameterRadius(graph, output, null);
+ ss.compute();
+ int[] ecc = computeAllEccentricities(graph);
+ int D = 0, R = size;
+ for (int v = 0; v < size; v++) {
+ D = Math.max(D, ecc[v]);
+ R = Math.min(R, ecc[v]);
+ }
+ if (output == OutputLevel.RADIUS || output == OutputLevel.RADIUSDIAMETER || output == OutputLevel.ALL) {
+ assertEquals(ss.getRadius(), R);
+ assertEquals(ss.ecc[ss.getRadialVertex()], R);
+ }
+ if (output == OutputLevel.DIAMETER || output == OutputLevel.RADIUSDIAMETER || output == OutputLevel.ALL) {
+ assertEquals(ss.getDiameter(), D);
+ assertTrue(ss.ecc[ss.getDiametralVertex()] == D);
+ }
+
+ if (output == OutputLevel.ALL) {
+ for (int v = 0; v < size; v++) {
+ assertEquals(ss.ecc[v], ecc[v]);
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testEmptyGraph() {
+ final ImmutableGraph g = new ArrayListMutableGraph(0, new int[][] {}).immutableView();
+ // @SuppressWarnings("unused")
+ final SumSweepUndirectedDiameterRadius ss = new SumSweepUndirectedDiameterRadius(g, OutputLevel.ALL,
+ new ProgressLogger());
+ ss.compute();
+ assertEquals(ss.getRadius(), Integer.MAX_VALUE);
+ assertEquals(ss.getDiameter(), 0);
+
+ final ImmutableGraph h = new ArrayListMutableGraph(2, new int[][] {}).immutableView();
+ // @SuppressWarnings("unused")
+ final SumSweepUndirectedDiameterRadius ssh = new SumSweepUndirectedDiameterRadius(h, OutputLevel.ALL,
+ new ProgressLogger());
+ ssh.compute();
+ assertEquals(ssh.getRadius(), 0);
+ assertEquals(ssh.getDiameter(), 0);
+ }
+
+ @Test(expected = IllegalArgumentException.class)
+ public void testNonSymmetricGraph() {
+ final ImmutableGraph g = new ArrayListMutableGraph(2, new int[][] { { 0, 1 } }).immutableView();
+ @SuppressWarnings("unused")
+ final SumSweepUndirectedDiameterRadius ss = new SumSweepUndirectedDiameterRadius(g, OutputLevel.RADIUS,
+ new ProgressLogger());
+ }
+
+ @Test(expected = UnsupportedOperationException.class)
+ public void testNotComputedR() {
+ final ImmutableGraph g = Transform
+ .symmetrize(new ArrayListMutableGraph(3, new int[][] { { 0, 1 }, { 1, 2 } }).immutableView());
+ final SumSweepUndirectedDiameterRadius ss = new SumSweepUndirectedDiameterRadius(g, OutputLevel.RADIUS,
+ new ProgressLogger());
+ ss.sumSweepHeuristic(0, 1);
+ ss.getRadius();
+ }
+
+ @Test(expected = UnsupportedOperationException.class)
+ public void testNotComputedD() {
+ final ImmutableGraph g = Transform
+ .symmetrize(new ArrayListMutableGraph(3, new int[][] { { 0, 1 }, { 1, 2 } }).immutableView());
+ final SumSweepUndirectedDiameterRadius ss = new SumSweepUndirectedDiameterRadius(g, OutputLevel.RADIUS,
+ new ProgressLogger());
+ ss.sumSweepHeuristic(0, 1);
+ ss.getDiameter();
+ }
+
+ @Test(expected = UnsupportedOperationException.class)
+ public void testNotComputedEcc() {
+ final ImmutableGraph g = Transform
+ .symmetrize(new ArrayListMutableGraph(3, new int[][] { { 0, 1 }, { 1, 2 } }).immutableView());
+ final SumSweepUndirectedDiameterRadius ss = new SumSweepUndirectedDiameterRadius(g, OutputLevel.RADIUS,
+ new ProgressLogger());
+ ss.getEccentricity(2);
+ }
+}
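`testRandomSumSweepHeuristic` only checks that the arrays `l` and `u` bracket the true eccentricities. One standard way to obtain such brackets uses a single BFS from a vertex `x` in an undirected graph. Every vertex `v` in the same connected component satisfies `d(x, v) <= ecc(v) <= d(x, v) + ecc(x)`, and the upper bound follows from the triangle inequality. The sketch below is an illustration of that idea and is not claimed to be the library's actual update rule. The class name and the adjacency-array representation are assumptions made for this sketch, and the example graph is the symmetrized star of `testStar`.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/** Hypothetical illustration of eccentricity bounds obtained from one BFS in an undirected graph. */
class EccentricityBounds {

    /** BFS distances from source; -1 for vertices in other components. */
    static int[] bfs(int[][] adj, int source) {
        final int[] dist = new int[adj.length];
        Arrays.fill(dist, -1);
        dist[source] = 0;
        final ArrayDeque<Integer> queue = new ArrayDeque<>();
        queue.add(source);
        while (!queue.isEmpty()) {
            final int u = queue.poll();
            for (int v : adj[u]) {
                if (dist[v] == -1) {
                    dist[v] = dist[u] + 1;
                    queue.add(v);
                }
            }
        }
        return dist;
    }

    /** Exact eccentricity by a BFS from v (used here only to check the bounds). */
    static int eccentricity(int[][] adj, int v) {
        int max = 0;
        for (int d : bfs(adj, v)) max = Math.max(max, d);
        return max;
    }

    /** Returns {lower, upper}; entries stay -1 for vertices not reachable from x. */
    static int[][] bounds(int[][] adj, int x) {
        final int[] d = bfs(adj, x);
        final int eccX = eccentricity(adj, x);
        final int[] lower = new int[adj.length], upper = new int[adj.length];
        Arrays.fill(lower, -1);
        Arrays.fill(upper, -1);
        for (int v = 0; v < adj.length; v++) {
            if (d[v] >= 0) {
                lower[v] = d[v];        // x itself lies at distance d(x, v) from v
                upper[v] = d[v] + eccX; // triangle inequality through x
            }
        }
        return new int[][] { lower, upper };
    }

    public static void main(String[] args) {
        // Symmetrized star of testStar: centre 0 and four arms of length 2.
        final int[][] adj = { { 1, 3, 5, 7 }, { 0, 2 }, { 1 }, { 0, 4 }, { 3 }, { 0, 6 }, { 5 }, { 0, 8 }, { 7 } };
        final int[][] b = bounds(adj, 0);
        for (int v = 0; v < adj.length; v++) {
            System.out.println(v + ": " + b[0][v] + " <= " + eccentricity(adj, v) + " <= " + b[1][v]);
        }
    }
}
```

On this star, a BFS from the centre already makes the upper bound tight for every vertex, which is the kind of effect that lets SumSweep-style algorithms skip most visits.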
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/TopKGeometricCentralityTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/TopKGeometricCentralityTest.java
new file mode 100644
index 0000000..f531e8d
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/algo/TopKGeometricCentralityTest.java
@@ -0,0 +1,243 @@
+package it.unimi.dsi.webgraph.algo;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.algo.TopKGeometricCentrality.Centrality;
+import it.unimi.dsi.webgraph.examples.ErdosRenyiGraph;
+
+import java.util.Arrays;
+
+import org.apache.commons.lang3.ArrayUtils;
+import org.junit.Test;
+
+//RELEASE-STATUS: DIST
+
+/**
+ * This class tests the {@link TopKGeometricCentrality} class.
+ *
+ * @author Michele Borassi
+ */
+public class TopKGeometricCentralityTest {
+
+ // The list of centralities to be tested.
+ private final Centrality c[] = { Centrality.LIN, Centrality.HARMONIC, Centrality.EXPONENTIAL };
+
+ @Test
+ public void testPath() {
+ final ImmutableGraph graph = new ArrayListMutableGraph(3, new int[][] { { 0, 1 }, { 1, 2 }, { 2, 1 }, { 1, 0 } }).immutableView();
+
+ final TopKGeometricCentrality cc1 = TopKGeometricCentrality.newLinCentrality(graph, 1, 0);
+ cc1.compute();
+ assertEquals(1, cc1.topK.length);
+ assertEquals(3.0 * 3.0 / 2, cc1.centrality[1], 1E-13);
+ assertEquals(cc1.topK[0], 1);
+
+ final TopKGeometricCentrality cc2 = TopKGeometricCentrality.newLinCentrality(graph, 3, 0);
+
+ cc2.compute();
+ assertEquals(3, cc2.topK.length);
+ int v = cc2.topK[0];
+ assertTrue(v == 1);
+ v = cc2.topK[1];
+ int w = cc2.topK[2];
+ assertTrue(v == 0 || v == 2);
+ assertTrue(w == 0 || w == 2);
+ assertTrue(v != w);
+ assertEquals(3.0 * 3.0 / 2, cc2.centrality[1], 1E-13);
+ assertEquals(3.0 * 3.0 / 3, cc2.centrality[0], 1E-13);
+ assertEquals(3.0 * 3.0 / 3, cc2.centrality[2], 1E-13);
+ }
+
+ @Test
+ public void testLozenge() {
+ final ImmutableGraph graph = new ArrayListMutableGraph(4, new int[][] { { 0, 1 }, { 0, 2 }, { 1, 3 }, { 2, 3 } }).immutableView();
+ final TopKGeometricCentrality cc = TopKGeometricCentrality.newHarmonicCentrality(graph, 3, 0);
+
+ cc.compute();
+ assertEquals(3, cc.topK.length);
+ int v = cc.topK[0];
+ assertTrue(v == 0);
+ v = cc.topK[1];
+ int w = cc.topK[2];
+ assertTrue(v == 1 || v == 2);
+ assertTrue(w == 1 || w == 2);
+ assertTrue(v != w);
+ assertEquals(2.5, cc.centrality[0], 1E-13);
+ assertEquals(1.0, cc.centrality[1], 1E-13);
+ assertEquals(1.0, cc.centrality[2], 1E-13);
+ }
+
+ @Test
+ public void testLozengeModified() {
+ final ImmutableGraph graph = new ArrayListMutableGraph(5, new int[][] { { 0, 1 }, { 0, 2 }, { 1, 3 }, { 2, 3 }, { 0, 4 }, { 4, 0 } }).immutableView();
+ final TopKGeometricCentrality cc = TopKGeometricCentrality.newExponentialCentrality(graph, 2, 0.5, 0);
+
+ cc.compute();
+ assertEquals(2, cc.topK.length);
+ int v = cc.topK[1];
+ assertEquals(v, 4);
+ v = cc.topK[0];
+ assertEquals(v, 0);
+ assertEquals((3 * 0.5 + 0.5 * 0.5), cc.centrality[0], 1E-13);
+ assertEquals((0.5 + 2 * 0.5 * 0.5 + 0.5 * 0.5 * 0.5), cc.centrality[4], 1E-13);
+ }
+
+ @Test
+ public void testCycle() {
+ // In this test, we also check the behavior if k is bigger than the
+ // number of nodes.
+ for (int size : new int[] { 3, 5, 7 }) {
+ int k = 5;
+ final ImmutableGraph graph = ArrayListMutableGraph.newDirectedCycle(size).immutableView();
+ final TopKGeometricCentrality cc = TopKGeometricCentrality.newExponentialCentrality(graph, k, 0.5, 0);
+ cc.compute();
+
+ final double expected = (1 - Math.pow(0.5, size - 1));
+ int nFound = 0;
+ for (int i = 0; i < size; i++) {
+ if (Math.abs(expected - cc.centrality[i]) < 1E-12) {
+ nFound++;
+ }
+ }
+ assertTrue(nFound >= Math.min(size, k));
+ }
+ }
+
+ @Test
+ public void testClique() {
+ // In this test, we also check the behavior if k is bigger than the
+ // number of nodes.
+ for (int size : new int[] { 10, 50, 100 }) {
+ int k = 30;
+ final ImmutableGraph graph = ArrayListMutableGraph.newCompleteGraph(size, false).immutableView();
+ final TopKGeometricCentrality cc = TopKGeometricCentrality.newLinCentrality(graph, k, 0);
+ cc.compute();
+ double expected = size * size / (size-1.0);
+ int nFound = 0;
+ for (int i = 0; i < size; i++) {
+ if (Math.abs(expected - cc.centrality[i]) < 1E-12) {
+ nFound++;
+ }
+ }
+ assertTrue(nFound >= Math.min(size, k));
+ }
+ }
+
+ @Test
+ public void testSparse() {
+ // Tests the behavior when only a few vertices have a defined
+ // centrality (in the extreme case, when the graph has no arcs).
+ final ImmutableGraph emptygraph = new ArrayListMutableGraph(100, new int[][] {}).immutableView();
+
+ for (Centrality curC : c) {
+ final TopKGeometricCentrality cc = new TopKGeometricCentrality(emptygraph, 1, curC, 0, 0.5, new ProgressLogger());
+ cc.compute();
+ assertEquals(cc.topK.length, 1);
+ }
+ final ImmutableGraph graphfewedges = new ArrayListMutableGraph(100, new int[][] { { 10, 32 }, { 21, 44 } }).immutableView();
+ for (Centrality curC : c) {
+ final TopKGeometricCentrality cc = new TopKGeometricCentrality(graphfewedges, 30, curC, 0, 0.5, new ProgressLogger());
+ cc.compute();
+ assertEquals(cc.topK.length, 30);
+ int v = cc.topK[0], w = cc.topK[1];
+ assertTrue(v == 10 || v == 21);
+ assertTrue(w == 10 || w == 21);
+ assertTrue(v != w);
+ }
+ }
+
+ @Test
+ public void testRandom() throws InterruptedException {
+ for (double p : new double[] { .1, .2, .5 }) {
+ for (int size : new int[] { 10, 50, 100, 500 }) {
+ for (int k : new int[] { 5, 10, 30, 100 }) {
+ final ImmutableGraph graph = new ArrayListMutableGraph(new ErdosRenyiGraph(size, p, 0, false)).immutableView();
+ final GeometricCentralities ccExhaustive = new GeometricCentralities(graph);
+ ccExhaustive.compute();
+ for (Centrality cCur : c) {
+ final TopKGeometricCentrality cc = new TopKGeometricCentrality(graph, k, cCur, 0, 0.5, new ProgressLogger());
+ cc.compute();
+ cc.checkReachLU();
+ double[] centrExhaustive;
+ switch (cCur) {
+ case LIN:
+ centrExhaustive = Arrays.copyOf(ccExhaustive.lin, ccExhaustive.lin.length);
+ break;
+ case HARMONIC:
+ centrExhaustive = Arrays.copyOf(ccExhaustive.harmonic, ccExhaustive.harmonic.length);
+ break;
+ case EXPONENTIAL:
+ centrExhaustive = Arrays.copyOf(ccExhaustive.exponential, ccExhaustive.exponential.length);
+ break;
+ default:
+ centrExhaustive = new double[size];
+ break;
+ }
+
+ for (int v = 0; v < graph.numNodes(); v++) {
+ if (cc.centrality[v] != -1) {
+ assertEquals(centrExhaustive[v], cc.centrality[v], 1E-12);
+ }
+ }
+ Arrays.sort(centrExhaustive);
+ ArrayUtils.reverse(centrExhaustive);
+
+ for (int i = 0; i < cc.topK.length; i++) {
+ assertEquals(centrExhaustive[i], cc.centrality[cc.topK[i]], 1E-12);
+ }
+ assertEquals(cc.topK.length, Math.min(k, size));
+ }
+ }
+ }
+ }
+ }
+
+ @Test(expected = IllegalArgumentException.class)
+ public void testInvalidK() {
+ final ImmutableGraph g = new ArrayListMutableGraph(2, new int[][] { { 0, 1 } }).immutableView();
+ @SuppressWarnings("unused")
+ final TopKGeometricCentrality cc = TopKGeometricCentrality.newLinCentrality(g, -1, 0);
+ }
+
+ @Test(expected = IllegalArgumentException.class)
+ public void testInvalidNThreads() {
+ final ImmutableGraph g = new ArrayListMutableGraph(2, new int[][] { { 0, 1 } }).immutableView();
+ @SuppressWarnings("unused")
+ final TopKGeometricCentrality cc = TopKGeometricCentrality.newHarmonicCentrality(g, 1, -1);
+ }
+
+ @Test(expected = IllegalArgumentException.class)
+ public void testInvalidAlpha() {
+ final ImmutableGraph g = new ArrayListMutableGraph(2, new int[][] { { 0, 1 } }).immutableView();
+ final TopKGeometricCentrality cc = TopKGeometricCentrality.newExponentialCentrality(g, 1, 1, 0);
+ cc.compute();
+ }
+
+ @Test(expected = IllegalArgumentException.class)
+ public void testInvalidAlpha1() {
+ final ImmutableGraph g = new ArrayListMutableGraph(2, new int[][] { { 0, 1 } }).immutableView();
+ final TopKGeometricCentrality cc = TopKGeometricCentrality.newExponentialCentrality(g, 1, 0, 0);
+ cc.compute();
+ }
+
+ @Test(expected = IllegalArgumentException.class)
+ public void testInvalidAlpha2() {
+ final ImmutableGraph g = new ArrayListMutableGraph(2, new int[][] { { 0, 1 } }).immutableView();
+ final TopKGeometricCentrality cc = TopKGeometricCentrality.newExponentialCentrality(g, 1, -1.0, 0);
+ cc.compute();
+ }
+
+ @Test(expected = IllegalArgumentException.class)
+ public void testInvalidAlpha3() {
+ final ImmutableGraph g = new ArrayListMutableGraph(2, new int[][] { { 0, 1 } }).immutableView();
+ final TopKGeometricCentrality cc = TopKGeometricCentrality.newExponentialCentrality(g, 1, 5, 0);
+ cc.compute();
+ }
+
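+ @Test
+ public void testNoPL() {
+ final ImmutableGraph g = new ArrayListMutableGraph(2, new int[][] { { 0, 1 } }).immutableView();
+ // A null ProgressLogger must be accepted; alpha = 0.5 is used because alpha = 5 is rejected (see testInvalidAlpha3).
+ final TopKGeometricCentrality cc = new TopKGeometricCentrality(g, 10, Centrality.EXPONENTIAL, 1, 0.5, null);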
+ cc.compute();
+ }
+}
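The values hard-coded in `testPath`, `testLozenge` and `testLozengeModified` can be checked by hand with a BFS from each vertex along outgoing arcs. Let `r` be the number of vertices reachable from `u` (including `u`) and `d(u, v)` the distance. Lin centrality is then `r^2 / sum d(u, v)`, harmonic centrality is `sum 1 / d(u, v)`, and exponential centrality is `sum alpha^d(u, v)`, where the sums range over the reachable `v != u`. The sketch below reproduces those numbers. The class name is hypothetical, and giving Lin centrality the value 1 for a vertex that reaches nothing else is a convention assumed here, not one taken from the tests.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/** Hypothetical brute-force Lin, harmonic and exponential centralities (distances along outgoing arcs). */
class GeometricCentralitySketch {

    /** BFS distances from source along outgoing arcs; -1 for unreachable vertices. */
    static int[] bfs(int[][] succ, int source) {
        final int[] dist = new int[succ.length];
        Arrays.fill(dist, -1);
        dist[source] = 0;
        final ArrayDeque<Integer> queue = new ArrayDeque<>();
        queue.add(source);
        while (!queue.isEmpty()) {
            final int u = queue.poll();
            for (int v : succ[u]) {
                if (dist[v] == -1) {
                    dist[v] = dist[u] + 1;
                    queue.add(v);
                }
            }
        }
        return dist;
    }

    static double lin(int[][] succ, int u) {
        long reached = 0, sum = 0;
        for (int d : bfs(succ, u)) {
            if (d >= 0) { reached++; sum += d; } // u itself counts as reached, at distance 0
        }
        return sum == 0 ? 1 : (double) reached * reached / sum; // assumed convention when u reaches nothing
    }

    static double harmonic(int[][] succ, int u) {
        double h = 0;
        for (int d : bfs(succ, u)) if (d > 0) h += 1.0 / d;
        return h;
    }

    static double exponential(int[][] succ, int u, double alpha) {
        double e = 0;
        for (int d : bfs(succ, u)) if (d > 0) e += Math.pow(alpha, d);
        return e;
    }

    public static void main(String[] args) {
        final int[][] path = { { 1 }, { 0, 2 }, { 1 } };                          // testPath
        final int[][] lozenge = { { 1, 2 }, { 3 }, { 3 }, {} };                     // testLozenge
        final int[][] lozengeModified = { { 1, 2, 4 }, { 3 }, { 3 }, {}, { 0 } };   // testLozengeModified
        System.out.println(lin(path, 1));                         // 4.5
        System.out.println(lin(path, 0));                         // 3.0
        System.out.println(harmonic(lozenge, 0));                 // 2.5
        System.out.println(exponential(lozengeModified, 0, 0.5)); // 1.75
        System.out.println(exponential(lozengeModified, 4, 0.5)); // 1.125
    }
}
```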
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/examples/ErdosRenyiGraphTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/examples/ErdosRenyiGraphTest.java
new file mode 100644
index 0000000..16155db
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/examples/ErdosRenyiGraphTest.java
@@ -0,0 +1,64 @@
+package it.unimi.dsi.webgraph.examples;
+
+/*
+ * Copyright (C) 2003-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.NodeIterator;
+
+import org.junit.Test;
+
+
+
+public class ErdosRenyiGraphTest {
+
+ @Test
+ public void test() {
+ ImmutableGraph graph = new ErdosRenyiGraph(10000, 1000000, 0, false);
+ long arcs = 0;
+ for(NodeIterator nodeIterator = graph.nodeIterator(); nodeIterator.hasNext();) {
+ final int curr = nodeIterator.nextInt();
+ final int outdegree = nodeIterator.outdegree();
+ arcs += outdegree;
+ final int[] s = nodeIterator.successorArray();
+ if (outdegree != 0) assertTrue("Node " + curr, s[0] != curr);
+ for(int i = 1; i < outdegree; i++) {
+ assertTrue(s[i] > s[i - 1]);
+ assertTrue(s[i] != curr);
+ }
+ }
+
+ assertEquals((1000000.0 - arcs) / 1000000.0, 0, 1E-2);
+ }
+
+ @Test
+ public void testBinomialWithoutLoops() {
+ ImmutableGraph g = new ErdosRenyiGraph(5, .5, 0, false);
+ new ArrayListMutableGraph(g).immutableView();
+ }
+
+ @Test
+ public void testCopy() {
+ ImmutableGraph graph = new ErdosRenyiGraph(10000, 1000000, 0, false);
+ assertEquals(graph, graph.copy());
+ assertEquals(graph.copy(), graph);
+ }
+}
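`ErdosRenyiGraphTest.test` checks three properties of a random graph without loops. No vertex is its own successor, successor lists are strictly increasing, and the number of arcs is close to the requested one. The following sketch of the classic G(n, p) model shows why all three hold. In this model, each ordered pair `(u, v)` with `u != v` becomes an arc independently with probability `p`, so the expected number of arcs is `p * n * (n - 1)`. This is an illustration with a hypothetical class name, not WebGraph's `ErdosRenyiGraph` implementation, and it uses the probability form rather than the arc-count constructor exercised by the test.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical G(n, p) generator for directed graphs without loops. */
class GnpSketch {

    /** Successor lists of a G(n, p) directed graph without loops, drawn from a seeded PRNG. */
    static int[][] generate(int n, double p, long seed) {
        final Random random = new Random(seed);
        final int[][] succ = new int[n][];
        final int[] buffer = new int[n];
        for (int u = 0; u < n; u++) {
            int d = 0;
            for (int v = 0; v < n; v++) {
                // Scanning v in increasing order yields sorted successor lists; v == u is skipped (no loops).
                if (v != u && random.nextDouble() < p) buffer[d++] = v;
            }
            succ[u] = Arrays.copyOf(buffer, d);
        }
        return succ;
    }

    public static void main(String[] args) {
        final int n = 200;
        final double p = 0.1;
        long arcs = 0;
        for (int[] s : generate(n, p, 0)) arcs += s.length;
        System.out.println("arcs = " + arcs + ", expected about " + p * n * (n - 1));
    }
}
```

Scanning all `n^2` pairs is fine for small tests; generators meant for large sparse graphs usually skip ahead by geometrically distributed gaps instead.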
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/examples/IntegerListImmutableGraphTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/examples/IntegerListImmutableGraphTest.java
new file mode 100644
index 0000000..7e7a85c
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/examples/IntegerListImmutableGraphTest.java
@@ -0,0 +1,64 @@
+package it.unimi.dsi.webgraph.examples;
+
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.LazyIntIterator;
+import it.unimi.dsi.webgraph.NodeIterator;
+import it.unimi.dsi.webgraph.WebGraphTestCase;
+
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+
+import org.junit.Test;
+
+/*
+ * Copyright (C) 2010-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+
+public class IntegerListImmutableGraphTest extends WebGraphTestCase {
+
+ @Test
+ public void test() throws IOException {
+ for (int size: new int[] { 5, 10, 100 })
+ for (double p: new double[] { .1, .3, .5, .9 }) {
+
+ ErdosRenyiGraph eg = new ErdosRenyiGraph(size, p, true);
+ final File file = File.createTempFile(IntegerListImmutableGraphTest.class.getSimpleName(), "test");
+ file.deleteOnExit();
+ String filename = file.getAbsolutePath();
+ try (DataOutputStream dos = new DataOutputStream(new FileOutputStream(filename))) {
+ dos.writeInt(eg.numNodes());
+ NodeIterator nodeIterator = eg.nodeIterator();
+ while (nodeIterator.hasNext()) {
+ nodeIterator.nextInt();
+ dos.writeInt(nodeIterator.outdegree());
+ LazyIntIterator successors = nodeIterator.successors();
+ for (;;) {
+ int succ = successors.nextInt();
+ if (succ < 0) break;
+ dos.writeInt(succ);
+ }
+ }
+ }
+
+ ImmutableGraph graph = IntegerListImmutableGraph.loadOffline(filename);
+ assertGraph(graph);
+ }
+ }
+
+}
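`IntegerListImmutableGraphTest` writes the graph in a simple binary layout before loading it with `IntegerListImmutableGraph.loadOffline`. The file holds the number of nodes, then, for each node in order, its outdegree followed by its successors. Every value is a 32-bit big-endian integer, as produced by `DataOutputStream.writeInt`. The sketch below writes and re-reads that layout in memory with standard `java.io` streams. The layout is inferred from the test's writer loop, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

/** Hypothetical round trip of the integer-list layout written by the test. */
class IntegerListFormatSketch {

    /** Serializes successor lists as: numNodes, then (outdegree, successors...) for each node. */
    static byte[] write(int[][] succ) throws IOException {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(bytes)) {
            dos.writeInt(succ.length);
            for (int[] s : succ) {
                dos.writeInt(s.length);
                for (int v : s) dos.writeInt(v);
            }
        }
        return bytes.toByteArray();
    }

    /** Reads back the layout produced by write(). */
    static int[][] read(byte[] data) throws IOException {
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(data))) {
            final int[][] succ = new int[dis.readInt()][];
            for (int u = 0; u < succ.length; u++) {
                succ[u] = new int[dis.readInt()];
                for (int i = 0; i < succ[u].length; i++) succ[u][i] = dis.readInt();
            }
            return succ;
        }
    }

    public static void main(String[] args) throws IOException {
        final int[][] succ = { { 1, 2 }, {}, { 0 } };
        final byte[] data = write(succ);
        System.out.println(data.length);                         // 28: seven 4-byte integers
        System.out.println(Arrays.deepEquals(succ, read(data))); // true
    }
}
```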
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/examples/IntegerTriplesArcLabelledImmutableGraphTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/examples/IntegerTriplesArcLabelledImmutableGraphTest.java
new file mode 100644
index 0000000..008d2f5
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/examples/IntegerTriplesArcLabelledImmutableGraphTest.java
@@ -0,0 +1,48 @@
+package it.unimi.dsi.webgraph.examples;
+
+/*
+ * Copyright (C) 2010-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.WebGraphTestCase;
+
+import org.junit.Test;
+
+public class IntegerTriplesArcLabelledImmutableGraphTest extends WebGraphTestCase {
+
+ @Test
+ public void testEmpty() {
+ ImmutableGraph g = new IntegerTriplesArcLabelledImmutableGraph(new int[][] {});
+
+ assertGraph(g);
+ }
+
+ @Test
+ public void testCycle() {
+ ImmutableGraph g = new IntegerTriplesArcLabelledImmutableGraph(new int[][] {
+ { 0, 1, 2 },
+ { 1, 2, 0 },
+ { 2, 0, 1 },
+
+ });
+
+ assertGraph(g);
+ }
+
+}
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/jung/JungAdapterTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/jung/JungAdapterTest.java
new file mode 100644
index 0000000..27822ff
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/jung/JungAdapterTest.java
@@ -0,0 +1,212 @@
+package it.unimi.dsi.webgraph.jung;
+
+/*
+ * Copyright (C) 2003-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.Transform;
+
+import java.util.Collection;
+
+import org.junit.Test;
+
+import edu.uci.ics.jung.graph.util.EdgeType;
+
+
+
+public class JungAdapterTest {
+
+ @Test
+ public void testSmall() {
+ final ImmutableGraph g = new ArrayListMutableGraph(3, new int[][] { { 0, 1 }, { 1, 1 }, { 1, 2 } }).immutableView();
+ final JungAdapter j = new JungAdapter(g, Transform.transpose(g));
+
+ final Integer zero = Integer.valueOf(0);
+ final Integer one = Integer.valueOf(1);
+ final Integer two = Integer.valueOf(2);
+ for(Long e: j.getOutEdges(zero)) {
+ assertEquals(zero, j.getSource(e));
+ assertEquals(one, j.getDest(e));
+ }
+ for(Long e: j.getInEdges(two)) {
+ assertEquals(one, j.getSource(e));
+ assertEquals(two, j.getDest(e));
+ }
+
+ assertEquals(one, j.getOpposite(zero, j.getOutEdges(zero).iterator().next()));
+ }
+
+ @Test
+ public void test() {
+ ImmutableGraph graph = new ArrayListMutableGraph(5, new int[][] {
+ { 0, 1 },
+ { 1, 0 },
+ { 1, 2 },
+ { 2, 2 },
+ { 2, 3 },
+ { 3, 0 }
+ }).immutableView();
+ ImmutableGraph transpose = Transform.transpose(graph);
+ JungAdapter jungAdapter = new JungAdapter(graph, transpose);
+
+ // Test number of nodes and arcs
+ assertEquals(graph.numNodes(), jungAdapter.getVertexCount());
+ assertEquals(graph.numArcs(), jungAdapter.getEdgeCount());
+ assertEquals(EdgeType.DIRECTED, jungAdapter.getDefaultEdgeType());
+ assertEquals(graph.numArcs(), jungAdapter.getEdgeCount(EdgeType.DIRECTED));
+ assertEquals(0, jungAdapter.getEdgeCount(EdgeType.UNDIRECTED));
+ assertEquals(0, jungAdapter.getEdges(EdgeType.UNDIRECTED).size());
+
+ // Test vertices
+ Collection<Integer> vertices = jungAdapter.getVertices();
+ assertEquals(graph.numNodes(), vertices.size());
+ for (int i = 0; i < graph.numNodes(); i++) assertTrue(vertices.contains(Integer.valueOf(i)));
+
+ // Test presence / absence of all arcs and nodes
+ Collection<Long> edges = jungAdapter.getEdges();
+ assertEquals(graph.numArcs(), edges.size());
+ assertEquals(edges, jungAdapter.getEdges(EdgeType.DIRECTED));
+ for (int source = 0; source < graph.numNodes(); source++) {
+ Integer sourceVertex = Integer.valueOf(source);
+ assertTrue(jungAdapter.containsVertex(sourceVertex));
+ boolean pOut[] = new boolean[graph.numNodes()];
+ boolean pIn[] = new boolean[graph.numNodes()];
+ int outDeg = graph.outdegree(source);
+ int succ[] = graph.successorArray(source);
+ int inDeg = transpose.outdegree(source);
+ int pred[] = transpose.successorArray(source);
+ for (int i = 0; i < outDeg; i++) pOut[succ[i]] = true;
+ for (int i = 0; i < inDeg; i++) pIn[pred[i]] = true;
+ Collection<Long> incidentEdges = jungAdapter.getIncidentEdges(sourceVertex);
+ Collection<Integer> neighbors = jungAdapter.getNeighbors(sourceVertex);
+ assertEquals(neighbors.size(), jungAdapter.getNeighborCount(sourceVertex));
+
+ // Test outedges
+ Collection<Long> outEdges = jungAdapter.getOutEdges(sourceVertex);
+ Collection<Integer> successors = jungAdapter.getSuccessors(sourceVertex);
+ assertEquals(outDeg, outEdges.size());
+ assertEquals(outDeg, successors.size());
+ int countOut = outDeg;
+ for (Long e: outEdges) {
+ assertEquals(sourceVertex, jungAdapter.getSource(e));
+ int dest = jungAdapter.getDest(e).intValue();
+ assertTrue(pOut[dest]);
+ assertTrue(successors.contains(Integer.valueOf(dest)));
+ successors.remove(Integer.valueOf(dest));
+ assertTrue(incidentEdges.contains(e));
+ assertTrue(neighbors.contains(Integer.valueOf(dest)));
+ if (source != dest) incidentEdges.remove(e);
+ neighbors.remove(Integer.valueOf(dest));
+ countOut--;
+ }
+ assertEquals(0, countOut);
+
+ // Test inedges
+ Collection<Long> inEdges = jungAdapter.getInEdges(sourceVertex);
+ Collection<Integer> predecessors = jungAdapter.getPredecessors(sourceVertex);
+ assertEquals(inDeg, inEdges.size());
+ assertEquals(inDeg, predecessors.size());
+ int countIn = inDeg;
+ for (Long e: inEdges) {
+ assertEquals(sourceVertex, jungAdapter.getDest(e));
+ int src = jungAdapter.getSource(e).intValue();
+ assertTrue(pIn[src]);
+ assertTrue(predecessors.contains(Integer.valueOf(src)));
+ predecessors.remove(Integer.valueOf(src));
+ assertTrue(incidentEdges.contains(e));
+ incidentEdges.remove(e);
+ if (! pOut[src]) {
+ assertTrue(neighbors.contains(Integer.valueOf(src))); // Guarded: if source->src were also an arc, src would already have been removed from neighbors by the out-edge loop above
+ neighbors.remove(Integer.valueOf(src));
+ }
+ countIn--;
+ }
+ assertEquals(0, countIn);
+
+ assertEquals(0, incidentEdges.size());
+ assertEquals(0, neighbors.size());
+ if (pOut[source]) assertEquals(inDeg + outDeg - 1, jungAdapter.degree(Integer.valueOf(source)));
+ else assertEquals(inDeg + outDeg, jungAdapter.degree(Integer.valueOf(source)));
+
+
+ assertEquals(outDeg, jungAdapter.getSuccessorCount(sourceVertex));
+ assertEquals(outDeg, jungAdapter.outDegree(sourceVertex));
+ assertEquals(transpose.outdegree(source), jungAdapter.getPredecessorCount(sourceVertex));
+ assertEquals(transpose.outdegree(source), jungAdapter.inDegree(sourceVertex));
+
+ // Test contains edge
+ for (int target = 0; target < graph.numNodes(); target++) {
+ Long edge = Long.valueOf((long)source << 32 | target);
+ Integer targetVertex = Integer.valueOf(target);
+ if (pOut[target]) {
+ assertTrue(jungAdapter.isPredecessor(sourceVertex, targetVertex));
+ assertTrue(jungAdapter.isSuccessor(targetVertex, sourceVertex));
+ assertTrue(jungAdapter.containsEdge(edge));
+ assertEquals(EdgeType.DIRECTED, jungAdapter.getEdgeType(edge));
+ Long foundEdge = jungAdapter.findEdge(Integer.valueOf(source), Integer.valueOf(target));
+ assertEquals(edge, foundEdge);
+ assertEquals(sourceVertex, jungAdapter.getEndpoints(edge).getFirst());
+ assertEquals(targetVertex, jungAdapter.getEndpoints(edge).getSecond());
+ assertEquals(targetVertex, jungAdapter.getOpposite(sourceVertex, edge));
+ assertEquals(sourceVertex, jungAdapter.getOpposite(targetVertex, edge));
+ assertTrue(jungAdapter.isSource(sourceVertex, edge));
+ assertTrue(! jungAdapter.isSource(Integer.valueOf(source + 1), edge));
+ assertTrue(jungAdapter.isDest(targetVertex, edge));
+ assertTrue(! jungAdapter.isDest(Integer.valueOf(target + 1), edge));
+ assertEquals(sourceVertex, jungAdapter.getSource(edge));
+ assertEquals(targetVertex, jungAdapter.getDest(edge));
+ Collection<Long> edgeSet = jungAdapter.findEdgeSet(sourceVertex, targetVertex);
+ assertEquals(1, edgeSet.size());
+ assertEquals(edge, edgeSet.iterator().next());
+ assertTrue(edges.contains(edge));
+ edges.remove(edge);
+ Collection<Integer> incidentVertices = jungAdapter.getIncidentVertices(edge);
+ if (source != target) {
+ assertEquals(2, jungAdapter.getIncidentCount(edge));
+ assertEquals(2, incidentVertices.size());
+ }
+ else {
+ assertEquals(1, jungAdapter.getIncidentCount(edge));
+ assertEquals(1, incidentVertices.size());
+ }
+ assertTrue(incidentVertices.contains(sourceVertex));
+ assertTrue(incidentVertices.contains(targetVertex));
+ assertTrue(jungAdapter.isIncident(sourceVertex, edge));
+ assertTrue(jungAdapter.isIncident(targetVertex, edge));
+ assertTrue(jungAdapter.isNeighbor(sourceVertex, targetVertex));
+ assertTrue(jungAdapter.isNeighbor(targetVertex, sourceVertex));
+ }
+ else {
+ assertTrue(! jungAdapter.containsEdge(edge));
+ assertEquals(null, jungAdapter.findEdge(Integer.valueOf(source), Integer.valueOf(target)));
+ if (! pIn[target]) { // Means that source->target and target->source both fail to exist
+ assertTrue(! jungAdapter.isNeighbor(sourceVertex, targetVertex));
+ assertTrue(! jungAdapter.isNeighbor(targetVertex, sourceVertex));
+ }
+ }
+ }
+ }
+ assertTrue(! jungAdapter.containsVertex(Integer.valueOf(-1)));
+ assertTrue(! jungAdapter.containsVertex(Integer.valueOf(graph.numNodes() + 1)));
+ }
+
+
+}
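In the contains-edge loop of `JungAdapterTest`, the expected edge is built as `(long)source << 32 | target`, and `findEdge`, `getSource` and `getDest` are checked against it. This indicates that `JungAdapter` identifies an arc by a single `long` with the source in the high 32 bits and the target in the low 32 bits. The sketch below packs and unpacks such identifiers under that assumption; the class `EdgeIds` is hypothetical and not part of WebGraph or JUNG. It also shows why the cast must come before the shift.

```java
/** Hypothetical helper (not part of WebGraph/JUNG) mirroring the edge encoding
 *  used in JungAdapterTest: arc (source, target) -> (long) source << 32 | target. */
public final class EdgeIds {
    private EdgeIds() {}

    /** Packs two non-negative node ids into one long. The cast must precede the
     *  shift: an int shifted by 32 is unchanged, because Java masks the distance to 5 bits. */
    public static long pack(final int source, final int target) {
        return (long) source << 32 | target;
    }

    /** High 32 bits: the source node. */
    public static int source(final long edge) {
        return (int) (edge >>> 32);
    }

    /** Low 32 bits: the target node. */
    public static int target(final long edge) {
        return (int) edge;
    }

    public static void main(final String[] args) {
        final long e = pack(1, 2);
        System.out.println(e);                           // 4294967298
        System.out.println(source(e) + " " + target(e)); // 1 2
        // Without the cast, 1 << 32 == 1, so the source bits collide with the target:
        System.out.println(1 << 32 | 2);                 // 3
    }
}
```

Packing and unpacking this way is only safe for non-negative ids: a negative `target` would be sign-extended by `|` and overwrite the source bits. WebGraph node numbers range from 0 to `numNodes() - 1`, so the encoding holds for them.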
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/labelling/BitStreamArcLabelledGraphTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/labelling/BitStreamArcLabelledGraphTest.java
new file mode 100644
index 0000000..ff505b3
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/labelling/BitStreamArcLabelledGraphTest.java
@@ -0,0 +1,361 @@
+package it.unimi.dsi.webgraph.labelling;
+
+/*
+ * Copyright (C) 2003-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.fastutil.ints.IntIterator;
+import it.unimi.dsi.io.OutputBitStream;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.BVGraph;
+import it.unimi.dsi.webgraph.BVGraphTest;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.LazyIntIterator;
+import it.unimi.dsi.webgraph.LazyIntIterators;
+import it.unimi.dsi.webgraph.NodeIterator;
+import it.unimi.dsi.webgraph.Transform;
+import it.unimi.dsi.webgraph.WebGraphTestCase;
+import it.unimi.dsi.webgraph.examples.IntegerTriplesArcLabelledImmutableGraph;
+
+import java.io.File;
+import java.io.FileNotFoundException;
+import java.io.FileWriter;
+import java.io.IOException;
+import java.io.PrintWriter;
+
+import org.junit.Test;
+
+public class BitStreamArcLabelledGraphTest {
+
+ private static final int[] SIZES = { 0, 1, 2, 3, 4, 7 };
+ private static final int MAX_WIDTH_FOR_FIXED = 32;
+ private static final int[] WIDTHS = { -1, 0, 1, 2, 3, 8, 32, 40, 41, 63 };
+ private static final int[] BATCH_SIZES = { 1, 2, 4, 5, 16 };
+
+ public static File storeTempGraph(final ArcLabelledImmutableGraph g) throws IOException, IllegalArgumentException, SecurityException {
+ File basename = File.createTempFile(BitStreamArcLabelledGraphTest.class.getSimpleName(), "test");
+ BitStreamArcLabelledImmutableGraph.store(g, basename.toString(), basename.toString() + "-underlying");
+ BVGraph.store(g, basename.toString() + "-underlying");
+ return basename;
+ }
+
+ private static OutputBitStream createTempBitStream(final String name) throws FileNotFoundException {
+ File f = new File(name);
+ f.deleteOnExit();
+ return new OutputBitStream(f.getAbsolutePath());
+ }
+
+ public String createGraphWithFixedWidthLabels(File basename, ImmutableGraph g, int width) throws IllegalArgumentException, SecurityException, IOException {
+ final int n = g.numNodes();
+ System.err.println("Testing " + n + " nodes, width " + width+ ", basename " + basename);
+
+ OutputBitStream labels = createTempBitStream(basename + "-fixedlabel" + BitStreamArcLabelledImmutableGraph.LABELS_EXTENSION);
+ OutputBitStream offsets = createTempBitStream(basename + "-fixedlabel" + BitStreamArcLabelledImmutableGraph.LABEL_OFFSETS_EXTENSION);
+ offsets.writeGamma(0);
+ for(int i = 0; i < n; i++) {
+ int bits = 0;
+ for(IntIterator j = LazyIntIterators.eager(g.successors(i)); j.hasNext();) bits += labels.writeInt(i * j.nextInt() + i, width);
+ offsets.writeGamma(bits);
+ }
+ labels.close();
+ offsets.close();
+
+ PrintWriter pw = new PrintWriter(new FileWriter(basename + "-fixedlabel.properties"));
+ pw.println(ImmutableGraph.GRAPHCLASS_PROPERTY_KEY + " = " + BitStreamArcLabelledImmutableGraph.class.getName());
+ pw.println(BitStreamArcLabelledImmutableGraph.LABELSPEC_PROPERTY_KEY + " = " + FixedWidthIntLabel.class.getName() + "(TEST," + width + ")");
+ pw.println(ArcLabelledImmutableGraph.UNDERLYINGGRAPH_PROPERTY_KEY + " = " + basename.getName());
+ pw.close();
+
+ return basename + "-fixedlabel";
+ }
+
+ public String createGraphWithFixedWidthListLabels(File basename, ImmutableGraph g, int width) throws IllegalArgumentException, SecurityException, IOException {
+ final int n = g.numNodes();
+ System.err.println("Testing " + n + " nodes, element width " + width+ ", basename " + basename);
+
+ OutputBitStream labels = createTempBitStream(basename + "-fixedlistlabel" + BitStreamArcLabelledImmutableGraph.LABELS_EXTENSION);
+ OutputBitStream offsets = createTempBitStream(basename + "-fixedlistlabel" + BitStreamArcLabelledImmutableGraph.LABEL_OFFSETS_EXTENSION);
+ offsets.writeGamma(0);
+ for(int i = 0; i < n; i++) {
+ int bits = 0;
+ for(IntIterator j = LazyIntIterators.eager(g.successors(i)); j.hasNext();) {
+ int succ = j.nextInt();
+ bits += labels.writeGamma((succ + 1) * 2); // list length
+ for(int k = 0; k < (succ + 1) * 2 ; k++) bits += labels.writeInt(i * k + i, width);
+ }
+ offsets.writeGamma(bits);
+ }
+ labels.close();
+ offsets.close();
+
+ PrintWriter pw = new PrintWriter(new FileWriter(basename + "-fixedlistlabel" + ImmutableGraph.PROPERTIES_EXTENSION));
+ pw.println(ImmutableGraph.GRAPHCLASS_PROPERTY_KEY + " = " + BitStreamArcLabelledImmutableGraph.class.getName());
+ pw.println(BitStreamArcLabelledImmutableGraph.LABELSPEC_PROPERTY_KEY + " = " + FixedWidthIntListLabel.class.getName() + "(TEST," + width + ")");
+ pw.println(ArcLabelledImmutableGraph.UNDERLYINGGRAPH_PROPERTY_KEY + " = " + basename.getName());
+ pw.close();
+
+ return basename + "-fixedlistlabel";
+ }
+
+ public String createGraphWithGammaLabels(File basename, ImmutableGraph g) throws IllegalArgumentException, SecurityException, IOException {
+ // Label each arc (i, j) of g with the gamma-coded value i * j + i
+ final int n = g.numNodes();
+ System.err.println("Testing " + n + " nodes, gamma coding, basename " + basename);
+
+ OutputBitStream labels = createTempBitStream(basename + "-gammalabel" + BitStreamArcLabelledImmutableGraph.LABELS_EXTENSION);
+ OutputBitStream offsets = createTempBitStream(basename + "-gammalabel" + BitStreamArcLabelledImmutableGraph.LABEL_OFFSETS_EXTENSION);
+ offsets.writeGamma(0);
+ for(int i = 0; i < n; i++) {
+ int bits = 0;
+ for(IntIterator j = LazyIntIterators.eager(g.successors(i)); j.hasNext();) bits += labels.writeGamma(i * j.nextInt() + i);
+ offsets.writeGamma(bits);
+ }
+ labels.close();
+ offsets.close();
+
+ PrintWriter pw = new PrintWriter(new FileWriter(basename + "-gammalabel" + ImmutableGraph.PROPERTIES_EXTENSION));
+ pw.println(ImmutableGraph.GRAPHCLASS_PROPERTY_KEY + " = " + BitStreamArcLabelledImmutableGraph.class.getName());
+ pw.println(BitStreamArcLabelledImmutableGraph.LABELSPEC_PROPERTY_KEY + " = " + GammaCodedIntLabel.class.getName() + "(TEST)");
+ pw.println(ArcLabelledImmutableGraph.UNDERLYINGGRAPH_PROPERTY_KEY + " = " + basename.getName());
+ pw.close();
+
+ return basename + "-gammalabel";
+ }
+
+ public void testLabels(ArcLabelledImmutableGraph alg, final int width) {
+
+ final int mask = (int)(width == MAX_WIDTH_FOR_FIXED ? -1 : (1L << width) - 1);
+
+ // Sequential access, iterators
+ for(ArcLabelledNodeIterator nodeIterator = alg.nodeIterator(); nodeIterator.hasNext();) {
+ int curr = nodeIterator.nextInt();
+ ArcLabelledNodeIterator.LabelledArcIterator l = nodeIterator.successors();
+ int d = nodeIterator.outdegree();
+ while(d-- != 0) {
+ int succ = l.nextInt();
+ if (l.label() instanceof AbstractIntLabel)
+ assertEquals(curr + " -> " + succ,(curr * succ + curr) & mask, l.label().getInt());
+ else {
+ int[] value = (int[]) l.label().get();
+ assertEquals((succ + 1) * 2, value.length);
+ for(int i = 0; i < value.length; i++) assertEquals("Successor of index " + i + " of " + curr + "(" + succ + ")", (curr * i + curr) & mask, value[i]);
+ }
+ }
+ }
+
+ // Sequential access, arrays
+ for(ArcLabelledNodeIterator nodeIterator = alg.nodeIterator(); nodeIterator.hasNext();) {
+ int curr = nodeIterator.nextInt();
+ int d = nodeIterator.outdegree();
+ int succ[] = nodeIterator.successorArray();
+ Label[] label = nodeIterator.labelArray();
+ for(int i = 0; i < d; i++) {
+ if (label[i] instanceof AbstractIntLabel)
+ assertEquals(curr + " -> " + succ[i], (curr * succ[i] + curr) & mask, label[i].getInt());
+ else {
+ int[] value = (int[]) label[i].get();
+ assertEquals((succ[i] + 1) * 2, value.length);
+ for(int j = 0; j < value.length; j++) assertEquals((curr * j + curr) & mask, value[j]);
+ }
+ }
+ }
+
+ if (! alg.randomAccess()) return;
+
+ // Random access, iterators
+ for(int curr = 0; curr < alg.numNodes(); curr++) {
+ ArcLabelledNodeIterator.LabelledArcIterator l = alg.successors(curr);
+ int d = alg.outdegree(curr);
+ while(d-- != 0) {
+ int succ = l.nextInt();
+ if (l.label() instanceof AbstractIntLabel)
+ assertEquals(curr + " -> " + succ ,(curr * succ + curr) & mask, l.label().getInt());
+ else {
+ int[] value = (int[]) l.label().get();
+ assertEquals((succ + 1) * 2, value.length);
+ for(int i = 0; i < value.length; i++) assertEquals((curr * i + curr) & mask, value[i]);
+ }
+ }
+ }
+
+ // Random access, arrays
+ for(int curr = 0; curr < alg.numNodes(); curr++) {
+ int d = alg.outdegree(curr);
+ int succ[] = alg.successorArray(curr);
+ Label[] label = alg.labelArray(curr);
+ for(int i = 0; i < d; i++) {
+ if (label[i] instanceof AbstractIntLabel)
+ assertEquals(curr + " -> " + succ[i], (curr * succ[i] + curr) & mask, label[i].getInt());
+ else {
+ int[] value = (int[]) label[i].get();
+ assertEquals((succ[i] + 1) * 2, value.length);
+ for(int j = 0; j < value.length; j++) assertEquals((curr * j + curr) & mask, value[j]);
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testLabels() throws IOException, IllegalArgumentException, SecurityException {
+ for(int n: SIZES) {
+ for(int type = 0; type < 3; type++) {
+ System.err.println("Testing type " + type + "...");
+ final ImmutableGraph g = type == 0 ? ArrayListMutableGraph.newCompleteGraph(n, false).immutableView() :
+ type == 1 ? ArrayListMutableGraph.newCompleteBinaryIntree(n).immutableView() :
+ ArrayListMutableGraph.newCompleteBinaryOuttree(n).immutableView();
+ final File basename = BVGraphTest.storeTempGraph(g);
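+ // -1 means gamma coding; widths below MAX_WIDTH_FOR_FIXED give fixed-width labels; larger widths give lists of (width - MAX_WIDTH_FOR_FIXED)-bit elements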
+ for(int width: WIDTHS) {
+ final String basenameLabel = width == -1 ?
+ createGraphWithGammaLabels(basename, g) :
+ width < MAX_WIDTH_FOR_FIXED ? createGraphWithFixedWidthLabels(basename, g, width) :
+ createGraphWithFixedWidthListLabels(basename, g, width - MAX_WIDTH_FOR_FIXED);
+
+ System.err.println("Testing offline...");
+ testLabels(BitStreamArcLabelledImmutableGraph.loadOffline(basenameLabel), width % MAX_WIDTH_FOR_FIXED);
+ testLabels(BitStreamArcLabelledImmutableGraph.load(basenameLabel), width % MAX_WIDTH_FOR_FIXED);
+ WebGraphTestCase.assertGraph(BitStreamArcLabelledImmutableGraph.loadOffline(basenameLabel));
+
+ new File(basenameLabel + ImmutableGraph.PROPERTIES_EXTENSION).delete();
+ new File(basenameLabel + BitStreamArcLabelledImmutableGraph.LABELS_EXTENSION).delete();
+ new File(basenameLabel + BitStreamArcLabelledImmutableGraph.LABEL_OFFSETS_EXTENSION).delete();
+ }
+ basename.delete();
+ BVGraphTest.deleteGraph(basename);
+ }
+ }
+ }
+
+ // Proceeds with the same test as before, but with a graph obtained as a union
+ @Test
+ public void testUnion() throws IllegalArgumentException, SecurityException, IOException {
+ for(int n: SIZES) {
+ for(int type = 0; type < 3; type++) {
+ System.err.println("Testing arc-labelled union type " + type + "...");
+ final ImmutableGraph g = type == 0 ? ArrayListMutableGraph.newCompleteGraph(n, false).immutableView() :
+ type == 1 ? ArrayListMutableGraph.newCompleteBinaryIntree(n).immutableView() :
+ ArrayListMutableGraph.newCompleteBinaryOuttree(n).immutableView();
+
+ // Now split the graph g into two (possibly non-disjoint) graphs
+ ArrayListMutableGraph g0mut = new ArrayListMutableGraph();
+ ArrayListMutableGraph g1mut = new ArrayListMutableGraph();
+ g0mut.addNodes(g.numNodes()); g1mut.addNodes(g.numNodes());
+ NodeIterator nit = g.nodeIterator();
+ while (nit.hasNext()) {
+ int from = nit.nextInt();
+ LazyIntIterator succ = nit.successors();
+ int d = nit.outdegree();
+ while (d-- != 0) {
+ int to = succ.nextInt();
+ if (Math.random() < .5) g0mut.addArc(from, to);
+ else if (Math.random() < .5) g1mut.addArc(from, to);
+ else { g0mut.addArc(from, to); g1mut.addArc(from, to); }
+ }
+ }
+ ImmutableGraph g0 = g0mut.immutableView();
+ ImmutableGraph g1 = g1mut.immutableView();
+
+ final File basename0 = BVGraphTest.storeTempGraph(g0);
+ final File basename1 = BVGraphTest.storeTempGraph(g1);
+ // -1 means gamma coding
+ for(int width: WIDTHS) {
+ final String basenameLabel0 = width == -1 ?
+ createGraphWithGammaLabels(basename0, g0) :
+ width < MAX_WIDTH_FOR_FIXED ? createGraphWithFixedWidthLabels(basename0, g0, width) :
+ createGraphWithFixedWidthListLabels(basename0, g0, width - MAX_WIDTH_FOR_FIXED);
+ final String basenameLabel1 = width == -1 ?
+ createGraphWithGammaLabels(basename1, g1) :
+ width < MAX_WIDTH_FOR_FIXED ? createGraphWithFixedWidthLabels(basename1, g1, width) :
+ createGraphWithFixedWidthListLabels(basename1, g1, width - MAX_WIDTH_FOR_FIXED);
+
+
+ System.err.println("Testing arc-labelled union offline...");
+ testLabels((ArcLabelledImmutableGraph) Transform.union(BitStreamArcLabelledImmutableGraph.loadOffline(basenameLabel0), BitStreamArcLabelledImmutableGraph.loadOffline(basenameLabel1)), width % MAX_WIDTH_FOR_FIXED);
+ testLabels((ArcLabelledImmutableGraph) Transform.union(BitStreamArcLabelledImmutableGraph.load(basenameLabel0), BitStreamArcLabelledImmutableGraph.load(basenameLabel1)), width % MAX_WIDTH_FOR_FIXED);
+
+ WebGraphTestCase.assertGraph(Transform.union(BitStreamArcLabelledImmutableGraph.loadOffline(basenameLabel0), BitStreamArcLabelledImmutableGraph.loadOffline(basenameLabel1)));
+
+ new File(basenameLabel0 + ImmutableGraph.PROPERTIES_EXTENSION).delete();
+ new File(basenameLabel0 + BitStreamArcLabelledImmutableGraph.LABELS_EXTENSION).delete();
+ new File(basenameLabel0 + BitStreamArcLabelledImmutableGraph.LABEL_OFFSETS_EXTENSION).delete();
+ new File(basenameLabel1 + ImmutableGraph.PROPERTIES_EXTENSION).delete();
+ new File(basenameLabel1 + BitStreamArcLabelledImmutableGraph.LABELS_EXTENSION).delete();
+ new File(basenameLabel1 + BitStreamArcLabelledImmutableGraph.LABEL_OFFSETS_EXTENSION).delete();
+ }
+ basename0.delete();
+ BVGraphTest.deleteGraph(basename0);
+ basename1.delete();
+ BVGraphTest.deleteGraph(basename1);
+ }
+ }
+ }
+
+ @Test
+ public void moreTestUnion() {
+ IntegerTriplesArcLabelledImmutableGraph integerTriplesArcLabelledImmutableGraph0 = new IntegerTriplesArcLabelledImmutableGraph(new int[][]
+ {
+ { 0, 1, 302 }, { 0, 2, 401 }, { 1, 3, 201 }
+ });
+
+ IntegerTriplesArcLabelledImmutableGraph integerTriplesArcLabelledImmutableGraph1 = new IntegerTriplesArcLabelledImmutableGraph(new int[][]
+ {
+ { 0, 5, 302 }, { 0, 2, 401 }, { 1, 2, 201 }
+ });
+
+ ImmutableGraph union = Transform.union(integerTriplesArcLabelledImmutableGraph0, integerTriplesArcLabelledImmutableGraph1);
+ WebGraphTestCase.assertGraph(union);
+ }
+
+ @Test
+ public void testTransposition() throws IOException, IllegalArgumentException, SecurityException {
+ for(int n: new int[] {7}) {
+ for(int type = 0; type < 3; type++) {
+ System.err.println("Testing arc-labelled transposition type " + type + "...");
+ final ImmutableGraph g = type == 0 ? ArrayListMutableGraph.newCompleteGraph(n, false).immutableView() :
+ type == 1 ? ArrayListMutableGraph.newCompleteBinaryIntree(n).immutableView() :
+ ArrayListMutableGraph.newCompleteBinaryOuttree(n).immutableView();
+ final File basename = BVGraphTest.storeTempGraph(g);
+ // -1 means gamma coding
+ for(int width: WIDTHS) {
+ final String basenameLabel;
+
+ if (width == -1) basenameLabel = createGraphWithGammaLabels(basename, g);
+ else if (width < MAX_WIDTH_FOR_FIXED) basenameLabel = createGraphWithFixedWidthLabels(basename, g, width);
+ else basenameLabel = createGraphWithFixedWidthListLabels(basename, g, width - MAX_WIDTH_FOR_FIXED);
+
+ for (int batchSize: BATCH_SIZES) {
+ ArcLabelledImmutableGraph gt = Transform.transposeOffline(BitStreamArcLabelledImmutableGraph.loadOffline(basenameLabel),
+ batchSize, new File(System.getProperty("java.io.tmpdir")), null);
+
+ ArcLabelledImmutableGraph gtt = Transform.transposeOffline(gt,
+ batchSize, new File(System.getProperty("java.io.tmpdir")), null);
+ System.err.println("Testing with batch size " + batchSize + "...");
+ testLabels(gtt, width % MAX_WIDTH_FOR_FIXED);
+ }
+
+ new File(basenameLabel + ImmutableGraph.PROPERTIES_EXTENSION).delete();
+ new File(basenameLabel + BitStreamArcLabelledImmutableGraph.LABELS_EXTENSION).delete();
+ new File(basenameLabel + BitStreamArcLabelledImmutableGraph.LABEL_OFFSETS_EXTENSION).delete();
+ }
+ basename.delete();
+ BVGraphTest.deleteGraph(basename);
+ }
+ }
+ }
+
+}
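`testLabels(ArcLabelledImmutableGraph, int)` receives `width % MAX_WIDTH_FOR_FIXED`, so the mask it builds depends on Java's remainder and shift semantics. The stand-alone sketch below (hypothetical class `LabelMasks`, no WebGraph dependency) reproduces the mask expression and the label-type dispatch of the `testLabels()` test method for every entry of `WIDTHS`.

```java
/** Hypothetical stand-alone reproduction of the mask logic in
 *  BitStreamArcLabelledGraphTest.testLabels(ArcLabelledImmutableGraph, int). */
public final class LabelMasks {
    public static final int MAX_WIDTH_FOR_FIXED = 32;
    public static final int[] WIDTHS = { -1, 0, 1, 2, 3, 8, 32, 40, 41, 63 };

    /** Same expression as the test; the ternary widens to long before the cast. */
    public static int mask(final int width) {
        return (int) (width == MAX_WIDTH_FOR_FIXED ? -1 : (1L << width) - 1);
    }

    /** Which label writer the test selects for a WIDTHS entry. */
    public static String kind(final int width) {
        if (width == -1) return "gamma";
        if (width < MAX_WIDTH_FOR_FIXED) return "fixed(" + width + ")";
        return "list(" + (width - MAX_WIDTH_FOR_FIXED) + ")";
    }

    public static void main(final String[] args) {
        for (final int w : WIDTHS) {
            final int reduced = w % MAX_WIDTH_FOR_FIXED; // Java remainder: -1 % 32 == -1
            System.out.printf("%3d %-9s mask=%08x%n", w, kind(w), mask(reduced));
        }
    }
}
```

The reduced widths are -1, 0, 1, 2, 3, 8, 0, 8, 9 and 31. In the gamma-coded case, Java uses only the low six bits of a `long` shift distance, so `1L << -1` equals `1L << 63`. Then `(1L << -1) - 1` is `Long.MAX_VALUE`, whose low 32 bits are all ones, and the mask is -1, which compares full values. Every caller in this file passes a reduced width, so the `width == MAX_WIDTH_FOR_FIXED` branch is never taken here.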
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/labelling/CSSerializationTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/labelling/CSSerializationTest.java
new file mode 100644
index 0000000..a08df16
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/labelling/CSSerializationTest.java
@@ -0,0 +1,146 @@
+package it.unimi.dsi.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.fastutil.ints.IntIterator;
+import it.unimi.dsi.io.OutputBitStream;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.BVGraphTest;
+import it.unimi.dsi.webgraph.ImmutableGraph;
+import it.unimi.dsi.webgraph.LazyIntIterators;
+import it.unimi.dsi.webgraph.WebGraphTestCase;
+
+import java.io.File;
+import java.io.FileWriter;
+import java.io.IOException;
+import java.io.PrintWriter;
+
+import org.junit.Test;
+
+public class CSSerializationTest extends WebGraphTestCase {
+
+ private static final int[] SIZES = { 0, 1, 2, 3, 4 };
+ private static final int[] WIDTHS = { 20, 21, 30, 31 };
+
+ public String createGraph(File basename, ImmutableGraph g, int width) throws IllegalArgumentException, SecurityException, IOException {
+ final int n = g.numNodes();
+ System.err.println("Testing " + n + " nodes, width " + width+ ", basename " + basename);
+
+ OutputBitStream labels = new OutputBitStream(basename + "-fixedlabel" + BitStreamArcLabelledImmutableGraph.LABELS_EXTENSION);
+ OutputBitStream offsets = new OutputBitStream(basename + "-fixedlabel" + BitStreamArcLabelledImmutableGraph.LABEL_OFFSETS_EXTENSION);
+ Label lab;
+ offsets.writeGamma(0);
+ for(int i = 0; i < n; i++) {
+ int bits = 0;
+ for(IntIterator j = LazyIntIterators.eager(g.successors(i)); j.hasNext();) {
+ int succ = j.nextInt();
+ lab = new FakeCSFixedWidthIntLabel("TEST", width, i * succ + i);
+ bits += lab.toBitStream(labels, i);
+ }
+ offsets.writeGamma(bits);
+ }
+ labels.close();
+ offsets.close();
+
+ PrintWriter pw = new PrintWriter(new FileWriter(basename + "-fixedlabel" + ImmutableGraph.PROPERTIES_EXTENSION));
+ pw.println(ImmutableGraph.GRAPHCLASS_PROPERTY_KEY + " = " + BitStreamArcLabelledImmutableGraph.class.getName());
+ pw.println(BitStreamArcLabelledImmutableGraph.LABELSPEC_PROPERTY_KEY + " = " + FakeCSFixedWidthIntLabel.class.getName() + "(TEST," + width + ")");
+ pw.println(ArcLabelledImmutableGraph.UNDERLYINGGRAPH_PROPERTY_KEY + " = " + basename.getName());
+ pw.close();
+
+ return basename + "-fixedlabel";
+ }
+
+ public void testLabels(ArcLabelledImmutableGraph alg, final int width) {
+
+ final int mask = (int)((1L << width) - 1);
+
+ // Sequential access, iterators
+ for(ArcLabelledNodeIterator nodeIterator = alg.nodeIterator(); nodeIterator.hasNext();) {
+ int curr = nodeIterator.nextInt();
+ ArcLabelledNodeIterator.LabelledArcIterator l = nodeIterator.successors();
+ int d = nodeIterator.outdegree();
+ while(d-- != 0) {
+ int succ = l.nextInt();
+ assertEquals(curr + " -> " + succ,(curr * succ + curr) & mask, l.label().getInt());
+ }
+ }
+
+ // Sequential access, arrays
+ for(ArcLabelledNodeIterator nodeIterator = alg.nodeIterator(); nodeIterator.hasNext();) {
+ int curr = nodeIterator.nextInt();
+ int d = nodeIterator.outdegree();
+ int succ[] = nodeIterator.successorArray();
+ Label[] label = nodeIterator.labelArray();
+ for(int i = 0; i < d; i++)
+ assertEquals(curr + " -> " + succ[i], (curr * succ[i] + curr) & mask, label[i].getInt());
+ }
+
+ if (! alg.randomAccess()) return;
+
+ // Random access, iterators
+ for(int curr = 0; curr < alg.numNodes(); curr++) {
+ ArcLabelledNodeIterator.LabelledArcIterator l = alg.successors(curr);
+ int d = alg.outdegree(curr);
+ while(d-- != 0) {
+ int succ = l.nextInt();
+ assertEquals(curr + " -> " + succ ,(curr * succ + curr) & mask, l.label().getInt());
+ }
+ }
+
+ // Random access, arrays
+ for(int curr = 0; curr < alg.numNodes(); curr++) {
+ int d = alg.outdegree(curr);
+ int succ[] = alg.successorArray(curr);
+ Label[] label = alg.labelArray(curr);
+ for(int i = 0; i < d; i++) {
+ assertEquals(curr + " -> " + succ[i], (curr * succ[i] + curr) & mask, label[i].getInt());
+ }
+ }
+ }
+
+ @Test
+ public void testLabels() throws IOException, IllegalArgumentException, SecurityException {
+ for(int n: SIZES) {
+ for(int type = 0; type < 3; type++) {
+ System.err.println("Testing type " + type + "...");
+ final ImmutableGraph g = type == 0 ? ArrayListMutableGraph.newCompleteGraph(n, false).immutableView() :
+ type == 1 ? ArrayListMutableGraph.newCompleteBinaryIntree(n).immutableView() :
+ ArrayListMutableGraph.newCompleteBinaryOuttree(n).immutableView();
+ final File basename = BVGraphTest.storeTempGraph(g);
+ for(int width: WIDTHS) {
+ final String basenameLabel = createGraph(basename, g, width);
+
+ System.err.println("Testing offline...");
+ testLabels(BitStreamArcLabelledImmutableGraph.loadOffline(basenameLabel), width);
+ testLabels(BitStreamArcLabelledImmutableGraph.load(basenameLabel), width);
+
+ new File(basenameLabel + ImmutableGraph.PROPERTIES_EXTENSION).delete();
+ new File(basenameLabel + BitStreamArcLabelledImmutableGraph.LABELS_EXTENSION).delete();
+ new File(basenameLabel + BitStreamArcLabelledImmutableGraph.LABEL_OFFSETS_EXTENSION).delete();
+ }
+ basename.delete();
+ deleteGraph(basename);
+ }
+ }
+ }
+
+
+}
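`CSSerializationTest.createGraph()` and the `createGraphWith...Labels()` helpers of `BitStreamArcLabelledGraphTest` write the label-offsets stream the same way. They first write a gamma-coded 0, then, for each node, the gamma-coded number of bits that node's labels occupy. The sketch below (hypothetical class `LabelOffsets`) uses bit strings instead of a real `OutputBitStream` to show that layout. It assumes, as in the DSI utilities, that `writeGamma(x)` emits the Elias gamma code of x + 1. It also shows how prefix sums of the per-node lengths give each node's starting bit position in the labels stream. The numbers in `main` are hypothetical: a 3-node graph in which every node has outdegree 2 and 20-bit labels, i.e. 40 bits per node.

```java
import java.util.Arrays;

/** Hypothetical sketch of the "-offsets" stream written by the tests:
 *  gamma(0), then gamma(bits used by node i's labels) for each node i. */
public final class LabelOffsets {

    /** Elias gamma code of x + 1 as a bit string (x >= 0 assumed):
     *  a unary length prefix of zeros followed by the binary digits. */
    public static String gamma(final int x) {
        final String binary = Integer.toBinaryString(x + 1); // x + 1 >= 1: no leading zeros
        final StringBuilder code = new StringBuilder();
        for (int i = 1; i < binary.length(); i++) code.append('0');
        return code.append(binary).toString();
    }

    /** Prefix sums: start[i] is the first bit of node i's labels; the last entry is the total. */
    public static long[] startPositions(final int[] bitsPerNode) {
        final long[] start = new long[bitsPerNode.length + 1];
        for (int i = 0; i < bitsPerNode.length; i++) start[i + 1] = start[i] + bitsPerNode[i];
        return start;
    }

    public static void main(final String[] args) {
        final int[] bits = { 40, 40, 40 };
        final StringBuilder offsets = new StringBuilder(gamma(0)); // leading gamma(0) = "1"
        for (final int b : bits) offsets.append(gamma(b));         // gamma(40) = "00000101001"
        System.out.println(offsets);
        System.out.println(Arrays.toString(startPositions(bits)));  // [0, 40, 80, 120]
    }
}
```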
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/labelling/FakeCSFixedWidthIntLabel.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/labelling/FakeCSFixedWidthIntLabel.java
new file mode 100644
index 0000000..73ef37d
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/labelling/FakeCSFixedWidthIntLabel.java
@@ -0,0 +1,102 @@
+package it.unimi.dsi.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import it.unimi.dsi.io.InputBitStream;
+import it.unimi.dsi.io.OutputBitStream;
+
+import java.io.IOException;
+
+/** A fixed-width integer label that fakes context sensitivity:
+ * when storing label <var>v</var> on the arc (<var>x</var>,<var>y</var>),
+ * the value <var>v</var>*(<var>x</var>+1) is stored instead. The provided width must
+ * be smaller than 32, and <var>v</var>*(<var>x</var>+1) must fit in it for the value to be read back.
+ */
+
+public class FakeCSFixedWidthIntLabel extends AbstractIntLabel {
+ /** The bit width used to represent the value of this label. */
+ private final int width;
+
+ /** Creates a new fixed-width int label.
+ *
+ * @param key the (only) key of this label.
+ * @param width the label width (in bits).
+ * @param value the value of this label.
+ */
+ public FakeCSFixedWidthIntLabel(String key, int width, int value) {
+ super(key, value);
+ if (width < 0 || width > 31) throw new IllegalArgumentException("Width out of range: " + width);
+ if (value < 0 || value >= 1L << width) throw new IllegalArgumentException("Value out of range: " + Integer.toString(value));
+ this.width = width;
+ }
+
+ /** Creates a new fixed-width int label of value 0.
+ *
+ * @param key the (only) key of this label.
+ * @param width the label width (in bits).
+ */
+ public FakeCSFixedWidthIntLabel(String key, int width) {
+ this(key, width, 0);
+ }
+
+ /** Creates a new fixed-width integer label using the given key and width
+ * with value 0.
+ *
+ * @param arg two strings containing the key and the width of this label.
+ */
+ public FakeCSFixedWidthIntLabel(String... arg) {
+ this(arg[0], Integer.parseInt(arg[1]));
+ }
+
+ @Override
+ public Label copy() {
+ return new FakeCSFixedWidthIntLabel(key, width, value);
+ }
+
+ /** Returns the width of this label (as provided at construction time).
+ * @return the width of this label.
+ */
+ @Override
+ public int fixedWidth() {
+ return width;
+ }
+
+ @Override
+ public String toString() {
+ return key + ":" + value + " (width:" + width + ")";
+ }
+
+ @Override
+ public int fromBitStream(InputBitStream inputBitStream, int source) throws IOException, UnsupportedOperationException {
+ int v = inputBitStream.readInt(width);
+ value = v / (source + 1);
+ return width;
+ }
+
+ @Override
+ public int toBitStream(OutputBitStream outputBitStream, int source) throws IOException, UnsupportedOperationException {
+ return outputBitStream.writeInt((source + 1) * value, width);
+ }
+
+ @Override
+ public String toSpec() {
+ return this.getClass().getName() + "(" + key + "," + width + ")";
+ }
+}
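The arithmetic behind FakeCSFixedWidthIntLabel can be checked on its own. The sketch below is a minimal model, assuming plain ints in place of bit streams. It uses a hypothetical class, FakeCSCodec, which is not part of WebGraph. It mirrors toBitStream(), which writes (source + 1) * value, and fromBitStream(), which divides the stored field by source + 1. The sketch also shows a caveat. The constructor only checks that value itself fits in width bits, and nothing checks that the product does. What the bit stream does with an oversized product is not modelled here.

```java
/** Hypothetical stand-alone model of the arithmetic in FakeCSFixedWidthIntLabel (not WebGraph API). */
class FakeCSCodec {
    /** Value written for label v on an arc leaving source (mirrors toBitStream()). */
    static long encode(int v, int source) {
        // Computed as a long here to expose overflow; the original multiplies ints.
        return (long)(source + 1) * v;
    }

    /** Label value recovered from a stored field (mirrors fromBitStream()). */
    static int decode(int stored, int source) {
        return stored / (source + 1);
    }

    /** True if the encoded value fits in an unsigned field of the given width (width < 32). */
    static boolean fits(long encoded, int width) {
        return encoded >= 0 && encoded < (1L << width);
    }

    public static void main(String[] args) {
        int width = 8;
        // Label 5 on an arc leaving node 3: 5 * (3 + 1) = 20 is stored, and 20 / 4 = 5 is read back.
        long stored = encode(5, 3);
        System.out.println(stored + " -> " + decode((int) stored, 3));
        // Label 100 on an arc leaving node 3: 100 fits in 8 bits, but 400 does not.
        System.out.println(fits(encode(100, 3), width));
    }
}
```

Running main prints `20 -> 5` followed by `false`.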
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/labelling/MoreLabelledTransformTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/labelling/MoreLabelledTransformTest.java
new file mode 100644
index 0000000..9ba9062
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/labelling/MoreLabelledTransformTest.java
@@ -0,0 +1,190 @@
+package it.unimi.dsi.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.io.OutputBitStream;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.BVGraph;
+import it.unimi.dsi.webgraph.Transform;
+import it.unimi.dsi.webgraph.Transform.LabelledArcFilter;
+import it.unimi.dsi.webgraph.WebGraphTestCase;
+import it.unimi.dsi.webgraph.labelling.ArcLabelledNodeIterator.LabelledArcIterator;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.PrintWriter;
+
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class MoreLabelledTransformTest extends WebGraphTestCase {
+
+ private static final Logger LOGGER = LoggerFactory.getLogger(MoreLabelledTransformTest.class);
+
+ @Test
+ public void testTransform() throws IOException, IllegalArgumentException, SecurityException {
+ File f = File.createTempFile("test", "transform");
+ f.delete();
+ f.mkdir();
+ f.deleteOnExit();
+ System.out.println(f);
+ ProgressLogger pl = new ProgressLogger(LOGGER);
+ pl.logInterval = 1;
+
+ // Creates an arc-labelled graph
+ int[][] arcs;
+ ArrayListMutableGraph under = new ArrayListMutableGraph(6, arcs = new int[][] {
+ { 0, 3 }, { 1, 3 }, { 1, 4 }, { 2, 4 }, { 5, 4 }
+ });
+ BVGraph.store(under.immutableView(), new File(f, "original" + BitStreamArcLabelledImmutableGraph.UNDERLYINGGRAPH_SUFFIX).toString());
+ OutputBitStream obs = new OutputBitStream(new File(f, "original" + BitStreamArcLabelledImmutableGraph.LABELS_EXTENSION).toString());
+ OutputBitStream labobs = new OutputBitStream(new FileOutputStream(new File(f, "original" + BitStreamArcLabelledImmutableGraph.LABEL_OFFSETS_EXTENSION).toString()));
+ long prev = 0;
+ int curr = -1;
+ for (int[] arc: arcs) {
+ while (arc[0] != curr) {
+ labobs.writeGamma((int)(obs.writtenBits() - prev));
+ prev = obs.writtenBits();
+ curr++;
+ }
+ new FixedWidthIntLabel("fake", 8, arc[0] * arc[1]).toBitStream(obs, arc[0]);
+ }
+ labobs.writeGamma((int)(obs.writtenBits() - prev));
+ obs.close();
+ labobs.close();
+ String graphBasename = new File(f, "original").toString();
+ PrintWriter pw = new PrintWriter(graphBasename + ArcLabelledImmutableGraph.PROPERTIES_EXTENSION);
+ pw.println(BitStreamArcLabelledImmutableGraph.UNDERLYINGGRAPH_PROPERTY_KEY + "=original" + BitStreamArcLabelledImmutableGraph.UNDERLYINGGRAPH_SUFFIX);
+ pw.println(ArcLabelledImmutableGraph.GRAPHCLASS_PROPERTY_KEY + "=" + BitStreamArcLabelledImmutableGraph.class.getName());
+ pw.println(BitStreamArcLabelledImmutableGraph.LABELSPEC_PROPERTY_KEY + "=" + FixedWidthIntLabel.class.getName() + "(fake,8,0)");
+ pw.close();
+
+ // We transpose it
+ ArcLabelledImmutableGraph graph = ArcLabelledImmutableGraph.load(graphBasename, pl);
+ ArcLabelledImmutableGraph gT = Transform.transposeOffline(graph, 2, null, new ProgressLogger());
+ String baseNameT = graphBasename + "t";
+ BVGraph.store(gT, baseNameT + "-underlying");
+ BitStreamArcLabelledImmutableGraph.store(gT, baseNameT, baseNameT + "-underlying");
+
+ // We reload the transpose
+ gT = ArcLabelledImmutableGraph.load(baseNameT, pl);
+
+ // We merge it with the original one
+ LabelMergeStrategy mergeStrategy = null;
+ ArcLabelledImmutableGraph gU = Transform.union(graph, gT, mergeStrategy);
+ assertGraph(gU, false);
+
+ String baseNameU = graphBasename + "u";
+ BVGraph.store(gU, baseNameU + "-underlying", -1, -1, -1, -1, 0, 1);
+ BitStreamArcLabelledImmutableGraph.store(gU, baseNameU, baseNameU + "-underlying");
+
+ // We reload it
+ gU = BitStreamArcLabelledImmutableGraph.load(baseNameU, pl);
+
+ // Here is what we expect to find
+ int[][] expectedSuccessors = new int[][] {
+ { 3 }, // successors of 0
+ { 3, 4 }, // successors of 1
+ { 4 }, // successors of 2
+ { 0, 1 }, // successors of 3
+ { 1, 2, 5 }, // successors of 4
+ { 4 }, // successors of 5
+ };
+ int[][] expectedLabels = new int[][] {
+ { 0 }, // labels of the arcs leaving 0
+ { 3, 4 }, // labels of the arcs leaving 1
+ { 8 }, // labels of the arcs leaving 2
+ { 0, 3 }, // labels of the arcs leaving 3
+ { 4, 8, 20 }, // labels of the arcs leaving 4
+ { 20 }, // labels of the arcs leaving 5
+ };
+ ArcLabelledNodeIterator nit = gU.nodeIterator();
+ while (nit.hasNext()) {
+ int node = nit.nextInt();
+ assertEquals(expectedSuccessors[node].length, nit.outdegree());
+ LabelledArcIterator ait = nit.successors();
+ int d = nit.outdegree();
+ int k = 0;
+ while (d-- != 0) {
+ assertEquals(expectedSuccessors[node][k], ait.nextInt());
+ assertEquals(expectedLabels[node][k], ait.label().getInt());
+ k++;
+ }
+ }
+
+ // Same test, but with successor iterators obtained by random access
+ for (int node = gU.numNodes() - 1; node >= 0; node--) {
+ LabelledArcIterator ait = gU.successors(node);
+ assertEquals(expectedSuccessors[node].length, gU.outdegree(node));
+ int k = 0;
+ int d = gU.outdegree(node);
+ while (d-- != 0) {
+ assertEquals(expectedSuccessors[node][k], ait.nextInt());
+ assertEquals(expectedLabels[node][k], ait.label().getInt());
+ k++;
+ }
+ }
+
+ // Filter
+ ArcLabelledImmutableGraph filteredGraph = Transform.filterArcs(gU, new LabelledArcFilter() {
+ @Override
+ public boolean accept(int i, int j, Label label) {
+ return i%2 == 0 && j%2 == 1 && label.getInt()%2==0;
+ }
+ });
+ int[][] expectedFilteredSuccessors = new int[][] {
+ { 3 },
+ {},
+ {},
+ {},
+ { 1, 5 },
+ {}
+ };
+ int[][] expectedFilteredLabels = new int[][] {
+ { 0 },
+ {},
+ {},
+ {},
+ { 4, 20 },
+ {}
+ };
+
+ WebGraphTestCase.assertGraph(filteredGraph);
+ nit = filteredGraph.nodeIterator();
+ while (nit.hasNext()) {
+ int node = nit.nextInt();
+ assertEquals(expectedFilteredSuccessors[node].length, nit.outdegree());
+ LabelledArcIterator ait = nit.successors();
+ int d = nit.outdegree();
+ int k = 0;
+ while (d-- != 0) {
+ assertEquals(expectedFilteredSuccessors[node][k], ait.nextInt());
+ assertEquals(expectedFilteredLabels[node][k], ait.label().getInt());
+ k++;
+ }
+ }
+
+ }
+
+
+}
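The expected tables in testTransform() follow from a simple rule. Each original arc (x, y) carries the label x*y, and its transpose carries the same label on (y, x). No arc of the example has a reverse counterpart, so the union needs no label merging. It is simply the symmetric closure, with label i*j on every arc (i, j). The sketch below recomputes those tables from the arc list using plain java.util collections. It also applies the same predicate as the test's LabelledArcFilter. The class SymmetricClosure is hypothetical and is not part of WebGraph.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical helper (not WebGraph API) recomputing the expected tables of MoreLabelledTransformTest. */
class SymmetricClosure {
    /** Sorted successor lists of the union of a graph with its transpose. */
    static List<TreeSet<Integer>> symmetrize(int n, int[][] arcs) {
        List<TreeSet<Integer>> succ = new ArrayList<>();
        for (int i = 0; i < n; i++) succ.add(new TreeSet<>());
        for (int[] a : arcs) {
            succ.get(a[0]).add(a[1]); // original arc
            succ.get(a[1]).add(a[0]); // transposed arc
        }
        return succ;
    }

    /** Label of arc (i, j): the product is symmetric, so both orientations agree. */
    static int label(int i, int j) {
        return i * j;
    }

    /** Same predicate as the LabelledArcFilter in the test. */
    static boolean accept(int i, int j) {
        return i % 2 == 0 && j % 2 == 1 && label(i, j) % 2 == 0;
    }

    public static void main(String[] args) {
        int[][] arcs = { { 0, 3 }, { 1, 3 }, { 1, 4 }, { 2, 4 }, { 5, 4 } };
        List<TreeSet<Integer>> succ = symmetrize(6, arcs);
        for (int i = 0; i < succ.size(); i++) {
            // Each entry is successor/label, with "*" marking arcs kept by the filter.
            StringBuilder line = new StringBuilder(i + ":");
            for (int j : succ.get(i)) line.append(' ').append(j).append('/').append(label(i, j)).append(accept(i, j) ? "*" : "");
            System.out.println(line);
        }
    }
}
```

Node 4, for instance, prints as `4: 1/4* 2/8 5/20*`. This matches the rows { 1, 2, 5 } / { 4, 8, 20 } of the unfiltered tables and { 1, 5 } / { 4, 20 } of the filtered ones.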
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/labelling/RelabellingTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/labelling/RelabellingTest.java
new file mode 100644
index 0000000..4422431
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/labelling/RelabellingTest.java
@@ -0,0 +1,79 @@
+package it.unimi.dsi.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.webgraph.WebGraphTestCase;
+import it.unimi.dsi.webgraph.examples.IntegerTriplesArcLabelledImmutableGraph;
+
+import org.junit.Test;
+
+public class RelabellingTest extends WebGraphTestCase {
+
+ @Test
+ public void testIntRelabelling() {
+ // Take a graph and convert from gamma to fixed-width
+ ArcLabelledImmutableGraph gorig = new IntegerTriplesArcLabelledImmutableGraph(new int[][]
+ {
+ { 0, 1, 203 }, { 0, 2, 104 }, { 1, 3, 102 }
+ });
+ ArcLabelledImmutableGraph gfixed = new ArcRelabelledImmutableGraph(gorig, new FixedWidthIntLabel("FOO", 15), ArcRelabelledImmutableGraph.INT_LABEL_CONVERSION_STRATEGY);
+ assertGraph(gorig);
+ assertGraph(gfixed);
+ assertEquals(gorig, gfixed);
+
+ // Convert its labels to lists, digitwise; e.g. 203-> [2,0,3]...
+ ArcLabelledImmutableGraph glist = new ArcRelabelledImmutableGraph(gorig, new FixedWidthIntListLabel("FOO", 15), new ArcRelabelledImmutableGraph.LabelConversionStrategy() {
+ @Override
+ public void convert(Label from, Label to, int source, int target) {
+ String sValue = Integer.toString(((AbstractIntLabel)from).value);
+ int[] s = new int[sValue.length()];
+ for (int i = 0; i < sValue.length(); i++) s[i] = sValue.charAt(i) - '0';
+ ((AbstractIntListLabel)to).value = s;
+ }
+ });
+ // ...and then back to integer, but backwards; e.g. [2,0,3] -> 302...
+ ArcLabelledImmutableGraph grevert = new ArcRelabelledImmutableGraph(glist, new FixedWidthIntLabel("FOO", 15), new ArcRelabelledImmutableGraph.LabelConversionStrategy() {
+ @Override
+ public void convert(Label from, Label to, int source, int target) {
+ int[] v = ((AbstractIntListLabel)from).value;
+ int tot = 0;
+ for (int i = v.length - 1; i >= 0; i--)
+ tot = tot * 10 + v[i];
+ ((AbstractIntLabel)to).value = tot;
+ }
+ });
+ assertGraph(glist);
+ assertGraph(grevert);
+ assertGraph(new ArcRelabelledImmutableGraph(gorig, new FixedWidthIntLabel("FOO", 15), new ArcRelabelledImmutableGraph.LabelConversionStrategy() {
+ @Override
+ public void convert(Label from, Label to, int source, int target) {
+ }
+ }));
+
+
+ // Check that every label has been digit-reversed
+ assertEquals(grevert, new IntegerTriplesArcLabelledImmutableGraph(new int[][]
+ {
+ { 0, 1, 302 }, { 0, 2, 401 }, { 1, 3, 201 }
+ }));
+ }
+
+
+}
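Together, the two LabelConversionStrategy instances in testIntRelabelling() reverse the decimal digits of each label. For example, 203 first becomes [2, 0, 3], and that list is then folded back from its last element to its first, giving 302. The sketch below reproduces both conversions on plain ints, which accounts for the expected triples { 0, 1, 302 }, { 0, 2, 401 } and { 1, 3, 201 }. The class DigitReversal is hypothetical and is not part of WebGraph.

```java
/** Hypothetical stand-alone model of the two label conversions in RelabellingTest (not WebGraph API). */
class DigitReversal {
    /** First conversion: an int label becomes the list of its decimal digits, e.g. 203 -> [2, 0, 3]. */
    static int[] toDigits(int value) {
        String s = Integer.toString(value);
        int[] digits = new int[s.length()];
        for (int i = 0; i < s.length(); i++) digits[i] = s.charAt(i) - '0';
        return digits;
    }

    /** Second conversion: digits are folded back starting from the last one, e.g. [2, 0, 3] -> 302. */
    static int fromDigitsReversed(int[] digits) {
        int tot = 0;
        for (int i = digits.length - 1; i >= 0; i--) tot = tot * 10 + digits[i];
        return tot;
    }

    public static void main(String[] args) {
        for (int label : new int[] { 203, 104, 102 })
            System.out.println(label + " -> " + fromDigitsReversed(toDigits(label)));
    }
}
```

Note that a label ending in 0 loses its trailing zero: 120 becomes 21. The composition is therefore not reversible in general, although none of the test's labels is affected.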
diff --git a/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/tool/ExtractComponentTest.java b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/tool/ExtractComponentTest.java
new file mode 100644
index 0000000..11369a0
--- /dev/null
+++ b/third_party/webgraph-3.6.1/test/it/unimi/dsi/webgraph/tool/ExtractComponentTest.java
@@ -0,0 +1,77 @@
+package it.unimi.dsi.webgraph.tool;
+
+import static org.junit.Assert.assertArrayEquals;
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.webgraph.tool.ExtractComponent;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.apache.commons.io.IOUtils;
+import org.junit.Test;
+
+import com.google.common.base.Charsets;
+import com.martiansoftware.jsap.JSAPException;
+
+public class ExtractComponentTest {
+
+ @Test
+ public void test() throws IOException, JSAPException {
+ File componentsFile = File.createTempFile(ExtractComponentTest.class.getSimpleName() + "-", "-components");
+ componentsFile.deleteOnExit();
+ File mapFile = File.createTempFile(ExtractComponentTest.class.getSimpleName() + "-", "-map");
+ mapFile.deleteOnExit();
+ File inIds = File.createTempFile(ExtractComponentTest.class.getSimpleName() + "-", "-inIds");
+ inIds.deleteOnExit();
+ File outIds = File.createTempFile(ExtractComponentTest.class.getSimpleName() + "-", "-outIds");
+ outIds.deleteOnExit();
+
+ BinIO.storeInts(new int[] { 1, 0, 1, 0, 0, 2, 1, 0 }, componentsFile);
+ IOUtils.writeLines(Arrays.asList(new String[] { "a", "b", "c", "d", "e", "f", "g", "h" }), null, new FileOutputStream(inIds), Charsets.UTF_8);
+ ExtractComponent.main(new String[] { componentsFile.toString(), mapFile.toString(), inIds.toString(), outIds.toString() });
+
+ assertArrayEquals(new int[] { -1, 0, -1, 1, 2, -1, -1, 3 }, BinIO.loadInts(mapFile));
+ assertEquals(Arrays.asList(new String[] { "b", "d", "e", "h" }), IOUtils.readLines(new FileInputStream(outIds), Charsets.UTF_8));
+
+ componentsFile.delete();
+ mapFile.delete();
+ inIds.delete();
+ outIds.delete();
+ }
+
+ @Test
+ public void testNoIds() throws IOException, JSAPException {
+ File componentsFile = File.createTempFile(ExtractComponentTest.class.getSimpleName() + "-", "-components");
+ componentsFile.deleteOnExit();
+ File mapFile = File.createTempFile(ExtractComponentTest.class.getSimpleName() + "-", "-map");
+ mapFile.deleteOnExit();
+
+ BinIO.storeInts(new int[] { 1, 0, 1, 0, 0, 2, 1, 0 }, componentsFile);
+ ExtractComponent.main(new String[] { componentsFile.toString(), mapFile.toString() });
+
+ assertArrayEquals(new int[] { -1, 0, -1, 1, 2, -1, -1, 3 }, BinIO.loadInts(mapFile));
+
+ componentsFile.delete();
+ mapFile.delete();
+ }
+
+ @Test(expected=IllegalArgumentException.class)
+ public void testDifferentLengths() throws IOException, JSAPException {
+ File componentsFile = File.createTempFile(ExtractComponentTest.class.getSimpleName() + "-", "-components");
+ componentsFile.deleteOnExit();
+ File mapFile = File.createTempFile(ExtractComponentTest.class.getSimpleName() + "-", "-map");
+ mapFile.deleteOnExit();
+ File inIds = File.createTempFile(ExtractComponentTest.class.getSimpleName() + "-", "-inIds");
+ inIds.deleteOnExit();
+ File outIds = File.createTempFile(ExtractComponentTest.class.getSimpleName() + "-", "-outIds");
+ outIds.deleteOnExit();
+
+ BinIO.storeInts(new int[] { 1, 0, 1, 0, 0, 2, 1, 0 }, componentsFile);
+ IOUtils.writeLines(Arrays.asList(new String[] { "a", "b", "c", "d", "e", "f", "g", "h", "i" }), null, new FileOutputStream(inIds), Charsets.UTF_8);
+ ExtractComponent.main(new String[] { componentsFile.toString(), mapFile.toString(), inIds.toString(), outIds.toString() });
+ }
+}
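The expected map in ExtractComponentTest follows a simple rule. Nodes in component 0 are renumbered consecutively in their original order, and every other node is mapped to -1. The identifier list is filtered the same way, and a list whose length differs from the number of nodes is rejected with an IllegalArgumentException, as testDifferentLengths() expects. The sketch below reproduces that rule on plain arrays. The class and method names are hypothetical, and the choice of component 0 as the target is inferred from the test data. None of this describes ExtractComponent's actual implementation or command-line options.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the renumbering checked by ExtractComponentTest (not the real tool). */
class ComponentMap {
    /** Maps nodes of component target to 0, 1, 2, ... in node order, and all other nodes to -1. */
    static int[] map(int[] components, int target) {
        int[] map = new int[components.length];
        int next = 0;
        for (int node = 0; node < components.length; node++)
            map[node] = components[node] == target ? next++ : -1;
        return map;
    }

    /** Keeps the identifiers of the nodes that survive the map; the lengths must agree. */
    static List<String> filterIds(int[] map, List<String> ids) {
        if (ids.size() != map.length)
            throw new IllegalArgumentException("Expected " + map.length + " identifiers, found " + ids.size());
        List<String> out = new ArrayList<>();
        for (int node = 0; node < map.length; node++)
            if (map[node] != -1) out.add(ids.get(node));
        return out;
    }

    public static void main(String[] args) {
        int[] m = map(new int[] { 1, 0, 1, 0, 0, 2, 1, 0 }, 0);
        System.out.println(Arrays.toString(m)); // [-1, 0, -1, 1, 2, -1, -1, 3]
        System.out.println(filterIds(m, Arrays.asList("a", "b", "c", "d", "e", "f", "g", "h"))); // [b, d, e, h]
    }
}
```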
diff --git a/third_party/webgraph-big-3.5.0/CHANGES b/third_party/webgraph-big-3.5.0/CHANGES
new file mode 100644
index 0000000..01879b0
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/CHANGES
@@ -0,0 +1,119 @@
+3.5.0
+
+- Java 8-only.
+
+- Fixed obscure bug in ShiftByOneArcListASCIIGraph: if the arc list
+ was specified on the command line (no -1 option) and more than
+ one core was available, the graph would not have been shifted.
+ Thanks to Luca Prigioniero for reporting this bug.
+
+3.3.6
+
+- The family of loadSequential() methods has been deprecated, and
+ replaced in code by loadOffline() or loadMapped().
+
+3.3.5
+
+- Fixed dependencies.
+
+3.3.4
+
+- Significantly improved performance of HyperBall on graphs with a highly
+ skewed (e.g., heavy-tailed) outdegree distribution (e.g., transposed web
+ graphs).
+
+- Fixed wrong estimation of memory used.
+
+- Now ConnectedComponents writes results using "wcc" instead of "scc".
+
+3.3.3
+
+- Reverted to fastutil's quicksort calls for array fragments. Java
+ 7's Arrays.sort() has a memory bug that was severely degrading the
+ performance of a number of methods.
+
+3.3.2
+
+- We now distribute SpeedTest, hoping to improve the quality of benchmarks
+ in the literature.
+
+3.3.1
+
+- Adapted to new DSI utilities.
+
+3.3.0
+
+- HyperBall sports a new adaptive decomposition scheme that
+ is based on the number of arcs to be scanned, rather than
+ on the number of nodes.
+
+- Fixed bug in the computation of the buckets. If you have used the new
+ iterative implementation of Tarjan's algorithm
+ (StronglyConnectedComponents) to compute buckets, please recompute them.
+
+- ParallelBreadthFirstVisit and ConnectedComponents have been ported
+ from the standard version.
+
+3.2.1
+
+- New iterative implementation of Tarjan's algorithm.
+
+- HyperBall can now compute Nieminen's centrality.
+
+- Added missing shift option to ArcListASCIIGraph.
+
+3.2.0
+
+- New selectable upper bound for EFGraph makes it possible to build
+ "fake" graphs in which successors are greater than or equal to
+ the number of nodes (this was already possible with BVGraph). Useful
+ for incremental graph construction.
+
+- New IncrementalImmutableSequentialGraph adapter, which provides an
+ inversion of control for storing graphs: you supply, one at a time,
+ the successor list of each node.
+
+3.1.0
+
+- We switched to SLF4J for logging.
+
+- Now ScatteredArcsASCIIGraph accepts a translation function from
+ node identifiers to node numbers.
+
+- The DecimalFormat object used to print data now has a fixed US locale.
+
+- New EFGraph implementation using the Elias-Fano representation of
+ monotone sequences. Compression is not so good, but successor
+ enumeration is blazingly fast and the implementation returns a skippable
+ iterator which provides constant-time search of nodes by lower bound.
+
+- Both BVGraph and EFGraph have outdegree caching and exact unwrapping
+ of successorArray(). This should bring performance improvements.
+
+- New HyperBall implementation of the HyperANF idea ported to this
+ version. It computes several kinds of geometric centrality and, once in
+ systolic local mode, uses time proportional to the number of edges
+ causing a modification, setting the expected run time in practice to the
+ theoretical bound O(m log n).
+
+- Several wrong instances of "int" have been replaced with "long".
+
+3.0.3
+
+- New ImmutableGraph.outdegrees() method that exposes the outdegrees of a
+ graph as a LongIterator.
+
+- RandomGraph removed.
+
+- Almost all transformed graphs now support copy().
+
+- New Transform.NodeClassFilter.
+
+3.0.1
+
+- ASCIIGraph was parsing the number of nodes as an integer rather than a
+ long. Thanks to David Gleich for reporting this bug.
+
+3.0
+
+- First public release.
diff --git a/third_party/webgraph-big-3.5.0/COPYING b/third_party/webgraph-big-3.5.0/COPYING
new file mode 100644
index 0000000..94a9ed0
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/COPYING
@@ -0,0 +1,674 @@
+ GNU GENERAL PUBLIC LICENSE
+ Version 3, 29 June 2007
+
+ Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+ Preamble
+
+ The GNU General Public License is a free, copyleft license for
+software and other kinds of works.
+
+ The licenses for most software and other practical works are designed
+to take away your freedom to share and change the works. By contrast,
+the GNU General Public License is intended to guarantee your freedom to
+share and change all versions of a program--to make sure it remains free
+software for all its users. We, the Free Software Foundation, use the
+GNU General Public License for most of our software; it applies also to
+any other work released this way by its authors. You can apply it to
+your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+price. Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+them if you wish), that you receive source code or can get it if you
+want it, that you can change the software or use pieces of it in new
+free programs, and that you know you can do these things.
+
+ To protect your rights, we need to prevent others from denying you
+these rights or asking you to surrender the rights. Therefore, you have
+certain responsibilities if you distribute copies of the software, or if
+you modify it: responsibilities to respect the freedom of others.
+
+ For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must pass on to the recipients the same
+freedoms that you received. You must make sure that they, too, receive
+or can get the source code. And you must show them these terms so they
+know their rights.
+
+ Developers that use the GNU GPL protect your rights with two steps:
+(1) assert copyright on the software, and (2) offer you this License
+giving you legal permission to copy, distribute and/or modify it.
+
+ For the developers' and authors' protection, the GPL clearly explains
+that there is no warranty for this free software. For both users' and
+authors' sake, the GPL requires that modified versions be marked as
+changed, so that their problems will not be attributed erroneously to
+authors of previous versions.
+
+ Some devices are designed to deny users access to install or run
+modified versions of the software inside them, although the manufacturer
+can do so. This is fundamentally incompatible with the aim of
+protecting users' freedom to change the software. The systematic
+pattern of such abuse occurs in the area of products for individuals to
+use, which is precisely where it is most unacceptable. Therefore, we
+have designed this version of the GPL to prohibit the practice for those
+products. If such problems arise substantially in other domains, we
+stand ready to extend this provision to those domains in future versions
+of the GPL, as needed to protect the freedom of users.
+
+ Finally, every program is threatened constantly by software patents.
+States should not allow patents to restrict development and use of
+software on general-purpose computers, but in those that do, we wish to
+avoid the special danger that patents applied to a free program could
+make it effectively proprietary. To prevent this, the GPL assures that
+patents cannot be used to render the program non-free.
+
+ The precise terms and conditions for copying, distribution and
+modification follow.
+
+ TERMS AND CONDITIONS
+
+ 0. Definitions.
+
+ "This License" refers to version 3 of the GNU General Public License.
+
+ "Copyright" also means copyright-like laws that apply to other kinds of
+works, such as semiconductor masks.
+
+ "The Program" refers to any copyrightable work licensed under this
+License. Each licensee is addressed as "you". "Licensees" and
+"recipients" may be individuals or organizations.
+
+ To "modify" a work means to copy from or adapt all or part of the work
+in a fashion requiring copyright permission, other than the making of an
+exact copy. The resulting work is called a "modified version" of the
+earlier work or a work "based on" the earlier work.
+
+ A "covered work" means either the unmodified Program or a work based
+on the Program.
+
+ To "propagate" a work means to do anything with it that, without
+permission, would make you directly or secondarily liable for
+infringement under applicable copyright law, except executing it on a
+computer or modifying a private copy. Propagation includes copying,
+distribution (with or without modification), making available to the
+public, and in some countries other activities as well.
+
+ To "convey" a work means any kind of propagation that enables other
+parties to make or receive copies. Mere interaction with a user through
+a computer network, with no transfer of a copy, is not conveying.
+
+ An interactive user interface displays "Appropriate Legal Notices"
+to the extent that it includes a convenient and prominently visible
+feature that (1) displays an appropriate copyright notice, and (2)
+tells the user that there is no warranty for the work (except to the
+extent that warranties are provided), that licensees may convey the
+work under this License, and how to view a copy of this License. If
+the interface presents a list of user commands or options, such as a
+menu, a prominent item in the list meets this criterion.
+
+ 1. Source Code.
+
+ The "source code" for a work means the preferred form of the work
+for making modifications to it. "Object code" means any non-source
+form of a work.
+
+ A "Standard Interface" means an interface that either is an official
+standard defined by a recognized standards body, or, in the case of
+interfaces specified for a particular programming language, one that
+is widely used among developers working in that language.
+
+ The "System Libraries" of an executable work include anything, other
+than the work as a whole, that (a) is included in the normal form of
+packaging a Major Component, but which is not part of that Major
+Component, and (b) serves only to enable use of the work with that
+Major Component, or to implement a Standard Interface for which an
+implementation is available to the public in source code form. A
+"Major Component", in this context, means a major essential component
+(kernel, window system, and so on) of the specific operating system
+(if any) on which the executable work runs, or a compiler used to
+produce the work, or an object code interpreter used to run it.
+
+ The "Corresponding Source" for a work in object code form means all
+the source code needed to generate, install, and (for an executable
+work) run the object code and to modify the work, including scripts to
+control those activities. However, it does not include the work's
+System Libraries, or general-purpose tools or generally available free
+programs which are used unmodified in performing those activities but
+which are not part of the work. For example, Corresponding Source
+includes interface definition files associated with source files for
+the work, and the source code for shared libraries and dynamically
+linked subprograms that the work is specifically designed to require,
+such as by intimate data communication or control flow between those
+subprograms and other parts of the work.
+
+ The Corresponding Source need not include anything that users
+can regenerate automatically from other parts of the Corresponding
+Source.
+
+ The Corresponding Source for a work in source code form is that
+same work.
+
+ 2. Basic Permissions.
+
+ All rights granted under this License are granted for the term of
+copyright on the Program, and are irrevocable provided the stated
+conditions are met. This License explicitly affirms your unlimited
+permission to run the unmodified Program. The output from running a
+covered work is covered by this License only if the output, given its
+content, constitutes a covered work. This License acknowledges your
+rights of fair use or other equivalent, as provided by copyright law.
+
+ You may make, run and propagate covered works that you do not
+convey, without conditions so long as your license otherwise remains
+in force. You may convey covered works to others for the sole purpose
+of having them make modifications exclusively for you, or provide you
+with facilities for running those works, provided that you comply with
+the terms of this License in conveying all material for which you do
+not control copyright. Those thus making or running the covered works
+for you must do so exclusively on your behalf, under your direction
+and control, on terms that prohibit them from making any copies of
+your copyrighted material outside their relationship with you.
+
+ Conveying under any other circumstances is permitted solely under
+the conditions stated below. Sublicensing is not allowed; section 10
+makes it unnecessary.
+
+ 3. Protecting Users' Legal Rights From Anti-Circumvention Law.
+
+ No covered work shall be deemed part of an effective technological
+measure under any applicable law fulfilling obligations under article
+11 of the WIPO copyright treaty adopted on 20 December 1996, or
+similar laws prohibiting or restricting circumvention of such
+measures.
+
+ When you convey a covered work, you waive any legal power to forbid
+circumvention of technological measures to the extent such circumvention
+is effected by exercising rights under this License with respect to
+the covered work, and you disclaim any intention to limit operation or
+modification of the work as a means of enforcing, against the work's
+users, your or third parties' legal rights to forbid circumvention of
+technological measures.
+
+ 4. Conveying Verbatim Copies.
+
+ You may convey verbatim copies of the Program's source code as you
+receive it, in any medium, provided that you conspicuously and
+appropriately publish on each copy an appropriate copyright notice;
+keep intact all notices stating that this License and any
+non-permissive terms added in accord with section 7 apply to the code;
+keep intact all notices of the absence of any warranty; and give all
+recipients a copy of this License along with the Program.
+
+ You may charge any price or no price for each copy that you convey,
+and you may offer support or warranty protection for a fee.
+
+ 5. Conveying Modified Source Versions.
+
+ You may convey a work based on the Program, or the modifications to
+produce it from the Program, in the form of source code under the
+terms of section 4, provided that you also meet all of these conditions:
+
+ a) The work must carry prominent notices stating that you modified
+ it, and giving a relevant date.
+
+ b) The work must carry prominent notices stating that it is
+ released under this License and any conditions added under section
+ 7. This requirement modifies the requirement in section 4 to
+ "keep intact all notices".
+
+ c) You must license the entire work, as a whole, under this
+ License to anyone who comes into possession of a copy. This
+ License will therefore apply, along with any applicable section 7
+ additional terms, to the whole of the work, and all its parts,
+ regardless of how they are packaged. This License gives no
+ permission to license the work in any other way, but it does not
+ invalidate such permission if you have separately received it.
+
+ d) If the work has interactive user interfaces, each must display
+ Appropriate Legal Notices; however, if the Program has interactive
+ interfaces that do not display Appropriate Legal Notices, your
+ work need not make them do so.
+
+ A compilation of a covered work with other separate and independent
+works, which are not by their nature extensions of the covered work,
+and which are not combined with it such as to form a larger program,
+in or on a volume of a storage or distribution medium, is called an
+"aggregate" if the compilation and its resulting copyright are not
+used to limit the access or legal rights of the compilation's users
+beyond what the individual works permit. Inclusion of a covered work
+in an aggregate does not cause this License to apply to the other
+parts of the aggregate.
+
+ 6. Conveying Non-Source Forms.
+
+ You may convey a covered work in object code form under the terms
+of sections 4 and 5, provided that you also convey the
+machine-readable Corresponding Source under the terms of this License,
+in one of these ways:
+
+ a) Convey the object code in, or embodied in, a physical product
+ (including a physical distribution medium), accompanied by the
+ Corresponding Source fixed on a durable physical medium
+ customarily used for software interchange.
+
+ b) Convey the object code in, or embodied in, a physical product
+ (including a physical distribution medium), accompanied by a
+ written offer, valid for at least three years and valid for as
+ long as you offer spare parts or customer support for that product
+ model, to give anyone who possesses the object code either (1) a
+ copy of the Corresponding Source for all the software in the
+ product that is covered by this License, on a durable physical
+ medium customarily used for software interchange, for a price no
+ more than your reasonable cost of physically performing this
+ conveying of source, or (2) access to copy the
+ Corresponding Source from a network server at no charge.
+
+ c) Convey individual copies of the object code with a copy of the
+ written offer to provide the Corresponding Source. This
+ alternative is allowed only occasionally and noncommercially, and
+ only if you received the object code with such an offer, in accord
+ with subsection 6b.
+
+ d) Convey the object code by offering access from a designated
+ place (gratis or for a charge), and offer equivalent access to the
+ Corresponding Source in the same way through the same place at no
+ further charge. You need not require recipients to copy the
+ Corresponding Source along with the object code. If the place to
+ copy the object code is a network server, the Corresponding Source
+ may be on a different server (operated by you or a third party)
+ that supports equivalent copying facilities, provided you maintain
+ clear directions next to the object code saying where to find the
+ Corresponding Source. Regardless of what server hosts the
+ Corresponding Source, you remain obligated to ensure that it is
+ available for as long as needed to satisfy these requirements.
+
+ e) Convey the object code using peer-to-peer transmission, provided
+ you inform other peers where the object code and Corresponding
+ Source of the work are being offered to the general public at no
+ charge under subsection 6d.
+
+ A separable portion of the object code, whose source code is excluded
+from the Corresponding Source as a System Library, need not be
+included in conveying the object code work.
+
+ A "User Product" is either (1) a "consumer product", which means any
+tangible personal property which is normally used for personal, family,
+or household purposes, or (2) anything designed or sold for incorporation
+into a dwelling. In determining whether a product is a consumer product,
+doubtful cases shall be resolved in favor of coverage. For a particular
+product received by a particular user, "normally used" refers to a
+typical or common use of that class of product, regardless of the status
+of the particular user or of the way in which the particular user
+actually uses, or expects or is expected to use, the product. A product
+is a consumer product regardless of whether the product has substantial
+commercial, industrial or non-consumer uses, unless such uses represent
+the only significant mode of use of the product.
+
+ "Installation Information" for a User Product means any methods,
+procedures, authorization keys, or other information required to install
+and execute modified versions of a covered work in that User Product from
+a modified version of its Corresponding Source. The information must
+suffice to ensure that the continued functioning of the modified object
+code is in no case prevented or interfered with solely because
+modification has been made.
+
+ If you convey an object code work under this section in, or with, or
+specifically for use in, a User Product, and the conveying occurs as
+part of a transaction in which the right of possession and use of the
+User Product is transferred to the recipient in perpetuity or for a
+fixed term (regardless of how the transaction is characterized), the
+Corresponding Source conveyed under this section must be accompanied
+by the Installation Information. But this requirement does not apply
+if neither you nor any third party retains the ability to install
+modified object code on the User Product (for example, the work has
+been installed in ROM).
+
+ The requirement to provide Installation Information does not include a
+requirement to continue to provide support service, warranty, or updates
+for a work that has been modified or installed by the recipient, or for
+the User Product in which it has been modified or installed. Access to a
+network may be denied when the modification itself materially and
+adversely affects the operation of the network or violates the rules and
+protocols for communication across the network.
+
+ Corresponding Source conveyed, and Installation Information provided,
+in accord with this section must be in a format that is publicly
+documented (and with an implementation available to the public in
+source code form), and must require no special password or key for
+unpacking, reading or copying.
+
+ 7. Additional Terms.
+
+ "Additional permissions" are terms that supplement the terms of this
+License by making exceptions from one or more of its conditions.
+Additional permissions that are applicable to the entire Program shall
+be treated as though they were included in this License, to the extent
+that they are valid under applicable law. If additional permissions
+apply only to part of the Program, that part may be used separately
+under those permissions, but the entire Program remains governed by
+this License without regard to the additional permissions.
+
+ When you convey a copy of a covered work, you may at your option
+remove any additional permissions from that copy, or from any part of
+it. (Additional permissions may be written to require their own
+removal in certain cases when you modify the work.) You may place
+additional permissions on material, added by you to a covered work,
+for which you have or can give appropriate copyright permission.
+
+ Notwithstanding any other provision of this License, for material you
+add to a covered work, you may (if authorized by the copyright holders of
+that material) supplement the terms of this License with terms:
+
+ a) Disclaiming warranty or limiting liability differently from the
+ terms of sections 15 and 16 of this License; or
+
+ b) Requiring preservation of specified reasonable legal notices or
+ author attributions in that material or in the Appropriate Legal
+ Notices displayed by works containing it; or
+
+ c) Prohibiting misrepresentation of the origin of that material, or
+ requiring that modified versions of such material be marked in
+ reasonable ways as different from the original version; or
+
+ d) Limiting the use for publicity purposes of names of licensors or
+ authors of the material; or
+
+ e) Declining to grant rights under trademark law for use of some
+ trade names, trademarks, or service marks; or
+
+ f) Requiring indemnification of licensors and authors of that
+ material by anyone who conveys the material (or modified versions of
+ it) with contractual assumptions of liability to the recipient, for
+ any liability that these contractual assumptions directly impose on
+ those licensors and authors.
+
+ All other non-permissive additional terms are considered "further
+restrictions" within the meaning of section 10. If the Program as you
+received it, or any part of it, contains a notice stating that it is
+governed by this License along with a term that is a further
+restriction, you may remove that term. If a license document contains
+a further restriction but permits relicensing or conveying under this
+License, you may add to a covered work material governed by the terms
+of that license document, provided that the further restriction does
+not survive such relicensing or conveying.
+
+ If you add terms to a covered work in accord with this section, you
+must place, in the relevant source files, a statement of the
+additional terms that apply to those files, or a notice indicating
+where to find the applicable terms.
+
+ Additional terms, permissive or non-permissive, may be stated in the
+form of a separately written license, or stated as exceptions;
+the above requirements apply either way.
+
+ 8. Termination.
+
+ You may not propagate or modify a covered work except as expressly
+provided under this License. Any attempt otherwise to propagate or
+modify it is void, and will automatically terminate your rights under
+this License (including any patent licenses granted under the third
+paragraph of section 11).
+
+ However, if you cease all violation of this License, then your
+license from a particular copyright holder is reinstated (a)
+provisionally, unless and until the copyright holder explicitly and
+finally terminates your license, and (b) permanently, if the copyright
+holder fails to notify you of the violation by some reasonable means
+prior to 60 days after the cessation.
+
+ Moreover, your license from a particular copyright holder is
+reinstated permanently if the copyright holder notifies you of the
+violation by some reasonable means, this is the first time you have
+received notice of violation of this License (for any work) from that
+copyright holder, and you cure the violation prior to 30 days after
+your receipt of the notice.
+
+ Termination of your rights under this section does not terminate the
+licenses of parties who have received copies or rights from you under
+this License. If your rights have been terminated and not permanently
+reinstated, you do not qualify to receive new licenses for the same
+material under section 10.
+
+ 9. Acceptance Not Required for Having Copies.
+
+ You are not required to accept this License in order to receive or
+run a copy of the Program. Ancillary propagation of a covered work
+occurring solely as a consequence of using peer-to-peer transmission
+to receive a copy likewise does not require acceptance. However,
+nothing other than this License grants you permission to propagate or
+modify any covered work. These actions infringe copyright if you do
+not accept this License. Therefore, by modifying or propagating a
+covered work, you indicate your acceptance of this License to do so.
+
+ 10. Automatic Licensing of Downstream Recipients.
+
+ Each time you convey a covered work, the recipient automatically
+receives a license from the original licensors, to run, modify and
+propagate that work, subject to this License. You are not responsible
+for enforcing compliance by third parties with this License.
+
+ An "entity transaction" is a transaction transferring control of an
+organization, or substantially all assets of one, or subdividing an
+organization, or merging organizations. If propagation of a covered
+work results from an entity transaction, each party to that
+transaction who receives a copy of the work also receives whatever
+licenses to the work the party's predecessor in interest had or could
+give under the previous paragraph, plus a right to possession of the
+Corresponding Source of the work from the predecessor in interest, if
+the predecessor has it or can get it with reasonable efforts.
+
+ You may not impose any further restrictions on the exercise of the
+rights granted or affirmed under this License. For example, you may
+not impose a license fee, royalty, or other charge for exercise of
+rights granted under this License, and you may not initiate litigation
+(including a cross-claim or counterclaim in a lawsuit) alleging that
+any patent claim is infringed by making, using, selling, offering for
+sale, or importing the Program or any portion of it.
+
+ 11. Patents.
+
+ A "contributor" is a copyright holder who authorizes use under this
+License of the Program or a work on which the Program is based. The
+work thus licensed is called the contributor's "contributor version".
+
+ A contributor's "essential patent claims" are all patent claims
+owned or controlled by the contributor, whether already acquired or
+hereafter acquired, that would be infringed by some manner, permitted
+by this License, of making, using, or selling its contributor version,
+but do not include claims that would be infringed only as a
+consequence of further modification of the contributor version. For
+purposes of this definition, "control" includes the right to grant
+patent sublicenses in a manner consistent with the requirements of
+this License.
+
+ Each contributor grants you a non-exclusive, worldwide, royalty-free
+patent license under the contributor's essential patent claims, to
+make, use, sell, offer for sale, import and otherwise run, modify and
+propagate the contents of its contributor version.
+
+ In the following three paragraphs, a "patent license" is any express
+agreement or commitment, however denominated, not to enforce a patent
+(such as an express permission to practice a patent or covenant not to
+sue for patent infringement). To "grant" such a patent license to a
+party means to make such an agreement or commitment not to enforce a
+patent against the party.
+
+ If you convey a covered work, knowingly relying on a patent license,
+and the Corresponding Source of the work is not available for anyone
+to copy, free of charge and under the terms of this License, through a
+publicly available network server or other readily accessible means,
+then you must either (1) cause the Corresponding Source to be so
+available, or (2) arrange to deprive yourself of the benefit of the
+patent license for this particular work, or (3) arrange, in a manner
+consistent with the requirements of this License, to extend the patent
+license to downstream recipients. "Knowingly relying" means you have
+actual knowledge that, but for the patent license, your conveying the
+covered work in a country, or your recipient's use of the covered work
+in a country, would infringe one or more identifiable patents in that
+country that you have reason to believe are valid.
+
+ If, pursuant to or in connection with a single transaction or
+arrangement, you convey, or propagate by procuring conveyance of, a
+covered work, and grant a patent license to some of the parties
+receiving the covered work authorizing them to use, propagate, modify
+or convey a specific copy of the covered work, then the patent license
+you grant is automatically extended to all recipients of the covered
+work and works based on it.
+
+ A patent license is "discriminatory" if it does not include within
+the scope of its coverage, prohibits the exercise of, or is
+conditioned on the non-exercise of one or more of the rights that are
+specifically granted under this License. You may not convey a covered
+work if you are a party to an arrangement with a third party that is
+in the business of distributing software, under which you make payment
+to the third party based on the extent of your activity of conveying
+the work, and under which the third party grants, to any of the
+parties who would receive the covered work from you, a discriminatory
+patent license (a) in connection with copies of the covered work
+conveyed by you (or copies made from those copies), or (b) primarily
+for and in connection with specific products or compilations that
+contain the covered work, unless you entered into that arrangement,
+or that patent license was granted, prior to 28 March 2007.
+
+ Nothing in this License shall be construed as excluding or limiting
+any implied license or other defenses to infringement that may
+otherwise be available to you under applicable patent law.
+
+ 12. No Surrender of Others' Freedom.
+
+ If conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License. If you cannot convey a
+covered work so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you may
+not convey it at all. For example, if you agree to terms that obligate you
+to collect a royalty for further conveying from those to whom you convey
+the Program, the only way you could satisfy both those terms and this
+License would be to refrain entirely from conveying the Program.
+
+ 13. Use with the GNU Affero General Public License.
+
+ Notwithstanding any other provision of this License, you have
+permission to link or combine any covered work with a work licensed
+under version 3 of the GNU Affero General Public License into a single
+combined work, and to convey the resulting work. The terms of this
+License will continue to apply to the part which is the covered work,
+but the special requirements of the GNU Affero General Public License,
+section 13, concerning interaction through a network will apply to the
+combination as such.
+
+ 14. Revised Versions of this License.
+
+ The Free Software Foundation may publish revised and/or new versions of
+the GNU General Public License from time to time. Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+ Each version is given a distinguishing version number. If the
+Program specifies that a certain numbered version of the GNU General
+Public License "or any later version" applies to it, you have the
+option of following the terms and conditions either of that numbered
+version or of any later version published by the Free Software
+Foundation. If the Program does not specify a version number of the
+GNU General Public License, you may choose any version ever published
+by the Free Software Foundation.
+
+ If the Program specifies that a proxy can decide which future
+versions of the GNU General Public License can be used, that proxy's
+public statement of acceptance of a version permanently authorizes you
+to choose that version for the Program.
+
+ Later license versions may give you additional or different
+permissions. However, no additional obligations are imposed on any
+author or copyright holder as a result of your choosing to follow a
+later version.
+
+ 15. Disclaimer of Warranty.
+
+ THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
+APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
+HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
+OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
+THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
+IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
+ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
+
+ 16. Limitation of Liability.
+
+ IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
+THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
+GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
+USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
+DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
+PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
+EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
+SUCH DAMAGES.
+
+ 17. Interpretation of Sections 15 and 16.
+
+ If the disclaimer of warranty and limitation of liability provided
+above cannot be given local legal effect according to their terms,
+reviewing courts shall apply local law that most closely approximates
+an absolute waiver of all civil liability in connection with the
+Program, unless a warranty or assumption of liability accompanies a
+copy of the Program in return for a fee.
+
+ END OF TERMS AND CONDITIONS
+
+ How to Apply These Terms to Your New Programs
+
+ If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+ To do so, attach the following notices to the program. It is safest
+to attach them to the start of each source file to most effectively
+state the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+ <one line to give the program's name and a brief idea of what it does.>
+ Copyright (C) <year> <name of author>
+
+ This program is free software: you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation, either version 3 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
+
+Also add information on how to contact you by electronic and paper mail.
+
+ If the program does terminal interaction, make it output a short
+notice like this when it starts in an interactive mode:
+
+ <program> Copyright (C) <year> <name of author>
+ This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+ This is free software, and you are welcome to redistribute it
+ under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License. Of course, your program's commands
+might be different; for a GUI interface, you would use an "about box".
+
+ You should also get your employer (if you work as a programmer) or school,
+if any, to sign a "copyright disclaimer" for the program, if necessary.
+For more information on this, and how to apply and follow the GNU GPL, see
+<http://www.gnu.org/licenses/>.
+
+ The GNU General Public License does not permit incorporating your program
+into proprietary programs. If your program is a subroutine library, you
+may consider it more useful to permit linking proprietary applications with
+the library. If this is what you want to do, use the GNU Lesser General
+Public License instead of this License. But first, please read
+<http://www.gnu.org/philosophy/why-not-lgpl.html>.
diff --git a/third_party/webgraph-big-3.5.0/JavaBig.pdf b/third_party/webgraph-big-3.5.0/JavaBig.pdf
new file mode 100644
index 0000000..514ffc8
Binary files /dev/null and b/third_party/webgraph-big-3.5.0/JavaBig.pdf differ
diff --git a/third_party/webgraph-big-3.5.0/build.properties b/third_party/webgraph-big-3.5.0/build.properties
new file mode 100644
index 0000000..15c86df
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/build.properties
@@ -0,0 +1,38 @@
+version=3.5.0
+
+jar.base=/usr/share/java
+javadoc.base=/usr/share/javadoc
+
+src=src
+dist=dist
+test=test
+slow=slow
+reports=reports
+coverage=coverage
+checkstyle=checkstyle
+docs=docs
+build=build
+instrumented=instr
+
+# Whenever it is necessary to add a new jar to the project, the following
+# data must be updated:
+
+# 1) the list of local javadocs
+# 2) the list of remote javadocs
+# 3) the list of javadocs referenced by the javadoc target
+# 4) the list of jars in the findbugs target
+
+j2se.apiurl=http://download.oracle.com/javase/6/docs/api/
+fastutil.apiurl=http://fastutil.di.unimi.it/docs/
+webgraph.apiurl=http://webgraph.di.unimi.it/docs/
+dsiutils.apiurl=http://dsiutils.di.unimi.it/docs/
+sux4j.apiurl=http://sux4j.di.unimi.it/docs/
+colt.apiurl=http://acs.lbl.gov/ACSSoftware/colt/api/
+jsap.apiurl=http://www.martiansoftware.com/jsap/doc/javadoc/
+junit.apiurl=http://junit.sourceforge.net/javadoc_40/
+slf4j.apiurl=http://www.slf4j.org/apidocs/
+commons-configuration.apiurl=http://commons.apache.org/configuration/apidocs/
+commons-io.apiurl=http://commons.apache.org/proper/commons-io/javadocs/api-release/
+commons-lang.apiurl=http://commons.apache.org/proper/commons-lang/javadocs/api-release/
+commons-collections.apiurl=http://commons.apache.org/proper/commons-collections/javadocs/api-release/
+guava.apiurl=http://google.github.io/guava/releases/19.0/api/docs/
diff --git a/third_party/webgraph-big-3.5.0/build.xml b/third_party/webgraph-big-3.5.0/build.xml
new file mode 100644
index 0000000..0903d0d
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/build.xml
@@ -0,0 +1,304 @@
+<project name="webgraph-big" default="jar" basedir="." xmlns:ivy="antlib:org.apache.ivy.ant" xmlns:artifact="antlib:org.apache.maven.artifact.ant">
+
+ <property name="build.sysclasspath" value="ignore"/>
+ <property name="jars.dir" value="${basedir}/jars"/>
+ <property file="build.properties"/>
+
+ <property environment="env"/>
+
+ <property name="ivy.pom.version" value="${version}" />
+ <condition property="ivy.settings.file" value="${env.LOCAL_IVY_SETTINGS}"><isset property="env.LOCAL_IVY_SETTINGS"/></condition>
+
+ <taskdef resource="org/apache/ivy/ant/antlib.xml" uri="antlib:org.apache.ivy.ant"/>
+
+ <target name="ivy-setupjars" description="Downloads dependencies with ivy and generates a report">
+ <ivy:retrieve symlink="true" sync="true" pattern="${jars.dir}/[conf]/[artifact].[ext]"/>
+ </target>
+
+ <target name="ivy-clean" description="Cleans ivy cache, jars dir and ivy installation">
+ <delete dir="${jars.dir}"/>
+ </target>
+
+ <target name="ivy-pom" description="Creates POM">
+ <ivy:resolve/>
+ <ivy:deliver deliverpattern="${dist}/ivy.xml" pubrevision="${version}" status="release"/>
+ <ivy:makepom ivyfile="${dist}/ivy.xml" templatefile="pom-model.xml" pomfile="pom.xml">
+ <dependency group="ch.qos.logback" artifact="logback-classic.jar" optional="true"/>
+ </ivy:makepom>
+ </target>
+
+ <path id="compile.classpath">
+ <fileset dir="${jars.dir}/compile"/>
+ </path>
+ <path id="test.classpath">
+ <fileset dir="${jars.dir}/test"/>
+ </path>
+ <path id="project.classpath">
+ <fileset dir="${jars.dir}/runtime"/>
+ </path>
+
+ <!-- ************************************** WARNING: MAVEN SH*T ************************************** -->
+
+ <!-- define Maven coordinates -->
+ <property name="groupId" value="it.unimi.dsi" />
+ <property name="artifactId" value="webgraph-big" />
+ <property name="version" value="${version}" />
+
+ <!-- define artifacts' name, which follows the convention of Maven -->
+ <property name="maven-jar" value="${dist}/lib/${artifactId}-${version}.jar" />
+ <property name="maven-javadoc-jar" value="${dist}/lib/${artifactId}-${version}-javadoc.jar" />
+ <property name="maven-sources-jar" value="${dist}/lib/${artifactId}-${version}-sources.jar" />
+
+ <!-- define Maven snapshots and staging repository ids and URLs -->
+ <property name="maven-snapshots-repository-id" value="sonatype-nexus-snapshots" />
+ <property name="maven-snapshots-repository-url" value="https://oss.sonatype.org/content/repositories/snapshots/" />
+ <property name="maven-staging-repository-id" value="sonatype-nexus-staging" />
+ <property name="maven-staging-repository-url" value="https://oss.sonatype.org/service/local/staging/deploy/maven2/" />
+
+ <target name="dist" depends="compile,javadoc" description="generate the distribution">
+
+ <!-- build the main artifact -->
+ <jar jarfile="${maven-jar}" basedir="${build}" />
+
+ <!-- build the javadoc artifact (from symbolic link created in init) -->
+ <jar jarfile="${maven-javadoc-jar}">
+ <fileset dir="${dist}/javadoc" />
+ </jar>
+
+ <!-- build the sources artifact -->
+ <jar jarfile="${maven-sources-jar}">
+ <fileset dir="." includes="CHANGES,COPYING,build.xml,build.properties,ivy.xml,${src}/**/*.java,${src}/**/*.html,${test}/**/*.java,${slow}/**/*.java"/>
+ </jar>
+ </target>
+
+ <target name="deploy" depends="dist,ivy-pom" description="deploy snapshot version to Maven snapshot repository">
+ <artifact:mvn>
+ <arg value="org.apache.maven.plugins:maven-deploy-plugin:2.6:deploy-file" />
+ <arg value="-Durl=${maven-snapshots-repository-url}" />
+ <arg value="-DrepositoryId=${maven-snapshots-repository-id}" />
+ <arg value="-DpomFile=pom.xml" />
+ <arg value="-Dfile=${maven-jar}" />
+ </artifact:mvn>
+ </target>
+
+ <target name="stage" depends="dist,ivy-pom" description="deploy release version to Maven staging repository">
+ <!-- sign and deploy the main artifact -->
+ <artifact:mvn>
+ <arg value="org.apache.maven.plugins:maven-gpg-plugin:1.3:sign-and-deploy-file" />
+ <arg value="-Durl=${maven-staging-repository-url}" />
+ <arg value="-DrepositoryId=${maven-staging-repository-id}" />
+ <arg value="-DpomFile=pom.xml" />
+ <arg value="-Dfile=${maven-jar}" />
+ <arg value="-Pgpg" />
+ </artifact:mvn>
+
+ <!-- sign and deploy the sources artifact -->
+ <artifact:mvn>
+ <arg value="org.apache.maven.plugins:maven-gpg-plugin:1.3:sign-and-deploy-file" />
+ <arg value="-Durl=${maven-staging-repository-url}" />
+ <arg value="-DrepositoryId=${maven-staging-repository-id}" />
+ <arg value="-DpomFile=pom.xml" />
+ <arg value="-Dfile=${maven-sources-jar}" />
+ <arg value="-Dclassifier=sources" />
+ <arg value="-Pgpg" />
+ </artifact:mvn>
+
+ <!-- sign and deploy the javadoc artifact -->
+ <artifact:mvn>
+ <arg value="org.apache.maven.plugins:maven-gpg-plugin:1.3:sign-and-deploy-file" />
+ <arg value="-Durl=${maven-staging-repository-url}" />
+ <arg value="-DrepositoryId=${maven-staging-repository-id}" />
+ <arg value="-DpomFile=pom.xml" />
+ <arg value="-Dfile=${maven-javadoc-jar}" />
+ <arg value="-Dclassifier=javadoc" />
+ <arg value="-Pgpg" />
+ </artifact:mvn>
+ </target>
+
+ <!-- ************************************** END OF MAVEN SH*T ************************************** -->
+
+ <property name="subdir" value=""/>
+
+ <!-- ************ SOURCE ********************* -->
+ <target name="init">
+ <available property="ivy.set.up" file="${jars.dir}"/>
+ <fail message="It appears that Ivy has not been set up properly. Please run &quot;ant ivy-setupjars&quot; and try again." unless="ivy.set.up"/>
+ <mkdir dir="${dist}"/>
+ <mkdir dir="${build}"/>
+ <mkdir dir="${docs}"/>
+ <mkdir dir="${reports}"/>
+ <mkdir dir="${coverage}"/>
+ <mkdir dir="${instrumented}"/>
+ <symlink link="${dist}/javadoc" resource="../${docs}" overwrite="true"/>
+ </target>
+
+ <target name="compile" depends="init" description="Compile sources (without tests)">
+ <javac srcdir="${src}" debug="on" optimize="on" destdir="${build}" encoding="UTF-8" source="1.8" target="1.8" classpathref="compile.classpath"/>
+ </target>
+
+ <target name="compile-tests" depends="init" description="Compile sources (with tests)">
+ <javac srcdir="${src}:${test}:${slow}" debug="on" optimize="on" destdir="${build}" encoding="UTF-8" source="1.8" target="1.8" classpathref="test.classpath"/>
+ </target>
+
+ <target name="jar" depends="compile" description="Creates jar (without tests)">
+ <jar jarfile="webgraph-big-${version}.jar">
+ <fileset dir="${build}"/>
+ </jar>
+ </target>
+
+ <target name="jar-tests" depends="compile-tests" description="Creates jar (with tests)">
+ <jar jarfile="webgraph-big-${version}.jar">
+ <fileset dir="${build}"/>
+ </jar>
+ </target>
+
+ <!-- ************ JAVADOC ********************* -->
+ <target name="javadoc" description="Generates documentation">
+ <delete dir="${docs}"/>
+ <mkdir dir="${docs}"/>
+ <javadoc destdir="${docs}"
+ encoding="UTF-8"
+ sourcepath="${src}"
+ packagenames="it.unimi.dsi.big.webgraph.*"
+ private="off"
+ overview="${src}/overview.html"
+ source="1.8"
+ windowtitle="WebGraph (big) ${version}"
+ classpathref="compile.classpath">
+ <link href="${j2se.apiurl}"/>
+ <link href="${fastutil.apiurl}"/>
+ <link href="${dsiutils.apiurl}"/>
+ <link href="${webgraph.apiurl}"/>
+ <link href="${sux4j.apiurl}"/>
+ <link href="${slf4j.apiurl}"/>
+ <link href="${colt.apiurl}"/>
+ <link href="${jsap.apiurl}"/>
+ <link href="${junit.apiurl}"/>
+ <link href="${commons-io.apiurl}"/>
+ <link href="${commons-lang.apiurl}"/>
+ <link href="${commons-configuration.apiurl}"/>
+ <link href="${commons-collections.apiurl}"/>
+ </javadoc>
+ </target>
+
+ <target name="junit" depends="instrument" description="Runs JUnit tests">
+
+ <junit printsummary="yes" fork="yes" haltonfailure="off" haltonerror="off">
+ <classpath>
+ <path refid="test.classpath" />
+ <pathelement location="${instrumented}/classes"/>
+ <pathelement location="${build}"/>
+ <pathelement location="${src}"/>
+ <pathelement location="${test}"/>
+ <pathelement location="${slow}"/>
+ </classpath>
+
+ <assertions><enable/></assertions>
+
+ <jvmarg value="-Demma.coverage.out.file=${coverage}/coverage.emma" />
+ <jvmarg value="-Demma.coverage.out.merge=true" />
+ <jvmarg value="-Xmx1G" />
+
+ <formatter type="xml"/>
+ <formatter type="plain"/>
+
+ <batchtest fork="yes" todir="${reports}">
+ <fileset dir="${instrumented}/classes">
+ <include name="it/unimi/dsi/big/webgraph/**/*Test.class"/>
+ <exclude name="it/unimi/dsi/big/webgraph/**/*SlowTest.class"/>
+ <exclude name="it/unimi/dsi/big/webgraph/test/*"/>
+ </fileset>
+ </batchtest>
+ </junit>
+
+ <junitreport todir="reports">
+ <fileset dir="reports">
+ <include name="TEST-*.xml"/>
+ </fileset>
+ <report todir="reports/html"/>
+ </junitreport>
+
+ <emma>
+ <report sourcepath="${src}" >
+ <fileset file="${coverage}/*a"/>
+ <html outfile="coverage.html" />
+ <xml outfile="${coverage}/coverage.xml" />
+ </report>
+ </emma>
+ </target>
+
+ <target name="junit-slow" depends="instrument" description="Runs slow JUnit tests">
+
+ <junit printsummary="yes" fork="yes" haltonfailure="off" haltonerror="off">
+ <classpath>
+ <path refid="test.classpath" />
+ <pathelement location="${instrumented}/classes"/>
+ <pathelement location="${build}"/>
+ <pathelement location="${src}"/>
+ <pathelement location="${test}"/>
+ <pathelement location="${slow}"/>
+ </classpath>
+
+ <assertions><enable/></assertions>
+
+ <jvmarg value="-Demma.coverage.out.file=${coverage}/coverage.emma" />
+ <jvmarg value="-Demma.coverage.out.merge=true" />
+ <jvmarg value="-Xmx120G" />
+
+ <formatter type="xml"/>
+ <formatter type="plain"/>
+
+ <batchtest fork="yes" todir="${reports}">
+ <fileset dir="${instrumented}/classes">
+ <include name="it/unimi/dsi/big/webgraph/**/*SlowTest.class"/>
+ </fileset>
+ </batchtest>
+ </junit>
+
+ <junitreport todir="reports">
+ <fileset dir="reports">
+ <include name="TEST-*.xml"/>
+ </fileset>
+ <report todir="reports/html"/>
+ </junitreport>
+
+ <emma>
+ <report sourcepath="${src}" >
+ <fileset file="${coverage}/*a"/>
+ <html outfile="coverage.html" />
+ <xml outfile="${coverage}/coverage.xml" />
+ </report>
+ </emma>
+ </target>
+
+
+ <target name="instrument" depends="compile-tests" description="Generate instrumented classes">
+ <taskdef resource="emma_ant.properties" classpathref="test.classpath"/>
+ <emma>
+ <instr mode="fullcopy"
+ outdir="${instrumented}"
+ merge="no"
+ metadatafile="${coverage}/metadata.emma"
+ instrpath="${build}"
+ >
+ <filter excludes="*Test*"/>
+ </instr>
+ </emma>
+ </target>
+
+ <!-- ************ CLEAN ********************* -->
+ <target name="clean">
+ <delete dir="${dist}"/>
+ <delete dir="${build}"/>
+ <delete dir="${reports}"/>
+ <delete dir="${coverage}"/>
+ <delete dir="${instrumented}"/>
+ <delete dir="${docs}"/>
+ <delete>
+ <fileset dir="." includes="*.jar"/>
+ </delete>
+ </target>
+</project>
+
diff --git a/third_party/webgraph-big-3.5.0/ivy.xml b/third_party/webgraph-big-3.5.0/ivy.xml
new file mode 100644
index 0000000..8821805
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/ivy.xml
@@ -0,0 +1,25 @@
+<?xml version="1.0" encoding="ISO-8859-1"?>
+<ivy-module version="2.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://ant.apache.org/ivy/schemas/ivy.xsd">
+ <info organisation="it.unimi.dsi" module="webgraph-big"/>
+
+ <configurations defaultconf="compile" defaultconfmapping="*->default">
+ <conf name="compile"/>
+ <conf name="runtime" extends="compile"/>
+ <conf name="test" extends="runtime"/>
+ </configurations>
+
+ <dependencies>
+
+ <dependency org="it.unimi.dsi" name="fastutil" rev="latest.release" />
+ <dependency org="it.unimi.dsi" name="sux4j" rev="latest.release" />
+ <dependency org="it.unimi.dsi" name="dsiutils" rev="latest.release"/>
+ <dependency org="it.unimi.dsi" name="webgraph" rev="latest.release" />
+ <dependency org="com.martiansoftware" name="jsap" rev="latest.release"/>
+ <dependency org="junit" name="junit" rev="latest.release" conf="test"/>
+ <dependency org="emma" name="emma" rev="latest.release" conf="test"/>
+ <dependency org="emma" name="emma_ant" rev="latest.release" conf="test"/>
+
+ <dependency org="ch.qos.logback" name="logback-classic" rev="latest.release" conf="runtime"/>
+ <dependency org="commons-configuration" name="commons-configuration" rev="latest.release"/>
+ </dependencies>
+</ivy-module>
diff --git a/third_party/webgraph-big-3.5.0/pom-model.xml b/third_party/webgraph-big-3.5.0/pom-model.xml
new file mode 100644
index 0000000..d452034
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/pom-model.xml
@@ -0,0 +1,36 @@
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
+ <modelVersion>4.0.0</modelVersion>
+ <groupId>it.unimi.dsi</groupId>
+ <artifactId>${ivy.pom.artifactId}</artifactId>
+ <packaging>jar</packaging>
+ <name>WebGraph (big)</name>
+ <version>${ivy.pom.version}</version>
+ <description>WebGraph is a framework for studying the web graph. It provides simple ways to manage very large graphs, exploiting modern compression techniques. The big version is a fork of the original WebGraph that can handle more than 2^31 nodes.</description>
+ <url>http://webgraph.dsi.unimi.it/</url>
+ <licenses>
+ <license>
+ <name>GNU General Public License Version 3+</name>
+ <url>http://www.gnu.org/licenses/gpl.html</url>
+ <distribution>repo</distribution>
+ </license>
+ </licenses>
+ <scm>
+ <connection>scm:git:git://github.com/vigna/WebGraph.git</connection>
+ <url>https://github.com/vigna/WebGraph</url>
+ </scm>
+ <developers>
+
+ <developer>
+ <id>boldi</id>
+ <name>Paolo Boldi</name>
+ <email>boldi@dsi.unimi.it</email>
+ </developer>
+
+ <developer>
+ <id>vigna</id>
+ <name>Sebastiano Vigna</name>
+ <email>vigna@dsi.unimi.it</email>
+ </developer>
+
+ </developers>
+</project>
diff --git a/third_party/webgraph-big-3.5.0/slow/it/unimi/dsi/big/webgraph/BVGraphSlowTest.java b/third_party/webgraph-big-3.5.0/slow/it/unimi/dsi/big/webgraph/BVGraphSlowTest.java
new file mode 100644
index 0000000..4fd6808
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/slow/it/unimi/dsi/big/webgraph/BVGraphSlowTest.java
@@ -0,0 +1,98 @@
+package it.unimi.dsi.big.webgraph;
+
+/*
+ * Copyright (C) 2011-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.NoSuchElementException;
+
+import org.junit.Test;
+
+public class BVGraphSlowTest extends WebGraphTestCase {
+
+ protected final static class BigGraph extends ImmutableSequentialGraph {
+ private final long numNodes;
+ private final long outdegree;
+ private final int step;
+
+ public BigGraph(long numNodes, long outdegree, int step) {
+ if (outdegree * step > numNodes) throw new IllegalArgumentException();
+ this.numNodes = numNodes;
+ this.outdegree = outdegree;
+ this.step = step;
+ }
+
+ public BigGraph(long outdegree, int step) {
+ this(outdegree * step, outdegree, step);
+ }
+
+ @Override
+ public long numNodes() {
+ return numNodes;
+ }
+
+ @Override
+ public NodeIterator nodeIterator(long from) {
+ return new NodeIterator() {
+ long next = from;
+ @Override
+ public boolean hasNext() {
+ return next < numNodes();
+ }
+
+ @Override
+ public long nextLong() {
+ if (! hasNext()) throw new NoSuchElementException();
+ return next++;
+ }
+
+ @Override
+ public long outdegree() {
+ return next < 2 ? outdegree : 2;
+ }
+
+ @Override
+ public LazyLongIterator successors() {
+ if (next >= 2) return LazyLongIterators.wrap(new long[] { next - 2, next - 1 });
+ else return new AbstractLazyLongIterator() {
+ public long i = 0;
+ @Override
+ public long nextLong() {
+ if (i == outdegree) return -1;
+ else return i++ * step;
+ }
+ };
+ }
+ };
+ }
+ }
+
+ @Test
+ public void testStore() throws IOException {
+ final ImmutableGraph graph = new BigGraph(3L << 31, 1L << 30, 4);
+ File basename = File.createTempFile(BVGraphTest.class.getSimpleName(), "test");
+ BVGraph.store(graph, basename.toString());
+ assertEquals(graph, BVGraph.load(basename.toString()));
+ assertEquals(graph, BVGraph.loadMapped(basename.toString()));
+ assertEquals(graph, BVGraph.loadOffline(basename.toString()));
+ deleteGraph(basename);
+ }
+}
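The synthetic `BigGraph` above is defined only through its node iterator: node 0 points to 0, step, 2 * step, ..., (outdegree - 1) * step, and every other node v points to v - 1 and to itself. The standalone sketch below restates that successor rule for small parameters; `BigGraphSketch` and its methods are hypothetical names, and the code does not use the WebGraph API. It may help when reasoning about what `testStore()` compresses and reloads: with the parameters used there, both the node count and the arc count exceed 2^31.

```java
import java.util.Arrays;

/** Hypothetical helper mirroring the successor rule of BVGraphSlowTest.BigGraph (no WebGraph dependency). */
final class BigGraphSketch {
    private BigGraphSketch() {}

    /** Successors of node v, following the same rule as BigGraph's node iterator. */
    static long[] successors(long numNodes, long outdegree, int step, long v) {
        if (outdegree * step > numNodes) throw new IllegalArgumentException("outdegree * step must not exceed numNodes");
        if (v < 0 || v >= numNodes) throw new IllegalArgumentException("no such node: " + v);
        // Every node except 0 points to its predecessor and to itself.
        if (v >= 1) return new long[] { v - 1, v };
        // Node 0 points to 0, step, 2 * step, ..., (outdegree - 1) * step (small outdegree only in this sketch).
        final long[] s = new long[Math.toIntExact(outdegree)];
        for (int i = 0; i < s.length; i++) s[i] = (long) i * step;
        return s;
    }

    /** Number of arcs: outdegree arcs leave node 0, two arcs leave every other node. */
    static long numArcs(long numNodes, long outdegree) {
        return outdegree + 2 * (numNodes - 1);
    }

    public static void main(String[] args) {
        for (long v = 0; v < 4; v++) System.out.println(v + " -> " + Arrays.toString(successors(12, 3, 4, v)));
        System.out.println("arcs with the testStore() parameters: " + numArcs(3L << 31, 1L << 30));
    }
}
```

Only `numArcs` is cheap for the real test parameters; materializing the successors of node 0 there would need 2^30 entries, which is why the original class streams them lazily.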
diff --git a/third_party/webgraph-big-3.5.0/slow/it/unimi/dsi/big/webgraph/ConnectedComponentsSlowTest.java b/third_party/webgraph-big-3.5.0/slow/it/unimi/dsi/big/webgraph/ConnectedComponentsSlowTest.java
new file mode 100644
index 0000000..238187a
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/slow/it/unimi/dsi/big/webgraph/ConnectedComponentsSlowTest.java
@@ -0,0 +1,39 @@
+package it.unimi.dsi.big.webgraph;
+
+
+/*
+ * Copyright (C) 2011-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import it.unimi.dsi.big.webgraph.algo.ConnectedComponentsTest;
+import it.unimi.dsi.webgraph.Transform;
+
+import java.io.IOException;
+
+import org.junit.Test;
+
+
+public class ConnectedComponentsSlowTest extends WebGraphTestCase {
+
+ @Test
+ public void testLarge() throws IOException {
+ String path = getGraphPath("cnr-2000");
+ ImmutableGraph g = ImmutableGraph.wrap(Transform.symmetrize(it.unimi.dsi.webgraph.ImmutableGraph.load(path)));
+ ConnectedComponentsTest.sameComponents(g);
+ }
+}
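Since the test symmetrizes cnr-2000 before computing components, what it checks are in effect the weakly connected components of the original directed graph. A common way to compute those directly is union-find over the arcs with their direction ignored; the following is a hypothetical standalone sketch of that technique (the class name `UnionFindComponents` is an assumption), not the library's `ConnectedComponents` implementation.

```java
import java.util.Arrays;

/** Hypothetical union-find sketch: connected components of a directed arc list, ignoring direction. */
final class UnionFindComponents {
    private final int[] parent;

    private UnionFindComponents(int n) {
        parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;
    }

    private int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]]; // path halving
            x = parent[x];
        }
        return x;
    }

    /** Returns, for each node, the smallest node of its (weakly) connected component. */
    static int[] components(int n, int[][] arcs) {
        UnionFindComponents uf = new UnionFindComponents(n);
        for (int[] a : arcs) {
            int r0 = uf.find(a[0]), r1 = uf.find(a[1]);
            // Linking the larger root under the smaller one keeps every root equal to its tree's minimum.
            if (r0 != r1) uf.parent[Math.max(r0, r1)] = Math.min(r0, r1);
        }
        int[] label = new int[n];
        for (int v = 0; v < n; v++) label[v] = uf.find(v);
        return label;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(components(5, new int[][] { { 0, 1 }, { 3, 1 }, { 4, 4 } })));
    }
}
```

Labelling each component by its smallest node makes two labellings directly comparable, which is the kind of check a helper like `sameComponents` has to perform.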
diff --git a/third_party/webgraph-big-3.5.0/slow/it/unimi/dsi/big/webgraph/EstimateEffectiveDiameterSlowTest.java b/third_party/webgraph-big-3.5.0/slow/it/unimi/dsi/big/webgraph/EstimateEffectiveDiameterSlowTest.java
new file mode 100644
index 0000000..99ad519
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/slow/it/unimi/dsi/big/webgraph/EstimateEffectiveDiameterSlowTest.java
@@ -0,0 +1,45 @@
+package it.unimi.dsi.big.webgraph;
+
+/*
+ * Copyright (C) 2010-2017 Paolo Boldi & Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.big.webgraph.ImmutableGraph;
+import it.unimi.dsi.big.webgraph.WebGraphTestCase;
+import it.unimi.dsi.big.webgraph.algo.HyperBall;
+import it.unimi.dsi.webgraph.algo.NeighbourhoodFunction;
+
+import java.io.IOException;
+
+import org.junit.Test;
+
+
+public class EstimateEffectiveDiameterSlowTest extends WebGraphTestCase {
+
+ @Test
+ public void testLarge() throws IOException {
+ String path = getGraphPath("cnr-2000");
+ ImmutableGraph g = ImmutableGraph.load(path);
+ final HyperBall hyperBall = new HyperBall(g, 8, 0);
+ hyperBall.run(Integer.MAX_VALUE, -1);
+ assertEquals(NeighbourhoodFunction.effectiveDiameter(.9, HyperBallSlowTest.cnr2000NF), NeighbourhoodFunction.effectiveDiameter(.9, hyperBall.neighbourhoodFunction.toDoubleArray()), 1);
+ hyperBall.close();
+ deleteGraph(path);
+ }
+}
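The assertion compares the effective diameters at alpha = 0.9 computed from the exact and the estimated neighbourhood functions, allowing an absolute error of 1. As an illustration, assuming the usual interpolated definition (the smallest distance at which the cumulative pair count reaches a fraction alpha of its final value, linearly interpolated between consecutive distances), a standalone computation might look like this; `EffectiveDiameterSketch` is a hypothetical name, and the library's `NeighbourhoodFunction.effectiveDiameter` may differ in edge-case handling.

```java
/** Hypothetical standalone sketch of an interpolated effective diameter. */
final class EffectiveDiameterSketch {
    private EffectiveDiameterSketch() {}

    /**
     * nf[t] is the number of pairs (x, y) with d(x, y) at most t; it must be nondecreasing
     * and its last entry must be positive.
     */
    static double effectiveDiameter(double alpha, double[] nf) {
        if (alpha <= 0 || alpha > 1) throw new IllegalArgumentException("alpha must be in (0, 1]");
        final double threshold = alpha * nf[nf.length - 1];
        int t = 0;
        while (nf[t] < threshold) t++; // terminates: the last entry is at least the threshold
        if (t == 0) return 0;
        // Linear interpolation between distances t - 1 and t (nf[t] > nf[t - 1] here).
        return (t - 1) + (threshold - nf[t - 1]) / (nf[t] - nf[t - 1]);
    }

    public static void main(String[] args) {
        System.out.println(effectiveDiameter(.9, new double[] { 10, 40, 90, 100 }));
    }
}
```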
diff --git a/third_party/webgraph-big-3.5.0/slow/it/unimi/dsi/big/webgraph/HyperBallSlowTest.java b/third_party/webgraph-big-3.5.0/slow/it/unimi/dsi/big/webgraph/HyperBallSlowTest.java
new file mode 100644
index 0000000..a26ed0c
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/slow/it/unimi/dsi/big/webgraph/HyperBallSlowTest.java
@@ -0,0 +1,76 @@
+package it.unimi.dsi.big.webgraph;
+
+
+/*
+ * Copyright (C) 2010-2017 Paolo Boldi & Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import static org.junit.Assert.assertTrue;
+import it.unimi.dsi.big.webgraph.algo.HyperBall;
+import it.unimi.dsi.big.webgraph.algo.HyperBallTest;
+import it.unimi.dsi.big.webgraph.algo.SequentialHyperBall;
+import it.unimi.dsi.webgraph.algo.NeighbourhoodFunction;
+
+import java.io.IOException;
+
+import org.junit.Test;
+
+
+public class HyperBallSlowTest extends WebGraphTestCase {
+
+ /** The true (i.e., exactly computed by {@link NeighbourhoodFunction}) neighbourhood function of <code>cnr-2000</code>. */
+ public final static double[] cnr2000NF = { 325557.0, 3454267.0, 3.4531824E7, 1.5878699E8, 6.83926525E8, 1.190460703E9, 1.604430414E9, 2.35307782E9, 2.997067429E9, 3.968809803E9, 5.058079643E9,
+ 6.421976049E9, 8.284517654E9, 1.0243847731E10, 1.2607757915E10, 1.5228803201E10, 1.7747396141E10, 1.9909476778E10, 2.221766255E10, 2.4379845882E10, 2.6311779701E10, 2.8107451664E10,
+ 2.9665243165E10, 3.0951071763E10, 3.218581841E10, 3.3215135972E10, 3.4149034335E10, 3.4932882223E10, 3.5364851538E10, 3.5931189753E10, 3.6281498738E10, 3.6560429256E10, 3.6817190941E10,
+ 3.6998241145E10, 3.7125032189E10, 3.7214125718E10, 3.7278637339E10, 3.7317211025E10, 3.7344441435E10, 3.7363743739E10, 3.7376116159E10, 3.7386091516E10, 3.7393988067E10, 3.7401055259E10,
+ 3.740755634E10, 3.7413358276E10, 3.7418706947E10, 3.7423579858E10, 3.7427946736E10, 3.7431862349E10, 3.7435354797E10, 3.7438438086E10, 3.7441057447E10, 3.7443233065E10, 3.7445170896E10,
+ 3.7446818612E10, 3.7448244469E10, 3.7449425939E10, 3.745045924E10, 3.7451366966E10, 3.7452151719E10, 3.7452841271E10, 3.7453422635E10, 3.7453918161E10, 3.7454357668E10, 3.7454740726E10,
+ 3.7455030057E10, 3.745523956E10, 3.7455417775E10, 3.7455555869E10, 3.7455655899E10, 3.7455728404E10, 3.7455776324E10, 3.7455807203E10, 3.7455827683E10, 3.7455839892E10, 3.7455845502E10,
+ 3.7455848208E10, 3.7455850151E10, 3.745585096E10, 3.7455851388E10, 3.7455851633E10, 3.7455851773E10, 3.7455851833E10, 3.7455851843E10 };
+
+ @Test
+ public void testLarge() throws IOException {
+ String path = getGraphPath("cnr-2000");
+ ImmutableGraph g = ImmutableGraph.load(path);
+ int correct[] = new int[cnr2000NF.length];
+ final int limit = cnr2000NF.length;
+ for(int log2m: new int[] { 4, 7 }) {
+ final double rsd = HyperBall.relativeStandardDeviation(log2m);
+ for(int attempt = 0; attempt < 10; attempt++) {
+ HyperBall hyperBall = new HyperBall(g, attempt % 3 == 0 ? ImmutableGraph.wrap(it.unimi.dsi.webgraph.Transform.transpose(ImmutableGraph.wrap(g))) : null, log2m, null, 0, 0, 0, attempt % 2 != 0, false, false, null, attempt);
+ SequentialHyperBall sequentialHyperBall = new SequentialHyperBall(g, log2m, null, attempt);
+ hyperBall.init();
+ sequentialHyperBall.init();
+ for(int i = 1; i < limit; i++) {
+ System.err.println("log2m: " + log2m + " attempt: " + attempt + " round: " + i);
+ hyperBall.iterate();
+ final double current = hyperBall.neighbourhoodFunction.getDouble(hyperBall.neighbourhoodFunction.size() - 1);
+ final double sequentialCurrent = sequentialHyperBall.iterate();
+ HyperBallTest.assertState(g.numNodes(), log2m, sequentialHyperBall.registers(), hyperBall.registers());
+ HyperBallTest.assertRelativeError(sequentialCurrent, current, HyperBallTest.THRESHOLD);
+ if (Math.abs(cnr2000NF[i] - current) <= cnr2000NF[i] * 2 * rsd) correct[i]++;
+ }
+ hyperBall.close();
+ sequentialHyperBall.close();
+ }
+ for(int i = 1; i < limit; i++) assertTrue(correct[i] + " < " + 9, correct[i] >= 9);
+ }
+ deleteGraph(path);
+ }
+
+}
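For every distance, the test counts how many runs produce an estimate within two relative standard deviations of the exact value in `cnr2000NF`, and requires at least nine hits. Note that `correct` is not reset between the two values of `log2m`, so the second check is cumulative. The sketch below restates that criterion; it assumes the textbook HyperLogLog approximation rsd = 1.04 / sqrt(m) with m = 2^log2m registers, which may not match `HyperBall.relativeStandardDeviation` exactly, and all names in it are hypothetical.

```java
/** Hypothetical restatement of the acceptance check used in HyperBallSlowTest. */
final class HyperBallToleranceSketch {
    private HyperBallToleranceSketch() {}

    /** Textbook HyperLogLog approximation: about 1.04 / sqrt(m) for m = 2^log2m registers. */
    static double approximateRsd(int log2m) {
        return 1.04 / Math.sqrt(1L << log2m);
    }

    /** True if the estimate lies within two relative standard deviations of the exact value. */
    static boolean withinTwoRsd(double exact, double estimate, double rsd) {
        return Math.abs(exact - estimate) <= exact * 2 * rsd;
    }

    /** True if at least minHits of the estimates are within tolerance, as the test requires. */
    static boolean enoughHits(double exact, double[] estimates, double rsd, int minHits) {
        int hits = 0;
        for (double e : estimates) if (withinTwoRsd(exact, e, rsd)) hits++;
        return hits >= minHits;
    }

    public static void main(String[] args) {
        for (int log2m : new int[] { 4, 7 })
            System.out.printf("log2m=%d approximate rsd=%.4f%n", log2m, approximateRsd(log2m));
    }
}
```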
diff --git a/third_party/webgraph-big-3.5.0/slow/it/unimi/dsi/big/webgraph/StronglyConnectedComponentsSlowTest.java b/third_party/webgraph-big-3.5.0/slow/it/unimi/dsi/big/webgraph/StronglyConnectedComponentsSlowTest.java
new file mode 100644
index 0000000..1a634a2
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/slow/it/unimi/dsi/big/webgraph/StronglyConnectedComponentsSlowTest.java
@@ -0,0 +1,26 @@
+package it.unimi.dsi.big.webgraph;
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.big.webgraph.algo.StronglyConnectedComponentsTarjan;
+import it.unimi.dsi.big.webgraph.algo.StronglyConnectedComponents;
+import it.unimi.dsi.big.webgraph.algo.StronglyConnectedComponentsTest;
+import it.unimi.dsi.logging.ProgressLogger;
+
+import java.io.IOException;
+
+import org.junit.Test;
+
+public class StronglyConnectedComponentsSlowTest extends WebGraphTestCase {
+
+ @Test
+ public void testLarge() throws IOException {
+ String path = getGraphPath("cnr-2000");
+ ImmutableGraph g = ImmutableGraph.load(path);
+ final StronglyConnectedComponentsTarjan componentsRecursive = StronglyConnectedComponentsTarjan.compute(g, true, new ProgressLogger());
+ final StronglyConnectedComponents componentsIterative = StronglyConnectedComponents.compute(g, true, new ProgressLogger());
+ assertEquals(componentsRecursive.numberOfComponents, componentsIterative.numberOfComponents);
+ StronglyConnectedComponentsTest.sameComponents(g.numNodes(), componentsRecursive, componentsIterative);
+ deleteGraph(path);
+ }
+
+}
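The test checks that the recursive Tarjan implementation and the iterative one find the same components of cnr-2000. For reference, a compact recursive version of Tarjan's algorithm on an in-memory `int[][]` adjacency list could look like the sketch below; `TarjanSketch` is a hypothetical name and this is not the code of `StronglyConnectedComponentsTarjan`. Its recursion depth grows with the length of the longest DFS path, which is one reason an iterative variant is preferable on large graphs.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Hypothetical recursive Tarjan sketch on an int[][] adjacency list. */
final class TarjanSketch {
    private final int[][] succ;
    private final int[] index, low, component;
    private final boolean[] onStack;
    private final Deque<Integer> stack = new ArrayDeque<>();
    private int nextIndex = 0, numberOfComponents = 0;

    private TarjanSketch(int[][] succ) {
        this.succ = succ;
        int n = succ.length;
        index = new int[n];
        low = new int[n];
        component = new int[n];
        onStack = new boolean[n];
        Arrays.fill(index, -1); // -1 marks unvisited nodes
    }

    /** Returns the component label of each node; labels are 0, 1, ... in order of completion. */
    static int[] components(int[][] succ) {
        TarjanSketch t = new TarjanSketch(succ);
        for (int v = 0; v < succ.length; v++) if (t.index[v] == -1) t.visit(v);
        return t.component;
    }

    private void visit(int v) {
        index[v] = low[v] = nextIndex++;
        stack.push(v);
        onStack[v] = true;
        for (int w : succ[v]) {
            if (index[w] == -1) {
                visit(w);
                low[v] = Math.min(low[v], low[w]);
            } else if (onStack[w]) low[v] = Math.min(low[v], index[w]);
        }
        if (low[v] == index[v]) { // v is the root of a component: pop the whole component
            int w;
            do {
                w = stack.pop();
                onStack[w] = false;
                component[w] = numberOfComponents;
            } while (w != v);
            numberOfComponents++;
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(components(new int[][] { { 1 }, { 2 }, { 0, 3 }, {} })));
    }
}
```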
diff --git a/third_party/webgraph-big-3.5.0/slow/it/unimi/dsi/big/webgraph/TransformSlowTest.java b/third_party/webgraph-big-3.5.0/slow/it/unimi/dsi/big/webgraph/TransformSlowTest.java
new file mode 100644
index 0000000..acc4ad4
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/slow/it/unimi/dsi/big/webgraph/TransformSlowTest.java
@@ -0,0 +1,66 @@
+package it.unimi.dsi.big.webgraph;
+
+/*
+ * Copyright (C) 2011-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.Util;
+import it.unimi.dsi.big.webgraph.Transform.ArcFilter;
+import it.unimi.dsi.fastutil.longs.LongBigArrays;
+import it.unimi.dsi.util.XorShift1024StarRandom;
+
+import java.io.IOException;
+
+import org.junit.Test;
+
+public class TransformSlowTest extends WebGraphTestCase {
+
+ @Test
+ public void testTranspose() throws IOException {
+ final ImmutableGraph graph = new BVGraphSlowTest.BigGraph(3L << 28, 1 << 2);
+ assertEquals(graph, Transform.transposeOffline(Transform.transposeOffline(graph, 1000000000), 1000000000));
+ }
+
+ @Test
+ public void testSymmetrize() throws IOException {
+ final ImmutableGraph graph = new BVGraphSlowTest.BigGraph(3L << 28, 1 << 2);
+ assertEquals(Transform.symmetrizeOffline(graph, 1000000000), Transform.symmetrizeOffline(Transform.symmetrizeOffline(graph, 1000000000), 1000000000));
+ assertEquals(Transform.symmetrizeOffline(graph, 1000000000), Transform.symmetrizeOffline(Transform.transposeOffline(graph, 1000000000), 1000000000));
+ }
+
+ @Test
+ public void testMap() throws IOException {
+ final ImmutableGraph graph = new BVGraphSlowTest.BigGraph((2L << 20) + 1, 1 << 10);
+ final long[][] perm = Util.identity(graph.numNodes());
+ LongBigArrays.shuffle(perm, new XorShift1024StarRandom(0));
+ final long[][] inv = Util.invertPermutation(perm);
+ assertEquals(graph, Transform.mapOffline(Transform.mapOffline(graph, perm, 1000000000), inv, 1000000000));
+ }
+
+ @Test
+ public void testFilter() {
+ final ImmutableGraph graph = new BVGraphSlowTest.BigGraph((2L << 20) + 1, 1 << 10);
+ // Just testing that the basic implementation is OK.
+ assertEquals(graph, Transform.filterArcs(graph, new ArcFilter() {
+ @Override
+ public boolean accept(long i, long t) {
+ return true;
+ }
+ }));
+ }
+}
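These tests rely on two identities: transposing twice gives back the original graph, and renaming each node x as perm[x] and then applying the inverse permutation restores every arc. A small in-memory sketch over explicit arc lists makes them concrete. It is hypothetical throughout: `TransformSketch` and its methods are made-up names, it uses plain `long[]` arrays and a `java.util.Random` Fisher-Yates shuffle instead of fastutil big arrays and `XorShift1024StarRandom`, and it does none of the offline batching that `Transform`'s `*Offline` methods perform.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

/** Hypothetical in-memory sketch of transpose, permutation mapping and inversion on arc lists. */
final class TransformSketch {
    private TransformSketch() {}

    /** Returns the arcs sorted lexicographically, so that graphs can be compared arc by arc. */
    static long[][] sorted(long[][] arcs) {
        long[][] copy = arcs.clone(); // shallow copy: inner arrays are never modified
        Arrays.sort(copy, Comparator.<long[]>comparingLong(a -> a[0]).thenComparingLong(a -> a[1]));
        return copy;
    }

    /** Reverses every arc. */
    static long[][] transpose(long[][] arcs) {
        long[][] t = new long[arcs.length][];
        for (int i = 0; i < arcs.length; i++) t[i] = new long[] { arcs[i][1], arcs[i][0] };
        return sorted(t);
    }

    /** Renames node x as perm[x] in every arc. */
    static long[][] map(long[][] arcs, long[] perm) {
        long[][] m = new long[arcs.length][];
        for (int i = 0; i < arcs.length; i++) m[i] = new long[] { perm[(int) arcs[i][0]], perm[(int) arcs[i][1]] };
        return sorted(m);
    }

    /** Returns inv such that inv[perm[x]] == x. */
    static long[] invert(long[] perm) {
        long[] inv = new long[perm.length];
        for (int x = 0; x < perm.length; x++) inv[(int) perm[x]] = x;
        return inv;
    }

    /** Fisher-Yates shuffle of the identity permutation on n nodes. */
    static long[] randomPermutation(int n, long seed) {
        long[] p = new long[n];
        for (int i = 0; i < n; i++) p[i] = i;
        Random r = new Random(seed);
        for (int i = n - 1; i > 0; i--) {
            int j = r.nextInt(i + 1);
            long tmp = p[i];
            p[i] = p[j];
            p[j] = tmp;
        }
        return p;
    }

    public static void main(String[] args) {
        long[][] arcs = { { 0, 1 }, { 1, 2 }, { 2, 0 }, { 2, 2 } };
        long[] perm = randomPermutation(3, 0);
        System.out.println(Arrays.deepToString(map(map(arcs, perm), invert(perm))));
    }
}
```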
diff --git a/third_party/webgraph-big-3.5.0/slow/it/unimi/dsi/big/webgraph/cnr-2000.graph b/third_party/webgraph-big-3.5.0/slow/it/unimi/dsi/big/webgraph/cnr-2000.graph
new file mode 100644
index 0000000..94cd2ac
Binary files /dev/null and b/third_party/webgraph-big-3.5.0/slow/it/unimi/dsi/big/webgraph/cnr-2000.graph differ
diff --git a/third_party/webgraph-big-3.5.0/slow/it/unimi/dsi/big/webgraph/cnr-2000.graph-txt.gz b/third_party/webgraph-big-3.5.0/slow/it/unimi/dsi/big/webgraph/cnr-2000.graph-txt.gz
new file mode 100644
index 0000000..09dcae2
Binary files /dev/null and b/third_party/webgraph-big-3.5.0/slow/it/unimi/dsi/big/webgraph/cnr-2000.graph-txt.gz differ
diff --git a/third_party/webgraph-big-3.5.0/slow/it/unimi/dsi/big/webgraph/cnr-2000.offsets b/third_party/webgraph-big-3.5.0/slow/it/unimi/dsi/big/webgraph/cnr-2000.offsets
new file mode 100644
index 0000000..55f5ca0
Binary files /dev/null and b/third_party/webgraph-big-3.5.0/slow/it/unimi/dsi/big/webgraph/cnr-2000.offsets differ
diff --git a/third_party/webgraph-big-3.5.0/slow/it/unimi/dsi/big/webgraph/cnr-2000.properties b/third_party/webgraph-big-3.5.0/slow/it/unimi/dsi/big/webgraph/cnr-2000.properties
new file mode 100644
index 0000000..deca8dd
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/slow/it/unimi/dsi/big/webgraph/cnr-2000.properties
@@ -0,0 +1,15 @@
+#BVGraph properties
+#Mon Apr 03 14:58:33 CEST 2006
+bitspernode=35.15
+arcs=3216152
+nodes=325557
+graphclass=it.unimi.dsi.big.webgraph.BVGraph
+maxrefcount=3
+windowsize=7
+minintervallength=3
+bitsperlink=3.56
+avgdist=1.74
+compressionflags=
+version=0
+avgref=1.38
+zetak=3
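The properties file records the size and compression statistics of the test graph. Assuming that `bitspernode` and `bitsperlink` are the same total bit count divided by the number of nodes and by the number of arcs, respectively (an interpretation of the key names, not something stated in this file), their ratio should match the average outdegree arcs / nodes, roughly 9.88 here. The hypothetical `GraphPropertiesSketch` below checks that with `java.util.Properties`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical sanity check on BVGraph-style property files. */
final class GraphPropertiesSketch {
    private GraphPropertiesSketch() {}

    static Properties parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with a StringReader
        }
        return p;
    }

    /** Average outdegree, i.e., arcs / nodes. */
    static double averageOutdegree(Properties p) {
        return Double.parseDouble(p.getProperty("arcs")) / Double.parseDouble(p.getProperty("nodes"));
    }

    /** bitspernode / bitsperlink; approximates the average outdegree under the assumption stated above. */
    static double bitsRatio(Properties p) {
        return Double.parseDouble(p.getProperty("bitspernode")) / Double.parseDouble(p.getProperty("bitsperlink"));
    }

    public static void main(String[] args) {
        Properties p = parse("arcs=3216152\nnodes=325557\nbitspernode=35.15\nbitsperlink=3.56\n");
        System.out.printf("average outdegree %.3f, bits ratio %.3f%n", averageOutdegree(p), bitsRatio(p));
    }
}
```

The two values agree only up to the two-decimal rounding of the stored statistics.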
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/ASCIIGraph.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/ASCIIGraph.java
new file mode 100644
index 0000000..3b24ecc
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/ASCIIGraph.java
@@ -0,0 +1,304 @@
+package it.unimi.dsi.big.webgraph;
+
+/*
+ * Copyright (C) 2003-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.io.FastBufferedOutputStream;
+import it.unimi.dsi.fastutil.longs.LongBigArrayBigList;
+import it.unimi.dsi.io.FastBufferedReader;
+import it.unimi.dsi.lang.MutableString;
+import it.unimi.dsi.lang.ObjectParser;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+
+import java.io.BufferedReader;
+import java.io.FileOutputStream;
+import java.io.FileReader;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.io.PrintStream;
+import java.io.StreamTokenizer;
+import java.lang.reflect.InvocationTargetException;
+import java.util.NoSuchElementException;
+import java.util.concurrent.TimeUnit;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Charsets;
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+
+/** An {@link ImmutableGraph} that corresponds to graphs stored in a human-readable
+ * ASCII format where each line contains the list of successors of a given node.
+ *
+ * <p>The file format is as follows: the graph is stored in a file named <code><var>basename</var>.graph-txt</code>.
+ * The first line contains the number of nodes, <var>n</var>. Then, <var>n</var> lines follow, the <var>i</var>-th
+ * line containing the successors of node <var>i</var> in increasing order
+ * (nodes are numbered from 0 to <var>n</var>&minus;1).
+ * Successors are separated by a single space.
+ *
+ * <P>Unlike other classes, the load methods of this class <strong>do not always return instances of this class</strong>.
+ * In particular, {@link #loadOffline(CharSequence)} and {@link #loadOnce(InputStream)} <em>will</em> return an instance of this class for
+ * offline access. Such an instance does not provide random access; sequential access is backed by
+ * the original text file, and only one array of successors is loaded in core memory at any time.
+ *
+ * <p>The {@link #load(CharSequence)} method, on the other hand, will return a {@linkplain
+ * ImmutableGraph#wrap(it.unimi.dsi.webgraph.ImmutableGraph) wrapped} instance of
+ * {@link it.unimi.dsi.webgraph.ArrayListMutableGraph} built by copying a {@linkplain
+ * ImmutableGraph#wrap(ImmutableGraph) wrapped} offline instance of this class.
+ *
+ * <h2>Using {@link ASCIIGraph} to convert your data</h2>
+ *
+ * <p>A simple (albeit rather inefficient) way to import data into WebGraph is using ASCII graphs. Suppose you
+ * create the following file, named <code>example.graph-txt</code>:
+ * <pre>
+ * 2
+ * 1
+ * 0 1
+ * </pre>
+ * Then, the command
+ * <pre>
+ * java it.unimi.dsi.webgraph.BVGraph -g ASCIIGraph example bvexample
+ * </pre>
+ * will produce a compressed graph in {@link it.unimi.dsi.big.webgraph.BVGraph} format
+ * with basename <code>bvexample</code>. Even more convenient is the {@link #loadOnce(InputStream)}
+ * method, which reads an ASCII graph from an input stream and exposes it for a single traversal. It
+ * can be used, for instance, with the main method of {@link it.unimi.dsi.big.webgraph.BVGraph} to
+ * compress on the fly an ASCII graph produced by some other program. The previous
+ * example could then be rewritten as
+ * <pre>
+ * java it.unimi.dsi.webgraph.BVGraph -1 -g ASCIIGraph dummy bvexample &lt;example.graph-txt
+ * </pre>
+ */
+
+
+public class ASCIIGraph extends ImmutableSequentialGraph {
+ /** The standard extension of an ASCII graph. */
+ private static final String ASCII_GRAPH_EXTENSION = ".graph-txt";
+
+ private static final Logger LOGGER = LoggerFactory.getLogger(ASCIIGraph.class);
+
+ /** Number of nodes. */
+ private final long n;
+ /** The file containing the graph, or <code>null</code> for a read-once ASCII graph. */
+ private final CharSequence graphFile;
+ /** A fast buffered reader containing the description of an ASCII graph (except for the number of nodes) for a read-once ASCII graph; <code>null</code>, otherwise. */
+ private final FastBufferedReader fbr;
+
+ protected ASCIIGraph(final CharSequence graphFile) throws NumberFormatException, IOException {
+ this.graphFile = graphFile;
+
+ final BufferedReader bufferedReader = new BufferedReader(new FileReader(graphFile.toString() + ASCII_GRAPH_EXTENSION));
+ n = Long.parseLong(bufferedReader.readLine());
+ bufferedReader.close();
+ fbr = null;
+ if (n < 0) throw new IllegalArgumentException("Number of nodes must be nonnegative");
+ }
+
+ /** Creates a read-once ASCII graph. Instances created using this constructor can
+ * only be accessed through a single call to {@link #nodeIterator(long)}.
+ *
+ * @param is an input stream containing an ASCII graph.
+ */
+
+ public ASCIIGraph(final InputStream is) throws NumberFormatException, IOException {
+ graphFile = null;
+ fbr = new FastBufferedReader(new InputStreamReader(is, "ASCII"));
+ n = Long.parseLong(fbr.readLine(new MutableString()).toString());
+ if (n < 0) throw new IllegalArgumentException("Number of nodes must be nonnegative");
+ }
+
+ @Override
+ public long numNodes() {
+ return n;
+ }
+
+ @Override
+ public NodeIterator nodeIterator(final long from) {
+ if (from < 0 || from > n) throw new IllegalArgumentException();
+ try {
+ final FastBufferedReader fbr = this.fbr != null ? this.fbr : new FastBufferedReader(new FileReader(graphFile + ASCII_GRAPH_EXTENSION));
+ final MutableString s = new MutableString();
+ // Skip the first from adjacency lines; for a file-backed scan, also skip the header line (for a read-once scan, the constructor has already consumed it).
+ for (long i = from + (this.fbr != null ? 0 : 1); i-- != 0;)
+ fbr.readLine(s);
+
+ final StreamTokenizer st = new StreamTokenizer(fbr);
+ st.eolIsSignificant(true);
+ st.parseNumbers();
+
+ return new NodeIterator() {
+ long i = from;
+
+ LongBigArrayBigList successors = new LongBigArrayBigList();
+
+ @Override
+ public boolean hasNext() {
+ return i < n;
+ }
+
+ @Override
+ public long[][] successorBigArray() {
+ return successors.elements();
+ }
+
+ @Override
+ public long nextLong() {
+ if (! hasNext()) throw new NoSuchElementException();
+ successors.clear();
+ int tokenType;
+ long dep;
+
+ try {
+ do {
+ tokenType = st.nextToken();
+ if (tokenType == StreamTokenizer.TT_NUMBER) {
+ successors.add(dep = (long)st.nval);
+ if (dep < 0 || dep >= n)
+ throw new IOException("The value " + dep + " is not a node index at line " + st.lineno());
+ }
+ else if (tokenType != StreamTokenizer.TT_EOL) {
+ throw new IOException("Unexpected token " + st.toString());
+ }
+ } while (tokenType != StreamTokenizer.TT_EOL);
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+
+ return i++;
+ }
+
+ @Override
+ public long outdegree() {
+ return successors.size64();
+ }
+
+ };
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+ @Deprecated
+ public static ImmutableGraph loadSequential(CharSequence basename) throws IOException {
+ return loadOffline(basename);
+ }
+
+ @Deprecated
+ public static ASCIIGraph loadSequential(CharSequence basename, ProgressLogger unused) throws IOException {
+ return loadOffline(basename, unused);
+ }
+
+ public static ASCIIGraph loadOffline(CharSequence basename) throws IOException {
+ return loadOffline(basename, (ProgressLogger)null);
+ }
+
+ public static ASCIIGraph loadOffline(CharSequence basename, ProgressLogger unused) throws IOException {
+ return new ASCIIGraph(basename);
+ }
+
+ public static ASCIIGraph loadMapped(CharSequence basename) throws IOException {
+ return loadOffline(basename);
+ }
+
+ public static ASCIIGraph loadMapped(CharSequence basename, ProgressLogger unused) throws IOException {
+ return loadOffline(basename);
+ }
+
+ public static ASCIIGraph loadOnce(final InputStream is) throws IOException {
+ return new ASCIIGraph(is);
+ }
+
+ public static ImmutableGraph load(CharSequence basename) throws IOException {
+ return load(basename, (ProgressLogger)null);
+ }
+
+ public static ImmutableGraph load(CharSequence basename, ProgressLogger unused) throws IOException {
+ return ImmutableGraph.wrap(new ArrayListMutableGraph(ImmutableGraph.wrap(loadOffline(basename))).immutableView());
+ }
+
+ public static void store(ImmutableGraph graph, CharSequence basename, @SuppressWarnings("unused") ProgressLogger unused) throws IOException {
+ store(graph, basename);
+ }
+
+ public static void store(ImmutableGraph graph, CharSequence basename) throws IOException {
+ final PrintStream ps = new PrintStream(new FastBufferedOutputStream(new FileOutputStream(basename + ASCII_GRAPH_EXTENSION)), false, Charsets.US_ASCII.toString());
+ long n = graph.numNodes();
+ LazyLongIterator successors;
+
+ ps.println(n);
+ for (NodeIterator nodeIterator = graph.nodeIterator(); nodeIterator.hasNext();) {
+ nodeIterator.nextLong();
+ long d = nodeIterator.outdegree();
+ successors = nodeIterator.successors();
+ while (d-- != 0) ps.print(successors.nextLong() + " ");
+ ps.println();
+ }
+ ps.close();
+ }
+
+ public static void main(String args[]) throws IllegalArgumentException, SecurityException, IllegalAccessException, InvocationTargetException, NoSuchMethodException, IOException, JSAPException, ClassNotFoundException, InstantiationException {
+ String sourceBasename, destBasename;
+ Class<?> graphClass;
+
+ SimpleJSAP jsap = new SimpleJSAP(ASCIIGraph.class.getName(), "Reads a graph with a given basename, or a given spec, and writes it out in ASCII format with another basename",
+ new Parameter[] {
+ new FlaggedOption("graphClass", GraphClassParser.getParser(), null, JSAP.NOT_REQUIRED, 'g', "graph-class", "Forces a Java class for the source graph"),
+ new Switch("spec", 's', "spec", "The source is not a basename but rather a spec of the form ImmutableGraphClass(arg,arg,...)."),
+ new FlaggedOption("logInterval", JSAP.LONG_PARSER, Long.toString(ProgressLogger.DEFAULT_LOG_INTERVAL), JSAP.NOT_REQUIRED, 'l', "log-interval", "The minimum time interval between activity logs in milliseconds."),
+ new UnflaggedOption("sourceBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the source graph, or a source spec if --spec was given."),
+ new UnflaggedOption("destBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the destination graph"),
+ }
+ );
+
+ JSAPResult jsapResult = jsap.parse(args);
+ if (jsap.messagePrinted()) return;
+
+ graphClass = jsapResult.getClass("graphClass");
+ sourceBasename = jsapResult.getString("sourceBasename");
+ destBasename = jsapResult.getString("destBasename");
+ final boolean spec = jsapResult.getBoolean("spec");
+
+ final ProgressLogger pl = new ProgressLogger(LOGGER, jsapResult.getLong("logInterval"), TimeUnit.MILLISECONDS);
+
+ if (graphClass != null && spec) {
+ System.err.println("Options --graph-class and --spec are incompatible");
+ return;
+ }
+
+ ImmutableGraph graph;
+ if (!spec)
+ graph = graphClass != null
+ ? (ImmutableGraph)graphClass.getMethod("loadOffline", CharSequence.class, ProgressLogger.class).invoke(null, sourceBasename, pl)
+ : ImmutableGraph.loadOffline(sourceBasename, pl);
+ else
+ graph = ObjectParser.fromSpec(sourceBasename, ImmutableGraph.class, GraphClassParser.PACKAGE);
+ ASCIIGraph.store(graph, destBasename);
+ }
+}
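The node iterator and the store method of ASCIIGraph above read and write a simple text format: a first line holding the number of nodes, then one line per node listing its successors separated by whitespace. The following is a minimal in-memory sketch of that format in plain Java. It assumes the whole graph fits in a String and in int-indexed arrays (unlike the big-graph classes), and AsciiGraphSketch, parse and format are hypothetical names that are not part of WebGraph.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical sketch of the ASCIIGraph text format; not part of WebGraph. */
final class AsciiGraphSketch {
    /** Parses a header line with the number of nodes, then one successor line per node. */
    static long[][] parse(String text) {
        try (BufferedReader r = new BufferedReader(new StringReader(text))) {
            String header = r.readLine();
            if (header == null) throw new IllegalArgumentException("Missing number of nodes");
            long n = Long.parseLong(header.trim());
            // The sketch uses a plain Java array, so it is limited to non-big graphs.
            if (n < 0 || n > Integer.MAX_VALUE) throw new IllegalArgumentException("Bad number of nodes: " + n);
            long[][] succ = new long[(int) n][];
            for (int x = 0; x < n; x++) {
                String line = r.readLine();
                if (line == null) throw new IllegalArgumentException("Missing successor line for node " + x);
                String trimmed = line.trim();
                // An empty line denotes a node with no successors.
                String[] tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
                succ[x] = new long[tokens.length];
                for (int i = 0; i < tokens.length; i++) {
                    long y = Long.parseLong(tokens[i]);
                    // Same range check as the iterator above; the header is line 1.
                    if (y < 0 || y >= n)
                        throw new IllegalArgumentException("The value " + y + " is not a node index at line " + (x + 2));
                    succ[x][i] = y;
                }
            }
            return succ;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes the same layout as store(): the node count, then successors each followed by a space. */
    static String format(long[][] succ) {
        StringBuilder sb = new StringBuilder();
        sb.append(succ.length).append('\n');
        for (long[] s : succ) {
            for (long y : s) sb.append(y).append(' ');
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

As in the original iterator, a missing line is an error, whereas an empty line denotes a node with no successors. Unlike store(), which uses PrintStream.println, the sketch always terminates lines with '\n'.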
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/AbstractLazyLongIterator.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/AbstractLazyLongIterator.java
new file mode 100644
index 0000000..4ef6763
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/AbstractLazyLongIterator.java
@@ -0,0 +1,33 @@
+package it.unimi.dsi.big.webgraph;
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+/** An abstract implementation of a lazy long iterator, implementing {@link #skip(long)}
+ * by repeated calls to {@link LazyLongIterator#nextLong() nextLong()}. */
+
+public abstract class AbstractLazyLongIterator implements LazyLongIterator {
+
+ @Override
+ public long skip(final long n) {
+ long i;
+ for(i = 0; i < n && nextLong() != -1; i++);
+ return i;
+ }
+
+}
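AbstractLazyLongIterator derives skip(long) from nextLong(), relying on nextLong() returning -1 once the iterator is exhausted. A self-contained sketch of the same idea follows; the interface LazyLongIter and the classes AbstractLazyLongIter and ArrayLazyIter are hypothetical stand-ins, not the WebGraph types.

```java
/** Hypothetical stand-in for a lazy iterator: nextLong() returns -1 when exhausted. */
interface LazyLongIter {
    long nextLong();
    long skip(long n);
}

/** Mirrors AbstractLazyLongIterator: skip(n) is derived from nextLong(). */
abstract class AbstractLazyLongIter implements LazyLongIter {
    @Override
    public long skip(final long n) {
        long i;
        // Stop early if the iterator runs out; the result is the number of elements actually skipped.
        for (i = 0; i < n && nextLong() != -1; i++);
        return i;
    }
}

/** A concrete iterator over an array of nonnegative longs. */
final class ArrayLazyIter extends AbstractLazyLongIter {
    private final long[] a;
    private int pos;

    ArrayLazyIter(long... a) { this.a = a; }

    @Override
    public long nextLong() { return pos < a.length ? a[pos++] : -1; }
}
```

Because skip() stops at the first -1, its return value can be smaller than the requested count, which tells the caller how many elements were really skipped.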
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/ArcListASCIIGraph.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/ArcListASCIIGraph.java
new file mode 100644
index 0000000..3e210e7
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/ArcListASCIIGraph.java
@@ -0,0 +1,345 @@
+package it.unimi.dsi.big.webgraph;
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.io.FastBufferedInputStream;
+import it.unimi.dsi.fastutil.io.FastBufferedOutputStream;
+import it.unimi.dsi.fastutil.longs.LongBigArrayBigList;
+import it.unimi.dsi.fastutil.longs.LongBigArrays;
+import it.unimi.dsi.io.FastBufferedReader;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+
+import java.io.FileInputStream;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.io.PrintStream;
+import java.io.StreamTokenizer;
+import java.lang.reflect.InvocationTargetException;
+import java.util.NoSuchElementException;
+import java.util.concurrent.TimeUnit;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Charsets;
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+
+/** An {@link ImmutableGraph} that corresponds to graphs stored in a human-readable
+ * ASCII format where each line contains an arc.
+ *
+ * <p>The file format is very simple: each line contains an arc specified as two nodes
+ * separated by whitespace (but we suggest exactly one TAB character). Sources must be in increasing
+ * order, but targets can be in any order. The {@linkplain #ArcListASCIIGraph(InputStream, int) constructor}
+ * provides an additional parameter, called <em>shift</em>, which will be added to
+ * all node indices. The default is 0, but for lists that number nodes starting from 1
+ * it can be set to -1. Alternatively, the class {@link ShiftedByOneArcListASCIIGraph} can be used in place
+ * of this class to set the shift to -1 without specifying additional parameters.
+ *
+ * <P>Unlike other classes, the load methods of this class <strong>do not always return instances of this class</strong>.
+ * In particular, {@link #loadOnce(InputStream)} <em>will</em> return an instance of this class for
+ * read-once access. The instance will not provide offline or random access, but read-once access will be backed by
+ * the original input stream and only the successors of a single node will be loaded in core memory at any time.
+ *
+ * <p>The {@link #load(CharSequence)} method, on the other hand, will return a
+ * {@linkplain ImmutableGraph#wrap(it.unimi.dsi.webgraph.ImmutableGraph) wrapped} instance of
+ * {@link it.unimi.dsi.webgraph.ArrayListMutableGraph} built by copying a
+ * {@linkplain ImmutableGraph#wrap(ImmutableGraph) wrapped} offline instance of this class: thus, it is limited
+ * to non-big graphs.
+ *
+ * <h2>Using {@link ArcListASCIIGraph} to convert your data</h2>
+ *
+ * <p>A simple (albeit rather inefficient) way to import data into WebGraph is using ASCII graphs specified by arc lists. Suppose you
+ * create the following file, named <code>example.arcs</code>:
+ * <pre>
+ * 0 1
+ * 1 2
+ * 2 1
+ * </pre>
+ * Then, the command
+ * <pre>
+ * java it.unimi.dsi.webgraph.BVGraph -g ArcListASCIIGraph example.arcs bvexample
+ * </pre>
+ * will produce a compressed graph in {@link it.unimi.dsi.big.webgraph.BVGraph} format
+ * with basename <code>bvexample</code>. Even more convenient, and much
+ * more efficient, is the {@link #loadOnce(InputStream)}
+ * method, which reads an arc-list ASCII graph from an input stream and exposes it for a single traversal. It
+ * can be used, for instance, with the main method of {@link it.unimi.dsi.big.webgraph.BVGraph} to
+ * compress on the fly an arc-list ASCII graph produced by some other means. The previous
+ * example could then be rewritten as
+ * <pre>
+ * java it.unimi.dsi.webgraph.BVGraph -1 -g ArcListASCIIGraph dummy bvexample &lt;example.arcs
+ * </pre>
+ *
+ */
+
+
+public class ArcListASCIIGraph extends ImmutableSequentialGraph {
+ private final static boolean DEBUG = false;
+ private static final Logger LOGGER = LoggerFactory.getLogger(ArcListASCIIGraph.class);
+
+ /** Number of nodes. */
+ private long n;
+ /** A fast buffered reader containing the arc list of a read-once arc-list ASCII graph. */
+ private final FastBufferedReader fbr;
+ /** The shift. All node numbers will be shifted by this value. */
+ private final long shift;
+
+ /** Creates a read-once arc-list ASCII graph. Instances created using this constructor can
+ * only be accessed using a single call to {@link #nodeIterator(long)}.
+ *
+ * @param is an input stream containing an arc-list ASCII graph.
+ * @param shift a shift that will be added to each node index.
+ */
+
+ public ArcListASCIIGraph(final InputStream is, final int shift) throws NumberFormatException, IOException {
+ this.shift = shift;
+ fbr = new FastBufferedReader(new InputStreamReader(is, "ASCII"));
+ n = -1;
+ }
+
+ @Override
+ public long numNodes() {
+ if (n == -1) throw new UnsupportedOperationException("The number of nodes is unknown (you need to complete a traversal)");
+ return n;
+ }
+
+ @Override
+ public NodeIterator nodeIterator(final long from) {
+ if (from < 0) throw new IllegalArgumentException();
+ try {
+ final StreamTokenizer st = new StreamTokenizer(fbr);
+ st.eolIsSignificant(true);
+ st.parseNumbers();
+
+ return new NodeIterator() {
+ /** The maximum node index we ever saw. */
+ long maxNodeSeen;
+ long following = -1;
+ long curr = -1;
+ boolean eof;
+
+ LongBigArrayBigList successors = new LongBigArrayBigList();
+
+ {
+ fillNextLine();
+ // ALERT: WRONG! This skips from lines, but does not skip up to node from!
+ for(long i = 0; i < from; i++) nextLong();
+ }
+
+
+ private void ensureNumberToken() {
+ if (st.ttype != StreamTokenizer.TT_NUMBER || st.nval != (long)st.nval) throw new IllegalArgumentException("Expected integer, found " + st.toString());
+ if ((long)st.nval + shift < 0) throw new IllegalArgumentException("Integer plus shift is negative: " + st.toString());
+ }
+
+ private void fillNextLine() throws IOException {
+ if (eof) return;
+ if (DEBUG) System.err.println("Filling next line (curr = " + curr + ", following = " + following +")");
+ successors.clear();
+ if (following == -1) {
+ while(st.nextToken() == StreamTokenizer.TT_EOL); // Skip empty lines
+ ensureNumberToken();
+ }
+ if (following > (long)st.nval + shift) throw new IllegalArgumentException("Source nodes must be sorted");
+ following = (long)st.nval + shift;
+ if (following > maxNodeSeen) maxNodeSeen = following;
+
+ if (DEBUG) System.err.println("New following node: " + following);
+ st.nextToken();
+ ensureNumberToken();
+ long successor = (long)st.nval + shift;
+ if (DEBUG) System.err.println("Adding successor " + successor);
+ successors.add(successor);
+ if (successor > maxNodeSeen) maxNodeSeen = successor;
+ st.nextToken();
+
+ for(;;) {
+ final int nextToken = st.nextToken();
+ if (nextToken == StreamTokenizer.TT_EOF) {
+ eof = true;
+ n = maxNodeSeen + 1;
+ break;
+ }
+ if (nextToken == StreamTokenizer.TT_EOL) continue; // Skip empty lines
+ ensureNumberToken();
+ if ((long)st.nval + shift != following) {
+ if (following > (long)st.nval + shift) throw new IllegalArgumentException("Source nodes must be sorted");
+ if (DEBUG) System.err.println("New source (" + (long)st.nval + "), breaking the loop...");
+ if ((long)st.nval + shift > maxNodeSeen) maxNodeSeen = (long)st.nval + shift;
+ break;
+ }
+ st.nextToken();
+ ensureNumberToken();
+ successor = (long)st.nval + shift;
+ if (DEBUG) System.err.println("Adding successor " + successor);
+ successors.add(successor);
+ if (successor > maxNodeSeen) maxNodeSeen = successor;
+ st.nextToken();
+ }
+
+ LongBigArrays.quickSort(successors.elements(), 0, successors.size64());
+ }
+
+ @Override
+ public boolean hasNext() {
+ return curr < maxNodeSeen;
+ }
+
+ @Override
+ public long[][] successorBigArray() {
+ if (curr == -1) throw new IllegalStateException();
+ return curr == following ? successors.elements() : LongBigArrays.EMPTY_BIG_ARRAY;
+ }
+
+ @Override
+ public final long nextLong() {
+ if (! hasNext()) throw new NoSuchElementException();
+ if (++curr > following) try {
+ fillNextLine();
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ return curr;
+ }
+
+ @Override
+ public long outdegree() {
+ if (curr == -1) throw new IllegalStateException();
+ return curr == following ? successors.size64() : 0;
+ }
+
+ };
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+ @Deprecated
+ public static ImmutableGraph loadSequential(CharSequence basename) throws IOException {
+ return load(basename);
+ }
+
+ @Deprecated
+ public static ImmutableGraph loadSequential(CharSequence basename, ProgressLogger unused) throws IOException {
+ return load(basename);
+ }
+
+ public static ImmutableGraph loadOffline(CharSequence basename) throws IOException {
+ return load(basename);
+ }
+
+ public static ImmutableGraph loadOffline(CharSequence basename, ProgressLogger unused) throws IOException {
+ return load(basename);
+ }
+
+ public static ImmutableGraph loadMapped(CharSequence basename) throws IOException {
+ return load(basename);
+ }
+
+ public static ImmutableGraph loadMapped(CharSequence basename, ProgressLogger unused) throws IOException {
+ return load(basename);
+ }
+
+ public static ArcListASCIIGraph loadOnce(final InputStream is) throws IOException {
+ return new ArcListASCIIGraph(is, 0);
+ }
+
+ public static ArcListASCIIGraph loadOnce(final InputStream is, final int shift) throws IOException {
+ return new ArcListASCIIGraph(is, shift);
+ }
+
+ public static ImmutableGraph load(CharSequence basename) throws IOException {
+ return load(basename, null);
+ }
+
+ public static ImmutableGraph load(CharSequence basename, ProgressLogger unused) throws IOException {
+ return ImmutableGraph.wrap(new ArrayListMutableGraph(ImmutableGraph.wrap(loadOnce(new FastBufferedInputStream(new FileInputStream(basename.toString()))))).immutableView());
+ }
+
+ public static void store(ImmutableGraph graph, CharSequence basename, @SuppressWarnings("unused") ProgressLogger unused) throws IOException {
+ store(graph, basename);
+ }
+
+
+ public static void store(final ImmutableGraph graph, final CharSequence basename) throws IOException {
+ store(graph, basename, 0);
+ }
+
+ /** Stores an arc-list ASCII graph with a given shift.
+ *
+ * @param graph a graph to be stored.
+ * @param basename the name of the output file.
+ * @param shift a shift that will be added to each node; note that this is the <em>opposite</em> of the shift that will
+ * have to be used to load the generated file.
+ */
+
+ public static void store(final ImmutableGraph graph, final CharSequence basename, final int shift) throws IOException {
+ final PrintStream ps = new PrintStream(new FastBufferedOutputStream(new FileOutputStream(basename.toString())), false, Charsets.US_ASCII.toString());
+ long d, s;
+ long[][] successor;
+ for (NodeIterator nodeIterator = graph.nodeIterator(); nodeIterator.hasNext();) {
+ s = nodeIterator.nextLong();
+ d = nodeIterator.outdegree();
+ successor = nodeIterator.successorBigArray();
+ for(long i = 0; i < d; i++) ps.println((s + shift) + "\t" + (LongBigArrays.get(successor, i) + shift));
+ }
+ ps.close();
+ }
+
+ public static void main(String args[]) throws IllegalArgumentException, SecurityException, IllegalAccessException, InvocationTargetException, NoSuchMethodException, IOException, JSAPException {
+ String sourceBasename, destBasename;
+ Class<?> graphClass;
+
+ SimpleJSAP jsap = new SimpleJSAP(ArcListASCIIGraph.class.getName(), "Reads a graph with a given basename and writes it out in ASCII format with another basename",
+ new Parameter[] {
+ new FlaggedOption("graphClass", GraphClassParser.getParser(), null, JSAP.NOT_REQUIRED, 'g', "graph-class", "Forces a Java class for the source graph"),
+ new FlaggedOption("shift", JSAP.INTEGER_PARSER, null, JSAP.NOT_REQUIRED, 'S', "shift", "A shift that will be added to each node index."),
+ new FlaggedOption("logInterval", JSAP.LONG_PARSER, Long.toString(ProgressLogger.DEFAULT_LOG_INTERVAL), JSAP.NOT_REQUIRED, 'l', "log-interval", "The minimum time interval between activity logs in milliseconds."),
+ new UnflaggedOption("sourceBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the source graph"),
+ new UnflaggedOption("destBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the destination graph"),
+ }
+ );
+
+ JSAPResult jsapResult = jsap.parse(args);
+ if (jsap.messagePrinted()) return;
+
+ graphClass = jsapResult.getClass("graphClass");
+ sourceBasename = jsapResult.getString("sourceBasename");
+ destBasename = jsapResult.getString("destBasename");
+
+ final ProgressLogger pl = new ProgressLogger(LOGGER, jsapResult.getLong("logInterval"), TimeUnit.MILLISECONDS);
+
+ final ImmutableGraph graph = graphClass != null
+ ? (ImmutableGraph)graphClass.getMethod("loadOffline", CharSequence.class, ProgressLogger.class).invoke(null, sourceBasename, pl)
+ : ImmutableGraph.loadOffline(sourceBasename, pl);
+ if (jsapResult.userSpecified("shift")) ArcListASCIIGraph.store(graph, destBasename, jsapResult.getInt("shift"));
+ else ArcListASCIIGraph.store(graph, destBasename);
+ }
+}
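The arc-list format accepted by ArcListASCIIGraph (one source/target pair per line, sources in nondecreasing order, a shift added to every index) can be illustrated with the following in-memory sketch. It is not the WebGraph implementation: ArcListSketch, parse and format are hypothetical names, the whole input is assumed to fit in a String, each nonempty line is assumed to hold exactly two integers (the StreamTokenizer-based parser above is more lenient about line breaks), and the graph must fit in int-indexed arrays.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the arc-list ASCII format; not the WebGraph API. */
final class ArcListSketch {
    /** Groups "source target" lines into successor lists, adding shift to every index. */
    static long[][] parse(String text, long shift) {
        List<List<Long>> succ = new ArrayList<>();
        long lastSource = -1;
        for (String line : text.split("\n")) {
            String t = line.trim();
            if (t.isEmpty()) continue; // empty lines are skipped, as in fillNextLine()
            String[] tok = t.split("\\s+");
            if (tok.length != 2) throw new IllegalArgumentException("Expected two integers: " + line);
            long s = Long.parseLong(tok[0]) + shift, d = Long.parseLong(tok[1]) + shift;
            if (s < 0 || d < 0) throw new IllegalArgumentException("Integer plus shift is negative: " + line);
            if (s < lastSource) throw new IllegalArgumentException("Source nodes must be sorted");
            lastSource = s;
            // Grow the list so both endpoints are valid nodes: n ends up as the maximum index seen plus one.
            while (succ.size() <= Math.max(s, d)) succ.add(new ArrayList<>());
            succ.get((int) s).add(d);
        }
        long[][] result = new long[succ.size()][];
        // Targets may appear in any order, so each successor list is sorted.
        for (int x = 0; x < result.length; x++)
            result[x] = succ.get(x).stream().mapToLong(Long::longValue).sorted().toArray();
        return result;
    }

    /** Writes one "source TAB target" line per arc, adding shift (the opposite of the load shift). */
    static String format(long[][] succ, long shift) {
        StringBuilder sb = new StringBuilder();
        for (int x = 0; x < succ.length; x++)
            for (long y : succ[x]) sb.append(x + shift).append('\t').append(y + shift).append('\n');
        return sb.toString();
    }
}
```

Applied with shift -1 to the 1-based arcs 1 2, 2 3 and 3 2, parse yields the graph of the example.arcs file in the class documentation (0 1, 1 2, 2 1); format with shift 1 turns it back into 1-based arcs, mirroring the remark that the storing shift is the opposite of the loading shift.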
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/BVGraph.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/BVGraph.java
new file mode 100644
index 0000000..a8b84d2
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/BVGraph.java
@@ -0,0 +1,2125 @@
+package it.unimi.dsi.big.webgraph;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.FileNotFoundException;
+import java.io.FileOutputStream;
+import java.io.FileWriter;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.PrintWriter;
+import java.lang.reflect.InvocationTargetException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+import java.math.RoundingMode;
+import java.nio.channels.FileChannel.MapMode;
+import java.text.DecimalFormat;
+import java.text.NumberFormat;
+import java.util.Formatter;
+import java.util.Locale;
+import java.util.NoSuchElementException;
+import java.util.Properties;
+import java.util.concurrent.TimeUnit;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/*
+ * Copyright (C) 2003-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.Util;
+import it.unimi.dsi.bits.Fast;
+import it.unimi.dsi.fastutil.BigArrays;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.io.FastMultiByteArrayInputStream;
+import it.unimi.dsi.fastutil.longs.LongArrayList;
+import it.unimi.dsi.fastutil.longs.LongArrays;
+import it.unimi.dsi.fastutil.longs.LongBigArrays;
+import it.unimi.dsi.fastutil.longs.LongBigList;
+import it.unimi.dsi.fastutil.longs.LongIterator;
+import it.unimi.dsi.io.ByteBufferInputStream;
+import it.unimi.dsi.io.InputBitStream;
+import it.unimi.dsi.io.NullOutputStream;
+import it.unimi.dsi.io.OutputBitStream;
+import it.unimi.dsi.lang.MutableString;
+import it.unimi.dsi.lang.ObjectParser;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.sux4j.util.EliasFanoMonotoneLongBigList;
+
+
+/** An immutable graph represented using the techniques described in
+ * &ldquo;<a href="http://vigna.dsi.unimi.it/papers.php#BoVWFI">The WebGraph Framework I: Compression Techniques</a>&rdquo;, by Paolo Boldi and
+ * Sebastiano Vigna, in <i>Proc&#46; of the Thirteenth World&ndash;Wide Web
+ * Conference</i>, pages 595&ndash;601, 2004, ACM Press.
+ *
+ * <P>This class provides a flexible and configurable way to store and
+ * access web graphs in a compressed form. Its main method can load an
+ * {@link ImmutableGraph} and compress it. The resulting compressed {@link
+ * BVGraph} is described by a <em>graph file</em> (with extension
+ * <code>.graph</code>), an <em>offset file</em> (with extension
+ * <code>.offsets</code>) and a <em>property file</em> (with extension
+ * <code>.properties</code>). The latter, not surprisingly, is a Java property file.
+ * Optionally, an <em>offset big-list file</em> (with extension
+ * <code>.obl</code>) can be created to load graphs faster.
+ *
+ * <p>As a rule of thumb, random access is faster using {@link #successors(long)}, whereas
+ * when iterating with a {@link NodeIterator} it is better to use {@link NodeIterator#successorBigArray()}.
+ *
+ * <h2>The Graph File</h2>
+ *
+ * <P>This class stores a graph as a <a href="http://dsiutils.dsi.unimi.it/docs/it/unimi/dsi/io/InputBitStream.html">bit stream</a>. The bit stream format
+ * depends on a number of parameters and encodings that can be mixed
+ * orthogonally. The parameters are:
+ *
+ * <ul>
+ *
+ * <li>the <em>window size</em>, a nonnegative integer;
+ * <li>the <em>maximum reference count</em>, a positive integer (it is meaningful only when the window is nonzero);
+ * <li>the <em>minimum interval length</em>, an integer larger than or equal to two, or 0, which is interpreted as infinity.
+ *
+ * </ul>
+ *
+ * <H3>Successor Lists</H3>
+ *
+ * <P>The graph file is a sequence of successor lists, one for each node.
+ * The list of node <var>x</var> can be thought of as a sequence of natural numbers (even though, as we will
+ * explain later, this sequence is further coded suitably as a sequence of bits):
+ * <OL STYLE="list-style-type: lower-alpha">
+ * <LI>The outdegree of the node; if it is zero, the list ends here.
+ * <LI>If the window size is not zero, the <em>reference part</em>, that is:
+ * <OL><LI>a nonnegative integer, the <em>reference</em>, which never exceeds the window size; if the reference
+ * is <var>r</var>, the list of successors will be specified as a modified version of the list of successors
+ * of <var>x</var>&minus;<var>r</var>; if <var>r</var> is 0, then the list of successors will be specified
+ * explicitly;
+ * <LI>if <var>r</var> is nonzero:
+ * <OL STYLE="list-style-type: lower-roman">
+ * <LI>a natural number <var>b</var>, the <em>block count</em>;
+ * <LI>a sequence of <var>b</var> natural numbers <var>B</var><sub>1</sub>, &hellip;, <var>B</var><sub>b</sub>, called the <em>copy-block list</em>; only the
+ * first number can be zero.
+ * </OL>
+ *
+ * </OL>
+ * <LI>Then comes the <em>extra part</em>, specifying some more entries that the list of successors contains (or all of them, if
+ * <var>r</var> is zero), that is:
+ * <OL>
+ * <LI>If the minimum interval length is finite,
+ * <OL STYLE="list-style-type: lower-roman">
+ * <LI>an integer <var>i</var>, the <em>interval count</em>;
+ * <LI>a sequence of <var>i</var> pairs, whose first component is the left extreme of an interval,
+ * and whose second component is the length of the interval (i.e., the number of integers contained in the interval).
+ * </OL>
+ * <li>Finally, the list of <em>residuals</em>, which contains all successors not specified by previous methods.
+ * </OL>
+ * </OL>
+ *
+ * <P>The above data should be interpreted as follows:
+ * <ul>
+ * <li>The reference part, if present (i.e., if both the window size and the reference are strictly positive), specifies
+ * that part of the list of successors of node <var>x</var>&minus;<var>r</var> should be copied; the successors of
+ * node <var>x</var>&minus;<var>r</var> that should be copied are described in the copy-block list; more precisely, one should copy
+ * the first <var>B</var><sub>1</sub> entries of this list, discard the next <var>B</var><sub>2</sub>, copy
+ * the next <var>B</var><sub>3</sub> etc. (the last remaining elements of the list of successors will be copied if <var>b</var> is
+ * even, and discarded if <var>b</var> is odd).
+ * <li>The extra part specifies additional successors (or all of them, if the reference part is absent); the extra part is not present
+ * if the number of successors that are to be copied according to the reference part already coincides with the outdegree of <var>x</var>;
+ * the successors listed in the extra part are given in two forms:
+ * <ul>
+ * <li>some of them are specified as belonging to (integer) intervals, if the minimum interval length is finite;
+ * the interval count indicates how many intervals,
+ * and the intervals themselves are listed as pairs (left extreme, length);
+ * <li>the residuals are the remaining "scattered" successors.
+ * </ul>
+ * </ul>
+ *
+ *
+ * <H3>How Successor Lists Are Coded</H3>
+ *
+ * <P>As we said before, the list of integers corresponding to each successor list should be coded into a sequence of bits.
+ * This is (ideally) done in two phases: we first modify the sequence in a suitable manner (as explained below) so as to obtain
+ * another sequence of integers (some of them might be negative). Then each single integer is coded, using a coding that can
+ * be specified as an option; the integers that may be negative are first turned into natural numbers using {@link Fast#int2nat(int)}.
+ *
+ * <OL>
+ * <LI>The outdegree of the node is left unchanged, as well as the reference and the block count;
+ * <LI>all blocks are decremented by 1, except for the first one;
+ * <LI>the interval count is left unchanged;
+ * <LI>all interval lengths are decremented by the minimum interval length;
+ * <LI>the first left extreme is expressed as its difference from <var>x</var> (it will be negative if the first extreme is
+ * less than <var>x</var>); the remaining left extremes are expressed as their distance from the previous right extreme
+ * plus 2 (e.g., if the interval is [5..11] and the previous one was [1..3], then the left extreme 5 is expressed as
+ * 5-(3+2)=5-5=0);
+ * <LI>the first residual is expressed as its difference from <var>x</var> (it will be negative if the first residual is
+ * less than <var>x</var>); the remaining residuals are expressed as decremented differences from the previous residual.
+ * </OL>
+ *
+ * <H2>The Offset File</H2>
+ *
+ * <P>Since the graph is stored as a bit stream, we must have some way to know where each successor list starts.
+ * This information is stored in the offset file, which contains the bit offset of each successor list (in particular,
+ * the offset of the first successor list will be zero). For convenience, the offset file contains an additional
+ * offset pointing just after the last successor list (providing, as a side-effect, the actual bit length of the graph file).
+ * Each offset (except for the first one) is stored as a suitably coded difference from the previous offset.
+ *
+ * <p>The list of offsets can be additionally stored as a serialised {@link EliasFanoMonotoneLongBigList}
+ * using a suitable command-line option. If the serialised big list is detected, it is loaded instead of parsing the offset list.
+ *
+ * <H2>The Property File</H2>
+ *
+ * <P>This file contains self-explanatory entries that are necessary to correctly decode the graph and offset files, and
+ * moreover give some statistical information about the compressed graph (e.g., the number of bits per link).
+ * <dl>
+ * <dt><code>nodes</code>
+ * <dd>the number of nodes of the graph.
+ * <dt><code>arcs</code>
+ * <dd>the number of arcs of the graph.
+ * <dt><code>version</code>
+ * <dd>a version number.
+ * <dt><code>graphclass</code>
+ * <dd>the name of the class that should load this graph ({@link ImmutableGraph} convention).
+ * <dt><code>bitsperlink</code>
+ * <dd>the number of bits per link (overall graph size in bits divided by the number of arcs).
+ * <dt><code>bitspernode</code>
+ * <dd>the number of bits per node (overall graph size in bits divided by the number of nodes).
+ * <dt><code>compratio</code>
+ * <dd>the ratio between the graph size and the information-theoretical lower bound (the binary logarithm of the number of subsets of size <code>arcs</code> out of a universe of <code>nodes</code><sup>2</sup> elements).
+ * <dt><code>compressionflags</code>
+ * <dd>flags specifying the codes used for the components of the compression algorithm.
+ * <dt><code>zetak</code>
+ * <dd>if &zeta; codes are selected for residuals, the parameter <var>k</var>.
+ * <dt><code>windowsize</code>
+ * <dd>the window size.
+ * <dt><code>maxref</code>
+ * <dd>the maximum reference count.
+ * <dt><code>minintervallength</code>
+ * <dd>the minimum length of an interval.
+ * <dt><code>avgdist</code>
+ * <dd>the average distance of a reference.
+ * <dt><code>avgref</code>
+ * <dd>the average length of reference chains.
+ * <dt><code>bitsfor*</code>
+ * <dd>number of bits used by a specific component of the algorithm (the sum is the number of bits used to store the graph).
+ * <dt><code>avgbitsfor*</code>
+ * <dd>number of bits used by a specific component of the algorithm, divided by the number of nodes (the sum is the number of bits per node).
+ * <dt><code>*arcs</code>
+ * <dd>the number of arcs stored by each component of the algorithm (the sum is the number of arcs).
+ * <dt><code>*expstats</code>
+ * <dd>frequencies of the floor of the logarithm of successor gaps and residual gaps, separated by a comma; the statistics include the gap between each node
+ * and its first successor, after it has been passed through {@link Fast#int2nat(int)}, but discarding zeroes (which happen only in
+ * very rare circumstances and should be considered immaterial).
+ * <dt><code>*avg[log]gap</code>
+ * <dd>the average of the gaps (or of their logarithm) of successors and residuals: note that this data is computed from the exponential statistics above, and
+ * thus it is necessarily approximate.
+ * </dl>
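+ *
+ * <p>To make the definitions of <code>bitsperlink</code>, <code>bitspernode</code> and <code>compratio</code>
+ * concrete, here is a hypothetical sketch that derives them from the size of the graph file in bits, assuming a
+ * positive number of nodes and arcs. The lower bound is accumulated term by term, which takes time linear in the
+ * number of arcs: it illustrates the formula and is not meant as an efficient implementation.
+ * <pre>{@code
+ * final class PropertySketch {
+ *     // Binary logarithm of the binomial coefficient C(universe, k), accumulated term by term.
+ *     static double log2Binomial(final double universe, final long k) {
+ *         double sum = 0;
+ *         for (long i = 0; i < k; i++) sum += Math.log((universe - i) / (i + 1)) / Math.log(2);
+ *         return sum;
+ *     }
+ *
+ *     // Returns { bitsperlink, bitspernode, compratio } for a graph stored in graphBits bits.
+ *     static double[] stats(final long graphBits, final long nodes, final long arcs) {
+ *         final double lowerBound = log2Binomial((double)nodes * nodes, arcs);
+ *         return new double[] { (double)graphBits / arcs, (double)graphBits / nodes, graphBits / lowerBound };
+ *     }
+ * }
+ * }</pre>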
+ *
+ * <H2>How The Graph File Is Loaded Into Memory</H2>
+ *
+ * <P>The natural way of using a graph file is to load it into a byte array and
+ * then index its bits using the suitable offsets. This class uses a byte
+ * array for graphs smaller than {@link Integer#MAX_VALUE} bytes,
+ * and a {@link it.unimi.dsi.fastutil.io.FastMultiByteArrayInputStream}
+ * otherwise: in the latter case, expect a significant slowdown (an
+ * {@link it.unimi.dsi.io.InputBitStream} can wrap a byte array directly,
+ * whereas a multi-byte array must be accessed through a stream interface).
+ *
+ * <P>Offsets are loaded using an {@link EliasFanoMonotoneLongBigList},
+ * which occupies exponentially less space than the graph itself (unless
+ * your graph is pathologically sparse). There is, of course, a cost involved in
+ * accessing the list compared to accessing an array of longs.
+ *
+ * <p>Note that by default the {@link EliasFanoMonotoneLongBigList} instance is
+ * created from scratch using the file of offsets. This is a slow process,
+ * in particular with large graphs. The main method of this class
+ * has an option that will generate such a list once and for all and serialise it in a file with
+ * extension <code>.obl</code>. The list will then be quickly deserialised
+ * if its modification date is later than that of the offset file.
+ *
+ * <H2>Not Loading the Graph File at All</H2>
+ *
+ * <P>For some applications (such as transposing a graph) it is not necessary to load the graph
+ * file into memory. Since this class is able to enumerate the links of a graph without using random
+ * access, it is possible to avoid loading any information into memory at all, and obtain iterators that
+ * read directly from the graph file. To obtain this effect, you must call {@link #loadOffline(CharSequence)}.
+ *
+ * <H2>Memory&ndash;Mapping a Graph</H2>
+ *
+ * <p>Another interesting alternative is memory mapping. When using {@link BVGraph#loadMapped(CharSequence)},
+ * the graph will be mapped into memory, and the offsets loaded. The graph will provide random access and behave
+ * as if it were loaded into memory, but of course access will be slower.
+ */
+
+public class BVGraph extends ImmutableGraph implements CompressionFlags {
+
+ private static final Logger LOGGER = LoggerFactory.getLogger(BVGraph.class);
+ /** The offset step parameter corresponding to sequential load. */
+ public static final int SEQUENTIAL = 0;
+ /** The offset step parameter corresponding to offline load. */
+ public static final int OFFLINE = -1;
+
+ /** The standard extension for the graph bit stream. */
+ public static final String GRAPH_EXTENSION = ".graph";
+ /** The standard extension for the graph-offsets bit stream. */
+ public static final String OFFSETS_EXTENSION = ".offsets";
+ /** The standard extension for the cached {@link LongBigList} containing the graph offsets. */
+ public static final String OFFSETS_BIG_LIST_EXTENSION = ".obl";
+ /** The standard extension for the stream of node outdegrees. */
+ public static final String OUTDEGREES_EXTENSION = ".outdegrees";
+ /** The buffer size we use for most operations. */
+ private static final int STD_BUFFER_SIZE = 1024 * 1024;
+
+ /** This number classifies the present graph format. When new features require introducing binary incompatibilities,
+ this number is bumped so as to ensure that old classes do not try to read graphs they cannot understand. */
+ public final static int BVGRAPH_VERSION = 0;
+
+ /** The initial length of an array that will contain a successor list. */
+ protected static final int INITIAL_SUCCESSOR_LIST_LENGTH = 1024;
+
+ /** A special value for {@link #minIntervalLength} interpreted as meaning that the minimum interval length is infinity. */
+ public static final int NO_INTERVALS = 0;
+
+ /** The basename of the graph. This may be <code>null</code>, but trying to load the graph with an offset
+ * step of -1 will cause an exception. */
+ protected CharSequence basename;
+
+ /** The number of nodes of the graph. */
+ protected long n;
+
+ /** The number of arcs of the graph. */
+ protected long m;
+
+ /** When {@link #offsetType} is not -1, whether this graph is directly loaded into
+ * {@link #graphMemory}, or rather wrapped in a {@link it.unimi.dsi.fastutil.io.FastMultiByteArrayInputStream}
+ * specified by {@link #graphStream}. */
+ protected boolean isMemory;
+
+ /** When {@link #offsetType} is not -1, whether this graph is directly loaded into
+ * {@link #graphMemory}, or rather memory-mapped. */
+ protected boolean isMapped;
+
+ /** The byte array storing the compressed graph, if {@link #isMemory} is true and {@link #offsetType} is not -1.
+ *
+ * <P>This variable is loaded with a copy of the graph file, or with
+ * a rearrangement of the latter, depending on whether {@link #offsetType} is smaller than or equal to one. If
+ * {@link #offsetType} is -1, this variable is <code>null</code>, and node iterators are generated by opening
+ * streams directly on the graph file. */
+ protected byte graphMemory[];
+
+ /** The multi-byte array input stream storing the compressed graph, if {@link #isMemory} is false, {@link #isMapped} is false and {@link #offsetType} is not -1.
+ *
+ * <P>It is loaded with a copy of the graph file. If
+ * {@link #offsetType} is -1, this variable is <code>null</code>, and node iterators are generated by opening
+ * streams directly on the graph file. */
+ protected FastMultiByteArrayInputStream graphStream;
+
+ /** The memory-mapped input stream storing the compressed graph, if {@link #isMapped} is true.
+ *
+ * <P>It wraps a memory mapping of the graph file. If
+ * {@link #offsetType} is -1, this variable is <code>null</code>, and node iterators are generated by opening
+ * streams directly on the graph file. */
+ protected ByteBufferInputStream mappedGraphStream;
+
+ /** This variable is <code>null</code> iff {@link #offsetType} is zero or less
+ * (implying that offsets have not been loaded). Otherwise, it is an
+ * Elias&ndash;Fano monotone list containing the bit offsets of
+ * the successor lists of the nodes. */
+ protected LongBigList offsets;
+
+ /** The offset type: 2 is memory-mapping, 1 is normal random-access loading, 0 means that we do not want to load offsets at all, -1 that
+ * we do not want to load even the graph file. */
+ protected int offsetType;
+
+ /** If not {@link Long#MIN_VALUE}, the node whose degree is cached in {@link #cachedOutdegree}. */
+ protected long cachedNode = Long.MIN_VALUE;
+ /** If {@link #cachedNode} is not {@link Long#MIN_VALUE}, its cached outdegree. */
+ protected int cachedOutdegree;
+ /** If {@link #cachedNode} is not {@link Long#MIN_VALUE}, the position immediately after the coding of the outdegree of {@link #cachedNode}. */
+ protected long cachedPointer;
+
+ /** The maximum reference count. */
+ protected int maxRefCount = DEFAULT_MAX_REF_COUNT;
+
+ /** Default backward reference maximum length. */
+ public final static int DEFAULT_MAX_REF_COUNT = 3;
+
+ /** The window size. Zero means no references. */
+ protected int windowSize = DEFAULT_WINDOW_SIZE;
+
+ /** Default window size. */
+ public final static int DEFAULT_WINDOW_SIZE = 7;
+
+ /** The minimum interval length. */
+ protected int minIntervalLength = DEFAULT_MIN_INTERVAL_LENGTH;
+
+ /** Default minimum interval length. */
+ public final static int DEFAULT_MIN_INTERVAL_LENGTH = 4;
+
+ /** The value of <var>k</var> for &zeta;<sub><var>k</var></sub> coding (for residuals). */
+ protected int zetaK = DEFAULT_ZETA_K;
+
+ /** Default value of <var>k</var>. */
+ public final static int DEFAULT_ZETA_K = 3;
+
+ /** Flag: write outdegrees using &gamma; coding (default). */
+ public static final int OUTDEGREES_GAMMA = GAMMA;
+
+ /** Flag: write outdegrees using &delta; coding. */
+ public static final int OUTDEGREES_DELTA = DELTA;
+
+ /** Flag: write copy-block lists using &gamma; coding (default). */
+ public static final int BLOCKS_GAMMA = GAMMA << 4;
+
+ /** Flag: write copy-block lists using &delta; coding. */
+ public static final int BLOCKS_DELTA = DELTA << 4;
+
+ /** Flag: write residuals using &gamma; coding. */
+ public static final int RESIDUALS_GAMMA = GAMMA << 8;
+
+ /** Flag: write residuals using &zeta;<sub><var>k</var></sub> coding (default). */
+ public static final int RESIDUALS_ZETA = ZETA << 8;
+
+ /** Flag: write residuals using &delta; coding. */
+ public static final int RESIDUALS_DELTA = DELTA << 8;
+
+ /** Flag: write residuals using variable-length nibble coding. */
+ public static final int RESIDUALS_NIBBLE = NIBBLE << 8;
+
+ /** Flag: write residuals using Golomb coding. */
+ public static final int RESIDUALS_GOLOMB = GOLOMB << 8;
+
+ /** Flag: write references using &gamma; coding. */
+ public static final int REFERENCES_GAMMA = GAMMA << 12;
+
+ /** Flag: write references using &delta; coding. */
+ public static final int REFERENCES_DELTA = DELTA << 12;
+
+ /** Flag: write references using unary coding (default). */
+ public static final int REFERENCES_UNARY = UNARY << 12;
+
+ /** Flag: write block counts using &gamma; coding (default). */
+ public static final int BLOCK_COUNT_GAMMA = GAMMA << 16;
+
+ /** Flag: write block counts using &delta; coding. */
+ public static final int BLOCK_COUNT_DELTA = DELTA << 16;
+
+ /** Flag: write block counts using unary coding. */
+ public static final int BLOCK_COUNT_UNARY = UNARY << 16;
+
+ /** Flag: write offsets using &gamma; coding (default). */
+ public static final int OFFSETS_GAMMA = GAMMA << 20;
+
+ /** Flag: write offsets using &delta; coding. */
+ public static final int OFFSETS_DELTA = DELTA << 20;
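+
+ /* Illustrative sketch (hypothetical, not used by this class): the flag constants above place the
+ coding of each component in its own 4-bit field (outdegrees in bits 0-3, copy blocks in bits 4-7,
+ residuals in bits 8-11, references in bits 12-15, block counts in bits 16-19, offsets in bits 20-23).
+ Assuming every coding constant of CompressionFlags fits in four bits, each coding could be
+ recovered from a combined flag word as follows:
+
+ final int flags = RESIDUALS_ZETA | OFFSETS_DELTA;
+ final int outdegrees = flags & 0xF; // 0: no outdegree coding was specified
+ final int residuals = (flags >>> 8) & 0xF; // equals ZETA
+ final int offsets = (flags >>> 20) & 0xF; // equals DELTA
+ */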
+
+ /** The coding for outdegrees. By default, we use &gamma; coding. */
+ protected int outdegreeCoding = GAMMA;
+
+ /** The coding for copy-block lists. By default, we use &gamma; coding. */
+ protected int blockCoding = GAMMA;
+
+ /** The coding for residuals. By default, we use &zeta; coding. */
+ protected int residualCoding = ZETA;
+
+ /** The coding for references. By default, we use unary coding. */
+ protected int referenceCoding = UNARY;
+
+ /** The coding for block counts. By default, we use &gamma; coding. */
+ protected int blockCountCoding = GAMMA;
+
+ /** The coding for offsets. By default, we use &gamma; coding. */
+ protected int offsetCoding = GAMMA;
+
+ /** The compression flags used. */
+ private int flags = 0;
+
+ /** The number of arcs copied during a call to {@link #storeInternal(ImmutableGraph, CharSequence, ProgressLogger)}. */
+ private long copiedArcs;
+
+ /** The number of arcs that have been intervalised during a call to {@link #storeInternal(ImmutableGraph, CharSequence, ProgressLogger)}. */
+ private long intervalisedArcs;
+
+ /** The number of arcs that are represented explicitly. */
+ private long residualArcs;
+
+ private final static boolean STATS = false;
+ @SuppressWarnings("unused")
+ private final static boolean DEBUG = false;
+ private final static boolean ASSERTS = false;
+
+ private PrintWriter offsetStats, outdegreeStats, blockCountStats, blockStats, intervalCountStats, referenceStats, leftStats, lenStats, residualStats, residualCountStats;
+
+ @Override
+ public BVGraph copy() {
+ final BVGraph result = new BVGraph();
+ result.basename = basename;
+ result.n = n;
+ result.m = m;
+ result.isMemory = isMemory;
+ result.isMapped = isMapped;
+ result.graphMemory = graphMemory;
+ result.graphStream = graphStream != null ? new FastMultiByteArrayInputStream(graphStream) : null;
+ result.mappedGraphStream = mappedGraphStream != null ? mappedGraphStream.copy() : null;
+ result.offsets = offsets;
+ result.maxRefCount = maxRefCount;
+ result.windowSize = windowSize;
+ result.minIntervalLength = minIntervalLength;
+ result.offsetType = offsetType;
+ result.zetaK = zetaK;
+ result.outdegreeCoding = outdegreeCoding;
+ result.blockCoding = blockCoding;
+ result.residualCoding = residualCoding;
+ result.referenceCoding = referenceCoding;
+ result.blockCountCoding = blockCountCoding;
+ result.offsetCoding = offsetCoding;
+ result.flags = flags;
+ result.outdegreeIbs = offsetType <= 0 ? null : isMemory ? new InputBitStream(graphMemory): new InputBitStream(isMapped ? mappedGraphStream.copy() : new FastMultiByteArrayInputStream(graphStream), 0);
+ return result;
+ }
+
+ protected BVGraph() {}
+
+ @Override
+ public long numNodes() {
+ return n;
+ }
+
+ @Override
+ public long numArcs() {
+ return m;
+ }
+
+ @Override
+ public boolean randomAccess() {
+ return offsets != null;
+ }
+
+ @Override
+ public CharSequence basename() {
+ return basename;
+ }
+
+ /** Returns the maximum reference count of this graph.
+ *
+ * @return the maximum reference count.
+ */
+ public int maxRefCount() {
+ return maxRefCount;
+ }
+
+ /** Returns the window size of this graph.
+ *
+ * @return the window size.
+ */
+ public int windowSize() {
+ return windowSize;
+ }
+
+ /* This family of protected methods is used throughout the class to read data
+ from the graph file following the codings indicated by the compression
+ flags. */
+
+ /** Reads an offset difference from the given stream.
+ *
+ * @param ibs an offset-file input bit stream.
+ * @return the next offset difference.
+ */
+ protected final long readOffset(final InputBitStream ibs) throws IOException {
+ switch(offsetCoding) {
+ case GAMMA: return ibs.readLongGamma();
+ case DELTA: return ibs.readLongDelta();
+ default: throw new UnsupportedOperationException("The required offset coding (" + offsetCoding + ") is not supported.");
+ }
+ }
+
+ /** Writes an offset difference to the given stream.
+ *
+ * @param obs an offset-file output bit stream.
+ * @param x an offset difference to be stored in the stream.
+ * @return the number of bits written.
+ */
+ protected final int writeOffset(final OutputBitStream obs, final long x) throws IOException {
+ switch(offsetCoding) {
+ case GAMMA: return obs.writeLongGamma(x);
+ case DELTA: return obs.writeLongDelta(x);
+ default: throw new UnsupportedOperationException("The required offset coding (" + offsetCoding + ") is not supported.");
+ }
+ }
+
+ /** Reads an outdegree from the given stream.
+ *
+ * @param ibs a graph-file input bit stream.
+ * @return the next outdegree.
+ */
+ protected final int readOutdegree(final InputBitStream ibs) throws IOException {
+ switch(outdegreeCoding) {
+ case GAMMA: return ibs.readGamma();
+ case DELTA: return ibs.readDelta();
+ default: throw new UnsupportedOperationException("The required outdegree coding (" + outdegreeCoding + ") is not supported.");
+ }
+ }
+
+ /** Reads an outdegree from the given stream at a given offset.
+ *
+ * @param ibs a graph-file input bit stream.
+ * @param offset the offset at which the stream must be positioned.
+ * @return the next outdegree.
+ */
+ protected final int readOutdegree(final InputBitStream ibs, final long offset) throws IOException {
+ ibs.position(offset);
+ return readOutdegree(ibs);
+ }
+
+ /** Writes an outdegree to the given stream.
+ *
+ * @param obs a graph-file output bit stream.
+ * @param d an outdegree to be stored in the stream.
+ * @return the number of bits written.
+ */
+ protected final int writeOutdegree(final OutputBitStream obs, final int d) throws IOException {
+ switch(outdegreeCoding) {
+ case GAMMA: return obs.writeGamma(d);
+ case DELTA: return obs.writeDelta(d);
+ default: throw new UnsupportedOperationException("The required outdegree coding (" + outdegreeCoding + ") is not supported.");
+ }
+ }
+
+ /** Reads a reference from the given stream.
+ *
+ * @param ibs a graph-file input bit stream.
+ * @return the next reference.
+ */
+ protected final int readReference(final InputBitStream ibs) throws IOException {
+ final int ref;
+
+ switch(referenceCoding) {
+ case UNARY: ref = ibs.readUnary(); break;
+ case GAMMA: ref = ibs.readGamma(); break;
+ case DELTA: ref = ibs.readDelta(); break;
+ default: throw new UnsupportedOperationException("The required reference coding (" + referenceCoding + ") is not supported.");
+ }
+ if (ref > windowSize) throw new IllegalStateException("The required reference (" + ref + ") is incompatible with the window size (" + windowSize + ")");
+ return ref;
+ }
+
+ /** Writes a reference to the given stream.
+ *
+ * @param obs a graph-file output bit stream.
+ * @param ref the reference.
+ * @return the number of bits written.
+ */
+ protected final int writeReference(final OutputBitStream obs, final int ref) throws IOException {
+
+ if (ref > windowSize) throw new IllegalStateException("The required reference (" + ref + ") is incompatible with the window size (" + windowSize + ")");
+ switch(referenceCoding) {
+ case UNARY: return obs.writeUnary(ref);
+ case GAMMA: return obs.writeGamma(ref);
+ case DELTA: return obs.writeDelta(ref);
+ default: throw new UnsupportedOperationException("The required reference coding (" + referenceCoding + ") is not supported.");
+ }
+ }
+
+
+ /** Reads a block count from the given stream.
+ *
+ * @param ibs a graph-file input bit stream.
+ * @return the next block count.
+ */
+ protected final int readBlockCount(final InputBitStream ibs) throws IOException {
+ switch(blockCountCoding) {
+ case UNARY: return ibs.readUnary();
+ case GAMMA: return ibs.readGamma();
+ case DELTA: return ibs.readDelta();
+ default: throw new UnsupportedOperationException("The required block count coding (" + blockCountCoding + ") is not supported.");
+ }
+ }
+
+ /** Writes a block count to the given stream.
+ *
+ * @param obs a graph-file output bit stream.
+ * @param count the block count.
+ * @return the number of written bits.
+ */
+ protected final int writeBlockCount(final OutputBitStream obs, final int count) throws IOException {
+ switch(blockCountCoding) {
+ case UNARY: return obs.writeUnary(count);
+ case GAMMA: return obs.writeGamma(count);
+ case DELTA: return obs.writeDelta(count);
+ default: throw new UnsupportedOperationException("The required block count coding (" + blockCountCoding + ") is not supported.");
+ }
+ }
+
+
+ /** Reads a block from the given stream.
+ *
+ * @param ibs a graph-file input bit stream.
+ * @return the next block.
+ */
+ protected final int readBlock(final InputBitStream ibs) throws IOException {
+ switch(blockCoding) {
+ case UNARY: return ibs.readUnary();
+ case GAMMA: return ibs.readGamma();
+ case DELTA: return ibs.readDelta();
+ default: throw new UnsupportedOperationException("The required block coding (" + blockCoding + ") is not supported.");
+ }
+ }
+
+ /** Writes a block to the given stream.
+ *
+ * @param obs a graph-file output bit stream.
+ * @param block the block.
+ * @return the number of written bits.
+ */
+ protected final int writeBlock(final OutputBitStream obs, final int block) throws IOException {
+ switch(blockCoding) {
+ case UNARY: return obs.writeUnary(block);
+ case GAMMA: return obs.writeGamma(block);
+ case DELTA: return obs.writeDelta(block);
+ default: throw new UnsupportedOperationException("The required block coding (" + blockCoding + ") is not supported.");
+ }
+ }
+
+ /** Reads a residual from the given stream.
+ *
+ * @param ibs a graph-file input bit stream.
+ * @return the next residual.
+ */
+ protected final long readResidual(final InputBitStream ibs) throws IOException {
+ switch(residualCoding) {
+ case GAMMA: return ibs.readLongGamma();
+ case ZETA: return ibs.readLongZeta(zetaK);
+ case DELTA: return ibs.readLongDelta();
+ case GOLOMB: return ibs.readLongGolomb(zetaK);
+ case NIBBLE: return ibs.readLongNibble();
+ default: throw new UnsupportedOperationException("The required residuals coding (" + residualCoding + ") is not supported.");
+ }
+ }
+
+ /** Writes a residual to the given stream.
+ *
+ * @param obs a graph-file output bit stream.
+ * @param residual the residual.
+ * @return the number of written bits.
+ */
+ protected final long writeResidual(final OutputBitStream obs, final long residual) throws IOException {
+ switch(residualCoding) {
+ case GAMMA: return obs.writeLongGamma(residual);
+ case ZETA: return obs.writeLongZeta(residual, zetaK);
+ case DELTA: return obs.writeLongDelta(residual);
+ case GOLOMB: return obs.writeLongGolomb(residual, zetaK);
+ case NIBBLE: return obs.writeLongNibble(residual);
+ default: throw new UnsupportedOperationException("The required residuals coding (" + residualCoding + ") is not supported.");
+ }
+ }
+
+ /** A bit stream wrapping {@link #graphMemory}, {@link #graphStream} or {@link #mappedGraphStream}, used <em>only</em> by {@link #outdegree(long)} and {@link #outdegreeInternal(long)}. */
+ private InputBitStream outdegreeIbs;
+
+ /* The code of the following two methods must be kept in sync. */
+
+ @Override
+ public long outdegree(final long x) throws IllegalStateException {
+ if (x == cachedNode) return cachedOutdegree;
+ if (x < 0 || x >= n) throw new IllegalArgumentException("Node index out of range: " + x);
+
+ /* Computing the outdegree is a most basic operation. Thus, it must be always
+ possible to compute the outdegree of a node independently of any other state
+ in a BVGraph. For this purpose, we have a special-purpose input bit stream that
+ is used just to read outdegrees. */
+
+ try {
+ // Without offsets, we just give up.
+ if (offsetType <= 0) throw new IllegalStateException("You cannot compute the outdegree of a random node without offsets");
+ // We just position and read.
+ outdegreeIbs.position(offsets.getLong(cachedNode = x));
+ cachedOutdegree = readOutdegree(outdegreeIbs);
+ cachedPointer = outdegreeIbs.position();
+ return cachedOutdegree;
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+ private int outdegreeInternal(final long x) throws IOException {
+ if (x == cachedNode) return cachedOutdegree;
+ // We just position and read.
+ outdegreeIbs.position(offsets.getLong(cachedNode = x));
+ cachedOutdegree = readOutdegree(outdegreeIbs);
+ cachedPointer = outdegreeIbs.position();
+ return cachedOutdegree;
+ }
+
+
+ /** Returns an iterator over the successors of a given node.
+ *
+ * @param x a node.
+ * @return an iterator over the successors of the node.
+ */
+ @Override
+ public LazyLongIterator successors(final long x) {
+ // We just call successors(long, InputBitStream, long[][], int[]) with
+ // a newly created input bit stream and null elsewhere.
+ if (x < 0 || x >= n) throw new IllegalArgumentException("Node index out of range: " + x);
+ if (offsetType <= 0) throw new UnsupportedOperationException("Random access to successor lists is not possible with sequential or offline graphs");
+ final InputBitStream ibs = isMemory ? new InputBitStream(graphMemory) : new InputBitStream(isMapped ? mappedGraphStream.copy() : new FastMultiByteArrayInputStream(graphStream), 0);
+ return successors(x, ibs, null, null);
+ }
+
+ /** An iterator returning the offsets. */
+ private final static class OffsetsLongIterator implements LongIterator {
+ private final InputBitStream offsetIbs;
+ private final long n;
+ private long off;
+ private long i;
+ private final BVGraph g;
+
+ private OffsetsLongIterator(final BVGraph g, final InputBitStream offsetIbs) {
+ this.offsetIbs = offsetIbs;
+ this.g = g;
+ this.n = g.numNodes();
+ }
+
+ @Override
+ public boolean hasNext() {
+ return i <= n;
+ }
+
+ @Override
+ public long nextLong() {
+ i++;
+ try {
+ return off = g.readOffset(offsetIbs) + off;
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ }
+ }
+
+
+ /** An iterator returning the residuals of a node. */
+ private final static class ResidualLongIterator extends AbstractLazyLongIterator {
+ /** The graph associated to this iterator. */
+ private final BVGraph g;
+ /** The input bit stream from which residuals will be read. */
+ private final InputBitStream ibs;
+ /** The last residual returned. */
+ private long next;
+ /** The number of remaining residuals. */
+ private int remaining;
+
+ private ResidualLongIterator(final BVGraph g, final InputBitStream ibs, final int residualCount, final long x) {
+ this.g = g;
+ this.remaining = residualCount;
+ this.ibs = ibs;
+ try {
+ this.next = x + Fast.nat2int(g.readResidual(ibs));
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+ @Override
+ public long nextLong() {
+ if (remaining == 0) return -1;
+ try {
+ final long result = next;
+ if (--remaining != 0) next += g.readResidual(ibs) + 1;
+ return result;
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+ @Override
+ public long skip(long n) {
+ if (n >= remaining) {
+ n = remaining;
+ remaining = 0;
+ return n;
+ }
+ try {
+ for(long i = n; i-- != 0;) next += g.readResidual(ibs) + 1;
+ remaining -= n;
+ return n;
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+ }
+
+
+
+ /** Given an {@link InputBitStream} wrapping a graph file, returns an iterator over the
+ * successors of a given node <code>x</code>.
+ *
+ * <P>This method can be used in two different ways:
+ * <OL><LI>by providing a node and an input bit stream wrapping a graph file, it is possible
+ * to access the successor list of the node (provided that offsets have been loaded);
+ * <LI>by providing additional data, which essentially are used to keep some state
+ * about the graph, it is possible to perform an efficient sequential visit of all
+ * successor lists (even when no offsets were loaded).
+ * </OL>
+ *
+ * <P>This method may modify the offset and the outdegree caches if <code>window</code> is <code>null</code>.
+ *
+ * @param x a node.
+ * @param ibs an input bit stream wrapping a graph file. After this method returns, the state of <code>ibs</code> is undefined;
+ * however, after the returned iterator is exhausted, <code>ibs</code> will be positioned just after the successor list of <code>x</code>.
+ * @param window either <code>null</code>, or a two-dimensional array with the following meaning: <code>window[(x-i) mod (windowSize+1)]</code>
+ * contains, for all <code>i</code> between 1 and {@link #windowSize} (both inclusive), the list of successors
+ * of node <code>x</code>&minus;<code>i</code>. If <code>window</code> is not <code>null</code> then <code>ibs</code>
+ * must be positioned before the successor list of <code>x</code>. This parameter will not be modified.
+ * @param outd if <code>window</code> is not <code>null</code>, this is an array with {@link #windowSize}&nbsp;+&nbsp;1 elements;
+ * <code>outd[(x-i) mod (windowSize+1)]</code> contains the outdegree of node <code>x</code>&minus;<code>i</code>
+ * for <code>i</code> greater than 0; at the end, this will be true also for <code>i</code> equal to 0.
+ * @return an iterator over the successors of <code>x</code>.
+ * @throws IllegalStateException if <code>window</code> is <code>null</code> and {@link #offsetType} is 0.
+ *
+ */
+ protected LazyLongIterator successors(final long x, final InputBitStream ibs, final long window[][], final int outd[]) throws IllegalStateException {
+ final int ref, refIndex;
+ int i, extraCount, blockCount = 0;
+ long[] block = null, left = null, len = null;
+
+ if (x < 0 || x >= n) throw new IllegalArgumentException("Node index out of range: " + x);
+
+ try {
+ final int d;
+ final int cyclicBufferSize = windowSize + 1;
+
+ if (window == null) {
+ d = outdegreeInternal(x);
+ ibs.position(cachedPointer);
+ }
+ else d = outd[(int)(x % cyclicBufferSize)] = readOutdegree(ibs);
+
+ if (d == 0) return LazyLongIterators.EMPTY_ITERATOR;
+
+ // We read the reference only if the actual window size is larger than one (i.e., the one specified by the user is larger than 0).
+ if (windowSize > 0) ref = readReference(ibs);
+ else ref = -1;
+
+ refIndex = (int)((x - ref + cyclicBufferSize) % cyclicBufferSize); // The index in window[] of the node we are referring to (it makes sense only if ref>0).
+
+ if (ref > 0) { // This catches both no references at all and no reference specifically for this node.
+ if ((blockCount = readBlockCount(ibs)) != 0) block = new long[blockCount];
+
+ int copied = 0, total = 0; // The number of successors copied, and the total number of successors specified in some copy block.
+ for(i = 0; i < blockCount; i++) {
+ block[i] = readBlock(ibs) + (i == 0 ? 0 : 1);
+ total += block[i];
+ if (i % 2 == 0) copied += block[i];
+ }
+ // If the block count is even, we must compute the number of successors copied implicitly.
+ if (blockCount % 2 == 0) copied += (window != null ? outd[refIndex] : outdegreeInternal(x - ref)) - total;
+ extraCount = d - copied;
+ }
+ else extraCount = d;
+
+ int intervalCount = 0; // Number of intervals
+
+ if (extraCount > 0) {
+
+ // Prepare to read intervals, if any
+ if (minIntervalLength != NO_INTERVALS && (intervalCount = ibs.readGamma()) != 0) {
+
+ long prev = 0; // Holds the last integer in the last interval.
+ left = new long[intervalCount];
+ len = new long[intervalCount];
+
+ // Now we read intervals
+ left[0] = prev = Fast.nat2int(ibs.readLongGamma()) + x;
+ len[0] = ibs.readLongGamma() + minIntervalLength;
+
+ prev += len[0];
+ extraCount -= len[0];
+
+ for (i = 1; i < intervalCount; i++) {
+ left[i] = prev = ibs.readLongGamma() + prev + 1;
+ len[i] = ibs.readLongGamma() + minIntervalLength;
+ prev += len[i];
+ extraCount -= len[i];
+ }
+ }
+ }
+
+ final int residualCount = extraCount; // Just to be able to use an anonymous class.
+
+ final LazyLongIterator residualIterator = residualCount == 0 ? null : new ResidualLongIterator(this, ibs, residualCount, x);
+
+ // The extra part is made by the contribution of intervals, if any, and by the residuals iterator.
+ final LazyLongIterator extraIterator = intervalCount == 0
+ ? residualIterator
+ : (residualCount == 0
+ ? (LazyLongIterator)new LongIntervalSequenceIterator(left, len)
+ : (LazyLongIterator)new MergedLongIterator(new LongIntervalSequenceIterator(left, len), residualIterator)
+ );
+
+ final LazyLongIterator blockIterator = ref <= 0
+ ? null
+ : new MaskedLongIterator(
+ // ...block for masking copy and...
+ block,
+ // ...the reference list (either computed recursively or stored in window)...
+ window != null
+ ? LazyLongIterators.wrap(window[refIndex], outd[refIndex])
+ :
+ // This is the recursive lazy part of the construction.
+ successors(x - ref, isMemory ? new InputBitStream(graphMemory) : new InputBitStream(isMapped ? mappedGraphStream.copy() : new FastMultiByteArrayInputStream(graphStream), 0), null, null)
+ );
+
+ if (ref <= 0) return extraIterator;
+ else return extraIterator == null
+ ? blockIterator
+ : (LazyLongIterator)new MergedLongIterator(blockIterator, extraIterator, d);
+
+ }
+ catch (IOException e) {
+ LOGGER.error("Exception while accessing node " + x, e);
+ throw new RuntimeException(e);
+ }
+ }
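+
+ // Illustrative sketch (editorial note, not part of the original WebGraph source): the first interval
+ // left extreme and the first residual may precede the current node x. They are therefore stored as
+ // Fast.int2nat(value - x), and the interval case is decoded above as Fast.nat2int(...) + x. The
+ // following self-contained equivalents of the two helpers assume the usual zig-zag mapping
+ // (0, -1, 1, -2, 2, ... onto 0, 1, 2, 3, 4, ...):
+ //
+ //   static long int2nat(final long x) { return x >= 0 ? x << 1 : -((x << 1) + 1); }
+ //   static long nat2int(final long x) { return (x & 1) == 0 ? x >>> 1 : -((x + 1) >>> 1); }
+ //
+ //   // Node x = 10, first residual 7: 7 - 10 = -3, int2nat(-3) = 5, nat2int(5) + 10 = 7.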
+
+
+ private class BVGraphNodeIterator extends NodeIterator {
+ @SuppressWarnings("hiding")
+ final private long n = numNodes();
+ /** Our bit stream. */
+ final InputBitStream ibs;
+ /** We cache the size of the cyclic buffer (the window size + 1) in a field. */
+ final private int cyclicBufferSize = windowSize + 1;
+ /** At any time, window will be ready to be passed to {@link BVGraph#successors(long, InputBitStream, long[][], int[])} */
+ final private long window[][] = new long[cyclicBufferSize][INITIAL_SUCCESSOR_LIST_LENGTH];
+ /** At any time, outd will be ready to be passed to {@link BVGraph#successors(long, InputBitStream, long[][], int[])} */
+ final private int outd[] = new int[cyclicBufferSize];
+ /** The index of the node from which we started iterating. */
+ final private long from;
+ /** The index of the node just before the next one. */
+ private long curr;
+
+ public BVGraphNodeIterator(final InputBitStream ibs, final long from) throws IOException {
+ if (from < 0 || from > n) throw new IllegalArgumentException("Node index out of range: " + from);
+ this.from = from;
+ this.ibs = ibs;
+ if (from != 0) {
+ if (offsetType <= 0) throw new IllegalStateException("You cannot iterate from a chosen node without offsets");
+
+ int pos;
+ for(long i = 1; i < Math.min(from + 1, cyclicBufferSize); i++) {
+ pos = (int)((from - i + cyclicBufferSize) % cyclicBufferSize);
+ outd[pos] = BVGraph.this.outdegreeInternal(from - i);
+ LongBigArrays.copyFromBig(BVGraph.this.successorBigArray(from - i), 0, window[pos] = LongArrays.grow(window[pos], outd[pos], 0), 0, outd[pos]);
+ }
+ ibs.position(offsets.getLong(from)); // We must fix the bit stream position so that we are *before* the outdegree.
+ }
+ curr = from - 1;
+ }
+
+ /** At each call, we build the successor iterator (by calling {@link BVGraph#successors(long, InputBitStream, long[][], int[])})
+ * and iterate over it completely, filling the appropriate entry in <code>window</code>. */
+ @Override
+ public long nextLong() {
+ if (! hasNext()) throw new NoSuchElementException();
+
+ final int currIndex = (int)(++curr % cyclicBufferSize);
+ final LazyLongIterator i = BVGraph.this.successors(curr, ibs, window, outd);
+
+ final int d = outd[currIndex];
+ if (window[currIndex].length < d) window[currIndex] = new long[d];
+ final long[] w = window[currIndex];
+ for(int j = 0; j < d; j++) w[j] = i.nextLong();
+
+ return curr;
+ }
+
+ @Override
+ public boolean hasNext() {
+ return (curr < n - 1);
+ }
+
+ @Override
+ public LazyLongIterator successors() {
+ if (curr == from - 1) throw new IllegalStateException();
+
+ final int currIndex = (int)(curr % cyclicBufferSize);
+ return LazyLongIterators.wrap(window[currIndex], outd[currIndex]);
+ }
+
+ @Override
+ public long[][] successorBigArray() {
+ if (curr == from - 1) throw new IllegalStateException();
+ final int index = (int)(curr % cyclicBufferSize);
+ /* A simple heuristic to avoid that nodes of very high degree force the creation of such large arrays
+ * that LongBigArrays.wrap() is forced to do a copy at each invocation. */
+ if (window[index].length > BigArrays.SEGMENT_SIZE && outd[index] <= BigArrays.SEGMENT_SIZE) {
+ final long[] a = new long[BigArrays.SEGMENT_SIZE];
+ System.arraycopy(window[index], 0, a, 0, outd[index]);
+ window[index] = a;
+ }
+ return LongBigArrays.wrap(window[index]);
+ }
+
+ @Override
+ public long outdegree() {
+ if (curr == from - 1) throw new IllegalStateException();
+ return outd[(int)(curr % cyclicBufferSize)];
+ }
+
+ @Override
+ protected void finalize() throws Throwable {
+ try {
+ ibs.close();
+ }
+ finally {
+ super.finalize();
+ }
+ }
+ };
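+
+ // Illustrative sketch (editorial; the values are hypothetical): with windowSize = 3 the iterator keeps a
+ // cyclic buffer of 4 slots, and the successor list of node v lives in slot v % 4. A node that refers
+ // back r positions finds its reference list in slot (v - r) % 4. That slot is still valid because
+ // r never exceeds windowSize:
+ //
+ //   final int cyclicBufferSize = 3 + 1;                      // windowSize + 1
+ //   final long v = 10, r = 2;                                // node 10 copies from node 8
+ //   final int currSlot = (int)(v % cyclicBufferSize);        // 2
+ //   final int refSlot = (int)((v - r) % cyclicBufferSize);   // 0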
+
+
+ /** This method returns a node iterator for scanning the graph sequentially, starting from the given node.
+ * It keeps track of a sliding window of {@link #windowSize()} previous successor lists
+ * to speed up the iteration of graphs with significant referentiation.
+ *
+ * @param from the node from which the iterator will iterate.
+ * @return a {@link NodeIterator} for accessing nodes and successors sequentially.
+ */
+
+ @Override
+ public NodeIterator nodeIterator(final long from) {
+ try {
+ return offsetType == -1
+ ? new BVGraphNodeIterator(new InputBitStream(new FileInputStream(basename + GRAPH_EXTENSION), STD_BUFFER_SIZE), from)
+ : new BVGraphNodeIterator(isMemory ? new InputBitStream(graphMemory) : new InputBitStream(isMapped ? mappedGraphStream.copy() : new FastMultiByteArrayInputStream(graphStream), 0), from);
+ } catch (FileNotFoundException e) {
+ throw new IllegalStateException("The graph file \"" + basename + GRAPH_EXTENSION + "\" cannot be found");
+ } catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ }
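+
+ // Usage sketch (editorial; assumes a graph with the hypothetical basename "example" exists on disk):
+ // a sequential scan that visits every arc once. An offline graph can still be scanned from node 0.
+ // LazyLongIterator.nextLong() returns -1 when the successors are exhausted.
+ //
+ //   final BVGraph graph = BVGraph.loadOffline("example");
+ //   final NodeIterator it = graph.nodeIterator();
+ //   while (it.hasNext()) {
+ //     final long node = it.nextLong();
+ //     final LazyLongIterator succ = it.successors();
+ //     for (long s; (s = succ.nextLong()) != -1;) System.out.println(node + " -> " + s);
+ //   }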
+
+
+ /* The following private methods handle the flag mask. They are the only methods which replicate
+ * the shifting logic specified in the flag-mask definition.
+ */
+
+ /** Sets the {@link #flags} attribute to the given value, and updates the
+ * individual coding attributes (<code>g&hellip;Coding</code>) accordingly.
+ *
+ * <P>If a bit slot within <code>flags</code> is not specified (i.e., it is 0), the corresponding
+ * coding variable is left unchanged, on the assumption that it holds its default value (this condition
+ * is not checked).
+ *
+ * @param flags a mask of flags as specified by the constants of this class.
+ */
+ private void setFlags(final int flags) {
+ this.flags = flags;
+ if ((flags & 0xF) != 0) outdegreeCoding = flags & 0xF;
+ if (((flags >>> 4) & 0xF) != 0) blockCoding = (flags >>> 4) & 0xF;
+ if (((flags >>> 8) & 0xF) != 0) residualCoding = (flags >>> 8) & 0xF;
+ if (((flags >>> 12) & 0xF) != 0) referenceCoding = (flags >>> 12) & 0xF;
+ if (((flags >>> 16) & 0xF) != 0) blockCountCoding = (flags >>> 16) & 0xF;
+ if (((flags >>> 20) & 0xF) != 0) offsetCoding = (flags >>> 20) & 0xF;
+ }
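+
+ // Illustrative sketch (editorial): the flag mask packs six 4-bit coding selectors, in the order used
+ // by setFlags() above. Extracting a slot is a shift followed by a 0xF mask, and a zero slot means
+ // "keep the default coding". The helper slot() is hypothetical and not part of this class:
+ //
+ //   // bits  0- 3: outdegrees      bits  4- 7: blocks        bits  8-11: residuals
+ //   // bits 12-15: references      bits 16-19: block count   bits 20-23: offsets
+ //   static int slot(final int flags, final int index) { return (flags >>> (4 * index)) & 0xF; }
+ //   // slot(flags, 2) is the residual coding, i.e., (flags >>> 8) & 0xF.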
+
+ /** Produces a string representing the values coded in the given flag mask.
+ *
+ * @param flags a flag mask.
+ * @return a string representing the flag mask.
+ */
+ private static MutableString flags2String(final int flags) {
+ MutableString s = new MutableString();
+
+ if ((flags & 0xF) != 0) s.append(" | ").append("OUTDEGREES_").append(CompressionFlags.CODING_NAME[flags & 0xF]);
+ if (((flags >>> 4) & 0xF) != 0) s.append(" | ").append("BLOCKS_").append(CompressionFlags.CODING_NAME[(flags >>> 4) & 0xF]);
+ if (((flags >>> 8) & 0xF) != 0) s.append(" | ").append("RESIDUALS_").append(CompressionFlags.CODING_NAME[(flags >>> 8) & 0xF]);
+ if (((flags >>> 12) & 0xF) != 0) s.append(" | ").append("REFERENCES_").append(CompressionFlags.CODING_NAME[(flags >>> 12) & 0xF]);
+ if (((flags >>> 16) & 0xF) != 0) s.append(" | ").append("BLOCK_COUNT_").append(CompressionFlags.CODING_NAME[(flags >>> 16) & 0xF]);
+ if (((flags >>> 20) & 0xF) != 0) s.append(" | ").append("OFFSETS_").append(CompressionFlags.CODING_NAME[(flags >>> 20) & 0xF]);
+
+ if (s.length() != 0) s.delete(0, 3);
+ return s;
+ }
+
+ /** Produces a flag mask corresponding to a given string.
+ *
+ * @param flagString a flag string.
+ * @return the flag mask.
+ * @throws IOException if the flag string is malformed.
+ */
+ private static int string2Flags(final String flagString) throws IOException {
+ int flags = 0;
+
+ if (flagString != null && flagString.length() != 0) {
+ String f[] = flagString.split("\\|");
+ for(int i = 0; i < f.length; i++) {
+ try {
+ flags |= BVGraph.class.getField(f[i].trim()).getInt(BVGraph.class);
+ }
+ catch (Exception notFound) {
+ throw new IOException("Compression flag " + f[i] + " unknown.");
+ }
+ }
+ }
+ return flags;
+ }
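+
+ // Illustrative sketch (editorial): each "|"-separated token must name a public int constant visible
+ // on BVGraph (e.g., one inherited from CompressionFlags), which is resolved by reflection. For example,
+ // a properties line such as
+ //
+ //   compressionflags=OUTDEGREES_GAMMA | RESIDUALS_ZETA
+ //
+ // yields the bitwise OR of the two constants. An unknown token such as "FOO_GAMMA" makes
+ // string2Flags() throw an IOException. An empty or missing value yields 0, i.e., all default codings.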
+
+
+ /** Creates a new {@link BVGraph} by loading a compressed graph file from disk to memory.
+ *
+ * @param basename the basename of the graph.
+ * @param offsetType the desired offset type (2 is memory mapping, 1 is normal random-access loading, 0 means that we do not want to load offsets at all, -1 means that
+ * we do not want to load even the graph file).
+ * @param pl a progress logger used while loading the graph, or <code>null</code>.
+ * @return a {@link BVGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ */
+ public static BVGraph load(CharSequence basename, int offsetType, ProgressLogger pl) throws IOException {
+ return new BVGraph().loadInternal(basename, offsetType, pl);
+ }
+
+
+
+ /** Creates a new {@link BVGraph} by loading a compressed graph file from disk to memory, with no progress logger.
+ *
+ * @param basename the basename of the graph.
+ * @param offsetType the desired offset type (2 is memory mapping, 1 is normal random-access loading, 0 means that we do not want to load offsets at all, -1 means that
+ * we do not want to load even the graph file).
+ * @return a {@link BVGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ */
+ public static BVGraph load(final CharSequence basename, final int offsetType) throws IOException {
+ return BVGraph.load(basename, offsetType, null);
+ }
+
+ /** Creates a new {@link BVGraph} by loading a compressed graph file from disk to memory, with no progress logger and
+ * all offsets.
+ *
+ * @param basename the basename of the graph.
+ * @return a {@link BVGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ */
+ public static BVGraph load(CharSequence basename) throws IOException {
+ return BVGraph.load(basename, 1);
+ }
+
+ /** Creates a new {@link BVGraph} by loading a compressed graph file from disk to memory, with
+ * all offsets.
+ *
+ * @param basename the basename of the graph.
+ * @param pl a progress logger used while loading the graph, or <code>null</code>.
+ * @return a {@link BVGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ */
+ public static BVGraph load(CharSequence basename, ProgressLogger pl) throws IOException {
+ return BVGraph.load(basename, 1, pl);
+ }
+
+ /** Creates a new {@link BVGraph} by memory-mapping a graph file.
+ *
+ * @param basename the basename of the graph.
+ * @param pl a progress logger used while loading the offsets, or <code>null</code>.
+ * @return a {@link BVGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while memory-mapping the graph or reading the offsets.
+ */
+ public static BVGraph loadMapped(CharSequence basename, ProgressLogger pl) throws IOException {
+ return BVGraph.load(basename, 2, pl);
+ }
+
+ /** Creates a new {@link BVGraph} by memory-mapping a graph file.
+ *
+ * @param basename the basename of the graph.
+ * @return a {@link BVGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while memory-mapping the graph or reading the offsets.
+ */
+ public static BVGraph loadMapped(CharSequence basename) throws IOException {
+ return BVGraph.loadMapped(basename, null);
+ }
+
+ /** Creates a new {@link BVGraph} by loading a compressed graph file from disk to memory, without offsets.
+ *
+ * @param basename the basename of the graph.
+ * @param pl a progress logger used while loading the graph, or <code>null</code>.
+ * @return a {@link BVGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ * @deprecated Use {@link #loadOffline(CharSequence, ProgressLogger)} or {@link #loadMapped(CharSequence, ProgressLogger)} instead.
+ */
+ @Deprecated
+ public static BVGraph loadSequential(CharSequence basename, ProgressLogger pl) throws IOException {
+ return BVGraph.load(basename, 0, pl);
+ }
+
+
+ /** Creates a new {@link BVGraph} by loading a compressed graph file from disk to memory, with no progress logger and
+ * without offsets.
+ *
+ * @param basename the basename of the graph.
+ * @return a {@link BVGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ * @deprecated Use {@link #loadOffline(CharSequence)} or {@link #loadMapped(CharSequence)} instead.
+ */
+ @Deprecated
+ public static BVGraph loadSequential(CharSequence basename) throws IOException {
+ return BVGraph.loadSequential(basename, null);
+ }
+
+
+
+ /** Creates a new {@link BVGraph} by loading just the metadata of a compressed graph file.
+ *
+ * @param basename the basename of the graph.
+ * @param pl a progress logger, or <code>null</code>.
+ * @return a {@link BVGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the metadata.
+ */
+ public static BVGraph loadOffline(CharSequence basename, ProgressLogger pl) throws IOException {
+ return BVGraph.load(basename, -1, pl);
+ }
+
+
+
+ /** Creates a new {@link BVGraph} by loading just the metadata of a compressed graph file.
+ *
+ * @param basename the basename of the graph.
+ * @return a {@link BVGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the metadata.
+ */
+ public static BVGraph loadOffline(CharSequence basename) throws IOException {
+ return BVGraph.loadOffline(basename, (ProgressLogger)null);
+ }
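+
+ // Usage sketch (editorial; the basename "example" and node 42 are hypothetical). The loaders differ
+ // only in the offsetType they pass to loadInternal():
+ //
+ //   BVGraph.load("example");            // offsetType 1: graph in memory, offsets loaded (random access)
+ //   BVGraph.loadMapped("example");      // offsetType 2: graph memory-mapped, offsets loaded
+ //   BVGraph.loadSequential("example");  // offsetType 0 (deprecated): graph loaded, no offsets
+ //   BVGraph.loadOffline("example");     // offsetType -1: metadata only; sequential scans from node 0
+ //
+ //   final BVGraph g = BVGraph.load("example");
+ //   final long d = g.outdegree(42);     // random access needs offsets (offsetType 1 or 2)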
+
+
+ /** Loads a compressed graph file from disk into this graph. Note that this method should
+ * be called <em>only</em> on a newly created graph.
+ *
+ * @param basename the basename of the graph.
+ * @param offsetType the desired offset type (2 is memory-mapping, 1 is normal random-access loading, 0 means that we do not want to load offsets at all, -1 means that
+ * we do not want to load even the graph file).
+ * @param pl a progress logger used while loading the graph, or <code>null</code>.
+ * @return this graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ */
+ protected BVGraph loadInternal(final CharSequence basename, int offsetType, final ProgressLogger pl) throws IOException {
+
+ // First of all, we read the property file to get the relevant data.
+ final FileInputStream propertyFile = new FileInputStream(basename + PROPERTIES_EXTENSION);
+ final Properties properties = new Properties();
+ properties.load(propertyFile);
+ propertyFile.close();
+
+ this.offsetType = offsetType;
+ this.basename = new MutableString(basename);
+
+ // Soft check: we also accept graphs stored with the standard (it.unimi.dsi.webgraph) class.
+ if (! this.getClass().getName().equals(properties.getProperty(ImmutableGraph.GRAPHCLASS_PROPERTY_KEY).replace("it.unimi.dsi.webgraph", "it.unimi.dsi.big.webgraph")))
+ throw new IOException("This class (" + this.getClass().getName() + ") cannot load a graph stored using class \"" + properties.getProperty(ImmutableGraph.GRAPHCLASS_PROPERTY_KEY) + "\"");
+
+ // We parse the properties and perform some consistency checks and assignments.
+ setFlags(string2Flags(properties.getProperty("compressionflags")));
+ if (properties.getProperty("version") == null) throw new IOException("Missing format version information");
+ else if (Integer.parseInt(properties.getProperty("version")) > BVGRAPH_VERSION) throw new IOException("This graph uses format " + properties.getProperty("version") + ", but this class can understand only graphs up to format " + BVGRAPH_VERSION);
+ n = Long.parseLong(properties.getProperty("nodes"));
+ m = Long.parseLong(properties.getProperty("arcs"));
+ windowSize = Integer.parseInt(properties.getProperty("windowsize"));
+ maxRefCount = Integer.parseInt(properties.getProperty("maxrefcount"));
+ minIntervalLength = Integer.parseInt(properties.getProperty("minintervallength"));
+ if (properties.getProperty("zetak") != null) zetaK = Integer.parseInt(properties.getProperty("zetak"));
+
+ if (offsetType < -1 || offsetType > 2) throw new IllegalArgumentException("Illegal offset type " + offsetType);
+ final InputBitStream offsetIbs = offsetType > 0 ? new InputBitStream(new FileInputStream(basename + OFFSETS_EXTENSION), STD_BUFFER_SIZE) : null;
+
+ if (offsetType >= 0) {
+ final FileInputStream fis = new FileInputStream(basename + GRAPH_EXTENSION);
+
+ if (offsetType == 2) {
+ mappedGraphStream = ByteBufferInputStream.map(fis.getChannel(), MapMode.READ_ONLY);
+ isMapped = true;
+ }
+ else {
+ // read the whole graph into memory
+ if (pl != null) {
+ pl.itemsName = "bytes";
+ pl.start("Loading graph...");
+ }
+
+ if (fis.getChannel().size() <= Integer.MAX_VALUE) {
+ graphMemory = new byte[(int) fis.getChannel().size()];
+ BinIO.loadBytes(fis, graphMemory);
+ fis.close();
+ isMemory = true;
+ }
+ else graphStream = new FastMultiByteArrayInputStream(fis, fis.getChannel().size());
+
+ if (pl != null) {
+ pl.count = isMemory ? graphMemory.length : graphStream.length;
+ pl.done();
+ }
+ }
+ }
+
+ if (offsetType == 1 || offsetType == 2) {
+ // read offsets, if required
+
+ if (pl != null) {
+ pl.itemsName = "deltas";
+ pl.start("Loading offsets...");
+ }
+ // We try to load a cached big list.
+ final File offsetsBigListFile = new File(basename + OFFSETS_BIG_LIST_EXTENSION);
+
+ if (offsetsBigListFile.exists()) {
+ if (new File(basename + OFFSETS_EXTENSION).lastModified() > offsetsBigListFile.lastModified()) LOGGER.warn("A cached long big list of offsets was found, but the corresponding offsets file has a later modification time");
+ else try {
+ offsets = (LongBigList)BinIO.loadObject(offsetsBigListFile);
+ }
+ catch (ClassNotFoundException e) {
+ LOGGER.warn("A cached long big list of offsets was found, but its class is unknown", e);
+ }
+ }
+ if (offsets == null) offsets = new EliasFanoMonotoneLongBigList(n + 1, (isMapped ? mappedGraphStream.length() : isMemory ? graphMemory.length : graphStream.length) * Byte.SIZE + 1, new OffsetsLongIterator(this, offsetIbs));
+
+ if (pl != null) {
+ pl.count = n + 1;
+ pl.done();
+ if (offsets instanceof EliasFanoMonotoneLongBigList) pl.logger().info("Pointer bits per node: " + Util.format(((EliasFanoMonotoneLongBigList)offsets).numBits() / (n + 1.0)));
+ }
+ }
+
+ if (offsetIbs != null) offsetIbs.close();
+
+ // Finally, we create outdegreeIbs, unless we were asked not to load the graph file at all.
+ if (offsetType >= 0) outdegreeIbs = isMemory ? new InputBitStream(graphMemory): new InputBitStream(isMapped ? mappedGraphStream.copy() : new FastMultiByteArrayInputStream(graphStream), 0);
+
+ return this;
+ }
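+
+ // Illustrative sketch (editorial): loadInternal() expects a properties file (basename + PROPERTIES_EXTENSION)
+ // containing at least the keys read above. All values below are hypothetical. The "graphclass" key name
+ // is assumed to be the value of ImmutableGraph.GRAPHCLASS_PROPERTY_KEY, and it.unimi.dsi.webgraph.BVGraph
+ // is also accepted by the soft check:
+ //
+ //   graphclass=it.unimi.dsi.big.webgraph.BVGraph
+ //   version=0
+ //   nodes=1000
+ //   arcs=5000
+ //   windowsize=7
+ //   maxrefcount=3
+ //   minintervallength=4
+ //   zetak=3
+ //   compressionflags=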
+
+
+ /** This method tries to express an increasing sequence of natural numbers <code>x</code> as a union of an increasing
+ * sequence of intervals and an increasing sequence of residual elements. More precisely, this intervalization works
+ * as follows: first, one looks at <code>x</code> as a sequence of intervals (i.e., maximal sequences of consecutive
+ * elements); those intervals whose length is &ge; <code>minInterval</code> are stored in the lists <code>left</code>
+ * (the list of left extremes) and <code>len</code> (the list of lengths; the length of an integer interval is the
+ * number of integers in that interval). The remaining integers, called <em>residuals</em>, are stored in the
+ * <code>residuals</code> list.
+ *
+ * <P>Note that the previous content of <code>left</code>, <code>len</code> and <code>residuals</code> is lost.
+ *
+ * @param x the list to be intervalized (an increasing list of natural numbers).
+ * @param minInterval the least length that a maximal sequence of consecutive elements must have in order for it to
+ * be considered as an interval.
+ * @param left the resulting list of left extremes of the intervals.
+ * @param len the resulting list of interval lengths.
+ * @param residuals the resulting list of residuals.
+ * @return the number of intervals.
+ */
+ protected static int intervalize(final LongArrayList x, final int minInterval, final LongArrayList left, final LongArrayList len, final LongArrayList residuals) {
+ int nInterval = 0;
+ int vl = x.size();
+ long v[] = x.elements();
+ int i, j;
+
+ left.clear(); len.clear(); residuals.clear();
+ for(i = 0; i < vl; i++) {
+ j = 0;
+ if (i < vl - 1 && v[i] + 1 == v[i + 1]) {
+ do j++; while(i + j < vl - 1 && v[i + j] + 1 == v[i + j + 1]);
+ j++;
+ // Now j is the number of integers in the interval.
+ if (j >= minInterval) {
+ left.add(v[i]);
+ len.add(j);
+ nInterval++;
+ i += j - 1;
+ }
+ }
+ if (j < minInterval) residuals.add(v[i]);
+ }
+ return nInterval;
+ }
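+
+ // Worked example (editorial; values chosen for illustration, traced by hand against the loop above),
+ // with minInterval = 3:
+ //
+ //   final LongArrayList left = new LongArrayList(), len = new LongArrayList(), res = new LongArrayList();
+ //   final int n = intervalize(LongArrayList.wrap(new long[] { 3, 4, 5, 7, 10, 11, 12, 13 }), 3, left, len, res);
+ //   // n == 2, left == [3, 10], len == [3, 4], res == [7]
+ //
+ // A run shorter than minInterval (e.g., 20, 21) would instead be copied to res element by element.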
+
+
+
+
+ /** Scratch variables used by the {@link #diffComp(OutputBitStream, long, int, long[], int, long[], int, boolean)} method. */
+ private final LongArrayList extras = new LongArrayList(), blocks = new LongArrayList(), residuals = new LongArrayList(),
+ left = new LongArrayList(), len = new LongArrayList();
+
+ /** Differentially compresses the given list. This method is given a node (with index <code>currNode</code>) called the
+ * current node, with its successor list (contained in the array <code>currList[0..currLen-1]</code>), and another node
+ * (with index <code>currNode</code>&minus;<code>ref</code>), called the reference node, with its successor list (contained in the array
+ * <code>refList[0..refLen-1]</code>). This method produces, onto the given output bit stream, the compressed successor
+ * list of the current node using the reference node given (except for the outdegree); the number of bits written is returned.
+ *
+ * Note that <code>ref</code> may be zero, in which case no differential compression is made.
+ *
+ * @param obs an output bit stream where the compressed data will be stored.
+ * @param currNode the index of the node this list of outlinks refers to.
+ * @param ref the distance from the reference list.
+ * @param refList the reference list.
+ * @param refLen the length of the reference list.
+ * @param currList the current list.
+ * @param currLen the current list length.
+ * @param forReal if true, we are really writing data (i.e., <code>obs</code> is not just a bit count stream).
+ * @return the number of bits written.
+ * @throws IOException if an I/O error occurs while writing to <code>obs</code>.
+ */
+ private long diffComp(final OutputBitStream obs, final long currNode, final int ref, final long refList[], int refLen, final long currList[], final int currLen, final boolean forReal) throws IOException {
+ // Bits already written onto the output bit stream
+ final long writtenBitsAtStart = obs.writtenBits();
+
+ // We build the list of blocks copied and skipped (alternatively) from the previous list.
+ int i, j = 0, k = 0, currBlockLen = 0, t;
+ long prev = 0;
+ boolean copying = true;
+
+ if (ref == 0) refLen = 0; // This guarantees that we will not try to differentially compress when ref == 0.
+
+ extras.clear();
+ blocks.clear();
+
+ // j is the index of the next successor of the current node we must examine
+ // k is the index of the next successor of the reference node we must examine
+ // copying is true iff we are producing a copy block (instead of an ignore block)
+ // currBlockLen is the number of entries (in the reference list) we have already copied/ignored (in the current block)
+ while(j < currLen && k < refLen) {
+ if (copying) { // First case: we are currently copying entries from the reference list
+ if (currList[j] > refList[k]) {
+ /* If while copying we trespass the current element of the reference list,
+ we must stop copying. */
+ blocks.add(currBlockLen);
+ copying = false;
+ currBlockLen = 0;
+ }
+ else if (currList[j] < refList[k]) {
+ /* If while copying we find a non-matching element of the reference list which
+ is larger than us, we can just add the current element to the extra list
+ and move on. j gets increased. */
+ extras.add(currList[j++]);
+ }
+ else { // currList[j] == refList[k]
+ /* If the current elements of the two lists are equal, we just increase the block length.
+ Both j and k get increased. */
+ j++;
+ k++;
+ currBlockLen++;
+ if (forReal) copiedArcs++;
+ }
+ }
+ else { // Second case: we are currently skipping entries from the reference list
+ if (currList[j] < refList[k]) {
+ /* If we did not trespass the current element of the reference list, we just
+ add the current element to the extra list and move on. j gets increased. */
+ extras.add(currList[j++]);
+ }
+ else if (currList[j] > refList[k]) {
+ /* If we trespassed the current element of the reference list, we
+ increase the block length. k gets increased. */
+ k++;
+ currBlockLen++;
+ }
+ else { // currList[j] == refList[k]
+ /* If we found a match we flush the current block and start a new copying phase. */
+ blocks.add(currBlockLen);
+ copying = true;
+ currBlockLen = 0;
+ }
+ }
+ }
+
+ /* We do not record the last block. The only case when we have to enqueue the last block's length
+ * is when we were copying and we did not copy up to the end of the reference list.
+ */
+ if (copying && k < refLen) blocks.add(currBlockLen);
+
+ // If there are still missing elements, we add them to the extra list.
+ while(j < currLen) extras.add(currList[j++]);
+
+ // We store locally the resulting arrays for faster access.
+ final long block[] = blocks.elements(), blockCount = blocks.size(), extraCount = extras.size();
+
+ // If we have a nontrivial reference window we write the reference to the reference list.
+ if (windowSize > 0) {
+ t = writeReference(obs, ref);
+ if (forReal) bitsForReferences += t;
+ }
+
+ if (STATS) if (forReal) referenceStats.println(ref);
+
+ // Then, if the reference is not void we write the length of the copy list.
+ if (ref != 0) {
+ t = writeBlockCount(obs, (int)blockCount);
+ if (forReal) bitsForBlocks += t;
+
+ if (STATS) if (forReal) blockCountStats.println(blockCount);
+
+ // Then, we write the copy list; all lengths except the first one are decremented.
+ if (blockCount > 0) {
+ t = writeBlock(obs, (int)block[0]);
+ if (forReal) bitsForBlocks += t;
+ for(i = 1; i < blockCount; i++) {
+ t = writeBlock(obs, (int)(block[i] - 1));
+ if (forReal) bitsForBlocks += t;
+ }
+ if (STATS) if (forReal) {
+ blockStats.println(block[0]);
+ for(i = 1; i < blockCount; i++) blockStats.println(block[i] - 1);
+ }
+ }
+ }
+
+ // Finally, we write the extra list.
+ if (extraCount > 0) {
+
+ long[] residual;
+ final int residualCount;
+
+ if (minIntervalLength != NO_INTERVALS) {
+ // If we are to produce intervals, we first compute them.
+ final int intervalCount = intervalize(extras, minIntervalLength, left, len, residuals);
+
+ // We write the number of intervals.
+ t = obs.writeGamma(intervalCount);
+ if (forReal) bitsForIntervals += t;
+
+ if (STATS) if (forReal) intervalCountStats.println(intervalCount);
+
+ long currIntLen;
+
+ // We write out the intervals.
+ for(i = 0; i < intervalCount; i++) {
+ if (i == 0) t = obs.writeLongGamma(Fast.int2nat((prev = left.getLong(i)) - currNode));
+ else t = obs.writeLongGamma(left.getLong(i) - prev - 1);
+ if (forReal) bitsForIntervals += t;
+ currIntLen = len.getLong(i);
+ prev = left.getLong(i) + currIntLen;
+ if (forReal) intervalisedArcs += currIntLen;
+ t = obs.writeLongGamma(currIntLen - minIntervalLength);
+ if (forReal) bitsForIntervals += t;
+ }
+
+ if (STATS) if (forReal) for(i = 0; i < intervalCount; i++) {
+ if (i == 0) leftStats.println(Fast.int2nat((prev = left.getLong(i)) - currNode));
+ else leftStats.println(left.getLong(i) - prev - 1);
+ prev = left.getLong(i) + len.getLong(i);
+ lenStats.println(len.getLong(i) - minIntervalLength);
+ }
+
+
+ residual = residuals.elements();
+ residualCount = residuals.size();
+ }
+ else {
+ residual = extras.elements();
+ residualCount = extras.size();
+ }
+
+ if (STATS) if (forReal) residualCountStats.println(residualCount);
+
+ // Now we write out the residuals, if any
+ if (residualCount != 0) {
+ if (forReal) {
+ residualArcs += residualCount;
+ updateBins(currNode, residual, residualCount, residualGapStats);
+ }
+ long u = writeResidual(obs, Fast.int2nat((prev = residual[0]) - currNode));
+ if (forReal) bitsForResiduals += u;
+ for(i = 1; i < residualCount; i++) {
+ if (residual[i] == prev) throw new IllegalArgumentException("Repeated successor " + prev + " in successor list of node " + currNode);
+ u = writeResidual(obs, residual[i] - prev - 1);
+ if (forReal) bitsForResiduals += u;
+ prev = residual[i];
+ }
+
+ if (STATS) if (forReal) {
+ residualStats.println(Fast.int2nat((prev = residual[0]) - currNode));
+ for(i = 1; i < residualCount; i++) {
+ residualStats.println(residual[i] - prev - 1);
+ prev = residual[i];
+ }
+ }
+ }
+
+ }
+
+ return obs.writtenBits() - writtenBitsAtStart;
+ }
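+
+ // Worked example (editorial, traced by hand through the loop above): the reference list is {1, 2, 3, 5, 8}
+ // and the current list is {1, 2, 5, 9}. The scan alternates copy and skip blocks over the reference list:
+ //
+ //   copy 2 (1, 2) | skip 1 (3) | copy 1 (5) | skip 1 (8: trailing skip block, not recorded)
+ //
+ // so blocks = [2, 1, 1] and extras = [9]. The block count 3 is written, followed by 2, 0, 0. Every block
+ // length after the first is decremented because only the first block can be empty.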
+
+ /** Writes the given graph using a given base name.
+ *
+ * @param graph a graph to be compressed.
+ * @param basename a base name.
+ * @param windowSize the window size (-1 for the default value).
+ * @param maxRefCount the maximum reference count (-1 for the default value).
+ * @param minIntervalLength the minimum interval length (-1 for the default value, {@link #NO_INTERVALS} to disable).
+ * @param zetaK the parameter used for residual &zeta;-coding, if used (-1 for the default value).
+ * @param flags the flag mask.
+ * @param pl a progress logger to log the state of compression, or <code>null</code> if no logging is required.
+ * @throws IOException if some exception is raised while writing the graph.
+ */
+ public static void store(ImmutableGraph graph, CharSequence basename, int windowSize, int maxRefCount, int minIntervalLength,
+ int zetaK, int flags, ProgressLogger pl) throws IOException {
+ BVGraph g = new BVGraph();
+ if (windowSize != -1) g.windowSize = windowSize;
+ if (maxRefCount != -1) g.maxRefCount = maxRefCount;
+ if (minIntervalLength != -1) g.minIntervalLength = minIntervalLength;
+ if (zetaK != -1) g.zetaK = zetaK;
+ g.setFlags(flags);
+ g.storeInternal(graph, basename, pl);
+ }
+
+ /** Writes the given graph using a given base name, without any progress logger.
+ *
+ * @param graph a graph to be compressed.
+ * @param basename a base name.
+ * @param windowSize the window size (-1 for the default value).
+ * @param maxRefCount the maximum reference count (-1 for the default value).
+ * @param minIntervalLength the minimum interval length (-1 for the default value, {@link #NO_INTERVALS} to disable).
+ * @param zetaK the parameter used for residual &zeta;-coding, if used (-1 for the default value).
+ * @param flags the flag mask.
+ * @throws IOException if some exception is raised while writing the graph.
+ */
+ public static void store(ImmutableGraph graph, CharSequence basename, int windowSize, int maxRefCount, int minIntervalLength,
+ int zetaK, int flags) throws IOException {
+ BVGraph.store(graph, basename, windowSize, maxRefCount, minIntervalLength, zetaK, flags, (ProgressLogger)null);
+ }
+
+ /** Writes the given graph using a given base name, with all
+ * parameters set to their default values.
+ *
+ * @param graph a graph to be compressed.
+ * @param basename a base name.
+ * @param pl a progress logger to log the state of compression, or <code>null</code> if no logging is required.
+ * @throws IOException if some exception is raised while writing the graph.
+ */
+ public static void store(ImmutableGraph graph, CharSequence basename, ProgressLogger pl) throws IOException {
+ BVGraph.store(graph, basename, -1, -1, -1, -1, 0, pl);
+ }
+
+
+ /** Writes the given graph using a given base name, without any progress logger and with all
+ * parameters set to their default values.
+ *
+ * @param graph a graph to be compressed.
+ * @param basename a base name.
+ * @throws IOException if some exception is raised while writing the graph.
+ */
+ public static void store(ImmutableGraph graph, CharSequence basename) throws IOException {
+ BVGraph.store(graph, basename, (ProgressLogger)null);
+ }
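+
+ // Usage sketch (editorial; the basenames and parameter values are hypothetical): recompress an existing
+ // graph with window size 7, maximum reference count 3, minimum interval length 4, zetaK = 3 (relevant
+ // only if the flags select zeta coding for residuals) and default flags (0):
+ //
+ //   final ImmutableGraph source = BVGraph.loadOffline("in");
+ //   BVGraph.store(source, "out", 7, 3, 4, 3, 0);
+ //   // Passing -1 for windowSize, maxRefCount, minIntervalLength or zetaK keeps that default value.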
+
+
+ /** Updates a list of exponential bins using the gaps of a given list of strictly increasing integers.
+ * @param currNode the current node.
+ * @param list a strictly increasing list of integers.
+ * @param length the number of valid elements in <code>list</code>.
+ * @param bin the bins.
+ */
+ protected static void updateBins(final long currNode, final long[] list, final int length, final long[] bin) {
+ for(int i = length - 1; i-- != 0;) bin[Fast.mostSignificantBit(list[i + 1] - list[i])]++;
+ final int msb = Fast.mostSignificantBit(Fast.int2nat(list[0] - currNode));
+ if (msb >= 0) bin[msb]++;
+ }
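The binning performed by `updateBins` can be sketched in plain Java without the `it.unimi.dsi` dependencies. This is a minimal sketch under two assumptions: `Fast.mostSignificantBit` returns the index of the highest set bit (or -1 for zero), and `Fast.int2nat` is the usual zig-zag map that sends 0, -1, 1, -2, ... to 0, 1, 2, 3, .... The helper `estimatedAverageGap` is a hypothetical name. It mirrors how `storeInternal` later turns the bins into the `successoravggap` and `residualavggap` properties, by assuming that every gap in bin i lies at the midpoint (3 * 2^i - 1) / 2 of the interval [2^i, 2^(i+1) - 1].

```java
/** Plain-Java sketch of exponential gap binning; not the library implementation. */
final class GapBinsSketch {
    private GapBinsSketch() {}

    /** Assumed zig-zag map: 0, -1, 1, -2, 2, ... becomes 0, 1, 2, 3, 4, ... */
    static long int2nat(final long x) {
        return (x << 1) ^ (x >> 63);
    }

    /** Index of the most significant set bit, or -1 if x is 0. */
    static int msb(final long x) {
        return 63 - Long.numberOfLeadingZeros(x);
    }

    /** Bins the gaps of a strictly increasing list; the first element is measured from currNode. */
    static void updateBins(final long currNode, final long[] list, final int length, final long[] bin) {
        for (int i = 0; i + 1 < length; i++) bin[msb(list[i + 1] - list[i])]++;
        // The first successor may precede currNode, so its difference is mapped to a natural number.
        final int first = msb(int2nat(list[0] - currNode));
        if (first >= 0) bin[first]++; // a zero difference falls in no bin
    }

    /** Average gap, assuming every gap in bin i sits at the midpoint (3 * 2^i - 1) / 2. */
    static double estimatedAverageGap(final long[] bin) {
        double sum = 0;
        long count = 0;
        for (int i = 0; i < bin.length; i++) {
            final double g = Math.scalb(1.0, i); // 2^i without long overflow at i = 63
            sum += (3 * g - 1) / 2 * bin[i];
            count += bin[i];
        }
        return count == 0 ? 0 : sum / count;
    }
}
```

The sketch scans the gaps forward, whereas `updateBins` scans them backward; the resulting bin counts are the same.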
+
+ /** Statistics for the gap width of successor lists (exponentially binned). */
+ private long[] successorGapStats;
+ /** Statistics for the gap width of residuals (exponentially binned). */
+ private long[] residualGapStats;
+ /** Bits used for outdegrees. */
+ private long bitsForOutdegrees;
+ /** Bits used to write backward references. */
+ private long bitsForReferences;
+ /** Bits used to write inclusion-exclusion blocks. */
+ private long bitsForBlocks;
+ /** Bits used to write residuals. */
+ private long bitsForResiduals;
+ /** Bits used to write intervals. */
+ private long bitsForIntervals;
+
+
+ /** Writes the given graph <code>graph</code> using a given base name and the compression parameters and flags
+ * of this graph object. This object matters only for its parameters and flags; its
+ * content is irrelevant.
+ *
+ * @param graph a graph to be compressed.
+ * @param basename a base name.
+ * @param pl a progress logger to measure the state of compression, or <code>null</code> if no logging is required.
+ * @throws IOException if some exception is raised while writing the graph.
+ */
+ private void storeInternal(ImmutableGraph graph, CharSequence basename, ProgressLogger pl) throws IOException {
+ // A bit stream that discards its output: used only to count the bits of candidate encodings during differential compression
+ final OutputBitStream bitCount = new OutputBitStream(NullOutputStream.getInstance(), 0);
+ int outd, currIndex, j, bestIndex, cand;
+ long best, t, bitOffset = 0, currNode = -1;
+ copiedArcs = 0;
+ intervalisedArcs = 0;
+ residualArcs = 0;
+
+ OutputBitStream graphObs = new OutputBitStream(new FileOutputStream(basename + GRAPH_EXTENSION), STD_BUFFER_SIZE);
+ OutputBitStream offsetObs = new OutputBitStream(new FileOutputStream(basename + OFFSETS_EXTENSION), STD_BUFFER_SIZE);
+
+ if (STATS) {
+ offsetStats = new PrintWriter(new FileWriter(basename + ".offsetStats"));
+ referenceStats = new PrintWriter(new FileWriter(basename + ".referenceStats"));
+ outdegreeStats = new PrintWriter(new FileWriter(basename + ".outdegreeStats"));
+ blockCountStats = new PrintWriter(new FileWriter(basename + ".blockCountStats"));
+ blockStats = new PrintWriter(new FileWriter(basename + ".blockStats"));
+ intervalCountStats = new PrintWriter(new FileWriter(basename + ".intervalCountStats"));
+ leftStats = new PrintWriter(new FileWriter(basename + ".leftStats"));
+ lenStats = new PrintWriter(new FileWriter(basename + ".lenStats"));
+ residualCountStats = new PrintWriter(new FileWriter(basename + ".residualCountStats"));
+ residualStats = new PrintWriter(new FileWriter(basename + ".residualStats"));
+ }
+
+ final int cyclicBufferSize = windowSize + 1;
+ // Cyclic array of previous lists.
+ long list[][] = new long[cyclicBufferSize][INITIAL_SUCCESSOR_LIST_LENGTH];
+ // For each list, its length.
+ int listLen[] = new int[cyclicBufferSize];
+ // For each list, the depth of its references.
+ int refCount[] = new int[cyclicBufferSize];
+ successorGapStats = new long[64];
+ residualGapStats = new long[64];
+
+ long totRef = 0, totDist = 0, totLinks = 0;
+
+ // The iterator must be set up before the progress logger starts, so that its setup time is not measured.
+ final NodeIterator nodeIterator = graph.nodeIterator();
+ nodeIterator.hasNext(); // Forces offline graphs to fill buffers.
+
+ if (pl != null) {
+ pl.itemsName = "nodes";
+ try {
+ pl.expectedUpdates = graph.numNodes();
+ }
+ catch(UnsupportedOperationException ignore) {}
+ pl.start("Storing...");
+ }
+
+ // We iterate over the nodes of the graph
+ while(nodeIterator.hasNext()) {
+ // currNode is the currently examined node, of outdegree outd, with index currIndex (within the cyclic array)
+ long u = nodeIterator.nextLong();
+ if (++currNode != u) throw new IllegalStateException("Invalid node sequence: expected " + currNode + ", found " + u);
+ outd = (int)nodeIterator.outdegree();// get the number of successors of currNode
+ currIndex = (int)(currNode % cyclicBufferSize);
+
+ // We write the current offset to the offset stream
+ writeOffset(offsetObs, graphObs.writtenBits() - bitOffset);
+
+ if (STATS) offsetStats.println(graphObs.writtenBits() - bitOffset);
+
+ bitOffset = graphObs.writtenBits();
+
+ // We write the node outdegree
+ bitsForOutdegrees += writeOutdegree(graphObs, outd);
+
+ if (STATS) outdegreeStats.println(outd);
+
+ if (outd > list[currIndex].length) list[currIndex] = LongArrays.ensureCapacity(list[currIndex], outd);
+
+ // The successor list we are going to compress and write out
+ LongBigArrays.copyFromBig(nodeIterator.successorBigArray(), 0, list[currIndex], 0, outd);
+ listLen[currIndex] = outd;
+
+ if (outd > 0) {
+ updateBins(currNode, list[currIndex], outd, successorGapStats);
+ try {
+ // Now we check the best candidate for compression.
+ best = Long.MAX_VALUE;
+ bestIndex = -1;
+
+ refCount[currIndex] = -1;
+
+ for(j = 0; j < cyclicBufferSize; j++) {
+ cand = (int)((currNode - j + cyclicBufferSize) % cyclicBufferSize);
+ if (refCount[cand] < maxRefCount && listLen[cand] != 0
+ && (t = diffComp(bitCount, currNode, j, list[cand], listLen[cand], list[currIndex], listLen[currIndex], false)) < best) {
+ best = t;
+ bestIndex = cand;
+ }
+ }
+
+ if (ASSERTS) assert bestIndex >= 0;
+ refCount[currIndex] = refCount[bestIndex] + 1;
+ diffComp(graphObs, currNode, (int)((currNode - bestIndex + cyclicBufferSize) % cyclicBufferSize), list[bestIndex],
+ listLen[bestIndex], list[currIndex], listLen[currIndex], true);
+
+ totLinks += outd;
+ totRef += refCount[currIndex];
+ totDist += (currNode - bestIndex + cyclicBufferSize) % cyclicBufferSize;
+ } catch (RuntimeException e) {
+ LOGGER.debug("An exception occurred while storing node " + currNode);
+ throw e;
+ }
+ }
+
+
+ if (pl != null && ((currNode + 1) & ((1 << 20) - 1)) == 0) LOGGER.info(String.format(Locale.ROOT,
+ "bits/link: %.3f; bits/node: %.3f; avgref: %.3f; avgdist: %.3f.",
+ Double.valueOf((double)graphObs.writtenBits() / (totLinks != 0 ? totLinks : 1)),
+ Double.valueOf((double)graphObs.writtenBits() / (currNode + 1)),
+ Double.valueOf((double)totRef / (currNode + 1)),
+ Double.valueOf((double)totDist / (currNode + 1))
+ ));
+
+ if (pl != null) pl.update();
+ }
+
+ if (currNode + 1 != graph.numNodes()) throw new IllegalStateException("The graph claimed to have " + graph.numNodes() + " nodes, but the node iterator returned " + (currNode + 1));
+
+ // We write the final offset to the offset stream.
+ writeOffset(offsetObs, graphObs.writtenBits() - bitOffset);
+
+ graphObs.close();
+ offsetObs.close();
+
+ if (pl != null) pl.done();
+
+ final DecimalFormat format = ((DecimalFormat)NumberFormat.getInstance(Locale.US));
+ format.applyPattern("0.###");
+
+ final Properties properties = new Properties();
+ final long n = graph.numNodes(); // At this point this *must* work (see ArcListASCIIGraph)
+ properties.setProperty("nodes", String.valueOf(n));
+ properties.setProperty("arcs", String.valueOf(totLinks));
+ properties.setProperty("windowsize", String.valueOf(windowSize));
+ properties.setProperty("maxrefcount", String.valueOf(maxRefCount));
+ properties.setProperty("minintervallength", String.valueOf(minIntervalLength));
+ if (residualCoding == ZETA) properties.setProperty("zetak", String.valueOf(zetaK));
+ properties.setProperty("compressionflags", flags2String(flags).toString());
+ properties.setProperty("avgref", format.format((double)totRef / n));
+ properties.setProperty("avgdist", format.format((double) totDist / n));
+ properties.setProperty("copiedarcs", String.valueOf(copiedArcs));
+ properties.setProperty("intervalisedarcs", String.valueOf(intervalisedArcs));
+ properties.setProperty("residualarcs", String.valueOf(residualArcs));
+ properties.setProperty("bitsperlink", format.format((double)graphObs.writtenBits() / totLinks));
+ properties.setProperty("compratio", format.format(graphObs.writtenBits() * Math.log(2) / (stirling((double)n * n) - stirling(totLinks) - stirling((double)n * n - totLinks))));
+ properties.setProperty("bitspernode", format.format((double)graphObs.writtenBits() / n));
+ properties.setProperty("avgbitsforoutdegrees", format.format((double)bitsForOutdegrees / n));
+ properties.setProperty("avgbitsforreferences", format.format((double)bitsForReferences / n));
+ properties.setProperty("avgbitsforblocks", format.format((double)bitsForBlocks / n));
+ properties.setProperty("avgbitsforresiduals", format.format((double)bitsForResiduals / n));
+ properties.setProperty("avgbitsforintervals", format.format((double)bitsForIntervals / n));
+ properties.setProperty("bitsforoutdegrees", Long.toString(bitsForOutdegrees));
+ properties.setProperty("bitsforreferences", Long.toString(bitsForReferences));
+ properties.setProperty("bitsforblocks", Long.toString(bitsForBlocks));
+ properties.setProperty("bitsforresiduals", Long.toString(bitsForResiduals));
+ properties.setProperty("bitsforintervals", Long.toString(bitsForIntervals));
+ properties.setProperty(ImmutableGraph.GRAPHCLASS_PROPERTY_KEY, this.getClass().getName());
+ properties.setProperty("version", String.valueOf(BVGRAPH_VERSION));
+ final FileOutputStream propertyFile = new FileOutputStream(basename + PROPERTIES_EXTENSION);
+ // Binned data
+ int l;
+ for(l = successorGapStats.length; l-- != 0;) if (successorGapStats[l] != 0) break;
+ StringBuilder s = new StringBuilder();
+ BigInteger totGap = BigInteger.ZERO;
+ double totLogGap = 0;
+ long numGaps = 0;
+
+ long g = 1;
+ for(int i = 0; i <= l; i++) {
+ if (i != 0) s.append(',');
+ s.append(successorGapStats[i]);
+ numGaps += successorGapStats[i];
+ totGap = totGap.add(BigInteger.valueOf(g * 2 + g - 1).multiply(BigInteger.valueOf(successorGapStats[i])));
+ totLogGap += (Fast.log2(g * 2 + g + 1) - 1) * successorGapStats[i];
+ g *= 2;
+ }
+
+ properties.setProperty("successorexpstats", s.toString());
+ properties.setProperty("successoravggap", numGaps == 0 ? "0" : new BigDecimal(totGap).divide(BigDecimal.valueOf(numGaps * 2), 3, RoundingMode.HALF_EVEN).toString());
+ properties.setProperty("successoravgloggap", numGaps == 0 ? "0" : Double.toString(totLogGap / numGaps));
+
+ s.setLength(0);
+
+ for(l = residualGapStats.length; l-- != 0;) if (residualGapStats[l] != 0) break;
+ g = 1;
+ numGaps = 0;
+ totLogGap = 0;
+ totGap = BigInteger.ZERO;
+ for(int i = 0; i <= l; i++) {
+ if (i != 0) s.append(',');
+ s.append(residualGapStats[i]);
+ totGap = totGap.add(BigInteger.valueOf(g * 2 + g - 1).multiply(BigInteger.valueOf(residualGapStats[i])));
+ totLogGap += (Fast.log2(g * 2 + g + 1) - 1) * residualGapStats[i];
+ numGaps += residualGapStats[i];
+ g *= 2;
+ }
+
+ properties.setProperty("residualexpstats", s.toString());
+ properties.setProperty("residualavggap", numGaps == 0 ? "0" : new BigDecimal(totGap).divide(BigDecimal.valueOf(numGaps * 2), 3, RoundingMode.HALF_EVEN).toString());
+ properties.setProperty("residualavgloggap", numGaps == 0 ? "0" : Double.toString(totLogGap / numGaps));
+
+ properties.store(propertyFile, "BVGraph properties");
+
+ propertyFile.close();
+
+ if (STATS) {
+ offsetStats.close();
+ referenceStats.close();
+ outdegreeStats.close();
+ blockCountStats.close();
+ blockStats.close();
+ intervalCountStats.close();
+ leftStats.close();
+ lenStats.close();
+ residualCountStats.close();
+ residualStats.close();
+ }
+ }
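The reference-selection loop in `storeInternal` scans a cyclic buffer holding the last `windowSize` successor lists plus the current one (which stands for "no reference"), and keeps the cheapest candidate whose reference chain is still shorter than `maxRefCount`. The sketch below reproduces only that control flow. Its cost function is a stand-in assumption: instead of the bit count returned by `diffComp`, it charges one unit per successor not found in the reference list, plus one unit for using a reference at all. `ReferenceSelectionSketch`, `residualCount` and `chooseReferences` are hypothetical names.

```java
/** Control-flow sketch of reference selection over a cyclic window; the cost function is a stand-in. */
final class ReferenceSelectionSketch {
    private static final long[] EMPTY = new long[0];

    private ReferenceSelectionSketch() {}

    /** Stand-in cost: successors of curr missing from ref (both sorted increasingly). */
    static int residualCount(final long[] ref, final long[] curr) {
        int i = 0, k = 0, residuals = 0;
        while (k < curr.length) {
            if (i < ref.length && ref[i] < curr[k]) i++;
            else if (i < ref.length && ref[i] == curr[k]) { i++; k++; }
            else { residuals++; k++; }
        }
        return residuals;
    }

    /** For each node, returns the chosen reference distance j (0 means no reference). */
    static int[] chooseReferences(final long[][] successors, final int windowSize, final int maxRefCount) {
        final int size = windowSize + 1; // cyclic buffer: the window plus the current list
        final long[][] list = new long[size][];
        final int[] refCount = new int[size];
        final int[] chosen = new int[successors.length];
        for (int node = 0; node < successors.length; node++) {
            final int currIndex = node % size;
            list[currIndex] = successors[node];
            refCount[currIndex] = -1; // makes j = 0 (no reference) always admissible
            long best = Long.MAX_VALUE;
            int bestIndex = currIndex;
            for (int j = 0; j < size && j <= node; j++) {
                final int cand = (node - j + size) % size;
                if (refCount[cand] >= maxRefCount) continue; // the chain would become too long
                if (j > 0 && list[cand].length == 0) continue; // an empty list is useless as a reference
                final long cost = residualCount(j == 0 ? EMPTY : list[cand], list[currIndex]) + (j == 0 ? 0 : 1);
                if (cost < best) { best = cost; bestIndex = cand; } // strict: ties favor smaller j
            }
            refCount[currIndex] = refCount[bestIndex] + 1;
            chosen[node] = (node - bestIndex + size) % size;
        }
        return chosen;
    }
}
```

With lists {1, 2, 3}, {1, 2, 3}, {1, 2, 3, 4}, a window of 2 and `maxRefCount` 1, the third node cannot refer to the second (whose chain already has length 1) and falls back to the first; this is how `refCount` bounds the decompression work for random access.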
+
+ private static double stirling(double n) {
+ return n * Math.log(n) - n + (1./2) * Math.log(2 * Math.PI * n) ;
+ }
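The `compratio` property divides the bits written (converted to nats by the factor ln 2) by ln C(n^2, m), the logarithm of the number of graphs with n nodes and m arcs, which is approximated by three calls to `stirling`. The sketch below checks that approximation against an exact sum of logarithms on small inputs; `exactLogBinomial`, `approxLogBinomial` and `compressionRatio` are hypothetical helpers. Note that `stirling(0)` evaluates to NaN, so the approximation is undefined when m = 0 or m = n^2.

```java
/** Sketch of the information-theoretic bound behind the compratio property. */
final class CompRatioSketch {
    private CompRatioSketch() {}

    /** Stirling's approximation of ln(n!), as in the method above. */
    static double stirling(final double n) {
        return n * Math.log(n) - n + 0.5 * Math.log(2 * Math.PI * n);
    }

    /** Exact ln C(total, m) as a sum of logarithms; only practical for small m. */
    static double exactLogBinomial(final long total, final long m) {
        double s = 0;
        for (long i = 1; i <= m; i++) s += Math.log(total - m + i) - Math.log(i);
        return s;
    }

    /** ln C(total, m) approximated as ln total! - ln m! - ln (total - m)!. */
    static double approxLogBinomial(final double total, final double m) {
        return stirling(total) - stirling(m) - stirling(total - m);
    }

    /** Bits written divided by log2 C(n^2, m), the bits needed to single out one graph with n nodes and m arcs. */
    static double compressionRatio(final long bitsWritten, final long n, final long m) {
        return bitsWritten * Math.log(2) / approxLogBinomial((double) n * n, m);
    }
}
```

For n = 10 and m = 10 the bound is about 44 bits, so writing 44 bits gives a ratio very close to 1; real graphs compress well below this bound only because they are far from random.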
+
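+ /** Writes the offsets of this graph to a given bit stream.
+ * @param obs the output bit stream to which offsets will be written.
+ * @param pl a progress logger, or <code>null</code>.
+ * @throws IOException if some exception is raised while reading the graph or writing the offsets.
+ */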
+ public void writeOffsets(final OutputBitStream obs, final ProgressLogger pl) throws IOException {
+ final BVGraphNodeIterator nodeIterator = (BVGraphNodeIterator) nodeIterator(0);
+ long n = numNodes();
+ long lastOffset = 0;
+ while(n-- != 0) {
+ // We fetch the current position of the underlying input bit stream, which is at the start of the next node.
+ writeOffset(obs, nodeIterator.ibs.readBits() - lastOffset);
+ lastOffset = nodeIterator.ibs.readBits();
+ nodeIterator.nextLong();
+ nodeIterator.outdegree();
+ nodeIterator.successorBigArray();
+ if (pl != null) pl.update();
+ }
+ writeOffset(obs, nodeIterator.ibs.readBits() - lastOffset);
+ }
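As in `storeInternal`, `writeOffsets` does not store absolute bit positions. It writes the difference between the start of each node's encoding and the start of the previous one (the first measured from position 0), followed by a final entry for the end of the stream, for n + 1 values in total. A minimal sketch of that gap transformation and its inverse follows. The coding that `writeOffset` applies to each gap is not shown in this excerpt, so the sketch stops at the gap values; `OffsetGapsSketch` is a hypothetical name.

```java
/** Sketch of the gap representation of bit offsets; the per-gap coding is omitted. */
final class OffsetGapsSketch {
    private OffsetGapsSketch() {}

    /** Turns nondecreasing bit positions into gaps, the first one measured from position 0. */
    static long[] toGaps(final long[] positions) {
        final long[] gaps = new long[positions.length];
        long last = 0;
        for (int i = 0; i < positions.length; i++) {
            gaps[i] = positions[i] - last;
            last = positions[i];
        }
        return gaps;
    }

    /** Inverse transformation: prefix sums of the gaps. */
    static long[] toPositions(final long[] gaps) {
        final long[] positions = new long[gaps.length];
        long sum = 0;
        for (int i = 0; i < gaps.length; i++) {
            sum += gaps[i];
            positions[i] = sum;
        }
        return positions;
    }
}
```

Gaps are small for typical successor lists, which is why they code more compactly than absolute positions, at the price of a prefix sum (or a precomputed list such as the Elias-Fano one built by `--list`) to recover node positions.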
+
+
+ /** Reads an immutable graph and stores it as a {@link BVGraph}. */
+ public static void main(String args[]) throws SecurityException, IllegalAccessException, InvocationTargetException, NoSuchMethodException, IOException, JSAPException, ClassNotFoundException, InstantiationException {
+ String source, dest;
+ Class<?> graphClass;
+ int flags = 0;
+
+ SimpleJSAP jsap = new SimpleJSAP(BVGraph.class.getName(), "Compresses a graph differentially. Source and destination are basenames from which suitable filenames will be stemmed; alternatively, if --spec is given, source is a spec (see below). For more information about the compression techniques, see the Javadoc documentation.",
+ new Parameter[] {
+ new FlaggedOption("comp", JSAP.STRING_PARSER, null, JSAP.NOT_REQUIRED, 'c', "comp", "A compression flag (may be specified several times).").setAllowMultipleDeclarations(true),
+ new FlaggedOption("windowSize", JSAP.INTEGER_PARSER, String.valueOf(DEFAULT_WINDOW_SIZE), JSAP.NOT_REQUIRED, 'w', "window-size", "Reference window size (0 to disable)."),
+ new FlaggedOption("maxRefCount", JSAP.INTEGER_PARSER, String.valueOf(DEFAULT_MAX_REF_COUNT), JSAP.NOT_REQUIRED, 'm', "max-ref-count", "Maximum number of backward references (-1 for ∞)."),
+ new FlaggedOption("minIntervalLength", JSAP.INTEGER_PARSER, String.valueOf(DEFAULT_MIN_INTERVAL_LENGTH), JSAP.NOT_REQUIRED, 'i', "min-interval-length", "Minimum length of an interval (0 to disable)."),
+ new FlaggedOption("zetaK", JSAP.INTEGER_PARSER, String.valueOf(DEFAULT_ZETA_K), JSAP.NOT_REQUIRED, 'k', "zeta-k", "The k parameter for zeta-k codes."),
+ new FlaggedOption("graphClass", GraphClassParser.getParser(), null, JSAP.NOT_REQUIRED, 'g', "graph-class", "Forces a Java class for the source graph."),
+ new Switch("spec", 's', "spec", "The source is not a basename but rather a specification of the form <ImmutableGraphImplementation>(arg,arg,...)."),
+ new FlaggedOption("logInterval", JSAP.LONG_PARSER, Long.toString(ProgressLogger.DEFAULT_LOG_INTERVAL), JSAP.NOT_REQUIRED, 'l', "log-interval", "The minimum time interval between activity logs in milliseconds."),
+ new Switch("offline", 'o', "offline", "No-op for backward compatibility."),
+ new Switch("once", '1', "once", "Use the read-once load method to read a graph from standard input."),
+ new Switch("offsets", 'O', "offsets", "Generates offsets for the source graph."),
+ new Switch("list", 'L', "list", "Precomputes an Elias-Fano list of offsets for the source graph."),
+ new Switch("degrees", 'd', "degrees", "Stores the outdegrees of all nodes using gamma coding."),
+ new UnflaggedOption("sourceBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the source graph, or a source spec if --spec was given; it is immaterial when --once is specified."),
+ new UnflaggedOption("destBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, JSAP.NOT_GREEDY, "The basename of the destination graph; if omitted, no recompression is performed. This is useful in conjunction with --offsets and --list."),
+ }
+ );
+
+ JSAPResult jsapResult = jsap.parse(args);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ for(String compressionFlag: jsapResult.getStringArray("comp")) {
+ try {
+ flags |= BVGraph.class.getField(compressionFlag).getInt(BVGraph.class);
+ }
+ catch (Exception notFound) {
+ throw new JSAPException("Compression method " + compressionFlag + " unknown.");
+ }
+ }
+
+ final int windowSize = jsapResult.getInt("windowSize");
+ final int zetaK = jsapResult.getInt("zetaK");
+ int maxRefCount = jsapResult.getInt("maxRefCount");
+ if (maxRefCount == -1) maxRefCount = Integer.MAX_VALUE;
+ final int minIntervalLength = jsapResult.getInt("minIntervalLength");
+ final boolean once = jsapResult.getBoolean("once");
+ final boolean spec = jsapResult.getBoolean("spec");
+ final boolean writeOffsets = jsapResult.getBoolean("offsets");
+ final boolean list = jsapResult.getBoolean("list");
+ final boolean degrees = jsapResult.getBoolean("degrees");
+ graphClass = jsapResult.getClass("graphClass");
+ source = jsapResult.getString("sourceBasename");
+ dest = jsapResult.getString("destBasename");
+
+ final ImmutableGraph graph;
+ final ProgressLogger pl = new ProgressLogger(LOGGER, jsapResult.getLong("logInterval"), TimeUnit.MILLISECONDS);
+
+ if (graphClass != null) {
+ if (spec) {
+ System.err.println("Options --graph-class and --spec are incompatible");
+ System.exit(1);
+ }
+ if (once) graph = (ImmutableGraph)graphClass.getMethod(LoadMethod.ONCE.toMethod(), InputStream.class).invoke(null, System.in);
+ else graph = (ImmutableGraph)graphClass.getMethod(LoadMethod.OFFLINE.toMethod(), CharSequence.class).invoke(null, source);
+ }
+ else {
+ if (!spec) graph = once ? ImmutableGraph.loadOnce(System.in) : ImmutableGraph.loadOffline(source, pl);
+ else graph = ObjectParser.fromSpec(source, ImmutableGraph.class, GraphClassParser.PACKAGE);
+ }
+
+ if (dest != null) {
+ if (writeOffsets || list || degrees) throw new IllegalArgumentException("You cannot specify a destination graph with these options");
+ BVGraph.store(graph, dest, windowSize, maxRefCount, minIntervalLength, zetaK, flags, pl);
+ }
+ else {
+ if (! (graph instanceof BVGraph)) throw new IllegalArgumentException("The source graph is not a BVGraph");
+ final BVGraph bvGraph = (BVGraph)graph;
+ if (writeOffsets) {
+ final OutputBitStream offsets = new OutputBitStream(graph.basename() + OFFSETS_EXTENSION, 64 * 1024);
+ pl.expectedUpdates = graph.numNodes();
+ pl.start("Writing offsets...");
+ bvGraph.writeOffsets(offsets, pl);
+ offsets.close();
+ pl.count = graph.numNodes();
+ pl.done();
+ }
+ if (list) {
+ final InputBitStream offsets = new InputBitStream(graph.basename() + OFFSETS_EXTENSION);
+ BinIO.storeObject(new EliasFanoMonotoneLongBigList(graph.numNodes() + 1, new File(graph.basename() + GRAPH_EXTENSION).length() * Byte.SIZE + 1, new OffsetsLongIterator(bvGraph, offsets)), graph.basename() + OFFSETS_BIG_LIST_EXTENSION);
+ offsets.close();
+ }
+ if (degrees) {
+ final OutputBitStream outdegrees = new OutputBitStream(graph.basename() + OUTDEGREES_EXTENSION, 64 * 1024);
+ NodeIterator nodeIterator = graph.nodeIterator();
+ for(long i = graph.numNodes(); i-- != 0;) {
+ nodeIterator.nextLong();
+ outdegrees.writeLongGamma(nodeIterator.outdegree());
+ }
+
+ outdegrees.close();
+ }
+ }
+ }
+}
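In `main`, each `-c` argument is resolved by reflection: the name is looked up as a public static field of `BVGraph` and its `int` value is ORed into `flags`. The sketch below shows the same pattern on a hypothetical `Flags` holder with made-up constants `FLAG_A`, `FLAG_B` and `FLAG_C`; the real flag names and values are not part of this excerpt. Unlike `main`, which catches any `Exception` and rethrows a `JSAPException`, the sketch narrows the catch to the two reflective failures and rethrows an `IllegalArgumentException`.

```java
import java.lang.reflect.Field;

/** Sketch of reflective flag lookup; the Flags holder and its constants are hypothetical. */
final class FlagParsingSketch {
    private FlagParsingSketch() {}

    /** Hypothetical stand-in for the class whose public int constants name the compression flags. */
    static final class Flags {
        public static final int FLAG_A = 1;
        public static final int FLAG_B = 2;
        public static final int FLAG_C = 4;
    }

    /** ORs together the constants named by names, rejecting unknown names. */
    static int parseFlags(final String... names) {
        int flags = 0;
        for (final String name : names) {
            try {
                final Field field = Flags.class.getField(name);
                flags |= field.getInt(null); // static field: the receiver argument is ignored
            } catch (NoSuchFieldException | IllegalAccessException e) {
                throw new IllegalArgumentException("Compression method " + name + " unknown.", e);
            }
        }
        return flags;
    }
}
```

Reflection keeps the command line in sync with the constants without a hand-maintained table, at the cost of exposing every public static `int` field as a potential flag name.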
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/BuildHostMap.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/BuildHostMap.java
new file mode 100644
index 0000000..5cd96bd
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/BuildHostMap.java
@@ -0,0 +1,129 @@
+package it.unimi.dsi.big.webgraph;
+
+/*
+ * Copyright (C) 2008-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.io.FastBufferedOutputStream;
+import it.unimi.dsi.fastutil.longs.LongArrays;
+import it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap;
+import it.unimi.dsi.logging.ProgressLogger;
+
+import java.io.BufferedReader;
+import java.io.DataOutputStream;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.io.PrintStream;
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.util.regex.Pattern;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Charsets;
+import com.google.common.net.InternetDomainName;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/** A class computing host-related data given a list of URLs (usually, the URLs of the nodes of a web graph).
+ * All processing is performed by the static utility method {@link #run(BufferedReader, PrintStream, DataOutputStream, DataOutputStream, boolean, ProgressLogger)}.
+ *
+ * <p><strong>Warning:</strong> this class provides a main method that saves the host list to standard output, but it
+ * does some logging, too, so be careful not to log to standard output.
+ *
+ * @author Sebastiano Vigna
+ */
+public class BuildHostMap {
+ private final static Logger LOGGER = LoggerFactory.getLogger(BuildHostMap.class);
+
+ public static final Pattern DOTTED_ADDRESS = Pattern.compile("(([0-9A-Fa-f]+[:])*[0-9A-Fa-f]+)|((((0x[0-9A-Fa-f]+)|([0-9]+))\\.)*((0x[0-9A-Fa-f]+)|([0-9]+)))");
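`DOTTED_ADDRESS` decides which host names are numeric addresses and therefore skip the top-private-domain mapping in `run`. The sketch below copies the pattern verbatim and exercises it on a few inputs. Two consequences of the expression itself are worth noting: a purely hexadecimal name such as `cafe` matches the first alternative, and a compressed IPv6 form such as `fe80::1` matches neither alternative (although `java.net.URI.getHost()` returns IPv6 literals enclosed in square brackets anyway). `DottedAddressSketch` and `isAddress` are hypothetical names.

```java
import java.util.regex.Pattern;

/** Exercises the DOTTED_ADDRESS pattern copied from BuildHostMap. */
final class DottedAddressSketch {
    private DottedAddressSketch() {}

    // Same regular expression as BuildHostMap.DOTTED_ADDRESS.
    static final Pattern DOTTED_ADDRESS = Pattern.compile("(([0-9A-Fa-f]+[:])*[0-9A-Fa-f]+)|((((0x[0-9A-Fa-f]+)|([0-9]+))\\.)*((0x[0-9A-Fa-f]+)|([0-9]+)))");

    /** True if the whole host name matches, i.e., it is treated as a numeric address. */
    static boolean isAddress(final String host) {
        return DOTTED_ADDRESS.matcher(host).matches();
    }
}
```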
+
+ /** This method reads URLs and writes hosts (or, possibly, top private domains), together with a map
+ * from URLs to hosts and a host count.
+ *
+ * <p><strong>Warning</strong>: presently, this method uses an {@link Object2IntOpenHashMap} to store the
+ * map from host names to host indices. Thus, it cannot handle more than &approx;700 million hosts.
+ *
+ * @param br the buffered reader returning the list of URLs.
+ * @param hosts the print stream where hosts will be printed.
+ * @param mapDos the data output stream where the map from URLs to hosts will be written (one long per URL).
+ * @param countDos the data output stream where the host counts will be written (one long per host).
+ * @param topPrivateDomain if true, we use {@link InternetDomainName#topPrivateDomain()} to map to top private domains, rather than hosts.
+ * @param pl a progress logger, or {@code null}.
+ */
+ public static void run(final BufferedReader br, final PrintStream hosts, final DataOutputStream mapDos, final DataOutputStream countDos, final boolean topPrivateDomain, ProgressLogger pl) throws IOException, URISyntaxException {
+ Object2IntOpenHashMap<String> map = new Object2IntOpenHashMap<>();
+ long[] count = new long[1024];
+ map.defaultReturnValue(-1);
+ int hostIndex = -1;
+
+ if (pl != null) pl.start("Reading URLs...");
+ for(String s, name; (s = br.readLine()) != null;) {
+ final URI uri = new URI(s);
+ name = uri.getHost();
+ if (name == null) throw new IllegalArgumentException("URL " + s + " has no host");
+ if (topPrivateDomain) {
+ if (! DOTTED_ADDRESS.matcher(name).matches()) {
+ final InternetDomainName idn = InternetDomainName.from(name);
+ if (idn.isUnderPublicSuffix()) name = idn.topPrivateDomain().toString();
+ }
+ }
+
+ if ((hostIndex = map.getInt(name)) == -1) {
+ hosts.println(name);
+ map.put(name, hostIndex = map.size());
+ }
+ mapDos.writeLong(hostIndex);
+ count = LongArrays.grow(count, hostIndex + 1);
+ count[hostIndex]++;
+ if (pl != null) pl.lightUpdate();
+ }
+
+ BinIO.storeLongs(count, 0, map.size(), countDos);
+ if (pl != null) pl.done();
+ }
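`run` assigns dense host indices in order of first appearance: a new host is printed and receives the current map size as its index, every URL contributes its host index to the map, and the count of that host is incremented. The in-memory sketch below follows the same logic with standard collections in place of `Object2IntOpenHashMap` and the output streams, and it omits the top-private-domain option. `HostMapSketch` and `accept` are hypothetical names.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory sketch of the host map built by run(); collections replace the output streams. */
final class HostMapSketch {
    /** Host names in order of first appearance; the position is the host index. */
    final List<String> hosts = new ArrayList<>();
    /** For each URL read so far, the index of its host. */
    final List<Integer> map = new ArrayList<>();
    /** For each host index, the number of URLs with that host. */
    final List<Long> counts = new ArrayList<>();
    private final Map<String, Integer> index = new HashMap<>();

    void accept(final String url) {
        final String host;
        try {
            host = new URI(url).getHost();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Malformed URL: " + url, e);
        }
        if (host == null) throw new IllegalArgumentException("URL " + url + " has no host");
        Integer h = index.get(host);
        if (h == null) { // first occurrence: assign the next dense index
            h = hosts.size();
            index.put(host, h);
            hosts.add(host);
            counts.add(0L);
        }
        map.add(h);
        counts.set(h, counts.get(h) + 1);
    }
}
```

Reading `http://a.org/x`, `http://b.org/` and `http://a.org/y` yields hosts `a.org`, `b.org`, the map 0, 1, 0 and the counts 2, 1.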
+
+
+ public static void main(String[] arg) throws IOException, JSAPException, URISyntaxException {
+
+ final SimpleJSAP jsap = new SimpleJSAP(BuildHostMap.class.getName(), "Reads a list of URLs from standard input, computes the host map and counts and saves the host list to standard output.",
+ new Parameter[] {
+ new Switch("topPrivateDomain", 't', "top-private-domain", "Use top private domains instead of hosts."),
+ new UnflaggedOption("map", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The filename where the host map will be stored as a list of longs in DataOutput format."),
+ new UnflaggedOption("counts", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The filename where the host count will be stored as a list of longs in DataOutput format.")
+ }
+ );
+
+ final JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+ final BufferedReader fbr = new BufferedReader(new InputStreamReader(System.in, Charsets.ISO_8859_1));
+ final DataOutputStream mapDos = new DataOutputStream(new FastBufferedOutputStream(new FileOutputStream(jsapResult.getString("map"))));
+ final DataOutputStream countDos = new DataOutputStream(new FastBufferedOutputStream(new FileOutputStream(jsapResult.getString("counts"))));
+ run(fbr, System.out, mapDos, countDos, jsapResult.getBoolean("topPrivateDomain"), new ProgressLogger(LOGGER));
+ mapDos.close();
+ countDos.close();
+ }
+}
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/CompressionFlags.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/CompressionFlags.java
new file mode 100644
index 0000000..944f4bd
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/CompressionFlags.java
@@ -0,0 +1,50 @@
+package it.unimi.dsi.big.webgraph;
+
+/*
+ * Copyright (C) 2006-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+/** This interface provides constants to be used as compression flags. */
+
+
+public interface CompressionFlags {
+
+ /** &delta; coding (see {@link it.unimi.dsi.io.OutputBitStream#writeDelta(int)}). */
+ public static final int DELTA = 1;
+
+ /** &gamma; coding (see {@link it.unimi.dsi.io.OutputBitStream#writeGamma(int)}). */
+ public static final int GAMMA = 2;
+
+ /** Golomb coding (see {@link it.unimi.dsi.io.OutputBitStream#writeGolomb(int,int)}). */
+ public static final int GOLOMB = 3;
+
+ /** Skewed Golomb coding (see {@link it.unimi.dsi.io.OutputBitStream#writeSkewedGolomb(int,int)}). */
+ public static final int SKEWED_GOLOMB = 4;
+
+ /** Unary coding (see {@link it.unimi.dsi.io.OutputBitStream#writeUnary(int)}). */
+ public static final int UNARY = 5;
+
+ /** &zeta;<sub><var>k</var></sub> coding (see {@link it.unimi.dsi.io.OutputBitStream#writeZeta(int,int)}). */
+ public static final int ZETA = 6;
+
+ /** Variable-length nibble coding (see {@link it.unimi.dsi.io.OutputBitStream#writeNibble(int)}). */
+ public static final int NIBBLE = 7;
+
+ /** The names of the codings, indexed by their constants; index 0 denotes the default coding. */
+ public static final String[] CODING_NAME = { "DEFAULT", "DELTA", "GAMMA", "GOLOMB", "SKEWED_GOLOMB", "UNARY", "ZETA", "NIBBLE" };
+
+}
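Each constant in `CompressionFlags` equals the index of its name in `CODING_NAME` (`DELTA` = 1 through `NIBBLE` = 7), with index 0 reserved for `"DEFAULT"`, which has no constant of its own in this interface. A minimal sketch of translating in both directions follows; `CodingNameSketch` and `codingOf` are hypothetical names, and the array is copied from the interface above.

```java
/** Sketch of translating between coding names and the constants of CompressionFlags. */
final class CodingNameSketch {
    private CodingNameSketch() {}

    // Copied from CompressionFlags.CODING_NAME: entry i is the name of the coding whose constant is i.
    static final String[] CODING_NAME = { "DEFAULT", "DELTA", "GAMMA", "GOLOMB", "SKEWED_GOLOMB", "UNARY", "ZETA", "NIBBLE" };

    /** Returns the constant of the named coding (0 for "DEFAULT"). */
    static int codingOf(final String name) {
        for (int i = 0; i < CODING_NAME.length; i++) if (CODING_NAME[i].equals(name)) return i;
        throw new IllegalArgumentException("Unknown coding: " + name);
    }
}
```

The reverse direction is a plain array access, e.g. `CODING_NAME[2]` is `"GAMMA"`.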
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/EFGraph.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/EFGraph.java
new file mode 100644
index 0000000..578dd82
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/EFGraph.java
@@ -0,0 +1,1243 @@
+package it.unimi.dsi.big.webgraph;
+
+/*
+ * Copyright (C) 2013-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static it.unimi.dsi.bits.Fast.MSBS_STEP_8;
+import static it.unimi.dsi.bits.Fast.ONES_STEP_4;
+import static it.unimi.dsi.bits.Fast.ONES_STEP_8;
+
+import java.io.Closeable;
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.RandomAccessFile;
+import java.lang.reflect.InvocationTargetException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.FileChannel;
+import java.nio.channels.WritableByteChannel;
+import java.text.DecimalFormat;
+import java.util.NoSuchElementException;
+import java.util.Properties;
+import java.util.concurrent.TimeUnit;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+import it.unimi.dsi.Util;
+import it.unimi.dsi.bits.Fast;
+import it.unimi.dsi.bits.LongArrayBitVector;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.longs.LongBigList;
+import it.unimi.dsi.fastutil.longs.LongIterator;
+import it.unimi.dsi.io.InputBitStream;
+import it.unimi.dsi.io.OutputBitStream;
+import it.unimi.dsi.lang.ObjectParser;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.sux4j.util.EliasFanoMonotoneLongBigList;
+import it.unimi.dsi.util.ByteBufferLongBigList;
+import it.unimi.dsi.webgraph.LazyIntSkippableIterator;
+
+/** An immutable graph based on the Elias&ndash;Fano representation of monotone sequences.
+ *
+ * @author Sebastiano Vigna
+ */
+
+public class EFGraph extends ImmutableGraph {
+ private static final Logger LOGGER = LoggerFactory.getLogger(EFGraph.class);
+
+ /** The standard extension for the graph longword bit stream. */
+ public static final String GRAPH_EXTENSION = ".graph";
+ /** The standard extension for the graph-offsets bit stream. */
+ public static final String OFFSETS_EXTENSION = ".offsets";
+ /** The standard extension for the cached {@link LongBigList} containing the graph offsets. */
+ public static final String OFFSETS_BIG_LIST_EXTENSION = ".obl";
+ /** The default size of the bit cache. */
+ public final static int DEFAULT_CACHE_SIZE = 16 * 1024 * 1024;
+ /** This number classifies the present graph format. When new features require introducing binary incompatibilities,
+ * this number is bumped so as to ensure that old classes do not try to read graphs they cannot understand. */
+ public final static int EFGRAPH_VERSION = 0;
+ /** The default base-two logarithm of the quantum. */
+ public static final int DEFAULT_LOG_2_QUANTUM = 8;
+
+ /** The number of nodes of the graph. */
+ protected final long n;
+ /** The upper bound used during the graph construction (greater than or equal to {@link #n}). */
+ protected final long upperBound;
+ /** The number of arcs of the graph. */
+ protected final long m;
+ /** The list containing the graph. */
+ protected final LongBigList graph;
+ /** An Elias&ndash;Fano monotone list containing the pointers of
+ * the bit streams stored in {@link #graph}. */
+ protected final LongBigList offsets;
+ /** The basename of this graph (or possibly <code>null</code>). */
+ protected final CharSequence basename;
+ /** A longword bit reader used to read outdegrees. */
+ protected final LongWordBitReader outdegreeLongWordBitReader;
+ /** The base-two logarithm of the indexing quantum. */
+ protected final int log2Quantum;
+ /** If not {@link Long#MIN_VALUE}, the node whose degree is cached in {@link #cachedOutdegree}. */
+ protected long cachedNode;
+ /** If {@link #cachedNode} is not {@link Long#MIN_VALUE}, its cached outdegree. */
+ protected long cachedOutdegree;
+ /** If {@link #cachedNode} is not {@link Long#MIN_VALUE}, the position immediately after the coding of the outdegree of {@link #cachedNode}. */
+ protected long cachedPointer;
+
+ protected EFGraph(final CharSequence basename, final long n, final long m, final long upperBound, final int log2Quantum, LongBigList graph, LongBigList offsets) {
+ this.basename = basename;
+ this.n = n;
+ this.m = m;
+ this.upperBound = upperBound;
+ this.log2Quantum = log2Quantum;
+ this.graph = graph;
+ this.offsets = offsets;
+ outdegreeLongWordBitReader = new LongWordBitReader(graph, 0);
+ cachedNode = Long.MIN_VALUE;
+ }
+
+ @Override
+ public CharSequence basename() {
+ return basename;
+ }
+
+ /** Returns the number of lower bits for the Elias&ndash;Fano encoding of a list of given
+ * length and upper bound.
+ *
+ * @param length the number of elements of the list.
+ * @param upperBound an upper bound for the elements of the list.
+ * @return the number of lower bits for the Elias&ndash;Fano encoding of a list with the
+ * specified parameters (zero if <code>length</code> is zero or <code>upperBound</code> is smaller than twice <code>length</code>).
+ */
+ public static int lowerBits(final long length, final long upperBound) {
+ return length == 0 ? 0 : Math.max(0, Fast.mostSignificantBit(upperBound / length));
+ }
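+
+ /* Worked example (hand-computed from lowerBits() above): for a list of 4 elements with
+ * upper bound 100 we have 100 / 4 = 25 = 0b11001, whose most significant bit is 4, so
+ * lowerBits(4, 100) == 4. When the upper bound is smaller than twice the length, the
+ * quotient is 0 or 1, Fast.mostSignificantBit() returns -1 or 0, and the result is 0:
+ * for instance, lowerBits(10, 15) == 0. */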
+
+ /** Returns the size in bits of forward or skip pointers to the Elias&ndash;Fano encoding of a list of given
+ * length and upper bound.
+ *
+ * @param length the number of elements of the list.
+ * @param upperBound an upper bound for the elements of the list.
+ * @return the size in bits of forward or skip pointers to the Elias&ndash;Fano encoding of a list with the
+ * specified parameters.
+ */
+ public static int pointerSize(final long length, final long upperBound) {
+ return Math.max(0, Fast.ceilLog2(length + (upperBound >>> lowerBits(length, upperBound))));
+ }
+
+ /** Returns the number of forward or skip pointers to the Elias&ndash;Fano encoding of a list of given
+ * length and upper bound.
+ *
+ * @param length the number of elements of the list.
+ * @param upperBound an upper bound for the elements of the list.
+ * @param log2Quantum the base-two logarithm of the quantum.
+ * @return an upper bound on the number of skip pointers or
+ * the (exact) number of forward pointers.
+ */
+ public static long numberOfPointers(final long length, final long upperBound, final int log2Quantum) {
+ if (length == 0) return 0;
+ return (upperBound >>> lowerBits(length, upperBound)) >>> log2Quantum;
+ }
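+
+ /* Worked example (hand-computed from the three methods above): with length = 1000,
+ * upperBound = 1000000 and log2Quantum = 8, we get 1000000 / 1000 = 1000, whose most
+ * significant bit is 9, so lowerBits() returns 9. Pointers address positions in an
+ * upper-bits array of about length + (upperBound >>> 9) = 1000 + 1953 = 2953 bits, so
+ * pointerSize() returns ceilLog2(2953) = 12, and numberOfPointers() returns
+ * 1953 >>> 8 = 7, that is, roughly one pointer every 256 zeros. */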
+
+ protected final static class LongWordCache implements Closeable {
+ /** The spill file. */
+ private final File spillFile;
+ /** A channel opened on {@link #spillFile}. */
+ private final FileChannel spillChannel;
+ /** A cache for longwords. Will be spilled to {@link #spillChannel} in case more than {@link #cacheLength} bits are added. */
+ private final ByteBuffer cache;
+ /** The current bit buffer. */
+ private long buffer;
+ /** The current number of free bits in {@link #buffer}. */
+ private int free;
+ /** The length of the cache, in bits. */
+ private final long cacheLength;
+ /** The number of bits currently stored. */
+ private long length;
+ /** Whether {@link #spillChannel} should be repositioned at 0 before usage. */
+ private boolean spillMustBeRewind;
+
+ @SuppressWarnings("resource")
+ public LongWordCache(final int cacheSize, final String suffix) throws IOException {
+ spillFile = File.createTempFile(EFGraph.class.getName(), suffix);
+ spillFile.deleteOnExit();
+ spillChannel = new RandomAccessFile(spillFile, "rw").getChannel();
+ cache = ByteBuffer.allocateDirect(cacheSize).order(ByteOrder.nativeOrder());
+ cacheLength = cacheSize * 8L;
+ free = Long.SIZE;
+ }
+
+ private void flushBuffer() throws IOException {
+ cache.putLong(buffer);
+ if (! cache.hasRemaining()) {
+ if (spillMustBeRewind) {
+ spillMustBeRewind = false;
+ spillChannel.position(0);
+ }
+ cache.flip();
+ spillChannel.write(cache);
+ cache.clear();
+ }
+ }
+
+ public int append(final long value, final int width) throws IOException {
+ assert width == Long.SIZE || (-1L << width & value) == 0;
+ buffer |= value << (Long.SIZE - free);
+ length += width;
+
+ if (width < free) free -= width;
+ else {
+ flushBuffer();
+
+ if (width == free) {
+ buffer = 0;
+ free = Long.SIZE;
+ }
+ else {
+ // free < Long.SIZE
+ buffer = value >>> free;
+ free = Long.SIZE - width + free; // width > free
+ }
+ }
+ return width;
+ }
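+
+ /* Worked example (hand-computed; assumes an empty buffer with free == 64): append(0b101, 3)
+ * stores the value in the three lowest bits, so buffer == 0b101 and free == 61. A subsequent
+ * append(0b11, 2) shifts the value left by 64 - 61 = 3, so buffer == 0b11101 == 29 and
+ * free == 59. Bits are thus packed starting from the least significant end of each longword.
+ * If a value does not fit (width > free), its lowest free bits complete the current longword,
+ * which is flushed, and the remaining width - free bits (value >>> free) start the next one. */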
+
+ public void clear() {
+ length = buffer = 0;
+ free = Long.SIZE;
+ cache.clear();
+ spillMustBeRewind = true;
+ }
+
+ @Override
+ public void close() throws IOException {
+ spillChannel.close();
+ spillFile.delete();
+ }
+
+ public long length() {
+ return length;
+ }
+
+ public void writeUnary(int l) throws IOException {
+ if (l >= free) {
+ // Phase 1: align
+ l -= free;
+ length += free;
+ flushBuffer();
+
+ // Phase 2: jump over longwords
+ buffer = 0;
+ free = Long.SIZE;
+ while(l >= Long.SIZE) {
+ flushBuffer();
+ l -= Long.SIZE;
+ length += Long.SIZE;
+ }
+ }
+
+ append(1L << l, l + 1);
+ }
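+
+ /* Worked example (hand-computed): writeUnary(3) ends with append(1L << 3, 4), that is,
+ * it writes the four bits 0, 0, 0, 1 starting from the least significant end of the buffer.
+ * The unary code of l is l zeros followed by a one. When l >= free, the method first completes
+ * the current longword with zeros. It then emits whole zero longwords while at least 64 zeros
+ * remain. */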
+
+ public long readLong() throws IOException {
+ if (! cache.hasRemaining()) {
+ cache.clear();
+ spillChannel.read(cache);
+ cache.flip();
+ }
+ return cache.getLong();
+ }
+
+ public void rewind() throws IOException {
+ if (free != Long.SIZE) cache.putLong(buffer);
+
+ if (length > cacheLength) {
+ cache.flip();
+ spillChannel.write(cache);
+ spillChannel.position(0);
+ cache.clear();
+ spillChannel.read(cache);
+ cache.flip();
+ }
+ else cache.rewind();
+ }
+ }
+
+ public final static class LongWordOutputBitStream {
+ private static final int BUFFER_SIZE = 64 * 1024;
+
+ /** The 64-bit buffer, whose upper {@link #free} bits do not contain data. */
+ private long buffer;
+ /** The Java nio buffer used to write with prescribed endianness. */
+ private final ByteBuffer byteBuffer;
+ /** The number of upper free bits in {@link #buffer} (strictly positive). */
+ private int free;
+ /** The output channel. */
+ private final WritableByteChannel writableByteChannel;
+
+ public LongWordOutputBitStream(final WritableByteChannel writableByteChannel, final ByteOrder byteOrder) {
+ this.writableByteChannel = writableByteChannel;
+ byteBuffer = ByteBuffer.allocateDirect(BUFFER_SIZE).order(byteOrder);
+ free = Long.SIZE;
+ }
+
+ public int append(final long value, final int width) throws IOException {
+ assert width == Long.SIZE || (-1L << width & value) == 0;
+ buffer |= value << (Long.SIZE - free);
+
+ if (width < free) free -= width;
+ else {
+ byteBuffer.putLong(buffer); // filled
+ if (! byteBuffer.hasRemaining()) {
+ byteBuffer.flip();
+ writableByteChannel.write(byteBuffer);
+ byteBuffer.clear();
+ }
+
+ if (width == free) {
+ buffer = 0;
+ free = Long.SIZE;
+ }
+ else {
+ // free < Long.SIZE
+ buffer = value >>> free;
+ free = Long.SIZE - width + free; // width > free
+ }
+ }
+ return width;
+ }
+
+ public long append(final long[] value, final long length) throws IOException {
+ long l = length;
+ for(int i = 0; l > 0; i++) {
+ final int width = (int)Math.min(l, Long.SIZE);
+ append(value[i], width);
+ l -= width;
+ }
+
+ return length;
+ }
+
+ public long append(final LongBigList value, final long length) throws IOException {
+ long l = length;
+ for(long i = 0; l > 0; i++) {
+ final int width = (int)Math.min(l, Long.SIZE);
+ append(value.getLong(i), width);
+ l -= width;
+ }
+
+ return length;
+ }
+
+ public long append(final LongArrayBitVector bv) throws IOException {
+ return append(bv.bits(), bv.length());
+ }
+
+ public long append(final LongWordCache cache) throws IOException {
+ long l = cache.length();
+ cache.rewind();
+ while(l > 0) {
+ final int width = (int)Math.min(l, Long.SIZE);
+ append(cache.readLong(), width);
+ l -= width;
+ }
+
+ return cache.length();
+ }
+
+ public int align() throws IOException {
+ if (free != Long.SIZE) {
+ byteBuffer.putLong(buffer); // partially filled
+ if (! byteBuffer.hasRemaining()) {
+ byteBuffer.flip();
+ writableByteChannel.write(byteBuffer);
+ byteBuffer.clear();
+ }
+
+ final int result = free;
+ buffer = 0;
+ free = Long.SIZE;
+ return result;
+ }
+
+ return 0;
+ }
+
+ public int writeNonZeroGamma(long value) throws IOException {
+ if (value <= 0) throw new IllegalArgumentException("The argument " + value + " is not strictly positive.");
+ final int msb = Fast.mostSignificantBit(value);
+ final long unary = 1L << msb;
+ append(unary, msb + 1);
+ append(value ^ unary, msb);
+ return 2 * msb + 1;
+ }
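+
+ /* Worked example (hand-computed; assumes an empty buffer): writeGamma(4) calls
+ * writeNonZeroGamma(5). Since 5 == 0b101, msb == 2, so the method appends unary == 0b100 in
+ * 3 bits (two zeros and a one) and then 5 ^ 0b100 == 0b01 in 2 bits, for a total of
+ * 2 * 2 + 1 == 5 bits. The resulting buffer is 0b1100 == 12. LongWordBitReader.readGamma()
+ * inverts the process: readUnary() returns 2 (the number of trailing zeros), and
+ * extractInternal(2) | 1L << 2 == 0b01 | 0b100 == 5, minus one, gives back 4. */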
+
+ public int writeGamma(long value) throws IOException {
+ if (value < 0) throw new IllegalArgumentException("The argument " + value + " is negative.");
+ return writeNonZeroGamma(value + 1);
+ }
+
+ public void close() throws IOException {
+ byteBuffer.putLong(buffer);
+ byteBuffer.flip();
+ writableByteChannel.write(byteBuffer);
+ writableByteChannel.close();
+ }
+ }
+
+ protected final static class Accumulator implements Closeable {
+ /** The minimum size in bytes of a {@link LongWordCache}. */
+ private static final int MIN_CACHE_SIZE = 16;
+ /** The accumulator for forward or skip pointers (to zeros or ones). */
+ private final LongWordCache successors;
+ /** The accumulator for upper bits. */
+ private final LongWordCache upperBits;
+ /** The accumulator for low bits. */
+ private final LongWordCache lowerBits;
+ /** The number of lower bits. */
+ private int l;
+ /** A mask extracting the {@link #l} lower bits. */
+ private long lowerBitsMask;
+ /** The number of elements that will be added to this list. */
+ private long length;
+ /** The current length of the list. */
+ private long currentLength;
+ /** The current prefix sum of the values added so far. */
+ private long currentPrefixSum;
+ /** An upper bound to the sum of all values that will be added to the list (decremented by {@link #length} if the list is strict). */
+ private long correctedUpperBound;
+ /** The logarithm of the indexing quantum. */
+ private int log2Quantum;
+ /** The indexing quantum. */
+ private long quantum;
+ /** The size of a pointer (the ceiling of the base-two logarithm of the length of the upper-bits array). */
+ private int pointerSize;
+ /** The last position where a one was set. */
+ private long lastOnePosition;
+ /** The expected number of pointers. */
+ private long expectedNumberOfPointers;
+ /** The number of bits used for the upper-bits array. */
+ public long bitsForUpperBits;
+ /** The number of bits used for the lower-bits array. */
+ public long bitsForLowerBits;
+ /** The number of bits used for forward/skip pointers. */
+ public long bitsForPointers;
+
+ public Accumulator(int bufferSize, int log2Quantum) throws IOException {
+ // A reasonable logic to allocate space.
+ bufferSize = bufferSize & -bufferSize; // Keep only the lowest set bit, so that bufferSize is a power of two.
+ /* Very approximately, half of the cache for lower, half for upper, and a small fraction (8/quantum) for pointers.
+ * This will generate a much larger cache than expected if quantum is very small. */
+ successors = new LongWordCache(Math.max(MIN_CACHE_SIZE, bufferSize >>> Math.max(3, log2Quantum - 3)), "pointers");
+ lowerBits = new LongWordCache(Math.max(MIN_CACHE_SIZE, bufferSize / 2), "lower");
+ upperBits = new LongWordCache(Math.max(MIN_CACHE_SIZE, bufferSize / 2), "upper");
+ }
+
+ public int lowerBits() {
+ return l;
+ }
+
+ public int pointerSize() {
+ return pointerSize;
+ }
+
+ public long numberOfPointers() {
+ return expectedNumberOfPointers;
+ }
+
+ public void init(final long length, final long upperBound, final boolean strict, final boolean indexZeroes, final int log2Quantum) {
+ this.log2Quantum = log2Quantum;
+ this.length = length;
+ quantum = 1L << log2Quantum;
+ successors.clear();
+ lowerBits.clear();
+ upperBits.clear();
+ correctedUpperBound = upperBound - (strict ? length : 0);
+ final long correctedLength = length + (! strict && indexZeroes ? 1 : 0); // The length including the final terminator
+ if (correctedUpperBound < 0) throw new IllegalArgumentException();
+
+ currentPrefixSum = 0;
+ currentLength = 0;
+ lastOnePosition = -1;
+
+ l = EFGraph.lowerBits(correctedLength, upperBound);
+
+
+ lowerBitsMask = (1L << l) - 1;
+
+ pointerSize = EFGraph.pointerSize(correctedLength, upperBound);
+ expectedNumberOfPointers = EFGraph.numberOfPointers(correctedLength, upperBound, log2Quantum);
+ // System.err.println("l = " + l + " numberOfPointers = " + expectedNumberOfPointers + " pointerSize = " + pointerSize);
+ }
+
+ public void add(final long x) throws IOException {
+ if (currentLength != 0 && x == 0) throw new IllegalArgumentException();
+ // System.err.println("add(" + x + "), l = " + l + ", length = " + length);
+ currentPrefixSum += x;
+ if (currentPrefixSum > correctedUpperBound) throw new IllegalArgumentException("Too large prefix sum: " + currentPrefixSum + " > " + correctedUpperBound);
+ if (l != 0) lowerBits.append(currentPrefixSum & lowerBitsMask, l);
+ final long onePosition = (currentPrefixSum >>> l) + currentLength;
+
+ upperBits.writeUnary((int)(onePosition - lastOnePosition - 1));
+
+ long zeroesBefore = lastOnePosition - currentLength + 1;
+ for(long position = lastOnePosition + (zeroesBefore & -1L << log2Quantum) + quantum - zeroesBefore; position < onePosition; position += quantum, zeroesBefore += quantum)
+ successors.append(position + 1, pointerSize);
+
+ lastOnePosition = onePosition;
+ currentLength++;
+ }
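+
+ /* Worked example (hand-computed from init(), add() and dump()): a node with successors 3, 7, 8
+ * and upperBound == 16 is stored by calling init(3, 16, false, true, 8) and then add(3),
+ * add(4), add(1), that is, the gaps between successors, the first one taken from 0. Here
+ * correctedLength == 4 (three successors plus the terminator), so l == lowerBits(4, 16) == 2.
+ * Each prefix sum x with index i contributes its two lowest bits to the lower-bits array and a
+ * one at position (x >>> 2) + i to the upper-bits array: 3 gives a one at 0, 7 at 2, and 8 at 4.
+ * dump() then adds the terminator 16, with a one at (16 >>> 2) + 3 == 7. The upper bits are
+ * thus 1, 0, 1, 0, 1, 0, 0, 1 (least significant first) and the lower bits are 3, 3, 0, 0.
+ * Writing p(i) for the position of the i-th one (notation used only here), element i is
+ * recovered as (p(i) - i) << l | lower(i): for instance, (4 - 2) << 2 | 0 == 8.
+ * With quantum 256 no pointers are written. */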
+
+ public long dump(final LongWordOutputBitStream lwobs) throws IOException {
+ if (currentLength != length) throw new IllegalStateException();
+ // Add a final fictional element equal to the corrected upper bound, which acts as a terminator.
+ add(correctedUpperBound - currentPrefixSum);
+ assert pointerSize == 0 || successors.length() / pointerSize == expectedNumberOfPointers : "Expected " + expectedNumberOfPointers + " pointers, found " + successors.length() / pointerSize;
+ //System.err.println("pointerSize :" + pointerSize);
+ bitsForPointers = lwobs.append(successors);
+ // System.err.println("pointers: " + bitsForPointers);
+ bitsForLowerBits = lwobs.append(lowerBits);
+ // System.err.println("lower: " + bitsForLowerBits);
+ bitsForUpperBits = lwobs.append(upperBits);
+ // System.err.println("upper: " + bitsForUpperBits);
+ return bitsForLowerBits + bitsForUpperBits + bitsForPointers;
+ }
+
+ @Override
+ public void close() throws IOException {
+ successors.close();
+ upperBits.close();
+ lowerBits.close();
+ }
+ }
+
+ /** Creates a new {@link EFGraph} by loading a compressed graph file from disk to memory, with no progress logger and
+ * all offsets.
+ *
+ * @param basename the basename of the graph.
+ * @return an {@link EFGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ */
+ public static EFGraph load(CharSequence basename) throws IOException {
+ return loadInternal(basename, false, null);
+ }
+
+ /** Creates a new {@link EFGraph} by loading a compressed graph file from disk to memory, with
+ * all offsets.
+ *
+ * @param basename the basename of the graph.
+ * @param pl a progress logger used while loading the graph, or <code>null</code>.
+ * @return an {@link EFGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ */
+ public static EFGraph load(CharSequence basename, ProgressLogger pl) throws IOException {
+ return loadInternal(basename, false, pl);
+ }
+
+ /** Creates a new {@link EFGraph} by memory-mapping a graph file.
+ *
+ * @param basename the basename of the graph.
+ * @return an {@link EFGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while memory-mapping the graph or reading the offsets.
+ */
+ public static EFGraph loadMapped(CharSequence basename) throws IOException {
+ return loadInternal(basename, true, null);
+ }
+
+ /** Creates a new {@link EFGraph} by memory-mapping a graph file.
+ *
+ * @param basename the basename of the graph.
+ * @param pl a progress logger used while loading the offsets, or <code>null</code>.
+ * @return an {@link EFGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while memory-mapping the graph or reading the offsets.
+ */
+ public static EFGraph loadMapped(CharSequence basename, ProgressLogger pl) throws IOException {
+ return loadInternal(basename, true, pl);
+ }
+
+ /** Creates a new {@link EFGraph} by loading a compressed graph file from disk to memory.
+ * This method simply delegates to {@link #load(CharSequence, ProgressLogger)}, so offsets are loaded, too.
+ *
+ * @param basename the basename of the graph.
+ * @param pl a progress logger used while loading the graph, or <code>null</code>.
+ * @return an {@link EFGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ * @deprecated Use {@link #loadOffline(CharSequence, ProgressLogger)} or {@link #loadMapped(CharSequence, ProgressLogger)} instead.
+ */
+ @Deprecated
+ public static EFGraph loadSequential(CharSequence basename, ProgressLogger pl) throws IOException {
+ return EFGraph.load(basename, pl);
+ }
+
+
+ /** Creates a new {@link EFGraph} by loading a compressed graph file from disk to memory, with no progress logger.
+ * This method simply delegates to {@link #load(CharSequence)}, so offsets are loaded, too.
+ *
+ * @param basename the basename of the graph.
+ * @return an {@link EFGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ * @deprecated Use {@link #loadOffline(CharSequence)} or {@link #loadMapped(CharSequence)} instead.
+ */
+ @Deprecated
+ public static EFGraph loadSequential(CharSequence basename) throws IOException {
+ return EFGraph.load(basename);
+ }
+
+ /** Creates a new {@link EFGraph} for offline use. This method simply delegates to
+ * {@link #loadMapped(CharSequence, ProgressLogger)}, so the graph is memory-mapped and its offsets are loaded.
+ *
+ * @param basename the basename of the graph.
+ * @param pl a progress logger used while loading the offsets, or <code>null</code>.
+ * @return an {@link EFGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while memory-mapping the graph or reading the offsets.
+ */
+ public static EFGraph loadOffline(CharSequence basename, ProgressLogger pl) throws IOException {
+ return EFGraph.loadMapped(basename, pl);
+ }
+
+
+
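+ /** Creates a new {@link EFGraph} for offline use, with no progress logger. This method simply delegates to
+ * {@link #loadMapped(CharSequence, ProgressLogger)}, so the graph is memory-mapped and its offsets are loaded.
+ *
+ * @param basename the basename of the graph.
+ * @return an {@link EFGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while memory-mapping the graph or reading the offsets.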
+ */
+ public static EFGraph loadOffline(CharSequence basename) throws IOException {
+ return EFGraph.loadMapped(basename, null);
+ }
+
+ /** An iterator returning the <code>n + 1</code> offsets (bit positions) obtained by summing the deltas read from the offsets bit stream. */
+ private final static class OffsetsLongIterator implements LongIterator {
+ private final InputBitStream offsetIbs;
+ private final long n;
+ private long offset;
+ private long i;
+
+ private OffsetsLongIterator(final InputBitStream offsetIbs, final long n) {
+ this.offsetIbs = offsetIbs;
+ this.n = n;
+ }
+
+ @Override
+ public boolean hasNext() {
+ return i <= n;
+ }
+
+ @Override
+ public long nextLong() {
+ if (! hasNext()) throw new NoSuchElementException();
+ i++;
+ try {
+ return offset += offsetIbs.readLongDelta();
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ }
+ }
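+
+ /* Worked example (hypothetical bit lengths, for illustration only): store() writes a delta of 0
+ * followed by the number of bits used by each node (outdegree plus successors). If three nodes
+ * used 10, 7 and 12 bits, the offsets stream contains the deltas 0, 10, 7, 12. This iterator
+ * then returns the n + 1 == 4 positions 0, 10, 17, 29, where the last one marks the end of the
+ * data of the last node. */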
+
+ /** Loads (or memory-maps, if <code>mapped</code> is true) a compressed graph file from disk and
+ * returns a new {@link EFGraph} wrapping it.
+ *
+ * @param basename the basename of the graph.
+ * @param mapped whether we want to memory-map the file.
+ * @param pl a progress logger used while loading the graph, or <code>null</code>.
+ * @return a new {@link EFGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph or its offsets.
+ */
+ protected static EFGraph loadInternal(final CharSequence basename, final boolean mapped, final ProgressLogger pl) throws IOException {
+ // First of all, we read the property file to get the relevant data.
+ final FileInputStream propertyFile = new FileInputStream(basename + PROPERTIES_EXTENSION);
+ final Properties properties = new Properties();
+ properties.load(propertyFile);
+ propertyFile.close();
+
+ // Soft check--we accept standard stuff, too.
+ if (! EFGraph.class.getName().equals(properties.getProperty(ImmutableGraph.GRAPHCLASS_PROPERTY_KEY).replace("it.unimi.dsi.webgraph", "it.unimi.dsi.big.webgraph")))
+ throw new IOException("This class (" + EFGraph.class.getName() + ") cannot load a graph stored using class \"" + properties.getProperty(ImmutableGraph.GRAPHCLASS_PROPERTY_KEY) + "\"");
+
+ if (properties.getProperty("version") == null) throw new IOException("Missing format version information");
+ else if (Integer.parseInt(properties.getProperty("version")) > EFGRAPH_VERSION) throw new IOException("This graph uses format " + properties.getProperty("version") + ", but this class can understand only graphs up to format " + EFGRAPH_VERSION);
+ final long n = Long.parseLong(properties.getProperty("nodes"));
+ final long m = Long.parseLong(properties.getProperty("arcs"));
+ final long upperBound = properties.containsKey("upperbound") ? Long.parseLong(properties.getProperty("upperbound")) : n;
+ final long quantum = Long.parseLong(properties.getProperty("quantum"));
+ final int log2Quantum = Fast.mostSignificantBit(quantum);
+ if (1L << log2Quantum != quantum) throw new IllegalArgumentException("Illegal quantum (must be a power of 2): " + quantum);
+
+ final ByteOrder byteOrder;
+ if (properties.get("byteorder").equals(ByteOrder.BIG_ENDIAN.toString())) byteOrder = ByteOrder.BIG_ENDIAN;
+ else if (properties.get("byteorder").equals(ByteOrder.LITTLE_ENDIAN.toString())) byteOrder = ByteOrder.LITTLE_ENDIAN;
+ else throw new IllegalArgumentException("Unknown byte order " + properties.get("byteorder"));
+
+ final FileInputStream graphIs = new FileInputStream(basename + GRAPH_EXTENSION);
+ final LongBigList graph;
+ if (mapped) graph = ByteBufferLongBigList.map(graphIs.getChannel(), byteOrder);
+ else {
+ if (pl != null) {
+ pl.itemsName = "bytes";
+ pl.start("Loading graph...");
+ }
+
+ graph = it.unimi.dsi.webgraph.EFGraph.loadLongBigList(basename + GRAPH_EXTENSION, byteOrder);
+
+ if (pl != null) {
+ pl.count = graph.size64() * (Long.SIZE / Byte.SIZE);
+ pl.done();
+ }
+ graphIs.close();
+ }
+
+ if (pl != null) {
+ pl.itemsName = "deltas";
+ pl.start("Loading offsets...");
+ }
+
+ // We try to load a cached big list.
+ final File offsetsBigListFile = new File(basename + OFFSETS_BIG_LIST_EXTENSION);
+ LongBigList offsets = null;
+
+ if (offsetsBigListFile.exists()) {
+ if (new File(basename + OFFSETS_EXTENSION).lastModified() > offsetsBigListFile.lastModified()) LOGGER.warn("A cached long big list of offsets was found, but the corresponding offsets file has a later modification time");
+ else try {
+ offsets = (LongBigList)BinIO.loadObject(offsetsBigListFile);
+ }
+ catch (ClassNotFoundException e) {
+ LOGGER.warn("A cached long big list of offsets was found, but its class is unknown", e);
+ }
+ }
+
+ if (offsets == null) {
+ final InputBitStream offsetIbs = new InputBitStream(basename + OFFSETS_EXTENSION);
+ offsets = new EliasFanoMonotoneLongBigList(n + 1, graph.size64() * Long.SIZE + 1, new OffsetsLongIterator(offsetIbs, n));
+ offsetIbs.close();
+ }
+
+ if (pl != null) {
+ pl.count = n + 1;
+ pl.done();
+ if (offsets instanceof EliasFanoMonotoneLongBigList) pl.logger().info("Pointer bits per node: " + Util.format(((EliasFanoMonotoneLongBigList)offsets).numBits() / (n + 1.0)));
+ }
+
+ return new EFGraph(basename, n, m, upperBound, log2Quantum, graph, offsets);
+ }
+
+
+ public static void store(ImmutableGraph graph, final long upperBound, final CharSequence basename, final ProgressLogger pl) throws IOException {
+ store(graph, upperBound, basename, DEFAULT_LOG_2_QUANTUM, DEFAULT_CACHE_SIZE, ByteOrder.nativeOrder(), pl);
+ }
+
+ public static void store(ImmutableGraph graph, final CharSequence basename, final ProgressLogger pl) throws IOException {
+ store(graph, basename, DEFAULT_LOG_2_QUANTUM, DEFAULT_CACHE_SIZE, ByteOrder.nativeOrder(), pl);
+ }
+
+ public static void store(ImmutableGraph graph, final CharSequence basename) throws IOException {
+ store(graph, basename, null);
+ }
+
+ private static double stirling(double n) {
+ return n * Math.log(n) - n + (1./2) * Math.log(2 * Math.PI * n) ;
+ }
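+
+ /* Note on the compratio property computed in store() (hand-written derivation): stirling(k)
+ * approximates ln(k!) as k ln k - k + (1/2) ln(2 pi k). A graph with n nodes and m arcs is one
+ * of C(n^2, m) possible arc sets. Since ln C(n^2, m) = ln((n^2)!) - ln(m!) - ln((n^2 - m)!),
+ * multiplying the written bits by ln 2 and dividing by the Stirling approximation of this
+ * quantity yields the ratio between the actual size and log2 C(n^2, m), the number of bits
+ * needed to identify an arbitrary arc set with the same parameters. */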
+
+ public static void store(ImmutableGraph graph, final CharSequence basename, final int log2Quantum, final int cacheSize, final ByteOrder byteOrder, final ProgressLogger pl) throws IOException {
+ store(graph, graph.numNodes(), basename, log2Quantum, cacheSize, byteOrder, pl);
+ }
+
+ public static void store(ImmutableGraph graph, final long upperBound, final CharSequence basename, final int log2Quantum, final int cacheSize, final ByteOrder byteOrder, final ProgressLogger pl) throws IOException {
+ if (log2Quantum < 0) throw new IllegalArgumentException(Integer.toString(log2Quantum));
+
+ final Accumulator successorsAccumulator = new Accumulator(cacheSize, log2Quantum);
+ final FileOutputStream graphOs = new FileOutputStream(basename + GRAPH_EXTENSION);
+ final FileChannel graphChannel = graphOs.getChannel();
+ final LongWordOutputBitStream graphStream = new LongWordOutputBitStream(graphChannel, byteOrder);
+ final OutputBitStream offsets = new OutputBitStream(basename + OFFSETS_EXTENSION);
+
+ long numberOfArcs = 0;
+ long bitsForOutdegrees = 0;
+ long bitsForSuccessors = 0;
+ offsets.writeLongDelta(0);
+
+ if (pl != null) {
+ pl.itemsName = "nodes";
+ try {
+ pl.expectedUpdates = graph.numNodes();
+ }
+ catch(UnsupportedOperationException ignore) {}
+ pl.start("Storing...");
+ }
+
+ for(NodeIterator nodeIterator = graph.nodeIterator(); nodeIterator.hasNext();) {
+ nodeIterator.nextLong();
+ final long outdegree = nodeIterator.outdegree();
+ numberOfArcs += outdegree;
+ long lastSuccessor = 0;
+ final int outdegreeBits = graphStream.writeGamma(outdegree);
+ bitsForOutdegrees += outdegreeBits;
+ successorsAccumulator.init(outdegree, upperBound, false, true, log2Quantum);
+ final LazyLongIterator successors = nodeIterator.successors();
+ for(long successor; (successor = successors.nextLong()) != -1;) {
+ successorsAccumulator.add(successor - lastSuccessor);
+ lastSuccessor = successor;
+ }
+
+ final long successorsBits = successorsAccumulator.dump(graphStream);
+ bitsForSuccessors += successorsBits;
+ offsets.writeLongDelta(outdegreeBits + successorsBits);
+
+ if (pl != null) pl.lightUpdate();
+ }
+
+ successorsAccumulator.close();
+ graphStream.close();
+ graphOs.close();
+ offsets.close();
+
+ final long n = graph.numNodes();
+
+ if (pl != null) {
+ pl.done();
+ if (pl.count != n) throw new IllegalStateException("The graph claimed to have " + graph.numNodes() + " nodes, but the node iterator returned " + pl.count);
+ }
+
+ final DecimalFormat format = new java.text.DecimalFormat("0.###");
+ final long writtenBits = new File(basename + GRAPH_EXTENSION).length() * 8;
+
+ final Properties properties = new Properties();
+ properties.setProperty("nodes", String.valueOf(n));
+ properties.setProperty("arcs", String.valueOf(numberOfArcs));
+ if (upperBound != n) properties.setProperty("upperbound", String.valueOf(upperBound));
+ properties.setProperty("quantum", String.valueOf(1L << log2Quantum));
+ properties.setProperty("byteorder", byteOrder.toString());
+ properties.setProperty("bitsperlink", format.format((double)writtenBits / numberOfArcs));
+ properties.setProperty("compratio", format.format(writtenBits * Math.log(2) / (stirling((double)n * n) - stirling(numberOfArcs) - stirling((double)n * n - numberOfArcs))));
+ properties.setProperty("bitspernode", format.format((double)writtenBits / n));
+ properties.setProperty("avgbitsforoutdegrees", format.format((double)bitsForOutdegrees / n));
+ properties.setProperty("bitsforoutdegrees", Long.toString(bitsForOutdegrees));
+ properties.setProperty("bitsforsuccessors", Long.toString(bitsForSuccessors));
+ properties.setProperty(ImmutableGraph.GRAPHCLASS_PROPERTY_KEY, EFGraph.class.getName());
+ properties.setProperty("version", String.valueOf(EFGRAPH_VERSION));
+ final FileOutputStream propertyFile = new FileOutputStream(basename + PROPERTIES_EXTENSION);
+ properties.store(propertyFile, "EFGraph properties");
+ propertyFile.close();
+ }
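+
+ /* Usage sketch (illustrative only): "source" and "target" are hypothetical basenames, and the
+ * source graph is assumed to be loadable through ImmutableGraph.load().
+ *
+ * ImmutableGraph g = ImmutableGraph.load("source");
+ * EFGraph.store(g, "target"); // writes target.graph, target.offsets and target.properties
+ * EFGraph ef = EFGraph.load("target");
+ * LazyLongIterator s = ef.successors(0);
+ * for (long y; (y = s.nextLong()) != -1;) System.out.println(y);
+ */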
+
+
+ protected final static class LongWordBitReader {
+
+ private static final boolean DEBUG = false;
+
+ /** The underlying list. */
+ private final LongBigList list;
+ /** The extraction width for {@link #extract()} and {@link #extract(long)}. */
+ private final int l;
+ /** {@link Long#SIZE} minus {@link #l}, cached. */
+ private final int longSizeMinusl;
+ /** The extraction mask for {@link #l} bits. */
+ private final long mask;
+
+ /** The 64-bit buffer, whose lower {@link #filled} bits contain data. */
+ private long buffer;
+ /** The number of lower bits of {@link #buffer} that contain data. */
+ private int filled;
+ /** The current position in the list. */
+ private long curr;
+
+ public LongWordBitReader(final LongBigList list, final int l) {
+ assert l < Long.SIZE;
+ this.list = list;
+ this.l = l;
+ this.longSizeMinusl = Long.SIZE - l;
+ mask = (1L << l) - 1;
+ curr = -1;
+ }
+
+ public LongWordBitReader position(final long position) {
+ if (DEBUG) System.err.println(this + ".position(" + position + ") [buffer = " + Long.toBinaryString(buffer) + ", filled = " + filled + "]");
+
+ buffer = list.getLong(curr = position / Long.SIZE);
+ final int bitPosition = (int)(position % Long.SIZE);
+ buffer >>>= bitPosition;
+ filled = Long.SIZE - bitPosition;
+
+ if (DEBUG) System.err.println(this + ".position() filled: " + filled + " buffer: " + Long.toBinaryString(buffer));
+ return this;
+ }
+
+ public long position() {
+ return curr * Long.SIZE + Long.SIZE - filled;
+ }
+
+ private long extractInternal(final int width) {
+ if (DEBUG) System.err.println(this + ".extract(" + width + ") [buffer = " + Long.toBinaryString(buffer) + ", filled = " + filled + "]");
+
+ if (width <= filled) {
+ long result = buffer & (1L << width) - 1;
+ filled -= width;
+ buffer >>>= width;
+ return result;
+ }
+ else {
+ long result = buffer;
+ buffer = list.getLong(++curr);
+
+ final int remainder = width - filled;
+ // Note that this WON'T WORK if remainder == Long.SIZE, but that's not going to happen.
+ result |= (buffer & (1L << remainder) - 1) << filled;
+ buffer >>>= remainder;
+ filled = Long.SIZE - remainder;
+ return result;
+ }
+ }
+
+ public long extract() {
+ if (DEBUG) System.err.println(this + ".extract() " + l + " bits [buffer = " + Long.toBinaryString(buffer) + ", filled = " + filled + "]");
+
+ if (l <= filled) {
+ final long result = buffer & mask;
+ filled -= l;
+ buffer >>>= l;
+ return result;
+ }
+ else {
+ long result = buffer;
+ buffer = list.getLong(++curr);
+ result |= buffer << filled & mask;
+ // The shift distance l - filled is positive and smaller than Long.SIZE, as filled < l < Long.SIZE.
+ buffer >>>= l - filled;
+ filled += longSizeMinusl;
+ return result;
+ }
+ }
+
+ public long extract(long position) {
+ if (DEBUG) System.err.println(this + ".extract(" + position + ") [l=" + l + "]");
+
+ final int bitPosition = (int)(position % Long.SIZE);
+ final int totalOffset = bitPosition + l;
+ final long result = list.getLong(curr = position / Long.SIZE) >>> bitPosition;
+
+ if (totalOffset <= Long.SIZE) {
+ buffer = result >>> l;
+ filled = Long.SIZE - totalOffset;
+ return result & mask;
+ }
+
+ final long t = list.getLong(++curr);
+
+ buffer = t >>> totalOffset;
+ filled = 2 * Long.SIZE - totalOffset;
+
+ return result | t << -bitPosition & mask;
+ }
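+
+ /* Note on the shifts in extract(long) (hand-computed): Java uses only the six lowest bits of the
+ * shift distance for longs. When the field straddles two longwords (bitPosition + l > 64, hence
+ * bitPosition > 0), t << -bitPosition is therefore t << (64 - bitPosition), and t >>> totalOffset
+ * is t >>> (totalOffset - 64). For example, with l == 5 and position == 62, the two lowest bits
+ * of the result come from bits 62 and 63 of the first longword and the next three come from bits
+ * 0 to 2 of the second one (t << 2 & mask). The buffer is left with t >>> 3 and filled == 61. */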
+
+ public int readUnary() {
+ if (DEBUG) System.err.println(this + ".readUnary() [buffer = " + Long.toBinaryString(buffer) + ", filled = " + filled + "]");
+
+ int accumulated = 0;
+
+ for(;;) {
+ if (buffer != 0) {
+ final int msb = Long.numberOfTrailingZeros(buffer);
+ filled -= msb + 1;
+ /* msb + 1 can be Long.SIZE, so we must break down the shift. */
+ buffer >>>= msb;
+ buffer >>>= 1;
+ if (DEBUG) System.err.println(this + ".readUnary() => " + (msb + accumulated));
+ return msb + accumulated;
+ }
+ accumulated += filled;
+ buffer = list.getLong(++curr);
+ filled = Long.SIZE;
+ }
+
+ }
+
+ public long readNonZeroGamma() {
+ final int msb = readUnary();
+ return extractInternal(msb) | (1L << msb);
+ }
+
+ public long readGamma() {
+ return readNonZeroGamma() - 1;
+ }
+ }
+
+
+ @Override
+ public long numNodes() {
+ return n;
+ }
+
+ @Override
+ public long numArcs() {
+ return m;
+ }
+
+ @Override
+ public boolean randomAccess() {
+ return true;
+ }
+
+ @Override
+ public long outdegree(long x) {
+ if (x == cachedNode) return cachedOutdegree;
+ cachedOutdegree = outdegreeLongWordBitReader.position(offsets.getLong(cachedNode = x)).readGamma();
+ cachedPointer = outdegreeLongWordBitReader.position();
+ return cachedOutdegree;
+ }
+
+
+ protected final static class EliasFanoSuccessorReader extends AbstractLazyLongIterator implements LazyLongSkippableIterator {
+ private final static int SKIPPING_THRESHOLD = 8;
+ /** The number of nodes in the graph. */
+ private final long n;
+ /** The upper bound used at construction time. */
+ private final long upperBound;
+ /** The underlying list. */
+ protected final LongBigList graph;
+ /** The longword bit reader for pointers. */
+ protected final LongWordBitReader skipPointers;
+ /** The starting position of the pointers. */
+ protected final long skipPointersStart;
+ /** The starting position of the upper bits. */
+ protected final long upperBitsStart;
+ /** The longword bit reader for the lower bits. */
+ private final LongWordBitReader lowerBits;
+ /** The starting position of the lower bits. */
+ private final long lowerBitsStart;
+ /** The logarithm of the quantum, cached from the graph. */
+ protected final int log2Quantum;
+ /** The quantum, cached from the graph. */
+ protected final int quantum;
+ /** The size of a pointer. */
+ protected final int pointerSize;
+ /** The outdegree. */
+ protected final long outdegree;
+ /** The 64-bit window. */
+ protected long window;
+ /** The current word position in the list of upper bits. */
+ protected long curr;
+ /** The index of the current prefix sum. */
+ public long currentIndex;
+ /** The number of lower bits. */
+ private final int l;
+ /** The last value returned by {@link #nextLong()}, {@link Long#MIN_VALUE} if the list has never
+ * been accessed, or {@link LazyLongSkippableIterator#END_OF_LIST} if the list has been exhausted. */
+ private long last;
+
+ public EliasFanoSuccessorReader(final long n, final long upperBound, final LongBigList graph, final long outdegree, final long skipPointersStart, final int log2Quantum) {
+ this.n = n;
+ this.upperBound = upperBound;
+ this.graph = graph;
+ this.log2Quantum = log2Quantum;
+ this.quantum = 1 << log2Quantum;
+ this.outdegree = outdegree;
+ this.skipPointersStart = skipPointersStart;
+
+ l = lowerBits(outdegree + 1, upperBound);
+ final long numberOfPointers = numberOfPointers(outdegree + 1, upperBound, log2Quantum);
+ pointerSize = pointerSize(outdegree + 1, upperBound);
+
+ lowerBitsStart = skipPointersStart + pointerSize * numberOfPointers;
+ upperBitsStart = lowerBitsStart + l * (outdegree + 1);
+
+ skipPointers = numberOfPointers == 0 ? null : new LongWordBitReader(graph, pointerSize);
+ (lowerBits = new LongWordBitReader(graph, l)).position(lowerBitsStart);
+ position(upperBitsStart);
+ last = Long.MIN_VALUE;
+ }
+
+ private void position(final long position) {
+ window = graph.getLong(curr = position / Long.SIZE) & -1L << (int)(position);
+ }
+
+ private long getNextUpperBits() {
+ while(window == 0) window = graph.getLong(++curr);
+ final long upperBits = curr * Long.SIZE + Long.numberOfTrailingZeros(window) - currentIndex++ - upperBitsStart;
+ window &= window - 1;
+ return upperBits;
+ }
+
+ @Override
+ public long nextLong() {
+ if (currentIndex >= outdegree) {
+ last = END_OF_LIST;
+ return -1;
+ }
+ return last = getNextUpperBits() << l | lowerBits.extract();
+ }
+
+ @Override
+ public long skipTo(final long lowerBound) {
+ if (lowerBound <= last) return last;
+ final long zeroesToSkip = lowerBound >>> l;
+ long delta = zeroesToSkip - ((last & (-1L >>> 1)) >>> l); // This catches last = Long.MIN_VALUE and turns it into 0
+ assert delta >= 0;
+
+ if (delta < SKIPPING_THRESHOLD) {
+ do nextLong(); while (last < lowerBound);
+ return last == n ? last = END_OF_LIST : last;
+ }
+
+ if (delta > quantum) {
+ final long block = zeroesToSkip >>> log2Quantum;
+ assert block > 0;
+ assert block <= numberOfPointers(outdegree + 1, upperBound, log2Quantum);
+ final long blockZeroes = block << log2Quantum;
+ final long skip = skipPointers.extract(skipPointersStart + (block - 1) * pointerSize);
+ assert skip != 0;
+ position(upperBitsStart + skip);
+ currentIndex = skip - blockZeroes;
+ delta = zeroesToSkip - curr * Long.SIZE + currentIndex + upperBitsStart;
+ }
+
+ assert delta >= 0 : delta;
+
+ for(int bitCount; (bitCount = Long.bitCount(~window)) < delta;) {
+ window = graph.getLong(++curr);
+ delta -= bitCount;
+ currentIndex += Long.SIZE - bitCount;
+ }
+
+ /* Note that for delta == 1 the following code is a NOP, but the test for zero is so much faster that
+ it is not worth replacing it with a test for delta > 1. Predecrementing won't work as delta might be zero. */
+ if (delta-- != 0) {
+ // Phase 1: sums by byte
+ final long word = ~window;
+ assert delta < Long.bitCount(word) : delta + " >= " + Long.bitCount(word);
+ long byteSums = word - ((word & 0xa * ONES_STEP_4) >>> 1);
+ byteSums = (byteSums & 3 * ONES_STEP_4) + ((byteSums >>> 2) & 3 * ONES_STEP_4);
+ byteSums = (byteSums + (byteSums >>> 4)) & 0x0f * ONES_STEP_8;
+ byteSums *= ONES_STEP_8;
+
+ // Phase 2: compare each byte sum with delta to obtain the relevant byte
+ final long rankStep8 = delta * ONES_STEP_8;
+ final long byteOffset = (((((rankStep8 | MSBS_STEP_8) - byteSums) & MSBS_STEP_8) >>> 7) * ONES_STEP_8 >>> 53) & ~0x7;
+
+ final int byteRank = (int)(delta - (((byteSums << 8) >>> byteOffset) & 0xFF));
+
+ final int select = (int)(byteOffset + Fast.selectInByte[(int)(word >>> byteOffset & 0xFF) | byteRank << 8]);
+
+ // We cancel up to, but not including, the target one.
+ window &= -1L << select;
+ currentIndex += select - delta;
+ }
+
+ final long lower = lowerBits.extract(lowerBitsStart + l * currentIndex);
+ last = getNextUpperBits() << l | lower;
+
+ for(;;) {
+ if (last >= lowerBound) return last == n ? last = END_OF_LIST : last;
+ nextLong();
+ }
+ }
+
+ @Override
+ public String toString() {
+ return this.getClass().getSimpleName() + '@' + Integer.toHexString(System.identityHashCode(this));
+ }
+ }
+
+ @Override
+ public LazyLongSkippableIterator successors(final long x) {
+ return new EliasFanoSuccessorReader(n, upperBound, graph, outdegree(x), cachedPointer, log2Quantum);
+ }
+
+ @Override
+ public EFGraph copy() {
+ return new EFGraph(basename, n, m, upperBound, log2Quantum, graph instanceof ByteBufferLongBigList ? ((ByteBufferLongBigList)graph).copy() : graph, offsets);
+ }
+
+ public static void main(String args[]) throws SecurityException, IllegalAccessException, InvocationTargetException, NoSuchMethodException, IOException, JSAPException, ClassNotFoundException, InstantiationException {
+ String source, dest;
+ Class<?> graphClass;
+
+ final SimpleJSAP jsap = new SimpleJSAP(EFGraph.class.getName(), "Compresses a graph using the Elias-Fano representation. Source and destination are basenames from which suitable filenames will be stemmed; alternatively, if the suitable option was specified, source is a spec (see below). For more information about the compression techniques, see the Javadoc documentation.",
+ new Parameter[] {
+ new FlaggedOption("graphClass", GraphClassParser.getParser(), null, JSAP.NOT_REQUIRED, 'g', "graph-class", "Forces a Java class for the source graph."),
+ new Switch("spec", 's', "spec", "The source is not a basename but rather a specification of the form <ImmutableGraphImplementation>(arg,arg,...)."),
+ new FlaggedOption("logInterval", JSAP.LONG_PARSER, Long.toString(ProgressLogger.DEFAULT_LOG_INTERVAL), JSAP.NOT_REQUIRED, 'l', "log-interval", "The minimum time interval between activity logs in milliseconds."),
+ new FlaggedOption("log2Quantum", JSAP.INTEGER_PARSER, Integer.toString(DEFAULT_LOG_2_QUANTUM), JSAP.NOT_REQUIRED, 'q', "log2-quantum", "The base-two logarithm of the indexing quantum."),
+ new Switch("offline", 'o', "offline", "No-op for backward compatibility."),
+ new Switch("once", '1', "once", "Use the read-once load method to read a graph from standard input."),
+ new Switch("list", 'L', "list", "Precomputes an Elias-Fano list of offsets for the source graph."),
+ new Switch("fixedWidthList", 'F', "fixed-width-list", "Precomputes a list of fixed-width offsets for the source graph."),
+ new UnflaggedOption("sourceBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the source graph, or a source spec if --spec was given; it is immaterial when --once is specified."),
+ new UnflaggedOption("destBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, JSAP.NOT_GREEDY, "The basename of the destination graph; if omitted, no recompression is performed. This is useful in conjunction with --list and --fixed-width-list."),
+ }
+ );
+
+ final JSAPResult jsapResult = jsap.parse(args);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final boolean once = jsapResult.getBoolean("once");
+ final boolean spec = jsapResult.getBoolean("spec");
+ final boolean list = jsapResult.getBoolean("list");
+ final boolean fixedWidthList = jsapResult.getBoolean("fixedWidthList");
+ final int log2Quantum = jsapResult.getInt("log2Quantum");
+ graphClass = jsapResult.getClass("graphClass");
+ source = jsapResult.getString("sourceBasename");
+ dest = jsapResult.getString("destBasename");
+
+ final ImmutableGraph graph;
+ final ProgressLogger pl = new ProgressLogger(LOGGER, jsapResult.getLong("logInterval"), TimeUnit.MILLISECONDS);
+
+ if (graphClass != null) {
+ if (spec) {
+ System.err.println("Options --graph-class and --spec are incompatible");
+ System.exit(1);
+ }
+ if (once) graph = (ImmutableGraph)graphClass.getMethod(LoadMethod.ONCE.toMethod(), InputStream.class).invoke(null, System.in);
+ else graph = (ImmutableGraph)graphClass.getMethod(LoadMethod.OFFLINE.toMethod(), CharSequence.class).invoke(null, source);
+ }
+ else {
+ if (!spec) graph = once ? ImmutableGraph.loadOnce(System.in) : ImmutableGraph.loadOffline(source, pl);
+ else graph = ObjectParser.fromSpec(source, ImmutableGraph.class, GraphClassParser.PACKAGE);
+ }
+
+ if (dest != null) {
+ if (list || fixedWidthList) throw new IllegalArgumentException("You cannot specify a destination graph with these options");
+ EFGraph.store(graph, dest, log2Quantum, DEFAULT_CACHE_SIZE, ByteOrder.nativeOrder(), pl);
+ }
+ else {
+ if (! (graph instanceof EFGraph)) throw new IllegalArgumentException("The source graph is not an EFGraph");
+ final InputBitStream offsets = new InputBitStream(graph.basename() + OFFSETS_EXTENSION);
+ final long sizeInBits = new File(graph.basename() + GRAPH_EXTENSION).length() * Byte.SIZE + 1;
+ final OffsetsLongIterator offsetsIterator = new OffsetsLongIterator(offsets, graph.numNodes());
+ if (list) {
+ BinIO.storeObject(new EliasFanoMonotoneLongBigList(graph.numNodes() + 1, sizeInBits, offsetsIterator), graph.basename() + OFFSETS_BIG_LIST_EXTENSION);
+ }
+ else if (fixedWidthList) {
+ final LongBigList t = LongArrayBitVector.getInstance().asLongBigList(Fast.length(sizeInBits));
+ while(offsetsIterator.hasNext()) t.add(offsetsIterator.nextLong());
+ BinIO.storeObject(t, graph.basename() + OFFSETS_BIG_LIST_EXTENSION);
+ }
+ offsets.close();
+ }
+ }
+}
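The `LongWordBitReader` in `EFGraph` above reads 64-bit words least-significant bit first: a unary code is a run of zeros closed by a one (located with `Long.numberOfTrailingZeros`), and `readNonZeroGamma()` reads the unary length `msb` and then `msb` low bits, restoring the implicit leading one. The following is a minimal standalone sketch of that scheme, assuming a plain `long[]` in place of `LongBigList`; the class `GammaSketch`, its `Writer`/`Reader` helpers and `roundTrip` are hypothetical names for illustration, not part of WebGraph.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of LSB-first unary/gamma coding over 64-bit words (not WebGraph API). */
public class GammaSketch {

    /** Appends bits least-significant bit first into 64-bit words. */
    static final class Writer {
        private final List<Long> words = new ArrayList<>();
        private long current; // word being filled
        private int used;     // bits already used in current

        void writeBit(long bit) {
            current |= bit << used;
            if (++used == Long.SIZE) { words.add(current); current = 0; used = 0; }
        }

        /** Writes the low width bits of value, lowest bit first. */
        void writeBits(long value, int width) {
            for (int i = 0; i < width; i++) writeBit((value >>> i) & 1L);
        }

        /** Unary code of x: x zeros followed by a one. */
        void writeUnary(int x) {
            for (int i = 0; i < x; i++) writeBit(0);
            writeBit(1);
        }

        /** Gamma code of x >= 0, written as the non-zero gamma code of x + 1. */
        void writeGamma(long x) {
            final long v = x + 1;
            final int msb = 63 - Long.numberOfLeadingZeros(v);
            writeUnary(msb);
            writeBits(v, msb); // the leading one is implicit
        }

        long[] toArray() {
            final long[] a = new long[words.size() + 1];
            for (int i = 0; i < words.size(); i++) a[i] = words.get(i);
            a[words.size()] = current; // partially filled last word (possibly empty)
            return a;
        }
    }

    /** Reads bits LSB first; buffer holds `filled` unread bits, higher bits are zero. */
    static final class Reader {
        private final long[] words;
        private int curr;
        private long buffer;
        private int filled;

        Reader(long[] words) { this.words = words; buffer = words[0]; filled = Long.SIZE; }

        /** Reads width bits, 0 <= width < 64 (mirrors extractInternal()). */
        long readBits(int width) {
            if (width <= filled) {
                final long result = buffer & ((1L << width) - 1);
                filled -= width;
                buffer >>>= width;
                return result;
            }
            long result = buffer; // all remaining bits of the current word
            buffer = words[++curr];
            final int remainder = width - filled;
            result |= (buffer & ((1L << remainder) - 1)) << filled;
            buffer >>>= remainder;
            filled = Long.SIZE - remainder;
            return result;
        }

        /** Counts zeros up to the next one (mirrors readUnary()). */
        int readUnary() {
            int accumulated = 0;
            for (;;) {
                if (buffer != 0) {
                    final int zeros = Long.numberOfTrailingZeros(buffer);
                    filled -= zeros + 1;
                    buffer >>>= zeros; // zeros + 1 may be 64, so shift in two steps
                    buffer >>>= 1;
                    return accumulated + zeros;
                }
                accumulated += filled; // every unread bit in this word is a zero
                buffer = words[++curr];
                filled = Long.SIZE;
            }
        }

        long readGamma() {
            final int msb = readUnary();
            return (readBits(msb) | (1L << msb)) - 1;
        }
    }

    /** Encodes the values with gamma coding and decodes them back. */
    static long[] roundTrip(long... values) {
        final Writer w = new Writer();
        for (long v : values) w.writeGamma(v);
        final Reader r = new Reader(w.toArray());
        final long[] out = new long[values.length];
        for (int i = 0; i < out.length; i++) out[i] = r.readGamma();
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(roundTrip(0, 1, 5, 1000, 1L << 40))); // prints [0, 1, 5, 1000, 1099511627776]
    }
}
```

The writer emits one bit at a time for clarity only. The reader keeps the two-step shift from `readUnary()` because Java reduces shift distances modulo 64, so `buffer >>>= 64` would leave the buffer unchanged instead of clearing it.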
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/GraphClassParser.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/GraphClassParser.java
new file mode 100644
index 0000000..72f7588
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/GraphClassParser.java
@@ -0,0 +1,54 @@
+package it.unimi.dsi.big.webgraph;
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import com.martiansoftware.jsap.ParseException;
+import com.martiansoftware.jsap.stringparsers.ClassStringParser;
+
+/** A small wrapper around JSAP's standard {@link ClassStringParser}. It
+ * tries to prefix the package names in {@link #PACKAGE} to the provided
+ * class name, making the specification of graph classes on the command line much easier. */
+
+public class GraphClassParser extends ClassStringParser {
+ /** The packages that will be prepended to each graph class. */
+ public final static String[] PACKAGE = { "it.unimi.dsi.big.webgraph", "it.unimi.dsi.big.webgraph.labelling" };
+
+ private final static GraphClassParser INSTANCE = new GraphClassParser();
+
+ @SuppressWarnings("deprecation")
+ protected GraphClassParser() {}
+
+ public static ClassStringParser getParser() {
+ return INSTANCE;
+ }
+
+ /** Parses the given class name, but as a first try prepends the package names found in {@link #PACKAGE}.
+ * @param className the name of a class, possibly without package specification.
+ */
+ @Override
+ public Object parse(String className) throws ParseException {
+ for(String p: PACKAGE) {
+ try {
+ return super.parse(p + "." + className);
+ }
+ catch(Exception notFound) {}
+ }
+ return super.parse(className);
+ }
+}
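`GraphClassParser.parse()` tries each entry of `PACKAGE` as a prefix and only then falls back to the name as given, so `-g BVGraph` and `-g it.unimi.dsi.big.webgraph.BVGraph` should both resolve. A minimal sketch of that lookup order follows, using plain `Class.forName` instead of JSAP's `ClassStringParser`; `PackagePrefixResolver` and `resolve` are hypothetical names, and JDK packages stand in for the WebGraph ones so the example runs on its own.

```java
/** Hypothetical sketch of prefix-first class lookup (not the JSAP or WebGraph API). */
public class PackagePrefixResolver {

    /** Tries each package prefix in order, then the name exactly as given. */
    static Class<?> resolve(String className, String... packages) throws ClassNotFoundException {
        for (String p : packages) {
            try {
                return Class.forName(p + "." + className);
            } catch (ClassNotFoundException notFound) {
                // Not in this package: try the next prefix.
            }
        }
        return Class.forName(className); // last resort: treat it as fully qualified
    }

    public static void main(String[] args) throws ClassNotFoundException {
        String[] packages = { "java.util", "java.lang" };
        System.out.println(resolve("ArrayList", packages).getName());    // prints java.util.ArrayList
        System.out.println(resolve("String", packages).getName());       // prints java.lang.String
        System.out.println(resolve("java.io.File", packages).getName()); // prints java.io.File
    }
}
```

The first match wins, so the order of the prefixes matters when the same simple name exists in several packages. Unlike the original, which silently swallows any `Exception`, the sketch catches only `ClassNotFoundException`.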
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/ImmutableGraph.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/ImmutableGraph.java
new file mode 100644
index 0000000..117eb49
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/ImmutableGraph.java
@@ -0,0 +1,900 @@
+package it.unimi.dsi.big.webgraph;
+
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.lang.reflect.InvocationTargetException;
+import java.util.NoSuchElementException;
+import java.util.Properties;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import it.unimi.dsi.fastutil.longs.LongBigArrays;
+import it.unimi.dsi.fastutil.longs.LongIterator;
+import it.unimi.dsi.lang.FlyweightPrototype;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.AbstractLazyIntIterator;
+import it.unimi.dsi.webgraph.LazyIntIterator;
+
+
+/** A simple abstract class representing an immutable graph.
+ *
+ * <P>Subclasses of this class are used to create and access <em>immutable graphs</em>, that is,
+ * graphs that are computed once for all, stored conveniently, and then accessed repeatedly.
+ * Moreover, immutable graphs are usually very large&mdash;so large that two such graphs may not
+ * fit into central memory (the main example being a sizable portion of the web).
+ *
+ * <P>A subclass of this class must implement methods to obtain the {@linkplain
+ * #numNodes() number of nodes}, the {@linkplain #outdegree(long) outdegree of a
+ * node} and the successors of a node (either {@link #successors(long)}
+ * or {@link #successorBigArray(long)}). Additionally, it may provide methods to
+ * obtain the {@linkplain #numArcs() number of arcs} and a {@linkplain #basename() basename}.
+ *
+ * <P>This class provides {@link #equals(Object)} and {@link #hashCode()} methods that consider
+ * two graphs equal if they have the same size and all their successor lists are equal.
+ *
+ * <H2>Iterating on successors</H2>
+ *
+ * <p>Starting with WebGraph 2.0, the iterator architecture is <em>fully lazy</em>&mdash;you have no
+ * <code>hasNext()</code> method. Rather, the {@link LazyLongIterator} returned by {@link #successors(long)}
+ * will return -1 when no more successors are available. The idiomatic forms for enumerating successors
+ * <i>via</i> iterators are
+ * <pre>
+ * LazyLongIterator successors = g.successors(x);
+ * long d = g.outdegree(x);
+ * while(d-- != 0) doSomething(successors.nextLong());
+ * </pre>
+ * and
+ * <pre>
+ * LazyLongIterator successors = g.successors(x);
+ * long t;
+ * while((t = successors.nextLong()) != -1) doSomething(t);
+ * </pre>
+ *
+ * <p>The alternative method {@link #successorBigArray(long)} provides an array containing the successors
+ * <em>and possibly more elements</em>. Use {@link #outdegree(long)} to know how many elements are valid.
+ * The efficiency of {@link #successors(long)} and {@link #successorBigArray(long)} may vary depending on the
+ * implementation.
+ *
+ * <H2>Building an immutable graph</H2>
+ *
+ * <P>Due to their large size, immutable
+ * graphs have a peculiar serialisation scheme. Every subclass of this class
+ * <strong>must</strong> implement a number of static methods that create an immutable
+ * graph, given a string (usually a basename for a set of files) and, optionally, a {@link it.unimi.dsi.logging.ProgressLogger}.
+ * The signatures that <strong>must</strong> be implemented are
+ * <UL>
+ * <LI><code>ImmutableGraph load(CharSequence, ProgressLogger)</code>;
+ * <LI><code>ImmutableGraph load(CharSequence)</code>;
+ * <LI><code>ImmutableGraph loadOffline(CharSequence, ProgressLogger)</code>;
+ * <LI><code>ImmutableGraph loadOffline(CharSequence)</code>;
+ * <LI><code>ImmutableGraph loadOnce(InputStream)</code>.
+ * </UL>
+ *
+ * <p>Additionally, the following signatures <strong>can</strong> be implemented:
+ * <UL>
+ * <LI><code>ImmutableGraph loadMapped(CharSequence, ProgressLogger)</code>;
+ * <LI><code>ImmutableGraph loadMapped(CharSequence)</code>.
+ * </UL>
+ *
+ * <P>The special semantics associated with <code>loadOffline()</code>
+ * is that the immutable graph should be set up, and possibly some metadata could be read from disk, but no
+ * actual data is loaded into memory; the class should guarantee that offline sequential access (i.e., by means
+ * of {@link #nodeIterator(long)}) is still possible. In other words, in most cases {@link #nodeIterator(long)} will have to be
+ * overridden by the subclasses to behave properly even in an offline setting (see {@link #nodeIterator()}).
+ * The special semantics associated with <code>loadOnce()</code> is that the graph can be traversed
+ * <em>just once</em> using a call to {@link #nodeIterator()}. The special semantics associated with <code>loadMapped()</code>
+ * is that metadata could be read from disk, but the graph will be accessed by memory mapping; the class
+ * should guarantee that random access is possible.
+ *
+ * <P>Note that a simple class may just implement all special forms of graph loading by delegating to the standard
+ * load method (see, e.g., {@link it.unimi.dsi.big.webgraph.ASCIIGraph}).
+ * Specific implementations of {@link ImmutableGraph} may also decide to expose internal load methods
+ * to make it easier to write load methods for subclasses
+ * (see, e.g., {@link it.unimi.dsi.big.webgraph.BVGraph#loadInternal(CharSequence, int, ProgressLogger) loadInternal()}).
+ *
+ * <P>Analogously, a subclass of this class <strong>may</strong> also implement
+ * <UL>
+ * <LI><code>store(ImmutableGraph, CharSequence, ProgressLogger)</code>;
+ * <LI><code>store(ImmutableGraph, CharSequence)</code>.
+ * </UL>
+ *
+ * These methods must store in compressed form a given immutable graph, using the default values
+ * for compression parameters, etc. It is likely, however, that additional
+ * <code>store</code> methods are available, as parameters vary wildly
+ * from subclass to subclass. The method {@link #store(Class, ImmutableGraph, CharSequence, ProgressLogger)}
+ * invokes by reflection the methods above on the provided class.
+ *
+ * <P>The standard method to build a new immutable graph is creating a (possibly anonymous) class
+ * that extends this class, and save it using a concrete subclass (e.g., {@link it.unimi.dsi.big.webgraph.BVGraph}). See
+ * the source of {@link it.unimi.dsi.big.webgraph.Transform} for several examples.
+ *
+ * <H2>Properties Conventions</H2>
+ *
+ * <P>To provide a simple way to load an immutable graph without knowing in advance its class,
+ * the following convention may be followed: a graph with basename <var><code>name</code></var> may feature
+ * a Java property file <code><var>name</var>.properties</code> with a property <code>graphclass</code>
+ * containing the actual class of the graph. In this case, you can use the implementation of the load/store
+ * methods contained in this class, similarly to the standard Java serialisation scheme. {@link BVGraph}, for instance,
+ * follows this convention, but {@link ASCIIGraph} does not.
+ *
+ * <P>The reason why this convention is not enforced is that it is sometimes useful to write lightweight classes,
+ * mostly for debugging purposes, whose graph representation is entirely contained in a single file (e.g., {@link ASCIIGraph}),
+ * so that {@link #loadOnce(InputStream)} can be easily implemented.
+ *
+ * <H2>Facilities for loading an immutable graph</H2>
+ *
+ * <P>{@link ImmutableGraph} provides ready-made implementations of the load methods that work as follows: they
+ * open a property file with the given basename and look for the <code>graphclass</code> property; then, they simply
+ * delegate the actual load to the specified graph class by reflection.
+ *
+ * <h2>Thread-safety and flyweight copies</h2>
+ *
+ * <p>Implementations of this class need not be thread-safe. However, they implement the
+ * {@link FlyweightPrototype} pattern: the {@link #copy()} method is
+ * thread-safe and will return a lightweight copy of the graph&mdash;usually, all immutable
+ * data will be shared between copies. Concurrent access to different copies is safe.
+ *
+ * <p>Note that by contract {@link #copy()} is guaranteed to work only if {@link #randomAccess()}
+ * returns true.
+ */
+
+
+public abstract class ImmutableGraph implements FlyweightPrototype<ImmutableGraph> {
+ private final static Logger LOGGER = LoggerFactory.getLogger(ImmutableGraph.class);
+
+ public static final String GRAPHCLASS_PROPERTY_KEY = "graphclass";
+ /** The standard extension of property files. */
+ public static final String PROPERTIES_EXTENSION = ".properties";
+
+
+ /** A list of the methods that can be used to load a graph. They are used
+ * by {@link ImmutableGraph} and other classes to represent standard
+ * (i.e., random access), sequential, offline, read-once and memory-mapped graph loading. */
+
+ public static enum LoadMethod {
+ STANDARD,
+ @Deprecated
+ SEQUENTIAL,
+ OFFLINE,
+ ONCE,
+ MAPPED;
+
+ public String toMethod() {
+ switch(this) {
+ case STANDARD: return "load";
+ case SEQUENTIAL: return "loadSequential";
+ case OFFLINE: return "loadOffline";
+ case ONCE: return "loadOnce";
+ case MAPPED: return "loadMapped";
+ default: throw new AssertionError();
+ }
+ }
+ };
+
+ /** Returns the number of nodes of this graph.
+ *
+ * <p>Although this method is not optional, it is allowed to throw
+ * an {@link UnsupportedOperationException} if this graph has never been entirely
+ * traversed using a {@link #nodeIterator() node iterator}. This apparently bizarre
+ * behaviour is necessary to support implementations such as {@link ArcListASCIIGraph}, which
+ * do not know the actual number of nodes until a traversal has been completed.
+ *
+ * @return the number of nodes.
+ */
+ public abstract long numNodes();
+
+ /** A method returning the number of nodes as an integer, for easier backward compatibility.
+ *
+ * @return {@link #numNodes()}, if it does not exceed {@link Integer#MAX_VALUE}; otherwise,
+ * an exception will be thrown.
+ * @throws IllegalStateException if {@link #numNodes()} is larger than {@link Integer#MAX_VALUE}.
+ */
+ public int intNumNodes() {
+ final long numNodes = numNodes();
+ if (numNodes > Integer.MAX_VALUE) throw new IllegalStateException("This graph has more than Integer.MAX_VALUE nodes");
+ return (int)numNodes;
+ }
+
+ /** Returns the number of arcs of this graph (optional operation).
+ *
+ * @return the number of arcs.
+ */
+ public long numArcs() {
+ throw new UnsupportedOperationException();
+ }
+
+ /** Checks whether this graph provides random access to successor lists.
+ *
+ * @return true if this graph provides random access to successor lists.
+ */
+ public abstract boolean randomAccess();
+
+ /** Returns a symbolic basename for this graph (optional operation).
+ *
+ * <P>Implementors of this class may provide a basename (usually
+ * a pathname from which various files storing the graph are stemmed).
+ * This method is optional because it is sometimes meaningless (e.g.,
+ * for one-off anonymous classes).
+ *
+ * @return the basename.
+ */
+ public CharSequence basename() {
+ throw new UnsupportedOperationException();
+ }
+
+ /** Returns a lazy iterator over the successors of a given node. The iteration terminates
+ * when -1 is returned.
+ *
+ * <P>This implementation just wraps the array returned by {@link #successorBigArray(long)}. Subclasses
+ * are encouraged to override this implementation.
+ *
+ * <p>The semantics of this method has been significantly modified in WebGraph 2.0 to take advantage of the new,
+ * faster lazy architecture.
+ *
+ * @param x a node.
+ * @return a lazy iterator over the successors of the node.
+ */
+ public LazyLongIterator successors(final long x) {
+ /* If successorBigArray(x) succeeds, the outdegree is an integer. */
+ return LazyLongIterators.wrap(successorBigArray(x), outdegree(x));
+ }
+
+ /** Returns a reference to an array containing the successors of a given node.
+ *
+ * <P>The returned array may contain more entries than the outdegree of <code>x</code>.
+ * However, only those with indices from 0 (inclusive) to the outdegree of <code>x</code> (exclusive)
+ * contain valid data.
+ *
+ * <P>This implementation just unwraps the iterator returned by {@link #successors(long)}. Subclasses
+ * are encouraged to override this implementation.
+ *
+ * @param x a node.
+ * @return an array whose first elements are the successors of the node; the array must not
+ * be modified by the caller.
+ */
+ public long[][] successorBigArray(final long x) {
+ final long[][] successor = LongBigArrays.newBigArray(outdegree(x));
+ LazyLongIterators.unwrap(successors(x), successor);
+ return successor;
+ }
+
+ /** Returns the outdegree of a node.
+ *
+ * @param x a node.
+ * @throws IllegalStateException if called without offsets.
+ * @return the outdegree of the given node.
+ */
+ public abstract long outdegree(long x);
+
+ /** Returns a node iterator for scanning the graph sequentially, starting from the given node.
+ *
+ * <P>This implementation just calls the random-access methods ({@link #successors(long)} and
+ * {@link #outdegree(long)}). More specific implementations may choose to maintain some extra state
+ * to make the enumeration more efficient.
+ *
+ * @param from the node from which the iterator will iterate.
+ * @return a {@link NodeIterator} for accessing nodes and successors sequentially.
+ */
+ public NodeIterator nodeIterator(final long from) {
+ return new NodeIterator() {
+ long curr = from - 1;
+ final long n = numNodes();
+
+ @Override
+ public long nextLong() {
+ if (! hasNext()) throw new java.util.NoSuchElementException();
+ return ++curr;
+ }
+
+ @Override
+ public boolean hasNext() {
+ return (curr < n - 1);
+ }
+
+ @Override
+ public LazyLongIterator successors() {
+ if (curr == from - 1) throw new IllegalStateException();
+ return ImmutableGraph.this.successors(curr);
+ }
+
+ @Override
+ public long outdegree() {
+ if (curr == from - 1) throw new IllegalStateException();
+ return ImmutableGraph.this.outdegree(curr);
+ }
+
+ };
+ }
+
+ /** Returns a node iterator for scanning the graph sequentially, starting from the first node.
+ *
+ * @return a {@link NodeIterator} for accessing nodes and successors sequentially.
+ */
+ public NodeIterator nodeIterator() {
+ return nodeIterator(0);
+ }
+
+ /** Returns a flyweight copy of this immutable graph.
+ *
+ * @return a flyweight copy of this immutable graph.
+ * @throws UnsupportedOperationException if flyweight copies are not supported:
+ * support is guaranteed only if {@link #randomAccess()} returns true.
+ * @see FlyweightPrototype
+ */
+
+ @Override
+ public abstract ImmutableGraph copy();
+
+ /** Returns an iterator enumerating the outdegrees of the nodes of this graph.
+ *
+ * @return an iterator enumerating the outdegrees of the nodes of this graph.
+ */
+ public LongIterator outdegrees() {
+ return randomAccess() ?
+ new LongIterator() {
+ private final long n = numNodes();
+ private long next = 0;
+ @Override
+ public boolean hasNext() {
+ return next < n;
+ }
+ @Override
+ public long nextLong() {
+ if (! hasNext()) throw new NoSuchElementException();
+ return outdegree(next++);
+ }
+ } :
+ new LongIterator() {
+ private final NodeIterator nodeIterator = nodeIterator();
+ @Override
+ public boolean hasNext() {
+ return nodeIterator.hasNext();
+ }
+ @Override
+ public long nextLong() {
+ nodeIterator.nextLong();
+ return nodeIterator.outdegree();
+ }
+ };
+ }
+
+
+
+ @Override
+ public String toString() {
+ final StringBuilder s = new StringBuilder();
+
+ long numArcs = -1;
+ try {
+ numArcs = numArcs();
+ }
+ catch(UnsupportedOperationException ignore) {}
+
+ s.append("Nodes: " + numNodes() + "\nArcs: " + (numArcs == -1 ? "unknown" : Long.toString(numArcs)) + "\n");
+
+ final NodeIterator nodeIterator = nodeIterator();
+ LazyLongIterator successors;
+ long curr;
+ for (long i = numNodes(); i-- != 0;) {
+ curr = nodeIterator.nextLong();
+ s.append("Successors of " + curr + " (degree " + nodeIterator.outdegree() + "):");
+ successors = nodeIterator.successors();
+ long d = nodeIterator.outdegree();
+ while (d-- != 0) s.append(" " + successors.nextLong());
+ s.append('\n');
+ }
+ return s.toString();
+ }
+
+
+ /** Creates a new {@link ImmutableGraph} by loading a graph file from disk to memory, without
+ * offsets.
+ *
+ * <P>This method uses the properties convention described in the {@linkplain ImmutableGraph introduction}.
+ *
+ * @param basename the basename of the graph.
+ * @return an {@link ImmutableGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ * @deprecated Use {@link #loadOffline(CharSequence)} or {@link #loadMapped(CharSequence)} instead.
+ */
+ @Deprecated
+ public static ImmutableGraph loadSequential(CharSequence basename) throws IOException {
+ return load(LoadMethod.SEQUENTIAL, basename, null);
+ }
+
+ /** Creates a new {@link ImmutableGraph} by loading a graph file from disk to memory, without
+ * offsets.
+ *
+ * <P>This method uses the properties convention described in the {@linkplain ImmutableGraph introduction}.
+ *
+ * @param basename the basename of the graph.
+ * @param pl a progress logger used while loading the graph, or <code>null</code>.
+ * @return an {@link ImmutableGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ * @deprecated Use {@link #loadOffline(CharSequence, ProgressLogger)} or {@link #loadMapped(CharSequence, ProgressLogger)} instead.
+ */
+ @Deprecated
+ public static ImmutableGraph loadSequential(CharSequence basename, ProgressLogger pl) throws IOException {
+ return load(LoadMethod.SEQUENTIAL, basename, null, pl);
+ }
+
+ /** Creates a new {@link ImmutableGraph} by loading a graph file offline.
+ *
+ *
+ * <P>This method uses the properties convention described in the {@linkplain ImmutableGraph introduction}.
+ *
+ * @param basename the basename of the graph.
+ * @return an {@link ImmutableGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ */
+
+ public static ImmutableGraph loadOffline(CharSequence basename) throws IOException {
+ return load(LoadMethod.OFFLINE, basename, null);
+ }
+
+
+ /** Creates a new {@link ImmutableGraph} by loading a graph file offline.
+ *
+ * <P>This method uses the properties convention described in the {@linkplain ImmutableGraph introduction}.
+ *
+ * @param basename the basename of the graph.
+ * @param pl a progress logger used while loading the graph, or <code>null</code>.
+ * @return an {@link ImmutableGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ */
+
+ public static ImmutableGraph loadOffline(CharSequence basename, ProgressLogger pl) throws IOException {
+ return load(LoadMethod.OFFLINE, basename, null, pl);
+ }
+
+ /** Creates a new {@link ImmutableGraph} by memory-mapping a graph file.
+ *
+ * <P>This method uses the properties convention described in the {@linkplain ImmutableGraph introduction}.
+ *
+ * @param basename the basename of the graph.
+ * @param pl a progress logger used while loading the offsets, or <code>null</code>.
+ * @return an {@link ImmutableGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while memory-mapping the graph or reading the offsets.
+ */
+
+ public static ImmutableGraph loadMapped(CharSequence basename, ProgressLogger pl) throws IOException {
+ return load(LoadMethod.MAPPED, basename, null, pl);
+ }
+
+ /** Creates a new {@link ImmutableGraph} by memory-mapping a graph file.
+ *
+ * <P>This method uses the properties convention described in the {@linkplain ImmutableGraph introduction}.
+ *
+ * @param basename the basename of the graph.
+ * @return an {@link ImmutableGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while memory-mapping the graph or reading the offsets.
+ */
+
+ public static ImmutableGraph loadMapped(CharSequence basename) throws IOException {
+ return load(LoadMethod.MAPPED, basename, null);
+ }
+
+ /** Creates a new {@link ImmutableGraph} by loading a read-once graph from an input stream.
+ *
+ * <p>This implementation just throws an {@link UnsupportedOperationException}. There
+ * is no way to write a generic implementation, because there is no way to know
+ * in advance which class should read the graph.
+ *
+ * @param is an input stream containing the graph.
+ * @return an {@link ImmutableGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ * @throws UnsupportedOperationException if this graph class does not support read-once graphs.
+ */
+
+ public static ImmutableGraph loadOnce(final InputStream is) throws IOException {
+ throw new UnsupportedOperationException("This class does not support read-once loading");
+ }
+
+
+ /** Creates a new {@link ImmutableGraph} by loading a graph file from disk to memory, with
+ * all offsets, using no progress logger.
+ *
+ * <P>This method uses the properties convention described in the {@linkplain ImmutableGraph introduction}.
+ *
+ * @param basename the basename of the graph.
+ * @return an {@link ImmutableGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ */
+
+
+ public static ImmutableGraph load(CharSequence basename) throws IOException {
+ return load(LoadMethod.STANDARD, basename, null);
+ }
+
+ /** Creates a new {@link ImmutableGraph} by loading a graph file from disk to memory, with
+ * all offsets, using a progress logger.
+ *
+ * <P>This method uses the properties convention described in the {@linkplain ImmutableGraph introduction}.
+ *
+ * @param basename the basename of the graph.
+ * @param pl a progress logger used while loading the graph, or <code>null</code>.
+ * @return an {@link ImmutableGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ */
+
+ public static ImmutableGraph load(CharSequence basename, ProgressLogger pl) throws IOException {
+ return load(LoadMethod.STANDARD, basename, null, pl);
+ }
+
+ private static final ProgressLogger UNUSED = new ProgressLogger();
+
+ /** Creates a new {@link ImmutableGraph} using the given method and no progress logger.
+ *
+ * @param method the load method.
+ * @param basename the basename of the graph, if <code>method</code> is not {@link LoadMethod#ONCE}.
+ * @param is an input stream containing the graph, if <code>method</code> is {@link LoadMethod#ONCE}.
+ * @return an {@link ImmutableGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ */
+ private static ImmutableGraph load(LoadMethod method, CharSequence basename, InputStream is) throws IOException {
+ return load(method, basename, is, UNUSED);
+ }
+
+ /** Creates a new immutable graph by loading a graph file from disk to memory, delegating the
+ * actual loading to the class specified in the <code>graphclass</code> property within the property
+ * file (named <code><var>basename</var>.properties</code>). The exact load method to be used
+ * depends on the <code>method</code> argument.
+ *
+ * <P>This method uses the properties convention described in the {@linkplain ImmutableGraph introduction}.
+ *
+ * @param method the method to be used to load the graph.
+ * @param basename the basename of the graph, if <code>method</code> is not {@link LoadMethod#ONCE}.
+ * @param is an input stream containing the graph, if <code>method</code> is {@link LoadMethod#ONCE}.
+ * @param pl the progress logger; it can be <code>null</code>.
+ * @return an {@link ImmutableGraph} containing the specified graph.
+ * @throws IOException if an I/O exception occurs while reading the graph.
+ */
+ protected static ImmutableGraph load(LoadMethod method, CharSequence basename, InputStream is, ProgressLogger pl) throws IOException {
+ final Properties properties = new Properties();
+ String graphClassName;
+ try (final FileInputStream propertyFile = new FileInputStream(basename + PROPERTIES_EXTENSION)) {
+ properties.load(propertyFile);
+ }
+
+ if ((graphClassName = properties.getProperty(GRAPHCLASS_PROPERTY_KEY)) == null) throw new IOException("The property file for " + basename + " does not contain a graphclass property");
+
+ // Small kludge to fix old usage of toString() instead of getName().
+ if (graphClassName.startsWith("class ")) graphClassName = graphClassName.substring(6);
+
+ // Small kludge to try to load graphs created with the standard version.
+ if (graphClassName.startsWith("it.unimi.dsi.webgraph")) {
+ final String standardGraphClassName = graphClassName.replace("it.unimi.dsi.webgraph", "it.unimi.dsi.big.webgraph");
+ LOGGER.warn("Replacing class " + graphClassName + " with " + standardGraphClassName);
+ graphClassName = standardGraphClassName;
+ }
+
+ final Class<?> graphClass;
+
+ ImmutableGraph graph = null;
+
+ try {
+ graphClass = Class.forName(graphClassName);
+
+ if (method == LoadMethod.ONCE) graph = (ImmutableGraph)graphClass.getMethod(method.toMethod(), InputStream.class).invoke(null, is);
+ else {
+ if (pl == UNUSED) graph = (ImmutableGraph)graphClass.getMethod(method.toMethod(), CharSequence.class).invoke(null, basename);
+ else graph = (ImmutableGraph)graphClass.getMethod(method.toMethod(), CharSequence.class, ProgressLogger.class).invoke(null, basename, pl);
+ }
+ } catch (InvocationTargetException e) {
+ if (e.getCause() instanceof IOException) throw (IOException) e.getCause();
+ throw new RuntimeException(e);
+ } catch(Exception e) {
+ throw new RuntimeException(e);
+ }
+
+ return graph;
+ }
+
+
+ /** Stores an immutable graph using a specified subclass and a progress logger.
+ *
+ * <P>This method is a useful shorthand that invokes, by reflection, the <code>store</code> method of the given subclass.
+ * Note, however, that usually a subclass will provide more refined store methods with more parameters.
+ *
+ * @param graphClass the subclass of {@link ImmutableGraph} that should store the graph.
+ * @param graph the graph to store.
+ * @param basename the basename.
+ * @param pl a progress logger, or <code>null</code>.
+ */
+
+ public static void store(final Class<?> graphClass, final ImmutableGraph graph, final CharSequence basename, final ProgressLogger pl) throws IOException {
+ if (! ImmutableGraph.class.isAssignableFrom(graphClass)) throw new ClassCastException(graphClass.getName() + " is not a subclass of ImmutableGraph");
+ try {
+ if (pl == UNUSED) graphClass.getMethod("store", ImmutableGraph.class, CharSequence.class).invoke(null, graph, basename);
+ else graphClass.getMethod("store", ImmutableGraph.class, CharSequence.class, ProgressLogger.class).invoke(null, graph, basename, pl);
+ } catch (InvocationTargetException e) {
+ if (e.getCause() instanceof IOException) throw (IOException) e.getCause();
+ throw new RuntimeException(e);
+ } catch(Exception e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+ /** Stores an immutable graph using a specified subclass.
+ *
+ * @param graphClass the subclass of {@link ImmutableGraph} that should store the graph.
+ * @param graph the graph to store.
+ * @param basename the basename.
+ * @see #store(Class, ImmutableGraph, CharSequence, ProgressLogger)
+ */
+
+ public static void store(final Class<?> graphClass, final ImmutableGraph graph, final CharSequence basename) throws IOException {
+ store(graphClass, graph, basename, UNUSED);
+ }
+
+ /** Compares this immutable graph to another object.
+ *
+ * @return true iff the given object is an immutable graph of the same size, and
+ * the successor list of every node of this graph is equal to the successor list of the corresponding node of <code>o</code>.
+ */
+
+ @Override
+ public boolean equals(final Object o) {
+ if (! (o instanceof ImmutableGraph)) return false;
+ final ImmutableGraph g = (ImmutableGraph) o;
+ long n = numNodes();
+ if (n != g.numNodes()) return false;
+ final NodeIterator i = nodeIterator(), j = g.nodeIterator();
+ LazyLongIterator s, t;
+ long d;
+
+ while(n-- != 0) {
+ i.nextLong();
+ j.nextLong();
+ if ((d = i.outdegree())
+ != j.outdegree()) return false;
+ s = i.successors();
+ t = j.successors();
+ while(d-- != 0) if (s.nextLong() != t.nextLong()) return false;
+ }
+
+ return true;
+ }
+
+ /** Returns a hash code for this immutable graph.
+ *
+ * @return a hash code for this immutable graph.
+ */
+
+ @Override
+ public int hashCode() {
+ long n = numNodes();
+ long h = -1;
+ final NodeIterator i = nodeIterator();
+ LazyLongIterator s;
+
+ while(n-- != 0) {
+ h = h * 31 + i.nextLong();
+ s = i.successors();
+ long x;
+ while((x = s.nextLong()) != -1) h = h * 31 + x;
+ }
+
+ return (int)(h ^ h >>> 32);
+ }
+
+ private static final class ImmutableGraphAdapter extends ImmutableGraph {
+ private final it.unimi.dsi.webgraph.ImmutableGraph graph;
+
+ public ImmutableGraphAdapter(final it.unimi.dsi.webgraph.ImmutableGraph graph) {
+ this.graph = graph;
+ }
+
+ private final void ensureNode(final long x) {
+ if (x >= Integer.MAX_VALUE) throw new IllegalArgumentException(Long.toString(x));
+ }
+
+ @Override
+ public NodeIterator nodeIterator(final long from) {
+ ensureNode(from - 1);
+ return new NodeIterator() {
+ // This is necessary to work around graphs implementing just nodeIterator().
+ final it.unimi.dsi.webgraph.NodeIterator nodeIterator = from == 0 ? graph.nodeIterator() : graph.nodeIterator((int)from);
+
+ @Override
+ public long nextLong() {
+ return nodeIterator.nextInt();
+ }
+
+ @Override
+ public boolean hasNext() {
+ return nodeIterator.hasNext();
+ }
+
+ @Override
+ public long outdegree() {
+ return nodeIterator.outdegree();
+ }
+
+ @Override
+ public LazyLongIterator successors() {
+ return new AbstractLazyLongIterator() {
+ LazyIntIterator iterator = nodeIterator.successors();
+ @Override
+ public long nextLong() {
+ return iterator.nextInt();
+ }
+ };
+ }
+ };
+ }
+
+ @Override
+ public long numArcs() {
+ return graph.numArcs();
+ }
+
+ @Override
+ public long numNodes() {
+ return graph.numNodes();
+ }
+
+ @Override
+ public long outdegree(final long x) {
+ ensureNode(x);
+ return graph.outdegree((int)x);
+ }
+
+ @Override
+ public boolean randomAccess() {
+ return graph.randomAccess();
+ }
+
+ @Override
+ public LazyLongIterator successors(final long x) {
+ ensureNode(x);
+ return new AbstractLazyLongIterator() {
+ final LazyIntIterator iterator = graph.successors((int)x);
+ @Override
+ public long nextLong() {
+ return iterator.nextInt();
+ }
+ };
+ }
+
+ @Override
+ public CharSequence basename() {
+ return graph.basename();
+ }
+
+ @Override
+ public ImmutableGraph copy() {
+ return new ImmutableGraphAdapter(graph.copy());
+ }
+
+ @Override
+ public boolean equals(Object o) {
+ if (! (o instanceof ImmutableGraph)) return false;
+ return graph.equals(ImmutableGraph.wrap((ImmutableGraph)o));
+ }
+
+ @Override
+ public int hashCode() {
+ return graph.hashCode();
+ }
+
+ @Override
+ public String toString() {
+ return graph.toString();
+ }
+ }
+
+ public static ImmutableGraph wrap(final it.unimi.dsi.webgraph.ImmutableGraph graph) {
+ return new ImmutableGraphAdapter(graph);
+ }
+
+ private static final class BigImmutableGraphAdapter extends it.unimi.dsi.webgraph.ImmutableGraph {
+ private final ImmutableGraph graph;
+
+ public BigImmutableGraphAdapter(final ImmutableGraph graph) {
+ this.graph = graph;
+ }
+
+ private final int check(final long x) {
+ if (x > Integer.MAX_VALUE) throw new IllegalArgumentException(Long.toString(x));
+ return (int)x;
+ }
+
+ @Override
+ public it.unimi.dsi.webgraph.NodeIterator nodeIterator(final int from) {
+ return new it.unimi.dsi.webgraph.NodeIterator() {
+ // This is necessary to work around graphs implementing just nodeIterator().
+ NodeIterator nodeIterator = from == 0 ? graph.nodeIterator() : graph.nodeIterator(from);
+
+ @Override
+ public int nextInt() {
+ return check(nodeIterator.nextLong());
+ }
+
+ @Override
+ public boolean hasNext() {
+ return nodeIterator.hasNext();
+ }
+
+ @Override
+ public int outdegree() {
+ return check(nodeIterator.outdegree());
+ }
+
+ @Override
+ public LazyIntIterator successors() {
+ return new AbstractLazyIntIterator() {
+ final LazyLongIterator iterator = nodeIterator.successors();
+ @Override
+ public int nextInt() {
+ return check(iterator.nextLong());
+ }
+ };
+ }
+ };
+ }
+
+ @Override
+ public long numArcs() {
+ return graph.numArcs();
+ }
+
+ @Override
+ public int numNodes() {
+ return check(graph.numNodes());
+ }
+
+ @Override
+ public int outdegree(final int x) {
+ return check(graph.outdegree(x));
+ }
+
+ @Override
+ public boolean randomAccess() {
+ return graph.randomAccess();
+ }
+
+ @Override
+ public AbstractLazyIntIterator successors(final int x) {
+ return new AbstractLazyIntIterator() {
+ final LazyLongIterator iterator = graph.successors(x);
+ @Override
+ public int nextInt() {
+ return check(iterator.nextLong());
+ }
+ };
+ }
+
+ @Override
+ public CharSequence basename() {
+ return graph.basename();
+ }
+
+ @Override
+ public it.unimi.dsi.webgraph.ImmutableGraph copy() {
+ return new BigImmutableGraphAdapter(graph.copy());
+ }
+
+ @Override
+ public boolean equals(final Object o) {
+ if (! (o instanceof it.unimi.dsi.webgraph.ImmutableGraph)) return false;
+ return graph.equals(ImmutableGraph.wrap((it.unimi.dsi.webgraph.ImmutableGraph)o));
+ }
+
+ @Override
+ public int hashCode() {
+ return graph.hashCode();
+ }
+
+ @Override
+ public String toString() {
+ return graph.toString();
+ }
+ }
+
+ public static it.unimi.dsi.webgraph.ImmutableGraph wrap(ImmutableGraph graph) {
+ return new BigImmutableGraphAdapter(graph);
+ }
+}
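The reflective dispatch performed by `load(LoadMethod, CharSequence, InputStream, ProgressLogger)` above can be illustrated outside WebGraph. What follows is a minimal sketch, not WebGraph code. `LoadDispatchSketch` and its nested `DemoGraph` are hypothetical stand-ins, the property text is read from a string instead of `basename.properties`, and the rewrite from `it.unimi.dsi.webgraph` to `it.unimi.dsi.big.webgraph` is omitted. The sketch follows the same three steps as the method above:

1. Read the `graphclass` property.
2. Strip the legacy `class ` prefix.
3. Invoke a static method of the named class by reflection, rethrowing any `IOException` raised by the loader itself.

```java
import java.io.IOException;
import java.io.StringReader;
import java.lang.reflect.InvocationTargetException;
import java.util.Properties;

public class LoadDispatchSketch {
    /** Hypothetical stand-in for a concrete graph class exposing a static load method. */
    public static class DemoGraph {
        public final String basename;
        DemoGraph(final String basename) { this.basename = basename; }
        public static DemoGraph load(final CharSequence basename) { return new DemoGraph(basename.toString()); }
    }

    /** Resolves the class named by the "graphclass" property and calls its static method by reflection. */
    public static Object load(final String methodName, final CharSequence basename, final String propertyText) throws IOException {
        final Properties properties = new Properties();
        properties.load(new StringReader(propertyText));
        String graphClassName = properties.getProperty("graphclass");
        if (graphClassName == null) throw new IOException("No graphclass property for " + basename);
        // Legacy files may contain Class.toString() output ("class a.b.C") instead of getName().
        if (graphClassName.startsWith("class ")) graphClassName = graphClassName.substring(6);
        try {
            return Class.forName(graphClassName).getMethod(methodName, CharSequence.class).invoke(null, basename);
        } catch (InvocationTargetException e) {
            // Surface the loader's own I/O failures unchanged; wrap everything else.
            if (e.getCause() instanceof IOException) throw (IOException) e.getCause();
            throw new RuntimeException(e);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(final String[] args) throws IOException {
        final String props = "graphclass=class " + DemoGraph.class.getName() + "\n";
        final DemoGraph g = (DemoGraph) load("load", "example", props);
        System.out.println(g.basename); // prints "example"
    }
}
```

Callers of the real `load()` see the concrete loader's `IOException` rather than a reflection wrapper because the method rethrows the cause of the `InvocationTargetException`.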
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/ImmutableSequentialGraph.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/ImmutableSequentialGraph.java
new file mode 100644
index 0000000..c775b5c
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/ImmutableSequentialGraph.java
@@ -0,0 +1,44 @@
+package it.unimi.dsi.big.webgraph;
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+/** An abstract immutable graph that throws an {@link java.lang.UnsupportedOperationException}
+ * on all random-access methods.
+ *
+ * <p>The main purpose of this class is to be used as a base for the numerous anonymous
+ * classes that do not support random access.
+ */
+
+public abstract class ImmutableSequentialGraph extends ImmutableGraph {
+ /** Throws an {@link java.lang.UnsupportedOperationException}. */
+ @Override
+ public long[][] successorBigArray(final long x) { throw new UnsupportedOperationException(); }
+ /** Throws an {@link java.lang.UnsupportedOperationException}. */
+ @Override
+ public long outdegree(final long x) { throw new UnsupportedOperationException(); }
+ /** Returns false.
+ * @return false.
+ */
+ @Override
+ public boolean randomAccess() { return false; }
+
+ /** Throws an {@link UnsupportedOperationException}. */
+ @Override
+ public ImmutableGraph copy() { throw new UnsupportedOperationException(); }
+}
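`ImmutableSequentialGraph` lets a subclass implement only sequential access, while callers check `randomAccess()` before using random-access methods. The sketch below reproduces that division of labour with a hypothetical, much smaller API (`Graph`, `SequentialGraph`, `scanOutdegrees`), not the real WebGraph interfaces. An anonymous subclass supplies only a sequential scan, and a client falls back to that scan when random access is unavailable.

```java
import java.util.function.LongConsumer;

public class SequentialGraphSketch {
    /** Hypothetical minimal graph API mirroring the split between random and sequential access. */
    public abstract static class Graph {
        public abstract long numNodes();
        public abstract long outdegree(long x);
        public abstract boolean randomAccess();
        /** Sequential scan: reports each node's outdegree in node order. */
        public abstract void scanOutdegrees(LongConsumer sink);
    }

    /** Analogue of ImmutableSequentialGraph: random-access methods throw, randomAccess() is false. */
    public abstract static class SequentialGraph extends Graph {
        @Override public long outdegree(final long x) { throw new UnsupportedOperationException(); }
        @Override public boolean randomAccess() { return false; }
    }

    /** An anonymous sequential graph: the directed path 0 -> 1 -> ... -> n-1. */
    public static Graph path(final long n) {
        return new SequentialGraph() {
            @Override public long numNodes() { return n; }
            @Override public void scanOutdegrees(final LongConsumer sink) {
                for (long x = 0; x < n; x++) sink.accept(x < n - 1 ? 1 : 0);
            }
        };
    }

    /** Counts arcs, using random access only when the graph advertises it. */
    public static long numArcs(final Graph g) {
        if (g.randomAccess()) {
            long arcs = 0;
            for (long x = 0; x < g.numNodes(); x++) arcs += g.outdegree(x);
            return arcs;
        }
        final long[] arcs = { 0 };
        g.scanOutdegrees(d -> arcs[0] += d);
        return arcs[0];
    }

    public static void main(final String[] args) {
        System.out.println(numArcs(path(5))); // prints 4
    }
}
```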
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/IncrementalImmutableSequentialGraph.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/IncrementalImmutableSequentialGraph.java
new file mode 100644
index 0000000..fe89b8f
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/IncrementalImmutableSequentialGraph.java
@@ -0,0 +1,166 @@
+package it.unimi.dsi.big.webgraph;
+
+/*
+ * Copyright (C) 2013-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.longs.LongBigArrays;
+
+import java.util.NoSuchElementException;
+import java.util.concurrent.ArrayBlockingQueue;
+
+
+/** An adapter exposing an {@link ImmutableGraph} that can be filled incrementally using
+ * a family of {@linkplain #add(long[][], long, long) addition methods} that make it possible to specify
+ * the list of successors of each node in increasing order. At the end of the process, the user
+ * must add the special marker list {@link #END_OF_GRAPH}.
+ *
+ * <p>The class supports a single
+ * call to {@link #nodeIterator()}: once the returned {@link NodeIterator} has been exhausted, {@link #numNodes()} will return the number of nodes,
+ * which is equal to the number of calls to addition methods, excluding the one that adds {@link #END_OF_GRAPH}.
+ *
+ * <p>The class works using a producer/consumer pattern: in a typical usage, the thread invoking the
+ * addition methods will be different from the thread performing the traversal, as in
+ * <pre class="code">
+ * final IncrementalImmutableSequentialGraph g = new IncrementalImmutableSequentialGraph();
+ * ExecutorService executor = Executors.newSingleThreadExecutor();
+ * final Future&lt;Void&gt; future = executor.submit(new Callable&lt;Void&gt;() {
+ * public Void call() throws IOException {
+ * BVGraph.store(g, basename);
+ * return null;
+ * }
+ * });
+ *
+ * // Do one add() for each node, to specify the successors
+ *
+ * g.add(IncrementalImmutableSequentialGraph.END_OF_GRAPH);
+ * future.get();
+ * executor.shutdown();
+ * </pre>
+ */
+
+public class IncrementalImmutableSequentialGraph extends ImmutableSequentialGraph {
+ /** A marker for the end of the graph. */
+ public static final long[][] END_OF_GRAPH = new long[0][0];
+
+ /** The number of nodes (known after a traversal). */
+ private long n;
+ /** The queue connecting the addition methods and the successor methods of the node iterator. */
+ private final ArrayBlockingQueue<long[][]> successorQueue;
+
+ public IncrementalImmutableSequentialGraph() {
+ n = -1;
+ this.successorQueue = new ArrayBlockingQueue<>(100);
+ }
+
+ @Override
+ public long numNodes() {
+ if (n == -1) throw new UnsupportedOperationException("The number of nodes is unknown (you need to complete a traversal)");
+ return n;
+ }
+
+ @Override
+ public NodeIterator nodeIterator() {
+ if (n != -1) throw new IllegalStateException();
+ return new NodeIterator() {
+ long i = 0;
+ private long[][] currentSuccessor;
+ private long[][] nextSuccessor;
+ @Override
+ public boolean hasNext() {
+ if (nextSuccessor == END_OF_GRAPH) return false;
+ if (nextSuccessor != null) return true;
+
+ try {
+ nextSuccessor = successorQueue.take();
+ }
+ catch (InterruptedException e) {
+ throw new RuntimeException(e.getMessage(), e);
+ }
+
+ final boolean end = nextSuccessor == END_OF_GRAPH;
+ if (end) n = i;
+ return ! end;
+ }
+
+ @Override
+ public long nextLong() {
+ if (! hasNext()) throw new NoSuchElementException();
+ currentSuccessor = nextSuccessor;
+ nextSuccessor = null;
+ return i++;
+ }
+
+ @Override
+ public long outdegree() {
+ if (currentSuccessor == null) throw new IllegalStateException();
+ return LongBigArrays.length(currentSuccessor);
+ }
+
+ @Override
+ public long[][] successorBigArray() {
+ if (currentSuccessor == null) throw new IllegalStateException();
+ return currentSuccessor;
+ }
+ };
+ }
+
+ /** Adds a new node whose successors are contained in the specified big array fragment.
+ *
+ * <p>The fragment must be sorted in increasing order.
+ *
+ * @param successor a big array.
+ * @param offset the first valid entry in <code>successor</code>.
+ * @param length the number of valid entries.
+ */
+ public void add(final long[][] successor, final long offset, final long length) throws InterruptedException {
+ successorQueue.put(LongBigArrays.copy(successor, offset, length));
+ }
+
+ /** Adds a new node whose successors are contained in the specified big array.
+ *
+ * <p>The array must be sorted in increasing order.
+ *
+ * @param successor a big array.
+ */
+ public void add(final long[][] successor) throws InterruptedException {
+ successorQueue.put(successor);
+ }
+
+ /** Adds a new node whose successors are contained in the specified array fragment.
+ *
+ * <p>The fragment must be sorted in increasing order.
+ *
+ * @param successor an array.
+ * @param offset the first valid entry in <code>successor</code>.
+ * @param length the number of valid entries.
+ */
+ public void add(final long[] successor, final int offset, final int length) throws InterruptedException {
+ add(LongBigArrays.wrap(successor), offset, length);
+ }
+
+ /** Adds a new node whose successors are contained in the specified array.
+ *
+ * <p>The array must be sorted in increasing order.
+ *
+ *
+ * @param successor an array.
+ */
+ public void add(final long[] successor) throws InterruptedException {
+ add(LongBigArrays.wrap(successor));
+ }
+}
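The producer/consumer hand-off in `IncrementalImmutableSequentialGraph` rests on an `ArrayBlockingQueue` and an end marker recognized by identity. The following simplified sketch uses plain `long[]` lists instead of big arrays. The class and method names (`IncrementalQueueSketch`, `end()`, `drain()`, `run()`) are hypothetical. A consumer thread, standing in for a method such as `BVGraph.store()`, drains the queue while the main thread adds one successor list per node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class IncrementalQueueSketch {
    /** End-of-graph marker, compared by identity (==), never by content. */
    static final long[] END = new long[0];

    private final ArrayBlockingQueue<long[]> queue = new ArrayBlockingQueue<>(100);

    /** Producer side: one call per node, successors in increasing order. */
    public void add(final long[] successors) throws InterruptedException {
        queue.put(successors.clone()); // defensive copy: a fresh array is never identical to END
    }

    /** Producer side: signals that no more nodes will be added. */
    public void end() throws InterruptedException { queue.put(END); }

    /** Consumer side: blocks on the queue until END arrives, returning the lists in node order. */
    public List<long[]> drain() throws InterruptedException {
        final List<long[]> lists = new ArrayList<>();
        for (long[] s; (s = queue.take()) != END; ) lists.add(s);
        return lists;
    }

    public static List<long[]> run(final long[][] adjacency) throws Exception {
        final IncrementalQueueSketch g = new IncrementalQueueSketch();
        final ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // The consumer runs on its own thread, as the store method would.
            final Future<List<long[]>> future = executor.submit(g::drain);
            for (final long[] successors : adjacency) g.add(successors);
            g.end();
            return future.get();
        } finally {
            executor.shutdown();
        }
    }

    public static void main(final String[] args) throws Exception {
        final List<long[]> lists = run(new long[][] { { 1, 2 }, {}, { 0 } });
        System.out.println(lists.size()); // prints 3
    }
}
```

The class's own `add(long[][])` enqueues the caller's array as is. The sketch's `add()` instead copies its argument, so an empty successor list can never be mistaken for the marker. This works only because the check is `!=` on references. Comparing contents would make every empty list look like the end of the graph.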
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/LazyIntIterator.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/LazyIntIterator.java
new file mode 100644
index 0000000..7123ab2
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/LazyIntIterator.java
@@ -0,0 +1,45 @@
+package it.unimi.dsi.big.webgraph;
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+/** A lazy iterator over the integers.
+ *
+ * <p>An instance of this class represents a (skippable) iterator over the integers.
+ * The iterator is exhausted when an implementation-dependent special marker is
+ * returned. Since a single call both checks for and returns the next element, this fully lazy
+ * architecture halves the number of method calls with respect to Java's eager iterators.
+ */
+
+public interface LazyIntIterator {
+ /** Returns the next integer of this iterator, or the special
+ * marker if this iterator is exhausted.
+ *
+ * @return the next integer of this iterator, or the special
+ * marker if this iterator is exhausted.
+ */
+ public int nextInt();
+
+ /** Skips a given number of elements.
+ *
+ * @param n the number of elements to skip.
+ * @return the number of elements actually skipped (which might
+ * be less than <code>n</code> if this iterator is exhausted).
+ */
+ public int skip(int n);
+}
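The lazy protocol described for `LazyIntIterator` folds `hasNext()` and `next()` into one call. Below is a minimal sketch of that protocol. It assumes `-1` as the end marker, the value used by `LazyIntIterators` in this same diff. It also defines a hypothetical local interface, `LazyInts`, rather than using the real one, so that it runs without WebGraph on the classpath.

```java
public class LazyIterationSketch {
    /** Local mirror of the LazyIntIterator contract; -1 is assumed as the end marker. */
    public interface LazyInts {
        int nextInt();
        int skip(int n);
    }

    /** A lazy iterator over an array, analogous to LazyIntIterators.wrap(int[]). */
    public static LazyInts over(final int[] a) {
        return new LazyInts() {
            private int pos;
            @Override public int nextInt() { return pos == a.length ? -1 : a[pos++]; }
            @Override public int skip(final int n) {
                // Skip at most the remaining elements and report how many were skipped.
                final int toSkip = Math.min(n, a.length - pos);
                pos += toSkip;
                return toSkip;
            }
        };
    }

    /** One call per element: the returned value doubles as the hasNext() test. */
    public static long sum(final LazyInts it) {
        long sum = 0;
        for (int x; (x = it.nextInt()) != -1; ) sum += x;
        return sum;
    }

    public static void main(final String[] args) {
        final LazyInts it = over(new int[] { 3, 5, 8, 13 });
        System.out.println(it.skip(1)); // prints 1
        System.out.println(sum(it));    // prints 26
    }
}
```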
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/LazyIntIterators.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/LazyIntIterators.java
new file mode 100644
index 0000000..97807d9
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/LazyIntIterators.java
@@ -0,0 +1,277 @@
+package it.unimi.dsi.big.webgraph;
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import it.unimi.dsi.fastutil.ints.IntArrays;
+import it.unimi.dsi.fastutil.ints.IntIterator;
+
+import java.util.NoSuchElementException;
+
+/** A class providing static methods and objects that do useful
+ * things with {@linkplain LazyIntIterator lazy integer iterators}. */
+
+public class LazyIntIterators {
+
+ protected LazyIntIterators() {}
+
+ /** An empty lazy iterator. */
+ public final static LazyIntIterator EMPTY_ITERATOR = new LazyIntIterator() {
+ @Override
+ public int nextInt() { return -1; }
+ @Override
+ public int skip(final int n) { return 0; }
+ };
+
+ /** Unwraps the elements returned by a lazy iterator into an array.
+ *
+ * @param lazyIntIterator a lazy integer iterator.
+ * @param array an array.
+ * @return the number of elements unwrapped into <code>array</code> starting from index 0.
+ */
+ public static int unwrap(final LazyIntIterator lazyIntIterator, final int array[]) {
+ int j, t, l = array.length;
+ for(j = 0; j < l && (t = lazyIntIterator.nextInt()) != -1; j++) array[j] = t;
+ return j;
+ }
+
+ /** Unwraps the elements returned by a lazy iterator into an array fragment.
+ *
+ * @param lazyIntIterator a lazy integer iterator.
+ * @param array an array.
+ * @param offset the index of the first element of <code>array</code> to be used.
+ * @param length the maximum number of elements to be unwrapped.
+ * @return the number of elements unwrapped into <code>array</code> starting from index <code>offset</code>.
+ */
+ public static int unwrap(final LazyIntIterator lazyIntIterator, final int array[], final int offset, final int length) {
+ int j, t, l = Math.min(length, array.length - offset);
+ for(j = 0; j < l && (t = lazyIntIterator.nextInt()) != -1; j++) array[offset + j] = t;
+ return j;
+ }
+
+ /** Unwraps the elements returned by a lazy iterator into a new array.
+ *
+ * <p>If you need the resulting array to contain the
+ * elements returned by <code>lazyIntIterator</code>, but additional trailing zeroes
+ * would cause no harm, consider using {@link #unwrapLoosely(LazyIntIterator)}, which
+ * avoids the final call to {@link IntArrays#trim(int[], int)} (and the array copy it usually entails).
+ *
+ * @param lazyIntIterator a lazy integer iterator.
+ * @return an array containing the elements returned by <code>lazyIntIterator</code>.
+ * @see #unwrapLoosely(LazyIntIterator)
+ */
+ public static int[] unwrap(final LazyIntIterator lazyIntIterator) {
+ int array[] = new int[16];
+ int j = 0, t;
+
+ while((t = lazyIntIterator.nextInt()) != -1) {
+ if (j == array.length) array = IntArrays.grow(array, j + 1);
+ array[j++] = t;
+ }
+
+ return IntArrays.trim(array, j);
+ }
+
+ /** Unwraps the elements returned by a lazy iterator into a new array that can contain additional entries set to zero.
+ *
+ * <p>If you need the resulting array to contain <em>exactly</em> the
+ * elements returned by <code>lazyIntIterator</code>, use {@link #unwrap(LazyIntIterator)} instead;
+ * this method avoids the final call to {@link IntArrays#trim(int[], int)}.
+ *
+ * @param lazyIntIterator a lazy integer iterator.
+ * @return an array containing the elements returned by <code>lazyIntIterator</code>; note
+ * that in general it might contain some trailing zeroes beyond the elements returned by <code>lazyIntIterator</code>,
+ * so the number of elements actually written into the returned array must be known externally.
+ * @see #unwrap(LazyIntIterator)
+ */
+ public static int[] unwrapLoosely(final LazyIntIterator lazyIntIterator) {
+ int array[] = new int[16];
+ int j = 0, t;
+
+ while((t = lazyIntIterator.nextInt()) != -1) {
+ if (j == array.length) array = IntArrays.grow(array, j + 1);
+ array[j++] = t;
+ }
+
+ return array;
+ }
+
+ /** A lazy iterator returning the elements of a given array. */
+
+ private static final class ArrayLazyIntIterator implements LazyIntIterator {
+ /** The underlying array. */
+ private final int[] a;
+ /** The number of valid elements in {@link #a}, starting from 0. */
+ private final int length;
+ /** The next element of {@link #a} that will be returned. */
+ private int pos;
+
+ public ArrayLazyIntIterator(final int a[], final int length) {
+ this.a = a;
+ this.length = length;
+ }
+
+ @Override
+ public int nextInt() {
+ if (pos == length) return -1;
+ return a[pos++];
+ }
+
+ @Override
+ public int skip(final int n) {
+ final int toSkip = Math.min(n, length - pos);
+ pos += toSkip;
+ return toSkip;
+ }
+ }
+
+ /** Returns a lazy integer iterator enumerating the given number of elements of an array.
+ *
+ * @param array an array.
+ * @param length the number of elements to enumerate.
+ * @return a lazy integer iterator enumerating the first <code>length</code> elements of <code>array</code>.
+ */
+
+ public static LazyIntIterator wrap(final int array[], final int length) {
+ if (length == 0) return EMPTY_ITERATOR;
+ return new ArrayLazyIntIterator(array, length);
+ }
+
+
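+ /** An adapter casting to integers the elements returned by a lazy long iterator. */
+ private static final class LazyLongIteratorLazyIntIterator implements LazyIntIterator {
+ /** The underlying lazy long iterator. */
+ private final LazyLongIterator iterator;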
+
+ private LazyLongIteratorLazyIntIterator(final LazyLongIterator iterator) {
+ this.iterator = iterator;
+ }
+
+ @Override
+ public int nextInt() {
+ return (int)iterator.nextLong();
+ }
+
+ @Override
+ public int skip(int n) {
+ return (int)iterator.skip(n);
+ }
+ }
+
+ /** Returns a lazy integer iterator enumerating the elements of a given {@link LazyLongIterator}, cast to integers.
+ *
+ * <p>Note that the cast truncates elements greater than {@link Integer#MAX_VALUE}.
+ *
+ * @param iterator a lazy long iterator.
+ * @return a lazy integer iterator enumerating the elements of <code>iterator</code> cast to integers.
+ */
+
+ public static LazyIntIterator wrap(final LazyLongIterator iterator) {
+ return new LazyLongIteratorLazyIntIterator(iterator);
+ }
+
+ /** Returns a lazy integer iterator enumerating the elements of an array.
+ *
+ * @param array an array.
+ * @return a lazy integer iterator enumerating the elements of <code>array</code>.
+ */
+
+ public static LazyIntIterator wrap(final int array[]) {
+ return wrap(array, array.length);
+ }
+
+ /** An adapter from lazy to eager iteration. */
+ private static final class LazyToEagerIntIterator implements IntIterator {
+ /** The underlying lazy iterator. */
+ private final LazyIntIterator lazyIntIterator;
+ /** Whether this iterator has been already advanced, that is, whether {@link #next} is valid. */
+ private boolean advanced;
+ /** The next value to be returned, if {@link #advanced} is true. */
+ private int next;
+
+ public LazyToEagerIntIterator(final LazyIntIterator lazyIntIterator) {
+ this.lazyIntIterator = lazyIntIterator;
+ }
+
+ @Override
+ public boolean hasNext() {
+ if (! advanced) {
+ advanced = true;
+ next = lazyIntIterator.nextInt();
+ }
+ return next != -1;
+ }
+
+ @Override
+ public int nextInt() {
+ if (! hasNext()) throw new NoSuchElementException();
+ advanced = false;
+ return next;
+ }
+
+ @Override
+ public int skip(final int n) {
+ if (n == 0) return 0;
+ final int increment = advanced && next != -1 ? 1 : 0; // a buffered end marker is not an element
+ advanced = false;
+ return lazyIntIterator.skip(n - increment) + increment;
+ }
+ }
+
+ /** Returns an eager {@link IntIterator} enumerating the same elements as
+ * a given lazy integer iterator.
+ *
+ * @param lazyIntIterator a lazy integer iterator.
+ * @return an eager {@link IntIterator} enumerating the same elements as
+ * <code>lazyIntIterator</code>.
+ */
+
+ public static IntIterator eager(final LazyIntIterator lazyIntIterator) {
+ return new LazyToEagerIntIterator(lazyIntIterator);
+ }
+
+
+ private static final class EagerToLazyIntIterator implements LazyIntIterator {
+ private final IntIterator underlying;
+
+
+ public EagerToLazyIntIterator(final IntIterator underlying) {
+ this.underlying = underlying;
+ }
+
+ @Override
+ public int nextInt() {
+ return underlying.hasNext() ? underlying.nextInt() : -1;
+ }
+
+ @Override
+ public int skip(final int n) {
+ return underlying.skip(n);
+ }
+
+ }
+
+ /** Returns a {@link LazyIntIterator} enumerating the same elements as
+ * a given eager integer iterator.
+ *
+ * @param eagerIntIterator an eager integer iterator.
+ * @return a lazy integer iterator enumerating the same elements as
+ * <code>eagerIntIterator</code>.
+ */
+
+ public static LazyIntIterator lazy(final IntIterator eagerIntIterator) {
+ return new EagerToLazyIntIterator(eagerIntIterator);
+ }
+}
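The lazy-to-eager adapter (`LazyToEagerIntIterator`) buffers one element in `hasNext()`, so its `skip()` must account for that buffered element. The following is a minimal, self-contained sketch of the same idea, not webgraph code: `LazyInt` is a local stand-in for `LazyIntIterator`, and the adapter implements `java.util.PrimitiveIterator.OfInt` instead of fastutil's `IntIterator`. Both substitutions are assumptions made so the example compiles without webgraph or fastutil. The sketch counts a buffered element as skipped but does not count a buffered end marker (-1).

```java
import java.util.NoSuchElementException;
import java.util.PrimitiveIterator;

public class LazyToEagerSketch {
    /** Local stand-in for LazyIntIterator: nextInt() returns -1 once exhausted. */
    interface LazyInt {
        int nextInt();
        int skip(int n);
    }

    /** Lazy view of an array, in the spirit of LazyIntIterators.wrap(int[]). */
    static LazyInt wrap(final int[] a) {
        return new LazyInt() {
            private int pos;

            @Override public int nextInt() { return pos == a.length ? -1 : a[pos++]; }

            @Override public int skip(final int n) {
                final int toSkip = Math.min(n, a.length - pos);
                pos += toSkip;
                return toSkip;
            }
        };
    }

    /** Eager adapter: hasNext() pre-fetches one element and buffers it in next. */
    static final class Eager implements PrimitiveIterator.OfInt {
        private final LazyInt lazy;
        private boolean advanced; // true iff next holds a pre-fetched value (possibly -1)
        private int next;

        Eager(final LazyInt lazy) { this.lazy = lazy; }

        @Override public boolean hasNext() {
            if (!advanced) {
                advanced = true;
                next = lazy.nextInt();
            }
            return next != -1;
        }

        @Override public int nextInt() {
            if (!hasNext()) throw new NoSuchElementException();
            advanced = false;
            return next;
        }

        /** Skips up to n elements; a buffered element counts as skipped, a buffered -1 does not. */
        int skip(final int n) {
            if (n == 0) return 0;
            final int increment = advanced && next != -1 ? 1 : 0;
            advanced = false;
            return lazy.skip(n - increment) + increment;
        }
    }

    public static void main(final String[] args) {
        final Eager it = new Eager(wrap(new int[] { 3, 5, 8, 13 }));
        it.hasNext();                     // buffers 3
        System.out.println(it.skip(2));   // skips the buffered 3 and then 5: prints 2
        System.out.println(it.nextInt()); // prints 8
    }
}
```

Buffering in `hasNext()` keeps `nextInt()` cheap, at the cost of the small bookkeeping shown in `skip()`.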
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/LazyLongIterator.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/LazyLongIterator.java
new file mode 100644
index 0000000..17901be
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/LazyLongIterator.java
@@ -0,0 +1,45 @@
+package it.unimi.dsi.big.webgraph;
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+/** A lazy iterator over longs.
+ *
+ * <p>An instance of this class represents a (skippable) iterator over longs.
+ * The iterator is exhausted when an implementation-dependent special marker is
+ * returned. This fully lazy architecture halves the number of method
+ * calls with respect to Java's eager iterators, which need both <code>hasNext()</code> and <code>next()</code> for each element.
+ */
+
+public interface LazyLongIterator {
+ /** The next long returned by this iterator, or the special
+ * marker if this iterator is exhausted.
+ *
+ * @return next long returned by this iterator, or the special
+ * marker if this iterator is exhausted.
+ */
+ public long nextLong();
+
+ /** Skips a given number of elements.
+ *
+ * @param n the number of elements to skip.
+ * @return the number of elements actually skipped (which might
+ * be less than <code>n</code> if this iterator is exhausted).
+ */
+ public long skip(long n);
+}
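The single-call protocol can be illustrated with a tiny implementation. This is a hypothetical sketch, not part of webgraph: `LazyLong` mirrors the two methods of `LazyLongIterator`, `Range` is an invented example class, and -1 is assumed as the end marker (the convention used by `LazyLongIterators`). The consuming loop in `sum()` makes one `nextLong()` call per element, where an eager iterator would need both `hasNext()` and `nextLong()`.

```java
public class LazyLongRangeSketch {
    /** Local stand-in for LazyLongIterator; this sketch uses -1 as the end marker. */
    interface LazyLong {
        long nextLong();
        long skip(long n);
    }

    /** Enumerates from, from + 1, ..., to - 1 (assumes 0 <= from <= to). */
    static final class Range implements LazyLong {
        private long curr;
        private final long to;

        Range(final long from, final long to) {
            this.curr = from;
            this.to = to;
        }

        @Override public long nextLong() { return curr < to ? curr++ : -1; }

        @Override public long skip(final long n) {
            final long toSkip = Math.min(n, to - curr); // fewer than n if exhausted
            curr += toSkip;
            return toSkip;
        }
    }

    /** Sums all remaining elements with one call per element. */
    static long sum(final LazyLong it) {
        long s = 0, x;
        while ((x = it.nextLong()) != -1) s += x;
        return s;
    }

    public static void main(final String[] args) {
        final Range r = new Range(10, 20);
        System.out.println(r.skip(5)); // prints 5 (skips 10..14)
        System.out.println(sum(r));    // prints 85 (15 + 16 + 17 + 18 + 19)
    }
}
```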
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/LazyLongIterators.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/LazyLongIterators.java
new file mode 100644
index 0000000..a816507
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/LazyLongIterators.java
@@ -0,0 +1,343 @@
+package it.unimi.dsi.big.webgraph;
+
+import java.util.NoSuchElementException;
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import it.unimi.dsi.fastutil.ints.IntArrays;
+import it.unimi.dsi.fastutil.ints.IntIterator;
+import it.unimi.dsi.fastutil.longs.LongBigArrays;
+import it.unimi.dsi.fastutil.longs.LongIterator;
+
+/** A class providing static methods and objects that do useful
+ * things with {@linkplain LazyLongIterator lazy long iterators}. */
+
+public class LazyLongIterators {
+
+ protected LazyLongIterators() {}
+
+ /** An empty lazy iterator. */
+ public final static LazyLongIterator EMPTY_ITERATOR = new LazyLongIterator() {
+ @Override
+ public long nextLong() { return -1; }
+ @Override
+ public long skip(final long n) { return 0; }
+ };
+
+ /** Unwraps the elements returned by a lazy iterator into an array.
+ *
+ * @param lazyLongIterator a lazy long iterator.
+ * @param array an array.
+ * @return the number of elements unwrapped into <code>array</code> starting from index 0.
+ */
+ public static int unwrap(final LazyLongIterator lazyLongIterator, final long array[]) {
+ int j;
+ long t;
+ final int l = array.length;
+ for(j = 0; j < l && (t = lazyLongIterator.nextLong()) != -1; j++) array[j] = t;
+ return j;
+ }
+
+ /** Unwraps the elements returned by a lazy iterator into an array fragment.
+ *
+ * @param lazyLongIterator a lazy long iterator.
+ * @param array an array.
+ * @param offset the index of the first element of <code>array</code> to be used.
+ * @param length the maximum number of elements to be unwrapped.
+ * @return the number of elements unwrapped into <code>array</code> starting from index <code>offset</code>.
+ */
+ public static int unwrap(final LazyLongIterator lazyLongIterator, final long array[], final int offset, final int length) {
+ int j;
+ long t;
+ final int l = Math.min(length, array.length - offset);
+ for(j = 0; j < l && (t = lazyLongIterator.nextLong()) != -1; j++) array[offset + j] = t;
+ return j;
+ }
+
+ /** Unwraps the elements returned by a lazy iterator into a big array.
+ *
+ * @param lazyLongIterator a lazy long iterator.
+ * @param array a big array.
+ * @return the number of elements unwrapped into <code>array</code> starting from index 0.
+ */
+ public static long unwrap(final LazyLongIterator lazyLongIterator, final long array[][]) {
+ long j, t;
+ final long l = LongBigArrays.length(array);
+ for(j = 0; j < l && (t = lazyLongIterator.nextLong()) != -1; j++) LongBigArrays.set(array, j, t);
+ return j;
+ }
+
+ /** Unwraps the elements returned by a lazy iterator into a big array fragment.
+ *
+ * @param lazyLongIterator a lazy long iterator.
+ * @param array a big array.
+ * @param offset the index of the first element of <code>array</code> to be used.
+ * @param length the maximum number of elements to be unwrapped.
+ * @return the number of elements unwrapped into <code>array</code> starting from index <code>offset</code>.
+ */
+ public static long unwrap(final LazyLongIterator lazyLongIterator, final long array[][], final long offset, final long length) {
+ long j, t;
+ final long l = Math.min(length, LongBigArrays.length(array) - offset);
+ for(j = 0; j < l && (t = lazyLongIterator.nextLong()) != -1; j++) LongBigArrays.set(array, offset + j, t);
+ return j;
+ }
+
+ /** Unwraps the elements returned by a lazy iterator into a new big array.
+ *
+ * <p>If you need the resulting big array to contain the
+ * elements returned by <code>lazyLongIterator</code>, but some more elements set to zero
+ * would cause no harm, consider using {@link #unwrapLoosely(LazyLongIterator)}, which
+ * usually avoids a final call to {@link LongBigArrays#trim(long[][], long)}.
+ *
+ * @param lazyLongIterator a lazy long iterator.
+ * @return a big array containing the elements returned by <code>lazyLongIterator</code>.
+ * @see #unwrapLoosely(LazyLongIterator)
+ */
+ public static long[][] unwrap(final LazyLongIterator lazyLongIterator) {
+ long array[][] = LongBigArrays.newBigArray(16);
+ long j = 0; // big arrays are indexed by longs
+ long t;
+
+ while((t = lazyLongIterator.nextLong()) != -1) {
+ if (j == LongBigArrays.length(array)) array = LongBigArrays.grow(array, j + 1);
+ LongBigArrays.set(array, j++, t);
+ }
+
+ return LongBigArrays.trim(array, j);
+ }
+
+ /** Unwraps the elements returned by a lazy iterator into a new big array that can contain additional entries set to zero.
+ *
+ * <p>If you need the resulting big array to contain <em>exactly</em> the
+ * elements returned by <code>lazyLongIterator</code>, consider using {@link #unwrap(LazyLongIterator)}, but this
+ * method avoids a final call to {@link LongBigArrays#trim(long[][], long)}.
+ *
+ * @param lazyLongIterator a lazy long iterator.
+ * @return a big array containing the elements returned by <code>lazyLongIterator</code>; note
+ * that in general it might contain some final zeroes beyond the elements returned by <code>lazyLongIterator</code>,
+ * so the number of elements actually written into the returned big array must be known externally.
+ * @see #unwrap(LazyLongIterator)
+ */
+ public static long[][] unwrapLoosely(final LazyLongIterator lazyLongIterator) {
+ long array[][] = LongBigArrays.newBigArray(16);
+ long j = 0; // big arrays are indexed by longs
+ long t;
+
+ while((t = lazyLongIterator.nextLong()) != -1) {
+ if (j == LongBigArrays.length(array)) array = LongBigArrays.grow(array, j + 1);
+ LongBigArrays.set(array, j++, t);
+ }
+
+ return array;
+ }
+
+ /** A lazy iterator returning the elements of a given array. */
+
+ private static final class ArrayLazyLongIterator implements LazyLongIterator {
+ /** The underlying array. */
+ private final long[] a;
+ /** The number of valid elements in {@link #a}, starting from 0. */
+ private final int length;
+ /** The next element of {@link #a} that will be returned. */
+ private int pos;
+
+ public ArrayLazyLongIterator(final long a[], final int length) {
+ this.a = a;
+ this.length = length;
+ }
+
+ @Override
+ public long nextLong() {
+ if (pos == length) return -1;
+ return a[pos++];
+ }
+
+ @Override
+ public long skip(final long n) {
+ final long toSkip = Math.min(n, length - pos);
+ pos += toSkip;
+ return toSkip;
+ }
+ }
+
+ /** A lazy iterator returning the elements of a given big array. */
+
+ private static final class BigArrayLazyLongIterator implements LazyLongIterator {
+ /** The underlying array. */
+ private final long[][] a;
+ /** The number of valid elements in {@link #a}, starting from 0. */
+ private final long length;
+ /** The next element of {@link #a} that will be returned. */
+ private long pos;
+
+ public BigArrayLazyLongIterator(final long a[][], final long length) {
+ this.a = a;
+ this.length = length;
+ }
+
+ @Override
+ public long nextLong() {
+ if (pos == length) return -1;
+ return LongBigArrays.get(a, pos++);
+ }
+
+ @Override
+ public long skip(final long n) {
+ final long toSkip = Math.min(n, length - pos);
+ pos += toSkip;
+ return toSkip;
+ }
+ }
+
+ /** Returns a lazy long iterator enumerating the given number of elements of an array.
+ *
+ * @param array an array.
+ * @param length the number of elements to enumerate.
+ * @return a lazy long iterator enumerating the first <code>length</code> elements of <code>array</code>.
+ */
+
+ public static LazyLongIterator wrap(final long array[], final int length) {
+ if (length == 0) return EMPTY_ITERATOR;
+ return new ArrayLazyLongIterator(array, length);
+ }
+
+ /** Returns a lazy long iterator enumerating the given number of elements of a big array.
+ *
+ * @param array a big array.
+ * @param length the number of elements to enumerate.
+ * @return a lazy long iterator enumerating the first <code>length</code> elements of <code>array</code>.
+ */
+
+ public static LazyLongIterator wrap(final long array[][], final long length) {
+ if (length == 0) return EMPTY_ITERATOR;
+ return new BigArrayLazyLongIterator(array, length);
+ }
+
+ /** Returns a lazy long iterator enumerating the elements of an array.
+ *
+ * @param array an array.
+ * @return a lazy long iterator enumerating the elements of <code>array</code>.
+ */
+
+ public static LazyLongIterator wrap(final long array[]) {
+ return wrap(array, array.length);
+ }
+
+ /** Returns a lazy long iterator enumerating the elements of a big array.
+ *
+ * @param array a big array.
+ * @return a lazy long iterator enumerating the elements of <code>array</code>.
+ */
+
+ public static LazyLongIterator wrap(final long array[][]) {
+ return wrap(array, LongBigArrays.length(array));
+ }
+
+ /** An adapter from lazy to eager iteration. */
+ private static final class LazyToEagerLongIterator implements LongIterator {
+ /** The underlying lazy iterator. */
+ private final LazyLongIterator lazyLongIterator;
+ /** Whether this iterator has been already advanced, that is, whether {@link #next} is valid. */
+ private boolean advanced;
+ /** The next value to be returned, if {@link #advanced} is true. */
+ private long next;
+
+ public LazyToEagerLongIterator(final LazyLongIterator lazyLongIterator) {
+ this.lazyLongIterator = lazyLongIterator;
+ }
+
+ @Override
+ public boolean hasNext() {
+ if (! advanced) {
+ advanced = true;
+ next = lazyLongIterator.nextLong();
+ }
+ return next != -1;
+ }
+
+ @Override
+ public long nextLong() {
+ if (! hasNext()) throw new NoSuchElementException();
+ advanced = false;
+ return next;
+ }
+
+ @Override
+ public int skip(final int n) {
+ if (n == 0) return 0;
+ final int increment = advanced && next != -1 ? 1 : 0; // a buffered end marker is not an element
+ advanced = false;
+ return (int)(lazyLongIterator.skip(n - increment) + increment);
+ }
+ }
+
+ /** Returns an eager {@link LongIterator} enumerating the same elements as
+ * a given lazy long iterator.
+ *
+ * @param lazyLongIterator a lazy long iterator.
+ * @return an eager {@link LongIterator} enumerating the same elements as
+ * <code>lazyLongIterator</code>.
+ */
+
+ public static LongIterator eager(final LazyLongIterator lazyLongIterator) {
+ return new LazyToEagerLongIterator(lazyLongIterator);
+ }
+
+
+ private static final class EagerToLazyLongIterator implements LazyLongIterator {
+ private final LongIterator underlying;
+
+
+ public EagerToLazyLongIterator(final LongIterator underlying) {
+ this.underlying = underlying;
+ }
+
+ @Override
+ public long nextLong() {
+ return underlying.hasNext() ? underlying.nextLong() : -1;
+ }
+
+ @Override
+ public long skip(long n) {
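+ long t = 0;
+
+ // LongIterator.skip() accepts an int, so skip in chunks of at most 2^30 elements
+ while(n > 0) {
+ final int actual = (int)Math.min(n, 1 << 30);
+ final int skipped = underlying.skip(actual);
+ t += skipped;
+ if (skipped < actual) break; // the underlying iterator is exhausted
+ n -= actual;
+ }
+
+ return t;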
+ }
+
+ }
+
+ /** Returns a {@link LazyLongIterator} enumerating the same elements as
+ * a given eager long iterator.
+ *
+ * @param eagerLongIterator an eager long iterator.
+ * @return a lazy long iterator enumerating the same elements as
+ * <code>eagerLongIterator</code>.
+ */
+
+ public static LazyLongIterator lazy(final LongIterator eagerLongIterator) {
+ return new EagerToLazyLongIterator(eagerLongIterator);
+ }
+}
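The big-array overloads address elements with a `long` index split into a segment number and an offset within the segment. The sketch below re-creates that two-level layout with plain Java arrays and a hypothetical 4-element segment size, so the split is easy to follow by hand. Real implementations such as fastutil's `LongBigArrays` use far larger segments. The helpers `set`, `get`, `length` and `wrap` are stand-ins written for this example, not fastutil's API. The unwrap loop keeps its counter a `long`, since a big array may hold more than `Integer.MAX_VALUE` elements.

```java
public class BigArrayUnwrapSketch {
    /** Local stand-in for LazyLongIterator (without skip); -1 marks the end. */
    interface LazyLong {
        long nextLong();
    }

    // Hypothetical tiny segment size (4 elements) so the two-level layout is visible.
    static final int SEGMENT_SHIFT = 2;
    static final long SEGMENT_MASK = (1L << SEGMENT_SHIFT) - 1;

    static void set(final long[][] a, final long index, final long value) {
        a[(int)(index >>> SEGMENT_SHIFT)][(int)(index & SEGMENT_MASK)] = value;
    }

    static long get(final long[][] a, final long index) {
        return a[(int)(index >>> SEGMENT_SHIFT)][(int)(index & SEGMENT_MASK)];
    }

    /** Total length, assuming every segment but the last is full. */
    static long length(final long[][] a) {
        return a.length == 0 ? 0 : ((long)(a.length - 1) << SEGMENT_SHIFT) + a[a.length - 1].length;
    }

    static LazyLong wrap(final long[] values) {
        return new LazyLong() {
            private int pos;

            @Override public long nextLong() { return pos == values.length ? -1 : values[pos++]; }
        };
    }

    /** Copies elements until the big array is full or -1 is returned; the index is a long. */
    static long unwrap(final LazyLong it, final long[][] array) {
        final long l = length(array);
        long j, t;
        for (j = 0; j < l && (t = it.nextLong()) != -1; j++) set(array, j, t);
        return j;
    }

    public static void main(final String[] args) {
        final long[][] big = { new long[4], new long[4], new long[2] }; // length 10
        final long n = unwrap(wrap(new long[] { 5, 6, 7, 8, 9, 10 }), big);
        System.out.println(n);           // prints 6
        System.out.println(get(big, 5)); // prints 10 (segment 1, offset 1)
    }
}
```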
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/LazyLongSkippableIterator.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/LazyLongSkippableIterator.java
new file mode 100644
index 0000000..69c85b3
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/LazyLongSkippableIterator.java
@@ -0,0 +1,46 @@
+package it.unimi.dsi.big.webgraph;
+
+/*
+ * Copyright (C) 2013-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+/** A skippable {@linkplain LazyLongIterator lazy iterator over longs}.
+ *
+ * <p>An instance of this class represents an iterator over longs
+ * that returns elements in increasing order. The iterator makes it possible to {@linkplain #skipTo(long) skip elements
+ * by <em>value</em>}.
+ */
+
+public interface LazyLongSkippableIterator extends LazyLongIterator {
+ public static final long END_OF_LIST = Long.MAX_VALUE;
+
+ /** Skips to a given element.
+ *
+ * <p>Note that this interface is <em>fragile</em>: after {@link #END_OF_LIST}
+ * has been returned, the behavior of further calls to this method will be
+ * unpredictable.
+ *
+ * @param lowerBound a lower bound to the returned element.
+ * @return if the last returned element is greater than or equal to
+ * {@code lowerBound}, the last returned element; otherwise,
+ * the smallest element greater
+ * than or equal to <code>lowerBound</code> that would be
+ * returned by this iterator, or {@link #END_OF_LIST}
+ * if no such element exists.
+ */
+ public long skipTo(long lowerBound);
+}
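The contract of `skipTo()` can be illustrated on a sorted array. The following is a hypothetical sketch, not a webgraph class. It assumes strictly increasing, nonnegative elements and uses `java.util.Arrays.binarySearch` to find the first candidate. As an additional assumption of the sketch, `nextLong()` also returns `END_OF_LIST` when the iterator is exhausted.

```java
import java.util.Arrays;

public class SkipToSketch {
    static final long END_OF_LIST = Long.MAX_VALUE;

    /** Skippable iterator over a strictly increasing array of nonnegative longs (assumption). */
    static final class SortedArrayIterator {
        private final long[] a;
        private int pos;        // index of the next element to return
        private long last = -1; // last returned element; -1 means none yet

        SortedArrayIterator(final long[] a) { this.a = a; }

        /** Returns the next element, or END_OF_LIST when exhausted (an assumption of this sketch). */
        long nextLong() {
            return last = pos == a.length ? END_OF_LIST : a[pos++];
        }

        /** Returns the last element if it is >= lowerBound, else the first later element >= lowerBound. */
        long skipTo(final long lowerBound) {
            if (last >= lowerBound) return last;
            int i = Arrays.binarySearch(a, pos, a.length, lowerBound);
            if (i < 0) i = -i - 1; // insertion point: first element >= lowerBound
            if (i == a.length) {
                pos = a.length;
                return last = END_OF_LIST;
            }
            pos = i + 1;
            return last = a[i];
        }
    }

    public static void main(final String[] args) {
        final SortedArrayIterator it = new SortedArrayIterator(new long[] { 2, 5, 9, 14 });
        System.out.println(it.skipTo(4));  // prints 5
        System.out.println(it.skipTo(3));  // prints 5 again: the last returned element is >= 3
        System.out.println(it.nextLong()); // prints 9
        System.out.println(it.skipTo(20) == END_OF_LIST); // prints true
    }
}
```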
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/LongIntervalSequenceIterator.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/LongIntervalSequenceIterator.java
new file mode 100644
index 0000000..50d5c5a
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/LongIntervalSequenceIterator.java
@@ -0,0 +1,98 @@
+package it.unimi.dsi.big.webgraph;
+
+/*
+ * Copyright (C) 2003-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+/** An iterator returning the integers contained in a sequence of intervals. */
+public class LongIntervalSequenceIterator implements LazyLongIterator {
+
+ /** The left extremes. */
+ private final long left[];
+ /** The lengths. */
+ private final long len[];
+ /** The number of remaining intervals (including the current one). It is zero exactly when the iterator is exhausted. */
+ private long remaining;
+ /** The index of the current interval. */
+ private int currInterval;
+ /** The current position in the current interval: the next integer to be output is {@link #currLeft} + {@link #currIndex}. */
+ private long currIndex;
+ /** The left point of the current interval. */
+ private long currLeft;
+
+ /** Creates a new interval-sequence iterator by specifying
+ * arrays of left extremes and lengths. Note that the two arrays are <em>not</em> copied,
+ * so they must not be modified during the iteration.
+ *
+ * @param left an array containing the left extremes of the intervals generating this iterator.
+ * @param len an array (of the same length as <code>left</code>) containing the number of integers (greater than zero) in each interval.
+ */
+
+ public LongIntervalSequenceIterator(final long left[], final long len[]) {
+ this(left, len, left.length);
+ }
+
+ /** Creates a new interval-sequence iterator by specifying
+ * arrays of left extremes and lengths, and the number of valid entries. Note that the two arrays are <em>not</em> copied,
+ * so they must not be modified during the iteration.
+ *
+ * @param left an array containing the left extremes of the intervals generating this iterator.
+ * @param len an array (of the same length as <code>left</code>) containing the number of integers (greater than zero) in each interval.
+ * @param n the number of valid entries in <code>left</code> and <code>len</code>.
+ */
+
+ public LongIntervalSequenceIterator(final long left[], final long len[], final int n) {
+ this.left = left;
+ this.len = len;
+ this.remaining = n;
+ if (n != 0) currLeft = left[0];
+ }
+
+ private void advance() {
+ remaining--;
+ if (remaining != 0) currLeft = left[++currInterval];
+ currIndex = 0;
+ }
+
+ @Override
+ public long nextLong() {
+ if (remaining == 0) return -1;
+
+ final long next = currLeft + currIndex++;
+ if (currIndex == len[currInterval]) advance();
+ return next;
+ }
+
+ @Override
+ public long skip(final long n) {
+ long skipped = 0;
+
+ while(skipped < n && remaining != 0) {
+ if (n - skipped < len[currInterval] - currIndex) {
+ currIndex += (n - skipped);
+ return n;
+ }
+ else {
+ skipped += len[currInterval] - currIndex;
+ advance();
+ }
+ }
+
+ return skipped;
+ }
+}
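The interval arithmetic is easiest to check on a small input. The following is a compact, hypothetical re-implementation for illustration rather than the class above. It tracks the position inside the current interval as a `long` (interval lengths are `long`s) and assumes every length is positive, as the constructor documentation requires. With left extremes 3, 10 and lengths 2, 3, the sequence is 3, 4, 10, 11, 12.

```java
public class IntervalSequenceSketch {
    private final long[] left, len; // left extremes and (positive) lengths
    private final int n;            // number of valid intervals
    private int interval;           // index of the current interval
    private long offset;            // position inside the current interval

    IntervalSequenceSketch(final long[] left, final long[] len, final int n) {
        this.left = left;
        this.len = len;
        this.n = n;
    }

    /** Returns left[interval] + offset, or -1 once all intervals are consumed. */
    long nextLong() {
        if (interval == n) return -1;
        final long next = left[interval] + offset++;
        if (offset == len[interval]) {
            interval++;
            offset = 0;
        }
        return next;
    }

    /** Skips k elements, jumping over whole intervals when possible. */
    long skip(final long k) {
        long skipped = 0;
        while (skipped < k && interval < n) {
            final long available = len[interval] - offset;
            if (k - skipped < available) {
                offset += k - skipped;
                return k;
            }
            skipped += available;
            interval++;
            offset = 0;
        }
        return skipped;
    }

    public static void main(final String[] args) {
        // Intervals [3, 5) and [10, 13): the sequence is 3, 4, 10, 11, 12.
        final IntervalSequenceSketch it = new IntervalSequenceSketch(new long[] { 3, 10 }, new long[] { 2, 3 }, 2);
        System.out.println(it.nextLong()); // prints 3
        System.out.println(it.skip(2));    // prints 2 (skips 4 and 10)
        System.out.println(it.nextLong()); // prints 11
    }
}
```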
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/MaskedLongIterator.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/MaskedLongIterator.java
new file mode 100644
index 0000000..47c3bbe
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/MaskedLongIterator.java
@@ -0,0 +1,132 @@
+package it.unimi.dsi.big.webgraph;
+
+/*
+ * Copyright (C) 2003-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+/** An iterator returning the elements of an underlying iterator, filtered
+ * using an inclusion-exclusion block list.
+ *
+ * <p>A <em>mask</em> is an array of integers. The sum of the values contained in the mask
+ * must not exceed the number of elements returned by the underlying iterator. Moreover, all integers in the mask
+ * must be positive, except possibly for the first one, which may be zero.
+ *
+ * <P>Mask values are interpreted as specifying inclusion-exclusion blocks.
+ * Suppose that the underlying iterator returns <var>N</var> values, and that the mask is
+ * <var>n</var><sub>0</sub>, <var>n</var><sub>1</sub>, &hellip;, <var>n</var><sub>k</sub>.
+ * Then, the first <var>n</var><sub>0</sub> values returned by the underlying iterator must be kept,
+ * the next <var>n</var><sub>1</sub> values must be ignored, the next <var>n</var><sub>2</sub> must be
+ * kept and so on. The last <var>N</var>&minus;(<var>n</var><sub>0</sub>+&hellip;+<var>n</var><sub>k</sub>) values
+ * must be kept if <var>k</var> is odd, and must be ignored otherwise.
+ * An instance of this class will return the kept values only, in increasing order.
+ */
+
+public class MaskedLongIterator implements LazyLongIterator {
+ private final static boolean ASSERTS = false;
+
+ /** The underlying iterator. */
+ private final LazyLongIterator underlying;
+ /** The mask. */
+ private final long mask[];
+ /** The number of valid entries in {@link #mask}. */
+ private final int maskLen;
+ /** The index in {@link #mask} of the next block to be read, which is always an exclusion block. */
+ private int currMask;
+ /** How many integers are left in the current inclusion block. If <code>0</code> everything left must be discarded; if
+ * <code>-1</code> all remaining values must be kept. */
+ private long left;
+
+ /** Creates a new masked iterator using a given mask and underlying iterator.
+ *
+ * @param mask a mask, or <code>null</code>, meaning an empty mask (everything is copied).
+ * @param underlying an underlying iterator.
+ */
+
+
+ public MaskedLongIterator(final long mask[], final LazyLongIterator underlying) {
+ this(mask, mask == null ? 0 : mask.length, underlying);
+ }
+
+ /** Creates a new masked iterator using a given mask, mask length and underlying iterator.
+ *
+ * @param mask a mask, or <code>null</code>, meaning an empty mask (everything is copied).
+ * @param maskLen an explicit mask length.
+ * @param underlying an underlying iterator.
+ */
+ public MaskedLongIterator(final long mask[], final int maskLen, final LazyLongIterator underlying) {
+
+ this.mask = mask;
+ this.maskLen = maskLen;
+ this.underlying = underlying;
+
+ if (maskLen != 0) {
+ left = mask[currMask++];
+ advance();
+ }
+ else left = -1;
+ }
+
+ @Override
+ public long nextLong() {
+ if (left == 0) return -1;
+ final long next = underlying.nextLong();
+
+ if (left == -1 || next == -1) return next;
+ if (left > 0) {
+ left--;
+ advance();
+ }
+ return next;
+ }
+
+ private void advance() {
+ if (ASSERTS) assert left != -1;
+ if (left == 0 && currMask < maskLen) {
+ underlying.skip(mask[currMask++]);
+ if (currMask < maskLen) left = mask[currMask++];
+ else left = -1;
+ }
+ }
+
+ @Override
+ public long skip(final long n) {
+ long skipped = 0;
+
+ while(skipped < n && left != 0) {
+ if (left == -1) {
+ final long result = underlying.skip(n - skipped);
+ skipped += result;
+ if (skipped < n) break;
+ }
+ else {
+ if (n - skipped < left) {
+ underlying.skip(n - skipped);
+ left -= (n - skipped);
+ return n;
+ }
+ else {
+ underlying.skip(left);
+ skipped += left;
+ left = 0;
+ advance();
+ }
+ }
+ }
+
+ return skipped;
+ }
+}
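The inclusion-exclusion rule is easier to check on a concrete mask. The sketch below is an eager reference written for illustration. The name `applyMask` is hypothetical and not part of webgraph, and unlike `MaskedLongIterator` it materializes the kept values instead of skipping lazily. For instance, with underlying values 10 to 14 and mask 1, 2, it keeps 10, drops 11 and 12, and keeps the tail 13, 14 because <var>k</var> = 1 is odd.

```java
import java.util.Arrays;

public class MaskSketch {
    /**
     * Eager reference for the inclusion-exclusion rule: blocks alternate keep/drop starting
     * with "keep"; values past the last block are kept iff the mask has an even number of entries.
     */
    static long[] applyMask(final long[] mask, final long[] values) {
        final long[] out = new long[values.length];
        int o = 0, i = 0;
        boolean keep = true; // block 0 is an inclusion block
        for (final long blockLength : mask) {
            for (long b = 0; b < blockLength && i < values.length; b++, i++) {
                if (keep) out[o++] = values[i];
            }
            keep = !keep;
        }
        for (; i < values.length; i++) if (keep) out[o++] = values[i]; // the tail block
        return Arrays.copyOf(out, o);
    }

    public static void main(final String[] args) {
        final long[] values = { 10, 11, 12, 13, 14 };
        System.out.println(Arrays.toString(applyMask(new long[] { 1, 2 }, values)));    // [10, 13, 14]
        System.out.println(Arrays.toString(applyMask(new long[] { 0, 2, 1 }, values))); // [12]
    }
}
```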
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/MergedLongIterator.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/MergedLongIterator.java
new file mode 100644
index 0000000..0cb2523
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/MergedLongIterator.java
@@ -0,0 +1,113 @@
+package it.unimi.dsi.big.webgraph;
+
+/*
+ * Copyright (C) 2003-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+/** An iterator returning the union of the longs returned by two {@link LazyLongIterator}s.
+ * The two iterators must return longs in increasing order; the resulting
+ * {@link MergedLongIterator} will do the same. An element returned by both iterators is returned once.
+ */
+
+public class MergedLongIterator implements LazyLongIterator {
+ /** The first component iterator. */
+ private final LazyLongIterator it0;
+ /** The second component iterator. */
+ private final LazyLongIterator it1;
+ /** The maximum number of integers still to be returned. */
+ private long n;
+ /** The last integer returned by {@link #it0}. */
+ private long curr0;
+ /** The last integer returned by {@link #it1}. */
+ private long curr1;
+
+ /** Creates a new merged iterator by merging two given iterators.
+ *
+ * @param it0 the first (monotonically nondecreasing) component iterator.
+ * @param it1 the second (monotonically nondecreasing) component iterator.
+ */
+ public MergedLongIterator(final LazyLongIterator it0, final LazyLongIterator it1) {
+ this (it0, it1, Long.MAX_VALUE);
+ }
+
+ /** Creates a new merged iterator by merging two given iterators; the resulting iterator will not emit more than <code>n</code> integers.
+ *
+ * @param it0 the first (monotonically nondecreasing) component iterator.
+ * @param it1 the second (monotonically nondecreasing) component iterator.
+ * @param n the maximum number of integers this merged iterator will return.
+ */
+ public MergedLongIterator(final LazyLongIterator it0, final LazyLongIterator it1, final long n) {
+ this.it0 = it0;
+ this.it1 = it1;
+ this.n = n;
+ curr0 = it0.nextLong();
+ curr1 = it1.nextLong();
+ }
+
+ @Override
+ public long nextLong() {
+ if (n == 0 || curr0 == -1 && curr1 == -1) return -1;
+ n--;
+
+ final long result;
+
+ if (curr0 == -1) {
+ result = curr1;
+ curr1 = it1.nextLong();
+ }
+ else if (curr1 == -1) {
+ result = curr0;
+ curr0 = it0.nextLong();
+ }
+ else if (curr0 < curr1) {
+ result = curr0;
+ curr0 = it0.nextLong();
+ }
+ else if (curr0 > curr1) {
+ result = curr1;
+ curr1 = it1.nextLong();
+ }
+ else {
+ result = curr0;
+ curr0 = it0.nextLong();
+ curr1 = it1.nextLong();
+ }
+
+ return result;
+ }
+
+ @Override
+ public long skip(final long s) {
+ long i;
+ for(i = 0; i < s; i++) {
+ if (n == 0 || curr0 == -1 && curr1 == -1) break;
+ n--;
+
+ if (curr0 == -1) curr1 = it1.nextLong();
+ else if (curr1 == -1) curr0 = it0.nextLong();
+ else if (curr0 < curr1) curr0 = it0.nextLong();
+ else if (curr0 > curr1) curr1 = it1.nextLong();
+ else {
+ curr0 = it0.nextLong();
+ curr1 = it1.nextLong();
+ }
+ }
+ return i;
+ }
+}
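The merge step can be traced on two small inputs. The following hypothetical sketch restates the same three-way comparison in a self-contained form. `LazyLong`, `wrap`, `merge` and `toArray` are local names introduced for illustration, not webgraph API. Like the class above, it removes duplicates only across the two inputs, so each input is assumed to be strictly increasing. `Long.MAX_VALUE` is passed as the cap when no limit is wanted.

```java
import java.util.Arrays;

public class MergeSketch {
    /** Local stand-in for LazyLongIterator (without skip); -1 marks the end. */
    interface LazyLong {
        long nextLong();
    }

    static LazyLong wrap(final long[] a) {
        return new LazyLong() {
            private int pos;

            @Override public long nextLong() { return pos == a.length ? -1 : a[pos++]; }
        };
    }

    /** Lazy union of two increasing iterators, emitting at most n values. */
    static LazyLong merge(final LazyLong it0, final LazyLong it1, final long n) {
        return new LazyLong() {
            private long curr0 = it0.nextLong(), curr1 = it1.nextLong(), remaining = n;

            @Override public long nextLong() {
                if (remaining == 0 || curr0 == -1 && curr1 == -1) return -1;
                remaining--;
                final long result;
                if (curr1 == -1 || curr0 != -1 && curr0 < curr1) { // only it0 left, or it0 is smaller
                    result = curr0;
                    curr0 = it0.nextLong();
                } else if (curr0 == -1 || curr1 < curr0) {       // only it1 left, or it1 is smaller
                    result = curr1;
                    curr1 = it1.nextLong();
                } else {                                         // equal heads: emit once, advance both
                    result = curr0;
                    curr0 = it0.nextLong();
                    curr1 = it1.nextLong();
                }
                return result;
            }
        };
    }

    static long[] toArray(final LazyLong it) {
        long[] buf = new long[8];
        int k = 0;
        long x;
        while ((x = it.nextLong()) != -1) {
            if (k == buf.length) buf = Arrays.copyOf(buf, 2 * k);
            buf[k++] = x;
        }
        return Arrays.copyOf(buf, k);
    }

    public static void main(final String[] args) {
        final long[] a = { 1, 3, 5 }, b = { 2, 3, 6 };
        System.out.println(Arrays.toString(toArray(merge(wrap(a), wrap(b), Long.MAX_VALUE)))); // [1, 2, 3, 5, 6]
        System.out.println(Arrays.toString(toArray(merge(wrap(a), wrap(b), 3))));              // [1, 2, 3]
    }
}
```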
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/NodeIterator.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/NodeIterator.java
new file mode 100644
index 0000000..e3eb763
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/NodeIterator.java
@@ -0,0 +1,49 @@
+package it.unimi.dsi.big.webgraph;
+
+import it.unimi.dsi.fastutil.longs.LongBigArrays;
+import it.unimi.dsi.fastutil.longs.LongIterator;
+
+/** This interface extends {@link LongIterator} and is used to scan a graph, that is, to read its nodes and their successor lists
+ * sequentially. The {@link #nextLong()} method returns the node that will be scanned. After a call to this method, calling
+ * {@link #successors()} or {@link #successorBigArray()} will return the list of successors.
+ *
+ * <p>Implementing subclasses can override either {@link #successors()} or
+ * {@link #successorBigArray()}, but at least one of them <strong>must</strong> be implemented.
+ */
+
+public abstract class NodeIterator implements LongIterator {
+
+ /** Returns the outdegree of the current node.
+ *
+ * @return the outdegree of the current node.
+ */
+ public abstract long outdegree();
+
+ /** Returns a lazy iterator over the successors of the current node. The iteration terminates
+ * when -1 is returned.
+ *
+ * <P>This implementation just wraps the array returned by {@link #successorBigArray()}.
+ *
+ * @return a lazy iterator over the successors of the current node.
+ */
+ public LazyLongIterator successors() {
+ return LazyLongIterators.wrap(successorBigArray(), outdegree());
+ }
+
+ /** Returns a reference to an array containing the successors of the current node.
+ *
+ * <P>The returned array may contain more entries than the outdegree of the current node.
+ * However, only those with indices from 0 (inclusive) to the outdegree of the current node (exclusive)
+ * contain valid data.
+ *
+ * <P>This implementation just unwraps the iterator returned by {@link #successors()}.
+ *
+ * @return an array whose first elements are the successors of the current node; the array must not
+ * be modified by the caller.
+ */
+ public long[][] successorBigArray() {
+ final long[][] successor = LongBigArrays.newBigArray(outdegree());
+ LazyLongIterators.unwrap(successors(), successor);
+ return successor;
+ }
+}
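In `NodeIterator`, `successors()` and `successorBigArray()` are each implemented in terms of the other. This is why the class comment requires subclasses to override at least one of them: if neither is overridden, the two defaults recurse until the stack overflows. The sketch below illustrates the pattern in plain Java, with two simplifications. Ordinary `long[]` arrays stand in for fastutil big arrays (`long[][]`). In addition, `LazySuccessors`, `SimpleNodeIterator`, `successorArray`, `arrayBacked`, `lazyBacked` and `drain` are hypothetical names, not WebGraph API.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the mutual defaults in NodeIterator: successors() wraps
 * successorArray(), and successorArray() unwraps successors(). A subclass must override
 * at least one of them. Plain long[] arrays stand in for fastutil big arrays.
 */
public class NodeIteratorSketch {
    /** Stand-in for LazyLongIterator: -1 signals exhaustion. */
    interface LazySuccessors {
        long nextLong();
    }

    abstract static class SimpleNodeIterator {
        abstract long outdegree();

        /** Default: lazily yields the first outdegree() entries of successorArray(). */
        LazySuccessors successors() {
            final long[] a = successorArray();
            final long d = outdegree();
            return new LazySuccessors() {
                private int pos = 0;

                @Override
                public long nextLong() {
                    return pos < d ? a[pos++] : -1;
                }
            };
        }

        /** Default: copies successors() into a fresh array of length outdegree(). */
        long[] successorArray() {
            final long[] a = new long[(int) outdegree()];
            final LazySuccessors it = successors();
            for (int i = 0; i < a.length; i++) a[i] = it.nextLong();
            return a;
        }
    }

    /** Overrides only the array form; the array may be longer than the outdegree. */
    static SimpleNodeIterator arrayBacked(final long[] succ, final int degree) {
        return new SimpleNodeIterator() {
            @Override long outdegree() { return degree; }
            @Override long[] successorArray() { return succ; }
        };
    }

    /** Overrides only the lazy form. */
    static SimpleNodeIterator lazyBacked(final long[] succ) {
        return new SimpleNodeIterator() {
            @Override long outdegree() { return succ.length; }
            @Override LazySuccessors successors() {
                return new LazySuccessors() {
                    private int pos = 0;

                    @Override
                    public long nextLong() {
                        return pos < succ.length ? succ[pos++] : -1;
                    }
                };
            }
        };
    }

    /** Drains a lazy iterator until it returns -1. */
    static long[] drain(final LazySuccessors it) {
        long[] buf = new long[0];
        long x;
        while ((x = it.nextLong()) != -1) {
            buf = Arrays.copyOf(buf, buf.length + 1);
            buf[buf.length - 1] = x;
        }
        return buf;
    }

    public static void main(String[] args) {
        // Array form with slack: only the first 2 entries are valid successors.
        System.out.println(Arrays.toString(drain(arrayBacked(new long[] {4, 7, 99}, 2).successors()))); // [4, 7]
        System.out.println(Arrays.toString(lazyBacked(new long[] {1, 5, 8}).successorArray())); // [1, 5, 8]
    }
}
```

The array-backed case shows why only the first `outdegree()` entries of the returned array are meaningful. The trailing 99 is never produced by the wrapping iterator.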
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/ScatteredArcsASCIIGraph.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/ScatteredArcsASCIIGraph.java
new file mode 100644
index 0000000..486175a
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/ScatteredArcsASCIIGraph.java
@@ -0,0 +1,736 @@
+package it.unimi.dsi.big.webgraph;
+
+/*
+ * Copyright (C) 2011-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static it.unimi.dsi.fastutil.HashCommon.bigArraySize;
+import static it.unimi.dsi.fastutil.HashCommon.maxFill;
+import it.unimi.dsi.fastutil.BigArrays;
+import it.unimi.dsi.fastutil.Hash;
+import it.unimi.dsi.fastutil.Size64;
+import it.unimi.dsi.fastutil.booleans.BooleanBigArrays;
+import it.unimi.dsi.fastutil.bytes.ByteArrays;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.io.FastBufferedInputStream;
+import it.unimi.dsi.fastutil.longs.LongBigArrays;
+import it.unimi.dsi.fastutil.objects.Object2LongFunction;
+import it.unimi.dsi.fastutil.objects.ObjectArrayList;
+import it.unimi.dsi.logging.ProgressLogger;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.charset.Charset;
+import java.util.concurrent.TimeUnit;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+
+/** An {@link ImmutableGraph} that corresponds to a graph stored as a scattered list of arcs.
+ *
+ * <p>A <em>scattered list of arcs</em> describes a graph in a fairly loose way. Each line
+ * contains an arc specified as two node identifiers separated by whitespace
+ * (but we suggest exactly one TAB character). Sources and targets can be in any order.
+ *
+ * <p>In the <em>standard</em> description, node
+ * identifiers can be in the range [-2<sup>63</sup>..2<sup>63</sup>): they will be remapped
+ * into a compact identifier space by assigning a new node number to each newly seen identifier. The
+ * list of identifiers in order of appearance is available in {@link #ids}.
+ * Lines can be empty, or comments starting with <code>#</code>. Characters following the
+ * target will be discarded with a warning.
+ *
+ * <p><strong>Warning:</strong> Lines not conforming to the above specification
+ * will cause an error to be logged, but will be otherwise ignored.
+ *
+ * <p>Alternatively, you can {@linkplain #ScatteredArcsASCIIGraph(InputStream, Object2LongFunction, Charset, long, boolean, boolean, int, File, ProgressLogger) provide}
+ * an {@link Object2LongFunction Object2LongFunction&lt;String>} with default return
+ * value -1 that will be used to map identifiers to node numbers, along with a {@link Charset} to parse lines and
+ * the number of nodes of the graph (which must be a strict upper bound for the largest value returned by the function).
+ *
+ * <p>Additionally, the resulting graph can be symmetrized, and its loops be removed, using
+ * {@linkplain #ScatteredArcsASCIIGraph(InputStream, boolean, boolean, int, File, ProgressLogger) suitable constructor options}.
+ *
+ * <P>This class has no load method, and its main method converts a scattered-arcs representation
+ * directly into a {@link BVGraph}.
+ *
+ * <h2>Using {@link ScatteredArcsASCIIGraph} to convert your data</h2>
+ *
+ * <p>A simple (albeit rather inefficient) way to import data into WebGraph is using ASCII graphs specified by scattered arcs. Suppose you
+ * create the following file, named <code>example.arcs</code>:
+ * <pre>
+ * # My graph
+ * -1 15
+ * 15 2
+ * 2 -1 This will cause a warning to be logged
+ * OOPS! (This will cause an error to be logged)
+ * -1 2
+ * </pre>
+ * Then, the command
+ * <pre>
+ * java it.unimi.dsi.big.webgraph.ScatteredArcsASCIIGraph example &lt;example.arcs
+ * </pre>
+ * will produce a compressed graph in {@link BVGraph} format
+ * with basename <code>example</code>. The file <code>example.ids</code> will contain
+ * the list of longs -1, 15, 2. The node with identifier -1 will be node 0 in the
+ * output graph, the node with identifier 15 will be node 1, and the node with identifier 2 will be node 2.
+ * The graph <code>example</code> will thus have three nodes and four arcs (viz., &lt;0,1&gt;, &lt;0,2&gt;, &lt;1,2&gt; and &lt;2,0&gt;).
+ *
+ * <h2>Memory requirements</h2>
+ *
+ * <p>To convert node identifiers to node numbers, instances of this class use a custom map that in the
+ * worst case will require 25.5&times;2<sup><big>&lceil;</big>log(4<var>n</var>/3)<big>&rceil;</big></sup>&nbsp;&le;&nbsp;68<var>n</var> bytes,
+ * where <var>n</var> is the number of distinct identifiers. Storing batches of arcs in memory requires 16 bytes per arc (two longs).
+ */
+
+
+public class ScatteredArcsASCIIGraph extends ImmutableSequentialGraph {
+ private static final Logger LOGGER = LoggerFactory.getLogger(ScatteredArcsASCIIGraph.class);
+ private final static boolean DEBUG = false;
+
+ /** The default batch size. */
+ public static final int DEFAULT_BATCH_SIZE = 10000000;
+ /** The extension of the identifier file (a binary list of longs). */
+ private static final String IDS_EXTENSION = ".ids";
+ /** The batch graph used to return node iterators. */
+ private final Transform.BatchGraph batchGraph;
+ /** The big-array list of identifiers in order of appearance. */
+ public long[][] ids;
+
+ private static final class Long2LongOpenHashBigMap implements java.io.Serializable, Cloneable, Hash {
+ public static final long serialVersionUID = 0L;
+
+ /** The big array of keys. */
+ public transient long[][] key;
+
+ /** The big array of values. */
+ public transient long[][] value;
+
+ /** The big array telling whether a position is used. */
+ protected transient boolean[][] used;
+
+ /** The acceptable load factor. */
+ protected final float f;
+
+ /** The current table size (always a power of 2). */
+ protected transient long n;
+
+ /** Threshold after which we rehash. It must be the table size times {@link #f}. */
+ protected transient long maxFill;
+
+ /** The mask for wrapping a position counter. */
+ protected transient long mask;
+
+ /** The mask for wrapping a segment counter. */
+ protected transient int segmentMask;
+
+ /** The mask for wrapping a base counter. */
+ protected transient int baseMask;
+
+ /** Number of entries in the map. */
+ protected long size;
+
+ /** Initialises the mask values. */
+ private void initMasks() {
+ mask = n - 1;
+ /*
+ * Note that either we have more than one segment, and in this case all segments are
+ * BigArrays.SEGMENT_SIZE long, or we have exactly one segment whose length is a power of
+ * two.
+ */
+ segmentMask = key[0].length - 1;
+ baseMask = key.length - 1;
+ }
+
+ /**
+ * Creates a new hash big map.
+ *
+ * <p>The actual table size will be the least power of two greater than
+ * <code>expected</code>/<code>f</code>.
+ *
+ * @param expected the expected number of elements in the map.
+ * @param f the load factor.
+ */
+ public Long2LongOpenHashBigMap(final long expected, final float f) {
+ if (f <= 0 || f > 1) throw new IllegalArgumentException("Load factor must be greater than 0 and smaller than or equal to 1");
+ if (expected < 0) throw new IllegalArgumentException("The expected number of elements must be nonnegative");
+ this.f = f;
+ n = bigArraySize(expected, f);
+ maxFill = maxFill(n, f);
+ key = LongBigArrays.newBigArray(n);
+ value = LongBigArrays.newBigArray(n);
+ used = BooleanBigArrays.newBigArray(n);
+ initMasks();
+ }
+
+ /**
+ * Creates a new hash big map with initial expected {@link Hash#DEFAULT_INITIAL_SIZE} elements
+ * and {@link Hash#DEFAULT_LOAD_FACTOR} as load factor.
+ */
+
+ public Long2LongOpenHashBigMap() {
+ this(DEFAULT_INITIAL_SIZE, DEFAULT_LOAD_FACTOR);
+ }
+
+ public long put(final long k, final long v) {
+ final long h = it.unimi.dsi.fastutil.HashCommon.murmurHash3(k);
+
+ // The starting point.
+ int displ = (int)(h & segmentMask);
+ int base = (int)((h & mask) >>> BigArrays.SEGMENT_SHIFT);
+
+ // There's always an unused entry.
+ while (used[base][displ]) {
+ if (k == key[base][displ]) {
+ final long oldValue = value[base][displ];
+ value[base][displ] = v;
+ return oldValue;
+ }
+ base = (base + ((displ = (displ + 1) & segmentMask) == 0 ? 1 : 0)) & baseMask;
+ }
+
+ used[base][displ] = true;
+ key[base][displ] = k;
+ value[base][displ] = v;
+
+ if (++size >= maxFill) rehash(2 * n);
+ return -1;
+ }
+
+ public long get(final long k) {
+ final long h = it.unimi.dsi.fastutil.HashCommon.murmurHash3(k);
+
+ // The starting point.
+ int displ = (int)(h & segmentMask);
+ int base = (int)((h & mask) >>> BigArrays.SEGMENT_SHIFT);
+
+ // There's always an unused entry.
+ while (used[base][displ]) {
+ if (k == key[base][displ]) return value[base][displ];
+ base = (base + ((displ = (displ + 1) & segmentMask) == 0 ? 1 : 0)) & baseMask;
+ }
+
+ return -1;
+ }
+
+ protected void rehash(final long newN) {
+ final boolean used[][] = this.used;
+ final long key[][] = this.key;
+ final long[][] value = this.value;
+ final boolean newUsed[][] = BooleanBigArrays.newBigArray(newN);
+ final long newKey[][] = LongBigArrays.newBigArray(newN);
+ final long newValue[][] = LongBigArrays.newBigArray(newN);
+ final long newMask = newN - 1;
+ final int newSegmentMask = newKey[0].length - 1;
+ final int newBaseMask = newKey.length - 1;
+
+ int base = 0, displ = 0;
+ long h;
+ long k;
+
+ for (long i = size; i-- != 0;) {
+
+ while (!used[base][displ])
+ base = (base + ((displ = (displ + 1) & segmentMask) == 0 ? 1 : 0));
+
+ k = key[base][displ];
+ h = it.unimi.dsi.fastutil.HashCommon.murmurHash3(k);
+
+ // The starting point.
+ int d = (int)(h & newSegmentMask);
+ int b = (int)((h & newMask) >>> BigArrays.SEGMENT_SHIFT);
+
+ while (newUsed[b][d])
+ b = (b + ((d = (d + 1) & newSegmentMask) == 0 ? 1 : 0)) & newBaseMask;
+
+ newUsed[b][d] = true;
+ newKey[b][d] = k;
+ newValue[b][d] = value[base][displ];
+
+ base = (base + ((displ = (displ + 1) & segmentMask) == 0 ? 1 : 0));
+ }
+
+ this.n = newN;
+ this.key = newKey;
+ this.value = newValue;
+ this.used = newUsed;
+ initMasks();
+ maxFill = maxFill(n, f);
+ }
+
+ public void compact() {
+ int base = 0, displ = 0, b = 0, d = 0;
+ for(long i = size; i-- != 0;) {
+ while (! used[base][displ]) base = (base + ((displ = (displ + 1) & segmentMask) == 0 ? 1 : 0)) & baseMask;
+ key[b][d] = key[base][displ];
+ value[b][d] = value[base][displ];
+ base = (base + ((displ = (displ + 1) & segmentMask) == 0 ? 1 : 0)) & baseMask;
+ b = (b + ((d = (d + 1) & segmentMask) == 0 ? 1 : 0)) & baseMask;
+ }
+ }
+
+ public long size() {
+ return size;
+ }
+ }
+
+
+ /** Creates a scattered-arcs ASCII graph.
+ *
+ * @param is an input stream containing a standard scattered list of arcs.
+ */
+ public ScatteredArcsASCIIGraph(final InputStream is) throws IOException {
+ this(is, false);
+ }
+
+ /** Creates a scattered-arcs ASCII graph.
+ *
+ * @param is an input stream containing a standard scattered list of arcs.
+ * @param symmetrize the new graph will be forced to be symmetric.
+ */
+ public ScatteredArcsASCIIGraph(final InputStream is, final boolean symmetrize) throws IOException {
+ this(is, symmetrize, false);
+ }
+
+ /** Creates a scattered-arcs ASCII graph.
+ *
+ * @param is an input stream containing a standard scattered list of arcs.
+ * @param symmetrize the new graph will be forced to be symmetric.
+ * @param noLoops the new graph will have no loops.
+ */
+ public ScatteredArcsASCIIGraph(final InputStream is, final boolean symmetrize, final boolean noLoops) throws IOException {
+ this(is, symmetrize, noLoops, DEFAULT_BATCH_SIZE);
+ }
+
+ /** Creates a scattered-arcs ASCII graph.
+ *
+ * @param is an input stream containing a standard scattered list of arcs.
+ * @param symmetrize the new graph will be forced to be symmetric.
+ * @param noLoops the new graph will have no loops.
+ * @param batchSize the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
+ */
+ public ScatteredArcsASCIIGraph(final InputStream is, final boolean symmetrize, final boolean noLoops, final int batchSize) throws IOException {
+ this(is, symmetrize, noLoops, batchSize, null);
+ }
+
+ /** Creates a scattered-arcs ASCII graph.
+ *
+ * @param is an input stream containing a standard scattered list of arcs.
+ * @param symmetrize the new graph will be forced to be symmetric.
+ * @param noLoops the new graph will have no loops.
+ * @param batchSize the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
+ * @param tempDir a temporary directory for the batches, or <code>null</code> for {@link File#createTempFile(java.lang.String, java.lang.String)}'s choice.
+ */
+ public ScatteredArcsASCIIGraph(final InputStream is, final boolean symmetrize, final boolean noLoops, final int batchSize, final File tempDir) throws IOException {
+ this(is, symmetrize, noLoops, batchSize, tempDir, null);
+ }
+
+ /** Creates a scattered-arcs ASCII graph.
+ *
+ * @param is an input stream containing a standard scattered list of arcs.
+ * @param symmetrize the new graph will be forced to be symmetric.
+ * @param noLoops the new graph will have no loops.
+ * @param batchSize the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
+ * @param tempDir a temporary directory for the batches, or <code>null</code> for {@link File#createTempFile(java.lang.String, java.lang.String)}'s choice.
+ * @param pl a progress logger, or <code>null</code>.
+ */
+ public ScatteredArcsASCIIGraph(final InputStream is, final boolean symmetrize, final boolean noLoops, final int batchSize, final File tempDir, final ProgressLogger pl) throws IOException {
+ this(is, null, null, -1, symmetrize, noLoops, batchSize, tempDir, pl);
+ }
+
+ /** Creates a scattered-arcs ASCII graph.
+ *
+ * @param is an input stream containing a scattered list of arcs.
+ * @param function an explicitly provided function from strings representing nodes to node numbers, or <code>null</code> for the standard behaviour.
+ * @param charset a character set that will be used to read the identifiers passed to <code>function</code>, or <code>null</code> for ISO-8859-1 (used only if <code>function</code> is not <code>null</code>).
+ * @param n the number of nodes of the graph (used only if <code>function</code> is not <code>null</code>).
+ */
+ public ScatteredArcsASCIIGraph(final InputStream is, final Object2LongFunction<? extends CharSequence> function, final Charset charset, final long n) throws IOException {
+ this(is, function, charset, n, false);
+ }
+
+ /** Creates a scattered-arcs ASCII graph.
+ *
+ * @param is an input stream containing a scattered list of arcs.
+ * @param function an explicitly provided function from strings representing nodes to node numbers, or <code>null</code> for the standard behaviour.
+ * @param charset a character set that will be used to read the identifiers passed to <code>function</code>, or <code>null</code> for ISO-8859-1 (used only if <code>function</code> is not <code>null</code>).
+ * @param n the number of nodes of the graph (used only if <code>function</code> is not <code>null</code>).
+ * @param symmetrize the new graph will be forced to be symmetric.
+ */
+ public ScatteredArcsASCIIGraph(final InputStream is, final Object2LongFunction<? extends CharSequence> function, final Charset charset, final long n, final boolean symmetrize) throws IOException {
+ this(is, function, charset, n, symmetrize, false);
+ }
+
+ /** Creates a scattered-arcs ASCII graph.
+ *
+ * @param is an input stream containing a scattered list of arcs.
+ * @param function an explicitly provided function from strings representing nodes to node numbers, or <code>null</code> for the standard behaviour.
+ * @param charset a character set that will be used to read the identifiers passed to <code>function</code>, or <code>null</code> for ISO-8859-1 (used only if <code>function</code> is not <code>null</code>).
+ * @param n the number of nodes of the graph (used only if <code>function</code> is not <code>null</code>).
+ * @param symmetrize the new graph will be forced to be symmetric.
+ * @param noLoops the new graph will have no loops.
+ */
+ public ScatteredArcsASCIIGraph(final InputStream is, final Object2LongFunction<? extends CharSequence> function, final Charset charset, final long n, final boolean symmetrize, final boolean noLoops) throws IOException {
+ this(is, function, charset, n, symmetrize, noLoops, DEFAULT_BATCH_SIZE);
+ }
+
+ /** Creates a scattered-arcs ASCII graph.
+ *
+ * @param is an input stream containing a scattered list of arcs.
+ * @param function an explicitly provided function from strings representing nodes to node numbers, or <code>null</code> for the standard behaviour.
+ * @param charset a character set that will be used to read the identifiers passed to <code>function</code>, or <code>null</code> for ISO-8859-1 (used only if <code>function</code> is not <code>null</code>).
+ * @param n the number of nodes of the graph (used only if <code>function</code> is not <code>null</code>).
+ * @param symmetrize the new graph will be forced to be symmetric.
+ * @param noLoops the new graph will have no loops.
+ * @param batchSize the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
+ */
+ public ScatteredArcsASCIIGraph(final InputStream is, final Object2LongFunction<? extends CharSequence> function, final Charset charset, final long n, final boolean symmetrize, final boolean noLoops, final int batchSize) throws IOException {
+ this(is, function, charset, n, symmetrize, noLoops, batchSize, null);
+ }
+
+ /** Creates a scattered-arcs ASCII graph.
+ *
+ * @param is an input stream containing a scattered list of arcs.
+ * @param function an explicitly provided function from strings representing nodes to node numbers, or <code>null</code> for the standard behaviour.
+ * @param charset a character set that will be used to read the identifiers passed to <code>function</code>, or <code>null</code> for ISO-8859-1 (used only if <code>function</code> is not <code>null</code>).
+ * @param n the number of nodes of the graph (used only if <code>function</code> is not <code>null</code>).
+ * @param symmetrize the new graph will be forced to be symmetric.
+ * @param noLoops the new graph will have no loops.
+ * @param batchSize the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
+ * @param tempDir a temporary directory for the batches, or <code>null</code> for {@link File#createTempFile(java.lang.String, java.lang.String)}'s choice.
+ */
+ public ScatteredArcsASCIIGraph(final InputStream is, final Object2LongFunction<? extends CharSequence> function, final Charset charset, final long n, final boolean symmetrize, final boolean noLoops, final int batchSize, final File tempDir) throws IOException {
+ this(is, function, charset, n, symmetrize, noLoops, batchSize, tempDir, null);
+ }
+
+ /** Creates a scattered-arcs ASCII graph.
+ *
+ * @param is an input stream containing a scattered list of arcs.
+ * @param function an explicitly provided function from strings representing nodes to node numbers, or <code>null</code> for the standard behaviour.
+ * @param charset a character set that will be used to read the identifiers passed to <code>function</code>, or <code>null</code> for ISO-8859-1 (used only if <code>function</code> is not <code>null</code>).
+ * @param n the number of nodes of the graph (used only if <code>function</code> is not <code>null</code>).
+ * @param symmetrize the new graph will be forced to be symmetric.
+ * @param noLoops the new graph will have no loops.
+ * @param batchSize the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
+ * @param tempDir a temporary directory for the batches, or <code>null</code> for {@link File#createTempFile(java.lang.String, java.lang.String)}'s choice.
+ * @param pl a progress logger, or <code>null</code>.
+ */
+ public ScatteredArcsASCIIGraph(final InputStream is, final Object2LongFunction<? extends CharSequence> function, Charset charset, final long n, final boolean symmetrize, final boolean noLoops, final int batchSize, final File tempDir, final ProgressLogger pl) throws IOException {
+ @SuppressWarnings("resource")
+ final FastBufferedInputStream fbis = new FastBufferedInputStream(is);
+ Long2LongOpenHashBigMap map = new Long2LongOpenHashBigMap();
+
+ long numNodes = -1;
+ if (charset == null) charset = Charset.forName("ISO-8859-1");
+
+ int j;
+ long[] source = new long[batchSize], target = new long[batchSize];
+ ObjectArrayList<File> batches = new ObjectArrayList<>();
+
+ if (pl != null) {
+ pl.itemsName = "arcs";
+ pl.start("Creating sorted batches...");
+ }
+
+ j = 0;
+ long pairs = 0; // Number of pairs
+ byte[] array = new byte[1024];
+ for(long line = 1; ; line++) {
+ int start = 0, len;
+ while((len = fbis.readLine(array, start, array.length - start, FastBufferedInputStream.ALL_TERMINATORS)) == array.length - start) {
+ start += len;
+ array = ByteArrays.grow(array, array.length + 1);
+ }
+
+ if (len == -1) break; // EOF
+
+ final int lineLength = start + len;
+
+ if (DEBUG) System.err.println("Reading line " + line + "... (" + new String(array, 0, lineLength, charset) + ")");
+
+ // Skip whitespace at the start of the line.
+ int offset = 0;
+ while(offset < lineLength && array[offset] >= 0 && array[offset] <= ' ') offset++;
+
+ if (offset == lineLength) {
+ if (DEBUG) System.err.println("Skipping line " + line + "...");
+ continue; // Whitespace line
+ }
+
+ if (array[offset] == '#') continue; // Comment line (possibly indented)
+
+ // Scan source id.
+ start = offset;
+ while(offset < lineLength && (array[offset] < 0 || array[offset] > ' ')) offset++;
+
+ long s;
+
+ if (function == null) {
+ final long sl;
+ try {
+ sl = getLong(array, start, offset - start);
+ }
+ catch(RuntimeException e) {
+ // Discard up to the end of line
+ LOGGER.error("Error at line " + line + ": " + e.getMessage());
+ continue;
+ }
+
+ s = map.get(sl);
+ if (s == -1) map.put(sl, s = map.size());
+
+ if (DEBUG) System.err.println("Parsed source at line " + line + ": " + sl + " => " + s);
+ }
+ else {
+ final String ss = new String(array, start, offset - start, charset);
+ s = function.getLong(ss);
+ if (s == -1) {
+ LOGGER.warn("Unknown source identifier " + ss + " at line " + line);
+ continue;
+ }
+ if (s < 0 || s >= n) throw new IllegalArgumentException("Source node number out of range for node " + ss + ": " + s);
+ if (DEBUG) System.err.println("Parsed source at line " + line + ": " + ss + " => " + s);
+ }
+
+ if (DEBUG) System.err.println("Parsed source at line " + line + ": " + s + " => " + s);
+
+ // Skip whitespace between identifiers.
+ while(offset < lineLength && array[offset] >= 0 && array[offset] <= ' ') offset++;
+
+ if (offset == lineLength) {
+ LOGGER.error("Error at line " + line + ": no target");
+ continue;
+ }
+
+ // Scan target id.
+ start = offset;
+ while(offset < lineLength && (array[offset] < 0 || array[offset] > ' ')) offset++;
+
+ long t;
+
+ if (function == null) {
+ final long tl;
+ try {
+ tl = getLong(array, start, offset - start);
+ }
+ catch(RuntimeException e) {
+ // Discard up to the end of line
+ LOGGER.error("Error at line " + line + ": " + e.getMessage());
+ continue;
+ }
+
+ t = map.get(tl);
+ if (t == -1) map.put(tl, t = map.size());
+
+ if (DEBUG) System.err.println("Parsed target at line " + line + ": " + tl + " => " + t);
+ }
+ else {
+ final String ts = new String(array, start, offset - start, charset);
+ t = function.getLong(ts);
+ if (t == -1) {
+ LOGGER.warn("Unknown target identifier " + ts + " at line " + line);
+ continue;
+ }
+
+ if (t < 0 || t >= n) throw new IllegalArgumentException("Target node number out of range for node " + ts + ": " + t);
+ if (DEBUG) System.err.println("Parsed target at line " + line + ": " + ts + " => " + t);
+ }
+
+ // Skip whitespace after target.
+ while(offset < lineLength && array[offset] >= 0 && array[offset] <= ' ') offset++;
+
+ if (offset < lineLength) LOGGER.warn("Trailing characters ignored at line " + line);
+
+ if (s != t || ! noLoops) {
+ source[j] = s;
+ target[j++] = t;
+
+ if (j == batchSize) {
+ pairs += Transform.processBatch(batchSize, source, target, tempDir, batches);
+ j = 0;
+ }
+
+ if (symmetrize && s != t) {
+ source[j] = t;
+ target[j++] = s;
+ if (j == batchSize) {
+ pairs += Transform.processBatch(batchSize, source, target, tempDir, batches);
+ j = 0;
+ }
+ }
+
+ if (pl != null) pl.lightUpdate();
+ }
+ }
+
+ if (j != 0) pairs += Transform.processBatch(j, source, target, tempDir, batches);
+
+ if (pl != null) {
+ pl.done();
+ Transform.logBatches(batches, pairs, pl);
+ }
+
+ numNodes = map.size();
+ source = null;
+ target = null;
+
+ map.compact();
+
+ final File keyFile = File.createTempFile(ScatteredArcsASCIIGraph.class.getSimpleName(), "keys", tempDir);
+ keyFile.deleteOnExit();
+ final File valueFile = File.createTempFile(ScatteredArcsASCIIGraph.class.getSimpleName(), "values", tempDir);
+ valueFile.deleteOnExit();
+
+ BinIO.storeLongs(map.key, 0, map.size(), keyFile);
+ BinIO.storeLongs(map.value, 0, map.size(), valueFile);
+
+ map = null;
+
+ long[][] key = BinIO.loadLongsBig(keyFile);
+ keyFile.delete();
+ long[][] value = BinIO.loadLongsBig(valueFile);
+ valueFile.delete();
+
+ if (function == null) {
+ long[][] result = LongBigArrays.newBigArray(numNodes);
+ for(int s = value.length; s-- != 0;) {
+ final long[] k = key[s];
+ final long[] v = value[s];
+ for(int d = k.length; d-- != 0;) LongBigArrays.set(result, v[d], k[d]);
+ }
+ ids = result;
+ }
+
+ key = null;
+ value = null;
+
+ batchGraph = new Transform.BatchGraph(function == null ? numNodes : n, pairs, batches);
+ }
+
+ private final static long getLong(final byte[] array, int offset, int length) {
+ if (length == 0) throw new NumberFormatException("Empty number");
+ int sign = 1;
+ if(array[offset] == '-') {
+ sign = -1;
+ offset++;
+ length--;
+ if (length == 0) throw new NumberFormatException("Missing digits after sign");
+ }
+
+ long value = 0;
+ for(int i = 0; i < length; i++) {
+ final byte digit = array[offset + i];
+ if (digit < '0' || digit > '9') throw new NumberFormatException("Not a digit: " + (char)digit);
+ value *= 10;
+ value += digit - '0';
+ }
+
+ return sign * value;
+ }
+
+ @Override
+ public long numNodes() {
+ if (batchGraph == null) throw new UnsupportedOperationException("The number of nodes is unknown (you need to exhaust the input)");
+ return batchGraph.numNodes();
+ }
+
+ @Override
+ public long numArcs() {
+ if (batchGraph == null) throw new UnsupportedOperationException("The number of arcs is unknown (you need to exhaust the input)");
+ return batchGraph.numArcs();
+ }
+
+ @Override
+ public NodeIterator nodeIterator(final long from) {
+ return batchGraph.nodeIterator(from);
+ }
+
+ @SuppressWarnings("unchecked")
+ public static void main(String args[]) throws IllegalArgumentException, SecurityException, IOException, JSAPException, ClassNotFoundException {
+ String basename;
+ SimpleJSAP jsap = new SimpleJSAP(ScatteredArcsASCIIGraph.class.getName(), "Converts a scattered list of arcs from standard input into a BVGraph. The list of " +
+ "identifiers in order of appearance will be saved with extension \"" + IDS_EXTENSION + "\", unless a translation function has been specified.",
+ new Parameter[] {
+ new FlaggedOption("logInterval", JSAP.LONG_PARSER, Long.toString(ProgressLogger.DEFAULT_LOG_INTERVAL), JSAP.NOT_REQUIRED, 'l', "log-interval", "The minimum time interval between activity logs in milliseconds."),
+ new FlaggedOption("batchSize", JSAP.INTSIZE_PARSER, Integer.toString(DEFAULT_BATCH_SIZE), JSAP.NOT_REQUIRED, 's', "batch-size", "The maximum size of a batch, in arcs."),
+ new FlaggedOption("tempDir", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'T', "temp-dir", "A directory for all temporary batch files."),
+ new Switch("symmetrize", 'S', "symmetrize", "Force the output graph to be symmetric."),
+ new Switch("noLoops", 'L', "no-loops", "Remove loops from the output graph."),
+ new FlaggedOption("function", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'f', "function", "A serialised function from strings to longs that will be used to translate identifiers to node numbers."),
+ new FlaggedOption("charset", JSAP.STRING_PARSER, "ISO-8859-1", JSAP.NOT_REQUIRED, 'C', "charset", "The charset used to read the list of arcs."),
+ new FlaggedOption("n", JSAP.LONGSIZE_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'n', "n", "The number of nodes of the graph (only if you specified a function)."),
+ new FlaggedOption("comp", JSAP.STRING_PARSER, null, JSAP.NOT_REQUIRED, 'c', "comp", "A compression flag (may be specified several times).").setAllowMultipleDeclarations(true),
+ new FlaggedOption("windowSize", JSAP.INTEGER_PARSER, String.valueOf(BVGraph.DEFAULT_WINDOW_SIZE), JSAP.NOT_REQUIRED, 'w', "window-size", "Reference window size (0 to disable)."),
+ new FlaggedOption("maxRefCount", JSAP.INTEGER_PARSER, String.valueOf(BVGraph.DEFAULT_MAX_REF_COUNT), JSAP.NOT_REQUIRED, 'm', "max-ref-count", "Maximum number of backward references (-1 for ∞)."),
+ new FlaggedOption("minIntervalLength", JSAP.INTEGER_PARSER, String.valueOf(BVGraph.DEFAULT_MIN_INTERVAL_LENGTH), JSAP.NOT_REQUIRED, 'i', "min-interval-length", "Minimum length of an interval (0 to disable)."),
+ new FlaggedOption("zetaK", JSAP.INTEGER_PARSER, String.valueOf(BVGraph.DEFAULT_ZETA_K), JSAP.NOT_REQUIRED, 'k', "zeta-k", "The k parameter for zeta-k codes."),
+ new UnflaggedOption("basename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the output graph"),
+ }
+ );
+
+ JSAPResult jsapResult = jsap.parse(args);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ basename = jsapResult.getString("basename");
+
+ int flags = 0;
+ for(String compressionFlag: jsapResult.getStringArray("comp")) {
+ try {
+ flags |= BVGraph.class.getField(compressionFlag).getInt(BVGraph.class);
+ }
+ catch (Exception notFound) {
+ throw new JSAPException("Compression method " + compressionFlag + " unknown.");
+ }
+ }
+
+ final int windowSize = jsapResult.getInt("windowSize");
+ final int zetaK = jsapResult.getInt("zetaK");
+ int maxRefCount = jsapResult.getInt("maxRefCount");
+ if (maxRefCount == -1) maxRefCount = Integer.MAX_VALUE;
+ final int minIntervalLength = jsapResult.getInt("minIntervalLength");
+
+ Object2LongFunction<String> function = null;
+ Charset charset = null;
+ long n = -1;
+ if (jsapResult.userSpecified("function")) {
+ function = (Object2LongFunction<String>)BinIO.loadObject(jsapResult.getString("function"));
+ charset = Charset.forName(jsapResult.getString("charset"));
+ if (function.size() == -1) {
+ if (! jsapResult.userSpecified("n")) throw new IllegalArgumentException("You must specify a graph size if you specify a translation function that does not return the size of the key set.");
+ n = jsapResult.getLong("n");
+ }
+ else n = function instanceof Size64 ? ((Size64)function).size64() : function.size();
+ }
+
+ File tempDir = null;
+ if (jsapResult.userSpecified("tempDir")) tempDir = new File(jsapResult.getString("tempDir"));
+
+ final ProgressLogger pl = new ProgressLogger(LOGGER, jsapResult.getLong("logInterval"), TimeUnit.MILLISECONDS);
+ ScatteredArcsASCIIGraph graph = new ScatteredArcsASCIIGraph(System.in, function, charset, n, jsapResult.userSpecified("symmetrize"), jsapResult.userSpecified("noLoops"), jsapResult.getInt("batchSize"), tempDir, pl);
+ BVGraph.store(graph, basename, windowSize, maxRefCount, minIntervalLength, zetaK, flags, pl);
+ if (function == null) BinIO.storeLongs(graph.ids, basename + IDS_EXTENSION);
+ }
+}
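The `main` method above turns each `-c`/`--comp` value into an `int` by looking up a public static field of `BVGraph` with the same name via reflection and OR-ing the values together. It also maps `-1` for `maxRefCount` to `Integer.MAX_VALUE`. The following is a minimal, self-contained sketch of just those two steps. The `Flags` class and its `FLAG_A`/`FLAG_B`/`FLAG_C` constants are hypothetical stand-ins, not the real `BVGraph` flags. The sketch also throws `IllegalArgumentException` where the original throws `JSAPException`.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch of the compression-flag resolution performed by the main method above. */
final class CompressionFlagsSketch {
    /** Hypothetical stand-in for the public static int flag constants of BVGraph. */
    static final class Flags {
        public static final int FLAG_A = 1;
        public static final int FLAG_B = 2;
        public static final int FLAG_C = 4;
    }

    /** ORs together the values of the named public static int fields of Flags. */
    static int parseFlags(List<String> names) {
        int flags = 0;
        for (String name : names) {
            try {
                // For a static field the receiver argument of getInt is ignored, so null is fine.
                flags |= Flags.class.getField(name).getInt(null);
            } catch (NoSuchFieldException | IllegalAccessException e) {
                throw new IllegalArgumentException("Compression method " + name + " unknown.", e);
            }
        }
        return flags;
    }

    /** Maps the command-line sentinel -1 to "no limit", as done for maxRefCount above. */
    static int normalizeMaxRefCount(int maxRefCount) {
        return maxRefCount == -1 ? Integer.MAX_VALUE : maxRefCount;
    }

    public static void main(String[] args) {
        System.out.println(parseFlags(Arrays.asList("FLAG_A", "FLAG_C"))); // prints 5
        System.out.println(normalizeMaxRefCount(-1)); // prints 2147483647
    }
}
```

Resolving flags by field name keeps the command-line vocabulary in sync with the constants themselves. The cost is that an unknown name is only detected at run time.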
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/ShiftedByOneArcListASCIIGraph.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/ShiftedByOneArcListASCIIGraph.java
new file mode 100644
index 0000000..ed48c30
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/ShiftedByOneArcListASCIIGraph.java
@@ -0,0 +1,100 @@
+package it.unimi.dsi.big.webgraph;
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.io.FastBufferedInputStream;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.io.InputStream;
+
+/** An {@link ArcListASCIIGraph} with fixed shift -1. Very useful to read
+ * graphs specified as lists of arcs (pairs of nodes) with node numbering starting from one.
+ *
+ * <h2>Using {@link ArcListASCIIGraph} with MatLab-like sparse matrix files</h2>
+ *
+ * <p>The main intended usage of this class is that of interfacing easily with MatLab-like
+ * sparse matrix files. Note that for this to happen it is necessary to shift all
+ * indices by one. Assume you have a file named <code>example.arcs</code>:
+ * <pre>
+ * 1 2
+ * 2 3
+ * 3 2
+ * </pre>
+ * Then, the command
+ * <pre>
+ * java it.unimi.dsi.webgraph.BVGraph -1 -g ShiftedByOneArcListASCIIGraph dummy bvexample &lt;example.arcs
+ * </pre>
+ * will generate a {@link BVGraph} as expected (e.g., there is an arc from 0 to 1).
+ */
+
+public final class ShiftedByOneArcListASCIIGraph extends ArcListASCIIGraph {
+
+ protected ShiftedByOneArcListASCIIGraph(InputStream is, int shift) throws NumberFormatException, IOException {
+ super(is, shift);
+ }
+
+ @Deprecated
+ public static ImmutableGraph loadSequential(CharSequence basename) throws IOException {
+ return load(basename);
+ }
+
+ @Deprecated
+ public static ImmutableGraph loadSequential(CharSequence basename, ProgressLogger unused) throws IOException {
+ return load(basename);
+ }
+
+ public static ImmutableGraph loadOffline(CharSequence basename) throws IOException {
+ return load(basename);
+ }
+
+ public static ImmutableGraph loadOffline(CharSequence basename, ProgressLogger unused) throws IOException {
+ return load(basename);
+ }
+
+ public static ImmutableGraph loadMapped(CharSequence basename) throws IOException {
+ return load(basename);
+ }
+
+ public static ImmutableGraph loadMapped(CharSequence basename, ProgressLogger unused) throws IOException {
+ return load(basename);
+ }
+
+ public static ArcListASCIIGraph loadOnce(final InputStream is) throws IOException {
+ return new ArcListASCIIGraph(is, -1);
+ }
+
+ public static ImmutableGraph load(CharSequence basename) throws IOException {
+ return load(basename, null);
+ }
+
+ public static ImmutableGraph load(CharSequence basename, ProgressLogger unused) throws IOException {
+ return ImmutableGraph.wrap(new ArrayListMutableGraph(ImmutableGraph.wrap(loadOnce(new FastBufferedInputStream(new FileInputStream(basename.toString()))))).immutableView());
+ }
+
+ public static void store(ImmutableGraph graph, CharSequence basename, ProgressLogger unused) throws IOException {
+ store(graph, basename, 1);
+ }
+
+ public static void main(final String arg[]) throws NoSuchMethodException {
+ throw new NoSuchMethodException("Please use the main method of " + ArcListASCIIGraph.class.getSimpleName() + ".");
+ }
+}
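The class comment above describes reading one-based arc lists with a fixed shift of -1, so that `example.arcs` yields arcs 0→1, 1→2 and 2→1. This is a minimal sketch of that translation only. It assumes a simple two-column, whitespace-separated format. `readArcs` is a hypothetical helper, not the `ArcListASCIIGraph` parser.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: parse "source target" lines and add a fixed shift to both endpoints. */
final class ShiftedArcListSketch {
    /** Returns one {source, target} pair per non-blank line, with shift added to each endpoint. */
    static List<long[]> readArcs(String text, int shift) {
        final List<long[]> arcs = new ArrayList<>();
        for (String line : text.split("\\r?\\n")) {
            line = line.trim();
            if (line.isEmpty()) continue; // tolerate blank lines
            final String[] fields = line.split("\\s+");
            arcs.add(new long[] { Long.parseLong(fields[0]) + shift, Long.parseLong(fields[1]) + shift });
        }
        return arcs;
    }

    public static void main(String[] args) {
        // The example.arcs content from the class comment, with one-based node numbers.
        for (long[] arc : readArcs("1 2\n2 3\n3 2\n", -1))
            System.out.println(arc[0] + " -> " + arc[1]); // 0 -> 1, 1 -> 2, 2 -> 1
    }
}
```

Applying the shift at parse time means everything downstream sees ordinary zero-based node numbers. This is why the class can delegate the rest of the work to `ArcListASCIIGraph`.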
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/Stats.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/Stats.java
new file mode 100644
index 0000000..565acfb
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/Stats.java
@@ -0,0 +1,312 @@
+package it.unimi.dsi.big.webgraph;
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import it.unimi.dsi.big.webgraph.algo.StronglyConnectedComponents;
+import it.unimi.dsi.bits.Fast;
+import it.unimi.dsi.bits.LongArrayBitVector;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.io.TextIO;
+import it.unimi.dsi.fastutil.longs.LongArrays;
+import it.unimi.dsi.fastutil.longs.LongBigArrays;
+import it.unimi.dsi.logging.ProgressLogger;
+
+import java.io.BufferedWriter;
+import java.io.File;
+import java.io.FileWriter;
+import java.io.IOException;
+import java.io.PrintWriter;
+import java.lang.reflect.InvocationTargetException;
+import java.math.BigDecimal;
+import java.math.BigInteger;
+import java.math.RoundingMode;
+import java.util.Arrays;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/** Computes basic statistical data about a given graph.
+ *
+ * <p>This class loads a graph of given basename, and computes the following data:
+ * <ol>
+ * <li>an ASCII file containing the <em>outdegree distribution</em>; line <var>n</var> contains the number of nodes with outdegree <var>n</var> (starting from 0);
+ * <li>an ASCII file containing the <em>indegree distribution</em>; line <var>n</var> contains the number of nodes with indegree <var>n</var> (starting from 0);
+ * <li>a property file containing several self-descriptive data, such as the average indegree/outdegree (which should be identical), sample nodes with minimum
+ * or maximum indegree/outdegree, and so on; additional data will be computed if files produced by {@link StronglyConnectedComponents} are present
+ * with the same basename (in particular, buckets and component sizes);
+ * <li>if files produced by {@link StronglyConnectedComponents} are present with the same basename, an ASCII file containing the <em>distribution
+ * of strongly connected components</em>, specified as a sequence of lines each containing a pair of integers &lt;<var>size</var>, <var>count</var>&gt;.
+ * </ol>
+ *
+ * <p>The graph is loaded {@linkplain ImmutableGraph#loadOffline(CharSequence) offline}: the only memory allocated is for indegree counts (one long
+ * per node) and for storing the actual counts (one long per indegree/outdegree value).
+ */
+
+public class Stats {
+
+ private Stats() {}
+
+ /** Computes stats for the given graph using a single traversal, storing the results in files with given basename.
+ *
+ * @param graph the graph to be examined.
+ * @param buckets the set of buckets of this graph, or <code>null</code> if this information is not available.
+ * @param sccsize the sizes of strongly connected components as a big array, or <code>null</code> if this information is not available.
+ * @param resultsBasename the basename for result files (see the {@linkplain Stats class description}).
+ * @param pl a progress logger.
+ */
+
+ public static void run(final ImmutableGraph graph, final LongArrayBitVector buckets, final long[][] sccsize, final CharSequence resultsBasename, final ProgressLogger pl) throws IOException {
+ run(graph, buckets, sccsize, resultsBasename, false, pl);
+ }
+
+ /** Computes stats for the given graph using a single traversal, storing the results in files with given basename.
+ *
+ * @param graph the graph to be examined.
+ * @param buckets the set of buckets of this graph, or <code>null</code> if this information is not available.
+ * @param sccsize the sizes of strongly connected components as a big array, or <code>null</code> if this information is not available.
+ * @param resultsBasename the basename for result files (see the {@linkplain Stats class description}).
+ * @param saveDegrees if true, indegrees and outdegrees will be saved.
+ * @param pl a progress logger.
+ */
+
+ public static void run(final ImmutableGraph graph, final LongArrayBitVector buckets, final long[][] sccsize, final CharSequence resultsBasename, final boolean saveDegrees, final ProgressLogger pl) throws IOException {
+ final NodeIterator nodeIterator = graph.nodeIterator();
+ long[] count = LongArrays.EMPTY_ARRAY;
+ long[][] successor;
+ long[][] indegree = LongBigArrays.newBigArray(graph.numNodes());
+ long curr, d, maxd = 0, maxNode = 0, mind = Long.MAX_VALUE, minNode = 0, dangling = 0, terminal = 0, loops = 0;
+ long numArcs = 0, numGaps = 0;
+ BigInteger totLoc = BigInteger.ZERO, totGap = BigInteger.ZERO;
+
+
+ if (pl != null) {
+ pl.itemsName = "nodes";
+ pl.expectedUpdates = graph.numNodes();
+ pl.start("Scanning...");
+ }
+
+ final PrintWriter outdegreesPrintWriter = saveDegrees ? new PrintWriter(new BufferedWriter(new FileWriter(resultsBasename + ".outdegrees"))) : null;
+
+ /** Statistics for the gap width of successor lists (exponentially binned). */
+ final long[] successorDeltaStats = new long[64];
+
+ for(long i = graph.numNodes(); i-- != 0;) {
+ curr = nodeIterator.nextLong();
+ d = nodeIterator.outdegree();
+ if (d > Integer.MAX_VALUE) throw new IllegalArgumentException("The current implementation of " + Stats.class.getSimpleName() + " cannot handle outdegrees beyond 2^31.");
+ if (saveDegrees) outdegreesPrintWriter.println(d);
+ successor = nodeIterator.successorBigArray();
+
+ if (d > 1) {
+ totGap = totGap.add(BigInteger.valueOf(LongBigArrays.get(successor, d - 1) - LongBigArrays.get(successor, 0)));
+ totGap = totGap.add(BigInteger.valueOf(Fast.int2nat(LongBigArrays.get(successor, 0) - curr)));
+ numGaps += d;
+ }
+ for(long s = d; s-- != 0;) {
+ totLoc = totLoc.add(BigInteger.valueOf(Math.abs(LongBigArrays.get(successor, s) - curr)));
+
+ if (LongBigArrays.get(successor, s) != curr) successorDeltaStats[Fast.mostSignificantBit(Math.abs(curr - LongBigArrays.get(successor, s)))]++;
+ else loops++;
+
+ LongBigArrays.incr(indegree, LongBigArrays.get(successor, s));
+ }
+
+ if (d == 0) {
+ dangling++;
+ terminal++;
+ }
+
+ if (d == 1 && LongBigArrays.get(successor, 0) == curr) terminal++;
+
+ if (d < mind) {
+ mind = d;
+ minNode = curr;
+ }
+
+ if (d > maxd){
+ maxd = d;
+ maxNode = curr;
+ }
+
+ numArcs += d;
+
+ if (d >= count.length) count = LongArrays.grow(count, (int)d + 1);
+ count[(int)d]++;
+
+ if (pl != null) pl.lightUpdate();
+ }
+
+ if (pl != null) pl.done();
+
+ if (saveDegrees) {
+ outdegreesPrintWriter.close();
+ TextIO.storeLongs(indegree, resultsBasename + ".indegrees");
+ }
+
+ @SuppressWarnings("resource")
+ PrintWriter properties = new PrintWriter(new FileWriter(resultsBasename + ".stats"));
+ properties.println("nodes=" + graph.numNodes());
+ properties.println("arcs=" + numArcs);
+ properties.println("loops=" + loops);
+ properties.println("successoravggap=" + new BigDecimal(totGap).divide(BigDecimal.valueOf(Math.max(1, numGaps)), 3, RoundingMode.HALF_EVEN));
+ properties.println("avglocality=" + new BigDecimal(totLoc).divide(BigDecimal.valueOf(Math.max(1, numArcs)), 3, RoundingMode.HALF_EVEN));
+ properties.println("minoutdegree=" + mind);
+ properties.println("maxoutdegree=" + maxd);
+ properties.println("minoutdegreenode=" + minNode);
+ properties.println("maxoutdegreenode=" + maxNode);
+ properties.println("dangling=" + dangling);
+ properties.println("terminal=" + terminal);
+ properties.println("percdangling=" + 100.0 * dangling / graph.numNodes());
+ properties.println("avgoutdegree=" + (double)numArcs/graph.numNodes());
+
+ int l;
+ for(l = successorDeltaStats.length; l-- != 0;) if (successorDeltaStats[l] != 0) break;
+ StringBuilder s = new StringBuilder();
+ double totLogDelta = 0;
+ long numDelta = 0;
+
+ long g = 1;
+ for(int i = 0; i <= l; i++) {
+ if (i != 0) s.append(',');
+ s.append(successorDeltaStats[i]);
+ numDelta += successorDeltaStats[i];
+ totLogDelta += (Fast.log2(g * 2 + g + 1) - 1) * successorDeltaStats[i];
+ g *= 2;
+ }
+
+ properties.println("successorlogdeltastats=" + s.toString());
+ properties.println("successoravglogdelta=" + (numDelta == 0 ? "0" : new BigDecimal(totLogDelta).divide(BigDecimal.valueOf(Math.max(1, numDelta * 2)), 3, RoundingMode.HALF_EVEN).toString()));
+
+ TextIO.storeLongs(count, 0, (int)(maxd + 1), resultsBasename + ".outdegree");
+
+ Arrays.fill(count, 0);
+
+ maxd = maxNode = minNode = 0;
+ mind = Long.MAX_VALUE;
+ long n = graph.numNodes();
+ for(int i = indegree.length; i-- != 0;) {
+ final long[] t = indegree[i];
+ for(int p = t.length; p-- != 0;) {
+ n--;
+ d = t[p];
+ if (d > Integer.MAX_VALUE) throw new IllegalArgumentException("The current implementation of " + Stats.class.getSimpleName() + " cannot handle indegrees beyond 2^31.");
+ if (d >= count.length) count = LongArrays.grow(count, (int)(d + 1));
+ if (d < mind) {
+ mind = d;
+ minNode = n;
+ }
+
+ if (d > maxd){
+ maxd = d;
+ maxNode = n;
+ }
+
+ count[(int)d]++;
+ }
+ }
+
+ TextIO.storeLongs(count, 0, (int)(maxd + 1), resultsBasename + ".indegree");
+
+ properties.println("minindegree=" + mind);
+ properties.println("maxindegree=" + maxd);
+ properties.println("minindegreenode=" + minNode);
+ properties.println("maxindegreenode=" + maxNode);
+ properties.println("avgindegree=" + (double)numArcs/graph.numNodes());
+
+ if (buckets != null) {
+ final long numBuckets = buckets.count();
+ properties.println("buckets=" + numBuckets);
+ properties.println("percbuckets=" + 100.0 * numBuckets / graph.numNodes());
+ }
+
+ if (sccsize != null) {
+ LongBigArrays.quickSort(sccsize);
+ final long m = LongBigArrays.length(sccsize);
+ long maxSize = LongBigArrays.get(sccsize, m - 1);
+ long minSize = LongBigArrays.get(sccsize, 0);
+
+ properties.println("sccs=" + m);
+ properties.println("maxsccsize=" + maxSize);
+ properties.println("percmaxscc=" + 100.0 * maxSize / graph.numNodes());
+ properties.println("minsccsize=" + minSize);
+ properties.println("percminscc=" + 100.0 * minSize / graph.numNodes());
+
+ PrintWriter pw = new PrintWriter(resultsBasename + ".sccdistr");
+ long current = maxSize;
+ int c = 0;
+ for(int i = sccsize.length; i-- != 0;) {
+ final long[] t = sccsize[i];
+ for(int j = t.length; j-- != 0;) {
+ if(t[j] != current) {
+ pw.println(current + "\t" + c);
+ current = t[j];
+ c = 0;
+ }
+ c++;
+ }
+ }
+
+ pw.println(current + "\t" + c);
+
+ pw.flush();
+ pw.close();
+ }
+
+ properties.close();
+ }
+
+ static public void main(String arg[]) throws IllegalArgumentException, SecurityException, IllegalAccessException, InvocationTargetException, NoSuchMethodException, JSAPException, IOException, ClassNotFoundException {
+ SimpleJSAP jsap = new SimpleJSAP(Stats.class.getName(), "Computes statistical data of a given graph.",
+ new Parameter[] {
+ new FlaggedOption("graphClass", GraphClassParser.getParser(), null, JSAP.NOT_REQUIRED, 'g', "graph-class", "Forces a Java class for the source graph."),
+ new FlaggedOption("logInterval", JSAP.LONG_PARSER, Long.toString(ProgressLogger.DEFAULT_LOG_INTERVAL), JSAP.NOT_REQUIRED, 'l', "log-interval", "The minimum time interval between activity logs in milliseconds."),
+ new Switch("saveDegrees", 's', "save-degrees", "Save indegrees and outdegrees in text format."),
+ new UnflaggedOption("basename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the graph."),
+ new UnflaggedOption("resultsBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, JSAP.NOT_GREEDY, "The basename of the resulting files."),
+ }
+ );
+
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final Class<?> graphClass = jsapResult.getClass("graphClass");
+ final String basename = jsapResult.getString("basename");
+ final String resultsBasename = jsapResult.userSpecified("resultsBasename") ? jsapResult.getString("resultsBasename") : basename;
+
+ final ProgressLogger pl = new ProgressLogger();
+ pl.logInterval = jsapResult.getLong("logInterval");
+
+ final ImmutableGraph graph;
+
+ if (graphClass != null) graph = (ImmutableGraph)graphClass.getMethod("loadOffline", CharSequence.class).invoke(null, basename);
+ else graph = ImmutableGraph.loadOffline(basename, pl);
+
+ final LongArrayBitVector buckets = (LongArrayBitVector)(new File(basename + ".buckets").exists() ? BinIO.loadObject(basename + ".buckets") : null);
+ final long[][] sccsize = new File(basename + ".sccsizes").exists() ? BinIO.loadLongsBig(basename + ".sccsizes") : null;
+
+ run(graph, buckets, sccsize, resultsBasename, jsapResult.getBoolean("saveDegrees"), pl);
+ }
+}
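`Stats.run` above computes, among other things, three statistics:
- an exponentially binned histogram of |successor - source| over non-loop arcs;
- a degree distribution, where entry d counts the nodes of degree d;
- the SCC distribution, written as &lt;size, count&gt; lines with the largest size first.

The following is a small in-memory sketch of those three computations on plain `long` arrays, without big arrays, files or a progress logger. `Long.numberOfLeadingZeros` stands in for `Fast.mostSignificantBit`; that the two agree for positive arguments is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Small in-memory sketch of three statistics computed by Stats.run. */
final class StatsSketch {
    /** floor(log2(x)) for x > 0, used as the bin index. */
    static int msb(long x) {
        return 63 - Long.numberOfLeadingZeros(x);
    }

    /** Exponentially bins |source - target| over non-loop arcs, each given as {source, target}. */
    static long[] deltaBins(long[][] arcs) {
        final long[] bins = new long[64];
        for (long[] arc : arcs)
            if (arc[0] != arc[1]) bins[msb(Math.abs(arc[0] - arc[1]))]++;
        return bins;
    }

    /** Entry d is the number of nodes with degree d, for every d up to the maximum degree. */
    static long[] degreeDistribution(long[] degrees) {
        long max = 0;
        for (long d : degrees) max = Math.max(max, d);
        final long[] count = new long[(int) max + 1];
        for (long d : degrees) count[(int) d]++;
        return count;
    }

    /** Run-length encodes component sizes as {size, count} pairs, largest size first. */
    static List<long[]> sccDistribution(long[] sizes) {
        final long[] sorted = sizes.clone();
        Arrays.sort(sorted);
        final List<long[]> result = new ArrayList<>();
        int i = sorted.length - 1;
        while (i >= 0) {
            final long size = sorted[i];
            long count = 0;
            while (i >= 0 && sorted[i] == size) { count++; i--; }
            result.add(new long[] { size, count });
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(degreeDistribution(new long[] { 0, 2, 2, 1 }))); // [1, 1, 2]
        for (long[] p : sccDistribution(new long[] { 1, 3, 1, 1, 3, 7 }))
            System.out.println(p[0] + "\t" + p[1]); // 7 1, then 3 2, then 1 3
    }
}
```

Sorting the component sizes first lets the distribution be emitted in a single backward scan, which is the same approach taken on the big array in `run`.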
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/Transform.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/Transform.java
new file mode 100644
index 0000000..94eda9f
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/Transform.java
@@ -0,0 +1,2060 @@
+package it.unimi.dsi.big.webgraph;
+
+import java.io.DataInput;
+import java.io.File;
+import java.io.IOException;
+import java.lang.reflect.Field;
+import java.lang.reflect.InvocationTargetException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.concurrent.TimeUnit;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/*
+ * Copyright (C) 2003-2017 Paolo Boldi, Massimo Santini and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.Util;
+import it.unimi.dsi.big.webgraph.labelling.ArcLabelledImmutableGraph;
+import it.unimi.dsi.big.webgraph.labelling.ArcLabelledImmutableSequentialGraph;
+import it.unimi.dsi.big.webgraph.labelling.ArcLabelledNodeIterator;
+import it.unimi.dsi.big.webgraph.labelling.ArcLabelledNodeIterator.LabelledArcIterator;
+import it.unimi.dsi.big.webgraph.labelling.BitStreamArcLabelledImmutableGraph;
+import it.unimi.dsi.big.webgraph.labelling.Label;
+import it.unimi.dsi.big.webgraph.labelling.LabelMergeStrategy;
+import it.unimi.dsi.big.webgraph.labelling.LabelSemiring;
+import it.unimi.dsi.big.webgraph.labelling.Labels;
+import it.unimi.dsi.big.webgraph.labelling.UnionArcLabelledImmutableGraph;
+import it.unimi.dsi.fastutil.BigSwapper;
+import it.unimi.dsi.fastutil.Hash;
+import it.unimi.dsi.fastutil.Swapper;
+import it.unimi.dsi.fastutil.ints.IntComparator;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.io.FastByteArrayOutputStream;
+import it.unimi.dsi.fastutil.io.TextIO;
+import it.unimi.dsi.fastutil.longs.Long2ObjectOpenHashMap;
+import it.unimi.dsi.fastutil.longs.LongArrays;
+import it.unimi.dsi.fastutil.longs.LongBigArrays;
+import it.unimi.dsi.fastutil.longs.LongComparator;
+import it.unimi.dsi.fastutil.longs.LongHeapSemiIndirectPriorityQueue;
+import it.unimi.dsi.fastutil.longs.LongIterator;
+import it.unimi.dsi.fastutil.longs.LongOpenHashSet;
+import it.unimi.dsi.fastutil.objects.ObjectArrayList;
+import it.unimi.dsi.fastutil.objects.ObjectArrays;
+import it.unimi.dsi.fastutil.objects.ObjectBigArrays;
+import it.unimi.dsi.io.InputBitStream;
+import it.unimi.dsi.io.OutputBitStream;
+import it.unimi.dsi.lang.ObjectParser;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.util.XorShift1024StarRandom;
+
+/** Static methods that manipulate immutable graphs.
+ *
+ * <P>Most methods take an {@link
+ * it.unimi.dsi.big.webgraph.ImmutableGraph} (along with some other data, that
+ * depend on the kind of transformation), and return another {@link
+ * it.unimi.dsi.big.webgraph.ImmutableGraph} that represents the transformed
+ * version.
+ */
+
+public class Transform {
+ private static final Logger LOGGER = LoggerFactory.getLogger(Transform.class);
+
+ @SuppressWarnings("unused")
+ private static final boolean DEBUG = false;
+ private static final boolean ASSERTS = false;
+
+ private Transform() {}
+
+
+ /** Provides a method to accept or reject an arc.
+ *
+ * <P>Note that arc filters are usually stateless. Thus, their declaration
+ * should comprise a static singleton (e.g., {@link Transform#NO_LOOPS}).
+ */
+ public interface ArcFilter {
+
+ /**
+ * Tells whether the arc <code>(i,j)</code> has to be accepted.
+ *
+ * @param i the source of the arc.
+ * @param j the destination of the arc.
+ * @return true if the arc has to be accepted.
+ */
+ public boolean accept(long i, long j);
+ }
+
+ /** Provides a method to accept or reject a labelled arc.
+ *
+ * <P>Note that arc filters are usually stateless. Thus, their declaration
+ * should comprise a static singleton (e.g., {@link Transform#NO_LOOPS}).
+ */
+ public interface LabelledArcFilter {
+
+ /**
+ * Tells whether the arc <code>(i,j)</code> with label <code>label</code> has to be accepted.
+ *
+ * @param i the source of the arc.
+ * @param j the destination of the arc.
+ * @param label the label of the arc.
+ * @return true if the arc has to be accepted.
+ */
+ public boolean accept(long i, long j, Label label);
+ }
+
+ /** An arc filter that rejects loops. */
+ final static private class NoLoops implements ArcFilter, LabelledArcFilter {
+ private NoLoops() {}
+ /** Returns true if the two arguments differ.
+ *
+ * @return <code>i != j</code>.
+ */
+ @Override
+ public boolean accept(final long i, final long j) {
+ return i != j;
+ }
+ @Override
+ public boolean accept(long i, long j, Label label) {
+ return i != j;
+ }
+ }
+
+ /** An arc filter that only accepts arcs whose endpoints belong to the same
+ * (if the parameter <code>keepOnlySame</code> is true) or to
+ * different (if <code>keepOnlySame</code> is false) classes.
+ * Classes are specified by one long per node, read from a given file in {@link DataInput} format. */
+ public final static class NodeClassFilter implements ArcFilter, LabelledArcFilter {
+ private final boolean keepOnlySame;
+ private final long[][] nodeClass;
+
+ /** Creates a new instance.
+ *
+ * @param classFile name of the class file.
+ * @param keepOnlySame if true, keep only arcs whose endpoints belong to the same class; otherwise, keep only arcs between different classes.
+ */
+ public NodeClassFilter(final String classFile, final boolean keepOnlySame) {
+ try {
+ nodeClass = BinIO.loadLongsBig(classFile);
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ this.keepOnlySame = keepOnlySame;
+ }
+
+ /** Creates a new instance.
+ *
+ * <p>This constructor has the same arguments as {@link #NodeClassFilter(String,boolean)},
+ * but it can be used with an {@link ObjectParser}.
+ *
+ * @param classFile name of the class file.
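+ * @param keepOnlySame a string parsed by {@link Boolean#parseBoolean(String)}; if true, keep only arcs whose endpoints belong to the same class, otherwise only arcs between different classes.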
+ */
+ public NodeClassFilter(String classFile, String keepOnlySame) {
+ this(classFile, Boolean.parseBoolean(keepOnlySame));
+ }
+
+ @Override
+ public boolean accept(final long i, final long j) {
+ return keepOnlySame == (LongBigArrays.get(nodeClass, i) == LongBigArrays.get(nodeClass, j));
+ }
+
+ @Override
+ public boolean accept(long i, long j, Label label) {
+ return keepOnlySame == (LongBigArrays.get(nodeClass, i) == LongBigArrays.get(nodeClass, j));
+ }
+ }
+
+ /** An arc filter that rejects arcs whose well-known attribute has a value smaller than a given threshold. */
+ final static public class LowerBound implements LabelledArcFilter {
+ private final int lowerBound;
+
+ public LowerBound(final int lowerBound) {
+ this.lowerBound = lowerBound;
+ }
+
+ public LowerBound(String lowerBound) {
+ this(Integer.parseInt(lowerBound));
+ }
+ /** Returns true if the integer value associated to the well-known attribute of the label is greater than or equal to the threshold.
+ *
+ * @return true if <code>label.{@link Label#getInt()}</code> is greater than or equal to the threshold.
+ */
+ @Override
+ public boolean accept(long i, long j, Label label) {
+ return label.getInt() >= lowerBound;
+ }
+ }
+
+
+ /** A singleton providing an arc filter that rejects loops. */
+ final static public NoLoops NO_LOOPS = new NoLoops();
+
+ /** A class that exposes an immutable graph viewed through a filter. */
+ private static final class FilteredImmutableGraph extends ImmutableGraph {
+ private final ArcFilter filter;
+ private final ImmutableGraph graph;
+ private long[][] succ;
+ private long currentNode = -1;
+
+ private FilteredImmutableGraph(ArcFilter filter, ImmutableGraph graph) {
+ this.filter = filter;
+ this.graph = graph;
+ }
+
+ @Override
+ public long numNodes() {
+ return graph.numNodes();
+ }
+
+ @Override
+ public FilteredImmutableGraph copy() {
+ return new FilteredImmutableGraph(filter, graph.copy());
+ }
+
+ @Override
+ public boolean randomAccess() {
+ return graph.randomAccess();
+ }
+
+ @Override
+ public LazyLongIterator successors(final long x) {
+ return new AbstractLazyLongIterator() {
+
+ @SuppressWarnings("hiding")
+ private final LazyLongIterator succ = graph.successors(x);
+
+ @Override
+ public long nextLong() {
+ long t;
+ while ((t = succ.nextLong()) != -1) if (filter.accept(x, t)) return t;
+ return -1;
+ }
+ };
+ }
+
+ @Override
+ public long[][] successorBigArray(long x) {
+ if (currentNode != x) {
+ succ = LazyLongIterators.unwrap(successors(x));
+ currentNode = x;
+ }
+ return succ;
+ }
+
+ @Override
+ public long outdegree(long x) {
+ if (currentNode != x) {
+ succ = successorBigArray(x);
+ currentNode = x;
+ }
+ return LongBigArrays.length(succ);
+ }
+
+ @Override
+ public NodeIterator nodeIterator() {
+ return new NodeIterator() {
+ final NodeIterator nodeIterator = graph.nodeIterator();
+ @SuppressWarnings("hiding")
+ long[][] succ = LongBigArrays.EMPTY_BIG_ARRAY;
+ long outdegree = -1;
+
+ @Override
+ public long outdegree() {
+ if (outdegree == -1) throw new IllegalStateException();
+ return outdegree;
+ }
+
+ @Override
+ public long nextLong() {
+ final long currNode = nodeIterator.nextLong();
+ final long oldOutdegree = nodeIterator.outdegree();
+ LazyLongIterator oldSucc = nodeIterator.successors();
+ succ = LongBigArrays.ensureCapacity(succ, oldOutdegree, 0);
+ outdegree = 0;
+ for(long i = 0; i < oldOutdegree; i++) {
+ final long s = oldSucc.nextLong();
+ if (filter.accept(currNode, s)) LongBigArrays.set(succ, outdegree++, s);
+ }
+ return currNode;
+ }
+
+ @Override
+ public long[][] successorBigArray() {
+ if (outdegree == -1) throw new IllegalStateException();
+ return succ;
+ }
+
+
+ @Override
+ public boolean hasNext() {
+ return nodeIterator.hasNext();
+ }
+ };
+ }
+ }
+
+ /** A class that exposes an arc-labelled immutable graph viewed through a filter. */
+ private static final class FilteredArcLabelledImmutableGraph extends ArcLabelledImmutableGraph {
+ private final LabelledArcFilter filter;
+ private final ArcLabelledImmutableGraph graph;
+ private long[][] succ;
+ private Label[][] label;
+ private long currentNode = -1;
+
+ private final class FilteredLabelledArcIterator extends AbstractLazyLongIterator implements LabelledArcIterator {
+ private final long x;
+
+ private final LabelledArcIterator successors;
+
+ private FilteredLabelledArcIterator(final long x, final LabelledArcIterator successors) {
+ this.x = x;
+ this.successors = successors;
+ }
+
+ @Override
+ public long nextLong() {
+ long t;
+ while ((t = successors.nextLong()) != -1) if (filter.accept(x, t, successors.label())) return t;
+ return -1;
+ }
+
+ @Override
+ public Label label() {
+ return successors.label();
+ }
+ }
+
+ private FilteredArcLabelledImmutableGraph(LabelledArcFilter filter, ArcLabelledImmutableGraph graph) {
+ this.filter = filter;
+ this.graph = graph;
+ }
+
+ @Override
+ public long numNodes() {
+ return graph.numNodes();
+ }
+
+ @Override
+ public ArcLabelledImmutableGraph copy() {
+ return new FilteredArcLabelledImmutableGraph(filter, graph.copy());
+ }
+
+ @Override
+ public boolean randomAccess() {
+ return graph.randomAccess();
+ }
+
+ @Override
+ public Label prototype() {
+ return graph.prototype();
+ }
+
+ @Override
+ public LabelledArcIterator successors(final long x) {
+ return new FilteredLabelledArcIterator(x, graph.successors(x));
+ }
+
+ @Override
+ public long[][] successorBigArray(final long x) {
+ if (currentNode != x) outdegree(x); // Just to fill the cache
+ return succ;
+ }
+
+ @Override
+ public Label[][] labelBigArray(final long x) {
+ if (currentNode != x) outdegree(x); // Just to fill the cache
+ return label;
+ }
+
+ @Override
+ public long outdegree(final long x) {
+ if (currentNode != x) {
+ succ = super.successorBigArray(x);
+ label = super.labelBigArray(x);
+ currentNode = x;
+ }
+ return LongBigArrays.length(succ);
+ }
+
+ @Override
+ public ArcLabelledNodeIterator nodeIterator() {
+ return new ArcLabelledNodeIterator() {
+ final ArcLabelledNodeIterator nodeIterator = graph.nodeIterator();
+ private long currNode = -1;
+ private long outdegree = -1;
+
+ @Override
+ public long outdegree() {
+ if (currNode == -1) throw new IllegalStateException();
+ if (outdegree == -1) {
+ long d = 0;
+ LabelledArcIterator successors = successors();
+ while(successors.nextLong() != -1) d++;
+ outdegree = d;
+ }
+ return outdegree;
+ }
+
+ @Override
+ public long nextLong() {
+ outdegree = -1;
+ return currNode = nodeIterator.nextLong();
+ }
+
+ @Override
+ public boolean hasNext() {
+ return nodeIterator.hasNext();
+ }
+
+ @Override
+ public LabelledArcIterator successors() {
+ return new FilteredLabelledArcIterator(currNode, nodeIterator.successors());
+ }
+ };
+ }
+
+ }
+
+ /** Returns a graph with some arcs possibly stripped, according to the given filter.
+ *
+ * @param graph a graph.
+ * @param filter the filter (telling whether each arc should be kept or not).
+ * @param ignored a progress logger, which will be ignored.
+ * @return the filtered graph.
+ */
+ public static ImmutableGraph filterArcs(final ImmutableGraph graph, final ArcFilter filter, final ProgressLogger ignored) {
+ return filterArcs(graph, filter);
+ }
+
+ /** Returns a labelled graph with some arcs possibly stripped, according to the given filter.
+ *
+ * @param graph a labelled graph.
+ * @param filter the filter (telling whether each arc should be kept or not).
+ * @param ignored a progress logger, which will be ignored.
+ * @return the filtered graph.
+ */
+ public static ArcLabelledImmutableGraph filterArcs(final ArcLabelledImmutableGraph graph, final LabelledArcFilter filter, final ProgressLogger ignored) {
+ return filterArcs(graph, filter);
+ }
+
+ /** Returns a graph with some arcs possibly stripped, according to the given filter.
+ *
+ * @param graph a graph.
+ * @param filter the filter (telling whether each arc should be kept or not).
+ * @return the filtered graph.
+ */
+ public static ImmutableGraph filterArcs(final ImmutableGraph graph, final ArcFilter filter) {
+ return new FilteredImmutableGraph(filter, graph);
+ }
+
+ /** Returns a labelled graph with some arcs possibly stripped, according to the given filter.
+ *
+ * @param graph a labelled graph.
+ * @param filter the filter (telling whether each arc should be kept or not).
+ * @return the filtered graph.
+ */
+ public static ArcLabelledImmutableGraph filterArcs(final ArcLabelledImmutableGraph graph, final LabelledArcFilter filter) {
+ return new FilteredArcLabelledImmutableGraph(filter, graph);
+ }
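The filtered views above are lazy: each arc is tested against the filter only when a node's successors are enumerated. As a rough, eager illustration of the same semantics on plain adjacency lists (not WebGraph code; `ArcFilterSketch` and its `ArcPredicate` interface are hypothetical stand-ins for `ArcFilter`), an arc x → y survives exactly when the filter accepts it:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of WebGraph: eager arc filtering on adjacency lists.
final class ArcFilterSketch {
    /** Hypothetical stand-in for ArcFilter: decides whether the arc x -> y is kept. */
    interface ArcPredicate {
        boolean accept(long x, long y);
    }

    /** Returns new successor lists containing only the arcs accepted by the filter. */
    static long[][] filterArcs(final long[][] succ, final ArcPredicate filter) {
        final long[][] result = new long[succ.length][];
        for (int x = 0; x < succ.length; x++) {
            final List<Long> kept = new ArrayList<>();
            for (final long y : succ[x]) if (filter.accept(x, y)) kept.add(y);
            result[x] = kept.stream().mapToLong(Long::longValue).toArray();
        }
        return result;
    }
}
```

For instance, the filter `(x, y) -> x != y` strips self-loops; WebGraph's version computes the same result lazily, without materialising new lists.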
+
+
+ /** Returns a symmetrized graph using an offline transposition.
+ *
+ * @param g the source graph.
+ * @param batchSize the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
+ * @return the symmetrized graph.
+ * @see #symmetrizeOffline(ImmutableGraph, int, File, ProgressLogger)
+ */
+ public static ImmutableGraph symmetrizeOffline(final ImmutableGraph g, final int batchSize) throws IOException {
+ return symmetrizeOffline(g, batchSize, null, null);
+ }
+
+ /** Returns a symmetrized graph using an offline transposition.
+ *
+ * @param g the source graph.
+ * @param batchSize the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
+ * @param tempDir a temporary directory for the batches, or <code>null</code> for {@link File#createTempFile(java.lang.String, java.lang.String)}'s choice.
+ * @return the symmetrized graph.
+ * @see #symmetrizeOffline(ImmutableGraph, int, File, ProgressLogger)
+ */
+ public static ImmutableGraph symmetrizeOffline(final ImmutableGraph g, final int batchSize, final File tempDir) throws IOException {
+ return symmetrizeOffline(g, batchSize, tempDir, null);
+ }
+
+ /** Returns a symmetrized graph using an offline transposition.
+ *
+ * <P>The symmetrized graph is the union of a graph and of its transpose. This method will
+ * compute the transpose on the fly using {@link #transposeOffline(ImmutableGraph, int, File, ProgressLogger)}.
+ *
+ * @param g the source graph.
+ * @param batchSize the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
+ * @param tempDir a temporary directory for the batches, or <code>null</code> for {@link File#createTempFile(java.lang.String, java.lang.String)}'s choice.
+ * @param pl a progress logger, or <code>null</code>.
+ * @return the symmetrized graph.
+ */
+ public static ImmutableGraph symmetrizeOffline(final ImmutableGraph g, final int batchSize, final File tempDir, final ProgressLogger pl) throws IOException {
+ return union(g, transposeOffline(g, batchSize, tempDir, pl));
+ }
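`symmetrizeOffline` computes the union of `g` and its transpose. A minimal in-memory sketch of that definition follows, assuming the whole graph fits in memory as arrays of successor lists (the class name `SymmetrizeSketch` is hypothetical); each arc x → y contributes both x → y and y → x, and sorted sets remove the duplicates that the union creates:

```java
import java.util.TreeSet;

// Hypothetical illustration, not part of WebGraph: symmetrization as union with the transpose.
final class SymmetrizeSketch {
    /** Returns sorted, duplicate-free successor lists of the union of the graph and its transpose. */
    static long[][] symmetrize(final long[][] succ) {
        final int n = succ.length;
        @SuppressWarnings("unchecked")
        final TreeSet<Long>[] sets = new TreeSet[n];
        for (int x = 0; x < n; x++) sets[x] = new TreeSet<>();
        for (int x = 0; x < n; x++) {
            for (final long y : succ[x]) {
                sets[x].add(y);              // arc of the original graph
                sets[(int) y].add((long) x); // reversed arc, from the transpose
            }
        }
        final long[][] result = new long[n][];
        for (int x = 0; x < n; x++) result[x] = sets[x].stream().mapToLong(Long::longValue).toArray();
        return result;
    }
}
```

The offline method avoids holding these sets in memory by obtaining the transpose through sorted on-disk batches and merging it with `g` via `union`.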
+
+ /** Provides a sequential immutable graph by merging batches on the fly. */
+ public final static class BatchGraph extends ImmutableSequentialGraph {
+ private final ObjectArrayList<File> batches;
+ private final long n;
+ private final long numArcs;
+
+ public BatchGraph(final long n, final long m, final ObjectArrayList<File> batches) {
+ this.batches = batches;
+ this.n = n;
+ this.numArcs = m;
+ }
+
+ @Override
+ public long numNodes() { return n; }
+ @Override
+ public long numArcs() {
+ if (numArcs == -1) throw new UnsupportedOperationException();
+ return numArcs;
+ }
+
+ @Override
+ public BatchGraph copy() {
+ return this;
+ }
+
+ @Override
+ public NodeIterator nodeIterator(long from) {
+ if (from != 0) throw new UnsupportedOperationException();
+ else return nodeIterator();
+ }
+
+ @Override
+ public NodeIterator nodeIterator() {
+ final long[] refArray = new long[batches.size()];
+ final InputBitStream[] batchIbs = new InputBitStream[refArray.length];
+ final int[] inputStreamLength = new int[refArray.length];
+ final long[] prevTarget = new long[refArray.length];
+ Arrays.fill(prevTarget, -1);
+ // The indirect queue used to merge the batches.
+ final LongHeapSemiIndirectPriorityQueue queue = new LongHeapSemiIndirectPriorityQueue(refArray);
+
+ try {
+ // We open all files and load the first element into the reference array.
+ for(int i = 0; i < refArray.length; i++) {
+ batchIbs[i] = new InputBitStream(batches.get(i));
+ try {
+ inputStreamLength[i] = batchIbs[i].readDelta();
+ refArray[i] = batchIbs[i].readLongDelta();
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+
+ queue.enqueue(i);
+ }
+
+ return new NodeIterator() {
+ /** The last returned node. */
+ private long last = -1;
+ /** The outdegree of the current node (valid if {@link #last} is not -1). */
+ private long outdegree;
+ /** The successors of the current node (valid if {@link #last} is not -1);
+ * only the first {@link #outdegree} entries are meaningful. */
+ private long[][] successor = LongBigArrays.EMPTY_BIG_ARRAY;
+
+ @Override
+ public long outdegree() {
+ if (last == -1) throw new IllegalStateException();
+ return outdegree;
+ }
+
+ @Override
+ public boolean hasNext() {
+ return last < n - 1;
+ }
+
+ @Override
+ public long nextLong() {
+ last++;
+ long d = 0;
+ int i;
+
+ try {
+ /* We extract elements from the queue as long as their target is equal
+ * to last. If during the process we exhaust a batch, we close it. */
+
+ while(! queue.isEmpty() && refArray[i = queue.first()] == last) {
+ successor = LongBigArrays.grow(successor, d + 1);
+ LongBigArrays.set(successor, d, prevTarget[i] += batchIbs[i].readLongDelta() + 1);
+ if (--inputStreamLength[i] == 0) {
+ queue.dequeue();
+ batchIbs[i].close();
+ batchIbs[i] = null;
+ }
+ else {
+ // We read a new source and update the queue.
+ final long sourceDelta = batchIbs[i].readLongDelta();
+ if (sourceDelta != 0) {
+ refArray[i] += sourceDelta;
+ prevTarget[i] = -1;
+ queue.changed();
+ }
+ }
+ d++;
+ }
+ // The queue orders batches by source only, so successors coming from different batches are interleaved: we sort them and then remove duplicates.
+ LongBigArrays.quickSort(successor, 0, d);
+ if (d != 0) {
+ long p = 0;
+ long pSuccessor = LongBigArrays.get(successor, p);
+
+ for(long j = 1; j < d; j++) {
+ final long s = LongBigArrays.get(successor, j);
+ if (pSuccessor != s) {
+ LongBigArrays.set(successor, ++p, s);
+ pSuccessor = s;
+ }
+ }
+ d = p + 1;
+ }
+ }
+ catch(IOException e) {
+ throw new RuntimeException(e);
+ }
+
+ outdegree = d;
+ return last;
+ }
+
+ @Override
+ public long[][] successorBigArray() {
+ if (last == -1) throw new IllegalStateException();
+ return successor;
+ }
+
+ @Override
+ protected void finalize() throws Throwable {
+ try {
+ for(InputBitStream ibs: batchIbs) if (ibs != null) ibs.close();
+ }
+ finally {
+ super.finalize();
+ }
+ }
+
+
+ };
+ }
+ catch(IOException e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+ @Override
+ protected void finalize() throws Throwable {
+ try {
+ for(File f : batches) f.delete();
+ }
+ finally {
+ super.finalize();
+ }
+ }
+
+ };
+
+ /** Sorts the given source and target arrays lexicographically (by source, then by target) and stores the resulting pairs, without duplicates, in a temporary file.
+ *
+ * @param n the index of the last element to be sorted (exclusive).
+ * @param source the source array.
+ * @param target the target array.
+ * @param tempDir a temporary directory in which to store the sorted arrays, or <code>null</code> for {@link File#createTempFile(java.lang.String, java.lang.String)}'s choice.
+ * @param batches a list of files to which the batch file will be added.
+ * @return the number of pairs in the batch (might be less than <code>n</code> because duplicates are eliminated).
+ */
+
+ public static int processBatch(final int n, final long[] source, final long[] target, final File tempDir, final List<File> batches) throws IOException {
+
+ LongArrays.parallelQuickSort(source, target, 0, n);
+
+ final File batchFile = File.createTempFile("batch", ".bitstream", tempDir);
+ batchFile.deleteOnExit();
+ batches.add(batchFile);
+ final OutputBitStream batch = new OutputBitStream(batchFile);
+ int u = 0;
+ if (n != 0) {
+ // Compute unique pairs
+ u = 1;
+ for(int i = n - 1; i-- != 0;) if (source[i] != source[i + 1] || target[i] != target[i + 1]) u++;
+ batch.writeDelta(u);
+ long prevSource = source[0];
+ batch.writeLongDelta(prevSource);
+ batch.writeLongDelta(target[0]);
+
+ for(int i = 1; i < n; i++) {
+ if (source[i] != prevSource) {
+ batch.writeLongDelta(source[i] - prevSource);
+ batch.writeLongDelta(target[i]);
+ prevSource = source[i];
+ }
+ else if (target[i] != target[i - 1]) {
+ // We don't write duplicate pairs
+ batch.writeDelta(0);
+ if (ASSERTS) assert target[i] > target[i - 1] : target[i] + "<=" + target[i - 1];
+ batch.writeLongDelta(target[i] - target[i - 1] - 1);
+ }
+ }
+ }
+ else batch.writeDelta(0);
+
+ batch.close();
+ return u;
+ }
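`processBatch` writes the sorted pairs with a gap scheme: the number of distinct pairs, the first source and target, then, for each further distinct pair, either a positive source gap followed by an absolute target, or a 0 followed by the target gap minus one. The sketch below reproduces that scheme with plain longs instead of Elias δ codes, together with the decoding loop used by `BatchGraph`'s node iterator. `BatchCodecSketch` is a hypothetical name, and the input is assumed to be already sorted lexicographically:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of WebGraph: processBatch's gap scheme with plain longs.
final class BatchCodecSketch {
    /** Encodes the first n pairs (sorted by source, then target; duplicates allowed). */
    static long[] encode(final long[] source, final long[] target, final int n) {
        final List<Long> out = new ArrayList<>();
        int u = 0; // number of distinct pairs
        for (int i = 0; i < n; i++) if (i == 0 || source[i] != source[i - 1] || target[i] != target[i - 1]) u++;
        out.add((long) u);
        if (n == 0) return toArray(out);
        long prevSource = source[0];
        out.add(prevSource);
        out.add(target[0]);
        for (int i = 1; i < n; i++) {
            if (source[i] != prevSource) {
                out.add(source[i] - prevSource);        // positive source gap
                out.add(target[i]);                     // first target of a new source: absolute
                prevSource = source[i];
            } else if (target[i] != target[i - 1]) {
                out.add(0L);                            // same source as before
                out.add(target[i] - target[i - 1] - 1); // target gap minus one
            }                                           // duplicate pairs are skipped
        }
        return toArray(out);
    }

    /** Decodes the stream into {source, target} rows, mirroring BatchGraph's reading loop. */
    static long[][] decode(final long[] code) {
        int p = 0;
        final int u = (int) code[p++];
        final long[][] pairs = new long[u][];
        if (u == 0) return pairs;
        long source = code[p++];
        long prevTarget = -1; // -1 makes the first (absolute) target decode correctly
        for (int k = 0; k < u; k++) {
            prevTarget += code[p++] + 1;
            pairs[k] = new long[] { source, prevTarget };
            if (k + 1 < u) {
                final long sourceDelta = code[p++];
                if (sourceDelta != 0) { // a new source starts: its target is absolute
                    source += sourceDelta;
                    prevTarget = -1;
                }
            }
        }
        return pairs;
    }

    private static long[] toArray(final List<Long> list) {
        return list.stream().mapToLong(Long::longValue).toArray();
    }
}
```

Initialising `prevTarget` to -1 is what lets the same `prevTarget += gap + 1` update handle both absolute targets and target gaps.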
+
+ /** Sorts the given source and target arrays lexicographically (by source, then by target) and stores them in two temporary files.
+ * An additional positionable input bit stream is provided that contains labels, starting at given positions.
+ * Labels are written, in the same sorted order, to the second file.
+ *
+ * @param n the index of the last element to be sorted (exclusive).
+ * @param source the source array.
+ * @param target the target array.
+ * @param start the array containing the bit position (within the given input stream) where the label of the arc starts.
+ * @param labelBitStream the positionable bit stream containing the labels.
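+ * @param tempDir a temporary directory in which to store the sorted arrays and labels.
+ * @param batches a list of files to which the batch file will be added.
+ * @param labelBatches a list of files to which the label batch file will be added.
+ * @param prototype a label prototype, used to read labels from <code>labelBitStream</code> and write them to the label file.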
+ */
+
+ private static void processTransposeBatch(final int n, final long[] source, final long[] target, final long[] start,
+ final InputBitStream labelBitStream, final File tempDir, final List<File> batches, final List<File> labelBatches,
+ final Label prototype) throws IOException {
+ it.unimi.dsi.fastutil.Arrays.quickSort(0, n, new IntComparator() {
+ @Override
+ public int compare(int x, int y) {
+ long t = source[x] - source[y];
+ if (t != 0) return t < 0 ? -1 : 1;
+ t = target[x] - target[y];
+ return t == 0 ? 0 : t < 0 ? -1 : 1;
+ }
+ }, new Swapper() {
+ @Override
+ public void swap(int x, int y) {
+ long t = source[x];
+ source[x] = source[y];
+ source[y] = t;
+ t = target[x];
+ target[x] = target[y];
+ target[y] = t;
+ long u = start[x];
+ start[x] = start[y];
+ start[y] = u;
+ }
+ });
+
+ final File batchFile = File.createTempFile("batch", ".bitstream", tempDir);
+ batchFile.deleteOnExit();
+ batches.add(batchFile);
+ final OutputBitStream batch = new OutputBitStream(batchFile);
+
+ if (n != 0) {
+ // Write the number of pairs; an arc-labelled graph has no duplicate arcs, so all n pairs are written below.
+ batch.writeDelta(n);
+ long prevSource = source[0];
+ batch.writeLongDelta(prevSource);
+ batch.writeLongDelta(target[0]);
+
+ for(int i = 1; i < n; i++) {
+ if (source[i] != prevSource) {
+ batch.writeLongDelta(source[i] - prevSource);
+ batch.writeLongDelta(target[i]);
+ prevSource = source[i];
+ }
+ else if (target[i] != target[i - 1]) {
+ // We don't write duplicate pairs
+ batch.writeDelta(0);
+ batch.writeLongDelta(target[i] - target[i - 1] - 1);
+ }
+ }
+ }
+ else batch.writeDelta(0);
+
+ batch.close();
+
+ final File labelFile = File.createTempFile("label-", ".bits", tempDir);
+ labelFile.deleteOnExit();
+ labelBatches.add(labelFile);
+ final OutputBitStream labelObs = new OutputBitStream(labelFile);
+ for (int i = 0; i < n; i++) {
+ labelBitStream.position(start[i]);
+ prototype.fromBitStream(labelBitStream, source[i]);
+ prototype.toBitStream(labelObs, target[i]);
+ }
+ labelObs.close();
+ }
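`processTransposeBatch` sorts three parallel arrays in place with fastutil's comparator/swapper quicksort, so that each label start position stays attached to its arc. A simpler, but more memory-hungry, way to get the same effect with only `java.util` is to sort an index permutation and then gather. This is a swapped-in technique, and `ParallelSortSketch` is a hypothetical name:

```java
import java.util.Arrays;

// Hypothetical illustration, not part of WebGraph: sorting parallel arrays through a permutation.
final class ParallelSortSketch {
    /** Sorts the first n arcs by (source, target), moving start[] along so each label offset follows its arc. */
    static void sort(final long[] source, final long[] target, final long[] start, final int n) {
        final Integer[] perm = new Integer[n];
        for (int i = 0; i < n; i++) perm[i] = i;
        // Lexicographic order on (source, target), as in the comparator of processTransposeBatch.
        Arrays.sort(perm, (x, y) -> source[x] != source[y] ? Long.compare(source[x], source[y]) : Long.compare(target[x], target[y]));
        final long[] s = source.clone(), t = target.clone(), p = start.clone();
        for (int i = 0; i < n; i++) { // gather: position i receives the arc perm[i]
            source[i] = s[perm[i]];
            target[i] = t[perm[i]];
            start[i] = p[perm[i]];
        }
    }
}
```

The in-place swapper used by the original avoids the boxed permutation and the three array copies, which matters when `batchSize` is large.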
+
+ /** Returns an immutable graph obtained by reversing all arcs in <code>g</code>, using an offline method.
+ *
+ * @param g an immutable graph.
+ * @param batchSize the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
+ * @return an immutable, sequentially accessible graph obtained by transposing <code>g</code>.
+ * @see #transposeOffline(ImmutableGraph, int, File, ProgressLogger)
+ */
+
+ public static ImmutableSequentialGraph transposeOffline(final ImmutableGraph g, final int batchSize) throws IOException {
+ return transposeOffline(g, batchSize, null);
+ }
+
+ /** Returns an immutable graph obtained by reversing all arcs in <code>g</code>, using an offline method.
+ *
+ * @param g an immutable graph.
+ * @param batchSize the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
+ * @param tempDir a temporary directory for the batches, or <code>null</code> for {@link File#createTempFile(java.lang.String, java.lang.String)}'s choice.
+ * @return an immutable, sequentially accessible graph obtained by transposing <code>g</code>.
+ * @see #transposeOffline(ImmutableGraph, int, File, ProgressLogger)
+ */
+
+ public static ImmutableSequentialGraph transposeOffline(final ImmutableGraph g, final int batchSize, final File tempDir) throws IOException {
+ return transposeOffline(g, batchSize, tempDir, null);
+ }
+
+ /** Returns an immutable graph obtained by reversing all arcs in <code>g</code>, using an offline method.
+ *
+ * <p>This method creates a number of sorted batches on disk containing arcs
+ * represented by a pair of gap-compressed long integers ordered by target
+ * and returns an {@link ImmutableGraph}
+ * that can be accessed only using a {@link ImmutableGraph#nodeIterator() node iterator}. The node iterator
+ * merges the batches on the fly, providing a transposed graph. The files are marked with
+ * {@link File#deleteOnExit()}, so they should disappear when the JVM exits. An additional safety-net
+ * finaliser tries to delete the batches, too.
+ *
+ * <p>Note that each {@link NodeIterator} returned by the transpose requires opening all batches at the same time.
+ * The batches are closed when they are exhausted, so a complete scan of the graph closes them all. In any case,
+ * another safety-net finaliser closes all files when the iterator is collected.
+ *
+ * <P>This method can process {@linkplain ImmutableGraph#loadOffline(CharSequence) offline graphs}.
+ *
+ * @param g an immutable graph.
+ * @param batchSize the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
+ * @param tempDir a temporary directory for the batches, or <code>null</code> for {@link File#createTempFile(java.lang.String, java.lang.String)}'s choice.
+ * @param pl a progress logger, or <code>null</code>.
+ * @return an immutable, sequentially accessible graph obtained by transposing <code>g</code>.
+ */
+
+ public static ImmutableSequentialGraph transposeOffline(final ImmutableGraph g, final int batchSize, final File tempDir, final ProgressLogger pl) throws IOException {
+
+ long i, currNode;
+ int j;
+ final long[] source = new long[batchSize] , target = new long[batchSize];
+ final ObjectArrayList<File> batches = new ObjectArrayList<>();
+
+ final long n = g.numNodes();
+
+ if (pl != null) {
+ pl.itemsName = "nodes";
+ pl.expectedUpdates = n;
+ pl.start("Creating sorted batches...");
+ }
+
+ final NodeIterator nodeIterator = g.nodeIterator();
+
+ // Phase one: we scan the graph, accumulating pairs <source,target> and dumping them on disk.
+ LazyLongIterator succ;
+ long m = 0; // Number of arcs, computed on the fly.
+ j = 0;
+ for(i = n; i-- != 0;) {
+ currNode = nodeIterator.nextLong();
+ final long d = nodeIterator.outdegree();
+ succ = nodeIterator.successors();
+ m += d;
+
+ for(long k = 0; k < d; k++) {
+ target[j] = currNode;
+ source[j++] = succ.nextLong();
+
+ if (j == batchSize) {
+ processBatch(batchSize, source, target, tempDir, batches);
+ j = 0;
+ }
+ }
+
+
+ if (pl != null) pl.lightUpdate();
+ }
+
+ if (j != 0) processBatch(j, source, target, tempDir, batches);
+
+ if (pl != null) {
+ pl.done();
+ logBatches(batches, m, pl);
+ }
+
+ return new BatchGraph(n, m, batches);
+ }
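The two phases of `transposeOffline` can be sketched in memory: reversed arcs are cut into sorted batches of at most `batchSize` pairs, and a priority queue then merges the batches. This sketch assumes everything fits in memory (the real method writes the batches to disk and merges them lazily in `BatchGraph`), and it orders the queue by the full pair, so each successor list comes out already sorted; `BatchGraph` instead orders by source only and sorts afterwards. `OfflineTransposeSketch` is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration, not part of WebGraph: batch-sort-then-merge transposition.
final class OfflineTransposeSketch {
    static long[][] transpose(final long[][] succ, final int batchSize) {
        final int n = succ.length;
        // Phase one: collect reversed arcs (y, x) and cut them into sorted batches.
        final List<long[][]> batches = new ArrayList<>();
        List<long[]> current = new ArrayList<>();
        for (int x = 0; x < n; x++) {
            for (final long y : succ[x]) {
                current.add(new long[] { y, x });
                if (current.size() == batchSize) {
                    batches.add(sortBatch(current));
                    current = new ArrayList<>();
                }
            }
        }
        if (!current.isEmpty()) batches.add(sortBatch(current));

        // Phase two: k-way merge; queue entries are {batch index, position}, ordered by the pair there.
        final Comparator<long[]> byPair = (a, b) -> {
            final long[] p = batches.get((int) a[0])[(int) a[1]], q = batches.get((int) b[0])[(int) b[1]];
            return p[0] != q[0] ? Long.compare(p[0], q[0]) : Long.compare(p[1], q[1]);
        };
        final PriorityQueue<long[]> queue = new PriorityQueue<>(byPair);
        for (int b = 0; b < batches.size(); b++) queue.add(new long[] { b, 0 });
        final List<List<Long>> result = new ArrayList<>();
        for (int x = 0; x < n; x++) result.add(new ArrayList<>());
        while (!queue.isEmpty()) {
            final long[] head = queue.poll();
            final long[][] batch = batches.get((int) head[0]);
            final long[] pair = batch[(int) head[1]];
            result.get((int) pair[0]).add(pair[1]);
            // Advance within this batch; an exhausted batch simply leaves the queue.
            if (head[1] + 1 < batch.length) queue.add(new long[] { head[0], head[1] + 1 });
        }
        final long[][] out = new long[n][];
        for (int x = 0; x < n; x++) out[x] = result.get(x).stream().mapToLong(Long::longValue).toArray();
        return out;
    }

    private static long[][] sortBatch(final List<long[]> pairs) {
        final long[][] batch = pairs.toArray(new long[0][]);
        Arrays.sort(batch, (p, q) -> p[0] != q[0] ? Long.compare(p[0], q[0]) : Long.compare(p[1], q[1]));
        return batch;
    }
}
```

Only one batch of `batchSize` pairs is buffered during phase one, which is why the original needs just two arrays of that size.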
+
+ protected static void logBatches(final ObjectArrayList<File> batches, final long pairs, final ProgressLogger pl) {
+ long length = 0;
+ for(File f : batches) length += f.length();
+ pl.logger().info("Created " + batches.size() + " batches using " + Util.format((double)Byte.SIZE * length / pairs) + " bits/arc.");
+ }
+
+ /** Returns an immutable graph obtained by remapping offline the graph nodes through a partial function specified via a big array.
+ *
+ * @param g an immutable graph.
+ * @param map the transformation map.
+ * @param batchSize the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
+ * @return an immutable, sequentially accessible graph obtained by transforming <code>g</code>.
+ * @see #mapOffline(ImmutableGraph, long[][], int, File, ProgressLogger)
+ */
+ public static ImmutableSequentialGraph mapOffline(final ImmutableGraph g, final long map[][], final int batchSize) throws IOException {
+ return mapOffline(g, map, batchSize, null);
+ }
+
+ /** Returns an immutable graph obtained by remapping offline the graph nodes through a partial function specified via a big array.
+ *
+ * @param g an immutable graph.
+ * @param map the transformation map.
+ * @param batchSize the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
+ * @param tempDir a temporary directory for the batches, or <code>null</code> for {@link File#createTempFile(java.lang.String, java.lang.String)}'s choice.
+ * @return an immutable, sequentially accessible graph obtained by transforming <code>g</code>.
+ * @see #mapOffline(ImmutableGraph, long[][], int, File, ProgressLogger)
+ */
+ public static ImmutableSequentialGraph mapOffline(final ImmutableGraph g, final long map[][], final int batchSize, final File tempDir) throws IOException {
+ return mapOffline(g, map, batchSize, tempDir, null);
+ }
+
+ /** Remaps the graph nodes through a partial function specified via
+ * a big array, using an offline method.
+ *
+ * <p>More specifically, <code>LongBigArrays.length(map)=g.numNodes()</code>,
+ * and <code>LongBigArrays.get(map, i)</code> is the new name of node <code>i</code>, or -1 if the node
+ * should not be mapped. If some
+ * index appearing in <code>map</code> is larger than or equal to the
+ * number of nodes of <code>g</code>, the resulting graph is enlarged correspondingly.
+ *
+ * <P>Arcs are mapped in the obvious way; in other words, there is
+ * an arc from <code>LongBigArrays.get(map, i)</code> to <code>LongBigArrays.get(map, j)</code> (both nonnegative)
+ * in the transformed
+ * graph iff there was an arc from <code>i</code> to <code>j</code>
+ * in the original graph.
+ *
+ * <P>Note that if <code>map</code> is bijective, the returned graph
+ * is simply a permutation of the original graph.
+ * Otherwise, the returned graph is obtained by deleting nodes mapped
+ * to -1, quotienting nodes w.r.t. the equivalence relation induced by the fibres of <code>map</code>
+ * and renumbering the result, always according to <code>map</code>.
+ *
+ * <P>See {@link #transposeOffline(ImmutableGraph, int, File, ProgressLogger)} for
+ * implementation and performance-related details.
+ *
+ * @param g an immutable graph.
+ * @param map the transformation map.
+ * @param batchSize the number of integers in a batch; two arrays of integers of this size will be allocated by this method.
+ * @param tempDir a temporary directory for the batches, or <code>null</code> for {@link File#createTempFile(java.lang.String, java.lang.String)}'s choice.
+ * @param pl a progress logger, or <code>null</code>.
+ * @return an immutable, sequentially accessible graph obtained by transforming <code>g</code>.
+ */
+ public static ImmutableSequentialGraph mapOffline(final ImmutableGraph g, final long map[][], final int batchSize, final File tempDir, final ProgressLogger pl) throws IOException {
+
+ int j;
+ long d, mappedCurrNode;
+ final long[] source = new long[batchSize] , target = new long[batchSize];
+ final ObjectArrayList<File> batches = new ObjectArrayList<>();
+
+ long max = -1;
+ for(int i = map.length; i-- != 0;) {
+ final long[] t = map[i];
+ for(int k = t.length; k-- != 0;) max = Math.max(max, t[k]);
+ }
+
+ final long mapLength = LongBigArrays.length(map);
+
+ if (pl != null) {
+ pl.itemsName = "nodes";
+ pl.expectedUpdates = mapLength;
+ pl.start("Creating sorted batches...");
+ }
+
+ final NodeIterator nodeIterator = g.nodeIterator();
+
+ // Phase one: we scan the graph, accumulating pairs <map[source],map[target]> (if we have to) and dumping them on disk.
+ LazyLongIterator succ;
+ j = 0;
+ long pairs = 0; // Number of pairs
+ for(long i = g.numNodes(); i-- != 0;) {
+ mappedCurrNode = LongBigArrays.get(map, nodeIterator.nextLong());
+ if (mappedCurrNode != -1) {
+ d = nodeIterator.outdegree();
+ succ = nodeIterator.successors();
+
+ for(long k = 0; k < d; k++) {
+ final long s = succ.nextLong();
+ if (LongBigArrays.get(map, s) != -1) {
+ source[j] = mappedCurrNode;
+ target[j++] = LongBigArrays.get(map, s);
+
+ if (j == batchSize) {
+ pairs += processBatch(batchSize, source, target, tempDir, batches);
+ j = 0;
+ }
+ }
+ }
+ }
+
+ if (pl != null) pl.lightUpdate();
+ }
+
+ // At this point the number of nodes is always known (a traversal has been completed).
+ if (g.numNodes() != mapLength) throw new IllegalArgumentException("Mismatch between number of nodes (" + g.numNodes() + ") and map length (" + mapLength + ")");
+
+ if (j != 0) pairs += processBatch(j, source, target, tempDir, batches);
+
+ if (pl != null) {
+ pl.done();
+ logBatches(batches, pairs, pl);
+ }
+
+ return new BatchGraph(max + 1, -1, batches);
+ }
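The semantics of `mapOffline` (drop nodes mapped to -1, merge arcs whose endpoints fall in the same fibres, size the result by the largest image) can be checked against a small in-memory reference. `MapSketch` is hypothetical and uses ordinary arrays instead of big arrays:

```java
import java.util.TreeSet;

// Hypothetical illustration, not part of WebGraph: reference semantics of node remapping.
final class MapSketch {
    /** Arc map[x] -> map[y] exists iff some arc x -> y exists with both endpoints mapped (not -1). */
    static long[][] map(final long[][] succ, final long[] map) {
        if (succ.length != map.length) throw new IllegalArgumentException("Mismatch between number of nodes and map length");
        long max = -1;
        for (final long v : map) max = Math.max(max, v);
        @SuppressWarnings("unchecked")
        final TreeSet<Long>[] sets = new TreeSet[(int) (max + 1)]; // the result has max + 1 nodes
        for (int v = 0; v <= max; v++) sets[v] = new TreeSet<>();
        for (int x = 0; x < succ.length; x++) {
            if (map[x] == -1) continue; // deleted source
            // Sets collapse the duplicate arcs created by quotienting.
            for (final long y : succ[x]) if (map[(int) y] != -1) sets[(int) map[x]].add(map[(int) y]);
        }
        final long[][] result = new long[sets.length][];
        for (int v = 0; v < sets.length; v++) result[v] = sets[v].stream().mapToLong(Long::longValue).toArray();
        return result;
    }
}
```

With a bijective map this is just a permutation of the graph; the offline method obtains the same arcs through `processBatch`, whose duplicate elimination (plus the final deduplication in `BatchGraph`) plays the role of the sets here.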
+
+
+
+ /** Returns an arc-labelled immutable graph obtained by reversing all arcs in <code>g</code>, using an offline method.
+ *
+ * @param g an arc-labelled immutable graph.
+ * @param batchSize the number of integers in a batch; three arrays of long integers of this size will be allocated by this method,
+ * plus an additional {@link FastByteArrayOutputStream} needed to store all the labels for a batch.
+ * @return an immutable, sequentially accessible graph obtained by transposing <code>g</code>.
+ * @see #transposeOffline(ArcLabelledImmutableGraph, int, File, ProgressLogger)
+ */
+ public static ArcLabelledImmutableGraph transposeOffline(final ArcLabelledImmutableGraph g, final int batchSize) throws IOException {
+ return transposeOffline(g, batchSize, null);
+ }
+
+ /** Returns an arc-labelled immutable graph obtained by reversing all arcs in <code>g</code>, using an offline method.
+ *
+ * @param g an arc-labelled immutable graph.
+ * @param batchSize the number of integers in a batch; three arrays of long integers of this size will be allocated by this method,
+ * plus an additional {@link FastByteArrayOutputStream} needed to store all the labels for a batch.
+ * @param tempDir a temporary directory for the batches, or <code>null</code> for {@link File#createTempFile(java.lang.String, java.lang.String)}'s choice.
+ * @return an immutable, sequentially accessible graph obtained by transposing <code>g</code>.
+ * @see #transposeOffline(ArcLabelledImmutableGraph, int, File, ProgressLogger)
+ */
+ public static ArcLabelledImmutableGraph transposeOffline(final ArcLabelledImmutableGraph g, final int batchSize, final File tempDir) throws IOException {
+ return transposeOffline(g, batchSize, tempDir, null);
+ }
+
+
+ /** Returns an arc-labelled immutable graph obtained by reversing all arcs in <code>g</code>, using an offline method.
+ *
+ * <p>This method creates a number of sorted batches on disk containing arcs
+ * represented by a pair of gap-compressed long integers ordered by target
+ * and returns an {@link ArcLabelledImmutableGraph}
+ * that can be accessed only using a {@link ImmutableGraph#nodeIterator() node iterator}. The node iterator
+ * merges the batches on the fly, providing a transposed graph. The files are marked with
+ * {@link File#deleteOnExit()}, so they should disappear when the JVM exits. An additional safety-net
+ * finaliser tries to delete the batches, too. Labels are temporarily stored in
+ * an in-memory bit stream, which is permuted when it is written to disk.
+ *
+ * <p>Note that each {@link NodeIterator} returned by the transpose requires opening all batches at the same time.
+ * The batches are closed when they are exhausted, so a complete scan of the graph closes them all. In any case,
+ * another safety-net finaliser closes all files when the iterator is collected.
+ *
+ * <P>This method can process {@linkplain ArcLabelledImmutableGraph#loadOffline(CharSequence) offline graphs}. Note that
+ * no method to transpose arc-labelled graphs on-line is currently provided.
+ *
+ * @param g an arc-labelled immutable graph.
+ * @param batchSize the number of integers in a batch; three arrays of long integers of this size will be allocated by this method,
+ * plus an additional {@link FastByteArrayOutputStream} needed to store all the labels for a batch.
+ * @param tempDir a temporary directory for the batches, or <code>null</code> for {@link File#createTempFile(java.lang.String, java.lang.String)}'s choice.
+ * @param pl a progress logger, or <code>null</code>.
+ * @return an immutable, sequentially accessible graph obtained by transposing <code>g</code>.
+ */
+
+ public static ArcLabelledImmutableGraph transposeOffline(final ArcLabelledImmutableGraph g, final int batchSize, final File tempDir, final ProgressLogger pl) throws IOException {
+
+ int j;
+ long d, currNode;
+ final long[] source = new long[batchSize] , target = new long[batchSize];
+ final long[] start = new long[batchSize];
+ FastByteArrayOutputStream fbos = new FastByteArrayOutputStream();
+ OutputBitStream obs = new OutputBitStream(fbos);
+ final ObjectArrayList<File> batches = new ObjectArrayList<>(), labelBatches = new ObjectArrayList<>();
+ final Label prototype = g.prototype().copy();
+
+ final long n = g.numNodes();
+
+ if (pl != null) {
+ pl.itemsName = "nodes";
+ pl.expectedUpdates = n;
+ pl.start("Creating sorted batches...");
+ }
+
+ final ArcLabelledNodeIterator nodeIterator = g.nodeIterator();
+
+ // Phase one: we scan the graph, accumulating pairs <source,target> and dumping them on disk.
+ LabelledArcIterator succ;
+ Label[][] label = null;
+ long m = 0; // Number of arcs, computed on the fly.
+ j = 0;
+ for(long i = n; i-- != 0;) {
+ currNode = nodeIterator.nextLong();
+ d = nodeIterator.outdegree();
+ succ = nodeIterator.successors();
+ label = nodeIterator.labelBigArray();
+ m += d;
+
+ for(long k = 0; k < d; k++) {
+ source[j] = succ.nextLong();
+ target[j] = currNode;
+ start[j] = obs.writtenBits();
+ ObjectBigArrays.get(label, k).toBitStream(obs, currNode);
+ j++;
+
+ if (j == batchSize) {
+ obs.flush();
+ processTransposeBatch(batchSize, source, target, start, new InputBitStream(fbos.array), tempDir, batches, labelBatches, prototype);
+ fbos = new FastByteArrayOutputStream();
+ obs = new OutputBitStream(fbos); // TODO: reuse the existing streams instead of allocating new ones
+ j = 0;
+ }
+ }
+
+
+ if (pl != null) pl.lightUpdate();
+ }
+
+ if (j != 0) {
+ obs.flush();
+ processTransposeBatch(j, source, target, start, new InputBitStream(fbos.array), tempDir, batches, labelBatches, prototype);
+ }
+
+ if (pl != null) {
+ pl.done();
+ logBatches(batches, m, pl);
+ }
+
+ final long numArcs = m;
+
+ // Now we return an immutable graph whose nodeIterator() merges the batches on the fly.
+ return new ArcLabelledImmutableSequentialGraph() {
+ @Override
+ public long numNodes() { return n; }
+ @Override
+ public long numArcs() { return numArcs; }
+
+ @Override
+ public ArcLabelledNodeIterator nodeIterator() {
+ final long[] refArray = new long[batches.size()];
+ final InputBitStream[] batchIbs = new InputBitStream[refArray.length];
+ final InputBitStream[] labelInputBitStream = new InputBitStream[refArray.length];
+ final int[] inputStreamLength = new int[refArray.length];
+ final long[] prevTarget = new long[refArray.length];
+ Arrays.fill(prevTarget, -1);
+ // The indirect queue used to merge the batches.
+ final LongHeapSemiIndirectPriorityQueue queue = new LongHeapSemiIndirectPriorityQueue(refArray);
+
+ try {
+ // We open all files and load the first element into the reference array.
+ for(int i = 0; i < refArray.length; i++) {
+ batchIbs[i] = new InputBitStream(batches.get(i));
+ labelInputBitStream[i] = new InputBitStream(labelBatches.get(i));
+ try {
+ inputStreamLength[i] = batchIbs[i].readDelta();
+ refArray[i] = batchIbs[i].readLongDelta();
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+
+ queue.enqueue(i);
+ }
+
+ return new ArcLabelledNodeIterator() {
+ /** The last returned node. */
+ private long last = -1;
+ /** The outdegree of the current node (valid if {@link #last} is not -1). */
+ private long outdegree;
+ /** The successors of the current node (valid if {@link #last} is not -1);
+ * only the first {@link #outdegree} entries are meaningful. */
+ private long[][] successor = LongBigArrays.EMPTY_BIG_ARRAY;
+ /** The labels of the arcs going out of the current node (valid if {@link #last} is not -1);
+ * only the first {@link #outdegree} entries are meaningful. */
+ @SuppressWarnings("hiding")
+ private Label[][] label = new Label[0][0];
+
+ @Override
+ public long outdegree() {
+ if (last == -1) throw new IllegalStateException();
+ return outdegree;
+ }
+
+ @Override
+ public boolean hasNext() {
+ return last < n - 1;
+ }
+
+ @Override
+ public long nextLong() {
+ last++;
+ int d = 0;
+ int i;
+
+ try {
+ /* We extract elements from the queue as long as their source is equal
+ * to last. If during the process we exhaust a batch, we close it. */
+
+ while(! queue.isEmpty() && refArray[i = queue.first()] == last) {
+ successor = LongBigArrays.grow(successor, d + 1);
+ LongBigArrays.set(successor, d, prevTarget[i] += batchIbs[i].readLongDelta() + 1);
+ label = ObjectBigArrays.grow(label, d + 1);
+ final Label l = prototype.copy();
+ ObjectBigArrays.set(label, d, l);
+ l.fromBitStream(labelInputBitStream[i], last);
+
+ if (--inputStreamLength[i] == 0) {
+ queue.dequeue();
+ batchIbs[i].close();
+ labelInputBitStream[i].close();
+ batchIbs[i] = null;
+ labelInputBitStream[i] = null;
+ }
+ else {
+ // We read a new source and update the queue.
+ final long sourceDelta = batchIbs[i].readLongDelta();
+ if (sourceDelta != 0) {
+ refArray[i] += sourceDelta;
+ prevTarget[i] = -1;
+ queue.changed();
+ }
+ }
+ d++;
+ }
+ // Neither quicksort nor heaps are stable, so we reestablish order here.
+ it.unimi.dsi.fastutil.BigArrays.quickSort(0, d, new LongComparator() {
+ @Override
+ public int compare(long x, long y) {
+ final long t = LongBigArrays.get(successor, x) - LongBigArrays.get(successor, y);
+ return t == 0 ? 0 : t < 0 ? -1 : 1;
+ }
+ }, new BigSwapper() {
+ @Override
+ public void swap(long x, long y) {
+ final long t = LongBigArrays.get(successor, x);
+ LongBigArrays.set(successor, x, LongBigArrays.get(successor, y));
+ LongBigArrays.set(successor, y, t);
+ final Label l = ObjectBigArrays.get(label, x);
+ ObjectBigArrays.set(label, x, ObjectBigArrays.get(label, y));
+ ObjectBigArrays.set(label, y, l);
+ }
+
+ });
+ }
+ catch(IOException e) {
+ throw new RuntimeException(e);
+ }
+
+ outdegree = d;
+ return last;
+ }
+
+ @Override
+ protected void finalize() throws Throwable {
+ try {
+ for(InputBitStream ibs: batchIbs) if (ibs != null) ibs.close();
+ for(InputBitStream ibs: labelInputBitStream) if (ibs != null) ibs.close();
+ }
+ finally {
+ super.finalize();
+ }
+ }
+
+ @Override
+ public LabelledArcIterator successors() {
+ if (last == -1) throw new IllegalStateException();
+ return new LabelledArcIterator() {
+ @SuppressWarnings("hiding")
+ int last = -1;
+
+ @Override
+ public Label label() {
+ return ObjectBigArrays.get(label, last);
+ }
+
+ @Override
+ public long nextLong() {
+ if (last + 1 == outdegree) return -1;
+ return LongBigArrays.get(successor, ++last);
+ }
+
+ @Override
+ public long skip(long k) {
+ long toSkip = Math.min(k, outdegree - last - 1);
+ last += toSkip;
+ return toSkip;
+ }
+ };
+ }
+ };
+ }
+ catch(IOException e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+ @Override
+ protected void finalize() throws Throwable {
+ try {
+ for(File f : batches) f.delete();
+ for(File f : labelBatches) f.delete();
+ }
+ finally {
+ super.finalize();
+ }
+ }
+ @Override
+ public Label prototype() {
+ return prototype;
+ }
+
+ };
+ }
+
+
+ /** Returns the union of two arc-labelled immutable graphs.
+ *
+ * <P>The two arguments may differ in the number of nodes, in which case the
+ * resulting graph will be as large as the larger graph.
+ *
+ * @param g0 the first graph.
+ * @param g1 the second graph.
+ * @param labelMergeStrategy the strategy used to merge labels when the same arc
+ * is present in both graphs; if <code>null</code>, {@link Labels#KEEP_FIRST_MERGE_STRATEGY}
+ * is used.
+ * @return the union of the two graphs.
+ */
+ public static ArcLabelledImmutableGraph union(final ArcLabelledImmutableGraph g0, final ArcLabelledImmutableGraph g1, final LabelMergeStrategy labelMergeStrategy) {
+ return new UnionArcLabelledImmutableGraph(g0, g1, labelMergeStrategy == null? Labels.KEEP_FIRST_MERGE_STRATEGY : labelMergeStrategy);
+ }
+
+ /** Returns the union of two immutable graphs.
+ *
+ * <P>The two arguments may differ in the number of nodes, in which case the
+ * resulting graph will be as large as the larger graph.
+ *
+ * @param g0 the first graph.
+ * @param g1 the second graph.
+ * @return the union of the two graphs.
+ */
+ public static ImmutableGraph union(final ImmutableGraph g0, final ImmutableGraph g1) {
+ return g0 instanceof ArcLabelledImmutableGraph && g1 instanceof ArcLabelledImmutableGraph
+ ? union((ArcLabelledImmutableGraph)g0, (ArcLabelledImmutableGraph)g1, (LabelMergeStrategy)null)
+ : new UnionImmutableGraph(g0, g1);
+ }
+
+
+ private static final class ComposedGraph extends ImmutableSequentialGraph {
+ private final ImmutableGraph g0;
+
+ private final ImmutableGraph g1;
+
+ private ComposedGraph(ImmutableGraph g0, ImmutableGraph g1) {
+ this.g0 = g0;
+ this.g1 = g1;
+ }
+
+ @Override
+ public long numNodes() {
+ return Math.max(g0.numNodes(), g1.numNodes());
+ }
+
+ @Override
+ public ImmutableSequentialGraph copy() {
+ // Note that only the second graph needs duplication.
+ return new ComposedGraph(g0, g1.copy());
+ }
+
+ @Override
+ public NodeIterator nodeIterator() {
+
+ return new NodeIterator() {
+ private final NodeIterator it0 = g0.nodeIterator();
+ private long[][] succ = LongBigArrays.EMPTY_BIG_ARRAY;
+ private final LongOpenHashSet successors = new LongOpenHashSet(Hash.DEFAULT_INITIAL_SIZE, Hash.FAST_LOAD_FACTOR);
+ private int outdegree = -1; // -1 means that the cache is empty
+
+ @Override
+ public long nextLong() {
+ outdegree = -1;
+ return it0.nextLong();
+ }
+
+ @Override
+ public boolean hasNext() {
+ return it0.hasNext();
+ }
+
+
+ @Override
+ public long outdegree() {
+ if (outdegree < 0) successorBigArray();
+ return outdegree;
+ }
+
+ @Override
+ public long[][] successorBigArray() {
+ if (outdegree < 0) {
+ final long d = it0.outdegree();
+ final long[][] s = it0.successorBigArray();
+ successors.clear();
+ for (long i = 0; i < d; i++) {
+ LazyLongIterator s1 = g1.successors(LongBigArrays.get(s, i));
+ long x;
+ while ((x = s1.nextLong()) >= 0) successors.add(x);
+ }
+ outdegree = successors.size();
+ // The cache may be larger than outdegree: only its first outdegree entries are meaningful.
+ succ = LongBigArrays.ensureCapacity(succ, outdegree, 0);
+ final LongIterator iterator = successors.iterator();
+ for(long i = 0; i < outdegree; i++) LongBigArrays.set(succ, i, iterator.nextLong());
+ LongBigArrays.quickSort(succ, 0, outdegree);
+ }
+ return succ;
+ }
+ };
+ }
+ }
+
+ /** Returns the composition (a.k.a. matrix product) of two immutable graphs.
+ *
+ * <P>The two arguments may differ in the number of nodes, in which case the
+ * resulting graph will be as large as the larger graph.
+ *
+ * @param g0 the first graph.
+ * @param g1 the second graph.
+ * @return the composition of the two graphs.
+ */
+ public static ImmutableGraph compose(final ImmutableGraph g0, final ImmutableGraph g1) {
+ return new ComposedGraph(g0, g1);
+ }
+
+
+ /** Returns the composition (a.k.a. matrix product) of two arc-labelled immutable graphs.
+ *
+ * <P>The two arguments may differ in the number of nodes, in which case the
+ * resulting graph will be as large as the larger graph.
+ *
+ * @param g0 the first graph.
+ * @param g1 the second graph.
+ * @param strategy a label semiring.
+ * @return the composition of the two graphs.
+ */
+ public static ArcLabelledImmutableGraph compose(final ArcLabelledImmutableGraph g0, final ArcLabelledImmutableGraph g1, final LabelSemiring strategy) {
+ if (g0.prototype().getClass() != g1.prototype().getClass()) throw new IllegalArgumentException("The two graphs have different label classes (" + g0.prototype().getClass().getSimpleName() + ", " +g1.prototype().getClass().getSimpleName() + ")");
+
+ return new ArcLabelledImmutableSequentialGraph() {
+
+ @Override
+ public Label prototype() {
+ return g0.prototype();
+ }
+
+ @Override
+ public long numNodes() {
+ return Math.max(g0.numNodes(), g1.numNodes());
+ }
+
+ @Override
+ public ArcLabelledNodeIterator nodeIterator() {
+
+ return new ArcLabelledNodeIterator() {
+ private final ArcLabelledNodeIterator it0 = g0.nodeIterator();
+ private long[] succ = LongArrays.EMPTY_ARRAY;
+ private Label[] label = new Label[0];
+ private int maxOutDegree;
+ private int smallCount;
+ private Long2ObjectOpenHashMap<Label> successors = new Long2ObjectOpenHashMap<>(Hash.DEFAULT_INITIAL_SIZE, Hash.FAST_LOAD_FACTOR);
+ {
+ successors.defaultReturnValue(strategy.zero());
+ }
+ private int outdegree = -1; // -1 means that the cache is empty
+
+ @Override
+ public long nextLong() {
+ outdegree = -1;
+ return it0.nextLong();
+ }
+
+ @Override
+ public boolean hasNext() {
+ return it0.hasNext();
+ }
+
+
+ @Override
+ public long outdegree() {
+ if (outdegree < 0) successorBigArray();
+ return outdegree;
+ }
+
+ private void ensureCache() {
+ if (outdegree < 0) {
+ final long d = it0.outdegree();
+ final LabelledArcIterator s = it0.successors();
+ if (successors.size() < maxOutDegree / 2 && smallCount++ > 100) {
+ smallCount = 0;
+ maxOutDegree = 0;
+ successors = new Long2ObjectOpenHashMap<>(Hash.DEFAULT_INITIAL_SIZE, Hash.FAST_LOAD_FACTOR);
+ successors.defaultReturnValue(strategy.zero());
+ }
+ else successors.clear();
+
+ for (int i = 0; i < d; i++) {
+ LabelledArcIterator s1 = g1.successors(s.nextLong());
+ long x;
+ while ((x = s1.nextLong()) >= 0) successors.put(x, strategy.add(strategy.multiply(s.label(), s1.label()), successors.get(x)));
+ }
+ outdegree = successors.size();
+ succ = LongArrays.ensureCapacity(succ, outdegree, 0);
+ label = ObjectArrays.ensureCapacity(label, outdegree, 0);
+ successors.keySet().toArray(succ);
+ LongArrays.quickSort(succ, 0, outdegree);
+ for(int i = outdegree; i-- != 0;) label[i] = successors.get(succ[i]);
+ if (outdegree > maxOutDegree) maxOutDegree = outdegree;
+ }
+ }
+
+ @Override
+ public long[][] successorBigArray() {
+ ensureCache();
+ return LongBigArrays.wrap(succ);
+ }
+
+ @Override
+ public Label[][] labelBigArray() {
+ ensureCache();
+ return ObjectBigArrays.wrap(label);
+ }
+
+ @Override
+ public LabelledArcIterator successors() {
+ ensureCache();
+ return new LabelledArcIterator() {
+ int i = -1;
+ @Override
+ public Label label() {
+ return label[i];
+ }
+
+ @Override
+ public long nextLong() {
+ return i < outdegree - 1 ? succ[++i] : -1;
+ }
+
+ @Override
+ public long skip(final long n) {
+ final int incr = (int)Math.min(n, outdegree - i - 1);
+ i += incr;
+ return incr;
+ }
+ };
+ }
+ };
+ }
+ };
+ }
+
+
+ /** Returns a permutation that would put the adjacency lists of the given graph in Gray-code order.
+ *
+ * <P>Gray codes list all sequences of <var>n</var> zeros and ones in such a way that
+ * consecutive sequences differ by exactly one bit. If we interpret each row of the adjacency matrix
+ * of a graph as a Gray code and sort the rows by the index it encodes, we obtain a permutation
+ * that tends to place similar rows near each other.
+ *
+ * <P>Note that since a graph permutation permutes <em>both</em> rows and columns, this transformation is
+ * not idempotent: the Gray-code permutation produced from a matrix that has been Gray-code sorted will
+ * <em>not</em> be, in general, the identity.
+ *
+ * <P>The important feature of Gray-code ordering is that it is completely endogenous (i.e., determined
+ * by the graph itself), unlike, say, lexicographic URL ordering (which relies on knowing
+ * the URL associated with each node).
+ *
+ * @param g an immutable graph.
+ * @return the permutation that would order the graph adjacency lists by Gray order
+ * (you can just pass it to {@link #mapOffline(ImmutableGraph, long[][], int, File, ProgressLogger)}).
+ */
+ public static long[][] grayCodePermutation(final ImmutableGraph g) {
+ final long n = g.numNodes();
+ final long[][] perm = LongBigArrays.newBigArray(n);
+ long i = n;
+ while(i-- != 0) LongBigArrays.set(perm, i, i);
+
+ final LongComparator grayComparator = new LongComparator() {
+ /* Remember that given a Gray code G (expressed as a 0-based sequence
+ * of n bits G[i]), the corresponding binary code B if defined as
+ * follows: B[n-1]=G[n-1], and B[i] = B[i+1] ^ G[i].
+ *
+ * Translating the formula above to our case (where we just have the increasing
+ * list of indices j such that G[i]=1), we see that the binary code
+ * corresponding to the Gray code of an adjacency list is
+ * made of alternating blocks of zeroes and ones; the alternation
+ * happens at each successor.
+ *
+ * That said, the code below requires some reckoning to be fully
+ * understood (but it works!).
+ */
+
+ @Override
+ public int compare(final long x, final long y) {
+ final LazyLongIterator i = g.successors(x), j = g.successors(y);
+ long a;
+ long b;
+
+ /* This code eagerly duplicates the behaviour of the lazy comparator
+ below. It is here for documentation and debugging purposes.
+
+ byte[] g1 = new byte[(int)n], g2 = new byte[(int)n];
+ for(long s; (s = i.nextLong()) != -1;) g1[(int)(n - 1 - s)] = 1;
+ for(long s; (s = j.nextLong()) != -1;) g2[(int)(n - 1 - s)] = 1;
+ for(int k = (int)n - 2; k >= 0; k--) {
+ g1[k] ^= g1[k + 1];
+ g2[k] ^= g2[k + 1];
+ }
+ for(int k = (int)n - 1; k >= 0; k--) if (g1[k] != g2[k]) return g1[k] - g2[k];
+ return 0;
+ */
+
+ boolean parity = false; // Parity of the number of common successors scanned so far.
+ for(;;) {
+ a = i.nextLong();
+ b = j.nextLong();
+ if (a == -1 && b == -1) return 0;
+ if (a == -1) return parity ? 1 : -1;
+ if (b == -1) return parity ? -1 : 1;
+ if (a != b) return parity ^ (a < b) ? 1 : -1;
+ parity = ! parity;
+ }
+ }
+ };
+
+ LongBigArrays.quickSort(perm, 0, n, grayComparator);
+
+ return Util.invertPermutationInPlace(perm);
+ }
+
+ /** Returns a random permutation for a given graph.
+ *
+ * @param g an immutable graph.
+ * @param seed the seed for the {@link XorShift1024StarRandom} generator.
+ * @return a random permutation for the given graph.
+ */
+ public static long[][] randomPermutation(final ImmutableGraph g, final long seed) {
+ return LongBigArrays.shuffle(Util.identity(g.numNodes()), new XorShift1024StarRandom(seed));
+ }
+
+
+
+ /** Returns a permutation that would put the adjacency lists of the given graph in lexicographical order.
+ *
+ * <P>Note that since a graph permutation permutes <em>both</em> rows and columns, this transformation is
+ * not idempotent: the lexicographical permutation produced from a matrix that has been
+ * lexicographically sorted will
+ * <em>not</em> be, in general, the identity.
+ *
+ * <P>The important feature of lexicographical ordering is that it is completely endogenous (i.e., determined
+ * by the graph itself), unlike, say, lexicographic URL ordering (which relies on knowing
+ * the URL associated with each node).
+ *
+ * <p><strong>Warning</strong>: columns are numbered from zero <em>from the left</em>, so each row is compared
+ * as a bit string whose leftmost bit corresponds to node zero. This means, for instance, that nodes with an arc
+ * towards node zero are lexicographically larger than nodes without it.
+ *
+ * @param g an immutable graph.
+ * @return the permutation that would order the graph adjacency lists by lexicographical order
+ * (you can just pass it to {@link #mapOffline(ImmutableGraph, long[][], int)}).
+ */
+ public static long[][] lexicographicalPermutation(final ImmutableGraph g) {
+ final long n = g.numNodes();
+ final long[][] perm = Util.identity(n);
+
+ final LongComparator lexicographicalComparator = new LongComparator() {
+ @Override
+ public int compare(final long x, final long y) {
+ final LazyLongIterator i = g.successors(x), j = g.successors(y);
+ long a;
+ long b;
+ for(;;) {
+ a = i.nextLong();
+ b = j.nextLong();
+ if (a == -1 && b == -1) return 0;
+ if (a == -1) return -1;
+ if (b == -1) return 1;
+ if (a != b) {
+ final long t = b - a;
+ return t == 0 ? 0 : t < 0 ? -1 : 1;
+ }
+ }
+ }
+ };
+
+ LongBigArrays.quickSort(perm, 0, n, lexicographicalComparator);
+
+ return Util.invertPermutationInPlace(perm);
+ }
+
+
+
+ /** Checks that the number of arguments is exactly <code>n</code> if <code>n</code> is nonnegative, or
+ * at least -<code>n</code> otherwise, printing an error message if the check fails.
+ */
+
+ private static boolean ensureNumArgs(String param[], int n) {
+ if (n >= 0 && param.length != n || n < 0 && param.length < -n) {
+ System.err.println("Wrong number (" + param.length + ") of arguments.");
+ return false;
+ }
+ return true;
+ }
+
+ /** Loads a graph with given data and returns it.
+ *
+ * @param graphClass the class of the graph to be loaded.
+ * @param baseName the graph basename.
+ * @param offline whether the graph is to be loaded in an offline fashion.
+ * @param pl a progress logger.
+ * @return the loaded graph.
+ */
+ private static ImmutableGraph load(Class<?> graphClass, String baseName, boolean offline, ProgressLogger pl) throws IllegalArgumentException, SecurityException, IllegalAccessException, InvocationTargetException, NoSuchMethodException, IOException {
+ ImmutableGraph graph = null;
+
+ if (graphClass != null) {
+ if (offline) graph = (ImmutableGraph)graphClass.getMethod("loadOffline", CharSequence.class).invoke(null, baseName);
+ else graph = (ImmutableGraph)graphClass.getMethod("load", CharSequence.class, ProgressLogger.class).invoke(null, baseName, pl);
+ }
+ else graph = offline ?
+ ImmutableGraph.loadOffline(baseName) :
+ ImmutableGraph.load(baseName, pl);
+
+ return graph;
+ }
+
+
+
+ public static void main(String args[]) throws IOException, IllegalArgumentException, SecurityException, InstantiationException, IllegalAccessException, InvocationTargetException, NoSuchMethodException, ClassNotFoundException, JSAPException {
+ Class<?> sourceGraphClass = null, destGraphClass = BVGraph.class;
+ boolean offline = true, ascii = false;
+
+ final Field[] field = Transform.class.getDeclaredFields();
+ final List<String> filterList = new ArrayList<>();
+ final List<String> labelledFilterList = new ArrayList<>();
+
+ for(Field f: field) {
+ if (ArcFilter.class.isAssignableFrom(f.getType())) filterList.add(f.getName());
+ if (LabelledArcFilter.class.isAssignableFrom(f.getType())) labelledFilterList.add(f.getName());
+ }
+
+ SimpleJSAP jsap = new SimpleJSAP(Transform.class.getName(),
+ "Transforms one or more graphs. All transformations require, after the name,\n" +
+ "some parameters specified below:\n" +
+ "\n" +
+ "identity sourceBasename destBasename\n" +
+ "mapOffline sourceBasename destBasename map [batchSize] [tempDir] [cutoff]\n" +
+ "transposeOffline sourceBasename destBasename [batchSize] [tempDir]\n" +
+ "symmetrizeOffline sourceBasename destBasename [batchSize] [tempDir]\n" +
+ "union source1Basename source2Basename destBasename [strategy]\n" +
+ "compose source1Basename source2Basename destBasename [semiring]\n" +
+ "gray sourceBasename destBasename [batchSize] [tempDir]\n" +
+ "grayPerm sourceBasename dest\n" +
+ "lex sourceBasename destBasename [batchSize] [tempDir]\n" +
+ "lexPerm sourceBasename dest\n" +
+ "random sourceBasename destBasename [seed] [batchSize] [tempDir]\n" +
+ "arcfilter sourceBasename destBasename arcFilter (available filters: " + filterList + ")\n" +
+ "larcfilter sourceBasename destBasename arcFilter (available filters: " + labelledFilterList + ")\n" +
+ "removeDangling sourceBasename destBasename [batchSize] [tempDir]\n" + "\n" +
+ "Please consult the Javadoc documentation for more information on each transform.",
+ new Parameter[] {
+ new FlaggedOption("sourceGraphClass", GraphClassParser.getParser(), JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 's', "source-graph-class", "Forces a Java class to load the source graph."),
+ new FlaggedOption("destGraphClass", GraphClassParser.getParser(), BVGraph.class.getName(), JSAP.NOT_REQUIRED, 'd', "dest-graph-class", "Forces a Java class to store the destination graph."),
+ new FlaggedOption("destArcLabelledGraphClass", GraphClassParser.getParser(), BitStreamArcLabelledImmutableGraph.class.getName(), JSAP.NOT_REQUIRED, 'L', "dest-arc-labelled-graph-class", "Forces a Java class to store the labels of the destination graph."),
+ new FlaggedOption("logInterval", JSAP.LONG_PARSER, Long.toString(ProgressLogger.DEFAULT_LOG_INTERVAL), JSAP.NOT_REQUIRED, 'l', "log-interval", "The minimum time interval between activity logs in milliseconds."),
+ new Switch("ascii", 'a', "ascii", "Maps are in ASCII form (one integer per line)."),
+ new UnflaggedOption("transform", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The transformation to be applied."),
+ new UnflaggedOption("param", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.GREEDY, "The remaining parameters."),
+ }
+ );
+
+ JSAPResult jsapResult = jsap.parse(args);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ sourceGraphClass = jsapResult.getClass("sourceGraphClass");
+ destGraphClass = jsapResult.getClass("destGraphClass");
+ ascii = jsapResult.getBoolean("ascii");
+ String transform = jsapResult.getString("transform");
+ String[] param = jsapResult.getStringArray("param");
+
+ String source[] = null, dest = null, map = null;
+ ArcFilter arcFilter = null;
+ LabelledArcFilter labelledArcFilter = null;
+ LabelSemiring labelSemiring = null;
+ LabelMergeStrategy labelMergeStrategy = null;
+ int batchSize = 1000000;
+ long cutoff = -1;
+ long seed = 0;
+ File tempDir = null;
+
+ if (! ensureNumArgs(param, -2)) return;
+
+ if (transform.equals("identity") || transform.equals("grayPerm") || transform.equals("lexPerm")) {
+ source = new String[] { param[0] };
+ dest = param[1];
+ if (! ensureNumArgs(param, 2)) return;
+ }
+ else if (transform.equals("mapOffline")) {
+ if (! ensureNumArgs(param, -3)) return;
+ source = new String[] { param[0] };
+ dest = param[1];
+ map = param[2];
+ if (param.length >= 4) {
+ batchSize = ((Integer)JSAP.INTSIZE_PARSER.parse(param[3])).intValue();
+ if (param.length >= 5) {
+ tempDir = new File(param[4]);
+ if (param.length == 6) cutoff = Long.parseLong(param[5]);
+ else if (! ensureNumArgs(param, 5)) return;
+ }
+ else if (! ensureNumArgs(param, 4)) return;
+ }
+ else if (! ensureNumArgs(param, 3)) return;
+ }
+ else if (transform.equals("random")) {
+ if (! ensureNumArgs(param, -2)) return;
+ source = new String[] { param[0] };
+ dest = param[1];
+ if (param.length >= 3) {
+ seed = Long.parseLong(param[2]);
+ if (param.length >= 4) {
+ batchSize = ((Integer)JSAP.INTSIZE_PARSER.parse(param[3])).intValue();
+ if (param.length == 5) tempDir = new File(param[4]);
+ else if (! ensureNumArgs(param, 4)) return;
+ }
+ else if (! ensureNumArgs(param, 3)) return;
+ }
+ else if (! ensureNumArgs(param, 2)) return;
+ }
+ else if (transform.equals("arcfilter")) {
+ if (ensureNumArgs(param, 3)) {
+ try {
+ // First try: a public field
+ arcFilter = (ArcFilter) Transform.class.getField(param[2]).get(null);
+ }
+ catch(NoSuchFieldException e) {
+ // No chance: let's try with a class
+ arcFilter = ObjectParser.fromSpec(param[2], ArcFilter.class, GraphClassParser.PACKAGE);
+ }
+ source = new String[] { param[0], null };
+ dest = param[1];
+ }
+ else return;
+ }
+ else if (transform.equals("larcfilter")) {
+ if (ensureNumArgs(param, 3)) {
+ try {
+ // First try: a public field
+ labelledArcFilter = (LabelledArcFilter) Transform.class.getField(param[2]).get(null);
+ }
+ catch(NoSuchFieldException e) {
+ // No chance: let's try with a class
+ labelledArcFilter = ObjectParser.fromSpec(param[2], LabelledArcFilter.class, GraphClassParser.PACKAGE);
+ }
+ source = new String[] { param[0], null };
+ dest = param[1];
+ }
+ else return;
+ }
+ else if (transform.equals("union")) {
+ if (! ensureNumArgs(param, -3)) return;
+ source = new String[] { param[0], param[1] };
+ dest = param[2];
+ if (param.length == 4) labelMergeStrategy = ObjectParser.fromSpec(param[3], LabelMergeStrategy.class, GraphClassParser.PACKAGE);
+ else if (! ensureNumArgs(param, 3)) return;
+ }
+ else if (transform.equals("compose")) {
+ if (! ensureNumArgs(param, -3)) return;
+ source = new String[] { param[0], param[1] };
+ dest = param[2];
+ if (param.length == 4) labelSemiring = ObjectParser.fromSpec(param[3], LabelSemiring.class, GraphClassParser.PACKAGE);
+ else if (! ensureNumArgs(param, 3)) return;
+ }
+ else if (transform.equals("transposeOffline") || transform.equals("symmetrizeOffline") || transform.equals("removeDangling") || transform.equals("gray") || transform.equals("lex")) {
+ if (! ensureNumArgs(param, -2)) return;
+ source = new String[] { param[0] };
+ dest = param[1];
+ if (param.length >= 3) {
+ batchSize = ((Integer)JSAP.INTSIZE_PARSER.parse(param[2])).intValue();
+ if (param.length == 4) tempDir = new File(param[3]);
+ else if (! ensureNumArgs(param, 3)) return;
+ }
+ else if (! ensureNumArgs(param, 2)) return;
+ }
+ else {
+ System.err.println("Unknown transform: " + transform);
+ return;
+ }
+
+ final ProgressLogger pl = new ProgressLogger(LOGGER, jsapResult.getLong("logInterval"), TimeUnit.MILLISECONDS);
+ final ImmutableGraph[] graph = new ImmutableGraph[source.length];
+ final ImmutableGraph result;
+ final Class<?> destLabelledGraphClass = jsapResult.getClass("destArcLabelledGraphClass");
+ if (! ArcLabelledImmutableGraph.class.isAssignableFrom(destLabelledGraphClass)) throw new IllegalArgumentException("The arc-labelled destination class " + destLabelledGraphClass.getName() + " is not a subclass of ArcLabelledImmutableGraph");
+
+ // Check for transformations that require random access on the source graph.
+ if (transform.equals("grayPerm") || transform.equals("lexPerm") || transform.equals("gray") || transform.equals("lex")) offline = false;
+
+ for (int i = 0; i < source.length; i++)
+ // Note that composition always requires the second graph to be random access.
+ if (source[i] == null) graph[i] = null;
+ else graph[i] = load(sourceGraphClass, source[i], offline && ! (i == 1 && transform.equals("compose")), pl);
+
+ final boolean graph0IsLabelled = graph[0] instanceof ArcLabelledImmutableGraph;
+ final ArcLabelledImmutableGraph graph0Labelled = graph0IsLabelled ? (ArcLabelledImmutableGraph)graph[0] : null;
+ final boolean graph1IsLabelled = graph.length > 1 && graph[1] instanceof ArcLabelledImmutableGraph;
+
+ String notForLabelled = "This transformation will just apply to the unlabelled graph; label information will be absent";
+
+ if (transform.equals("identity")) result = graph[0];
+ else if (transform.equals("mapOffline")) {
+ if (graph0IsLabelled) LOGGER.warn(notForLabelled);
+ pl.start("Reading map...");
+
+ final long n = graph[0].numNodes();
+ final long[][] f = LongBigArrays.newBigArray(n);
+ final long loaded;
+ if (ascii) loaded = TextIO.loadLongs(map, f);
+ else loaded = BinIO.loadLongs(map, f);
+
+ if (n != loaded) throw new IllegalArgumentException("The source graph has " + n + " nodes, but the map contains " + loaded + " longs");
+
+ // If a cutoff was specified, delete from the graph all nodes whose new index (as given by the map) is at least the cutoff.
+ if (cutoff != -1)
+ for(int i = f.length; i-- != 0;) {
+ final long[] t = f[i];
+ for(int d = t.length; d-- != 0;) if (t[d] >= cutoff) t[d] = -1;
+ }
+
+ pl.count = n;
+ pl.done();
+
+ result = mapOffline(graph[0], f, batchSize, tempDir, pl);
+ LOGGER.info("Transform computation completed.");
+ }
+ else if (transform.equals("arcfilter")) {
+ if (graph0IsLabelled && ! (arcFilter instanceof LabelledArcFilter)) {
+ LOGGER.warn(notForLabelled);
+ result = filterArcs(graph[0], arcFilter, pl);
+ }
+ else result = filterArcs(graph[0], arcFilter, pl);
+ }
+ else if (transform.equals("larcfilter")) {
+ if (! graph0IsLabelled) throw new IllegalArgumentException("Filtering on labelled arcs requires a labelled graph");
+ result = filterArcs(graph0Labelled, labelledArcFilter, pl);
+ }
+ else if (transform.equals("symmetrizeOffline")) {
+ if (graph0IsLabelled) LOGGER.warn(notForLabelled);
+ result = symmetrizeOffline(graph[0], batchSize, tempDir, pl);
+ }
+ else if (transform.equals("removeDangling")) {
+ if (graph0IsLabelled) LOGGER.warn(notForLabelled);
+
+ final long n = graph[0].numNodes();
+ LOGGER.info("Finding dangling nodes...");
+
+ final long[][] f = LongBigArrays.newBigArray(n);
+ NodeIterator nodeIterator = graph[0].nodeIterator();
+ long c = 0;
+ for(long i = 0; i < n; i++) {
+ nodeIterator.nextLong();
+ LongBigArrays.set(f, i, nodeIterator.outdegree() != 0 ? c++ : -1);
+ }
+ result = mapOffline(graph[0], f, batchSize, tempDir, pl);
+ }
+ else if (transform.equals("transposeOffline")) {
+ result = graph0IsLabelled ? transposeOffline(graph0Labelled, batchSize, tempDir, pl) : transposeOffline(graph[0], batchSize, tempDir, pl);
+ }
+ else if (transform.equals("union")) {
+ if (graph0IsLabelled && graph1IsLabelled) {
+ if (labelMergeStrategy == null) throw new IllegalArgumentException("Uniting labelled graphs requires a merge strategy");
+ result = union(graph0Labelled, (ArcLabelledImmutableGraph)graph[1], labelMergeStrategy);
+ }
+ else {
+ if (graph0IsLabelled || graph1IsLabelled) LOGGER.warn(notForLabelled);
+ result = union(graph[0], graph[1]);
+ }
+ }
+ else if (transform.equals("compose")) {
+ if (graph0IsLabelled && graph1IsLabelled) {
+ if (labelSemiring == null) throw new IllegalArgumentException("Composing labelled graphs requires a label semiring");
+ result = compose(graph0Labelled, (ArcLabelledImmutableGraph)graph[1], labelSemiring);
+ }
+ else {
+ if (graph0IsLabelled || graph1IsLabelled) LOGGER.warn(notForLabelled);
+ result = compose(graph[0], graph[1]);
+ }
+ }
+ else if (transform.equals("gray")) {
+ if (graph0IsLabelled) LOGGER.warn(notForLabelled);
+ result = mapOffline(graph[0], grayCodePermutation(graph[0]), batchSize, tempDir, pl);
+ }
+ else if (transform.equals("grayPerm")) {
+ if (graph0IsLabelled) LOGGER.warn(notForLabelled);
+ BinIO.storeLongs(grayCodePermutation(graph[0]), param[1]);
+ return;
+ }
+ else if (transform.equals("lex")) {
+ if (graph0IsLabelled) LOGGER.warn(notForLabelled);
+ result = mapOffline(graph[0], lexicographicalPermutation(graph[0]), batchSize, tempDir, pl);
+ }
+ else if (transform.equals("lexPerm")) {
+ if (graph0IsLabelled) LOGGER.warn(notForLabelled);
+ BinIO.storeLongs(lexicographicalPermutation(graph[0]), param[1]);
+ return;
+ }
+ else if (transform.equals("random")) {
+ if (graph0IsLabelled) LOGGER.warn(notForLabelled);
+ result = mapOffline(graph[0], randomPermutation(graph[0], seed), batchSize, tempDir, pl);
+ } else result = null;
+
+ if (result instanceof ArcLabelledImmutableGraph) {
+ // Note that for non-absolute pathnames we use just the file name (without directories) to build the underlying graph name.
+ LOGGER.info("The result is a labelled graph (class: " + destLabelledGraphClass.getName() + ")");
+ final File destFile = new File(dest);
+ final String underlyingName = (destFile.isAbsolute() ? dest : destFile.getName()) + ArcLabelledImmutableGraph.UNDERLYINGGRAPH_SUFFIX;
+ destLabelledGraphClass.getMethod("store", ArcLabelledImmutableGraph.class, CharSequence.class, CharSequence.class, ProgressLogger.class).invoke(null, result, dest, underlyingName, pl);
+ ImmutableGraph.store(destGraphClass, result, dest + ArcLabelledImmutableGraph.UNDERLYINGGRAPH_SUFFIX, pl);
+ }
+ else ImmutableGraph.store(destGraphClass, result, dest, pl);
+ }
+}
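The lazy parity trick used by the comparator in `grayCodePermutation()` can be cross-checked against the eager Gray-to-binary conversion kept in its commented-out block. The following sketch is a hypothetical, self-contained illustration (the class `GrayOrderDemo` and all its methods are not part of WebGraph): it works on plain sorted `int[]` successor lists instead of an `ImmutableGraph`, compares the two versions on every subset of four nodes, and checks that sorting all lists in Gray-code order makes consecutive lists differ by exactly one node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical demo (not part of WebGraph): Gray-code order on sorted successor lists. */
public class GrayOrderDemo {

    /** Lazy comparison, mirroring the parity-based comparator of grayCodePermutation(). */
    static int compareLazy(int[] x, int[] y) {
        boolean parity = false; // parity of the number of common successors scanned so far
        for (int k = 0; ; k++) {
            int a = k < x.length ? x[k] : -1;
            int b = k < y.length ? y[k] : -1;
            if (a == -1 && b == -1) return 0;
            if (a == -1) return parity ? 1 : -1;
            if (b == -1) return parity ? -1 : 1;
            if (a != b) return parity ^ (a < b) ? 1 : -1;
            parity = !parity;
        }
    }

    /** Decodes a row (column 0 = most significant bit) from Gray code to binary. */
    static byte[] toBinary(int[] succ, int n) {
        byte[] bits = new byte[n];
        for (int s : succ) bits[s] = 1;
        // B[0] = G[0] and B[c] = B[c - 1] ^ G[c]: every successor flips the running bit.
        for (int c = 1; c < n; c++) bits[c] ^= bits[c - 1];
        return bits;
    }

    /** Eager comparison of the decoded binary rows, most significant bit first. */
    static int compareEager(int[] x, int[] y, int n) {
        byte[] bx = toBinary(x, n), by = toBinary(y, n);
        for (int c = 0; c < n; c++) if (bx[c] != by[c]) return bx[c] - by[c];
        return 0;
    }

    /** Every sorted successor list over the nodes 0..n-1 (one per subset). */
    static List<int[]> allLists(int n) {
        List<int[]> lists = new ArrayList<>();
        for (int mask = 0; mask < 1 << n; mask++) {
            int[] succ = new int[Integer.bitCount(mask)];
            for (int j = 0, k = 0; j < n; j++) if ((mask & 1 << j) != 0) succ[k++] = j;
            lists.add(succ);
        }
        return lists;
    }

    /** Size of the symmetric difference of two sorted successor lists. */
    static int symmetricDifference(int[] x, int[] y) {
        int i = 0, j = 0, d = 0;
        while (i < x.length || j < y.length) {
            if (j == y.length || i < x.length && x[i] < y[j]) { d++; i++; }
            else if (i == x.length || y[j] < x[i]) { d++; j++; }
            else { i++; j++; }
        }
        return d;
    }

    public static void main(String[] args) {
        final int n = 4;
        List<int[]> lists = allLists(n);
        int disagreements = 0;
        for (int[] x : lists)
            for (int[] y : lists)
                if (Integer.signum(compareLazy(x, y)) != Integer.signum(compareEager(x, y, n))) disagreements++;
        System.out.println("disagreements: " + disagreements); // 0

        lists.sort(GrayOrderDemo::compareLazy);
        for (int k = 0; k < lists.size(); k++) {
            if (k > 0 && symmetricDifference(lists.get(k - 1), lists.get(k)) != 1)
                throw new AssertionError("not a Gray sequence at " + k);
            System.out.print(Arrays.toString(lists.get(k)) + " ");
        }
        System.out.println();
    }
}
```

Because all 2^n lists are present, their decoded binary values are exactly 0..2^n-1, so the sorted sequence is the standard reflected Gray sequence and starts with the empty list.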
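The `nodeIterator()` of the arc-labelled graph built from `batches` and `labelBatches` merges the sorted batches on the fly: an indirect priority queue keyed on each batch's current source yields, for every node, the arcs coming from all batches, and the successors are then re-sorted because arcs from different batches interleave. The sketch below shows the same k-way merge under simplifying assumptions: batches are in-memory arrays of `{source, target}` pairs rather than delta-coded bit streams, labels are omitted, whole runs with the same source are consumed at once, and `java.util.PriorityQueue` stands in for fastutil's `LongHeapSemiIndirectPriorityQueue`. The class `BatchMergeDemo` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch (not WebGraph code): k-way merge of sorted arc batches. */
public class BatchMergeDemo {

    /**
     * Merges batches of {source, target} pairs, each sorted by source and then by target,
     * and returns the sorted successor list of every node in [0, n).
     */
    static long[][] merge(List<long[][]> batches, int n) {
        final int[] pos = new int[batches.size()]; // index of the next unread pair of each batch
        // Batch indices ordered by the source of their next unread pair. A batch's key is only
        // modified while the batch is out of the queue, so the heap invariant is preserved.
        PriorityQueue<Integer> queue = new PriorityQueue<>(
                (i, j) -> Long.compare(batches.get(i)[pos[i]][0], batches.get(j)[pos[j]][0]));
        for (int b = 0; b < batches.size(); b++) if (batches.get(b).length > 0) queue.add(b);

        long[][] successors = new long[n][];
        for (int node = 0; node < n; node++) {
            List<Long> succ = new ArrayList<>();
            // Drain every batch whose next pair has the current node as source.
            while (!queue.isEmpty() && batches.get(queue.peek())[pos[queue.peek()]][0] == node) {
                int b = queue.poll();
                long[][] batch = batches.get(b);
                while (pos[b] < batch.length && batch[pos[b]][0] == node) succ.add(batch[pos[b]++][1]);
                if (pos[b] < batch.length) queue.add(b); // re-enqueue with its new key
            }
            // Targets coming from different batches interleave, so sort them.
            long[] s = succ.stream().mapToLong(Long::longValue).toArray();
            Arrays.sort(s);
            successors[node] = s;
        }
        return successors;
    }

    public static void main(String[] args) {
        // Transposed arcs of the graph 0->1, 0->2, 1->2, 2->0, split into two sorted batches.
        List<long[][]> batches = Arrays.asList(
                new long[][] { { 1, 0 }, { 2, 1 } },
                new long[][] { { 0, 2 }, { 2, 0 } });
        System.out.println(Arrays.deepToString(merge(batches, 3))); // [[2], [0], [0, 1]]
    }
}
```

The result is the transpose of the example graph: node 2, for instance, receives predecessor 1 from the first batch and predecessor 0 from the second, and the final sort restores increasing order.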
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/UnionImmutableGraph.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/UnionImmutableGraph.java
new file mode 100644
index 0000000..860e24e
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/UnionImmutableGraph.java
@@ -0,0 +1,169 @@
+package it.unimi.dsi.big.webgraph;
+
+/*
+ * Copyright (C) 2003-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.fastutil.longs.LongBigArrays;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+/** An immutable graph representing the union of two given graphs. Here by &ldquo;union&rdquo;
+ * we mean that an arc will belong to the union iff it belongs to at least one of the two graphs (the number of
+ * nodes of the union is the maximum of the numbers of nodes of the two graphs).
+ */
+public class UnionImmutableGraph extends ImmutableGraph {
+ @SuppressWarnings("unused")
+ private static final Logger LOGGER = LoggerFactory.getLogger(UnionImmutableGraph.class);
+ @SuppressWarnings("unused")
+ private static final boolean DEBUG = false;
+ @SuppressWarnings("unused")
+ private static final boolean ASSERTS = false;
+
+ private final ImmutableGraph g0, g1;
+ private final long n0, n1, numNodes;
+
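+ /** The node whose successors are cached, or -1 if no successors are currently cached. Because this field is final and never updated, {@link #successorBigArray(long)} currently always recomputes the successor list. */
+ private final long cachedNode = -1;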
+
+ /** The outdegree of the cached node, if any. */
+ private long outdegree;
+
+ /** The successors of the cached node, if any; note that the array might be larger. */
+ private long[][] cache = LongBigArrays.EMPTY_BIG_ARRAY;
+
+ /** Creates the union of two given graphs.
+ *
+ * @param g0 the first graph.
+ * @param g1 the second graph.
+ */
+ public UnionImmutableGraph(ImmutableGraph g0, ImmutableGraph g1) {
+ this.g0 = g0;
+ this.g1 = g1;
+ n0 = g0.numNodes();
+ n1 = g1.numNodes();
+ numNodes = Math.max(n0, n1);
+ }
+
+ @Override
+ public UnionImmutableGraph copy() {
+ return new UnionImmutableGraph(g0.copy(), g1.copy());
+ }
+
+ @Override
+ public NodeIterator nodeIterator(final long from) {
+
+ return new NodeIterator() {
+ /** If {@code outdegree} is nonnegative, the successors of the current node (the array may be larger than the outdegree). */
+ @SuppressWarnings("hiding")
+ private long[][] cache = LongBigArrays.EMPTY_BIG_ARRAY;
+ /** The outdegree of the current node, or -1 if the successor array for the current node has not been computed yet. */
+ @SuppressWarnings("hiding")
+ private long outdegree = -1;
+ private NodeIterator i0 = from < n0? g0.nodeIterator(from) : null;
+ private NodeIterator i1 = from < n1? g1.nodeIterator(from) : null;
+
+ @Override
+ public boolean hasNext() {
+ return i0 != null && i0.hasNext() || i1 != null && i1.hasNext();
+ }
+
+ @Override
+ public long nextLong() {
+ if (! hasNext()) throw new java.util.NoSuchElementException();
+ outdegree = -1;
+ long result = -1;
+ if (i0 != null) {
+ if (i0.hasNext()) result = i0.nextLong();
+ else i0 = null;
+ }
+ if (i1 != null) {
+ if (i1.hasNext()) result = i1.nextLong();
+ else i1 = null;
+ }
+ return result;
+ }
+
+ @Override
+ public long[][] successorBigArray() {
+ if (outdegree != -1) return cache;
+ if (i0 == null) {
+ outdegree = i1.outdegree();
+ return cache = i1.successorBigArray();
+ }
+ if (i1 == null) {
+ outdegree = i0.outdegree();
+ return cache = i0.successorBigArray();
+ }
+
+ MergedLongIterator merge = new MergedLongIterator(i0.successors(), i1.successors());
+ outdegree = LazyLongIterators.unwrap(merge, cache);
+ long upto, t;
+ while ((t = merge.nextLong()) != -1) {
+ upto = LongBigArrays.length(cache);
+ cache = LongBigArrays.grow(cache, upto + 1);
+ LongBigArrays.set(cache, upto++, t);
+ outdegree++;
+ outdegree += LazyLongIterators.unwrap(merge, cache, upto, LongBigArrays.length(cache) - upto);
+ }
+ return cache;
+ }
+
+ @Override
+ public long outdegree() {
+ successorBigArray(); // So that the cache is filled up
+ return outdegree;
+ }
+
+ };
+
+ }
+
+ @Override
+ public long numNodes() {
+ return numNodes;
+ }
+
+ @Override
+ public boolean randomAccess() {
+ return g0.randomAccess() && g1.randomAccess();
+ }
+
+ @Override
+ public long[][] successorBigArray(final long x) {
+ if (x == cachedNode) return cache;
+ final MergedLongIterator merge = new MergedLongIterator(x < n0? g0.successors(x) : LazyLongIterators.EMPTY_ITERATOR, x < n1? g1.successors(x) : LazyLongIterators.EMPTY_ITERATOR);
+ outdegree = LazyLongIterators.unwrap(merge, cache);
+ long upto, t;
+ while ((t = merge.nextLong()) != -1) {
+ upto = LongBigArrays.length(cache);
+ cache = LongBigArrays.grow(cache, upto + 1);
+ LongBigArrays.set(cache, upto++, t);
+ outdegree++;
+ outdegree += LazyLongIterators.unwrap(merge, cache, upto, LongBigArrays.length(cache) - upto);
+ }
+ return cache;
+ }
+
+ @Override
+ public long outdegree(final long x) {
+ successorBigArray(x); // So the cache gets filled
+ return outdegree;
+ }
+}
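`UnionImmutableGraph` obtains the successors of a node by merging the successor lists of `g0` and `g1` with `MergedLongIterator`, whose source is not part of this excerpt. The union semantics in the class comment require that it merge two increasing lists and emit shared successors only once. Assuming it does so, the per-node computation can be sketched on plain arrays as follows. The class `SortedUnion` is hypothetical, and it ignores the lazy iterators and big arrays of the real code.

```java
import java.util.Arrays;

/** Hypothetical sketch of the per-node union performed by UnionImmutableGraph, on plain sorted arrays. */
public class SortedUnion {
    /** Merges two strictly increasing arrays into a strictly increasing array containing each value once. */
    static long[] union(final long[] a, final long[] b) {
        final long[] out = new long[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) out[k++] = a[i++];
            else if (a[i] > b[j]) out[k++] = b[j++];
            else { // arc present in both graphs: emit it once
                out[k++] = a[i++];
                j++;
            }
        }
        while (i < a.length) out[k++] = a[i++];
        while (j < b.length) out[k++] = b[j++];
        return Arrays.copyOf(out, k); // trim to the actual outdegree
    }

    public static void main(final String[] args) {
        // Successors of some node x in g0 and in g1.
        System.out.println(Arrays.toString(union(new long[] {1, 3, 7}, new long[] {0, 3, 8, 9})));
        // A node x >= n0 has no successors in g0.
        System.out.println(Arrays.toString(union(new long[0], new long[] {2})));
    }
}
```

This prints `[0, 1, 3, 7, 8, 9]` and `[2]`. The second call corresponds to the real class substituting `LazyLongIterators.EMPTY_ITERATOR` for the successors of a node that `g0` does not contain.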
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/algo/ConnectedComponents.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/algo/ConnectedComponents.java
new file mode 100644
index 0000000..62806b8
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/algo/ConnectedComponents.java
@@ -0,0 +1,182 @@
+package it.unimi.dsi.big.webgraph.algo;
+
+import java.io.IOException;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicLongArray;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/*
+ * Copyright (C) 2011-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.Util;
+import it.unimi.dsi.big.webgraph.ImmutableGraph;
+import it.unimi.dsi.big.webgraph.UnionImmutableGraph;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.longs.LongBigArrays;
+import it.unimi.dsi.logging.ProgressLogger;
+
+/**
+ * Computes the connected components of a <em>symmetric</em> (a.k.a&#46; <em>undirected</em>) graph
+ * using a {@linkplain ParallelBreadthFirstVisit parallel breadth-first visit}.
+ *
+ * <p>The {@link #compute(ImmutableGraph, int, ProgressLogger)} method of this class will return an
+ * instance that contains the data computed by visiting the graph (using an instance of
+ * {@link ParallelBreadthFirstVisit}). Note that it is your responsibility to pass a symmetric graph
+ * to {@link #compute(ImmutableGraph, int, ProgressLogger)}. Otherwise, results will be
+ * unpredictable.
+ *
+ * <p>After getting an instance, it is possible to run the {@link #computeSizes()} and
+ * {@link #sortBySize(long[][])} methods to obtain further information. This scheme has been devised to
+ * exploit the available memory as much as possible&mdash;after the components have been computed,
+ * the returned instance keeps no track of the graph, and the related memory can be freed by the
+ * garbage collector.
+ *
+ * <h2>Performance issues</h2>
+ *
+ * <p>This class uses an instance of {@link ParallelBreadthFirstVisit} to ensure a high degree of
+ * parallelism (see its documentation for memory requirements).
+ */
+
+public class ConnectedComponents {
+ private static final Logger LOGGER = LoggerFactory.getLogger(ConnectedComponents.class);
+
+ /** The number of connected components. */
+ public final long numberOfComponents;
+
+ /** The component of each node. */
+ public final long[][] component;
+
+ protected ConnectedComponents(final long numberOfComponents, final long[][] component) {
+ this.numberOfComponents = numberOfComponents;
+ this.component = component;
+ }
+
+ /**
+ * Computes the connected components of a symmetric graph.
+ *
+ * @param symGraph a symmetric graph.
+ * @param threads the requested number of threads (0 for {@link Runtime#availableProcessors()}).
+ * @param pl a progress logger, or <code>null</code>.
+ * @return an instance of this class containing the computed components.
+ */
+ public static ConnectedComponents compute(final ImmutableGraph symGraph, final int threads, final ProgressLogger pl) {
+ ParallelBreadthFirstVisit visit = new ParallelBreadthFirstVisit(symGraph, threads, false, pl);
+ visit.visitAll();
+ final AtomicLongArray[] visited = visit.marker;
+ final long numberOfComponents = visit.round + 1;
+ visit = null;
+ final long[][] component = LongBigArrays.newBigArray(symGraph.numNodes());
+ for (int i = component.length; i-- != 0;) {
+ final long[] t = component[i];
+ for(int j = t.length; j-- != 0;) t[j] = visited[i].get(j);
+ }
+ return new ConnectedComponents(numberOfComponents, component);
+ }
+
+ /** Returns the size big array for this set of connected components.
+ *
+ * @return the size big array for this set of connected components.
+ */
+ public long[][] computeSizes() {
+ final long[][] size = LongBigArrays.newBigArray(numberOfComponents);
+ for(int i = component.length; i-- != 0;) {
+ final long[] t = component[i];
+ for(int d = t.length; d-- != 0;) LongBigArrays.incr(size, t[d]);
+ }
+ return size;
+ }
+
+ /** Renumbers by decreasing size the components of this set.
+ *
+ * <p>After a call to this method, both the internal state of this class and the argument
+ * big array are permuted so that the sizes of connected components are decreasing
+ * in the component index.
+ *
+ * @param size the component sizes, as returned by {@link #computeSizes()}.
+ */
+ public void sortBySize(final long[][] size) {
+ final long[][] perm = Util.identity(LongBigArrays.length(size));
+ LongBigArrays.quickSort(perm, 0, LongBigArrays.length(perm), (x,y) -> {
+ final long t = LongBigArrays.get(size, y) - LongBigArrays.get(size, x);
+ return t == 0 ? 0 : t < 0 ? -1 : 1;
+ });
+ final long[][] copy = LongBigArrays.copy(size);
+
+ for(int i = size.length; i-- != 0;) {
+ final long[] t = size[i];
+ final long[] u = perm[i];
+ for(int d = t.length; d-- != 0;) t[d] = LongBigArrays.get(copy, u[d]);
+ }
+ Util.invertPermutationInPlace(perm);
+
+ for(int i = component.length; i-- != 0;) {
+ final long[] t = component[i];
+ for(int d = t.length; d-- != 0;) t[d] = LongBigArrays.get(perm, t[d]);
+ }
+ }
+
+ public static void main(String arg[]) throws IOException, JSAPException {
+ SimpleJSAP jsap = new SimpleJSAP(ConnectedComponents.class.getName(),
+ "Computes the connected components of a symmetric graph of given basename. The resulting data is saved " +
+ "in files stemmed from the results basename (by default, the graph basename) with extension .wcc (a list of binary longs specifying the " +
+ "component of each node) and .wccsizes (a list of binary longs specifying the size of each component). " +
+ "The symmetric graph can also be specified using a generic (non-symmetric) graph and its transpose.",
+ new Parameter[] {
+ new Switch("sizes", 's', "sizes", "Compute component sizes."),
+ new Switch("renumber", 'r', "renumber", "Renumber components in decreasing-size order."),
+ new FlaggedOption("logInterval", JSAP.LONG_PARSER, Long.toString(ProgressLogger.DEFAULT_LOG_INTERVAL), JSAP.NOT_REQUIRED, 'l', "log-interval", "The minimum time interval between activity logs in milliseconds."),
+ new Switch("mapped", 'm', "mapped", "Do not load the graph in main memory, but rather memory-map it."),
+ new FlaggedOption("threads", JSAP.INTSIZE_PARSER, "0", JSAP.NOT_REQUIRED, 'T', "threads", "The number of threads to be used. If 0, the number will be estimated automatically."),
+ new FlaggedOption("basenamet", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 't', "transpose", "The basename of the transpose, in case the graph is not symmetric."),
+ new UnflaggedOption("basename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of a symmetric graph (or of a generic graph, if the transpose is provided, too)."),
+ new UnflaggedOption("resultsBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, JSAP.NOT_GREEDY, "The basename of the resulting files."),
+ }
+ );
+
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final String basename = jsapResult.getString("basename");
+ final String basenamet = jsapResult.getString("basenamet");
+ final String resultsBasename = jsapResult.getString("resultsBasename", basename);
+ final int threads = jsapResult.getInt("threads");
+ ProgressLogger pl = new ProgressLogger(LOGGER, jsapResult.getLong("logInterval"), TimeUnit.MILLISECONDS);
+
+ ImmutableGraph graph = jsapResult.userSpecified("mapped") ? ImmutableGraph.loadMapped(basename) : ImmutableGraph.load(basename, pl);
+ ImmutableGraph grapht = basenamet == null ? null : jsapResult.userSpecified("mapped") ? ImmutableGraph.loadMapped(basenamet) : ImmutableGraph.load(basenamet, pl);
+ final ConnectedComponents components = ConnectedComponents.compute(basenamet != null ? new UnionImmutableGraph(graph, grapht) : graph, threads, pl);
+
+ if (jsapResult.getBoolean("sizes") || jsapResult.getBoolean("renumber")) {
+ final long[][] size = components.computeSizes();
+ if (jsapResult.getBoolean("renumber")) components.sortBySize(size);
+ if (jsapResult.getBoolean("sizes")) BinIO.storeLongs(size, resultsBasename + ".wccsizes");
+ }
+ BinIO.storeLongs(components.component, resultsBasename + ".wcc");
+ }
+}
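The following sequential sketch mirrors what `compute()`, `computeSizes()` and `sortBySize()` do, under simplifying assumptions. A plain breadth-first visit replaces `ParallelBreadthFirstVisit`, and `int` arrays replace big arrays. Each node is labelled with the round of the visit that reached it; this is what `numberOfComponents = visit.round + 1` suggests the marker array holds. All names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/** Hypothetical sequential sketch of what ConnectedComponents computes, on int arrays instead of big arrays. */
public class ComponentsSketch {
    /** Labels each node with the index of the breadth-first visit (round) that reached it. */
    static int[] components(final int[][] succ) {
        final int n = succ.length;
        final int[] comp = new int[n];
        Arrays.fill(comp, -1);
        final ArrayDeque<Integer> queue = new ArrayDeque<>();
        int round = -1;
        for (int start = 0; start < n; start++) {
            if (comp[start] != -1) continue; // already reached by an earlier round
            comp[start] = ++round;
            queue.add(start);
            while (!queue.isEmpty()) {
                final int x = queue.poll();
                for (final int y : succ[x]) {
                    if (comp[y] == -1) {
                        comp[y] = round;
                        queue.add(y);
                    }
                }
            }
        }
        return comp;
    }

    /** Counts the nodes of each component, like computeSizes(). */
    static int[] sizes(final int[] comp) {
        final int numberOfComponents = Arrays.stream(comp).max().orElse(-1) + 1;
        final int[] size = new int[numberOfComponents];
        for (final int c : comp) size[c]++;
        return size;
    }

    /** Renumbers components by decreasing size, permuting both arrays in place, like sortBySize(). */
    static void sortBySize(final int[] comp, final int[] size) {
        final Integer[] perm = new Integer[size.length];
        for (int i = 0; i < perm.length; i++) perm[i] = i;
        // After sorting, perm[newIndex] = oldIndex, with sizes in decreasing order.
        Arrays.sort(perm, (a, b) -> Integer.compare(size[b], size[a]));
        final int[] copy = size.clone();
        final int[] inverse = new int[size.length]; // inverse[oldIndex] = newIndex
        for (int i = 0; i < perm.length; i++) {
            size[i] = copy[perm[i]];
            inverse[perm[i]] = i;
        }
        for (int x = 0; x < comp.length; x++) comp[x] = inverse[comp[x]];
    }

    public static void main(final String[] args) {
        // Symmetric graph with components {0}, {1, 2, 3} and {4, 5}.
        final int[][] succ = { {}, {2}, {1, 3}, {2}, {5}, {4} };
        final int[] comp = components(succ);
        final int[] size = sizes(comp);
        sortBySize(comp, size);
        System.out.println(Arrays.toString(comp) + " " + Arrays.toString(size));
    }
}
```

Running `main` prints `[2, 0, 0, 0, 1, 1] [3, 2, 1]`. The three-node component, originally component 1, becomes component 0 after renumbering.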
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/algo/EliasFanoCumulativeOutdegreeList.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/algo/EliasFanoCumulativeOutdegreeList.java
new file mode 100644
index 0000000..424f377
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/algo/EliasFanoCumulativeOutdegreeList.java
@@ -0,0 +1,153 @@
+package it.unimi.dsi.big.webgraph.algo;
+
+/*
+ * Copyright (C) 2013-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.big.webgraph.ImmutableGraph;
+import it.unimi.dsi.bits.BitVector;
+import it.unimi.dsi.bits.Fast;
+import it.unimi.dsi.bits.LongArrayBitVector;
+import it.unimi.dsi.fastutil.longs.LongBigList;
+import it.unimi.dsi.sux4j.bits.SimpleSelectZero;
+import it.unimi.dsi.sux4j.util.EliasFanoMonotoneLongBigList;
+import it.unimi.dsi.util.HyperLogLogCounterArray;
+
+/**<p>A content-addressable representation of the cumulative function of outdegrees that uses a stripped-down
+ * implementation of Elias&ndash;Fano's representation of monotone sequences partially taken from {@link EliasFanoMonotoneLongBigList}.
+ *
+ * <p>The purpose of this class is to store quasi-succinctly the outdegrees of a graph so that it
+ * is easy to quickly find a batch of nodes whose overall outdegree is a given quantity. It is most effective
+ * in multicore computations that depend on the outdegree, as usually the transposed graph has some very high-degree nodes, and often
+ * in web graphs, due to crawling artifacts, these nodes are very close. As a result, a node-based job assignment
+ * ends up creating batches of nodes that are incredibly expensive, which in turn produces an unbalanced
+ * iteration (e.g., in the final phase only a few processors are actually working).
+ *
+ * <p>The main access method is {@link #skipTo(long)}, which returns the first value of the cumulative function larger than or equal to
+ * its argument (subject to the rounding mask). At that point, {@link #currentIndex()} returns the index of the node that realizes that value.
+ */
+
+public final class EliasFanoCumulativeOutdegreeList {
+ /** The number of lower bits. */
+ private final int l;
+ /** The mask used to round up returned {@link #currentIndex} values when {@link HyperLogLogCounterArray#m} &lt; 64, 0 otherwise. */
+ private final long roundingMask;
+ /** The upper-bits array. */
+ private final long[] upperBits;
+ /** The lower-bits list. */
+ private final LongBigList lowerBits;
+ /** The number of nodes, cached. */
+ private final long numNodes;
+ /** The 64-bit window. */
+ private long window;
+ /** The current word position in the list of upper bits. */
+ private int curr;
+ /** The index of the current prefix sum. */
+ private long currentIndex;
+ /** A zero-selection structure on {@link #upperBits}. */
+ private SimpleSelectZero simpleSelectZero;
+
+ /** Creates a cumulative outdegree list with no rounding mask.
+ *
+ * @param graph a graph.
+ */
+ public EliasFanoCumulativeOutdegreeList(final ImmutableGraph graph) {
+ this(graph, graph.numArcs());
+ }
+
+ /** Creates a cumulative outdegree list with no rounding mask.
+ *
+ * @param graph a graph.
+ * @param numArcs the number of arcs in the graph (this parameter can be useful as some {@link ImmutableGraph} implementations
+ * do not support {@link ImmutableGraph#numArcs()}).
+ */
+ public EliasFanoCumulativeOutdegreeList(final ImmutableGraph graph, final long numArcs) {
+ this(graph, numArcs, 0);
+ }
+
+ /** Creates a cumulative outdegree list with specified rounding mask.
+ *
+ * @param graph a graph.
+ * @param numArcs the number of arcs in the graph (this parameter can be useful as some {@link ImmutableGraph} implementations
+ * do not support {@link ImmutableGraph#numArcs()}).
+ * @param roundingMask a number of the form 2<sup><var>k</var></sup> &minus; 1. After each call to {@link #skipTo(long)},
+ * {@link #currentIndex()} is guaranteed to return a multiple of 2<sup><var>k</var></sup>, unless {@link #currentIndex()} is
+ * equal to the number of nodes in {@code graph}.
+ */
+ public EliasFanoCumulativeOutdegreeList(final ImmutableGraph graph, final long numArcs, final long roundingMask) {
+ if (roundingMask + 1 != Long.highestOneBit(roundingMask + 1)) throw new IllegalArgumentException("Illegal rounding mask: " + roundingMask);
+ this.roundingMask = roundingMask;
+ final long length = numNodes = graph.numNodes();
+ final long upperBound = numArcs;
+ l = length == 0 ? 0 : Math.max(0, Fast.mostSignificantBit(upperBound / length));
+ final long lowerBitsMask = (1L << l) - 1;
+ final LongBigList lowerBitsList = LongArrayBitVector.getInstance().asLongBigList(l);
+ lowerBitsList.size(length);
+ final BitVector upperBitsVector = LongArrayBitVector.getInstance().length(length + (upperBound >>> l) + 1);
+ for(long i = 0, v = 0; i < length; i++) {
+ v += graph.outdegree(i);
+ if (v > upperBound) throw new IllegalArgumentException("Too large value: " + v + " > " + upperBound);
+ if (l != 0) lowerBitsList.set(i, v & lowerBitsMask);
+ upperBitsVector.set((v >>> l) + i);
+ }
+
+ lowerBits = lowerBitsList;
+ upperBits = upperBitsVector.bits();
+ simpleSelectZero = new SimpleSelectZero(upperBitsVector);
+ currentIndex = -1;
+ }
+
+ private long getNextUpperBits() {
+ assert currentIndex < numNodes;
+ while(window == 0) window = upperBits[++curr];
+ final long upperBits = curr * (long)Long.SIZE + Long.numberOfTrailingZeros(window) - currentIndex++;
+ window &= window - 1;
+ return upperBits;
+ }
+
+ /** Returns the index realizing the last value returned by {@link #skipTo(long)}, that is,
+ * an index <var>x</var> such that the sum of the outdegrees of the nodes of index (strictly) smaller
+ * than <var>x</var> is equal to the last value returned by {@link #skipTo(long)}.
+ *
+ * @return the index of the node realizing the last value returned by {@link #skipTo(long)}, or -1 if {@link #skipTo(long)} has never been called.
+ */
+ public long currentIndex() {
+ return currentIndex;
+ }
+
+ /** Returns the first value of the cumulative function of outdegrees that is larger than or equal to the provided bound and
+ * whose index (as returned by {@link #currentIndex()}) respects the rounding mask provided at construction time.
+ *
+ * @param lowerBound a lower bound on the returned value.
+ * @return the first value of the cumulative function of outdegrees that is larger than or equal to {@code lowerBound} and
+ * whose index respects the rounding mask provided at construction time.
+ */
+
+ public long skipTo(final long lowerBound) {
+ final long zeroesToSkip = (lowerBound >>> l) - 1;
+ final long position = zeroesToSkip == -1 ? 0 : simpleSelectZero.selectZero(zeroesToSkip);
+ window = upperBits[curr = (int)(position / Long.SIZE)];
+ window &= -1L << (position % Long.SIZE);
+ currentIndex = zeroesToSkip == -1 ? 0 : position - zeroesToSkip;
+
+ for(;;) {
+ final long lower = lowerBits.getLong(currentIndex);
+ final long last = getNextUpperBits() << l | lower;
+ if (last >= lowerBound && (currentIndex & roundingMask) == 0 || currentIndex == numNodes) return last;
+ }
+ }
+}
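To make the layout concrete, here is a deliberately uncompressed sketch of the encoding built by the constructor above. A `boolean[]` stands in for the packed upper-bits vector and a `long[]` for the lower-bits list, and `skipTo` scans linearly instead of using `SimpleSelectZero`. The class `EliasFanoSketch` and its methods are hypothetical. Only the choice of `l`, the bit positions and the stopping rule follow the code above.

```java
/** Hypothetical, deliberately uncompressed sketch of the Elias-Fano layout of cumulative outdegrees. */
public class EliasFanoSketch {
    final int n;            // number of nodes
    final int l;            // number of lower bits per prefix sum
    final long[] lower;     // lower l bits of each prefix sum (a packed bit list in the real class)
    final boolean[] upper;  // upper-bits vector: bit (v >>> l) + i is set for the i-th prefix sum v
    long currentIndex = -1; // plays the role of currentIndex()

    EliasFanoSketch(final long[] outdegree, final long numArcs) {
        n = outdegree.length;
        final long ratio = n == 0 ? 0 : numArcs / n;
        l = ratio == 0 ? 0 : 63 - Long.numberOfLeadingZeros(ratio); // floor(log2(numArcs / n))
        lower = new long[n];
        upper = new boolean[(int) (n + (numArcs >>> l) + 1)];
        long v = 0;
        for (int i = 0; i < n; i++) {
            v += outdegree[i]; // v is the sum of the outdegrees of nodes 0..i
            lower[i] = v & ((1L << l) - 1);
            upper[(int) ((v >>> l) + i)] = true;
        }
    }

    /** Decodes the i-th prefix sum: the i-th set upper bit sits at position (v >>> l) + i. */
    long get(final int i) {
        int ones = -1, pos = -1;
        while (ones < i) if (upper[++pos]) ones++;
        return (long) (pos - i) << l | lower[i];
    }

    /** Linear-scan analogue of skipTo(long); the real class jumps ahead with a zero-select structure. */
    long skipTo(final long lowerBound, final long roundingMask) {
        for (int i = 0; i < n; i++) {
            final long value = get(i);
            currentIndex = i + 1; // nodes 0..i contribute to value
            if (value >= lowerBound && (currentIndex & roundingMask) == 0 || currentIndex == n) return value;
        }
        throw new IllegalStateException("empty graph");
    }

    public static void main(final String[] args) {
        final EliasFanoSketch ef = new EliasFanoSketch(new long[] {3, 0, 5, 1, 2, 4}, 15);
        System.out.println(ef.skipTo(9, 0) + " at index " + ef.currentIndex);  // 9 at index 4
        System.out.println(ef.skipTo(10, 3) + " at index " + ef.currentIndex); // 15 at index 6
    }
}
```

With outdegrees 3, 0, 5, 1, 2, 4, the prefix sums are 3, 3, 8, 9, 11, 15, and `l` is 1. The second call cannot stop at index 5, which is not a multiple of 4, so it continues to the last node.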
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/algo/HyperBall.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/algo/HyperBall.java
new file mode 100644
index 0000000..4acf2b8
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/algo/HyperBall.java
@@ -0,0 +1,1387 @@
+package it.unimi.dsi.big.webgraph.algo;
+
+/*
+ * Copyright (C) 2010-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.Util;
+import it.unimi.dsi.big.webgraph.BVGraph;
+import it.unimi.dsi.big.webgraph.GraphClassParser;
+import it.unimi.dsi.big.webgraph.ImmutableGraph;
+import it.unimi.dsi.big.webgraph.LazyLongIterator;
+import it.unimi.dsi.big.webgraph.NodeIterator;
+import it.unimi.dsi.big.webgraph.Transform;
+import it.unimi.dsi.bits.LongArrayBitVector;
+import it.unimi.dsi.fastutil.Hash;
+import it.unimi.dsi.fastutil.booleans.BooleanBigArrays;
+import it.unimi.dsi.fastutil.doubles.DoubleArrayList;
+import it.unimi.dsi.fastutil.doubles.DoubleIterator;
+import it.unimi.dsi.fastutil.floats.FloatBigArrays;
+import it.unimi.dsi.fastutil.ints.Int2DoubleFunction;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.io.FastBufferedOutputStream;
+import it.unimi.dsi.fastutil.longs.LongArrays;
+import it.unimi.dsi.fastutil.longs.LongBigArrays;
+import it.unimi.dsi.fastutil.longs.LongBigList;
+import it.unimi.dsi.fastutil.longs.LongOpenHashSet;
+import it.unimi.dsi.fastutil.longs.LongSet;
+import it.unimi.dsi.fastutil.longs.LongSets;
+import it.unimi.dsi.io.SafelyCloseable;
+import it.unimi.dsi.lang.ObjectParser;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.util.HyperLogLogCounterArray;
+import it.unimi.dsi.util.KahanSummation;
+
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.NotSerializableException;
+import java.io.ObjectOutputStream;
+import java.io.PrintStream;
+import java.io.RandomAccessFile;
+import java.io.Serializable;
+import java.lang.reflect.InvocationTargetException;
+import java.math.BigDecimal;
+import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
+import java.util.Arrays;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicLong;
+import java.util.concurrent.locks.Condition;
+import java.util.concurrent.locks.ReentrantLock;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/** <p>Computes an approximation of the neighbourhood function, of the size of the reachable sets,
+ * and of (discounted) positive geometric centralities of a graph using HyperBall.
+ *
+ * <p>HyperBall is an algorithm that uses dynamic programming to compute an approximation
+ * of the sizes of the balls of growing radius around the nodes of a graph. Starting from
+ * these data, it can approximate the <em>neighbourhood function</em> of a graph, that is, the function returning
+ * for each <var>t</var> the number of pairs of nodes at distance at most <var>t</var>,
+ * the number of nodes reachable from each node, Bavelas's closeness centrality, Lin's index, and
+ * <em>harmonic centrality</em> (studied by Paolo Boldi and Sebastiano Vigna in &ldquo;<a href ="http://vigna.di.unimi.it/papers.php#BoVAC">Axioms for Centrality</a>&rdquo;, <i>Internet Math.</i>, 2014).
+ * HyperBall can also compute <em>discounted centralities</em>, in which the weight assigned to a node is some
+ * specified function of its distance. All centralities are computed in their <em>positive</em> version (i.e.,
+ * using distance <em>from</em> the source: see below how to compute the more usual, and useful, <em>negative</em> version).
+ *
+ * <p>HyperBall has been described by Paolo Boldi and Sebastiano Vigna in
+ * &ldquo;In-Core Computation of Geometric Centralities with HyperBall: A Hundred Billion Nodes and Beyond&rdquo;,
+ * <i>Proc. of 2013 IEEE 13th International Conference on Data Mining Workshops (ICDMW 2013)</i>, IEEE, 2013,
+ * and it is a generalization of the method described in &ldquo;HyperANF: Approximating the Neighbourhood Function of Very Large Graphs
+ * on a Budget&rdquo;, by Paolo Boldi, Marco Rosa and Sebastiano Vigna,
+ * <i>Proceedings of the 20th international conference on World Wide Web</i>, pages 625&ndash;634, ACM, 2011.
+ *
+ * <p>Incidentally, HyperBall (actually, HyperANF) has been used to show that Facebook has just <a href="http://vigna.dsi.unimi.it/papers.php#BBRFDS">four degrees of separation</a>.
+ *
+ * <p>At step <var>t</var>, for each node we (approximately) keep track (using {@linkplain HyperLogLogCounterArray HyperLogLog counters})
+ * of the set of nodes at distance at most <var>t</var>. At each iteration, the sets associated with the successors of each node are merged,
+ * thus obtaining the new sets. A crucial component in making this process efficient and scalable is the usage of
+ * <em>broadword programming</em> to implement the join (merge) phase, which requires maximising in parallel the list of registers associated with
+ * each successor (the implementation is geared towards 64-bit processors).
+ *
+ * <p>Using the approximate sets, for each <var>t</var> we estimate the number of pairs of nodes (<var>x</var>,<var>y</var>) such
+ * that the distance from <var>x</var> to <var>y</var> is at most <var>t</var>. Since during the computation we are also
+ * in possession of the number of nodes at distance at most <var>t</var> &minus; 1, we can also perform computations
+ * using the number of nodes at distance <em>exactly</em> <var>t</var> (e.g., centralities).
+ *
+ * <p>To use this class, you must first create an instance.
+ * Then, you call {@link #init()} (once) and then {@link #iterate()} as many times as needed (you can init/iterate several times, if you want).
+ * A {@linkplain #run(long, double) convenience method} will do everything for you.
+ * Finally, you <strong>must</strong> {@link #close()} the instance. The method {@link #modified()} will tell you how many counters
+ * changed during the last iteration (zero means that the internal state of the algorithm has stopped changing).
+ *
+ * <p>If you additionally pass to the constructor (or on the command line) the <em>transpose</em> of your graph (you can compute it using {@link Transform#transposeOffline(ImmutableGraph,int)}),
+ * when three quarters of the nodes stop changing their value
+ * HyperBall will switch to a <em>systolic</em> computation: using the transpose, when a node changes it will signal back
+ * to its predecessors that at the next iteration they could change. At the next scan, only the successors of
+ * signalled nodes will be scanned. In particular,
+ * when a very small number of nodes is modified by an iteration, HyperBall will switch to a systolic <em>local</em> mode,
+ * in which all information about modified nodes is kept in (traditional) dictionaries, rather than being represented as arrays of booleans.
+ * This strategy makes the last phases of the computation orders of magnitude faster, and in practice makes
+ * the running time of HyperBall proportional to the theoretical bound
+ * <i>O</i>(<var>m</var> log <var>n</var>), where <var>n</var>
+ * is the number of nodes and <var>m</var> is the number of arcs of the graph. Note that
+ * graphs with a large diameter require a correspondingly large number of iterations, and these iterations will have to
+ * pass over all nodes if you do not provide the transpose.
+ *
+ * <p>Deciding when to stop iterating is a rather delicate issue. The only safe way is to iterate until {@link #modified()} is zero,
+ * and systolic (local) computation makes this goal easily attainable.
+ * However, in some cases one can assume that the graph is not pathological, and stop when the relative increment of the number of pairs goes below
+ * some threshold.
+ *
+ * <h2>Computing Centralities</h2>
+ *
+ * <p>Note that usually one is interested in the <em>negative</em> version of a centrality measure, that is, the version
+ * that depends on the <em>incoming</em> arcs. HyperBall can compute only <em>positive</em> centralities: if you are
+ * interested (as it usually happens) in the negative version, you must pass to HyperBall the <em>transpose</em> of the graph
+ * (and if you want to run in systolic mode, the original graph, which is the transpose of the transpose). Note that the
+ * neighbourhood function of the transpose is identical to the neighbourhood function of the original graph, so the exchange
+ * does not alter its computation.
+ *
+ * <h2>Configuring the JVM</h2>
+ *
+ * <p>HyperBall computations go against all basic assumptions of Java garbage collection. It is thus
+ * essential that you reconfigure your JVM properly. A good starting point is the following command line:
+ * <pre>
+ * java -server -Xss256K -Xms100G -XX:PretenureSizeThreshold=512M -XX:MaxNewSize=4G \
+ * -XX:+UseNUMA -XX:+UseTLAB -XX:+ResizeTLAB \
+ * -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=99 -XX:+UseCMSInitiatingOccupancyOnly \
+ * -verbose:gc -Xloggc:gc.log ...
+ * </pre>
+ *
+ * <ul>
+ * <li><code>-Xss256K</code> reduces the stack memory used by each thread.
+ * <li><code>-Xms100G</code> sets the heap size: the more memory, the more registers per counter
+ * you can use (the amount, of course, depends on your hardware); note that we set the
+ * <em>starting</em> heap size because expanding large heaps is very expensive.
+ * <li><code>-XX:PretenureSizeThreshold=512M</code> forces the allocation of registers directly into the old generation.
+ * <li><code>-XX:MaxNewSize=4G</code> leaves almost all memory for the old generation.
+ * <li><code>-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=99 -XX:+UseCMSInitiatingOccupancyOnly</code>
+ * select the concurrent mark-sweep (CMS) garbage collector, and impose that no collection is performed until 99% of the old
+ * generation is filled (CMS was deprecated in JDK 9 and removed in JDK 14, so these options are not supported by recent JVMs).
+ * <li><code>-XX:+UseNUMA -XX:+UseTLAB -XX:+ResizeTLAB</code> usually improve performance, but your mileage may vary.
+ * </ul>
+ * <p>Check the garbage collector logs (<code>gc.log</code>) to be sure that your
+ * minor and major collections are very infrequent (as they should be).
+ *
+ * <h2>Performance issues</h2>
+ *
+ * <p>To use HyperBall effectively, you should aim at filling a large percentage of the available core memory. This requires,
+ * of course, sizing the heap properly, but also configuring some parameters.
+ *
+ * <p>Most of the memory goes into storing HyperLogLog registers. By tuning the number of registers per counter, you can
+ * modify the memory allocated for them. The amount of memory is logged, and you should check that the number of registers you
+ * chose almost fills up the heap memory you allocated, possibly leaving space for the graph(s) (but read below).
+ * Note that you can only choose a number of registers per counter that is
+ * a power of two, so your latitude in adjusting the memory used for registers is somewhat limited.
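+ *
+ * <p>Here is a rough estimate. Each counter has 2<sup><var>log2m</var></sup> registers of <var>registerSize</var>
+ * bits, and both values are logged at construction time. A counter therefore occupies
+ * 2<sup><var>log2m</var></sup>&nbsp;&middot;&nbsp;<var>registerSize</var>/8 bytes. In core mode the registers are
+ * stored twice, as an old and a new copy. The sketch below computes this figure, ignoring padding and all other
+ * data structures. The helper is hypothetical and is not part of this class:
+ * <pre>{@code
+ * static long registerBytes(final long numNodes, final int log2m, final int registerSize, final boolean external) {
+ *     final long bytes = numNodes * (1L << log2m) * registerSize / Byte.SIZE;
+ *     return external ? bytes : 2 * bytes; // core mode keeps an old and a new copy of the registers
+ * }
+ * }</pre>
+ * For example, assume five-bit registers. With 10<sup>9</sup> nodes and 64 registers per counter, the registers
+ * take 4&nbsp;&middot;&nbsp;10<sup>10</sup> bytes in external mode and twice as much in core mode. Moving to 128
+ * registers per counter doubles both figures.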
+ *
+ * <p>If you have little memory, this class can perform <em>external</em> computations: instead of keeping in core memory
+ * an old and a new copy of the counters, it can dump on disk an <em>update list</em> containing pairs &lt;<var>node</var>,&nbsp;<var>counter</var>&gt;.
+ * At the end of an iteration, the update list is loaded and applied to the counters in memory.
+ * The process is of course slower, but the core memory used is halved.
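+ *
+ * <p>External mode is selected by the <code>external</code> constructor parameter. The following sketch stores the
+ * update list on disk. It assumes that <code>graph</code> is already loaded and uses default values for the number
+ * of threads, the buffer size and the granularity:
+ * <pre>{@code
+ * final HyperBall hyperBall = new HyperBall(graph, null, 6, null, 0, 0, 0, true);
+ * }</pre>
+ * The update list is written to a temporary file created by {@link File#createTempFile(String, String)}, that is,
+ * in the directory given by the <code>java.io.tmpdir</code> system property. That directory should thus reside on a
+ * disk with enough free space.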
+ *
+ * <p>Then, some memory is necessary to load the graph (and possibly its transpose). We suggest using the offline
+ * option, which will map the graph into memory rather than loading it. If you map the graph into memory, take care to
+ * leave some free memory besides that allocated for the heap, as the operating system will need some space to buffer
+ * the memory-mapped graph(s).
+ *
+ * <p>If there are several available cores, the runs of {@link #iterate()} will be <em>decomposed</em> into relatively
+ * small tasks (small blocks of nodes) and each task will be assigned to the first available core. Since all tasks are completely
+ * independent, this behaviour ensures a very high degree of parallelism. Be careful, however, because this feature requires a graph with
+ * a reasonably fast random access (e.g., in the case of a {@link BVGraph}, short reference chains), as many
+ * calls to {@link ImmutableGraph#nodeIterator(long)} will be made. The <em>granularity</em> of the decomposition
+ * is the number of nodes assigned to each task.
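+ *
+ * <p>Unless exactly one thread is requested (in which case a single task covers all nodes), the granularity passed to
+ * the constructor is rounded up to the next multiple of {@link Long#SIZE}, and 0 selects {@link #DEFAULT_GRANULARITY}.
+ * The rounding is equivalent to the following sketch, in which the helper is hypothetical:
+ * <pre>{@code
+ * static long roundGranularity(final int granularity) {
+ *     // Adding 63 and clearing the six least significant bits rounds up to a multiple of 64.
+ *     return (granularity + Long.SIZE - 1) & ~(Long.SIZE - 1);
+ * }
+ * // roundGranularity(1000) == 1024, roundGranularity(1024) == 1024
+ * }</pre>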
+ *
+ * <p>In any case, when attacking very large graphs (in particular, in external mode) some system tuning (e.g.,
+ * increasing the filesystem commit time) is a good idea. Also experimenting with granularity and buffer sizes
+ * can be useful. Smaller buffers reduce the waits on I/O calls, but increase the time spent in disk seeks.
+ * Large buffers improve I/O, but they use a lot of memory. The best possible setup is the one in which
+ * the cores are 100% busy during the graph scan, and the I/O time
+ * logged at the end of a scan is roughly equal to the time that is necessary to reload the counters from disk:
+ * in such a case, essentially, you are computing as fast as possible.
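+ *
+ * <p>The buffer size is specified in bytes, but it is stored as a number of update-list entries. Each entry contains a
+ * node and a counter, that is, <code>counterLongwords + 1</code> longs. The conversion performed by the constructor
+ * can be sketched as follows, with a hypothetical helper:
+ * <pre>{@code
+ * static int bufferSizeInEntries(final int bufferSizeInBytes, final int counterLongwords) {
+ *     final int bytes = bufferSizeInBytes == 0 ? DEFAULT_BUFFER_SIZE : bufferSizeInBytes;
+ *     // Each entry is one long for the node plus counterLongwords longs for the counter.
+ *     return Math.max(1, bytes / ((Long.SIZE / Byte.SIZE) * (counterLongwords + 1)));
+ * }
+ * }</pre>
+ * For example, assume 64 five-bit registers per counter, that is, 320 bits or 5 longs. The default 4&nbsp;MiB buffer
+ * then holds 4194304&nbsp;/&nbsp;48 = 87381 entries.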
+ *
+ * @author Sebastiano Vigna
+ * @author Paolo Boldi
+ * @author Marco Rosa
+ */
+
+public class HyperBall extends HyperLogLogCounterArray implements SafelyCloseable {
+ private static final Logger LOGGER = LoggerFactory.getLogger(HyperBall.class);
+ public static final boolean ASSERTS = false;
+ private static final long serialVersionUID = 1L;
+
+ /** The default granularity of a task. */
+ public static final int DEFAULT_GRANULARITY = 16 * 1024;
+ /** The default size of a buffer in bytes. */
+ public static final int DEFAULT_BUFFER_SIZE = 4 * 1024 * 1024;
+ /** True if we have the transpose graph. */
+ protected final boolean gotTranspose;
+ /** True if we started a systolic computation. */
+ protected boolean systolic;
+ /** True if we are preparing a local computation (we are {@link #systolic} and fewer than 1% of the nodes were modified). */
+ protected boolean preLocal;
+ /** True if we started a local computation. */
+ protected boolean local;
+ /** Whether the sum of distances from each node (inverse of <strong>positive</strong> closeness centrality) should be computed; if false, {@link #sumOfDistances} is <code>null</code>. */
+ protected final boolean doSumOfDistances;
+ /** Whether the sum of inverse distances from each node (<strong>positive</strong> harmonic centrality) should be computed; if false, {@link #sumOfInverseDistances} is <code>null</code>. */
+ protected boolean doSumOfInverseDistances;
+ /** The neighbourhood function, if requested. */
+ public final DoubleArrayList neighbourhoodFunction;
+ /** The sum of the distances from every given node, if requested. */
+ public final float[][] sumOfDistances;
+ /** The sum of inverse distances from each given node, if requested. */
+ public final float[][] sumOfInverseDistances;
+ /** A number of discounted centralities to be computed, possibly none. */
+ public final Int2DoubleFunction[] discountFunction;
+ /** The overall discounted centrality, for every {@link #discountFunction}. */
+ public final float[][][] discountedCentrality;
+ /** The number of nodes of the graph, cached. */
+ protected final long numNodes;
+ /** The number of arcs of the graph, cached. */
+ protected long numArcs;
+ /** The square of {@link #numNodes}, cached. */
+ protected final double squareNumNodes;
+ /** The number of cores used in the computation. */
+ protected final int numberOfThreads;
+ /** The size of an I/O buffer, in counters. */
+ protected final int bufferSize;
+ /** The number of actually scanned nodes per task in a multithreaded environment. <strong>Must</strong> be a multiple of {@link Long#SIZE}. */
+ protected final long granularity;
+ /** The number of nodes per task (obtained by adapting {@link #granularity} to the current ratio of modified nodes). <strong>Must</strong> be a multiple of {@link Long#SIZE}. */
+ protected long adaptiveGranularity;
+ /** The value computed by the last iteration. */
+ protected double last;
+ /** The value computed by the current iteration. */
+ protected double current;
+ /** The current iteration. */
+ protected int iteration;
+ /** If {@link #external} is true, the name of the temporary file that will be used to write the update list. */
+ protected final File updateFile;
+ /** If {@link #external} is true, a file channel used to write to the update list. */
+ protected final FileChannel fileChannel;
+ /** If {@link #external} is true, the random-access file underlying {@link #fileChannel}. */
+ protected RandomAccessFile randomAccessFile;
+ /** The cumulative list of outdegrees. */
+ protected final EliasFanoCumulativeOutdegreeList cumulativeOutdegrees;
+ /** A progress logger, or <code>null</code>. */
+ protected final ProgressLogger pl;
+ /** The lock protecting all critical sections. */
+ protected final ReentrantLock lock;
+ /** A condition that is notified when all iteration threads are waiting to be started. */
+ protected final Condition allWaiting;
+ /** The condition on which all iteration threads wait before starting a new phase. */
+ protected final Condition start;
+ /** The current computation phase. */
+ public int phase;
+ /** Whether this approximator has already been closed. */
+ protected boolean closed;
+ /** The threads performing the computation. */
+ protected final IterationThread thread[];
+ /** An atomic long keeping track of the number of nodes processed so far. */
+ protected final AtomicLong nodes;
+ /** An atomic long keeping track of the number of arcs processed so far. */
+ protected final AtomicLong arcs;
+ /** A variable used to wait for all threads to complete their iteration. */
+ protected volatile int aliveThreads;
+ /** True if the computation is over. */
+ protected volatile boolean completed;
+ /** Total number of write operations performed on {@link #fileChannel}. */
+ protected volatile long numberOfWrites;
+ /** Total wait time in milliseconds of I/O activity on {@link #fileChannel}. */
+ protected volatile long totalIoMillis;
+ /** The starting node of the next chunk of nodes to be processed. */
+ protected long nextNode;
+ /** The number of arcs before {@link #nextNode}. */
+ protected long nextArcs;
+ /** The number of counters modified by the last call to {@link #iterate()}. */
+ protected final AtomicLong modified;
+ /** Counts the number of unwritten entries when {@link #external} is true, or
+ * the number of counters that did not change their value. */
+ protected final AtomicLong unwritten;
+ /** The relative increment of the neighbourhood function for the last iteration. */
+ protected double relativeIncrement;
+ /** Whether we should use an update list on disk, instead of computing results in core memory. */
+ protected boolean external;
+ /** If {@link #external} is false, the arrays where results are stored. */
+ protected final long[][] resultBits;
+ /** If {@link #external} is false, {@link #registerSize}-bit views of {@link #resultBits}. */
+ protected final LongBigList resultRegisters[];
+ /** For each counter, whether it has changed its value. We use an array of boolean (instead of a {@link LongArrayBitVector}) just for access speed. */
+ protected boolean[][] modifiedCounter;
+ /** For each newly computed counter, whether it has changed its value. {@link #modifiedCounter}
+ * will be updated with the content of this bit vector by the end of the iteration. */
+ protected boolean[][] modifiedResultCounter;
+ /** For each counter, whether it must be checked at the next iteration of a systolic computation. We use an array of boolean (instead of a {@link LongArrayBitVector}) just for access speed. */
+ protected boolean[][] nextMustBeChecked;
+ /** For each counter, whether it must be checked at the current iteration of a systolic computation. We use an array of boolean (instead of a {@link LongArrayBitVector}) just for access speed. */
+ protected boolean[][] mustBeChecked;
+ /** If {@link #local} is true, the sorted list of nodes that should be scanned. */
+ protected long[] localCheckList;
+ /** If {@link #preLocal} is true, the list of nodes that should be scanned on the next iteration. Note that this set is synchronized. */
+ protected final LongSet localNextMustBeChecked;
+ /** One of the throwables thrown by some of the threads, if at least one thread has thrown a throwable. */
+ protected volatile Throwable threadThrowable;
+
+ protected final static int ensureRegisters(final int log2m) {
+ if (log2m < 4) throw new IllegalArgumentException("There must be at least 16 registers per counter");
+ if (log2m > 60) throw new IllegalArgumentException("There can be at most 2^60 registers per counter");
+ return log2m;
+ }
+
+ /** Computes the number of threads.
+ *
+ * <p>If the specified number of threads is zero, {@link Runtime#availableProcessors()} will be returned.
+ *
+ * @param suggestedNumberOfThreads the suggested number of threads, or 0 for automatic sizing.
+ * @return the actual number of threads.
+ */
+ private final static int numberOfThreads(final int suggestedNumberOfThreads) {
+ if (suggestedNumberOfThreads != 0) return suggestedNumberOfThreads;
+ return Runtime.getRuntime().availableProcessors();
+ }
+
+ /** Creates a new HyperBall instance.
+ *
+ * @param g the graph whose neighbourhood function you want to compute.
+ * @param gt the transpose of <code>g</code> in case you want to perform systolic computations, or <code>null</code>.
+ * @param log2m the logarithm of the number of registers per counter.
+ * @param pl a progress logger, or <code>null</code>.
+ * @param numberOfThreads the number of threads to be used (0 for automatic sizing).
+ * @param bufferSize the size of an I/O buffer in bytes (0 for {@link #DEFAULT_BUFFER_SIZE}).
+ * @param granularity the number of nodes per task in a multicore environment (it will be rounded to the next multiple of 64), or 0 for {@link #DEFAULT_GRANULARITY}.
+ * @param external if true, results of an iteration will be stored on disk.
+ */
+ public HyperBall(final ImmutableGraph g, final ImmutableGraph gt, final int log2m, final ProgressLogger pl, final int numberOfThreads, final int bufferSize, final int granularity, final boolean external) throws IOException {
+ this(g, gt, log2m, pl, numberOfThreads, bufferSize, granularity, external, false, false, null, Util.randomSeed());
+ }
+
+ /** Creates a new HyperBall instance using default values.
+ *
+ * @param g the graph whose neighbourhood function you want to compute.
+ * @param gt the transpose of <code>g</code> in case you want to perform systolic computations, or <code>null</code>.
+ * @param log2m the logarithm of the number of registers per counter.
+ */
+ public HyperBall(final ImmutableGraph g, final ImmutableGraph gt, final int log2m) throws IOException {
+ this(g, gt, log2m, null, 0, 0, 0, false);
+ }
+
+ /** Creates a new HyperBall instance using default values.
+ *
+ * @param g the graph whose neighbourhood function you want to compute.
+ * @param gt the transpose of <code>g</code> in case you want to perform systolic computations, or <code>null</code>.
+ * @param log2m the logarithm of the number of registers per counter.
+ * @param pl a progress logger, or <code>null</code>.
+ */
+ public HyperBall(final ImmutableGraph g, final ImmutableGraph gt, final int log2m, final ProgressLogger pl) throws IOException {
+ this(g, gt, log2m, pl, 0, 0, 0, false);
+ }
+
+ /** Creates a new HyperBall instance using default values and disabling systolic computation.
+ *
+ * @param g the graph whose neighbourhood function you want to compute.
+ * @param log2m the logarithm of the number of registers per counter.
+ */
+ public HyperBall(final ImmutableGraph g, final int log2m) throws IOException {
+ this(g, null, log2m);
+ }
+
+ /** Creates a new HyperBall instance using default values and disabling systolic computation.
+ *
+ * @param g the graph whose neighbourhood function you want to compute.
+ * @param log2m the logarithm of the number of registers per counter.
+ * @param seed the random seed passed to {@link HyperLogLogCounterArray#HyperLogLogCounterArray(long, long, int, long)}.
+ */
+ public HyperBall(final ImmutableGraph g, final int log2m, final long seed) throws IOException {
+ this(g, null, log2m, null, 0, 0, 0, false, false, false, null, seed);
+ }
+
+ /** Creates a new HyperBall instance using default values and disabling systolic computation.
+ *
+ * @param g the graph whose neighbourhood function you want to compute.
+ * @param log2m the logarithm of the number of registers per counter.
+ * @param pl a progress logger, or <code>null</code>.
+ */
+ public HyperBall(final ImmutableGraph g, final int log2m, final ProgressLogger pl) throws IOException {
+ this(g, null, log2m, pl);
+ }
+
+
+ /** Creates a new HyperBall instance.
+ *
+ * @param g the graph whose neighbourhood function you want to compute.
+ * @param gt the transpose of <code>g</code>, or <code>null</code>.
+ * @param log2m the logarithm of the number of registers per counter.
+ * @param pl a progress logger, or <code>null</code>.
+ * @param numberOfThreads the number of threads to be used (0 for automatic sizing).
+ * @param bufferSize the size of an I/O buffer in bytes (0 for {@link #DEFAULT_BUFFER_SIZE}).
+ * @param granularity the number of nodes per task in a multicore environment (it will be rounded to the next multiple of 64), or 0 for {@link #DEFAULT_GRANULARITY}.
+ * @param external if true, results of an iteration will be stored on disk.
+ * @param doSumOfDistances whether the sum of distances from each node should be computed.
+ * @param doSumOfInverseDistances whether the sum of inverse distances from each node should be computed.
+ * @param discountFunction an array (possibly <code>null</code>) of discount functions.
+ * @param seed the random seed passed to {@link HyperLogLogCounterArray#HyperLogLogCounterArray(long, long, int, long)}.
+ */
+ public HyperBall(final ImmutableGraph g, final ImmutableGraph gt, final int log2m, final ProgressLogger pl, final int numberOfThreads, final int bufferSize, final int granularity, final boolean external, boolean doSumOfDistances, boolean doSumOfInverseDistances, final Int2DoubleFunction[] discountFunction, final long seed) throws IOException {
+ super(g.numNodes(), g.numNodes(), ensureRegisters(log2m), seed);
+
+ info("Seed : " + Long.toHexString(seed));
+
+ gotTranspose = gt != null;
+ localNextMustBeChecked = gotTranspose ? LongSets.synchronize(new LongOpenHashSet(Hash.DEFAULT_INITIAL_SIZE, Hash.VERY_FAST_LOAD_FACTOR)) : null;
+
+ numNodes = g.numNodes();
+ try {
+ numArcs = g.numArcs();
+ }
+ catch(UnsupportedOperationException e) {
+ // No number of arcs. We have to enumerate.
+ long a = 0;
+ final NodeIterator nodeIterator = g.nodeIterator();
+ for(long i = g.numNodes(); i-- != 0;) {
+ nodeIterator.nextLong();
+ a += nodeIterator.outdegree();
+ }
+ numArcs = a;
+ }
+ squareNumNodes = (double)numNodes * numNodes;
+
+ cumulativeOutdegrees = new EliasFanoCumulativeOutdegreeList(g, numArcs, Math.max(0, 64 / m - 1));
+
+ modifiedCounter = BooleanBigArrays.newBigArray(numNodes);
+ modifiedResultCounter = external ? null : BooleanBigArrays.newBigArray(numNodes);
+ if (gt != null) {
+ mustBeChecked = BooleanBigArrays.newBigArray(numNodes);
+ nextMustBeChecked = BooleanBigArrays.newBigArray(numNodes);
+ if (gt.numNodes() != g.numNodes()) throw new IllegalArgumentException("The graph and its transpose have a different number of nodes");
+ if (gt.numArcs() != g.numArcs()) throw new IllegalArgumentException("The graph and its transpose have a different number of arcs");
+ }
+
+ this.pl = pl;
+ this.external = external;
+ this.doSumOfDistances = doSumOfDistances;
+ this.doSumOfInverseDistances = doSumOfInverseDistances;
+ this.discountFunction = discountFunction == null ? new Int2DoubleFunction[0] : discountFunction;
+ this.numberOfThreads = numberOfThreads(numberOfThreads);
+ this.granularity = numberOfThreads == 1 ? numNodes : granularity == 0 ? DEFAULT_GRANULARITY : ((granularity + Long.SIZE - 1) & ~(Long.SIZE - 1));
+ this.bufferSize = Math.max(1, (bufferSize == 0 ? DEFAULT_BUFFER_SIZE : bufferSize) / ((Long.SIZE / Byte.SIZE) * (counterLongwords + 1)));
+
+ info("Relative standard deviation: " + Util.format(100 * HyperLogLogCounterArray.relativeStandardDeviation(log2m)) + "% (" + m + " registers/counter, " + registerSize + " bits/register, " + Util.format(m * registerSize / 8.) + " bytes/counter)");
+ if (external) info("Running " + this.numberOfThreads + " threads with a buffer of " + Util.formatSize(this.bufferSize) + " counters");
+ else info("Running " + this.numberOfThreads + " threads");
+
+ thread = new IterationThread[this.numberOfThreads];
+
+ if (external) {
+ info("Creating update list...");
+ updateFile = File.createTempFile(HyperBall.class.getName(), "-temp");
+ updateFile.deleteOnExit();
+ fileChannel = (randomAccessFile = new RandomAccessFile(updateFile, "rw")).getChannel();
+ }
+ else {
+ updateFile = null;
+ fileChannel = null;
+ }
+
+ nodes = new AtomicLong();
+ arcs = new AtomicLong();
+ modified = new AtomicLong();
+ unwritten = new AtomicLong();
+
+ neighbourhoodFunction = new DoubleArrayList();
+ sumOfDistances = doSumOfDistances ? FloatBigArrays.newBigArray(numNodes) : null;
+ sumOfInverseDistances = doSumOfInverseDistances ? FloatBigArrays.newBigArray(numNodes) : null;
+ discountedCentrality = new float[this.discountFunction.length][][];
+ for (int i = 0; i < this.discountFunction.length; i++) discountedCentrality[i] = FloatBigArrays.newBigArray(numNodes);
+
+ info("HyperBall memory usage: " + Util.formatSize2(usedMemory()) + " [not counting graph(s)]");
+
+ if (! external) {
+ info("Allocating result bit vectors...");
+ // Allocate vectors that will store the result.
+ resultBits = new long[bits.length][];
+ resultRegisters = new LongBigList[bits.length];
+ for(int i = bits.length; i-- != 0;) resultRegisters[i] = (LongArrayBitVector.wrap(resultBits[i] = new long[bits[i].length])).asLongBigList(registerSize);
+ }
+ else {
+ resultBits = null;
+ resultRegisters = null;
+ }
+
+ lock = new ReentrantLock();
+ allWaiting = lock.newCondition();
+ start = lock.newCondition();
+ aliveThreads = this.numberOfThreads;
+
+ if (this.numberOfThreads == 1) (thread[0] = new IterationThread(g, gt, 0)).start();
+ else for(int i = 0; i < this.numberOfThreads; i++) (thread[i] = new IterationThread(g.copy(), gt != null ? gt.copy() : null, i)).start();
+
+ // We wait for all threads being ready to start.
+ lock.lock();
+ try {
+ while(aliveThreads != 0) allWaiting.await();
+ }
+ catch (InterruptedException e) {
+ throw new RuntimeException(e);
+ }
+ finally {
+ lock.unlock();
+ }
+ }
+
+ private void info(String s) {
+ if (pl != null) pl.logger().info(s);
+ }
+
+ private long usedMemory() {
+ long bytes = 0;
+ for(long[] a: bits) bytes += a.length * ((long)Long.SIZE / Byte.SIZE);
+ if (! external) bytes *= 2;
+ // The following arrays are big arrays: we must count their elements, not their segments.
+ if (sumOfDistances != null) bytes += FloatBigArrays.length(sumOfDistances) * ((long)Float.SIZE / Byte.SIZE);
+ if (sumOfInverseDistances != null) bytes += FloatBigArrays.length(sumOfInverseDistances) * ((long)Float.SIZE / Byte.SIZE);
+ for (int i = discountFunction.length; i-- != 0;) bytes += FloatBigArrays.length(discountedCentrality[i]) * ((long)Float.SIZE / Byte.SIZE);
+ if (modifiedCounter != null) bytes += BooleanBigArrays.length(modifiedCounter);
+ if (modifiedResultCounter != null) bytes += BooleanBigArrays.length(modifiedResultCounter);
+ if (nextMustBeChecked != null) bytes += BooleanBigArrays.length(nextMustBeChecked);
+ if (mustBeChecked != null) bytes += BooleanBigArrays.length(mustBeChecked);
+ return bytes;
+ }
+
+ private void ensureOpen() {
+ if (closed) throw new IllegalStateException("This " + HyperBall.class.getSimpleName() + " has been closed.");
+ }
+
+ /** Initialises the approximator.
+ *
+ * <p>This method must be called before a series of {@linkplain #iterate() iterations}.
+ * Note that it will <em>not</em> change the seed used by the underlying {@link HyperLogLogCounterArray}.
+ *
+ * @see #init(long)
+ */
+ public void init() {
+ init(seed);
+ }
+
+ /** Initialises the approximator, providing a new seed to the underlying {@link HyperLogLogCounterArray}.
+ *
+ * <p>This method must be called before a series of {@linkplain #iterate() iterations}.
+ * @param seed the seed passed to {@link #clear(long)}.
+ */
+ public void init(final long seed) {
+ ensureOpen();
+ info("Clearing all registers...");
+ clear(seed);
+
+ // We load the counter i with node i.
+ for(long i = numNodes; i-- != 0;) add(i, i);
+
+ iteration = -1;
+ completed = systolic = local = preLocal = false;
+
+ if (! external) for(long[] a: resultBits) Arrays.fill(a, 0);
+
+ if (sumOfDistances != null) FloatBigArrays.fill(sumOfDistances, 0);
+ if (sumOfInverseDistances != null) FloatBigArrays.fill(sumOfInverseDistances, 0);
+ for (int i = 0; i < discountFunction.length; i++) FloatBigArrays.fill(discountedCentrality[i], 0);
+
+ // The initial value (the iteration for this value does not actually happen).
+ neighbourhoodFunction.add(last = numNodes);
+
+ BooleanBigArrays.fill(modifiedCounter, true); // Initially, all counters are modified.
+
+ if (pl != null) {
+ pl.displayFreeMemory = true;
+ pl.itemsName = "iterates";
+ pl.start("Iterating...");
+ }
+ }
+
+ @Override
+ public void close() throws IOException {
+ if (closed) return;
+ closed = true;
+
+ lock.lock();
+ try {
+ completed = true;
+ for(IterationThread t: thread) t.threadShouldWait = false;
+ start.signalAll();
+ }
+ finally {
+ lock.unlock();
+ }
+
+ for(Thread t: thread)
+ try {
+ t.join();
+ }
+ catch (InterruptedException e) {
+ throw new RuntimeException(e);
+ }
+
+ if (external) {
+ randomAccessFile.close();
+ fileChannel.close();
+ updateFile.delete();
+ }
+ }
+
+ @Override
+ protected void finalize() throws Throwable {
+ try {
+ if (! closed) {
+ LOGGER.warn("This " + this.getClass().getName() + " [" + toString() + "] should have been closed.");
+ close();
+ }
+ }
+ finally {
+ super.finalize();
+ }
+ }
+
+
+ private final class IterationThread extends Thread {
+ /** A copy of the graph for this thread only. */
+ private final ImmutableGraph g;
+ /** A copy of the transpose graph for this thread only. */
+ private final ImmutableGraph gt;
+ /** The index of this thread (just used to identify the thread). */
+ private final int index;
+ /** True if we should wait for the end of the current phase. */
+ private boolean threadShouldWait;
+
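+ /** Creates a new iteration thread.
+ * @param g a copy of the graph for this thread only.
+ * @param gt a copy of the transpose graph for this thread only, or <code>null</code>.
+ * @param index the index of this thread (just used to identify the thread).
+ */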
+ private IterationThread(final ImmutableGraph g, ImmutableGraph gt, final int index) {
+ this.g = g;
+ this.gt = gt;
+ this.index = index;
+ }
+
+ private final boolean synchronize(final int phase) throws InterruptedException {
+ lock.lock();
+ try {
+ threadShouldWait = true;
+ if (--aliveThreads == 0) allWaiting.signal();
+ if (aliveThreads < 0) throw new IllegalStateException();
+ while (threadShouldWait) start.await();
+ if (completed) return true;
+ if (phase != HyperBall.this.phase) throw new IllegalStateException("Main thread is in phase " + HyperBall.this.phase + ", but thread " + index + " is heading to phase " + phase);
+ return false;
+ }
+ finally {
+ lock.unlock();
+ }
+ }
+
+ @Override
+ public void run() {
+ try {
+ // Lots of local caching.
+ final int registerSize = HyperBall.this.registerSize;
+ final int counterLongwords = HyperBall.this.counterLongwords;
+ final boolean external = HyperBall.this.external;
+ final ImmutableGraph g = this.g;
+ final boolean doSumOfDistances = HyperBall.this.doSumOfDistances;
+ final boolean doSumOfInverseDistances = HyperBall.this.doSumOfInverseDistances;
+ final int numberOfDiscountFunctions = HyperBall.this.discountFunction.length;
+ final boolean doCentrality = doSumOfDistances || doSumOfInverseDistances || numberOfDiscountFunctions != 0;
+
+ final long[] accumulator = new long[counterLongwords];
+ final long[] mask = new long[counterLongwords];
+
+ final long t[] = new long[counterLongwords];
+ final long prevT[] = new long[counterLongwords];
+ final long u[] = new long[counterLongwords];
+
+ final ByteBuffer byteBuffer = external ? ByteBuffer.allocate((Long.SIZE / Byte.SIZE) * bufferSize * (counterLongwords + 1)) : null;
+ if (external) byteBuffer.clear();
+
+ for(;;) {
+
+ if (synchronize(0)) return;
+
+ // These variables might change across executions of the loop body.
+ final long granularity = HyperBall.this.adaptiveGranularity;
+ final long arcGranularity = (long)Math.ceil((double)numArcs * granularity / numNodes);
+ final long bits[][] = HyperBall.this.bits;
+ final long resultBits[][] = HyperBall.this.resultBits;
+ final boolean[][] modifiedCounter = HyperBall.this.modifiedCounter;
+ final boolean[][] modifiedResultCounter = HyperBall.this.modifiedResultCounter;
+ final boolean[][] mustBeChecked = HyperBall.this.mustBeChecked;
+ final boolean[][] nextMustBeChecked = HyperBall.this.nextMustBeChecked;
+ final boolean systolic = HyperBall.this.systolic;
+ final boolean local = HyperBall.this.local;
+ final boolean preLocal = HyperBall.this.preLocal;
+ final int localCheckShift = 6 - log2m;
+ final long[] localCheckList = HyperBall.this.localCheckList;
+ final LongSet localNextMustBeChecked = HyperBall.this.localNextMustBeChecked;
+
+ long start = -1;
+ long end = -1;
+ long modified = 0; // The number of counters that have been modified during the computation of the present task.
+ long unwritten = 0; // The number of counters not written to disk.
+
+ // In a local computation tasks are based on the content of localCheckList.
+ long upperLimit = local ? localCheckList.length : numNodes;
+
+ /* During standard iterations, cumulates the neighbourhood function for the nodes scanned
+ * by this thread. During systolic iterations, cumulates the *increase* of the
+ * neighbourhood function for the nodes scanned by this thread. */
+ final KahanSummation neighbourhoodFunctionDelta = new KahanSummation();
+
+ for(;;) {
+
+ // Try to get another piece of work.
+ synchronized(HyperBall.this.cumulativeOutdegrees) {
+ if (nextNode == upperLimit) break;
+ start = nextNode;
+ if (local) {
+ nextNode++;
+ if (log2m < 6) {
+ /* We cannot split the list unless the boundary crosses a
+ * multiple of 1 << localCheckShift. Otherwise, we might create
+ * race conditions with other threads. */
+ while(nextNode < upperLimit) {
+ if (((localCheckList[(int)(nextNode - 1)] ^ localCheckList[(int)nextNode]) >>> localCheckShift) != 0) break;
+ nextNode++;
+ }
+ }
+ }
+ else {
+ final long target = nextArcs + arcGranularity;
+ if (target >= numArcs) nextNode = numNodes;
+ else {
+ nextArcs = cumulativeOutdegrees.skipTo(target);
+ nextNode = cumulativeOutdegrees.currentIndex();
+ }
+ }
+ end = nextNode;
+ }
+
+ final NodeIterator nodeIterator = local || systolic ? null : g.nodeIterator(start);
+ long arcs = 0;
+
+ for(long i = start; i < end; i++) {
+ final long node = local ? localCheckList[(int)i] : i;
+ /* The three cases in which we enumerate successors:
+ * 1) A non-systolic computation (we don't know anything, so we enumerate).
+ * 2) A systolic, local computation (the node is by definition to be checked, as it comes from the local check list).
+ * 3) A systolic, non-local computation in which the node should be checked.
+ */
+ if (! systolic || local || BooleanBigArrays.get(mustBeChecked, node)) {
+ long d;
+ long[][] successor = null;
+ LazyLongIterator successors = null;
+
+ if (local || systolic) {
+ d = g.outdegree(node);
+ successors = g.successors(node);
+ }
+ else {
+ nodeIterator.nextLong();
+ d = nodeIterator.outdegree();
+ successor = nodeIterator.successorBigArray();
+ }
+
+ final int chunk = chunk(node);
+ getCounter(bits[chunk], node, t);
+ // Caches t's values into prevT
+ System.arraycopy(t, 0, prevT, 0, counterLongwords);
+
+ boolean counterModified = false;
+
+ for(long j = d; j-- != 0;) {
+ final long s = local || systolic ? successors.nextLong() : LongBigArrays.get(successor, j);
+ /* Neither self-loops nor unmodified counters influence the computation. */
+ if (s != node && BooleanBigArrays.get(modifiedCounter, s)) {
+ counterModified = true; // This is just to mark that we entered the loop at least once.
+ getCounter(bits[chunk(s)], s, u);
+ max(t, u, accumulator, mask);
+ }
+ }
+
+ arcs += d;
+
+ if (ASSERTS) {
+ LongBigList test = LongArrayBitVector.wrap(t).asLongBigList(registerSize);
+ for(int rr = 0; rr < m; rr++) {
+ int max = (int)registers[chunk(node)].getLong((node << log2m) + rr);
+ if (local || systolic) successors = g.successors(node);
+ for(long j = d; j-- != 0;) {
+ final long s = local || systolic ? successors.nextLong() : LongBigArrays.get(successor, j);
+ max = Math.max(max, (int)registers[chunk(s)].getLong((s << log2m) + rr));
+ }
+ assert max == test.getLong(rr) : max + "!=" + test.getLong(rr) + " [" + rr + "]";
+ }
+ }
+
+ if (counterModified) {
+ /* If we enter this branch, we have maximised with at least one successor.
+ * We must thus check explicitly whether we have modified the counter. */
+ counterModified = false;
+ for(int p = counterLongwords; p-- != 0;)
+ if (prevT[p] != t[p]) {
+ counterModified = true;
+ break;
+ }
+ }
+
+ double post = Double.NaN;
+
+ /* We need the counter value only if the iteration is standard (as we're going to
+ * compute the neighbourhood function cumulating actual values, and not deltas) or
+ * if the counter was actually modified (as we're going to cumulate the neighbourhood
+ * function delta, or at least some centrality). */
+ if (! systolic || counterModified) post = count(t, 0);
+ if (! systolic) neighbourhoodFunctionDelta.add(post);
+
+ // Here counterModified is true only if the counter was *actually* modified.
+ if (counterModified && (systolic || doCentrality)) {
+ final double pre = count(node);
+ if (systolic) {
+ neighbourhoodFunctionDelta.add(-pre);
+ neighbourhoodFunctionDelta.add(post);
+ }
+
+ if (doCentrality) {
+ final double delta = post - pre;
+ // Note that this code is executed only for distances > 0.
+ if (delta > 0) { // Force monotonicity
+ if (doSumOfDistances) FloatBigArrays.add(sumOfDistances, node, (float)(delta * (iteration + 1)));
+ if (doSumOfInverseDistances) FloatBigArrays.add(sumOfInverseDistances, node, (float)(delta / (iteration + 1)));
+ for (int j = numberOfDiscountFunctions; j-- != 0;) FloatBigArrays.add(discountedCentrality[j], node, (float)(delta * discountFunction[j].get(iteration + 1)));
+ }
+ }
+ }
+
+ if (counterModified) {
+ /* We keep track of modified counters in the result if we are
+ * not in external mode (in external mode modified counters are
+ * computed when the update list is reloaded). Note that we must
+ * add the current node to the must-be-checked set for the next
+ * local iteration if it is modified, as it might need a copy to
+ * the result array at the next iteration. */
+ if (preLocal) localNextMustBeChecked.add(node);
+ if (! external) BooleanBigArrays.set(modifiedResultCounter, node, true);
+
+ if (systolic) {
+ final LazyLongIterator predecessors = gt.successors(node);
+ long p;
+ /* In systolic computations we must keep track of which counters must
+ * be checked on the next iteration. If we are preparing a local computation,
+ * we do this explicitly, by adding the predecessors of the current
+ * node to a set. Otherwise, we do this implicitly, by setting the
+ * corresponding entry in an array. */
+ if (preLocal) while((p = predecessors.nextLong()) != -1) localNextMustBeChecked.add(p);
+ else while((p = predecessors.nextLong()) != -1) BooleanBigArrays.set(nextMustBeChecked, p, true);
+ }
+
+ modified++;
+ }
+
+ if (external) {
+ if (counterModified) {
+ byteBuffer.putLong(node);
+ for(int p = counterLongwords; p-- != 0;) byteBuffer.putLong(t[p]);
+
+ if (! byteBuffer.hasRemaining()) {
+ byteBuffer.flip();
+ long time = -System.currentTimeMillis();
+ fileChannel.write(byteBuffer);
+ time += System.currentTimeMillis();
+ totalIoMillis += time;
+ numberOfWrites++;
+ byteBuffer.clear();
+ }
+ }
+ else unwritten++;
+ }
+ else {
+ /* This is slightly subtle: if a counter is not modified, and
+ * the present value was not a modified value in the first place,
+ * then we can avoid updating the result altogether. */
+ if (counterModified || BooleanBigArrays.get(modifiedCounter, node)) setCounter(t, resultBits[chunk], node);
+ else unwritten++;
+ }
+ }
+ else if (! external) {
+ /* Even if we cannot possibly have changed our value, still our copy
+ * in the result vector might need to be updated because it does not
+ * reflect our current value. */
+ if (BooleanBigArrays.get(modifiedCounter, node)) {
+ final int chunk = chunk(node);
+ transfer(bits[chunk], resultBits[chunk], node);
+ }
+ else unwritten++;
+ }
+ }
+
+ // Update the global progress counter.
+ HyperBall.this.arcs.addAndGet(arcs);
+ nodes.addAndGet(end - start);
+ }
+
+ if (external) {
+ // Skip the call to FileChannel.write() entirely if the buffer is empty.
+ if(byteBuffer.position() != 0) {
+ byteBuffer.flip();
+ long time = -System.currentTimeMillis();
+ fileChannel.write(byteBuffer);
+ time += System.currentTimeMillis();
+ totalIoMillis += time;
+ numberOfWrites++;
+ byteBuffer.clear();
+ }
+ }
+
+ HyperBall.this.modified.addAndGet(modified);
+ HyperBall.this.unwritten.addAndGet(unwritten);
+
+ synchronized(HyperBall.this) {
+ current += neighbourhoodFunctionDelta.value();
+ }
+
+ if (external) {
+ synchronize(1);
+ /* Read into memory newly computed counters, updating modifiedCounter.
+ * Note that if m is less than 64, copyFromLocal(), being unsynchronised, might
+ * cause race conditions (when maximising each thread writes in a longword-aligned
+ * block of memory, so no race conditions can arise). Since synchronisation would
+ * lead to significant contention (as we cannot synchronise at a level finer than
+ * a bit vector, and update lists might be quite dense and local), we prefer simply
+ * to do the update with thread 0 only. */
+ if (index == 0 || m >= Long.SIZE) for(;;) {
+ byteBuffer.clear();
+ if (fileChannel.read(byteBuffer) <= 0) break;
+ byteBuffer.flip();
+ while(byteBuffer.hasRemaining()) {
+ final long node = byteBuffer.getLong();
+ for(int p = counterLongwords; p-- != 0;) t[p] = byteBuffer.getLong();
+ setCounter(t, bits[chunk(node)], node);
+ BooleanBigArrays.set(modifiedCounter, node, true);
+ }
+ }
+ }
+ }
+ }
+ catch(Throwable t) {
+ t.printStackTrace();
+ threadThrowable = t;
+ lock.lock();
+ try {
+ if (--aliveThreads == 0) allWaiting.signal();
+ }
+ finally {
+ lock.unlock();
+ }
+ }
+ }
+
+ @Override
+ public String toString() {
+ return "Thread " + index;
+ }
+ }
+
+ /** Performs a new iteration of HyperBall. */
+ public void iterate() throws IOException {
+ ensureOpen();
+ try {
+ iteration++;
+
+ // Let us record whether the previous computation was systolic or local.
+ final boolean previousWasSystolic = systolic;
+ final boolean previousWasLocal = local;
+
+ /* If less than half of the nodes have been modified, and we have the transpose,
+ * it is time to switch to a systolic computation. */
+ systolic = gotTranspose && iteration > 0 && modified.get() < numNodes / 2;
+
+ /* Non-systolic computations add up the values of all counters.
+ * Systolic computations modify the last value by compensating for each modified counter. */
+ current = systolic ? last : 0;
+
+ // If we completed the last iteration in pre-local mode, we MUST run in local mode.
+ local = preLocal;
+
+ // We run in pre-local mode if we are systolic and few nodes were modified.
+ preLocal = systolic && modified.get() < .1 * numNodes * numNodes / numArcs;
+
+ info("Starting " + (systolic ? "systolic iteration (local: " + local + "; pre-local: " + preLocal + ")" : "standard " + (external ? "external " : "") + "iteration"));
+
+ if (! external) {
+ if (previousWasLocal) for(long x: localCheckList) BooleanBigArrays.set(modifiedResultCounter, x, false);
+ else BooleanBigArrays.fill(modifiedResultCounter, false);
+ }
+
+ if (local) {
+ /* In case of a local computation, we convert the set of must-be-checked for the
+ * next iteration into a check list. */
+ localCheckList = localNextMustBeChecked.toLongArray();
+ LongArrays.parallelQuickSort(localCheckList);
+ localNextMustBeChecked.clear();
+ }
+ else if (systolic) {
+ // Systolic, non-local computations store the could-be-modified set implicitly into this array.
+ BooleanBigArrays.fill(nextMustBeChecked, false);
+ // If the previous computation wasn't systolic, we must assume that all registers could have changed.
+ if (! previousWasSystolic) BooleanBigArrays.fill(mustBeChecked, true);
+ }
+
+ adaptiveGranularity = granularity;
+ if (numberOfThreads > 1 && ! local) {
+ if (iteration > 0) {
+ adaptiveGranularity = (long)Math.min(Math.max(1, numNodes / numberOfThreads), granularity * (numNodes / Math.max(1., modified())));
+ adaptiveGranularity = (adaptiveGranularity + Long.SIZE - 1) & ~(Long.SIZE - 1);
+ }
+ info("Adaptive granularity for this iteration: " + adaptiveGranularity);
+ }
+
+ modified.set(0);
+ totalIoMillis = 0;
+ numberOfWrites = 0;
+ final ProgressLogger npl = pl == null ? null : new ProgressLogger(LOGGER, 1, TimeUnit.MINUTES, "arcs");
+
+ if (npl != null) {
+ arcs.set(0);
+ npl.expectedUpdates = systolic || local ? -1 : numArcs;
+ npl.start("Scanning graph...");
+ }
+
+ nodes.set(0);
+ nextNode = nextArcs = 0;
+ unwritten.set(0);
+ if (external) fileChannel.position(0);
+
+ // Start all threads.
+ lock.lock();
+ try {
+ phase = 0;
+ aliveThreads = numberOfThreads;
+ for(IterationThread t: thread) t.threadShouldWait = false;
+ start.signalAll();
+
+ // Wait for all threads to complete their tasks, logging progress in the meantime.
+ while(aliveThreads != 0) {
+ allWaiting.await(1, TimeUnit.MINUTES);
+ if (threadThrowable != null) throw new RuntimeException(threadThrowable);
+ final int aliveThreads = this.aliveThreads;
+ if (npl != null && aliveThreads != 0) {
+ if (arcs.longValue() != 0) npl.set(arcs.longValue());
+ if (external && numberOfWrites > 0) {
+ final long time = npl.millis();
+ info("Writes: " + numberOfWrites + "; per second: " + Util.format(1000.0 * numberOfWrites / time));
+ info("I/O time: " + Util.format((totalIoMillis / 1000.0)) + "s; per write: " + (totalIoMillis / 1000.0) / numberOfWrites + "s");
+ }
+ if (aliveThreads != 0) info("Alive threads: " + aliveThreads + " (" + Util.format(100.0 * aliveThreads / numberOfThreads) + "%)");
+ }
+ }
+ }
+ finally {
+ lock.unlock();
+ }
+
+ if (npl != null) {
+ npl.done(arcs.longValue());
+ if (! external) info("Unwritten counters: " + Util.format(unwritten.longValue()) + " (" + Util.format(100.0 * unwritten.longValue() / numNodes) + "%)");
+ info("Unmodified counters: " + Util.format(numNodes - modified.longValue()) + " (" + Util.format(100.0 * (numNodes - modified.longValue()) / numNodes) + "%)");
+ }
+
+
+ if (external) {
+ if (npl != null) {
+ npl.itemsName = "counters";
+ npl.start("Updating counters...");
+ }
+
+ // Read into memory the newly computed counters.
+
+ fileChannel.truncate(fileChannel.position());
+ fileChannel.position(0);
+
+ // In pre-local mode, we do not clear modified counters.
+ if (! preLocal) BooleanBigArrays.fill(modifiedCounter, false);
+
+ lock.lock();
+ try {
+ phase = 1;
+ aliveThreads = numberOfThreads;
+ for(IterationThread t: thread) t.threadShouldWait = false;
+ start.signalAll();
+ // Wait for all threads to complete the counter update.
+ while (aliveThreads != 0) allWaiting.await();
+ if (threadThrowable != null) throw new RuntimeException(threadThrowable);
+ }
+ finally {
+ lock.unlock();
+ }
+
+ if (npl != null) {
+ npl.count = modified();
+ npl.done();
+ }
+ }
+ else {
+ // Switch the bit vectors.
+ for(int i = 0; i < bits.length; i++) {
+ if (npl != null) npl.update(bits[i].length);
+ final LongBigList r = registers[i];
+ registers[i] = resultRegisters[i];
+ resultRegisters[i] = r;
+ final long[] b = bits[i];
+ bits[i] = resultBits[i];
+ resultBits[i] = b;
+ }
+
+ // Switch modifiedCounters and modifiedResultCounters, and fill with zeroes the latter.
+ final boolean[][] t = modifiedCounter;
+ modifiedCounter = modifiedResultCounter;
+ modifiedResultCounter = t;
+ }
+
+ if (systolic) {
+ // Switch mustBeChecked and nextMustBeChecked, and fill with zeroes the latter.
+ final boolean[][] t = mustBeChecked;
+ mustBeChecked = nextMustBeChecked;
+ nextMustBeChecked = t;
+ }
+
+ last = current;
+ /* We enforce monotonicity. Non-monotonicity can only be caused
+ * by approximation errors. */
+ final double lastOutput = neighbourhoodFunction.getDouble(neighbourhoodFunction.size() - 1);
+ if (current < lastOutput) current = lastOutput;
+ relativeIncrement = current / lastOutput;
+
+ if (pl != null) {
+ pl.logger().info("Pairs: " + current + " (" + current * 100.0 / squareNumNodes + "%)");
+ pl.logger().info("Absolute increment: " + (current - lastOutput));
+ pl.logger().info("Relative increment: " + relativeIncrement);
+ }
+
+ neighbourhoodFunction.add(current);
+
+ if (pl != null) pl.updateAndDisplay();
+ }
+ catch (InterruptedException e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+ /** Returns the number of HyperLogLog counters that were modified by the last call to {@link #iterate()}.
+ *
+ * @return the number of HyperLogLog counters that were modified by the last call to {@link #iterate()}.
+ */
+ public long modified() {
+ return modified.get();
+ }
+
+ /** Runs HyperBall. The computation will stop when {@link #modified()} returns zero. */
+ public void run() throws IOException {
+ run(Long.MAX_VALUE);
+ }
+
+ /** Runs HyperBall.
+ *
+ * @param upperBound an upper bound to the number of iterations.
+ */
+ public void run(final long upperBound) throws IOException {
+ run(upperBound, -1);
+ }
+
+ /** Runs HyperBall.
+ *
+ * @param upperBound an upper bound to the number of iterations.
+ * @param threshold a value that will be used to stop the computation by relative increment if the neighbourhood function is being computed; if you specify -1,
+ * the computation will stop when {@link #modified()} returns zero.
+ */
+ public void run(long upperBound, final double threshold) throws IOException {
+ run(upperBound, threshold, seed);
+ }
+
+ /** Runs HyperBall.
+ *
+ * @param upperBound an upper bound to the number of iterations.
+ * @param threshold a value that will be used to stop the computation by relative increment if the neighbourhood function is being computed; if you specify -1,
+ * the computation will stop when {@link #modified()} returns zero.
+ * @param seed the random seed passed to {@link HyperLogLogCounterArray#HyperLogLogCounterArray(long, long, int, long)}.
+ */
+ public void run(long upperBound, final double threshold, final long seed) throws IOException {
+ upperBound = Math.min(upperBound, numNodes);
+
+ init(seed);
+
+ for(long i = 0; i < upperBound; i++) {
+ iterate();
+
+ if (modified() == 0) {
+ info("Terminating approximation after " + i + " iteration(s) by stabilisation");
+ break;
+ }
+
+ if (i > 3 && relativeIncrement < (1 + threshold)) {
+ info("Terminating approximation after " + i + " iteration(s) by relative bound on the neighbourhood function");
+ break;
+ }
+ }
+
+ if (pl != null) pl.done();
+ }
+
+ /** Throws a {@link NotSerializableException}, as this class implements {@link Serializable}
+ * because it extends {@link HyperLogLogCounterArray}, but it is not actually serializable. */
+ private void writeObject(@SuppressWarnings("unused") final ObjectOutputStream oos) throws IOException {
+ throw new NotSerializableException();
+ }
+
+
+ public static void main(String arg[]) throws IOException, JSAPException, IllegalArgumentException, ClassNotFoundException, IllegalAccessException, InvocationTargetException, InstantiationException, NoSuchMethodException {
+ SimpleJSAP jsap = new SimpleJSAP(HyperBall.class.getName(), "Runs HyperBall on the given graph, possibly computing positive geometric centralities.\n\nPlease note that to compute negative centralities on directed graphs (which is usually what you want) you have to compute positive centralities on the transpose.",
+ new Parameter[] {
+ new FlaggedOption("log2m", JSAP.INTEGER_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, 'l', "log2m", "The logarithm of the number of registers."),
+ new FlaggedOption("upperBound", JSAP.LONGSIZE_PARSER, Long.toString(Long.MAX_VALUE), JSAP.NOT_REQUIRED, 'u', "upper-bound", "An upper bound to the number of iterations."),
+ new FlaggedOption("threshold", JSAP.DOUBLE_PARSER, "-1", JSAP.NOT_REQUIRED, 't', "threshold", "A threshold that will be used to stop the computation by relative increment. If it is -1, the iteration will stop only when no register changes its value (recommended)."),
+ new FlaggedOption("threads", JSAP.INTSIZE_PARSER, "0", JSAP.NOT_REQUIRED, 'T', "threads", "The number of threads to be used. If 0, the number will be estimated automatically."),
+ new FlaggedOption("granularity", JSAP.INTSIZE_PARSER, Integer.toString(DEFAULT_GRANULARITY), JSAP.NOT_REQUIRED, 'g', "granularity", "The number of nodes per task in a multicore environment."),
+ new FlaggedOption("bufferSize", JSAP.INTSIZE_PARSER, Util.formatBinarySize(DEFAULT_BUFFER_SIZE), JSAP.NOT_REQUIRED, 'b', "buffer-size", "The size of an I/O buffer in bytes."),
+ new FlaggedOption("neighbourhoodFunction", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'n', "neighbourhood-function", "Store an approximation of the neighbourhood function in text format."),
+ new FlaggedOption("sumOfDistances", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'd', "sum-of-distances", "Store an approximation of the sum of distances from each node as a binary list of floats."),
+ new FlaggedOption("harmonicCentrality", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'h', "harmonic-centrality", "Store an approximation of the positive harmonic centrality (the sum of the reciprocals of distances from each node) as a binary list of floats."),
+ new FlaggedOption("discountedGainCentrality", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'z', "discounted-gain-centrality", "A positive discounted gain centrality to be approximated and stored; it is specified as O:F where O is the spec of an object of type Int2DoubleFunction and F is the name of the file where the binary list of floats will be stored. The spec can be either the name of a public field of HyperBall, or a constructor invocation of a class implementing Int2DoubleFunction.").setAllowMultipleDeclarations(true),
+ new FlaggedOption("closenessCentrality", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'c', "closeness-centrality", "Store an approximation of the positive closeness centrality of each node (the reciprocal of the sum of the distances from each node) as a binary list of floats. Terminal nodes will have centrality equal to zero."),
+ new FlaggedOption("linCentrality", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'L', "lin-centrality", "Store an approximation of the positive Lin centrality of each node (the reciprocal of the sum of the distances from each node multiplied by the square of the number of nodes reachable from the node) as a binary list of floats. Terminal nodes will have centrality equal to one."),
+ new FlaggedOption("nieminenCentrality", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'N', "nieminen-centrality", "Store an approximation of the positive Nieminen centrality of each node (the square of the number of nodes reachable from each node minus the sum of the distances from the node) as a binary list of floats."),
+ new FlaggedOption("reachable", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'r', "reachable", "Store an approximation of the number of nodes reachable from each node as a binary list of floats."),
+ new FlaggedOption("seed", JSAP.LONG_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'S', "seed", "The random seed."),
+ new Switch("spec", 's', "spec", "The basename is not a basename but rather a specification of the form <ImmutableGraphImplementation>(arg,arg,...)."),
+ new Switch("offline", 'o', "offline", "Do not load the graph in main memory. If this option is used, the graph will be loaded in offline (for one thread) or mapped (for several threads) mode."),
+ new Switch("external", 'e', "external", "Use an external dump file instead of core memory to store new counter values. Note that the file might be very large: you might need to set the Java temporary directory appropriately (-Djava.io.tmpdir=DIR)."),
+ new UnflaggedOption("basename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the graph."),
+ new UnflaggedOption("basenamet", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, JSAP.NOT_GREEDY, "The basename of the transpose graph for systolic computations (strongly suggested). If it is equal to <basename>, the graph will be assumed to be symmetric and will be loaded just once."),
+ }
+ );
+
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final boolean spec = jsapResult.getBoolean("spec");
+ final boolean external = jsapResult.getBoolean("external");
+ final boolean offline = jsapResult.getBoolean("offline");
+ final String neighbourhoodFunctionFile = jsapResult.getString("neighbourhoodFunction");
+ final boolean neighbourhoodFunction = jsapResult.userSpecified("neighbourhoodFunction");
+ final String sumOfDistancesFile = jsapResult.getString("sumOfDistances");
+ final boolean sumOfDistances = jsapResult.userSpecified("sumOfDistances");
+ final String harmonicCentralityFile = jsapResult.getString("harmonicCentrality");
+ final boolean harmonicCentrality = jsapResult.userSpecified("harmonicCentrality");
+ final String closenessCentralityFile = jsapResult.getString("closenessCentrality");
+ final boolean closenessCentrality = jsapResult.userSpecified("closenessCentrality");
+ final String linCentralityFile = jsapResult.getString("linCentrality");
+ final boolean linCentrality = jsapResult.userSpecified("linCentrality");
+ final String nieminenCentralityFile = jsapResult.getString("nieminenCentrality");
+ final boolean nieminenCentrality = jsapResult.userSpecified("nieminenCentrality");
+ final String reachableFile = jsapResult.getString("reachable");
+ final boolean reachable = jsapResult.userSpecified("reachable");
+ final String basename = jsapResult.getString("basename");
+ final String basenamet = jsapResult.getString("basenamet");
+ final ProgressLogger pl = new ProgressLogger(LOGGER);
+ final int log2m = jsapResult.getInt("log2m");
+ final int threads = jsapResult.getInt("threads");
+ final int bufferSize = jsapResult.getInt("bufferSize");
+ final int granularity = jsapResult.getInt("granularity");
+ final long seed = jsapResult.userSpecified("seed") ? jsapResult.getLong("seed") : Util.randomSeed();
+
+ final String[] discountedGainCentralitySpec = jsapResult.getStringArray("discountedGainCentrality");
+ final Int2DoubleFunction[] discountFunction = new Int2DoubleFunction[discountedGainCentralitySpec.length];
+ final String[] discountedGainCentralityFile = new String[discountedGainCentralitySpec.length];
+ for (int i = 0; i < discountedGainCentralitySpec.length; i++) {
+ int pos = discountedGainCentralitySpec[i].indexOf(':');
+ if (pos < 0) throw new IllegalArgumentException("Wrong spec <" + discountedGainCentralitySpec[i] + ">");
+ discountedGainCentralityFile[i] = discountedGainCentralitySpec[i].substring(pos + 1);
+ String gainSpec = discountedGainCentralitySpec[i].substring(0, pos);
+ Int2DoubleFunction candidateFunction;
+ try {
+ candidateFunction = (Int2DoubleFunction)HyperBall.class.getField(gainSpec).get(null);
+ }
+ catch (SecurityException e) {
+ throw new IllegalArgumentException("Field " + gainSpec + " exists but cannot be accessed", e);
+ }
+ catch (ClassCastException e) {
+ throw new IllegalArgumentException("Field " + gainSpec + " exists but it is not of type Int2DoubleFunction", e);
+ }
+ catch (NoSuchFieldException e) {
+ candidateFunction = null;
+ }
+ discountFunction[i] = candidateFunction == null? ObjectParser.fromSpec(gainSpec, Int2DoubleFunction.class) : candidateFunction;
+ }
+
+ final ImmutableGraph graph = spec
+ ? ObjectParser.fromSpec(basename, ImmutableGraph.class, GraphClassParser.PACKAGE)
+ : offline
+ ? ((numberOfThreads(threads) == 1 && basenamet == null ? ImmutableGraph.loadOffline(basename) : ImmutableGraph.loadMapped(basename, new ProgressLogger())))
+ : ImmutableGraph.load(basename, new ProgressLogger());
+
+ final ImmutableGraph grapht = basenamet == null ? null : basenamet.equals(basename) ? graph : spec ? ObjectParser.fromSpec(basenamet, ImmutableGraph.class, GraphClassParser.PACKAGE) :
+ offline ? ImmutableGraph.loadMapped(basenamet, new ProgressLogger()) : ImmutableGraph.load(basenamet, new ProgressLogger());
+
+ final HyperBall hyperBall = new HyperBall(graph, grapht, log2m, pl, threads, bufferSize, granularity, external, sumOfDistances || closenessCentrality || linCentrality || nieminenCentrality, harmonicCentrality, discountFunction, seed);
+ hyperBall.run(jsapResult.getLong("upperBound"), jsapResult.getDouble("threshold"));
+ hyperBall.close();
+
+ if (neighbourhoodFunction) {
+ final PrintStream stream = new PrintStream(new FastBufferedOutputStream(new FileOutputStream(neighbourhoodFunctionFile)));
+ for(DoubleIterator i = hyperBall.neighbourhoodFunction.iterator(); i.hasNext();) stream.println(BigDecimal.valueOf(i.nextDouble()).toPlainString());
+ stream.close();
+ }
+
+ if (sumOfDistances) BinIO.storeFloats(hyperBall.sumOfDistances, sumOfDistancesFile);
+ if (harmonicCentrality) BinIO.storeFloats(hyperBall.sumOfInverseDistances, harmonicCentralityFile);
+ for (int i = 0; i < discountedGainCentralitySpec.length; i++) BinIO.storeFloats(hyperBall.discountedCentrality[i], discountedGainCentralityFile[i]);
+ if (closenessCentrality) {
+ final long n = graph.numNodes();
+ final DataOutputStream dos = new DataOutputStream(new FastBufferedOutputStream(new FileOutputStream(closenessCentralityFile)));
+ for (long i = 0; i < n; i++) {
+ final float d = FloatBigArrays.get(hyperBall.sumOfDistances, i);
+ dos.writeFloat(d == 0 ? 0 : 1 / d);
+ }
+ dos.close();
+ }
+ if (linCentrality) {
+ final long n = graph.numNodes();
+ final DataOutputStream dos = new DataOutputStream(new FastBufferedOutputStream(new FileOutputStream(linCentralityFile)));
+ for (long i = 0; i < n; i++) {
+ // By our definition, Lin's index for nodes that reach no other node is one, which is smaller than the index of any other node.
+ final float d = FloatBigArrays.get(hyperBall.sumOfDistances, i);
+ if (d == 0) dos.writeFloat(1);
+ else {
+ final double count = hyperBall.count(i);
+ dos.writeFloat((float)(count * count / d));
+ }
+ }
+ dos.close();
+ }
+ if (nieminenCentrality) {
+ final long n = graph.numNodes();
+ final DataOutputStream dos = new DataOutputStream(new FastBufferedOutputStream(new FileOutputStream(nieminenCentralityFile)));
+ for (long i = 0; i < n; i++) {
+ final double count = hyperBall.count(i);
+ dos.writeFloat((float)(count * count - FloatBigArrays.get(hyperBall.sumOfDistances, i)));
+ }
+ dos.close();
+ }
+ if (reachable) {
+ final long n = graph.numNodes();
+ final DataOutputStream dos = new DataOutputStream(new FastBufferedOutputStream(new FileOutputStream(reachableFile)));
+ for (long i = 0; i < n; i++) dos.writeFloat((float)hyperBall.count(i));
+ dos.close();
+ }
+ }
+}
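The derived centralities written at the end of `main()` are simple per-node functions of the approximate sum of distances d (`sumOfDistances`) and the approximate number of reachable nodes r (`count(i)`): closeness is 1/d (0 when d = 0), Lin's index is r²/d (1 when d = 0), and Nieminen's index is r² − d. The following is a minimal sketch of those three formulas, factored into a hypothetical helper class `DerivedCentralities` that is not part of HyperBall or WebGraph. The example values assume exact inputs rather than HyperLogLog estimates, and treat a node as reachable from itself.

```java
/**
 * Hypothetical helper (not part of WebGraph) that mirrors the per-node
 * post-processing performed at the end of HyperBall.main().
 */
final class DerivedCentralities {
    private DerivedCentralities() {}

    /** Closeness: reciprocal of the sum of distances; 0 for nodes that reach no other node. */
    static float closeness(final float sumOfDistances) {
        return sumOfDistances == 0 ? 0 : 1 / sumOfDistances;
    }

    /** Lin: square of the reachable count divided by the sum of distances; 1 by convention when the sum is 0. */
    static float lin(final float sumOfDistances, final double reachable) {
        if (sumOfDistances == 0) return 1;
        return (float)(reachable * reachable / sumOfDistances);
    }

    /** Nieminen: square of the reachable count minus the sum of distances. */
    static float nieminen(final float sumOfDistances, final double reachable) {
        return (float)(reachable * reachable - sumOfDistances);
    }

    public static void main(final String[] args) {
        // Directed path 0 -> 1 -> 2, using exact values instead of HyperLogLog estimates.
        // Node 0 reaches 0, 1 and 2 (distances 0, 1, 2): sum = 3, reachable = 3.
        System.out.println(closeness(3f) + " " + lin(3f, 3) + " " + nieminen(3f, 3));
        // Node 2 reaches only itself: sum = 0, reachable = 1.
        System.out.println(closeness(0f) + " " + lin(0f, 1) + " " + nieminen(0f, 1));
    }
}
```

Keeping the zero-sum conventions in one place makes it easy to check that terminal nodes get closeness 0 and Lin index 1, as the option descriptions state.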
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/algo/ParallelBreadthFirstVisit.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/algo/ParallelBreadthFirstVisit.java
new file mode 100644
index 0000000..4c2c9bc
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/algo/ParallelBreadthFirstVisit.java
@@ -0,0 +1,346 @@
+package it.unimi.dsi.big.webgraph.algo;
+
+/*
+ * Copyright (C) 2011-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.big.webgraph.ImmutableGraph;
+import it.unimi.dsi.big.webgraph.LazyLongIterator;
+import it.unimi.dsi.fastutil.BigArrays;
+import it.unimi.dsi.fastutil.longs.LongBigArrayBigList;
+import it.unimi.dsi.logging.ProgressLogger;
+
+import java.util.concurrent.CyclicBarrier;
+import java.util.concurrent.atomic.AtomicLong;
+import java.util.concurrent.atomic.AtomicLongArray;
+
+/** Performs breadth-first visits of a graph exploiting multicore parallelism.
+ *
+ * <p>To use this class you first create an instance, and then invoke {@link #visit(long)}. If you want to perform
+ * more visits preserving the {@link #marker} state you can invoke {@link #visit(long)} again.
+ * By calling {@link #clear()}, instead, you can reset {@link #marker} (i.e., forget about visited nodes).
+ *
+ * <p>Alternatively, {@link #visitAll()} will start a visit from all the nodes of the graph in a more efficient way.
+ *
+ * <p>After the visit, you can peek at the &ldquo;{@linkplain BigArrays big array}&rdquo; {@link #marker} field to discover details about the visit.
+ * Depending on the {@link #parent} value provided at construction time, {@link #marker}
+ * will be filled with parent information (e.g., with the index
+ * of the parent node in the visit tree) or with a <em>{@linkplain #round round number}</em> increased at each nonempty visit,
+ * which acts as a connected-component index if the graph is symmetric.
+ *
+ * <p>Observe that in the former case (if {@link #parent} is <code>true</code>), {@link #marker} will
+ * contain the value -1 for the nodes that have not been reached by the visit, the parent of the node in the BFS tree
+ * if the node was not the root, or the node itself for the root.
+ *
+ * <p>In the case of {@link #visit(long)}, {@link #queue} and {@link #cutPoints}, too, provide useful information. In
+ * particular, the nodes in {@link #queue} from the <var>d</var>-th to the (<var>d</var>&nbsp;+1)-th cutpoint
+ * are exactly the nodes at distance <var>d</var> from the source.
+ *
+ * <h2>Performance issues</h2>
+ *
+ * <p>This class needs three longs per node.
+ * If there are several available cores, breadth-first visits will be <em>decomposed</em> into relatively
+ * small tasks (small blocks of nodes in the queue at the same distance from the starting node)
+ * and each task will be assigned to the first available core. Since all tasks are completely
+ * independent, this ensures a very high degree of parallelism. However, on very sparse graphs the cost
+ * of keeping the threads synchronised can be extremely high, and even end up <em>increasing</em> the visit time.
+ *
+ * <p>Note that if the degree distribution is extremely skewed some cores might get stuck
+ * in the enumeration of the successors of some nodes with a very high degree.
+ */
+
+public class ParallelBreadthFirstVisit {
+ /** The graph under examination. */
+ public final ImmutableGraph graph;
+ /** The queue of visited nodes. */
+ public final LongBigArrayBigList queue;
+ /** At the end of a visit, the cutpoints of {@link #queue}. The <var>d</var>-th cutpoint is the first node in the queue at distance <var>d</var>. The
+ * last cutpoint is the queue size. */
+ public final LongBigArrayBigList cutPoints;
+ /** Whether {@link #marker} contains parent nodes or round numbers. */
+ public final boolean parent;
+ /** The marker &ldquo;big&rdquo; array; contains -1 for nodes that have not yet been enqueued, the parent in the visit tree if
+ * {@link #parent} is true, or an index increased at each visit if {@link #parent} is false, which in the symmetric case is the index
+ * of the connected component of the node. It has the same form as a {@linkplain BigArrays big array}, but it is handled manually. */
+ public final AtomicLongArray[] marker;
+ /** The global progress logger. */
+ private final ProgressLogger pl;
+ /** The number of threads. */
+ private final int numberOfThreads;
+ /** The number of nodes visited. */
+ private final AtomicLong progress;
+ /** The next node position to be picked from the last segment of {@link #queue}. */
+ private final AtomicLong nextPosition;
+ /** If true, the current visit is over. */
+ private volatile boolean completed;
+ /** The barrier used to synchronize visiting threads. */
+ private volatile CyclicBarrier barrier;
+ /** Keeps track of problems in visiting threads. */
+ private volatile Throwable threadThrowable;
+ /** A number increased at each nonempty visit (used to mark {@link #marker} if {@link #parent} is false). */
+ public long round;
+
+ /** Creates a new instance for keeping track of the state of parallel breadth-first visits.
+ *
+ * @param graph a graph.
+ * @param requestedThreads the requested number of threads (0 for {@link Runtime#availableProcessors()}).
+ * @param parent if true, {@link #marker} will contain parent nodes; otherwise, it will contain {@linkplain #round round numbers}.
+ * @param pl a progress logger, or <code>null</code>.
+ */
+ public ParallelBreadthFirstVisit(final ImmutableGraph graph, final int requestedThreads, final boolean parent, final ProgressLogger pl) {
+ this.graph = graph;
+ this.parent = parent;
+ this.pl = pl;
+
+ queue = new LongBigArrayBigList(graph.numNodes());
+ progress = new AtomicLong();
+ nextPosition = new AtomicLong();
+ cutPoints = new LongBigArrayBigList();
+ numberOfThreads = requestedThreads != 0 ? requestedThreads : Runtime.getRuntime().availableProcessors();
+
+ marker = new AtomicLongArray[(int)(graph.numNodes() + BigArrays.SEGMENT_SIZE - 1 >>> BigArrays.SEGMENT_SHIFT)];
+ if ((graph.numNodes() & BigArrays.SEGMENT_MASK) != 0) {
+ marker[marker.length - 1] = new AtomicLongArray((int)(graph.numNodes() & BigArrays.SEGMENT_MASK));
+ for(int i = marker.length - 1; i-- != 0;) marker[i] = new AtomicLongArray(BigArrays.SEGMENT_SIZE);
+ }
+ else for(int i = marker.length; i-- != 0;) marker[i] = new AtomicLongArray(BigArrays.SEGMENT_SIZE);
+
+ clear();
+ }
+
+ /** Clears the internal state of the visit, setting all {@link #marker} entries and {@link #round} to -1. */
+ public void clear() {
+ round = -1;
+ for(int s = marker.length; s-- != 0;) {
+ final AtomicLongArray t = marker[s];
+ for(int d = t.length(); d-- != 0;) t.set(d, -1);
+ }
+ }
+
+ private final class IterationThread extends Thread {
+ private static final int GRANULARITY = 1000;
+
+ @Override
+ public void run() {
+ try {
+ // We cache frequently used fields.
+ final AtomicLongArray[] marker = ParallelBreadthFirstVisit.this.marker;
+ final ImmutableGraph graph = ParallelBreadthFirstVisit.this.graph.copy();
+ final boolean parent = ParallelBreadthFirstVisit.this.parent;
+
+ for(;;) {
+ barrier.await();
+ if (completed) return;
+ final LongBigArrayBigList out = new LongBigArrayBigList();
+ final long first = cutPoints.getLong(cutPoints.size64() - 2);
+ final long last = cutPoints.getLong(cutPoints.size64() - 1);
+ long mark = round;
+ for(;;) {
+ // Try to get another piece of work.
+ final long start = first + nextPosition.getAndAdd(GRANULARITY);
+ if (start >= last) {
+ nextPosition.getAndAdd(-GRANULARITY);
+ break;
+ }
+
+ final long end = Math.min(last, start + GRANULARITY);
+ out.clear();
+
+ for(long pos = start; pos < end; pos++) {
+ final long curr = queue.getLong(pos);
+ if (parent == true) mark = curr;
+ final LazyLongIterator successors = graph.successors(curr);
+ for(long s; (s = successors.nextLong()) != -1;)
+ if (marker[BigArrays.segment(s)].compareAndSet(BigArrays.displacement(s), -1, mark)) out.add(s);
+ }
+
+ progress.addAndGet(end - start);
+
+ if (! out.isEmpty()) synchronized(queue) {
+ queue.addAll(out);
+ }
+ }
+ }
+ }
+ catch(Throwable t) {
+ threadThrowable = t;
+ }
+ }
+ }
+
+
+ /** Performs a breadth-first visit of the given graph starting from the given node.
+ *
+ * <p>This method will increment {@link #round} if at least one node is visited.
+ *
+ * @param start the starting node.
+ * @return the number of visited nodes.
+ * @see #visit(long,long)
+ */
+ public long visit(final long start) {
+ return visit(start, -1);
+ }
+
+
+ /** Performs a breadth-first visit of the given graph starting from the given node.
+ *
+ * <p>This method will increment {@link #round} if at least one node is visited.
+ *
+ * @param start the starting node.
+ * @param expectedSize the expected size (number of nodes) of the visit (for logging), or -1 to use the number of nodes of the graph.
+ * @return the number of visited nodes.
+ */
+ public long visit(final long start, final long expectedSize) {
+ if (marker[BigArrays.segment(start)].get(BigArrays.displacement(start)) != -1) return 0;
+ round++;
+ completed = false;
+ queue.clear();
+ cutPoints.clear();
+ queue.add(start);
+ cutPoints.add(0);
+ marker[BigArrays.segment(start)].set(BigArrays.displacement(start), parent ? start : round);
+ final IterationThread[] thread = new IterationThread[numberOfThreads];
+ for(int i = thread.length; i-- != 0;) thread[i] = new IterationThread();
+ progress.set(0);
+
+ if (pl != null) {
+ pl.start("Starting visit...");
+ pl.expectedUpdates = expectedSize != -1 ? expectedSize : graph.numNodes();
+ pl.itemsName = "nodes";
+ }
+
+ barrier = new CyclicBarrier(numberOfThreads, new Runnable() {
+ @Override
+ public void run() {
+ if (pl != null) pl.set(progress.get());
+
+ if (queue.size64() == cutPoints.getLong(cutPoints.size64() - 1)) {
+ completed = true;
+ return;
+ }
+
+ cutPoints.add(queue.size64());
+ nextPosition.set(0);
+ }
+ }
+ );
+
+ for(int i = thread.length; i-- != 0;) thread[i].start();
+ for(int i = thread.length; i-- != 0;)
+ try {
+ thread[i].join();
+ }
+ catch (InterruptedException e) {
+ throw new RuntimeException(e);
+ }
+
+ if (threadThrowable != null) throw new RuntimeException(threadThrowable);
+ if (pl != null) pl.done();
+ return queue.size64();
+ }
+
+ /** Visits all nodes. Calls {@link #clear()} initially.
+ *
+ * <p>This method is more efficient than invoking {@link #visit(long, long)} on all nodes as threads are created just once.
+ */
+ public void visitAll() {
+ final IterationThread[] thread = new IterationThread[numberOfThreads];
+ for(int i = thread.length; i-- != 0;) thread[i] = new IterationThread();
+ final long n = graph.numNodes();
+ completed = false;
+ clear();
+ queue.clear();
+ cutPoints.clear();
+ progress.set(0);
+
+ if (pl != null) {
+ pl.start("Starting visits...");
+ pl.expectedUpdates = graph.numNodes();
+ pl.displayLocalSpeed = true;
+ pl.itemsName = "nodes";
+ }
+
+ barrier = new CyclicBarrier(numberOfThreads, new Runnable() {
+ long curr = -1;
+ @Override
+ public void run() {
+ if (pl != null) pl.set(progress.get());
+ // Either first call, or queue did not grow from the last call.
+ if (curr == -1 || queue.size64() == cutPoints.getLong(cutPoints.size64() - 1)) {
+ if (pl != null) pl.set(progress.get());
+ // Look for the first nonterminal node not yet visited.
+ for(;;) {
+ while(++curr < n && marker[BigArrays.segment(curr)].get(BigArrays.displacement(curr)) != -1);
+
+ if (curr == n) {
+ completed = true;
+ return;
+ }
+ else {
+ round++;
+ marker[BigArrays.segment(curr)].set(BigArrays.displacement(curr), parent ? curr : round);
+
+ final long d = graph.outdegree(curr);
+ if (d != 0 && ! (d == 1 && graph.successors(curr).nextLong() == curr)) {
+ queue.clear();
+ queue.add(curr);
+
+ cutPoints.clear();
+ cutPoints.add(0);
+ break;
+ }
+ }
+ }
+ }
+
+ cutPoints.add(queue.size64());
+ nextPosition.set(0);
+ }
+ }
+ );
+
+ for(int i = thread.length; i-- != 0;) thread[i].start();
+ for(int i = thread.length; i-- != 0;)
+ try {
+ thread[i].join();
+ }
+ catch (InterruptedException e) {
+ throw new RuntimeException(e);
+ }
+
+ if (threadThrowable != null) throw new RuntimeException(threadThrowable);
+ if (pl != null) pl.done();
+ }
+
+
+ /** Returns a node at maximum distance during the last visit (e.g., a node realising the positive eccentricity of the starting node).
+ *
+ * @return a node at maximum distance during the last visit.
+ */
+ public long nodeAtMaxDistance() {
+ return queue.getLong(queue.size64() - 1);
+ }
+
+ /** Returns the maximum distance computed during the last visit (e.g., the eccentricity of the source).
+ *
+ * @return the maximum distance computed during the last visit.
+ */
+
+ public long maxDistance() {
+ return cutPoints.size64() - 2;
+ }
+}
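The level-synchronous scheme of ParallelBreadthFirstVisit is spread across the thread loop and the barrier action, which makes it hard to follow. Below is a minimal single-threaded sketch, assuming small int-indexed adjacency arrays instead of an ImmutableGraph; the class LevelBfsSketch and its methods are hypothetical and not part of WebGraph. As in the class above, cutPoints holds the queue index at which each BFS level starts, so after a visit the eccentricity of the start node is cutPoints.size() - 2 and the last enqueued node lies at maximum distance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sequential sketch of a level-synchronous BFS; not part of WebGraph. */
final class LevelBfsSketch {
    /** Visited nodes in visit order; level i occupies [cutPoints.get(i), cutPoints.get(i + 1)). */
    final List<Integer> queue = new ArrayList<>();
    /** Start index in queue of each level; the last entry equals queue.size() after a visit. */
    final List<Integer> cutPoints = new ArrayList<>();
    /** Parent of each node in the visit tree, or -1 if the node was not reached. */
    int[] marker = new int[0];

    /** Visits the graph given by adjacency arrays from start; returns the number of visited nodes. */
    int visit(final int[][] succ, final int start) {
        marker = new int[succ.length];
        Arrays.fill(marker, -1);
        queue.clear();
        cutPoints.clear();
        queue.add(start);
        cutPoints.add(0);
        marker[start] = start; // the root is its own parent
        for (;;) {
            // Plays the role of the barrier action: stop when the last level enqueued nothing.
            final int first = cutPoints.get(cutPoints.size() - 1);
            final int last = queue.size();
            if (last == first) break;
            cutPoints.add(last);
            // Scan the current level [first, last), appending the next level to the queue.
            for (int pos = first; pos < last; pos++) {
                final int curr = queue.get(pos);
                for (final int s : succ[curr]) {
                    if (marker[s] == -1) {
                        marker[s] = curr;
                        queue.add(s);
                    }
                }
            }
        }
        return queue.size();
    }

    /** The eccentricity of the start node: the number of levels minus one. */
    int maxDistance() {
        return cutPoints.size() - 2;
    }

    /** The last enqueued node, which lies in the deepest level. */
    int nodeAtMaxDistance() {
        return queue.get(queue.size() - 1);
    }
}
```

In the parallel class, each pass of the outer loop corresponds to one trip through the CyclicBarrier: the scan over [first, last) is split among threads in chunks of GRANULARITY positions, and compareAndSet on marker ensures that each node is enqueued exactly once even when several threads reach it concurrently.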
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/algo/StronglyConnectedComponents.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/algo/StronglyConnectedComponents.java
new file mode 100644
index 0000000..2bba4b0
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/algo/StronglyConnectedComponents.java
@@ -0,0 +1,458 @@
+package it.unimi.dsi.big.webgraph.algo;
+
+import java.io.IOException;
+import java.util.concurrent.TimeUnit;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import it.unimi.dsi.Util;
+import it.unimi.dsi.big.webgraph.GraphClassParser;
+import it.unimi.dsi.big.webgraph.ImmutableGraph;
+import it.unimi.dsi.big.webgraph.LazyLongIterator;
+import it.unimi.dsi.big.webgraph.Transform.LabelledArcFilter;
+import it.unimi.dsi.big.webgraph.labelling.ArcLabelledImmutableGraph;
+import it.unimi.dsi.big.webgraph.labelling.ArcLabelledNodeIterator.LabelledArcIterator;
+import it.unimi.dsi.bits.LongArrayBitVector;
+import it.unimi.dsi.fastutil.Stack;
+import it.unimi.dsi.fastutil.booleans.BooleanBigArrayBigList;
+import it.unimi.dsi.fastutil.booleans.BooleanStack;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.longs.LongBigArrayBigList;
+import it.unimi.dsi.fastutil.longs.LongBigArrays;
+import it.unimi.dsi.fastutil.longs.LongStack;
+import it.unimi.dsi.fastutil.objects.ObjectBigArrayBigList;
+import it.unimi.dsi.lang.ObjectParser;
+import it.unimi.dsi.logging.ProgressLogger;
+
+/** Computes the strongly connected components (and optionally the buckets) of an immutable graph.
+ *
+ * <p>The {@link #compute(ImmutableGraph, boolean, ProgressLogger)} method of this class will return
+ * an instance that contains the data computed by running a variant of Tarjan's algorithm on an immutable graph.
+ * The implementation is iterative, rather than recursive, to work around known limitations on the size of
+ * the stack in Java.
+ * Besides the usual strongly connected components, it is possible to compute the <em>buckets</em> of the
+ * graph, that is, nodes belonging to components that are terminal, but not dangling, in the component DAG.
+ *
+ * <p>After getting an instance, it is possible to run the {@link #computeSizes()} and {@link #sortBySize(long[][])}
+ * methods to obtain further information. This scheme has been devised to exploit the available memory as much
+ * as possible&mdash;after the components have been computed, the returned instance keeps no track of
+ * the graph, and the related memory can be freed by the garbage collector.
+ */
+
+
+public class StronglyConnectedComponents {
+ @SuppressWarnings("unused")
+ private static final boolean DEBUG = false;
+ private static final Logger LOGGER = LoggerFactory.getLogger(StronglyConnectedComponents.class);
+
+ /** The number of strongly connected components. */
+ final public long numberOfComponents;
+ /** The component of each node. */
+ final public long[][] component;
+ /** The bit vector for buckets, or <code>null</code>, in which case buckets have not been computed. */
+ final public LongArrayBitVector buckets;
+
+ protected StronglyConnectedComponents(final long numberOfComponents, final long[][] component, final LongArrayBitVector buckets) {
+ this.numberOfComponents = numberOfComponents;
+ this.component = component;
+ this.buckets = buckets;
+ }
+
+ private final static class Visit {
+ /** The graph. */
+ private final ImmutableGraph graph;
+ /** The number of nodes in {@link #graph}. */
+ private final long n;
+ /** A progress logger. */
+ private final ProgressLogger pl;
+ /** Whether we should compute buckets. */
+ private final boolean computeBuckets;
+ /** For unvisited nodes, 0; for visited nodes not yet emitted, the lowest visit time reached so far (initially their own visit time); for emitted nodes, -c-1, where c is the component number. */
+ private final long[][] status;
+ /** The buckets. */
+ private final LongArrayBitVector buckets;
+ /** The component stack. */
+ private final LongBigArrayBigList componentStack;
+ /** The first-visit clock (incremented at each visited node). */
+ private long clock;
+ /** The number of components already output. */
+ private long numberOfComponents;
+
+ private Visit(final ImmutableGraph graph, final long[][] status, final LongArrayBitVector buckets, ProgressLogger pl) {
+ this.graph = graph;
+ this.buckets = buckets;
+ this.status = status;
+ this.pl = pl;
+ this.computeBuckets = buckets != null;
+ this.n = graph.numNodes();
+ componentStack = new LongBigArrayBigList(n);
+ }
+
+ /** Performs a visit starting from a given node.
+ *
+ * @param startNode the first node to visit.
+ */
+ private void visit(final long startNode) {
+ final BooleanStack olderNodeFound = new BooleanBigArrayBigList();
+ final LongStack nodeStack = new LongBigArrayBigList();
+ final Stack<LazyLongIterator> successorsStack = new ObjectBigArrayBigList<>();
+ final long[][] status = this.status;
+ // For simplicity, we compute nonbuckets and then flip the values.
+ final LongArrayBitVector nonBuckets = this.buckets;
+
+ LongBigArrays.set(status, startNode, ++clock);
+ componentStack.push(startNode);
+ nodeStack.push(startNode);
+ successorsStack.push(graph.successors(startNode));
+ olderNodeFound.push(false);
+ if (computeBuckets && graph.outdegree(startNode) == 0) nonBuckets.set(startNode);
+
+ main: while(! nodeStack.isEmpty()) {
+ final long currentNode = nodeStack.topLong();
+ final LazyLongIterator successors = successorsStack.top();
+
+ for(long s; (s = successors.nextLong()) != -1;) {
+ final long successorStatus = LongBigArrays.get(status, s);
+ if (successorStatus == 0) {
+ LongBigArrays.set(status, s, ++clock);
+ nodeStack.push(s);
+ componentStack.push(s);
+ successorsStack.push(graph.successors(s));
+ olderNodeFound.push(false);
+ if (computeBuckets && graph.outdegree(s) == 0) nonBuckets.set(s);
+ continue main;
+ }
+ else if (successorStatus > 0) {
+ if (successorStatus < LongBigArrays.get(status, currentNode)) {
+ LongBigArrays.set(status, currentNode, successorStatus);
+ olderNodeFound.popBoolean();
+ olderNodeFound.push(true);
+ }
+ }
+ else if (computeBuckets) nonBuckets.set(currentNode);
+ }
+
+ nodeStack.popLong();
+ successorsStack.pop();
+ if (pl != null) pl.lightUpdate();
+
+ if (olderNodeFound.popBoolean()) {
+ final long parentNode = nodeStack.topLong();
+ final long currentNodeStatus = LongBigArrays.get(status, currentNode);
+ if (currentNodeStatus < LongBigArrays.get(status, parentNode)) {
+ LongBigArrays.set(status, parentNode, currentNodeStatus);
+ olderNodeFound.popBoolean();
+ olderNodeFound.push(true);
+ }
+
+ if (computeBuckets && nonBuckets.getBoolean(currentNode)) nonBuckets.set(parentNode);
+ }
+ else {
+ if (computeBuckets && ! nodeStack.isEmpty()) nonBuckets.set(nodeStack.topLong());
+ final boolean notABucket = computeBuckets ? nonBuckets.getBoolean(currentNode) : false;
+ numberOfComponents++;
+ long z;
+ do {
+ z = componentStack.popLong();
+ // Component markers are -c-1, where c is the component number.
+ LongBigArrays.set(status, z, -numberOfComponents);
+ if (notABucket) nonBuckets.set(z);
+ } while(z != currentNode);
+ }
+ }
+ }
+
+
+ public void run() {
+
+ if (pl != null) {
+ pl.itemsName = "nodes";
+ pl.expectedUpdates = n;
+ pl.displayFreeMemory = true;
+ pl.start("Computing strongly connected components...");
+ }
+ for (long x = 0; x < n; x++) if (LongBigArrays.get(status, x) == 0) visit(x);
+ if (pl != null) pl.done();
+
+ // Turn component markers into component numbers.
+ for(int i = status.length; i-- != 0;) {
+ final long[] t = status[i];
+ for(int d = t.length; d-- != 0;) t[d] = -t[d] - 1;
+ }
+
+ if (buckets != null) buckets.flip();
+ }
+ }
+
+ /** Computes the strongly connected components of a given graph.
+ *
+ * @param graph the graph whose strongly connected components are to be computed.
+ * @param computeBuckets if true, buckets will be computed.
+ * @param pl a progress logger, or <code>null</code>.
+ * @return an instance of this class containing the computed components.
+ */
+ public static StronglyConnectedComponents compute(final ImmutableGraph graph, final boolean computeBuckets, final ProgressLogger pl) {
+ final long n = graph.numNodes();
+ final Visit visit = new Visit(graph, LongBigArrays.newBigArray(n), computeBuckets ? LongArrayBitVector.ofLength(n) : null, pl);
+ visit.run();
+ return new StronglyConnectedComponents(visit.numberOfComponents, visit.status, visit.buckets);
+ }
+
+
+ private final static class FilteredVisit {
+ /** The graph. */
+ private final ArcLabelledImmutableGraph graph;
+ /** The number of nodes in {@link #graph}. */
+ private final long n;
+ /** A progress logger. */
+ private final ProgressLogger pl;
+ /** A filter on arc labels. */
+ private final LabelledArcFilter filter;
+ /** Whether we should compute buckets. */
+ private final boolean computeBuckets;
+ /** For unvisited nodes, 0; for visited nodes not yet emitted, the lowest visit time reached so far (initially their own visit time); for emitted nodes, -c-1, where c is the component number. */
+ private final long[][] status;
+ /** The buckets. */
+ private final LongArrayBitVector buckets;
+ /** The component stack. */
+ private final LongBigArrayBigList componentStack;
+ /** The first-visit clock (incremented at each visited node). */
+ private long clock;
+ /** The number of components already output. */
+ private long numberOfComponents;
+
+ private FilteredVisit(final ArcLabelledImmutableGraph graph, final LabelledArcFilter filter, final long[][] status, final LongArrayBitVector buckets, ProgressLogger pl) {
+ this.graph = graph;
+ this.filter = filter;
+ this.buckets = buckets;
+ this.status = status;
+ this.pl = pl;
+ this.computeBuckets = buckets != null;
+ this.n = graph.numNodes();
+ componentStack = new LongBigArrayBigList(n);
+ }
+
+ private long filteredOutdegree(final long node) {
+ // Definitely not so efficient, but very simple.
+ long filteredOutdegree = 0;
+ final LabelledArcIterator successors = graph.successors(node);
+ for(long s; (s = successors.nextLong()) != -1;) if (filter.accept(node, s, successors.label())) filteredOutdegree++;
+ return filteredOutdegree;
+ }
+
+ /** Performs a visit starting from a given node.
+ *
+ * @param startNode the first node to visit.
+ */
+ private void visit(final long startNode) {
+ final LongArrayBitVector olderNodeFound = LongArrayBitVector.ofLength(n);
+ final LongStack nodeStack = new LongBigArrayBigList();
+ final Stack<LabelledArcIterator> successorsStack = new ObjectBigArrayBigList<>();
+ final long[][] status = this.status;
+ // For simplicity, we compute nonbuckets and then flip the values.
+ final LongArrayBitVector nonBuckets = this.buckets;
+
+ LongBigArrays.set(status, startNode, ++clock);
+ componentStack.push(startNode);
+ nodeStack.push(startNode);
+ successorsStack.push(graph.successors(startNode));
+ if (computeBuckets && filteredOutdegree(startNode) == 0) nonBuckets.set(startNode);
+
+ main: while(! nodeStack.isEmpty()) {
+ final long currentNode = nodeStack.topLong();
+ final LabelledArcIterator successors = successorsStack.top();
+
+ for(long s; (s = successors.nextLong()) != -1;) {
+ if (! filter.accept(currentNode, s, successors.label())) continue;
+ final long successorStatus = LongBigArrays.get(status, s);
+ if (successorStatus == 0) {
+ LongBigArrays.set(status, s, ++clock);
+ nodeStack.push(s);
+ componentStack.push(s);
+ successorsStack.push(graph.successors(s));
+ if (computeBuckets && filteredOutdegree(s) == 0) nonBuckets.set(s);
+ continue main;
+ }
+ else if (successorStatus > 0) {
+ if (successorStatus < LongBigArrays.get(status, currentNode)) {
+ LongBigArrays.set(status, currentNode, successorStatus);
+ olderNodeFound.set(currentNode);
+ }
+ }
+ else if (computeBuckets) nonBuckets.set(currentNode);
+ }
+
+ nodeStack.popLong();
+ successorsStack.pop();
+ if (pl != null) pl.lightUpdate();
+
+ if (olderNodeFound.getBoolean(currentNode)) {
+ final long parentNode = nodeStack.topLong();
+ final long currentNodeStatus = LongBigArrays.get(status, currentNode);
+ if (currentNodeStatus < LongBigArrays.get(status, parentNode)) {
+ LongBigArrays.set(status, parentNode, currentNodeStatus);
+ olderNodeFound.set(parentNode);
+ }
+
+ if (computeBuckets && nonBuckets.getBoolean(currentNode)) nonBuckets.set(parentNode);
+ }
+ else {
+ if (computeBuckets && ! nodeStack.isEmpty()) nonBuckets.set(nodeStack.topLong());
+ final boolean notABucket = computeBuckets ? nonBuckets.getBoolean(currentNode) : false;
+ numberOfComponents++;
+ long z;
+ do {
+ z = componentStack.popLong();
+ // Component markers are -c-1, where c is the component number.
+ LongBigArrays.set(status, z, -numberOfComponents);
+ if (notABucket) nonBuckets.set(z);
+ } while(z != currentNode);
+ }
+ }
+ }
+
+
+ public void run() {
+
+ if (pl != null) {
+ pl.itemsName = "nodes";
+ pl.expectedUpdates = n;
+ pl.displayFreeMemory = true;
+ pl.start("Computing strongly connected components...");
+ }
+ for (long x = 0; x < n; x++) if (LongBigArrays.get(status, x) == 0) visit(x);
+ if (pl != null) pl.done();
+
+ // Turn component markers into component numbers.
+ for(int i = status.length; i-- != 0;) {
+ final long[] t = status[i];
+ for(int d = t.length; d-- != 0;) t[d] = -t[d] - 1;
+ }
+
+ if (buckets != null) buckets.flip();
+ }
+ }
+
+ /** Computes the strongly connected components of a given arc-labelled graph, filtering its arcs.
+ *
+ * @param graph the arc-labelled graph whose strongly connected components are to be computed.
+ * @param filter a filter selecting the arcs that must be taken into consideration.
+ * @param computeBuckets if true, buckets will be computed.
+ * @param pl a progress logger, or <code>null</code>.
+ * @return an instance of this class containing the computed components.
+ */
+ public static StronglyConnectedComponents compute(final ArcLabelledImmutableGraph graph, final LabelledArcFilter filter, final boolean computeBuckets, final ProgressLogger pl) {
+ final long n = graph.numNodes();
+ FilteredVisit filteredVisit = new FilteredVisit(graph, filter, LongBigArrays.newBigArray(n), computeBuckets ? LongArrayBitVector.ofLength(n) : null, pl);
+ filteredVisit.run();
+ return new StronglyConnectedComponents(filteredVisit.numberOfComponents, filteredVisit.status, filteredVisit.buckets);
+ }
+
+
+ /** Returns the size big array for this set of strongly connected components.
+ *
+ * @return the size big array for this set of strongly connected components.
+ */
+ public long[][] computeSizes() {
+ final long[][] size = LongBigArrays.newBigArray(numberOfComponents);
+ for(int i = component.length; i-- != 0;) {
+ final long[] t = component[i];
+ for(int d = t.length; d-- != 0;) LongBigArrays.incr(size, t[d]);
+ }
+ return size;
+ }
+
+ /** Renumbers by decreasing size the components of this set.
+ *
+ * <p>After a call to this method, both the internal status of this class and the argument
+ * big array are permuted so that the sizes of strongly connected components are decreasing
+ * in the component index.
+ *
+ * @param size the component sizes, as returned by {@link #computeSizes()}.
+ */
+ public void sortBySize(final long[][] size) {
+ final long[][] perm = Util.identity(LongBigArrays.length(size));
+ LongBigArrays.quickSort(perm, 0, LongBigArrays.length(perm), (x, y) -> Long.compare(LongBigArrays.get(size, y), LongBigArrays.get(size, x)));
+ final long[][] copy = LongBigArrays.copy(size);
+
+ for(int i = size.length; i-- != 0;) {
+ final long[] t = size[i];
+ final long[] u = perm[i];
+ for(int d = t.length; d-- != 0;) t[d] = LongBigArrays.get(copy, u[d]);
+ }
+ Util.invertPermutationInPlace(perm);
+
+ for(int i = component.length; i-- != 0;) {
+ final long[] t = component[i];
+ for(int d = t.length; d-- != 0;) t[d] = LongBigArrays.get(perm, t[d]);
+ }
+ }
+
+
+ public static void main(String arg[]) throws IOException, JSAPException {
+ SimpleJSAP jsap = new SimpleJSAP(StronglyConnectedComponents.class.getName(),
+ "Computes the strongly connected components (and optionally the buckets) of a graph of given basename. The resulting data is saved " +
+ "in files stemmed from the given basename with extension .scc (a list of binary longs specifying the " +
+ "component of each node), .sccsizes (a list of binary longs specifying the size of each component) and .buckets " +
+ "(a serialised LongArrayBitVector specifying buckets). Please use suitable JVM options to set a large stack size.",
+ new Parameter[] {
+ new Switch("sizes", 's', "sizes", "Compute component sizes."),
+ new Switch("renumber", 'r', "renumber", "Renumber components in decreasing-size order."),
+ new Switch("buckets", 'b', "buckets", "Compute buckets (nodes belonging to a bucket component, i.e., a terminal nondangling component)."),
+ new FlaggedOption("filter", new ObjectParser(LabelledArcFilter.class, GraphClassParser.PACKAGE), JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'f', "filter", "A filter for labelled arcs; requires the provided graph to be arc labelled."),
+ new FlaggedOption("logInterval", JSAP.LONG_PARSER, Long.toString(ProgressLogger.DEFAULT_LOG_INTERVAL), JSAP.NOT_REQUIRED, 'l', "log-interval", "The minimum time interval between activity logs in milliseconds."),
+ new UnflaggedOption("basename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the graph."),
+ new UnflaggedOption("resultsBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, JSAP.NOT_GREEDY, "The basename of the resulting files."),
+ }
+ );
+
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final String basename = jsapResult.getString("basename");
+ final String resultsBasename = jsapResult.getString("resultsBasename", basename);
+ final LabelledArcFilter filter = (LabelledArcFilter)jsapResult.getObject("filter");
+ ProgressLogger pl = new ProgressLogger(LOGGER, jsapResult.getLong("logInterval"), TimeUnit.MILLISECONDS);
+
+ final StronglyConnectedComponents components =
+ filter != null ? StronglyConnectedComponents.compute(ArcLabelledImmutableGraph.load(basename), filter, jsapResult.getBoolean("buckets"), pl)
+ : StronglyConnectedComponents.compute(ImmutableGraph.load(basename), jsapResult.getBoolean("buckets"), pl);
+
+ if (jsapResult.getBoolean("sizes") || jsapResult.getBoolean("renumber")) {
+ final long[][] size = components.computeSizes();
+ if (jsapResult.getBoolean("renumber")) components.sortBySize(size);
+ if (jsapResult.getBoolean("sizes")) BinIO.storeLongs(size, resultsBasename + ".sccsizes");
+ }
+ BinIO.storeLongs(components.component, resultsBasename + ".scc");
+ if (components.buckets != null) BinIO.storeObject(components.buckets, resultsBasename + ".buckets");
+ }
+}
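The status encoding used by Visit (0 for unvisited nodes, a positive visit time while a node is still on the component stack, -c-1 once its component c has been emitted) is easiest to check on a toy graph. The following is a hypothetical int-based sketch, not the WebGraph API, that follows the same iterative scheme with explicit stacks and one olderNodeFound flag per stack frame; it omits bucket computation and progress logging.

```java
import java.util.ArrayDeque;

/** Hypothetical int-based sketch of the iterative strongly-connected-components visit; not the WebGraph API. */
final class SccSketch {
    /** Returns the component of each node, numbered in emission order (0, 1, ...). */
    static int[] components(final int[][] succ) {
        final int n = succ.length;
        // 0: unvisited; > 0: lowest visit time reached so far; < 0: -c-1 once component c is emitted.
        final int[] status = new int[n];
        final ArrayDeque<Integer> componentStack = new ArrayDeque<>();
        int clock = 0, numberOfComponents = 0;

        for (int root = 0; root < n; root++) {
            if (status[root] != 0) continue;
            final ArrayDeque<Integer> nodeStack = new ArrayDeque<>();
            final ArrayDeque<int[]> nextSuccessor = new ArrayDeque<>(); // per-frame successor cursor
            final ArrayDeque<Boolean> olderNodeFound = new ArrayDeque<>();
            status[root] = ++clock;
            nodeStack.push(root);
            componentStack.push(root);
            nextSuccessor.push(new int[1]);
            olderNodeFound.push(false);

            main: while (!nodeStack.isEmpty()) {
                final int curr = nodeStack.peek();
                final int[] cursor = nextSuccessor.peek();
                while (cursor[0] < succ[curr].length) {
                    final int s = succ[curr][cursor[0]++];
                    if (status[s] == 0) { // tree arc: descend into s
                        status[s] = ++clock;
                        nodeStack.push(s);
                        componentStack.push(s);
                        nextSuccessor.push(new int[1]);
                        olderNodeFound.push(false);
                        continue main;
                    }
                    if (status[s] > 0 && status[s] < status[curr]) { // s is still on the component stack
                        status[curr] = status[s];
                        olderNodeFound.pop();
                        olderNodeFound.push(true);
                    }
                }
                nodeStack.pop();
                nextSuccessor.pop();
                if (olderNodeFound.pop()) {
                    // curr reaches an older node, so it is not a component root: propagate to its parent.
                    final int parent = nodeStack.peek();
                    if (status[curr] < status[parent]) {
                        status[parent] = status[curr];
                        olderNodeFound.pop();
                        olderNodeFound.push(true);
                    }
                } else {
                    // curr is a component root: it and everything above it on the component stack form a component.
                    numberOfComponents++;
                    int z;
                    do {
                        z = componentStack.pop();
                        status[z] = -numberOfComponents;
                    } while (z != curr);
                }
            }
        }
        // Turn markers -c-1 into component numbers c.
        for (int i = 0; i < n; i++) status[i] = -status[i] - 1;
        return status;
    }
}
```

Components are numbered in the order in which they are emitted, so, as in Tarjan's algorithm, a component receives a smaller number than any component that can reach it; in the class above, sortBySize(long[][]) can later renumber them by decreasing size.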
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/algo/package.html b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/algo/package.html
new file mode 100644
index 0000000..8c07b3e
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/algo/package.html
@@ -0,0 +1,10 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html>
+ <head>
+ <title>Webgraph</title>
+ </head>
+
+ <body>
+ <P>Classes implementing useful algorithms on graphs.
+ </body>
+</html>
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/examples/IntegerListImmutableGraph.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/examples/IntegerListImmutableGraph.java
new file mode 100644
index 0000000..8560b01
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/examples/IntegerListImmutableGraph.java
@@ -0,0 +1,165 @@
+package it.unimi.dsi.big.webgraph.examples;
+
+/*
+ * Copyright (C) 2006-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+import it.unimi.dsi.big.webgraph.ImmutableGraph;
+import it.unimi.dsi.big.webgraph.ImmutableSequentialGraph;
+import it.unimi.dsi.big.webgraph.NodeIterator;
+import it.unimi.dsi.fastutil.longs.LongBigArrays;
+import it.unimi.dsi.logging.ProgressLogger;
+
+import java.io.DataInputStream;
+import java.io.FileInputStream;
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.util.NoSuchElementException;
+
+
+/** Exposes a graph in a simple binary format as an (offline-only) {@link ImmutableGraph}.
+ *
+ * <P>This class is a simple example that should help in understanding how to interface
+ * WebGraph with external data. We have a graph contained in a file and represented by a list of binary
+ * 64-bit integers (Java <code>long</code>s) as follows:
+ * first we have the number of nodes, then the number of successors of node 0, then the list in increasing
+ * order of successors of node 0, then the number of successors of node 1, then the list in increasing
+ * order of successors of node 1, and so on.
+ *
+ * <P>If we want to transform this graph into, say, a {@link it.unimi.dsi.big.webgraph.BVGraph},
+ * we must create a class that exposes the file as an {@link it.unimi.dsi.big.webgraph.ImmutableGraph}
+ * and then save it using {@link it.unimi.dsi.big.webgraph.BVGraph#store(ImmutableGraph,CharSequence)} or by calling
+ * the main method of {@link it.unimi.dsi.big.webgraph.BVGraph}.
+ * A complete implementation is not necessary, as {@link it.unimi.dsi.big.webgraph.BVGraph} uses
+ * just {@link #nodeIterator()}. Since we are just interested in importing data, we do not
+ * implement efficient random access methods, and the only loading method we implement is {@link #loadOffline(CharSequence)}.
+ */
+
+public class IntegerListImmutableGraph extends ImmutableSequentialGraph {
+
+ /** The filename of the graph. */
+ final private String filename;
+ /** The number of nodes, read at creation time and cached. */
+ final private long numNodes;
+
+ private IntegerListImmutableGraph(final CharSequence filename) throws IOException {
+ this.filename = filename.toString();
+ final DataInputStream dis = new DataInputStream(new FileInputStream(this.filename));
+ numNodes = dis.readLong();
+ dis.close();
+ }
+
+ @Override
+ public long numNodes() {
+ return numNodes;
+ }
+
+ @Override
+ public NodeIterator nodeIterator() {
+ try {
+ return new NodeIterator() {
+ final long n = numNodes();
+ final DataInputStream dis = new DataInputStream(new FileInputStream(IntegerListImmutableGraph.this.filename));
+ long curr = - 1, outdegree;
+ long[][] successorsArray = LongBigArrays.EMPTY_BIG_ARRAY;
+
+ {
+ try {
+ dis.readLong(); // Skip number of nodes (a long, as read by the constructor)
+ }
+ catch(IOException e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+ @Override
+ public long nextLong() {
+ if (! hasNext()) throw new NoSuchElementException();
+ try {
+ outdegree = dis.readLong();
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ return ++curr;
+ }
+
+ @Override
+ public boolean hasNext() {
+ return (curr < n - 1);
+ }
+
+ @Override
+ public long[][] successorBigArray() {
+ if (curr == - 1) throw new IllegalStateException();
+ successorsArray = LongBigArrays.ensureCapacity(successorsArray, outdegree, 0);
+ try {
+ for(long i = 0; i< outdegree; i++) LongBigArrays.set(successorsArray, i, dis.readLong());
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ return successorsArray;
+ }
+
+ @Override
+ public long outdegree() {
+ if (curr == - 1) throw new IllegalStateException();
+ return outdegree;
+ }
+
+ @Override
+ protected void finalize() throws Throwable {
+ try {
+ dis.close();
+ }
+ finally {
+ super.finalize();
+ }
+ }
+ };
+ }
+ catch (FileNotFoundException e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+ public static ImmutableGraph load(final CharSequence basename, final ProgressLogger pl) {
+ throw new UnsupportedOperationException("Graphs may be loaded offline only");
+ }
+
+ public static ImmutableGraph load(final CharSequence basename) {
+ return load(basename, (ProgressLogger)null);
+ }
+
+ @Deprecated
+ public static ImmutableGraph loadSequential(final CharSequence basename, final ProgressLogger pl) {
+ return load(basename, pl);
+ }
+
+ @Deprecated
+ public static ImmutableGraph loadSequential(final CharSequence basename) {
+ return load(basename, (ProgressLogger)null);
+ }
+
+ public static ImmutableGraph loadOffline(final CharSequence basename, final ProgressLogger pl) throws IOException {
+ return new IntegerListImmutableGraph(basename);
+ }
+
+ public static ImmutableGraph loadOffline(final CharSequence basename) throws IOException {
+ return loadOffline(basename, (ProgressLogger)null);
+ }
+}
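The binary layout read by `IntegerListImmutableGraph` is only implicit in its constructor and node iterator. Read together, they suggest a node count stored as a Java `long`, followed, for each node, by its outdegree and then its successors, all written as `long`s in `DataOutputStream` format. The sketch below writes and reads that layout with plain `java.io`, which can be handy for producing small test files; the class `IntegerListFormatSketch` and its methods are hypothetical, and the layout is an inference from the code above rather than a documented format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical helper for the layout that IntegerListImmutableGraph appears to read:
 *  a long node count, then, for each node, a long outdegree followed by that many long successors.
 *  Plain Java arrays are used, so this sketch only handles fewer than 2^31 nodes per list. */
final class IntegerListFormatSketch {
    private IntegerListFormatSketch() {}

    /** Serialises adjacency lists; successors[x] lists the successors of node x. */
    static byte[] write(long[][] successors) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(successors.length);          // number of nodes
            for (long[] s : successors) {
                out.writeLong(s.length);               // outdegree of this node
                for (long t : s) out.writeLong(t);     // its successors, in order
            }
        }
        return bytes.toByteArray();
    }

    /** Reads the same layout back node by node, as a sequential scan would. */
    static long[][] read(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            long n = in.readLong();
            long[][] successors = new long[(int) n][];
            for (int x = 0; x < n; x++) {
                long d = in.readLong();
                successors[x] = new long[(int) d];
                for (int i = 0; i < d; i++) successors[x][i] = in.readLong();
            }
            return successors;
        }
    }
}
```

Under these assumptions, the graph with arcs 0→1, 0→2 and 2→0 takes 56 bytes: 8 for the node count, plus 8 per outdegree and 8 per successor.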
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/examples/IntegerTriplesArcLabelledImmutableGraph.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/examples/IntegerTriplesArcLabelledImmutableGraph.java
new file mode 100644
index 0000000..6da20e3
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/examples/IntegerTriplesArcLabelledImmutableGraph.java
@@ -0,0 +1,205 @@
+package it.unimi.dsi.big.webgraph.examples;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.big.webgraph.AbstractLazyLongIterator;
+import it.unimi.dsi.big.webgraph.BVGraph;
+import it.unimi.dsi.big.webgraph.labelling.ArcLabelledImmutableGraph;
+import it.unimi.dsi.big.webgraph.labelling.ArcLabelledImmutableSequentialGraph;
+import it.unimi.dsi.big.webgraph.labelling.ArcLabelledNodeIterator;
+import it.unimi.dsi.big.webgraph.labelling.ArcLabelledNodeIterator.LabelledArcIterator;
+import it.unimi.dsi.big.webgraph.labelling.BitStreamArcLabelledImmutableGraph;
+import it.unimi.dsi.big.webgraph.labelling.GammaCodedIntLabel;
+import it.unimi.dsi.big.webgraph.labelling.Label;
+import it.unimi.dsi.fastutil.objects.ObjectArrayList;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.util.Arrays;
+import java.util.Comparator;
+import java.util.NoSuchElementException;
+
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/** A class exposing a list of triples as an {@link ArcLabelledImmutableGraph}. The triples are
+ * interpreted as labelled arcs: the first element is the source, the second element is the target,
+ * and the third element must be a nonnegative integer that will be saved using a {@link GammaCodedIntLabel}.
+ *
+ * <p>This class is mainly a useful example of how to expose your data <i>via</i> an {@link ArcLabelledImmutableGraph}, and
+ * it is also used to build test cases, but it is not efficient or particularly refined.
+ *
+ * <p>A main method reads from standard input a list of TAB-separated triples and writes the corresponding graph
+ * using {@link BVGraph} and {@link BitStreamArcLabelledImmutableGraph}.
+ */
+
+public class IntegerTriplesArcLabelledImmutableGraph extends ArcLabelledImmutableSequentialGraph {
+ /** The list of triples. */
+ final private int[][] triple;
+ /** The prototype of the labels used by this class. */
+ final private GammaCodedIntLabel prototype;
+ /** The number of nodes, computed at construction time by triple inspection. */
+ final private long n;
+
+ /** Creates a new arc-labelled immutable graph using a specified list of triples.
+ *
+ * <p>Note that it is impossible to specify isolated nodes with indices larger than
+ * the largest node with positive indegree or outdegree, as the number of nodes is computed
+ * by maximising over all indices in <code>triple</code>.
+ *
+ * @param triple a list of triples specifying labelled arcs (see the {@linkplain IntegerTriplesArcLabelledImmutableGraph class documentation});
+ * order is not relevant, but multiple arcs are not allowed.
+ */
+ public IntegerTriplesArcLabelledImmutableGraph(int[][] triple) {
+ this.triple = triple;
+ prototype = new GammaCodedIntLabel("FOO");
+ int m = 0;
+ for(int i = 0; i < triple.length; i++) m = Math.max(m, Math.max(triple[i][0], triple[i][1]));
+ Arrays.sort(triple, new Comparator<int[]>() {
+ @Override
+ public int compare(int[] p, int[] q) {
+ final int t = Integer.compare(p[0], q[0]); // Compare by source
+ if (t != 0) return t;
+ final int u = Integer.compare(p[1], q[1]); // Compare by destination
+ if (u == 0) throw new IllegalArgumentException("Duplicate arc <" + p[0] + "," + p[1] + ">");
+ return u;
+ }
+ });
+
+ n = m + 1;
+ }
+
+ @Override
+ public Label prototype() {
+ return prototype;
+ }
+
+ @Override
+ public long numNodes() {
+ return n;
+ }
+
+ @Override
+ public ArcLabelledNodeIterator nodeIterator(long from) {
+ if (from == 0) return nodeIterator();
+ throw new UnsupportedOperationException();
+ }
+
+ private final class ArcIterator extends AbstractLazyLongIterator implements LabelledArcIterator {
+ private final int d;
+ private int k = 0; // Index of the last returned triple is pos+k
+ private final int pos;
+ private final GammaCodedIntLabel label;
+
+ private ArcIterator(int d, int pos, GammaCodedIntLabel label) {
+ this.d = d;
+ this.pos = pos;
+ this.label = label;
+ }
+
+ @Override
+ public Label label() {
+ if (k == 0) throw new IllegalStateException();
+ label.value = triple[pos + k][2];
+ return label;
+ }
+
+ @Override
+ public long nextLong() {
+ if (k >= d) return -1;
+ return triple[pos + ++k][1];
+ }
+ }
+
+ @Override
+ public ArcLabelledNodeIterator nodeIterator() {
+ return new ArcLabelledNodeIterator() {
+ /** Last node returned by this iterator. */
+ private int last = -1;
+ /** Last triple examined by this iterator. */
+ private int pos = -1;
+ /** A local copy of the prototype. */
+ private final GammaCodedIntLabel label = prototype.copy();
+
+ @Override
+ public LabelledArcIterator successors() {
+ if (last < 0) throw new IllegalStateException();
+ final int d = (int)outdegree(); // Triples to be returned are pos+1,pos+2,...,pos+d
+ return new ArcIterator(d, pos, label);
+ }
+
+ @Override
+ public long outdegree() {
+ if (last < 0) throw new IllegalStateException();
+ int p;
+ for (p = pos + 1; p < triple.length && triple[p][0] == last; p++);
+ return p - pos - 1;
+ }
+
+ @Override
+ public boolean hasNext() {
+ return last < n - 1;
+ }
+
+ @Override
+ public long nextLong() {
+ if (!hasNext()) throw new NoSuchElementException();
+ if (last >= 0) pos += outdegree();
+ return ++last;
+ }
+
+ };
+ }
+
+ public static void main(String arg[]) throws JSAPException, IOException {
+ final SimpleJSAP jsap = new SimpleJSAP(IntegerTriplesArcLabelledImmutableGraph.class.getName(),
+ "Reads from standard input a list of triples <source,dest,label>, where the three " +
+ "components are separated by a TAB, and saves the " +
+ "corresponding arc-labelled graph using a BVGraph and a BitStreamArcLabelledImmutableGraph. " +
+ "Labels are represented using GammaCodedIntLabel.",
+ new Parameter[] {
+ //new FlaggedOption("graphClass", GraphClassParser.getParser(), null, JSAP.NOT_REQUIRED, 'g', "graph-class", "Forces a Java class for the source graph."),
+ new UnflaggedOption("basename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the resulting arc-labelled graph."),
+ }
+ );
+
+ final JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) return;
+ final String basename = jsapResult.getString("basename");
+
+ // We read triples from stdin, parse them and feed them to the constructor.
+ BufferedReader br = new BufferedReader(new InputStreamReader(System.in, "ASCII"));
+ ObjectArrayList<int[]> list = new ObjectArrayList<>();
+
+ String line;
+ while((line = br.readLine()) != null) {
+ final String p[] = line.split("\t");
+ list.add(new int[] { Integer.parseInt(p[0]),Integer.parseInt(p[1]), Integer.parseInt(p[2]) });
+ }
+
+ final ArcLabelledImmutableGraph g = new IntegerTriplesArcLabelledImmutableGraph(list.toArray(new int[0][]));
+ BVGraph.store(g, basename + ArcLabelledImmutableGraph.UNDERLYINGGRAPH_SUFFIX);
+ BitStreamArcLabelledImmutableGraph.store(g, basename, basename + ArcLabelledImmutableGraph.UNDERLYINGGRAPH_SUFFIX);
+ }
+}
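The constructor of `IntegerTriplesArcLabelledImmutableGraph` sorts the triples by source and then by target, rejecting duplicate arcs, and its node iterator obtains the outdegree of a node by measuring the run of consecutive triples sharing that source. The following plain-Java sketch reproduces just that bookkeeping outside WebGraph. `TripleRunsSketch` is a hypothetical name, and, unlike the original, it detects duplicates with an adjacent-pair scan after sorting instead of throwing from inside the comparator.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch of the triple handling in IntegerTriplesArcLabelledImmutableGraph. */
final class TripleRunsSketch {
    private TripleRunsSketch() {}

    /** Sorts triples by source, then target, rejects duplicate arcs, and returns the number of nodes. */
    static int sortAndCountNodes(int[][] triple) {
        int max = 0;
        for (int[] t : triple) max = Math.max(max, Math.max(t[0], t[1]));
        Arrays.sort(triple, Comparator.<int[]>comparingInt(t -> t[0]).thenComparingInt(t -> t[1]));
        // After sorting, equal arcs are adjacent, so one linear scan finds them
        for (int i = 1; i < triple.length; i++)
            if (triple[i][0] == triple[i - 1][0] && triple[i][1] == triple[i - 1][1])
                throw new IllegalArgumentException("Duplicate arc <" + triple[i][0] + "," + triple[i][1] + ">");
        return max + 1; // isolated nodes beyond the largest index cannot be represented
    }

    /** Outdegree of every node: the length of the run of sorted triples with that source. */
    static long[] outdegrees(int[][] sortedTriple, int n) {
        long[] d = new long[n];
        int pos = 0; // first triple not yet assigned to a node
        for (int x = 0; x < n; x++) {
            int p = pos;
            while (p < sortedTriple.length && sortedTriple[p][0] == x) p++;
            d[x] = p - pos;
            pos = p;
        }
        return d;
    }
}
```

For the triples `<2,0,7>`, `<0,2,5>` and `<0,1,3>` this yields three nodes with outdegrees 2, 0 and 1, and the labels of node 0 come out in target order (3, then 5). Keeping validation out of the comparator leaves it a plain total order, at the cost of one extra pass.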
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/examples/OutdegreeStats.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/examples/OutdegreeStats.java
new file mode 100644
index 0000000..79ab272
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/examples/OutdegreeStats.java
@@ -0,0 +1,108 @@
+package it.unimi.dsi.big.webgraph.examples;
+
+/*
+ * Copyright (C) 2003-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import it.unimi.dsi.big.webgraph.GraphClassParser;
+import it.unimi.dsi.big.webgraph.ImmutableGraph;
+import it.unimi.dsi.big.webgraph.NodeIterator;
+import it.unimi.dsi.fastutil.io.TextIO;
+import it.unimi.dsi.fastutil.longs.LongBigArrays;
+import it.unimi.dsi.logging.ProgressLogger;
+
+import java.io.IOException;
+import java.lang.reflect.InvocationTargetException;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/** The main method of this class loads an arbitrary {@link it.unimi.dsi.big.webgraph.ImmutableGraph}
+ * and performs a sequential scan to establish the minimum, maximum and average outdegree.
+ */
+
+public class OutdegreeStats {
+
+ private OutdegreeStats() {}
+
+ static public void main(String arg[]) throws IllegalArgumentException, SecurityException, IllegalAccessException, InvocationTargetException, NoSuchMethodException, JSAPException, IOException {
+ SimpleJSAP jsap = new SimpleJSAP(OutdegreeStats.class.getName(), "Prints on standard error the minimum, maximum and average outdegree of a graph, and outputs on standard output, one per line, the number of nodes having each outdegree value (the first line is the number of nodes with outdegree 0).",
+ new Parameter[] {
+ new FlaggedOption("graphClass", GraphClassParser.getParser(), null, JSAP.NOT_REQUIRED, 'g', "graph-class", "Forces a Java class for the source graph."),
+ new FlaggedOption("logInterval", JSAP.LONG_PARSER, Long.toString(ProgressLogger.DEFAULT_LOG_INTERVAL), JSAP.NOT_REQUIRED, 'l', "log-interval", "The minimum time interval between activity logs in milliseconds."),
+ new UnflaggedOption("basename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the graph."),
+ }
+ );
+
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) return;
+
+ final Class<?> graphClass = jsapResult.getClass("graphClass");
+ final String basename = jsapResult.getString("basename");
+
+ final ProgressLogger pl = new ProgressLogger();
+ pl.logInterval = jsapResult.getLong("logInterval");
+ final ImmutableGraph graph;
+ // We fetch by reflection the class specified by the user
+ if (graphClass != null) graph = (ImmutableGraph)graphClass.getMethod("loadOffline", CharSequence.class).invoke(null, basename);
+ else graph = ImmutableGraph.loadOffline(basename, pl);
+
+ final NodeIterator nodeIterator = graph.nodeIterator();
+ long[][] count = LongBigArrays.EMPTY_BIG_ARRAY;
+ long d, curr, maxd = 0, maxNode = 0, mind = Long.MAX_VALUE, minNode = 0;
+ long totd = 0;
+
+ pl.expectedUpdates = graph.numNodes();
+ pl.start("Scanning...");
+
+ for(long i = graph.numNodes(); i-- != 0;) {
+ curr = nodeIterator.nextLong();
+ d = nodeIterator.outdegree();
+
+ if (d < mind) {
+ mind = d;
+ minNode = curr;
+ }
+
+ if (d > maxd){
+ maxd = d;
+ maxNode = curr;
+ }
+
+ totd += d;
+
+ if (d >= LongBigArrays.length(count)) count = LongBigArrays.grow(count, d + 1);
+ LongBigArrays.incr(count, d);
+
+ pl.lightUpdate();
+ }
+
+ pl.done();
+
+ System.err.println("The minimum outdegree is " + mind + ", attained by node " + minNode);
+ System.err.println("The maximum outdegree is " + maxd + ", attained by node " + maxNode);
+ System.err.println("The average outdegree is " + (double)totd / graph.numNodes());
+
+ TextIO.storeLongs(count, 0, maxd + 1, System.out);
+ }
+}
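`OutdegreeStats` makes a single sequential pass that records the first node attaining the minimum and the maximum outdegree, accumulates the total outdegree for the average, and fills a histogram indexed by outdegree whose final length is the maximum outdegree plus one. The sketch below applies the same pass to a plain array of outdegrees, so the logic can be checked without a graph on disk; `OutdegreeStatsSketch` and its fields are hypothetical, and `Arrays.copyOf` stands in for `LongBigArrays.grow`.

```java
import java.util.Arrays;

/** Hypothetical sketch of the single pass performed by OutdegreeStats, over an array of outdegrees. */
final class OutdegreeStatsSketch {
    long minDegree = Long.MAX_VALUE, minNode = -1; // first node attaining the minimum
    long maxDegree = 0, maxNode = 0;               // first node attaining the maximum (node 0 if all are 0)
    long totalDegree = 0;
    int numNodes = 0;
    long[] histogram = new long[0];                // histogram[d] = number of nodes with outdegree d

    static OutdegreeStatsSketch of(long[] outdegree) {
        OutdegreeStatsSketch s = new OutdegreeStatsSketch();
        s.numNodes = outdegree.length;
        for (int node = 0; node < outdegree.length; node++) {
            final long d = outdegree[node];
            if (d < s.minDegree) { s.minDegree = d; s.minNode = node; }
            if (d > s.maxDegree) { s.maxDegree = d; s.maxNode = node; }
            s.totalDegree += d;
            // Grow the histogram so that index d exists, then count this node
            if (d >= s.histogram.length) s.histogram = Arrays.copyOf(s.histogram, (int) d + 1);
            s.histogram[(int) d]++;
        }
        return s;
    }

    double averageDegree() {
        return (double) totalDegree / numNodes;
    }
}
```

For the outdegrees 2, 0, 3, 0, 1 the minimum 0 is first attained by node 1, the maximum 3 by node 2, the average is 1.2, and the histogram is 2, 1, 1, 1, which is what the tool would print on standard output, one value per line.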
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/examples/package.html b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/examples/package.html
new file mode 100644
index 0000000..f78c861
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/examples/package.html
@@ -0,0 +1,12 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html>
+ <head>
+ <title>WebGraph Usage Examples</title>
+ </head>
+
+ <body>
+
+ <P>Example classes showing how to use the WebGraph framework, for instance to expose your own data as a graph or to compute statistics with a sequential scan.
+
+ </body>
+</html>
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/AbstractIntLabel.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/AbstractIntLabel.java
new file mode 100644
index 0000000..c2f5541
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/AbstractIntLabel.java
@@ -0,0 +1,128 @@
+package it.unimi.dsi.big.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+/** An abstract (single-attribute) integer label.
+ *
+ * <p>This class provides basic methods for a label holding an integer.
+ * Concrete implementations may impose further requirements on the integer.
+ *
+ * <p>Implementing subclasses must provide constructors, {@link Label#copy()},
+ * {@link Label#fromBitStream(it.unimi.dsi.io.InputBitStream, long)}, {@link Label#toBitStream(it.unimi.dsi.io.OutputBitStream, long)}
+ * and possibly override {@link #toString()}.
+ */
+
+public abstract class AbstractIntLabel extends AbstractLabel implements Label {
+ /** The key of the attribute represented by this label. */
+ protected final String key;
+ /** The value of the attribute represented by this label. */
+ public int value;
+
+ /** Creates an int label with given key and value.
+ *
+ * @param key the (only) key of this label.
+ * @param value the value of this label.
+ */
+ public AbstractIntLabel(String key, int value) {
+ this.key = key;
+ this.value = value;
+ }
+
+ @Override
+ public String wellKnownAttributeKey() {
+ return key;
+ }
+
+ @Override
+ public String[] attributeKeys() {
+ return new String[] { key };
+ }
+
+ @Override
+ public Class<?>[] attributeTypes() {
+ return new Class[] { int.class };
+ }
+
+ @Override
+ public Object get(String key) {
+ return Integer.valueOf(getInt(key));
+ }
+
+ @Override
+ public int getInt(String key) {
+ if (this.key.equals(key)) return value;
+ throw new IllegalArgumentException("Unknown key " + key);
+ }
+
+ @Override
+ public long getLong(String key) {
+ return getInt(key);
+ }
+
+ @Override
+ public float getFloat(String key) {
+ return getInt(key);
+ }
+
+ @Override
+ public double getDouble(String key) {
+ return getInt(key);
+ }
+
+ @Override
+ public Object get() {
+ return Integer.valueOf(getInt());
+ }
+
+ @Override
+ public int getInt() {
+ return value;
+ }
+
+ @Override
+ public long getLong() {
+ return value;
+ }
+
+ @Override
+ public float getFloat() {
+ return value;
+ }
+
+ @Override
+ public double getDouble() {
+ return value;
+ }
+
+ @Override
+ public String toString() {
+ return key + ":" + value;
+ }
+
+ @Override
+ public boolean equals(Object x) {
+ if (x instanceof AbstractIntLabel) return (value == ((AbstractIntLabel)x).value);
+ else return false;
+ }
+
+ @Override
+ public int hashCode() {
+ return value;
+ }
+}
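Concrete subclasses of `AbstractIntLabel` must implement `toBitStream()` and `fromBitStream()`. The triples example stores its labels with `GammaCodedIntLabel`, which we assume writes the value with an Elias gamma code through the dsiutils bit streams. Purely as an illustration of that code, the following sketch encodes a natural number x as the classic gamma code of x + 1 (so that 0 is representable, which is how we understand dsiutils to handle natural numbers) and renders it as a string of bits. `GammaSketch` is hypothetical and uses neither the WebGraph nor the dsiutils API.

```java
/** Hypothetical sketch of Elias gamma coding for a natural number x >= 0, as a string of bits.
 *  The codeword of x is the classic gamma code of x + 1: as many zeros as the bit length
 *  of x + 1 minus one, followed by x + 1 in binary. Requires Java 11+ for String.repeat. */
final class GammaSketch {
    private GammaSketch() {}

    static String encode(int x) {
        if (x < 0 || x == Integer.MAX_VALUE) throw new IllegalArgumentException("x must be in [0, Integer.MAX_VALUE)");
        String binary = Integer.toBinaryString(x + 1);   // always starts with a 1
        return "0".repeat(binary.length() - 1) + binary; // unary prefix gives the length
    }

    /** Decodes the single gamma codeword at the start of bits. */
    static int decode(String bits) {
        int zeros = 0;
        while (bits.charAt(zeros) == '0') zeros++;
        // The codeword is 2 * zeros + 1 bits long; its last zeros + 1 bits are x + 1
        return Integer.parseInt(bits.substring(zeros, 2 * zeros + 1), 2) - 1;
    }
}
```

For example, 0 is encoded as `1` and 4 as `00101`; small label values thus take very few bits, which is what makes gamma coding attractive for skewed integer labels.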
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/AbstractIntListLabel.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/AbstractIntListLabel.java
new file mode 100644
index 0000000..4c75a13
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/AbstractIntListLabel.java
@@ -0,0 +1,90 @@
+package it.unimi.dsi.big.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import java.util.Arrays;
+
+/** An abstract (single-attribute) list-of-integers label.
+ *
+ * <p>This class provides basic methods for a label holding a list of integers.
+ * Concrete implementations may impose further requirements on the integers.
+ *
+ * <p>Implementing subclasses must provide constructors, {@link Label#copy()},
+ * {@link Label#fromBitStream(it.unimi.dsi.io.InputBitStream, long)}, {@link Label#toBitStream(it.unimi.dsi.io.OutputBitStream, long)}
+ * and possibly override {@link #toString()}.
+ */
+
+public abstract class AbstractIntListLabel extends AbstractLabel implements Label {
+ /** The key of the attribute represented by this label. */
+ protected final String key;
+ /** The values of the attribute represented by this label. */
+ public int[] value;
+
+ /** Creates an int-list label with given key and values.
+ *
+ * @param key the (only) key of this label.
+ * @param value the values of this label.
+ */
+ public AbstractIntListLabel(String key, int[] value) {
+ this.key = key;
+ this.value = value;
+ }
+
+ @Override
+ public String wellKnownAttributeKey() {
+ return key;
+ }
+
+ @Override
+ public String[] attributeKeys() {
+ return new String[] { key };
+ }
+
+ @Override
+ public Class<?>[] attributeTypes() {
+ return new Class[] { int[].class };
+ }
+
+ @Override
+ public Object get(String key) {
+ if (this.key.equals(key)) return value;
+ throw new IllegalArgumentException("Unknown key " + key);
+ }
+
+ @Override
+ public Object get() {
+ return value;
+ }
+
+ @Override
+ public String toString() {
+ return key + ":" + Arrays.toString(value);
+ }
+
+ @Override
+ public boolean equals(Object x) {
+ if (x instanceof AbstractIntListLabel) return Arrays.equals(value, ((AbstractIntListLabel)x).value);
+ else return false;
+ }
+
+ @Override
+ public int hashCode() {
+ return Arrays.hashCode(value);
+ }
+}
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/AbstractLabel.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/AbstractLabel.java
new file mode 100644
index 0000000..9c95c13
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/AbstractLabel.java
@@ -0,0 +1,104 @@
+package it.unimi.dsi.big.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+/** An abstract implementation throwing an {@link IllegalArgumentException} on all primitive-type methods. */
+
+public abstract class AbstractLabel implements Label {
+
+ @Override
+ public byte getByte() throws IllegalArgumentException {
+ throw new IllegalArgumentException();
+ }
+
+ @Override
+ public short getShort(String key) throws IllegalArgumentException {
+ throw new IllegalArgumentException();
+ }
+
+ @Override
+ public int getInt(String key) throws IllegalArgumentException {
+ throw new IllegalArgumentException();
+ }
+
+ @Override
+ public long getLong(String key) throws IllegalArgumentException {
+ throw new IllegalArgumentException();
+ }
+
+ @Override
+ public float getFloat(String key) throws IllegalArgumentException {
+ throw new IllegalArgumentException();
+ }
+
+ @Override
+ public double getDouble(String key) throws IllegalArgumentException {
+ throw new IllegalArgumentException();
+ }
+
+ @Override
+ public char getChar(String key) throws IllegalArgumentException {
+ throw new IllegalArgumentException();
+ }
+
+ @Override
+ public boolean getBoolean(String key) throws IllegalArgumentException {
+ throw new IllegalArgumentException();
+ }
+
+ @Override
+ public byte getByte(String key) throws IllegalArgumentException {
+ throw new IllegalArgumentException();
+ }
+
+ @Override
+ public short getShort() throws IllegalArgumentException {
+ throw new IllegalArgumentException();
+ }
+
+ @Override
+ public int getInt() throws IllegalArgumentException {
+ throw new IllegalArgumentException();
+ }
+
+ @Override
+ public long getLong() throws IllegalArgumentException {
+ throw new IllegalArgumentException();
+ }
+
+ @Override
+ public float getFloat() throws IllegalArgumentException {
+ throw new IllegalArgumentException();
+ }
+
+ @Override
+ public double getDouble() throws IllegalArgumentException {
+ throw new IllegalArgumentException();
+ }
+
+ @Override
+ public char getChar() throws IllegalArgumentException {
+ throw new IllegalArgumentException();
+ }
+
+ @Override
+ public boolean getBoolean() throws IllegalArgumentException {
+ throw new IllegalArgumentException();
+ }
+}
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/ArcLabelledImmutableGraph.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/ArcLabelledImmutableGraph.java
new file mode 100644
index 0000000..cf38352
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/ArcLabelledImmutableGraph.java
@@ -0,0 +1,234 @@
+package it.unimi.dsi.big.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.big.webgraph.BVGraph;
+import it.unimi.dsi.big.webgraph.ImmutableGraph;
+import it.unimi.dsi.big.webgraph.labelling.ArcLabelledNodeIterator.LabelledArcIterator;
+import it.unimi.dsi.logging.ProgressLogger;
+
+import java.io.IOException;
+import java.io.InputStream;
+
+/** An abstract implementation of a graph labelled on its arcs.
+ *
+ * <p>The main purpose of this class is to covariantly override the return
+ * type of {@link #nodeIterator()} and {@link #nodeIterator(long)} so that
+ * it is an {@link ArcLabelledNodeIterator}, and the return type of
+ * all static load methods and of {@link #copy()} so that it is an {@link ArcLabelledImmutableGraph} (the
+ * load methods themselves just delegate to the corresponding methods in {@link ImmutableGraph}).
+ *
+ * <p>The only additional instance methods are {@link #labelBigArray(long)} and {@link #prototype()}.
+ *
+ * <h2>Saving labels</h2>
+ *
+ * <P>A subclass of this class <strong>may</strong> implement
+ * <UL>
+ * <LI><code>store(ArcLabelledImmutableGraph, CharSequence, CharSequence, ProgressLogger)</code>;
+ * <LI><code>store(ArcLabelledImmutableGraph, CharSequence, CharSequence)</code>.
+ * </UL>
+ *
+ * <p>These methods must save the labels of the given arc-labelled graph using the first given character
+ * sequence as a basename, and a suitable property file using the second given basename. Note that the graph
+ * will <strong>not</strong> be saved&mdash;use the <code>store()</code>
+ * method of an {@link ImmutableGraph} implementation for that purpose.
+ *
+ * <p>For instance, assuming <code>g</code> is an arc-labelled graph, the idiomatic way
+ * of storing it on disk using {@link BVGraph} for the underlying graph and
+ * {@link BitStreamArcLabelledImmutableGraph} for the labels is
+ * <pre>
+ * BVGraph.store(g, "foo");
+ * BitStreamArcLabelledImmutableGraph.store(g, "bar", "foo");
+ * </pre>
+ *
+ * <h2>Underlying graphs</h2>
+ *
+ * <p>Often, implementations of this class will just wrap an <em>underlying graph</em> (i.e.,
+ * an instance of {@link ImmutableGraph}). In that case, if the implementation
+ * uses property files, we suggest specifying the basename of the underlying graph using the property
+ * key {@link #UNDERLYINGGRAPH_PROPERTY_KEY}. If the basename must be generated starting
+ * from the arc-labelled graph basename, we suggest simply appending the string
+ * {@link #UNDERLYINGGRAPH_SUFFIX}.
+ */
+
+public abstract class ArcLabelledImmutableGraph extends ImmutableGraph {
+
+ /** The standard property key for the underlying graph. All implementations decorating
+ * with labels an underlying graph are strongly encouraged to use this property
+ * name to specify the basename of the underlying graph. */
+ public static final String UNDERLYINGGRAPH_PROPERTY_KEY = "underlyinggraph";
+ /** The standard suffix added to basenames in order to give a basename
+ * to the underlying graph, when needed. */
+ public static final String UNDERLYINGGRAPH_SUFFIX = "-underlying";
+
+
+ @Override
+ public abstract ArcLabelledImmutableGraph copy();
+
+ @Override
+ public ArcLabelledNodeIterator nodeIterator() {
+ return nodeIterator(0);
+ }
+
+ /** Returns a node iterator for scanning the graph sequentially, starting from the given node.
+ *
+ * <P>This implementation strengthens the one provided in {@link ImmutableGraph}, delegating
+ * to the labelled random-access methods {@link #successors(long)} and {@link #outdegree(long)}.
+ *
+ * @param from the node from which the iterator will iterate.
+ * @return an {@link ArcLabelledNodeIterator} for accessing nodes, successors and their labels sequentially.
+ *
+ * @see ImmutableGraph#nodeIterator()
+ */
+ @Override
+ public ArcLabelledNodeIterator nodeIterator(final long from) {
+ return new ArcLabelledNodeIterator() {
+ long curr = from - 1;
+ final long n = numNodes();
+
+ @Override
+ public long nextLong() {
+ if (! hasNext()) throw new java.util.NoSuchElementException();
+ return ++curr;
+ }
+
+ @Override
+ public boolean hasNext() {
+ return (curr < n - 1);
+ }
+
+ @Override
+ public LabelledArcIterator successors() {
+ if (curr == from - 1) throw new IllegalStateException();
+ return ArcLabelledImmutableGraph.this.successors(curr);
+ }
+
+ @Override
+ public long outdegree() {
+ if (curr == from - 1) throw new IllegalStateException();
+ return ArcLabelledImmutableGraph.this.outdegree(curr);
+ }
+ };
+ }
+
+ @Override
+ public abstract ArcLabelledNodeIterator.LabelledArcIterator successors(long x);
+
+ /** Returns a prototype of the labels used by this graph. The prototype can be
+ * used to produce new copies, but must not be modified by the caller.
+ *
+ * @return a prototype for the labels of this graph.
+ */
+ public abstract Label prototype();
+
+ /** Returns a reference to an array containing the labels of the arcs going out of a given node
+ * in the same order as the order in which the corresponding successors are returned by {@link #successors(long)}.
+ *
+ * <P>The returned array may contain more entries than the outdegree of <code>x</code>.
+ * However, only those with indices from 0 (inclusive) to the outdegree of <code>x</code> (exclusive)
+ * contain valid data.
+ *
+ * <P>This implementation just unwraps the iterator returned by {@link #successors(long)} and
+ * writes in a newly allocated array copies of the labels returned by {@link LabelledArcIterator#label()}.
+ *
+ * @param x a node.
+ * @return an array whose first elements are the labels of the arcs going
+ * out of <code>x</code>; the array must not be modified by the caller.
+ */
+
+ public Label[][] labelBigArray(long x) {
+ return ArcLabelledNodeIterator.unwrap(successors(x), outdegree(x));
+ }
+
+ @Deprecated
+ public static ArcLabelledImmutableGraph loadSequential(CharSequence basename) throws IOException {
+ return (ArcLabelledImmutableGraph)ImmutableGraph.loadSequential(basename);
+ }
+
+ @Deprecated
+ public static ArcLabelledImmutableGraph loadSequential(CharSequence basename, ProgressLogger pl) throws IOException {
+ return (ArcLabelledImmutableGraph)ImmutableGraph.loadSequential(basename, pl);
+ }
+
+ public static ArcLabelledImmutableGraph loadOffline(CharSequence basename) throws IOException {
+ return (ArcLabelledImmutableGraph)ImmutableGraph.loadOffline(basename);
+ }
+
+ public static ArcLabelledImmutableGraph loadOffline(CharSequence basename, ProgressLogger pl) throws IOException {
+ return (ArcLabelledImmutableGraph)ImmutableGraph.loadOffline(basename, pl);
+ }
+
+ public static ArcLabelledImmutableGraph load(CharSequence basename) throws IOException {
+ return (ArcLabelledImmutableGraph)ImmutableGraph.load(basename);
+ }
+
+ public static ArcLabelledImmutableGraph load(CharSequence basename, ProgressLogger pl) throws IOException {
+ return (ArcLabelledImmutableGraph)ImmutableGraph.load(basename, pl);
+ }
+
+ public static ArcLabelledImmutableGraph loadOnce(InputStream is) throws IOException {
+ return (ArcLabelledImmutableGraph)ImmutableGraph.loadOnce(is);
+ }
+
+ @Override
+ public String toString() {
+ final StringBuilder s = new StringBuilder();
+
+ long numArcs = -1;
+ try {
+ numArcs = numArcs();
+ }
+ catch(UnsupportedOperationException ignore) {}
+
+ s.append("Nodes: " + numNodes() + "\nArcs: " + (numArcs == -1 ? "unknown" : Long.toString(numArcs)) + "\n");
+
+ final ArcLabelledNodeIterator nodeIterator = nodeIterator();
+ ArcLabelledNodeIterator.LabelledArcIterator successors;
+ long curr;
+ for (long i = numNodes(); i-- != 0;) {
+ curr = nodeIterator.nextLong();
+ s.append("Successors of " + curr + " (degree " + nodeIterator.outdegree() + "):");
+ successors = nodeIterator.successors();
+ long d = nodeIterator.outdegree();
+ while (d-- != 0) s.append(" " + successors.nextLong() + " [" + successors.label() + "]");
+ s.append('\n');
+ }
+ return s.toString();
+ }
+
+ @Override
+ public boolean equals(Object x) {
+ if (! (x instanceof ArcLabelledImmutableGraph)) return false;
+ ArcLabelledImmutableGraph g = (ArcLabelledImmutableGraph)x;
+ if (g.numNodes() != numNodes()) return false;
+ ArcLabelledNodeIterator nodeIterator = nodeIterator();
+ ArcLabelledNodeIterator gNodeIterator = g.nodeIterator();
+ while (nodeIterator.hasNext()) {
+ nodeIterator.nextLong(); gNodeIterator.nextLong();
+ if (nodeIterator.outdegree() != gNodeIterator.outdegree()) return false;
+ LabelledArcIterator arcIterator = nodeIterator.successors();
+ LabelledArcIterator gArcIterator = gNodeIterator.successors();
+ long d = nodeIterator.outdegree();
+ while (d-- != 0) {
+ if (arcIterator.nextLong() != gArcIterator.nextLong()
+ || ! arcIterator.label().equals(gArcIterator.label())) return false;
+ }
+ }
+ return true;
+ }
+}
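The `equals()` method above compares two arc-labelled graphs in lockstep. It first checks that both graphs have the same number of nodes. Then, for each node, it checks the outdegree, and for each arc it checks the successor and the label, all in iteration order. The self-contained sketch below reproduces that comparison on a hypothetical in-memory representation (`TinyLabelledGraph`, with `int` labels compared by value). It does not use the WebGraph API, where labels are compared through `Label.equals()`.

```java
/**
 * Hypothetical in-memory arc-labelled graph (not the WebGraph API):
 * successors[x] lists the successors of node x, and labels[x][i] is the
 * int label of the arc from x to successors[x][i].
 */
class TinyLabelledGraph {
    final long[][] successors;
    final int[][] labels;

    TinyLabelledGraph(long[][] successors, int[][] labels) {
        this.successors = successors;
        this.labels = labels;
    }

    long numNodes() {
        return successors.length;
    }

    /** Lockstep comparison mirroring equals(): node count, then per node the
     *  outdegree, and per arc the successor and the label, in iteration order. */
    boolean sameAs(TinyLabelledGraph g) {
        if (g.numNodes() != numNodes()) return false;
        for (int x = 0; x < successors.length; x++) {
            if (successors[x].length != g.successors[x].length) return false;
            for (int i = 0; i < successors[x].length; i++) {
                if (successors[x][i] != g.successors[x][i] || labels[x][i] != g.labels[x][i]) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[][] succ = {{1, 2}, {2}, {}};
        TinyLabelledGraph a = new TinyLabelledGraph(succ, new int[][] {{5, 7}, {9}, {}});
        TinyLabelledGraph b = new TinyLabelledGraph(succ, new int[][] {{5, 7}, {9}, {}});
        TinyLabelledGraph c = new TinyLabelledGraph(succ, new int[][] {{5, 8}, {9}, {}});
        System.out.println(a.sameAs(b)); // true: same arcs, same labels
        System.out.println(a.sameAs(c)); // false: the label of arc 0 -> 2 differs
    }
}
```

Because the comparison follows iteration order, two graphs that contain the same labelled arcs but enumerate them differently would be reported as different.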
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/ArcLabelledImmutableSequentialGraph.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/ArcLabelledImmutableSequentialGraph.java
new file mode 100644
index 0000000..c40af50
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/ArcLabelledImmutableSequentialGraph.java
@@ -0,0 +1,58 @@
+package it.unimi.dsi.big.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.big.webgraph.labelling.ArcLabelledNodeIterator.LabelledArcIterator;
+
+/** An abstract arc-labelled immutable graph that throws an {@link java.lang.UnsupportedOperationException}
+ * on all random-access methods.
+ *
+ * <p>The main purpose of this class is to be used as a base for the numerous anonymous
+ * classes that do not support random access.
+ */
+
+public abstract class ArcLabelledImmutableSequentialGraph extends ArcLabelledImmutableGraph {
+ /** Throws an {@link java.lang.UnsupportedOperationException}. */
+ @Override
+ public long[][] successorBigArray(final long x) { throw new UnsupportedOperationException(); }
+ /** Throws an {@link java.lang.UnsupportedOperationException}. */
+ @Override
+ public Label[][] labelBigArray(final long x) { throw new UnsupportedOperationException(); }
+ /** Throws an {@link java.lang.UnsupportedOperationException}. */
+ @Override
+ public long outdegree(final long x) { throw new UnsupportedOperationException(); }
+ /** Throws an {@link java.lang.UnsupportedOperationException}. */
+ @Override
+ public ArcLabelledNodeIterator nodeIterator(long x) {
+ if (x == 0) return nodeIterator();
+ throw new UnsupportedOperationException();
+ }
+ /** Returns {@link #nodeIterator()} if <code>x</code> is zero; throws an {@link java.lang.UnsupportedOperationException} otherwise. */
+ @Override
+ public LabelledArcIterator successors(long x) { throw new UnsupportedOperationException(); }
+ /** Returns false.
+ * @return false.
+ */
+ @Override
+ public boolean randomAccess() { return false; }
+
+ /** Throws an {@link UnsupportedOperationException}. */
+ @Override
+ public ArcLabelledImmutableGraph copy() { throw new UnsupportedOperationException(); }
+}
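`ArcLabelledImmutableSequentialGraph` is a convenience base for anonymous classes that only support sequential scans. Every random-access method throws `UnsupportedOperationException`, except that `nodeIterator(0)` falls back to the full `nodeIterator()`. The sketch below reproduces that pattern in plain Java. `MiniGraph`, `MiniSequentialGraph` and the node-only graph are hypothetical stand-ins, not WebGraph classes, and they leave out successors and labels.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

class SequentialGraphDemo {
    /** An anonymous sequential graph enumerating nodes 0..n-1: the kind of class the base is meant for. */
    static MiniGraph nodesOnly(final long n) {
        return new MiniSequentialGraph() {
            @Override long numNodes() { return n; }
            @Override Iterator<Long> nodeIterator() {
                return new Iterator<Long>() {
                    private long next = 0;
                    @Override public boolean hasNext() { return next < n; }
                    @Override public Long next() {
                        if (!hasNext()) throw new NoSuchElementException();
                        return next++;
                    }
                };
            }
        };
    }

    /** Counts the nodes seen by a scan starting at node 0. */
    static long scan(MiniGraph g) {
        long count = 0;
        for (Iterator<Long> it = g.nodeIterator(0); it.hasNext(); it.next()) count++;
        return count;
    }

    /** True if a random-access call is rejected. */
    static boolean outdegreeThrows(MiniGraph g) {
        try { g.outdegree(0); return false; }
        catch (UnsupportedOperationException e) { return true; }
    }

    public static void main(String[] args) {
        MiniGraph g = nodesOnly(3);
        System.out.println(scan(g));            // 3: nodeIterator(0) falls back to nodeIterator()
        System.out.println(g.randomAccess());   // false
        System.out.println(outdegreeThrows(g)); // true
    }
}

/** Hypothetical minimal graph contract standing in for ArcLabelledImmutableGraph. */
abstract class MiniGraph {
    abstract long numNodes();
    abstract Iterator<Long> nodeIterator();
    abstract Iterator<Long> nodeIterator(long from);
    abstract long outdegree(long x);
    abstract boolean randomAccess();
}

/** Sequential-only base: random access throws, except a node iteration started at 0. */
abstract class MiniSequentialGraph extends MiniGraph {
    @Override long outdegree(long x) { throw new UnsupportedOperationException(); }
    @Override Iterator<Long> nodeIterator(long from) {
        if (from == 0) return nodeIterator();
        throw new UnsupportedOperationException();
    }
    @Override boolean randomAccess() { return false; }
}
```

With this design, code that always scans from node 0 works unchanged, and any genuine random access fails immediately instead of returning wrong data.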
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/ArcLabelledNodeIterator.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/ArcLabelledNodeIterator.java
new file mode 100644
index 0000000..0921ed2
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/ArcLabelledNodeIterator.java
@@ -0,0 +1,89 @@
+package it.unimi.dsi.big.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.big.webgraph.LazyLongIterator;
+import it.unimi.dsi.big.webgraph.NodeIterator;
+import it.unimi.dsi.fastutil.objects.ObjectBigArrays;
+
+/** An iterator returning nodes, their successors and labels on the arcs.
+ *
+ * <p>The purpose of this abstract implementation is to override covariantly
+ * the return type of {@link NodeIterator#successors()}, so that
+ * it has to be an {@link ArcLabelledNodeIterator.LabelledArcIterator}, and to provide a general
+ * implementation of a new {@link #labelBigArray()} method that returns
+ * the labels of the arcs going out of the current node as an array.
+ */
+public abstract class ArcLabelledNodeIterator extends NodeIterator {
+
+ private static final Label[][] LABEL_EMPTY_BIG_ARRAY = new Label[0][0];
+
+ /** An iterator returning successors and the labels of the arcs toward them.
+ * The label can be accessed through {@link #label()}, which must be called just after
+ * advancing the iterator.
+ *
+ * <p><strong>Warning</strong>: the returned label can be the same object
+ * upon several calls to {@link #label()}; if you need to store it,
+ * you should {@linkplain Label#copy() copy it}.
+ */
+ public interface LabelledArcIterator extends LazyLongIterator {
+ /** The label of the arc leading to the last returned successor.
+ *
+ * @return the label of the arc leading to the last returned successor.
+ */
+ public Label label();
+ }
+
+ @Override
+ public abstract ArcLabelledNodeIterator.LabelledArcIterator successors();
+
+ /** Returns a reference to an array containing the labels of the arcs going out of the current node
+ * in the same order as the order in which the corresponding successors are returned by {@link #successors()}.
+ *
+ * <P>The returned array may contain more entries than the outdegree of the current node.
+ * However, only those with indices from 0 (inclusive) to the outdegree of the current node (exclusive)
+ * contain valid data.
+ *
+ * <P>This implementation just unwraps the iterator returned by {@link #successors()} and
+ * writes copies of the labels returned by {@link LabelledArcIterator#label()} into a newly allocated big array.
+ *
+ * @return an array whose first elements are the labels of the arcs going
+ * out of the current node; the array must not be modified by the caller.
+ */
+
+ public Label[][] labelBigArray() {
+ return unwrap(successors(), outdegree());
+ }
+
+ /** Returns a new array of labels filled with exactly <code>howMany</code> labels from the given iterator.
+ * Note that the iterator is required to have at least as many labels as needed.
+ *
+ * @param iterator the iterator.
+ * @param howMany the number of labels.
+ * @return the new array where labels are copied.
+ */
+ protected static Label[][] unwrap(final ArcLabelledNodeIterator.LabelledArcIterator iterator, final long howMany) {
+ final Label[][] result = ObjectBigArrays.newBigArray(LABEL_EMPTY_BIG_ARRAY, howMany);
+ for (long i = 0; i < howMany; i++) {
+ iterator.nextLong();
+ ObjectBigArrays.set(result, i, iterator.label().copy());
+ }
+ return result;
+ }
+}
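The warning on `LabelledArcIterator` matters in practice. An implementation may return the same mutable label object from every call to `label()`. For example, `BitStreamLabelledArcIterator` refills a single `label` via `fromBitStream()`. This is why `unwrap()` stores `label().copy()`. The sketch below shows the difference using a hypothetical `IntLabel` and an iterator that reuses one label instance. It uses a plain Java array where the real code uses a fastutil big array (`ObjectBigArrays`).

```java
import java.util.Arrays;

class LabelReuseDemo {
    /** Hypothetical mutable int label with copy(), standing in for a WebGraph Label. */
    static final class IntLabel {
        int value;
        IntLabel copy() { IntLabel l = new IntLabel(); l.value = value; return l; }
    }

    /** Hypothetical labelled-arc iterator that reuses a single label object. */
    static final class ReusingArcIterator {
        private final long[] successors;
        private final int[] values;
        private final IntLabel label = new IntLabel();
        private int pos = 0;

        ReusingArcIterator(long[] successors, int[] values) {
            this.successors = successors;
            this.values = values;
        }

        /** Returns the next successor, or -1 when exhausted; updates the shared label in place. */
        long nextLong() {
            if (pos == successors.length) return -1;
            label.value = values[pos];
            return successors[pos++];
        }

        /** Label of the arc to the last returned successor (the same object every time). */
        IntLabel label() { return label; }
    }

    /** Mirrors unwrap(): advance exactly howMany times, storing a copy of each label. */
    static IntLabel[] unwrap(ReusingArcIterator it, int howMany) {
        IntLabel[] result = new IntLabel[howMany];
        for (int i = 0; i < howMany; i++) {
            it.nextLong();
            result[i] = it.label().copy();
        }
        return result;
    }

    /** The pitfall: storing label() without copying aliases every slot to one object. */
    static IntLabel[] unwrapWithoutCopy(ReusingArcIterator it, int howMany) {
        IntLabel[] result = new IntLabel[howMany];
        for (int i = 0; i < howMany; i++) {
            it.nextLong();
            result[i] = it.label();
        }
        return result;
    }

    static int[] values(IntLabel[] labels) {
        return Arrays.stream(labels).mapToInt(l -> l.value).toArray();
    }

    public static void main(String[] args) {
        long[] succ = {3, 8, 11};
        int[] vals = {10, 20, 30};
        System.out.println(Arrays.toString(values(unwrap(new ReusingArcIterator(succ, vals), 3))));
        // [10, 20, 30]
        System.out.println(Arrays.toString(values(unwrapWithoutCopy(new ReusingArcIterator(succ, vals), 3))));
        // [30, 30, 30]: every slot refers to the same, last-updated label
    }
}
```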
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/ArcRelabelledImmutableGraph.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/ArcRelabelledImmutableGraph.java
new file mode 100644
index 0000000..86c6142
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/ArcRelabelledImmutableGraph.java
@@ -0,0 +1,228 @@
+package it.unimi.dsi.big.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.big.webgraph.AbstractLazyLongIterator;
+import it.unimi.dsi.big.webgraph.BVGraph;
+import it.unimi.dsi.big.webgraph.GraphClassParser;
+import it.unimi.dsi.big.webgraph.ImmutableGraph;
+import it.unimi.dsi.big.webgraph.labelling.ArcLabelledNodeIterator.LabelledArcIterator;
+import it.unimi.dsi.lang.ObjectParser;
+import it.unimi.dsi.logging.ProgressLogger;
+
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.lang.reflect.InvocationTargetException;
+import java.util.Properties;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/** Exhibits an arc-labelled immutable graph as another arc-labelled immutable graph changing only
+ * the kind of labels. Labels of the source graph are mapped to labels
+ * of the exhibited graph via a suitable strategy provided at construction time.
+ */
+public class ArcRelabelledImmutableGraph extends ArcLabelledImmutableGraph {
+
+ private static final Logger LOGGER = LoggerFactory.getLogger(ArcRelabelledImmutableGraph.class);
+
+ /** A way to convert a label into another label.
+ */
+ public static interface LabelConversionStrategy {
+ /** Takes a label <code>from</code> and writes its content into another label <code>to</code>.
+ * If the types of labels are incompatible, or unsuitable for this strategy, an {@link IllegalArgumentException}
+ * or a {@link ClassCastException} will be thrown.
+ *
+ * @param from source label.
+ * @param to target label.
+ * @param source the source node of the arc labelled by the two labels.
+ * @param target the target node of the arc labelled by the two labels.
+ */
+ public void convert(Label from, Label to, long source, long target);
+ }
+
+ /** A conversion strategy that converts between any two classes extending {@link AbstractIntLabel}.
+ */
+ public static final LabelConversionStrategy INT_LABEL_CONVERSION_STRATEGY = new LabelConversionStrategy() {
+ @Override
+ public void convert(final Label from, final Label to, final long source, final long target) {
+ ((AbstractIntLabel)to).value = ((AbstractIntLabel)from).value;
+ }
+
+ };
+
+ /** The wrapped graph. */
+ private final ArcLabelledImmutableGraph wrappedGraph;
+ /** The new type of labels. */
+ private final Label newLabelPrototype;
+ /** The conversion strategy to be used. */
+ private final LabelConversionStrategy conversionStrategy;
+
+ /** Creates a relabelled graph with given label prototype.
+ *
+ * @param wrappedGraph the graph we are going to relabel.
+ * @param newLabelPrototype the prototype for the new type of labels.
+ * @param conversionStrategy the strategy to convert the labels of the wrapped graph into the new labels.
+ */
+ public ArcRelabelledImmutableGraph(final ArcLabelledImmutableGraph wrappedGraph, final Label newLabelPrototype, final LabelConversionStrategy conversionStrategy) {
+ this.wrappedGraph = wrappedGraph;
+ this.newLabelPrototype = newLabelPrototype;
+ this.conversionStrategy = conversionStrategy;
+ }
+
+ @Override
+ public ArcRelabelledImmutableGraph copy() {
+ return new ArcRelabelledImmutableGraph(wrappedGraph.copy(), newLabelPrototype.copy(), conversionStrategy);
+ }
+
+ private final class RelabelledArcIterator extends AbstractLazyLongIterator implements LabelledArcIterator {
+ /** The wrapped arc iterator. */
+ private final LabelledArcIterator wrappedArcIterator;
+ /** The source node of the current {@link #wrappedArcIterator}. */
+ private final long source;
+ /** The target of the current arc. */
+ private long target;
+
+ public RelabelledArcIterator(final LabelledArcIterator wrappedArcIterator, final long source) {
+ this.wrappedArcIterator = wrappedArcIterator;
+ this.source = source;
+ }
+
+ @Override
+ public Label label() {
+ conversionStrategy.convert(wrappedArcIterator.label(), newLabelPrototype, source, target);
+ return newLabelPrototype;
+ }
+
+ @Override
+ public long nextLong() {
+ return target = wrappedArcIterator.nextLong();
+ }
+ }
+
+ @Override
+ public ArcLabelledNodeIterator nodeIterator(final long from) {
+ return new ArcLabelledNodeIterator() {
+ /** The current node. */
+ private long current = -1;
+
+ ArcLabelledNodeIterator wrappedNodeIterator = wrappedGraph.nodeIterator(from);
+ @Override
+ public LabelledArcIterator successors() {
+ return new RelabelledArcIterator(wrappedNodeIterator.successors(), current);
+ }
+
+ @Override
+ public long outdegree() {
+ return wrappedNodeIterator.outdegree();
+ }
+
+ @Override
+ public boolean hasNext() {
+ return wrappedNodeIterator.hasNext();
+ }
+
+ @Override
+ public long nextLong() {
+ return current = wrappedNodeIterator.nextLong();
+ }
+
+ };
+ }
+
+ @Override
+ public LabelledArcIterator successors(long x) {
+ return new RelabelledArcIterator(wrappedGraph.successors(x), x);
+ }
+
+ @Override
+ public Label prototype() {
+ return newLabelPrototype;
+ }
+
+ @Override
+ public long numNodes() {
+ return wrappedGraph.numNodes();
+ }
+
+ @Override
+ public boolean randomAccess() {
+ return wrappedGraph.randomAccess();
+ }
+
+ @Override
+ public long outdegree(long x) {
+ return wrappedGraph.outdegree(x);
+ }
+
+ public static void main(String arg[]) throws JSAPException, IOException, IllegalArgumentException, SecurityException, IllegalAccessException, InvocationTargetException, NoSuchMethodException, ClassNotFoundException, InstantiationException {
+ final SimpleJSAP jsap = new SimpleJSAP(ArcRelabelledImmutableGraph.class.getName(),
+ "Relabels an integer-labelled graph with the given basename and saves it under a different basename, " +
+ "using another (typically different) type of integer labels, specified via a spec, and possibly " +
+ "a different graph class.",
+ new Parameter[] {
+ new FlaggedOption("underlyingGraphClass", GraphClassParser.getParser(), BVGraph.class.getName(), JSAP.NOT_REQUIRED, 'u', "underlying-graph-class", "Forces a Java immutable graph class to be used for saving the underlying graph (if the latter did not exist before)."),
+ new FlaggedOption("graphClass", GraphClassParser.getParser(), BitStreamArcLabelledImmutableGraph.class.getName(), JSAP.NOT_REQUIRED, 'g', "graph-class", "Forces a Java arc-labelled graph class to be used for saving."),
+ new UnflaggedOption("spec", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The label spec (e.g. FixedWidthIntLabel(FOO,10))."),
+ new UnflaggedOption("source", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the source arc-labelled graph."),
+ new UnflaggedOption("target", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the target arc-labelled graph."),
+ }
+ );
+
+ final JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) return;
+ final Class<?> destClass = jsapResult.getClass("graphClass");
+ final Class<?> underlyingDestClass = jsapResult.getClass("underlyingGraphClass");
+ final String sourceBasename = jsapResult.getString("source");
+ final String targetBasename = jsapResult.getString("target");
+ final String spec = jsapResult.getString("spec");
+ final Label label = ObjectParser.fromSpec(spec, Label.class);
+
+ ImmutableGraph source = ImmutableGraph.loadOffline(sourceBasename);
+ if (! (source instanceof ArcLabelledImmutableGraph)) throw new IllegalArgumentException("The graph " + sourceBasename + " of class " + source.getClass().getName() + " is not arc-labelled");
+ ArcLabelledImmutableGraph labSource = (ArcLabelledImmutableGraph)source;
+
+ if (! (labSource.prototype() instanceof AbstractIntLabel && label instanceof AbstractIntLabel)) throw new IllegalArgumentException("Relabelling from command line is only allowed for int labels, not for " + labSource.prototype().getClass().getName() + " -> " + label.getClass().getName());
+ ArcLabelledImmutableGraph labTarget = new ArcRelabelledImmutableGraph(labSource, label, ArcRelabelledImmutableGraph.INT_LABEL_CONVERSION_STRATEGY);
+
+ ProgressLogger pl = new ProgressLogger(LOGGER);
+
+ Properties prop = new Properties();
+ try (FileInputStream propertyFile = new FileInputStream(sourceBasename + ImmutableGraph.PROPERTIES_EXTENSION)) { prop.load(propertyFile); }
+ String underlyingBasename = prop.getProperty(ArcLabelledImmutableGraph.UNDERLYINGGRAPH_PROPERTY_KEY); // Tries to get the underlying basename
+ if (underlyingBasename == null)
+ // If the source records no underlying graph, we store one under the target basename plus a fixed suffix
+ underlyingDestClass.getMethod("store", ImmutableGraph.class, CharSequence.class, ProgressLogger.class)
+ .invoke(null, labTarget, underlyingBasename = targetBasename + ArcLabelledImmutableGraph.UNDERLYINGGRAPH_SUFFIX, pl);
+
+ destClass.getMethod("store", ArcLabelledImmutableGraph.class, CharSequence.class, CharSequence.class, ProgressLogger.class)
+ .invoke(null, labTarget, targetBasename, underlyingBasename, pl);
+
+ }
+
+
+}
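`ArcRelabelledImmutableGraph` does not create new label objects. `RelabelledArcIterator.label()` asks the `LabelConversionStrategy` to write into the single `newLabelPrototype` and then returns that same object, so callers that keep labels must copy them. The sketch below mimics this flow with several hypothetical pieces:

- `WideLabel` and `NarrowLabel`, two label classes sharing an `int` value (loosely modelled on `AbstractIntLabel`).
- A value-copying strategy analogous to `INT_LABEL_CONVERSION_STRATEGY`.
- A strategy that also uses the arc target.

None of these are WebGraph classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RelabelDemo {
    /** Hypothetical int-valued label base, loosely modelled on AbstractIntLabel. */
    static abstract class IntLabel {
        int value;
        abstract IntLabel copy();
    }

    /** Two hypothetical label kinds that differ only in type. */
    static final class WideLabel extends IntLabel {
        @Override WideLabel copy() { WideLabel l = new WideLabel(); l.value = value; return l; }
    }

    static final class NarrowLabel extends IntLabel {
        @Override NarrowLabel copy() { NarrowLabel l = new NarrowLabel(); l.value = value; return l; }
    }

    /** Same shape as LabelConversionStrategy.convert(from, to, source, target). */
    interface ConversionStrategy {
        void convert(IntLabel from, IntLabel to, long source, long target);
    }

    /** Analogue of INT_LABEL_CONVERSION_STRATEGY: copy the int value across. */
    static final ConversionStrategy COPY_VALUE = (from, to, source, target) -> to.value = from.value;

    /** Hypothetical strategy that also uses an arc endpoint (it adds the target node). */
    static final ConversionStrategy ADD_TARGET = (from, to, source, target) -> to.value = from.value + (int) target;

    static WideLabel wide(int v) {
        WideLabel l = new WideLabel();
        l.value = v;
        return l;
    }

    /** Relabels the arcs leaving source. The prototype is overwritten for every arc,
     *  as newLabelPrototype is in RelabelledArcIterator.label(), so each result is copied. */
    static List<IntLabel> relabel(long source, long[] targets, IntLabel[] labels,
                                  IntLabel prototype, ConversionStrategy strategy) {
        List<IntLabel> out = new ArrayList<>();
        for (int i = 0; i < targets.length; i++) {
            strategy.convert(labels[i], prototype, source, targets[i]);
            out.add(prototype.copy());
        }
        return out;
    }

    static int[] values(List<IntLabel> labels) {
        return labels.stream().mapToInt(l -> l.value).toArray();
    }

    public static void main(String[] args) {
        long[] targets = {1, 5};
        IntLabel[] labels = {wide(4), wide(9)};
        List<IntLabel> copied = relabel(0, targets, labels, new NarrowLabel(), COPY_VALUE);
        System.out.println(Arrays.toString(values(copied)));          // [4, 9]
        System.out.println(copied.get(0).getClass().getSimpleName()); // NarrowLabel
        List<IntLabel> shifted = relabel(0, targets, labels, new NarrowLabel(), ADD_TARGET);
        System.out.println(Arrays.toString(values(shifted)));         // [5, 14]
    }
}
```

Because every `RelabelledArcIterator` of one `ArcRelabelledImmutableGraph` writes into the same prototype, concurrent iterators overwrite each other's labels. `copy()` avoids this by giving the new instance its own `newLabelPrototype.copy()`.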
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/BitStreamArcLabelledImmutableGraph.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/BitStreamArcLabelledImmutableGraph.java
new file mode 100644
index 0000000..1aec5c6
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/BitStreamArcLabelledImmutableGraph.java
@@ -0,0 +1,537 @@
+package it.unimi.dsi.big.webgraph.labelling;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.FileNotFoundException;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.PrintWriter;
+import java.util.Properties;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.big.webgraph.AbstractLazyLongIterator;
+import it.unimi.dsi.big.webgraph.BVGraph;
+import it.unimi.dsi.big.webgraph.ImmutableGraph;
+import it.unimi.dsi.big.webgraph.LazyLongIterator;
+import it.unimi.dsi.big.webgraph.NodeIterator;
+import it.unimi.dsi.big.webgraph.labelling.ArcLabelledNodeIterator.LabelledArcIterator;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.io.FastMultiByteArrayInputStream;
+import it.unimi.dsi.fastutil.longs.LongIterator;
+import it.unimi.dsi.fastutil.longs.LongBigArrays;
+import it.unimi.dsi.fastutil.objects.ObjectBigArrays;
+import it.unimi.dsi.io.InputBitStream;
+import it.unimi.dsi.io.OutputBitStream;
+import it.unimi.dsi.lang.ObjectParser;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.sux4j.util.EliasFanoMonotoneLongBigList;
+
+/** A labelled graph storing its labels as a bit stream.
+ *
+ * <p>Instances of this class wrap a given {@linkplain ImmutableGraph immutable graph} and a bit stream.
+ * Given a prototype {@link Label}, the bit stream is then considered as containing all labels of all arcs
+ * as returned by a complete enumeration (made using {@link #nodeIterator()}). The overall graph is described
+ * by a <em>label file</em> (with extension
+ * <code>.labels</code>), an <em>offset file</em> (with extension
+ * <code>.labeloffsets</code>) and a <em>property file</em> (with extension
+ * <code>.properties</code>). The latter, not surprisingly, is a Java property file.
+ *
+ * <H2>The Label and Offset Files</H2>
+ *
+ * <P>Since the labels are stored as a bit stream, we must have some way to know where the labels
+ * related to the successors of each node start.
+ * This information is stored in the offset file, which contains the bit offset of the list of labels
+ * of the arcs going out of each node (in particular,
+ * the offset of the first list will be zero). As a convenience, the offset file contains an additional
+ * offset pointing just after the last list (providing, as a side-effect, the actual bit length of the label file).
+ * Each offset (except for the first one) is stored as a {@linkplain OutputBitStream#writeGamma(int) &gamma;-coded} difference from the previous offset.
+ *
+ * <H2>The Property File</H2>
+ *
+ * <p>The property file for an instance of this class must contain the following entries:
+ *
+ * <dl>
+ * <dt>graphclass
+ * <dd>the name of this class; it is necessary so that load methods in
+ * {@link ImmutableGraph} can identify this class;
+ * <dt>underlyinggraph
+ * <dd>the basename (relative to the name of the property file, unless it is absolute) of the underlying {@link ImmutableGraph};
+ * <dt>labelspec
+ * <dd>a string describing a constructor call for a label class; an example is
+ * <div style="margin:1em; text-align: center">
+ * <code>it.unimi.dsi.webgraph.labelling.FixedWidthIntLabel(FOO,10)</code>
+ * </div>
+ * parameters
+ * are separated by a comma, and no quoting or escaping is allowed (see {@link Label} for details
+ * about string-based constructors).
+ * </dl>
+ *
+ * <p>The {@link #load(it.unimi.dsi.big.webgraph.ImmutableGraph.LoadMethod, CharSequence, java.io.InputStream, ProgressLogger) load()}
+ * method of this class takes care of looking at the property file, loading the underlying immutable graph,
+ * and setting up either sequential or random access to the bit stream containing the labels. If
+ * only sequential access is required, the offsets are not loaded into memory, and if only offline
+ * access is required, the bit stream is never loaded into memory.
+ *
+ * <h2>Saving labels</h2>
+ *
+ * <p>The {@link #store(ArcLabelledImmutableGraph, CharSequence, CharSequence)}
+ * and {@link #store(ArcLabelledImmutableGraph, CharSequence, CharSequence, ProgressLogger)}
+ * methods will save the labels of an instance of this graph as expected, that is,
+ * the bitstream and its offsets will be saved with the extensions described above.
+ */
+
+public class BitStreamArcLabelledImmutableGraph extends ArcLabelledImmutableGraph {
+ /** The standard extension for the labels bit stream. */
+ public static final String LABELS_EXTENSION = ".labels";
+ /** The standard extension for the label offsets bit stream. */
+ public static final String LABEL_OFFSETS_EXTENSION = ".labeloffsets";
+ /** The standard property key for a label specification. */
+ public static final String LABELSPEC_PROPERTY_KEY = "labelspec";
+
+ /** The buffer size we use for most operations. */
+ private static final int STD_BUFFER_SIZE = 1024 * 1024;
+ /** The underlying immutable graph. */
+ public final ImmutableGraph g;
+ /** A prototype label, used to deserialise labels and create copies. */
+ protected final Label prototype;
+
+ /** A byte array containing the label bit stream, or <code>null</code> for offline processing or for streams longer than {@link Integer#MAX_VALUE} bytes (see {@link #labelStream}). */
+ private final byte[] byteArray;
+ /** A multi-byte array input stream that replaces {@link #byteArray} for streams longer than {@link Integer#MAX_VALUE} bytes. */
+ private final FastMultiByteArrayInputStream labelStream;
+ /** The basename of this graph (required for offline access). */
+ protected final CharSequence basename;
+ /** The offset array, or <code>null</code> for sequential access. */
+ protected final EliasFanoMonotoneLongBigList offset;
+
+ /** Builds a new labelled graph using a bit stream of labels.
+ *
+ * @param basename the basename of the graph (mandatory for offline access).
+ * @param g the underlying immutable graph.
+ * @param prototype a label instance.
+ * @param byteArray a byte array containing the bit stream of labels, or <code>null</code> for offline access
+ * or large file access.
+ * @param labelStream if <code>byteArray</code> is <code>null</code>, this stream is used as the bit stream of labels.
+ * @param offset the offset array for random access, or <code>null</code>.
+ */
+ protected BitStreamArcLabelledImmutableGraph(CharSequence basename, ImmutableGraph g, Label prototype, byte[] byteArray, FastMultiByteArrayInputStream labelStream, EliasFanoMonotoneLongBigList offset) {
+ this.g = g;
+ this.byteArray = byteArray;
+ this.labelStream = labelStream;
+ this.prototype = prototype;
+ this.basename = basename;
+ this.offset = offset;
+ }
+
+ @Override
+ public BitStreamArcLabelledImmutableGraph copy() {
+ return new BitStreamArcLabelledImmutableGraph(basename, g.copy(), prototype.copy(), byteArray, labelStream, offset);
+ }
+
+ /** Returns the label bit stream.
+ *
+ * <p>This method takes care of creating the bit stream from the right source&mdash;the byte array,
+ * the stream of multiple byte arrays or the label file itself.
+ *
+ * @return the label bit stream.
+ */
+ protected InputBitStream newInputBitStream() throws FileNotFoundException {
+ return byteArray != null ? new InputBitStream(byteArray) :
+ labelStream != null ? new InputBitStream(new FastMultiByteArrayInputStream(labelStream)) :
+ new InputBitStream(basename + LABELS_EXTENSION);
+ }
+
+ @Override
+ public CharSequence basename() {
+ return basename;
+ }
+
+ /** Returns the actual offset of the labels of the arcs going out of a given node.
+ *
+ * @param x a node.
+ * @return the offset of the labels of the arcs going out of <code>x</code>.
+ */
+ protected long offset(final long x) {
+ // Without offsets (offset == null), we just give up: this throws a NullPointerException.
+ return offset.getLong(x);
+ }
+
+ protected static class BitStreamLabelledArcIterator extends AbstractLazyLongIterator implements ArcLabelledNodeIterator.LabelledArcIterator {
+ final protected LazyLongIterator underlyingIterator;
+ final protected InputBitStream ibs;
+ final protected Label label;
+ final protected long from;
+
+ public BitStreamLabelledArcIterator(final BitStreamArcLabelledImmutableGraph alg, final long x) {
+ this.underlyingIterator = alg.g.successors(from = x);
+ try {
+ ibs = alg.newInputBitStream();
+ ibs.position(alg.offset(x));
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ label = alg.prototype.copy();
+ }
+
+ @Override
+ public Label label() {
+ return label;
+ }
+
+ @Override
+ public long nextLong() {
+ final long successor = underlyingIterator.nextLong();
+ if (successor == -1) return -1;
+ try {
+ label.fromBitStream(ibs, from);
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ return successor;
+ }
+ }
+
+ @Override
+ public ArcLabelledNodeIterator.LabelledArcIterator successors(final long x) {
+ return new BitStreamLabelledArcIterator(this, x);
+ }
+
+ @Override
+ public long[][] successorBigArray(final long x) {
+ return g.successorBigArray(x);
+ }
+
+ @Override
+ public long numNodes() {
+ return g.numNodes();
+ }
+
+ @Override
+ public long numArcs() {
+ return g.numArcs();
+ }
+
+ @Override
+ public boolean randomAccess() {
+ return g.randomAccess() && offset != null;
+ }
+
+ @Override
+ public long outdegree(long x) {
+ return g.outdegree(x);
+ }
+
+ @Deprecated
+ public static BitStreamArcLabelledImmutableGraph loadSequential(CharSequence basename) throws IOException {
+ return load(LoadMethod.SEQUENTIAL, basename, null);
+ }
+
+ @Deprecated
+ public static BitStreamArcLabelledImmutableGraph loadSequential(CharSequence basename, ProgressLogger pl) throws IOException {
+ return load(LoadMethod.SEQUENTIAL, basename, pl);
+ }
+
+ public static BitStreamArcLabelledImmutableGraph loadOffline(CharSequence basename) throws IOException {
+ return load(LoadMethod.OFFLINE, basename, null);
+ }
+
+ public static BitStreamArcLabelledImmutableGraph loadOffline(CharSequence basename, ProgressLogger pl) throws IOException {
+ return load(LoadMethod.OFFLINE, basename, pl);
+ }
+
+ public static BitStreamArcLabelledImmutableGraph load(CharSequence basename) throws IOException {
+ return load(LoadMethod.STANDARD, basename, null);
+ }
+
+ public static BitStreamArcLabelledImmutableGraph load(CharSequence basename, ProgressLogger pl) throws IOException {
+ return load(LoadMethod.STANDARD, basename, pl);
+ }
+
+ /** Loads a labelled graph using the given method.
+ *
+ * <p>The underlying graph is loaded using the same method; its basename is resolved
+ * relative to the directory of <code>basename</code>, unless it is absolute.
+ *
+ * @param method a load method.
+ * @param basename the basename of the graph.
+ * @param pl a progress logger.
+ * @return a graph labelled using a bit stream.
+ */
+
+ @SuppressWarnings("deprecation")
+ protected static BitStreamArcLabelledImmutableGraph load(LoadMethod method, CharSequence basename, ProgressLogger pl) throws IOException {
+ final FileInputStream propertyFile = new FileInputStream(basename + PROPERTIES_EXTENSION);
+ final Properties properties = new Properties();
+ properties.load(propertyFile);
+ propertyFile.close();
+
+ if (properties.getProperty(UNDERLYINGGRAPH_PROPERTY_KEY) == null) throw new IOException("The property file for " + basename + " does not contain an underlying graph basename");
+ // We resolve the underlying graph basename relative to our basename
+ String graphName = properties.getProperty(UNDERLYINGGRAPH_PROPERTY_KEY);
+ // This is a workaround because absolute filenames are not correctly relativised
+ if (! (new File(graphName).isAbsolute())) graphName = new File(new File(basename.toString()).getParentFile(), properties.getProperty(UNDERLYINGGRAPH_PROPERTY_KEY)).toString();
+
+ final ImmutableGraph g;
+
+ // A kluge to pass the offset step down to a BVGraph
+
+ final FileInputStream graphPropertyFile = new FileInputStream(graphName + PROPERTIES_EXTENSION);
+ final Properties graphProperties = new Properties();
+ graphProperties.load(graphPropertyFile);
+ graphPropertyFile.close();
+
+ g = ImmutableGraph.load(method, graphName, null, pl);
+
+ // We parse the label spec and build a prototype
+ if (properties.getProperty(LABELSPEC_PROPERTY_KEY) == null) throw new IOException("The property file for " + basename + " does not contain a label specification");
+ Label prototype;
+ try {
+ try {
+ prototype = ObjectParser.fromSpec(new File(basename.toString()).getParentFile(), properties.getProperty(LABELSPEC_PROPERTY_KEY), Label.class);
+ }
+ catch(NoSuchMethodException e) {
+ prototype = ObjectParser.fromSpec(properties.getProperty(LABELSPEC_PROPERTY_KEY), Label.class);
+ }
+ }
+ catch (RuntimeException e) {
+ throw new RuntimeException(e);
+ }
+ catch (Exception e) {
+ throw new RuntimeException(e);
+ }
+
+ byte[] byteArray = null;
+ FastMultiByteArrayInputStream labelStream = null;
+ EliasFanoMonotoneLongBigList offsets = null;
+
+ if (method != LoadMethod.OFFLINE) {
+ if (pl != null) {
+ pl.itemsName = "bytes";
+ pl.start("Loading labels...");
+ }
+
+ final FileInputStream fis = new FileInputStream(basename + LABELS_EXTENSION);
+ final long size = fis.getChannel().size();
+ if (size <= Integer.MAX_VALUE) byteArray = BinIO.loadBytes(basename + LABELS_EXTENSION);
+ else labelStream = new FastMultiByteArrayInputStream(fis, size);
+
+ if (pl != null) {
+ pl.count = size;
+ pl.done();
+ }
+ // We do not load offsets if only sequential access is required.
+ if (method != LoadMethod.SEQUENTIAL) {
+ if (pl != null) {
+ pl.itemsName = "deltas";
+ pl.expectedUpdates = g.numNodes() + 1;
+ pl.start("Loading label offsets...");
+ }
+ final InputBitStream offsetStream = new InputBitStream(basename + LABEL_OFFSETS_EXTENSION);
+
+ offsets = new EliasFanoMonotoneLongBigList(g.numNodes() + 1, size * Byte.SIZE + 1, new LongIterator() {
+ private long off;
+ private int i;
+
+ @Override
+ public boolean hasNext() {
+ return i <= g.numNodes();
+ }
+ @Override
+ public long nextLong() {
+ i++;
+ try {
+ return off = offsetStream.readLongGamma() + off;
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ }
+ });
+
+ offsetStream.close();
+ if (pl != null) {
+ pl.count = g.numNodes() + 1;
+ pl.done();
+ pl.logger().info("Label pointer bits per node: " + offsets.numBits() / (g.numNodes() + 1.0));
+ }
+ }
+
+ fis.close();
+ }
+
+ return new BitStreamArcLabelledImmutableGraph(basename, g, prototype, byteArray, labelStream, offsets);
+
+ }
+
+ private final static class BitStreamArcLabelledNodeIterator extends ArcLabelledNodeIterator {
+ final private NodeIterator underlyingNodeIterator;
+ final private InputBitStream ibs;
+ final private Label prototype;
+ private Label[][] label = new Label[0][0];
+
+ public BitStreamArcLabelledNodeIterator(final long from, final ImmutableGraph g, final Label prototype, final InputBitStream ibs) {
+ this.prototype = prototype;
+ this.ibs = ibs;
+ underlyingNodeIterator = g.nodeIterator();
+ // Skip nodes up to from. This is necessary to skip labels, too.
+ for(long i = from; i-- != 0;) nextLong();
+ }
+
+ private final static class BitStreamArcLabelledNodeIteratorArcIterator extends AbstractLazyLongIterator implements ArcLabelledNodeIterator.LabelledArcIterator {
+ private final Label[][] label;
+ private final long[][] successor;
+ private final long outdegree;
+ private long curr;
+
+ public BitStreamArcLabelledNodeIteratorArcIterator(final long outdegree, final long[][] ls, final Label[][] label) {
+ this.outdegree = outdegree;
+ this.successor = ls;
+ this.label = label;
+ curr = -1;
+ }
+
+ @Override
+ public Label label() {
+ if (curr == -1) throw new IllegalStateException("This successor iterator is currently not valid");
+ return ObjectBigArrays.get(label, curr);
+ }
+
+ @Override
+ public long nextLong() {
+ if (curr == outdegree - 1) return -1;
+ return LongBigArrays.get(successor, ++curr);
+ }
+
+ @Override
+ public long skip(final long n) {
+ final long toSkip = Math.min(n, outdegree - 1 - curr);
+ curr += toSkip;
+ return toSkip;
+ }
+ }
+
+
+ @Override
+ public ArcLabelledNodeIterator.LabelledArcIterator successors() {
+ return new BitStreamArcLabelledNodeIteratorArcIterator(underlyingNodeIterator.outdegree(), underlyingNodeIterator.successorBigArray(), label);
+ }
+
+ @Override
+ public long[][] successorBigArray() {
+ return underlyingNodeIterator.successorBigArray();
+ }
+
+ @Override
+ public Label[][] labelBigArray() {
+ return label;
+ }
+
+ @Override
+ public long outdegree() {
+ return underlyingNodeIterator.outdegree();
+ }
+
+ @Override
+ public long nextLong() {
+ final long curr = underlyingNodeIterator.nextLong();
+ final long d = underlyingNodeIterator.outdegree();
+ // Store all labels of arcs going out of the current node
+ if (ObjectBigArrays.length(label) < d) {
+ label = ObjectBigArrays.grow(label, d);
+ outer: for(int i = label.length; i-- != 0;) {
+ final Label[] t = label[i];
+ for(int j = t.length; j-- != 0;)
+ if (t[j] == null) t[j] = prototype.copy();
+ else break outer;
+ }
+ }
+ try {
+ for(long i = 0; i < d; i++) ObjectBigArrays.get(label, i).fromBitStream(ibs, curr);
+ }
+ catch (IOException e) {
+ throw new RuntimeException(e);
+ }
+ return curr;
+ }
+
+ @Override
+ public boolean hasNext() {
+ return underlyingNodeIterator.hasNext();
+ }
+
+ }
+
+ @Override
+ public ArcLabelledNodeIterator nodeIterator(final long from) {
+ try {
+ return new BitStreamArcLabelledNodeIterator(from, g, prototype, newInputBitStream());
+ }
+ catch (FileNotFoundException e) {
+ throw new RuntimeException(e);
+ }
+ }
+
+ @Override
+ public Label prototype() {
+ return prototype;
+ }
+
+ public static void store(final ArcLabelledImmutableGraph graph, final CharSequence basename, final CharSequence underlyingBasename) throws IOException {
+ store(graph, basename, underlyingBasename, null);
+ }
+
+ public static void store(final ArcLabelledImmutableGraph graph, final CharSequence basename, final CharSequence underlyingBasename, final ProgressLogger pl) throws IOException {
+ final OutputBitStream labels = new OutputBitStream(basename + LABELS_EXTENSION, STD_BUFFER_SIZE);
+ final OutputBitStream offsets = new OutputBitStream(basename + LABEL_OFFSETS_EXTENSION, STD_BUFFER_SIZE);
+
+ if (pl != null) {
+ pl.itemsName = "nodes";
+ pl.expectedUpdates = graph.numNodes();
+ pl.start("Saving labels...");
+ }
+
+ final ArcLabelledNodeIterator nodeIterator = graph.nodeIterator();
+ offsets.writeGamma(0);
+ long curr;
+ long count;
+ LabelledArcIterator successors;
+
+ while(nodeIterator.hasNext()) {
+ curr = nodeIterator.nextLong();
+ successors = nodeIterator.successors();
+ count = 0;
+ while(successors.nextLong() != -1) count += successors.label().toBitStream(labels, curr);
+ offsets.writeLongGamma(count);
+ if (pl != null) pl.lightUpdate();
+ }
+
+ if (pl != null) pl.done();
+ labels.close();
+ offsets.close();
+
+ final PrintWriter properties = new PrintWriter(new FileOutputStream(basename + ImmutableGraph.PROPERTIES_EXTENSION));
+ properties.println(ImmutableGraph.GRAPHCLASS_PROPERTY_KEY + " = " + BitStreamArcLabelledImmutableGraph.class.getName());
+ properties.println(ArcLabelledImmutableGraph.UNDERLYINGGRAPH_PROPERTY_KEY + " = " + underlyingBasename);
+ properties.println(BitStreamArcLabelledImmutableGraph.LABELSPEC_PROPERTY_KEY + " = " + graph.prototype().toSpec());
+ properties.close();
+ }
+}
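The `store()` method above writes a γ-coded 0 followed by one γ-coded delta per node (the number of label bits of that node), and `load()` turns those deltas back into absolute bit positions with a running sum before handing them to an `EliasFanoMonotoneLongBigList`. The following is a minimal plain-Java sketch of that bookkeeping. It assumes an ordinary `long[]` in place of the Elias-Fano list and leaves out the bit-stream I/O; the class `LabelOffsetsSketch` and its methods are hypothetical, not part of WebGraph.

```java
/**
 * Hypothetical sketch (not WebGraph code) of the label-offset bookkeeping:
 * store() writes a leading 0 and then one delta per node, and load() turns
 * the deltas into absolute bit offsets with a running sum. A plain long[]
 * stands in for the EliasFanoMonotoneLongBigList used by the real code.
 */
final class LabelOffsetsSketch {
    private LabelOffsetsSketch() {}

    /** The sequence of deltas written by store(): 0, then the label bits of each node. */
    static long[] deltas(long[] labelBitsPerNode) {
        final long[] d = new long[labelBitsPerNode.length + 1];
        // d[0] is already 0, matching the initial offsets.writeGamma(0)
        System.arraycopy(labelBitsPerNode, 0, d, 1, labelBitsPerNode.length);
        return d;
    }

    /** Running sum as computed in load(): entry x is the first label bit of node x. */
    static long[] offsets(long[] deltas) {
        final long[] off = new long[deltas.length];
        long acc = 0;
        for (int i = 0; i < deltas.length; i++) {
            acc += deltas[i];
            off[i] = acc;
        }
        return off;
    }

    /** Number of label bits of node x: the difference of two consecutive offsets. */
    static long labelBits(long[] offsets, int x) {
        return offsets[x + 1] - offsets[x];
    }
}
```

For three nodes whose labels take 10, 0 and 7 bits, `offsets(deltas(...))` yields `{0, 10, 10, 17}`, and the labels of node 2 span `17 - 10 = 7` bits. Keeping absolute offsets rather than deltas is what allows the labels of an arbitrary node to be located without scanning the labels of the preceding nodes.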
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/FixedWidthIntLabel.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/FixedWidthIntLabel.java
new file mode 100644
index 0000000..923419d
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/FixedWidthIntLabel.java
@@ -0,0 +1,99 @@
+package it.unimi.dsi.big.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import it.unimi.dsi.io.InputBitStream;
+import it.unimi.dsi.io.OutputBitStream;
+
+import java.io.IOException;
+
+/** An integer represented in fixed width. The provided width must
+ * be smaller than 32.
+ */
+
+public class FixedWidthIntLabel extends AbstractIntLabel {
+ /** The bit width used to represent the value of this label. */
+ protected final int width;
+
+ /** Creates a new fixed-width int label.
+ *
+ * @param key the (only) key of this label.
+ * @param width the label width (in bits).
+ * @param value the value of this label.
+ */
+ public FixedWidthIntLabel(String key, int width, int value) {
+ super(key, value);
+ if (width < 0 || width > 31) throw new IllegalArgumentException("Width out of range: " + width);
+ if (value < 0 || value >= 1L << width) throw new IllegalArgumentException("Value out of range: " + Integer.toString(value));
+ this.width = width;
+ }
+
+ /** Creates a new fixed-width int label of value 0.
+ *
+ * @param key the (only) key of this label.
+ * @param width the label width (in bits).
+ */
+ public FixedWidthIntLabel(String key, int width) {
+ this(key, width, 0);
+ }
+
+ /** Creates a new fixed-width integer label using the given key and width
+ * with value 0.
+ *
+ * @param arg two strings containing the key and the width of this label.
+ */
+ public FixedWidthIntLabel(String... arg) {
+ this(arg[0], Integer.parseInt(arg[1]));
+ }
+
+ @Override
+ public Label copy() {
+ return new FixedWidthIntLabel(key, width, value);
+ }
+
+ @Override
+ public int fromBitStream(final InputBitStream inputBitStream, final long sourceUnused) throws IOException {
+ value = inputBitStream.readInt(width);
+ return width;
+ }
+
+ @Override
+ public int toBitStream(final OutputBitStream outputBitStream, final long sourceUnused) throws IOException {
+ return outputBitStream.writeInt(value, width);
+ }
+
+ /** Returns the width of this label (as provided at construction time).
+ * @return the width of this label.
+ */
+ @Override
+ public int fixedWidth() {
+ return width;
+ }
+
+ @Override
+ public String toString() {
+ return key + ":" + value + " (width:" + width + ")";
+ }
+
+ @Override
+ public String toSpec() {
+ return this.getClass().getName() + "(" + key + "," + width + ")";
+ }
+}
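`FixedWidthIntLabel` writes its value with `writeInt(value, width)`, so every label occupies exactly `width` bits, which is why `fixedWidth()` can return `width`. Below is a minimal sketch of that encoding. It assumes the bits are emitted most significant first and represents them as a `String` of `'0'`/`'1'` characters for readability. `FixedWidthSketch` is a hypothetical helper that reuses the constructor's range checks; it is not the dsiutils bit-stream API.

```java
/**
 * Hypothetical sketch (not the dsiutils API): encodes a value in exactly
 * `width` bits, most significant bit first, applying the same range checks
 * as the FixedWidthIntLabel constructor.
 */
final class FixedWidthSketch {
    private FixedWidthSketch() {}

    static String encode(int value, int width) {
        if (width < 0 || width > 31) throw new IllegalArgumentException("Width out of range: " + width);
        if (value < 0 || value >= 1L << width) throw new IllegalArgumentException("Value out of range: " + value);
        final StringBuilder sb = new StringBuilder(width);
        // Emit bits width-1, ..., 0 of the value
        for (int i = width; i-- != 0;) sb.append((value >>> i & 1) == 0 ? '0' : '1');
        return sb.toString();
    }

    /** Inverse of encode(): the length of the string is the width. */
    static int decode(String bits) {
        int value = 0;
        for (int i = 0; i < bits.length(); i++) value = value << 1 | (bits.charAt(i) - '0');
        return value;
    }
}
```

For instance, `encode(5, 4)` gives `"0101"`, while `encode(16, 4)` is rejected because 16 does not fit in 4 bits.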
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/FixedWidthIntListLabel.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/FixedWidthIntListLabel.java
new file mode 100644
index 0000000..ebc892b
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/FixedWidthIntListLabel.java
@@ -0,0 +1,106 @@
+package it.unimi.dsi.big.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import it.unimi.dsi.fastutil.ints.IntArrays;
+import it.unimi.dsi.io.InputBitStream;
+import it.unimi.dsi.io.OutputBitStream;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+/** A list of integers represented in fixed width. The provided width must
+ * be smaller than 32. Each list is prefixed by its length written
+ * in {@linkplain OutputBitStream#writeGamma(int) &gamma; coding}.
+ */
+
+public class FixedWidthIntListLabel extends AbstractIntListLabel {
+ /** The bit width used to represent each element of this label. */
+ private final int width;
+
+ /** Creates a new fixed-width int list label.
+ *
+ * @param key the (only) key of this label.
+ * @param width the width (in bits) of each element.
+ * @param value the values of this label.
+ */
+ public FixedWidthIntListLabel(String key, int width, int[] value) {
+ super(key, value);
+ if (width < 0 || width > 31) throw new IllegalArgumentException("Width out of range: " + width);
+ for(int i = value.length; i-- != 0;) if (value[i] < 0 || value[i] >= 1L << width) throw new IllegalArgumentException("Value out of range: " + Integer.toString(value[i]));
+ this.width = width;
+ }
+
+ /** Creates a new fixed-width label with an empty list.
+ *
+ * @param key the (only) key of this label.
+ * @param width the label width (in bits).
+ */
+ public FixedWidthIntListLabel(String key, int width) {
+ this(key, width, IntArrays.EMPTY_ARRAY);
+ }
+
+ /** Creates a new fixed-width integer list label using the given key and width
+ * with an empty list.
+ *
+ * @param arg two strings containing the key and the width of this label.
+ */
+ public FixedWidthIntListLabel(String... arg) {
+ this(arg[0], Integer.parseInt(arg[1]));
+ }
+
+ @Override
+ public Label copy() {
+ return new FixedWidthIntListLabel(key, width, value.clone());
+ }
+
+ @Override
+ public int fromBitStream(InputBitStream inputBitStream, final long sourceUnused) throws IOException {
+ long readBits = inputBitStream.readBits();
+ value = new int[inputBitStream.readGamma()];
+ for(int i = 0; i < value.length; i++) value[i] = inputBitStream.readInt(width);
+ return (int)(inputBitStream.readBits() - readBits);
+ }
+
+ @Override
+ public int toBitStream(OutputBitStream outputBitStream, final long sourceUnused) throws IOException {
+ int bits = outputBitStream.writeGamma(value.length);
+ for(int i = 0; i < value.length; i++) bits += outputBitStream.writeInt(value[i], width);
+ return bits;
+ }
+
+ /** Returns -1 (the fixed width refers to each integer, not to the entire list).
+ * @return -1.
+ */
+ @Override
+ public int fixedWidth() {
+ return -1;
+ }
+
+ @Override
+ public String toString() {
+ return key + ":" + Arrays.toString(value) + " (width:" + width + ")";
+ }
+
+ @Override
+ public String toSpec() {
+ return this.getClass().getName() + "(" + key + "," + width + ")";
+ }
+}
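Because each list is prefixed by its γ-coded length, the size of a `FixedWidthIntListLabel` depends on the number of elements, which is why `fixedWidth()` returns -1. The sketch below computes that size. It assumes that `writeGamma(n)` emits the Elias γ code of `n + 1` and thus uses `2 * floor(log2(n + 1)) + 1` bits; `FixedWidthListSizeSketch` is a hypothetical helper, not part of WebGraph.

```java
/**
 * Hypothetical sketch: number of bits used by a fixed-width list label,
 * assuming the length n is gamma-coded as the Elias gamma code of n + 1,
 * i.e. with 2 * floor(log2(n + 1)) + 1 bits, followed by n elements of
 * `width` bits each.
 */
final class FixedWidthListSizeSketch {
    private FixedWidthListSizeSketch() {}

    /** Length in bits of the gamma code of the natural number n. */
    static int gammaLength(int n) {
        if (n < 0) throw new IllegalArgumentException("Negative length: " + n);
        final int msb = 31 - Integer.numberOfLeadingZeros(n + 1); // floor(log2(n + 1))
        return 2 * msb + 1;
    }

    /** Total bits of a list of `length` elements, each `width` bits wide. */
    static long listBits(int length, int width) {
        return gammaLength(length) + (long) length * width;
    }
}
```

With this formula, a list of three 5-bit elements takes 5 + 15 = 20 bits, and an empty list takes a single bit.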
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/GammaCodedIntLabel.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/GammaCodedIntLabel.java
new file mode 100644
index 0000000..338b40f
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/GammaCodedIntLabel.java
@@ -0,0 +1,98 @@
+package it.unimi.dsi.big.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import it.unimi.dsi.io.InputBitStream;
+import it.unimi.dsi.io.OutputBitStream;
+
+import java.io.IOException;
+
+/** A natural number represented in {@linkplain OutputBitStream#writeGamma(int) &gamma; coding}. */
+
+public class GammaCodedIntLabel extends AbstractIntLabel {
+
+ /** Creates a new label with given key and value.
+ *
+ * @param key the (only) key.
+ * @param value the value of this label.
+ */
+ public GammaCodedIntLabel(String key, int value) {
+ super(key, value);
+ if (value < 0) throw new IllegalArgumentException("Value cannot be negative: " + value);
+ }
+
+ /** Creates a new &gamma;-coded label using the given key and value 0.
+ *
+ * @param key one string containing the key of this label.
+ */
+ public GammaCodedIntLabel(String... key) {
+ super(key[0], 0);
+ }
+
+ @Override
+ public GammaCodedIntLabel copy() {
+ return new GammaCodedIntLabel(key, value);
+ }
+
+ /** Fills this label {@linkplain InputBitStream#readGamma() reading a &gamma;-coded natural number}
+ * from the given input bit stream.
+ *
+ * @param inputBitStream an input bit stream.
+ * @return the number of bits read to fill this label.
+ */
+
+ @Override
+ public int fromBitStream(InputBitStream inputBitStream, final long sourceUnused) throws IOException {
+ long prevRead = inputBitStream.readBits();
+ value = inputBitStream.readGamma();
+ return (int)(inputBitStream.readBits() - prevRead);
+ }
+
+ /** Writes this label {@linkplain OutputBitStream#writeGamma(int) as a &gamma;-coded natural number}
+ * to the given output bit stream.
+ *
+ * @param outputBitStream an output bit stream.
+ * @return the number of bits written.
+ */
+
+ @Override
+ public int toBitStream(OutputBitStream outputBitStream, final long sourceUnused) throws IOException {
+ return outputBitStream.writeGamma(value);
+ }
+
+ /** Returns -1, as this label does not have a fixed width.
+ * @return -1.
+ */
+
+ @Override
+ public int fixedWidth() {
+ return -1;
+ }
+
+ @Override
+ public String toString() {
+ return key + ":" + value + " (gamma)";
+ }
+
+ @Override
+ public String toSpec() {
+ return this.getClass().getName() + "(" + key + ")";
+ }
+}
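The `Label` documentation requires every format to be self-delimiting, since `fromBitStream()` receives no length information. γ coding satisfies this: the number of leading zeros tells the decoder how many further bits to read. A minimal sketch follows. It assumes, as we understand dsiutils to do, that a natural number `x` is written as the Elias γ code of `x + 1`, with the unary part written as zeros terminated by a one. `GammaSketch` and its `String`-based bit representation are hypothetical.

```java
/**
 * Hypothetical sketch (not the dsiutils implementation): gamma coding of a
 * natural number x >= 0, written as the Elias gamma code of x + 1, that is,
 * floor(log2(x + 1)) zeros, a one, then the low bits of x + 1.
 */
final class GammaSketch {
    private GammaSketch() {}

    static String encode(int x) {
        if (x < 0) throw new IllegalArgumentException("Value cannot be negative: " + x);
        final long y = x + 1L; // shift to a positive integer
        final int msb = 63 - Long.numberOfLeadingZeros(y);
        final StringBuilder sb = new StringBuilder(2 * msb + 1);
        for (int i = 0; i < msb; i++) sb.append('0'); // unary part: msb zeros...
        sb.append('1');                               // ...terminated by a one
        for (int i = msb; i-- != 0;) sb.append((y >>> i & 1) == 0 ? '0' : '1'); // low msb bits
        return sb.toString();
    }

    /** Decodes one gamma-coded natural starting at pos[0], and advances pos[0] past it. */
    static int decode(String bits, int[] pos) {
        int msb = 0;
        while (bits.charAt(pos[0]++) == '0') msb++;
        long y = 1; // the leading one of x + 1
        for (int i = 0; i < msb; i++) y = y << 1 | (bits.charAt(pos[0]++) - '0');
        return (int) (y - 1);
    }
}
```

Under these assumptions 0, 1 and 3 encode as `1`, `010` and `00100`, and decoding their concatenation `101000100` recovers the three values in order without any separator.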
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/IntegerLabelFilter.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/IntegerLabelFilter.java
new file mode 100644
index 0000000..ae262ac
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/IntegerLabelFilter.java
@@ -0,0 +1,45 @@
+package it.unimi.dsi.big.webgraph.labelling;
+
+/*
+ * Copyright (C) 2008-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.big.webgraph.Transform.LabelledArcFilter;
+import it.unimi.dsi.fastutil.ints.IntOpenHashSet;
+
+public class IntegerLabelFilter implements LabelledArcFilter {
+ /** The values of the label that will be preserved. */
+ private final IntOpenHashSet values;
+ private final String key;
+
+ public IntegerLabelFilter(final String key, int... value) {
+ this.key = key;
+ values = new IntOpenHashSet(value);
+ }
+
+ public IntegerLabelFilter(final String... keyAndvalues) {
+ if (keyAndvalues.length == 0) throw new IllegalArgumentException("You must specify a key name");
+ this.key = keyAndvalues[0].length() == 0 ? null : keyAndvalues[0];
+ values = new IntOpenHashSet(keyAndvalues.length);
+ for(int i = 1; i < keyAndvalues.length; i++) values.add(Integer.parseInt(keyAndvalues[i]));
+ }
+
+ @Override
+ public boolean accept(long i, long j, Label label) {
+ return values.contains(key == null ? label.getInt() : label.getInt(key));
+ }
+}
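`IntegerLabelFilter.accept()` keeps an arc exactly when the integer value of its label, read under the configured key or under the well-known attribute when the key is `null` (an empty first string in the varargs constructor), belongs to the given set. The sketch below reproduces that rule in isolation. It assumes a `Map<String, Integer>` as a stand-in for a `Label` and an explicitly supplied well-known key; `IntegerLabelFilterSketch` is hypothetical and does not implement WebGraph's `LabelledArcFilter`.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of IntegerLabelFilter's acceptance rule. A Map stands
 * in for a Label: an arc is kept iff the integer stored under `key` (or under
 * the well-known key, when `key` is null) belongs to `values`.
 */
final class IntegerLabelFilterSketch {
    /** The attribute to test; null means "use the well-known attribute". */
    private final String key;
    /** The label values for which arcs are preserved. */
    private final Set<Integer> values = new HashSet<>();

    /** Mirrors the String... constructor: key first (empty means well-known), then the values. */
    IntegerLabelFilterSketch(String... keyAndValues) {
        if (keyAndValues.length == 0) throw new IllegalArgumentException("You must specify a key name");
        key = keyAndValues[0].isEmpty() ? null : keyAndValues[0];
        for (int i = 1; i < keyAndValues.length; i++) values.add(Integer.parseInt(keyAndValues[i]));
    }

    /** Returns true iff the selected attribute of the label is one of the preserved values. */
    boolean accept(Map<String, Integer> label, String wellKnownKey) {
        final Integer v = label.get(key == null ? wellKnownKey : key);
        if (v == null) throw new IllegalArgumentException("Unknown attribute key");
        return values.contains(v);
    }
}
```

For example, `new IntegerLabelFilterSketch("weight", "1", "3")` accepts a label whose `weight` is 3, while `new IntegerLabelFilterSketch("", "2")` compares the well-known attribute against 2.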
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/Label.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/Label.java
new file mode 100644
index 0000000..8d31ea9
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/Label.java
@@ -0,0 +1,290 @@
+package it.unimi.dsi.big.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import it.unimi.dsi.io.InputBitStream;
+import it.unimi.dsi.io.OutputBitStream;
+import it.unimi.dsi.lang.FlyweightPrototype;
+import it.unimi.dsi.lang.ObjectParser;
+
+import java.io.IOException;
+import java.util.NoSuchElementException;
+
+/** A set of attributes that can be used to decorate a node or
+ * an arc of a graph. Attributes appear in the form of &lt;<var>key</var>,<var>value</var>&gt;
+ * pairs, where keys are of type {@link String}. Among attributes,
+ * one (called the <em>well-known attribute</em>) has a special status:
+ * its key can be obtained by using the {@link #wellKnownAttributeKey()} method.
+ *
+ * <p>Values associated with attributes can be of any type: the value can be
+ * obtained (in the form of an object) with {@link #get(String)}.
+ * If the value is of primitive type, the alternative type-specific methods
+ * (e.g., {@link #getInt(String)} or {@link #getChar(String)}) can be
+ * called, with the proviso that such methods may throw an {@link java.lang.IllegalArgumentException}
+ * if the attribute type cannot be converted to the requested one without loss of information.
+ *
+ * <p>The value of the well-known attribute can be obtained with {@link #get()},
+ * or with the appropriate type-specific version of the method.
+ *
+ * <h2>Serialisation</h2>
+ *
+ * <p>Implementations must provide {@link #toBitStream(OutputBitStream, long)} and {@link #fromBitStream(InputBitStream, long)}
+ * methods that serialise a label to a bitstream and deserialise it from a bitstream, respectively. Since
+ * {@link #fromBitStream(InputBitStream, long)} has no length information, the label format must
+ * be self-delimiting. This can be obtained with a fixed length scheme (see, e.g., {@link FixedWidthIntLabel}),
+ * or using self-delimiting codes (see, e.g., {@link GammaCodedIntLabel}).
+ *
+ * <p>The methods {@link #toBitStream(OutputBitStream,long)}
+ * and {@link #fromBitStream(InputBitStream,long)} are given, as additional information, the number of the source
+ * node of the arc on which this label is put. They may use this information to decide how the
+ * label should be stored (typically, to achieve better compression).
+ *
+ * <p>The advantage of fixed-width labels (i.e., those for which {@link #fixedWidth()} does not return -1)
+ * is that when loading a {@link BitStreamArcLabelledImmutableGraph} with an offset step larger than 1 the position in the bitstream
+ * for the labels of a node can be calculated more quickly, as the computation just requires the outdegree
+ * of the nodes, whereas in general one has to skip in-between labels with an explicit deserialisation.
+ *
+ * <h2>String-based constructors</h2>
+ *
+ * <p>By convention, all concrete classes implementing this interface must follow the {@link ObjectParser} conventions:
+ * in particular, they must provide a constructor accepting strings (either in fixed or variable number) where the first string is the key.
+ * The constructor must perform data validation and build an instance with a default value (e.g., 0 for numerical labels). The
+ * constructor is used, for instance, by {@link BitStreamArcLabelledImmutableGraph} to instantiate a label prototype.
+ * Finally, the method {@link #toSpec()} must return a string that is accepted by {@link ObjectParser}.
+ */
+
+
+public interface Label extends FlyweightPrototype<Label> {
+ /** Returns the well-known attribute key.
+ *
+ * @return the well-known attribute key.
+ */
+ public String wellKnownAttributeKey();
+
+ /** All attribute keys (in arbitrary order).
+ *
+ * @return the keys of all attributes.
+ */
+ public String[] attributeKeys();
+
+ /** The types of all attributes in the same order as they are returned by {@link #attributeKeys()}.
+ *
+ * @return the types of all attributes.
+ */
+ public Class<?>[] attributeTypes();
+
+ /** The value associated to the attribute with given key.
+ *
+ * @param key the attribute key.
+ * @return the attribute value; if the attribute type is primitive, it is wrapped suitably.
+ * @throws NoSuchElementException if the attribute key is not one of the attributes of this label.
+ */
+ public Object get(String key) throws NoSuchElementException;
+
+ /** The value associated to the attribute with given key, provided that the latter has a type that fits a byte.
+ * Otherwise, an {@link IllegalArgumentException} is thrown.
+ *
+ * @param key the attribute key.
+ * @return the attribute value.
+ * @throws IllegalArgumentException if the attribute key is not known, or it has the wrong type.
+ */
+ public byte getByte(String key) throws IllegalArgumentException;
+
+ /** The value associated to the attribute with given key, provided that the latter has a type that fits a short.
+ * Otherwise, an {@link IllegalArgumentException} is thrown.
+ *
+ * @param key the attribute key.
+ * @return the attribute value.
+ * @throws IllegalArgumentException if the attribute key is not known, or it has the wrong type.
+ */
+ public short getShort(String key) throws IllegalArgumentException;
+
+ /** The value associated to the attribute with given key, provided that the latter has a type that fits an int.
+ * Otherwise, an {@link IllegalArgumentException} is thrown.
+ *
+ * @param key the attribute key.
+ * @return the attribute value.
+ * @throws IllegalArgumentException if the attribute key is not known, or it has the wrong type.
+ */
+ public int getInt(String key) throws IllegalArgumentException;
+
+ /** The value associated to the attribute with given key, provided that the latter has a type that fits a long.
+ * Otherwise, an {@link IllegalArgumentException} is thrown.
+ *
+ * @param key the attribute key.
+ * @return the attribute value.
+ * @throws IllegalArgumentException if the attribute key is not known, or it has the wrong type.
+ */
+ public long getLong(String key) throws IllegalArgumentException;
+
+ /** The value associated to the attribute with given key, provided that the latter has a type that fits a float.
+ * Otherwise, an {@link IllegalArgumentException} is thrown.
+ *
+ * @param key the attribute key.
+ * @return the attribute value.
+ * @throws IllegalArgumentException if the attribute key is not known, or it has the wrong type.
+ */
+ public float getFloat(String key) throws IllegalArgumentException;
+
+ /** The value associated to the attribute with given key, provided that the latter has a type that fits a double.
+ * Otherwise, an {@link IllegalArgumentException} is thrown.
+ *
+ * @param key the attribute key.
+ * @return the attribute value.
+ * @throws IllegalArgumentException if the attribute key is not known, or it has the wrong type.
+ */
+ public double getDouble(String key) throws IllegalArgumentException;
+
+ /** The value associated to the attribute with given key, provided that the latter has a type that fits a char.
+ * Otherwise, an {@link IllegalArgumentException} is thrown.
+ *
+ * @param key the attribute key.
+ * @return the attribute value.
+ * @throws IllegalArgumentException if the attribute key is not known, or it has the wrong type.
+ */
+ public char getChar(String key) throws IllegalArgumentException;
+
+ /** The value associated to the attribute with given key, provided that the latter has a type that fits a boolean.
+ * Otherwise, an {@link IllegalArgumentException} is thrown.
+ *
+ * @param key the attribute key.
+ * @return the attribute value.
+ * @throws IllegalArgumentException if the attribute key is not known, or it has the wrong type.
+ */
+ public boolean getBoolean(String key) throws IllegalArgumentException;
+
+ /** The value associated to the well-known attribute.
+ *
+ * @return the attribute value; if the attribute type is primitive, it is wrapped suitably.
+ */
+ public Object get() throws NoSuchElementException;
+
+ /** The value associated to the well-known attribute, provided that the latter has a type that fits a byte.
+ * Otherwise, an {@link IllegalArgumentException} is thrown.
+ *
+ * @return the value of the well-known attribute.
+ * @throws IllegalArgumentException if the well-known attribute has the wrong type.
+ */
+ public byte getByte() throws IllegalArgumentException;
+
+ /** The value associated to the well-known attribute, provided that the latter has a type that fits a short.
+ * Otherwise, an {@link IllegalArgumentException} is thrown.
+ *
+ * @return the value of the well-known attribute.
+ * @throws IllegalArgumentException if the well-known attribute has the wrong type.
+ */
+ public short getShort() throws IllegalArgumentException;
+
+ /** The value associated to the well-known attribute, provided that the latter has a type that fits an int.
+ * Otherwise, an {@link IllegalArgumentException} is thrown.
+ *
+ * @return the value of the well-known attribute.
+ * @throws IllegalArgumentException if the well-known attribute has the wrong type.
+ */
+ public int getInt() throws IllegalArgumentException;
+
+ /** The value associated to the well-known attribute, provided that the latter has a type that fits a long.
+ * Otherwise, an {@link IllegalArgumentException} is thrown.
+ *
+ * @return the value of the well-known attribute.
+ * @throws IllegalArgumentException if the well-known attribute has the wrong type.
+ */
+ public long getLong() throws IllegalArgumentException;
+
+ /** The value associated to the well-known attribute, provided that the latter has a type that fits a float.
+ *
+ * @return the value of the well-known attribute.
+ * @throws IllegalArgumentException if the well-known attribute has the wrong type.
+ */
+ public float getFloat() throws IllegalArgumentException;
+
+ /** The value associated to the well-known attribute, provided that the latter has a type that fits a double.
+ * Otherwise, an {@link IllegalArgumentException} is thrown.
+ *
+ * @return the attribute value; if the attribute type is primitive, it is wrapped suitably.
+ * @throws IllegalArgumentException if the attribute key is not known, or it has the wrong type.
+ */
+ public double getDouble() throws IllegalArgumentException;
+
+ /** The value associated to the well-known attribute, provided that the latter has a type that fits a char.
+ * Otherwise, an {@link IllegalArgumentException} is thrown.
+ *
+ * @return the attribute value; if the attribute type is primitive, it is wrapped suitably.
+ * @throws IllegalArgumentException if the attribute key is not known, or it has the wrong type.
+ */
+ public char getChar() throws IllegalArgumentException;
+
+ /** The value associated to the well-known attribute, provided that the latter has a type that fits a boolean.
+ * Otherwise, an {@link IllegalArgumentException} is thrown.
+ *
+ * @return the attribute value; if the attribute type is primitive, it is wrapped suitably.
+ * @throws IllegalArgumentException if the attribute key is not known, or it has the wrong type.
+ */
+ public boolean getBoolean() throws IllegalArgumentException;
+
+ /** Returns a copy of this label.
+ *
+ * @return a new label that copies this one.
+ */
+ @Override
+ public Label copy();
+
+ /** Returns a string representing the specification of this label.
+ *
+ * <p>Each label class can be instantiated in several ways (e.g., {@link FixedWidthIntLabel}
+ * requires a name for the well-known attribute and a number of bits). This method must return
+ * a representation that can be used by {@link ObjectParser} to instantiate the class, and
+ * consequently there <strong>must</strong> exist a matching constructor whose arguments are strings.
+ *
+ * <p>The following equation must always be satisfied:
+ * <pre style="text-align:center; padding: .5em">
+ * ObjectParser.fromSpec(x.toSpec()).toSpec().equals(x.toSpec())
+ * </pre>
+ * @return a string representing the specification of this label.
+ * @see ObjectParser#fromSpec(String, Class)
+ */
+ public String toSpec();
+
+ /** Fills this label with data from the given input bit stream, knowing the source node of the arc.
+ * If {@link #fixedWidth()} is not negative, the value returned must coincide with {@link #fixedWidth()}.
+ * This method is optional.
+ *
+ * @param inputBitStream an input bit stream offering a label.
+ * @param source the source node.
+ * @return the number of bits read to fill this label.
+ */
+ public int fromBitStream(InputBitStream inputBitStream, long source) throws IOException, UnsupportedOperationException;
+
+ /** Writes out this label to the given output bit stream, in self-delimiting form, knowing the source node of the arc.
+ * If {@link #fixedWidth()} is not negative, the value returned must coincide with {@link #fixedWidth()}.
+ * This method is optional.
+ *
+ * @param outputBitStream an output bit stream where the label will be written.
+ * @param source the source node.
+ * @return the number of bits written.
+ */
+ public int toBitStream(OutputBitStream outputBitStream, long source) throws IOException, UnsupportedOperationException;
+
+ /** Returns the fixed length of this label, in bits, if this label has fixed width.
+ *
+ * @return the fixed length of this label, or -1 if this label does not have a fixed width.
+ */
+ public int fixedWidth();
+}
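The `toSpec()` contract documented above can be illustrated with a minimal, self-contained sketch. `SpecLabelSketch` and its toy `fromSpec` parser are hypothetical stand-ins (they are not WebGraph or `ObjectParser` code), and they handle only a `Name(arg,arg)` spec with no nesting or escaping.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical label whose only job is to illustrate the toSpec() round-trip contract. */
class SpecLabelSketch {
    private final String key;
    private final int width;

    /** A constructor whose arguments are all strings, as the contract requires. */
    SpecLabelSketch(String key, String width) {
        this.key = key;
        this.width = Integer.parseInt(width);
    }

    String toSpec() {
        return "SpecLabelSketch(" + key + "," + width + ")";
    }

    /** Toy parser for "SpecLabelSketch(key,width)"; a stand-in for ObjectParser, not its implementation. */
    static SpecLabelSketch fromSpec(String spec) {
        int open = spec.indexOf('('), close = spec.lastIndexOf(')');
        if (open < 0 || close < open) throw new IllegalArgumentException("Malformed spec: " + spec);
        List<String> args = Arrays.asList(spec.substring(open + 1, close).split(","));
        if (args.size() != 2) throw new IllegalArgumentException("Expected two arguments: " + spec);
        return new SpecLabelSketch(args.get(0).trim(), args.get(1).trim());
    }

    public static void main(String[] args) {
        String spec = new SpecLabelSketch("weight", "12").toSpec();
        // The invariant: fromSpec(x.toSpec()).toSpec().equals(x.toSpec())
        System.out.println(spec + " round-trips: " + fromSpec(spec).toSpec().equals(spec));
    }
}
```

Note that the invariant compares two outputs of `toSpec()` rather than the user's original string: parsing `SpecLabelSketch(weight,012)` yields a label whose spec is `SpecLabelSketch(weight,12)`, and that normalized spec then round-trips unchanged.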
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/LabelMergeStrategy.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/LabelMergeStrategy.java
new file mode 100644
index 0000000..e700e04
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/LabelMergeStrategy.java
@@ -0,0 +1,44 @@
+package it.unimi.dsi.big.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+/** A way to merge two labels into one; the actual merge is performed by the {@link #merge(Label, Label)}
+ * method. Usually, strategies require that the two labels provided are of
+ * the same kind (i.e., instances of the same {@link it.unimi.dsi.big.webgraph.labelling.Label}
+ * class). Moreover, some strategies only accept labels of a certain type,
+ * and throw an {@link java.lang.IllegalArgumentException} if the type
+ * is wrong.
+ *
+ */
+public interface LabelMergeStrategy {
+
+ /** Merges two given labels; either label may be <code>null</code>, but not
+ * both. Implementing classes may decide to throw an {@link IllegalArgumentException}
+ * if the labels provided are not of the same type, or not of a
+ * specific type.
+ *
+ * @param first the first label to be merged.
+ * @param second the second label to be merged.
+ * @return the resulting label (note that the returned label may be reused by the
+ * implementing class, so users are invited to make a {@link Label#copy()}
+ * of it if they need to keep the label in between calls).
+ */
+ public Label merge(Label first, Label second);
+
+}
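The return-value caveat in `merge(Label, Label)` matters when a strategy recycles its result object. The sketch below assumes a hypothetical single-`int` label type and a merge interface shaped like `LabelMergeStrategy` (neither is WebGraph's actual `Label` hierarchy); it keeps the larger value and returns the same internal label on every call, so a caller that wants to keep a result must copy it first.

```java
/** Hypothetical, self-contained sketch; none of these types belong to WebGraph. */
class MaxMergeSketch {
    /** Minimal stand-in for a Label with a single int attribute. */
    static final class IntLabel {
        int value;
        IntLabel(int value) { this.value = value; }
        IntLabel copy() { return new IntLabel(value); }
    }

    /** Mirrors the shape of LabelMergeStrategy, specialised to IntLabel. */
    interface IntLabelMergeStrategy {
        IntLabel merge(IntLabel first, IntLabel second);
    }

    /** Keeps the larger value; returns the same internal label on every call, as the contract allows. */
    static final class MaxMergeStrategy implements IntLabelMergeStrategy {
        private final IntLabel result = new IntLabel(0);

        @Override
        public IntLabel merge(IntLabel first, IntLabel second) {
            if (first == null && second == null) throw new IllegalArgumentException("At most one label may be null");
            if (first == null) result.value = second.value;
            else if (second == null) result.value = first.value;
            else result.value = Math.max(first.value, second.value);
            return result;
        }
    }

    public static void main(String[] args) {
        MaxMergeStrategy strategy = new MaxMergeStrategy();
        IntLabel merged = strategy.merge(new IntLabel(3), new IntLabel(7));
        IntLabel kept = merged.copy();                       // copy before the next call
        strategy.merge(null, new IntLabel(5));               // overwrites the shared result object
        System.out.println(kept.value + " " + merged.value); // prints "7 5"
    }
}
```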
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/LabelSemiring.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/LabelSemiring.java
new file mode 100644
index 0000000..9b72a9d
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/LabelSemiring.java
@@ -0,0 +1,80 @@
+package it.unimi.dsi.big.webgraph.labelling;
+
+/*
+ * Copyright (C) 2008-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.big.webgraph.Transform;
+
+/** A semiring used to compose labels.
+ * <p>When {@linkplain Transform#compose(it.unimi.dsi.big.webgraph.ImmutableGraph, it.unimi.dsi.big.webgraph.ImmutableGraph) composing}
+ * two labelled graphs, we need a way to combine labels along a path, and a way to combine labels from different
+ * paths connecting two nodes. These two operations are implemented by
+ * {@link #multiply(Label, Label)} and {@link #add(Label, Label)}. The names of the two
+ * methods are due to the fact that their operations must define a <em>semiring</em>
+ * for which you must also provide a {@link #zero()} and a {@link #one()}. For instance,
+ * if a graph is labelled with weights, a semiring implementing {@link #multiply(Label, Label)} by
+ * a standard sum and {@link #add(Label, Label)} using the minimum operator will give a composition
+ * strategy that computes the shortest path connecting two nodes.
+ *
+ * <p>Usually, strategies require that the two labels provided are of
+ * the same kind (i.e., instances of the same {@link it.unimi.dsi.big.webgraph.labelling.Label}
+ * class). Moreover, some strategies only accept labels of a certain type,
+ * and throw an {@link java.lang.IllegalArgumentException} if the type
+ * is wrong.
+ */
+public interface LabelSemiring {
+
+ /** Multiplies two given labels; either label may be <code>null</code>, but not
+ * both. Implementing classes may decide to throw an {@link IllegalArgumentException}
+ * if the labels provided are not of the same type, or not of a
+ * specific type.
+ *
+ * @param first the first label to be multiplied.
+ * @param second the second label to be multiplied.
+ * @return the resulting label (note that the returned label may be reused by the
+ * implementing class, so users are invited to make a {@link Label#copy()}
+ * of it if they need to keep the label in between calls).
+ */
+ public Label multiply(Label first, Label second);
+
+ /** Adds two given labels; either label may be <code>null</code>, but not
+ * both. Implementing classes may decide to throw an {@link IllegalArgumentException}
+ * if the labels provided are not of the same type, or not of a
+ * specific type.
+ *
+ * @param first the first label to be added.
+ * @param second the second label to be added.
+ * @return the resulting label (note that the returned label may be reused by the
+ * implementing class, so users are invited to make a {@link Label#copy()}
+ * of it if they need to keep the label in between calls).
+ */
+ public Label add(Label first, Label second);
+
+ /** Returns the zero of {@link #add(Label, Label)}.
+ *
+ * @return the zero of {@link #add(Label, Label)}.
+ */
+ public Label zero();
+
+ /** Returns the one of {@link #multiply(Label, Label)}.
+ *
+ * @return the one of {@link #multiply(Label, Label)}.
+ */
+ public Label one();
+
+}
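As a sketch of the shortest-path example in the `LabelSemiring` documentation, the following uses plain `double` weights in place of `Label` objects: `multiply` is addition, `add` is minimum, `zero()` is +∞ (no path) and `one()` is 0. The dense-matrix `compose` method is a hypothetical illustration of how the two operations combine along and across paths; it makes no claim about how `Transform.compose` is implemented.

```java
/** Hypothetical sketch of the (min, +) semiring; plain doubles stand in for Label objects. */
class MinPlusSketch {
    // multiply: combines weights along a path (they add up)
    static double multiply(double a, double b) { return a + b; }

    // add: combines weights of alternative paths (the lighter one wins)
    static double add(double a, double b) { return Math.min(a, b); }

    // zero: neutral for add and absorbing for multiply; +infinity means "no path"
    static double zero() { return Double.POSITIVE_INFINITY; }

    // one: neutral for multiply
    static double one() { return 0; }

    /** c[i][k] = add over all j of multiply(a[i][j], b[j][k]), for square matrices of the same size. */
    static double[][] compose(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++) {
                double acc = zero();
                for (int j = 0; j < n; j++) acc = add(acc, multiply(a[i][j], b[j][k]));
                c[i][k] = acc;
            }
        return c;
    }

    public static void main(String[] args) {
        double inf = zero();
        // Arc weights of a 4-node graph; inf marks a missing arc.
        double[][] g = {
            { inf, 1, 1, inf },
            { inf, inf, inf, 5 },
            { inf, inf, inf, 2 },
            { inf, inf, inf, inf },
        };
        double[][] twoSteps = compose(g, g);
        System.out.println("Lightest two-arc path 0 -> 3: " + twoSteps[0][3]); // 3.0 (via node 2), not 6.0 (via node 1)
    }
}
```

Composing a graph with itself yields, for each pair of nodes, the lightest path of exactly two arcs: node 0 reaches node 3 through node 1 (weight 6) or through node 2 (weight 3), and `add` keeps 3.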
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/Labels.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/Labels.java
new file mode 100644
index 0000000..55e6bb2
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/Labels.java
@@ -0,0 +1,32 @@
+package it.unimi.dsi.big.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+public class Labels {
+
+ /** A strategy that keeps the first label, if present, and the second only
+ * if the first is not present.
+ */
+ public static final LabelMergeStrategy KEEP_FIRST_MERGE_STRATEGY = new LabelMergeStrategy() {
+ @Override
+ public Label merge(Label first, Label second) {
+ return first != null? first : second;
+ }
+ };
+}
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/UnionArcLabelledImmutableGraph.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/UnionArcLabelledImmutableGraph.java
new file mode 100644
index 0000000..92acdce
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/UnionArcLabelledImmutableGraph.java
@@ -0,0 +1,307 @@
+package it.unimi.dsi.big.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.big.webgraph.Transform;
+import it.unimi.dsi.big.webgraph.UnionImmutableGraph;
+import it.unimi.dsi.big.webgraph.labelling.ArcLabelledNodeIterator.LabelledArcIterator;
+import it.unimi.dsi.fastutil.longs.LongBigArrays;
+import it.unimi.dsi.fastutil.objects.ObjectBigArrays;
+import it.unimi.dsi.fastutil.objects.ObjectIterators;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+/** An arc-labelled immutable graph representing the union of two given such graphs.
+ * Here by &ldquo;union&rdquo; we mean that an arc will belong to the union iff it belongs to at least one of the two graphs (the number of
+ * nodes of the union is taken to be the maximum among the number of nodes of each graph). Labels are assumed to have the same
+ * prototype in both graphs, and are treated as follows: if an arc is present in only one graph, its label in the resulting
+ * graph is going to be the label of the arc in the graph where it comes from; if an arc is present in both graphs, the labels
+ * are combined using a provided {@link LabelMergeStrategy}.
+ *
+ * <h2>Remarks about the implementation</h2>
+ *
+ * <p>Due to the lack of multiple inheritance, we could not extend both {@link UnionImmutableGraph}
+ * and {@link ArcLabelledImmutableGraph}, so we chose to extend the latter. The possibility of using delegation
+ * on the former was also discarded because the code for reading and merging labels is so tightly coupled with the rest that it
+ * would have been essentially useless (and even dangerous) to delegate the iteration methods. As a result, some of the code of this
+ * class is actually almost a duplicate of the code of {@link UnionImmutableGraph}.
+ */
+public class UnionArcLabelledImmutableGraph extends ArcLabelledImmutableGraph {
+ @SuppressWarnings("unused")
+ private static final Logger LOGGER = LoggerFactory.getLogger(UnionArcLabelledImmutableGraph.class);
+ @SuppressWarnings("unused")
+ private static final boolean DEBUG = false;
+ private static final boolean ASSERTS = true;
+ private final ArcLabelledImmutableGraph g0, g1;
+ private final long n0, n1, numNodes;
+
+ /** The strategy used to merge labels when the same arc is present in both graphs. */
+ private final LabelMergeStrategy labelMergeStrategy;
+
+ /** The node whose successors are cached, or -1 if no successors are currently cached. */
+ private final int cachedNode = -1;
+
+ /** The outdegree of the cached node, if any. */
+ private long outdegree;
+
+ /** The successors of the cached node, if any; note that the array might be larger. */
+ private long[][] cache = LongBigArrays.EMPTY_BIG_ARRAY;
+
+ /** The labels on the arcs going out of the cached node, if any; note that the array might be larger. */
+ private Label labelCache[][] = new Label[0][0];
+ /** The prototype for the labels of this graph. */
+ private final Label prototype;
+
+ @Override
+ public UnionArcLabelledImmutableGraph copy() {
+ return new UnionArcLabelledImmutableGraph(g0.copy(), g1.copy(), labelMergeStrategy);
+ }
+
+ /** Creates the union of two given graphs.
+ *
+ * @param g0 the first graph.
+ * @param g1 the second graph.
+ * @param labelMergeStrategy the strategy used to merge labels when the same arc is present in both graphs.
+ */
+ public UnionArcLabelledImmutableGraph(ArcLabelledImmutableGraph g0, ArcLabelledImmutableGraph g1, LabelMergeStrategy labelMergeStrategy) {
+ this.g0 = g0;
+ this.g1 = g1;
+ this.labelMergeStrategy = labelMergeStrategy;
+ n0 = g0.numNodes();
+ n1 = g1.numNodes();
+ numNodes = Math.max(n0, n1);
+ if (g0.prototype().getClass() != g1.prototype().getClass()) throw new IllegalArgumentException("The two graphs have different label classes (" + g0.prototype().getClass().getSimpleName() + ", " +g1.prototype().getClass().getSimpleName() + ")");
+ prototype = g0.prototype();
+ }
+
+ @Override
+ public ArcLabelledNodeIterator nodeIterator(final long from) {
+
+ return new ArcLabelledNodeIterator() {
+ /** If outdegree is nonnegative, the successors of the current node (this array may be, however, larger). */
+ @SuppressWarnings("hiding")
+ private long cache[][] = LongBigArrays.EMPTY_BIG_ARRAY;
+ /** If outdegree is nonnegative, the labels on the arcs going out of the current node (this array may be, however, larger). */
+ @SuppressWarnings("hiding")
+ private Label[][] labelCache = new Label[0][0];
+ /** The outdegree of the current node, or -1 if the successor array for the current node has not been computed yet. */
+ @SuppressWarnings("hiding")
+ private long outdegree = -1;
+ private ArcLabelledNodeIterator i0 = from < n0? g0.nodeIterator(from) : null;
+ private ArcLabelledNodeIterator i1 = from < n1? g1.nodeIterator(from) : null;
+
+ @Override
+ public boolean hasNext() {
+ return i0 != null && i0.hasNext() || i1 != null && i1.hasNext();
+ }
+
+ @Override
+ public long nextLong() {
+ if (! hasNext()) throw new java.util.NoSuchElementException();
+ outdegree = -1;
+ long result = -1;
+ if (i0 != null) {
+ if (i0.hasNext()) result = i0.nextLong();
+ else i0 = null;
+ }
+ if (i1 != null) {
+ if (i1.hasNext()) result = i1.nextLong();
+ else i1 = null;
+ }
+ return result;
+ }
+
+ @Override
+ public long[][] successorBigArray() {
+ if (outdegree != -1) return cache;
+ if (i0 == null) {
+ outdegree = i1.outdegree();
+ cache = i1.successorBigArray();
+ labelCache = i1.labelBigArray();
+ return cache;
+ }
+ if (i1 == null) {
+ outdegree = i0.outdegree();
+ cache = i0.successorBigArray();
+ labelCache = i0.labelBigArray();
+ return cache;
+ }
+ // We need to perform a manual merge
+ ArcLabelledNodeIterator.LabelledArcIterator succ0 = i0.successors();
+ ArcLabelledNodeIterator.LabelledArcIterator succ1 = i1.successors();
+ long s0 = -1, s1 = -1;
+ Label l0 = null, l1 = null;
+ outdegree = 0;
+ // Note that the non-short-circuit OR (|) is necessary, so that both iterators are advanced at each test.
+ while ((s0 != -1 || (s0 = succ0.nextLong()) != -1) | (s1 != -1 || (s1 = succ1.nextLong()) != -1)) {
+ if (s0 != -1) l0 = succ0.label().copy();
+ if (s1 != -1) l1 = succ1.label().copy();
+ if (ASSERTS) assert s0 >= 0 || s1 >= 0;
+ cache = LongBigArrays.grow(cache, outdegree + 1);
+ labelCache = ObjectBigArrays.grow(labelCache, outdegree + 1);
+ if (s1 < 0 || 0 <= s0 && s0 < s1) {
+ LongBigArrays.set(cache, outdegree, s0);
+ ObjectBigArrays.set(labelCache, outdegree, l0);
+ s0 = -1;
+ } else if (s0 < 0 || 0 <= s1 && s1 < s0) {
+ LongBigArrays.set(cache, outdegree, s1);
+ ObjectBigArrays.set(labelCache, outdegree, l1);
+ s1 = -1;
+ } else {
+ if (ASSERTS) assert s0 == s1 && s0 >= 0;
+ LongBigArrays.set(cache, outdegree, s0);
+ ObjectBigArrays.set(labelCache, outdegree, labelMergeStrategy.merge(l0, l1));
+ s0 = s1 = -1;
+ }
+ outdegree++;
+ }
+ return cache;
+ }
+
+ @Override
+ public long outdegree() {
+ successorBigArray(); // So that the cache is filled up
+ return outdegree;
+ }
+
+ @Override
+ public Label[][] labelBigArray() {
+ successorBigArray(); // So that the cache is filled up
+ return labelCache;
+ }
+
+ @Override
+ public LabelledArcIterator successors() {
+ successorBigArray(); // So that the cache is filled up
+ return new LabelledArcIterator() {
+ long nextToBeReturned = 0;
+
+ @Override
+ public Label label() {
+ return ObjectBigArrays.get(labelCache, nextToBeReturned - 1);
+ }
+
+ @Override
+ public long nextLong() {
+ if (nextToBeReturned == outdegree) return -1;
+ return LongBigArrays.get(cache, nextToBeReturned++);
+ }
+
+ @Override
+ public long skip(long x) {
+ long skipped = Math.min(x, outdegree - nextToBeReturned);
+ nextToBeReturned += skipped;
+ return skipped;
+ }
+ };
+ }
+ };
+
+ }
+
+ @Override
+ public long numNodes() {
+ return numNodes;
+ }
+
+ @Override
+ public boolean randomAccess() {
+ return g0.randomAccess() && g1.randomAccess();
+ }
+
+ @Override
+ public long[][] successorBigArray(long x) {
+ if (x == cachedNode) return cache;
+ // We need to perform a manual merge
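+ // A node beyond the end of a graph has no successors there; a null iterator stands for an empty one
+ // (casting ObjectIterators.EMPTY_ITERATOR to LabelledArcIterator would throw a ClassCastException).
+ final ArcLabelledNodeIterator.LabelledArcIterator succ0 = x < n0 ? g0.successors(x) : null;
+ final ArcLabelledNodeIterator.LabelledArcIterator succ1 = x < n1 ? g1.successors(x) : null;
+ outdegree = 0; // update the field, not a local, so that outdegree(x) and successors(x) see the result
+ long s0 = -1, s1 = -1;
+ Label l0 = null, l1 = null;
+ while ((s0 != -1 || succ0 != null && (s0 = succ0.nextLong()) != -1) | (s1 != -1 || succ1 != null && (s1 = succ1.nextLong()) != -1)) {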
+ if (s0 != -1) l0 = succ0.label().copy();
+ if (s1 != -1) l1 = succ1.label().copy();
+ if (ASSERTS) assert s0 >= 0 || s1 >= 0;
+ cache = LongBigArrays.grow(cache, outdegree + 1);
+ labelCache = ObjectBigArrays.grow(labelCache, outdegree + 1);
+ if (s1 < 0 || 0 <= s0 && s0 < s1) {
+ LongBigArrays.set(cache, outdegree, s0);
+ ObjectBigArrays.set(labelCache, outdegree, l0);
+ s0 = -1;
+ } else if (s0 < 0 || 0 <= s1 && s1 < s0) {
+ LongBigArrays.set(cache, outdegree, s1);
+ ObjectBigArrays.set(labelCache, outdegree, l1);
+ s1 = -1;
+ } else {
+ if (ASSERTS) assert s0 == s1 && s0 >= 0;
+ LongBigArrays.set(cache, outdegree, s0);
+ ObjectBigArrays.set(labelCache, outdegree, labelMergeStrategy.merge(l0, l1));
+ s0 = s1 = -1;
+ }
+ outdegree++;
+ }
+ return cache;
+ }
+
+ @Override
+ public long outdegree(final long x) {
+ successorBigArray(x); // So the cache gets filled
+ return outdegree;
+ }
+
+ @Override
+ public Label[][] labelBigArray(final long x) {
+ successorBigArray(x); // So that the cache is filled up
+ return labelCache;
+ }
+
+ @Override
+ public LabelledArcIterator successors(final long x) {
+ successorBigArray(x); // So that the cache is filled up
+ return new LabelledArcIterator() {
+ long nextToBeReturned = 0;
+
+ @Override
+ public Label label() {
+ return ObjectBigArrays.get(labelCache, nextToBeReturned - 1); // label of the last successor returned
+ }
+
+ @Override
+ public long nextLong() {
+ if (nextToBeReturned == outdegree) return -1;
+ return LongBigArrays.get(cache, nextToBeReturned++);
+ }
+
+ @Override
+ public long skip(long x) {
+ long skipped = Math.min(x, outdegree - nextToBeReturned);
+ nextToBeReturned += skipped;
+ return skipped;
+ }
+ };
+ }
+
+ @Override
+ public Label prototype() {
+ return prototype;
+ }
+
+}
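The manual merge in `successorBigArray()` is a classic merge of two sorted successor lists, with the labels of shared arcs combined by the `LabelMergeStrategy`. The sketch below reproduces that logic on plain arrays, assuming strictly increasing successor lists and using strings as hypothetical labels and a `BinaryOperator` in place of the strategy; `UnionMergeSketch` is not WebGraph code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;

/** Hypothetical sketch of a labelled union merge on plain arrays; not WebGraph code. */
class UnionMergeSketch {
    /** A successor together with its label (labels are plain strings here). */
    static final class Arc {
        final long target;
        final String label;
        Arc(long target, String label) { this.target = target; this.label = label; }
        @Override public String toString() { return target + ":" + label; }
    }

    /**
     * Merges two strictly increasing successor lists of the same node. Arcs found in only one
     * list keep their label; arcs found in both get labelMerge.apply(label0, label1).
     */
    static List<Arc> union(long[] succ0, String[] label0, long[] succ1, String[] label1,
                           BinaryOperator<String> labelMerge) {
        List<Arc> result = new ArrayList<>();
        int i = 0, j = 0;
        while (i < succ0.length || j < succ1.length) {
            if (j == succ1.length || i < succ0.length && succ0[i] < succ1[j]) {
                result.add(new Arc(succ0[i], label0[i])); // only in the first graph
                i++;
            } else if (i == succ0.length || succ1[j] < succ0[i]) {
                result.add(new Arc(succ1[j], label1[j])); // only in the second graph
                j++;
            } else {
                result.add(new Arc(succ0[i], labelMerge.apply(label0[i], label1[j]))); // in both graphs
                i++;
                j++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long[] succ0 = { 1, 4, 7 };
        long[] succ1 = { 4, 9 };
        String[] label0 = { "a", "b", "c" }, label1 = { "x", "y" };
        // Concatenate labels of shared arcs; a keep-first strategy would be (l0, l1) -> l0 != null ? l0 : l1.
        System.out.println(union(succ0, label0, succ1, label1, (l0, l1) -> l0 + "+" + l1)); // [1:a, 4:b+x, 7:c, 9:y]
    }
}
```

Working on arrays with explicit indices removes the need for the `-1` sentinels and the non-short-circuit `|` that the original uses to pull lazily from two iterators.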
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/package.html b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/package.html
new file mode 100644
index 0000000..01b0daa
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/labelling/package.html
@@ -0,0 +1,49 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html>
+ <head>
+ <title>Webgraph</title>
+ </head>
+
+ <body>
+
+ <P>Main classes implementing labelling for {@linkplain it.unimi.dsi.webgraph.ImmutableGraph immutable graphs}.
+ A labelled immutable graph is a graph endowed with labels, on its arcs and/or on its nodes; currently, only
+ arc labelling is implemented (since node labelling can be easily dealt with outside of the WebGraph framework, anyway).
+
+ <p><strong>Warning</strong>: this package is experimental.
+
+ <H1>Labels</H1>
+
+ <P>A label is just an instance of a class implementing the {@link it.unimi.dsi.webgraph.labelling.Label} interface: for maximum versatility,
+ it is essentially a set of key/value pairs, where keys are strings and values can be almost anything; in the simplest cases,
+ though, a label consists of a single key/value pair (and the key is, of course, irrelevant).
+ All arcs of the same graph will have labels of the same class, and for this reason labels offer a {@link it.unimi.dsi.webgraph.labelling.Label#copy()}
+ method that allows the prototype design pattern to be used.
+
+ <P>The only requirement for the serialisation of labels is that every label can be written as a self-delimiting bit sequence (via the
+ {@link it.unimi.dsi.webgraph.labelling.Label#toBitStream(it.unimi.dsi.io.OutputBitStream,int)} method); essentially,
+ there are two kinds of labels: fixed-width labels (which always write themselves using the same, fixed number of bits) and
+ variable-width labels; you can tell whether a label has fixed width by calling {@link it.unimi.dsi.webgraph.labelling.Label#fixedWidth()}
+ (this method will return -1 if the width is variable).
+
+ <P>As an example, single-attribute integer label classes are implemented, {@linkplain it.unimi.dsi.webgraph.labelling.FixedWidthIntLabel one using fixed width}
+ and {@linkplain it.unimi.dsi.webgraph.labelling.GammaCodedIntLabel another using &gamma;-coding}.
+
+ <H2>Labelled graphs</H2>
+
+ <P>An {@linkplain it.unimi.dsi.webgraph.labelling.ArcLabelledImmutableGraph arc-labelled immutable graph} is an
+ {@linkplain it.unimi.dsi.webgraph.ImmutableGraph immutable graph} with labels on its arcs; it overrides the methods of immutable graphs
+ covariantly so that, for example, when one iterates on the successors of a node using {@linkplain it.unimi.dsi.webgraph.labelling.ArcLabelledImmutableGraph#successors(int)},
+ the returned object is not a simple {@link it.unimi.dsi.fastutil.ints.IntIterator} (iterating over the nodes that are successors of the given node), but rather
+ a {@link it.unimi.dsi.webgraph.labelling.ArcLabelledNodeIterator.LabelledArcIterator} (which returns a node/label pair at each step).
+
+ <P>Even though different implementations of {@linkplain it.unimi.dsi.webgraph.labelling.ArcLabelledImmutableGraph arc-labelled immutable graphs}
+ may exist, we provide one ({@link it.unimi.dsi.webgraph.labelling.BitStreamArcLabelledImmutableGraph}) that assumes that an immutable graph has been
+ provided and that labels have been written onto a label file in the same order as the arcs of the immutable graph would be returned by the
+ {@link it.unimi.dsi.webgraph.labelling.ArcLabelledImmutableGraph#nodeIterator()} method. An additional offset file must be provided that
+ allows one to know the offset (in bits) within the label file where the labels of the arcs going out of a given node start.
+ These data are generated using the <code>store()</code> methods whose implementation is suggested in
+ the class documentation of {@linkplain it.unimi.dsi.webgraph.labelling.ArcLabelledImmutableGraph}.
+
+ </body>
+</html>
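The package documentation above describes a label file written in arc order plus an offset file giving, for each node, the bit position where the labels of its outgoing arcs start. A minimal sketch of that bookkeeping for γ-coded integer labels follows; it assumes that a natural number x is written as the Elias γ code of x + 1 (2⌊log₂(x + 1)⌋ + 1 bits), and it stores absolute offsets, which may differ from how WebGraph actually encodes its offset file. `LabelOffsetsSketch` and its methods are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: bit offsets of each node's labels in a label file written in arc order. */
class LabelOffsetsSketch {
    /** Length in bits of the Elias gamma code of x + 1, used here for a natural number x (assumption). */
    static int gammaLength(long x) {
        if (x < 0) throw new IllegalArgumentException("Negative value: " + x);
        // floor(log2(x + 1)) is the index of the most significant bit of x + 1
        int msb = 63 - Long.numberOfLeadingZeros(x + 1);
        return 2 * msb + 1;
    }

    /**
     * labels[v] holds the label values on the arcs leaving node v, in successor order.
     * Returns numNodes + 1 offsets: offsets[v] is the bit where the labels of v start, and
     * offsets[numNodes] is the total length of the label file.
     */
    static long[] offsets(long[][] labels) {
        long[] offsets = new long[labels.length + 1];
        for (int v = 0; v < labels.length; v++) {
            long bits = 0;
            for (long value : labels[v]) bits += gammaLength(value);
            offsets[v + 1] = offsets[v] + bits;
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Node 0 has two arcs labelled 0 and 5, node 1 has none, node 2 has one arc labelled 2.
        long[][] labels = { { 0, 5 }, {}, { 2 } };
        System.out.println(Arrays.toString(offsets(labels))); // [0, 6, 6, 9]
    }
}
```

A node without outgoing arcs (node 1 here) simply repeats the previous offset, so the labels of node v always occupy bits offsets[v] to offsets[v + 1] - 1.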
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/package.html b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/package.html
new file mode 100644
index 0000000..ff5e29c
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/package.html
@@ -0,0 +1,13 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html>
+ <head>
+ <title>Webgraph</title>
+ </head>
+
+ <body>
+
+ <P>Main classes implementing the WebGraph algorithms.
+
+
+ </body>
+</html>
diff --git a/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/test/SpeedTest.java b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/test/SpeedTest.java
new file mode 100644
index 0000000..cfdea53
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/it/unimi/dsi/big/webgraph/test/SpeedTest.java
@@ -0,0 +1,146 @@
+package it.unimi.dsi.big.webgraph.test;
+
+/*
+ * Copyright (C) 2003-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import it.unimi.dsi.Util;
+import it.unimi.dsi.big.webgraph.ImmutableGraph;
+import it.unimi.dsi.big.webgraph.ImmutableGraph.LoadMethod;
+import it.unimi.dsi.big.webgraph.LazyLongIterator;
+import it.unimi.dsi.big.webgraph.NodeIterator;
+import it.unimi.dsi.lang.ObjectParser;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.util.XorShift1024StarRandom;
+import it.unimi.dsi.webgraph.GraphClassParser;
+
+import java.io.IOException;
+import java.lang.reflect.InvocationTargetException;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+
+public class SpeedTest {
+ private final static int WARMUP = 3;
+ private final static int REPEAT = 10;
+ private SpeedTest() {}
+
+ @SuppressWarnings("boxing")
+ static public void main(String arg[]) throws IllegalArgumentException, SecurityException, JSAPException, IOException, IllegalAccessException, InvocationTargetException, NoSuchMethodException, ClassNotFoundException, InstantiationException {
+ final SimpleJSAP jsap = new SimpleJSAP(SpeedTest.class.getName(), "Tests the access speed of an ImmutableGraph. By default, the graph is enumerated sequentially, but you can specify a number of nodes to be accessed randomly.\n\nThis class executes " + WARMUP + " warmup iterations, and then averages the timings of the following " + REPEAT + " iterations.",
+ new Parameter[] {
+ new FlaggedOption("graphClass", GraphClassParser.getParser(), JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'g', "graphClass", "Forces a Java class for the source graph."),
+ new Switch("spec", 's', "spec", "The basename is a specification of the form <ImmutableGraphImplementation>(arg,arg,...)."),
+ new FlaggedOption("seed", JSAP.LONG_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'S', "seed", "A seed for the pseudorandom number generator."),
+ new FlaggedOption("random", JSAP.LONGSIZE_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'r', "random", "Perform a random-access test on this number of nodes instead of enumerating sequentially the whole graph."),
+ new Switch("first", 'f', "first", "Just enumerate the first successor of each tested node."),
+ new UnflaggedOption("basename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the graph."),
+ }
+ );
+
+ final JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final boolean random = jsapResult.userSpecified("random");
+ final boolean spec = jsapResult.getBoolean("spec");
+ final boolean first = jsapResult.userSpecified("first");
+ final Class<?> graphClass = jsapResult.getClass("graphClass");
+ final String basename = jsapResult.getString("basename");
+ if (graphClass != null && spec) throw new IllegalArgumentException("Options --graphClass and --spec are incompatible.");
+
+ final ProgressLogger pl = new ProgressLogger();
+ final long seed = jsapResult.userSpecified("seed") ? jsapResult.getLong("seed") : Util.randomSeed();
+ final XorShift1024StarRandom r = new XorShift1024StarRandom();
+
+ System.err.println("Seed: " + seed);
+
+ // The number of overall links, unless first is true, in which case the number of tested nodes.
+ long totLinks = 0;
+ long cumulativeTime = 0;
+
+ final long samples;
+ final ImmutableGraph graph;
+
+ if (random) {
+ if (jsapResult.userSpecified("graphClass")) graph = (ImmutableGraph)graphClass.getMethod(LoadMethod.STANDARD.toMethod(), CharSequence.class, ProgressLogger.class).invoke(null, basename, pl);
+ else if (spec) graph = ObjectParser.fromSpec(basename, ImmutableGraph.class, GraphClassParser.PACKAGE);
+ else graph = ImmutableGraph.load(basename, pl);
+
+ final long n = graph.numNodes();
+ samples = jsapResult.getLong("random");
+
+ r.setSeed(seed);
+ if (first) totLinks = samples;
+ else for(long i = samples; i-- != 0;) totLinks += graph.outdegree(r.nextLong(n));
+
+ System.err.println(first ? "Accessing the first link on " + samples + " random nodes using ImmutableGraph.successors()..." : "Accessing links on " + samples + " random nodes using ImmutableGraph.successors()...");
+
+ for(int k = WARMUP + REPEAT; k-- != 0;) {
+ r.setSeed(seed);
+ long time = -System.nanoTime();
+ if (first)
+ for(long i = samples; i-- != 0;) graph.successors(r.nextLong(n)).nextLong();
+ else
+ for(long i = samples; i-- != 0;)
+ for(LazyLongIterator links = graph.successors(r.nextLong(n)); links.nextLong() != -1;);
+
+ time += System.nanoTime();
+
+ if (k < REPEAT) cumulativeTime += time;
+ System.err.printf("Intermediate time: %.3fs nodes: %d; arcs %d; nodes/s: %.3f arcs/s: %.3f ns/node: %.3f, ns/link: %.3f\n",
+ time / 1E9, samples, totLinks, (samples * 1E9) / time, (totLinks * 1E9) / time, time / (double)samples, time / (double)totLinks);
+ }
+ }
+ else {
+ if (first) throw new IllegalArgumentException("Option --first requires --random.");
+ if (jsapResult.userSpecified("graphClass")) graph = (ImmutableGraph)graphClass.getMethod(LoadMethod.STANDARD.toMethod(), CharSequence.class, ProgressLogger.class).invoke(null, basename, pl);
+ else if (spec) graph = ObjectParser.fromSpec(basename, ImmutableGraph.class, GraphClassParser.PACKAGE);
+ else graph = ImmutableGraph.load(basename, pl);
+
+ samples = graph.numNodes();
+
+ System.err.println("Accessing links sequentially using NodeIterator.successorBigArray()...");
+
+ for(int k = WARMUP + REPEAT; k-- != 0;) {
+ long time = -System.nanoTime();
+ final NodeIterator nodeIterator = graph.nodeIterator();
+ totLinks = 0;
+ for(long i = samples; i-- != 0;) {
+ nodeIterator.nextLong();
+ totLinks += nodeIterator.outdegree();
+ nodeIterator.successorBigArray();
+ }
+ time += System.nanoTime();
+
+ if (k < REPEAT) cumulativeTime += time;
+ System.err.printf("Intermediate time: %.3fs nodes: %d; arcs %d; nodes/s: %.3f arcs/s: %.3f ns/node: %.3f, ns/link: %.3f\n",
+ time / 1E9, samples, totLinks, (samples * 1E9) / time, (totLinks * 1E9) / time, time / (double)samples, time / (double)totLinks);
+ }
+ }
+
+ final double averageTime = cumulativeTime / (double)REPEAT;
+ System.out.printf("Time: %.3fs nodes: %d; arcs %d; nodes/s: %.3f arcs/s: %.3f ns/node: %.3f, ns/link: %.3f\n",
+ averageTime / 1E9, samples, totLinks, (samples * 1E9) / averageTime, (totLinks * 1E9) / averageTime, averageTime / samples, averageTime / totLinks);
+ }
+}
diff --git a/third_party/webgraph-big-3.5.0/src/overview.html b/third_party/webgraph-big-3.5.0/src/overview.html
new file mode 100644
index 0000000..af1c003
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/src/overview.html
@@ -0,0 +1,56 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+<html>
+ <head>
+ <title>WebGraph (big)</title>
+ </head>
+
+ <body>
+
+ <P>WebGraph is a framework to study the web graph. It provides simple ways to manage
+ very large graphs, exploiting modern compression techniques.
+ The big version is a fork of the original WebGraph that can handle more than 2<sup>31</sup>
+ nodes. For details that are common to both the standard and the big
+ version, please see <a href="http://webgraph.dsi.unimi.it/">WebGraph</a>.
+
+ <h2>Main differences</h2>
+
+ <p>If you are used to WebGraph, the main difference is that, of course, nodes are
+ indexed by long integers. Correspondingly, iterators on nodes are
+ {@link it.unimi.dsi.big.webgraph.LazyLongIterator}s, and all array-based methods
+ (such as {@link it.unimi.dsi.big.webgraph.ImmutableGraph#successorBigArray(long)} or
+ {@link it.unimi.dsi.big.webgraph.labelling.ArcLabelledImmutableGraph#labelBigArray(long)})
+ return {@linkplain it.unimi.dsi.fastutil.BigArrays big arrays}.
+
+ <p>Some classes have not been ported, and will be ported on an &ldquo;as-needed&rdquo; basis.
+
+ <h2>Porting code</h2>
+
+ <p>If you want to port code written for WebGraph to the big version, the main
+ nuisance is the fact that {@link it.unimi.dsi.big.webgraph.ImmutableGraph#successorBigArray(long)}
+ returns, as the name says, a {@linkplain it.unimi.dsi.fastutil.BigArrays big array}, which
+ cannot be accessed like a standard Java array. Watch out in particular for accesses to the
+ <code>length</code> field, which will be syntactically correct even on a big array, but
+ <strong>must</strong> be replaced by calls to a suitable method (e.g.,
+ {@link it.unimi.dsi.fastutil.longs.LongBigArrays#length(long[][])}). In general, you
+ should become familiar with big-array methods before porting code.
+
+ <p>To simplify many mundane matters, such as unit tests, {@link it.unimi.dsi.big.webgraph.ImmutableGraph}
+ provides two static wrapping methods ({@link it.unimi.dsi.big.webgraph.ImmutableGraph#wrap(it.unimi.dsi.webgraph.ImmutableGraph)}
+ and {@link it.unimi.dsi.big.webgraph.ImmutableGraph#wrap(ImmutableGraph)}) that
+ turn a standard {@link it.unimi.dsi.webgraph.ImmutableGraph} into a big {@link it.unimi.dsi.big.webgraph.ImmutableGraph}
+ and vice versa. Thus, for instance, there is no big version of
+ {@link it.unimi.dsi.webgraph.ArrayListMutableGraph}: instances are expected to be simply
+ wrapped whenever you need to use them in the big framework.
+
+ <h2>Compatibility</h2>
+
+ <p>The serialisation formats of the standard and big versions of {@link it.unimi.dsi.webgraph.BVGraph} are compatible (of
+ course, you cannot load a graph with more than 2<sup>31</sup> nodes using the standard version). The <em>same</em>
+ graph loaded with instances of the two classes, however, will not be {@linkplain java.lang.Object#equals(Object) equal}.
+ You must wrap one or the other (see above) to check for equality.
+
+ <p>Note also that satellite data generated by various utilities (e.g., {@link it.unimi.dsi.big.webgraph.algo.StronglyConnectedComponents})
+ are usually written using formats that are <strong>not</strong> compatible.
+
+ </body>
+</html>
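The porting notes in overview.html warn that reading `length` on a big array compiles but does not give the number of elements. Below is a minimal, self-contained sketch of that pitfall. It assumes, for illustration only, that a big array is a `long[][]` made of fixed-size segments, all full except possibly the last. The segment size of 4 and the helpers `newBigArray`, `bigLength`, `get` and `set` are hypothetical stand-ins chosen for this demo; they are not the fastutil API.

```java
// Hypothetical sketch: emulates a segmented "big array" to show why
// a.length counts segments rather than elements.
public class BigArrayLengthDemo {
    // Hypothetical segment size, kept tiny so the effect is visible.
    static final int SEGMENT_SIZE = 4;

    // Allocates a segmented array holding `length` longs; every segment
    // is full except possibly the last one.
    static long[][] newBigArray(long length) {
        int segments = (int)((length + SEGMENT_SIZE - 1) / SEGMENT_SIZE);
        long[][] a = new long[segments][];
        for (int s = 0; s < segments; s++) {
            long remaining = length - (long)s * SEGMENT_SIZE;
            a[s] = new long[(int)Math.min(SEGMENT_SIZE, remaining)];
        }
        return a;
    }

    // Number of elements: the sum of the segment lengths, NOT a.length.
    static long bigLength(long[][] a) {
        long n = 0;
        for (long[] segment : a) n += segment.length;
        return n;
    }

    // Reads the element at a long index by locating its segment and offset.
    static long get(long[][] a, long index) {
        return a[(int)(index / SEGMENT_SIZE)][(int)(index % SEGMENT_SIZE)];
    }

    // Writes the element at a long index.
    static void set(long[][] a, long index, long value) {
        a[(int)(index / SEGMENT_SIZE)][(int)(index % SEGMENT_SIZE)] = value;
    }

    public static void main(String[] args) {
        long[][] successors = newBigArray(10);
        // Loop bound uses bigLength(), the big-array equivalent of .length.
        for (long i = 0; i < bigLength(successors); i++) set(successors, i, 2 * i);
        System.out.println(successors.length);     // 3 (segments, not elements)
        System.out.println(bigLength(successors)); // 10
        System.out.println(get(successors, 9));    // 18
    }
}
```

In code ported to the big version, the role of `bigLength` is played by the big-array methods the overview mentions, such as `LongBigArrays.length(long[][])`.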
diff --git a/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/ArcListASCIIGraphTest.java b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/ArcListASCIIGraphTest.java
new file mode 100644
index 0000000..3a37c32
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/ArcListASCIIGraphTest.java
@@ -0,0 +1,56 @@
+package it.unimi.dsi.big.webgraph;
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.fastutil.io.FastByteArrayInputStream;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+
+import java.io.IOException;
+import java.io.UnsupportedEncodingException;
+
+import org.junit.Test;
+
+public class ArcListASCIIGraphTest extends WebGraphTestCase {
+
+ @Test
+ public void testLoadOnce() throws UnsupportedEncodingException, IOException {
+
+ ArcListASCIIGraph g = ArcListASCIIGraph.loadOnce(new FastByteArrayInputStream("0 2\n0 1\n1 0\n1 2\n2 0\n2 1".getBytes("ASCII")));
+ assertEquals(ImmutableGraph.wrap(ArrayListMutableGraph.newCompleteGraph(3, false).immutableView()), ImmutableGraph.wrap(new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView()));
+
+ g = ArcListASCIIGraph.loadOnce(new FastByteArrayInputStream("0 1\n0 2\n1 0\n1 \t 2\n2 0\n2 1".getBytes("ASCII")));
+ assertEquals(ImmutableGraph.wrap(ArrayListMutableGraph.newCompleteGraph(3, false).immutableView()), ImmutableGraph.wrap(new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView()));
+
+ g = ArcListASCIIGraph.loadOnce(new FastByteArrayInputStream("2 0\n2 1".getBytes("ASCII")));
+ assertEquals(ImmutableGraph.wrap(new ArrayListMutableGraph(3, new int[][] {{2,0},{2,1}}).immutableView()), ImmutableGraph.wrap(new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView()));
+
+ g = ArcListASCIIGraph.loadOnce(new FastByteArrayInputStream("1 2".getBytes("ASCII")));
+ assertEquals(ImmutableGraph.wrap(new ArrayListMutableGraph(3, new int[][] {{1,2}}).immutableView()), ImmutableGraph.wrap(new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView()));
+
+ g = ArcListASCIIGraph.loadOnce(new FastByteArrayInputStream("2 1".getBytes("ASCII")));
+ assertEquals(ImmutableGraph.wrap(new ArrayListMutableGraph(3, new int[][] {{2,1}}).immutableView()), ImmutableGraph.wrap(new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView()));
+
+ g = ArcListASCIIGraph.loadOnce(new FastByteArrayInputStream("0 1\n2 1".getBytes("ASCII")));
+ assertEquals(ImmutableGraph.wrap(new ArrayListMutableGraph(3, new int[][] {{0,1},{2,1}}).immutableView()), ImmutableGraph.wrap(new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView()));
+
+ g = ArcListASCIIGraph.loadOnce(new FastByteArrayInputStream("\n\n0 1\n2 1\n\n".getBytes("ASCII")));
+ assertEquals(ImmutableGraph.wrap(new ArrayListMutableGraph(3, new int[][] {{0,1},{2,1}}).immutableView()), ImmutableGraph.wrap(new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView()));
+ }
+}
diff --git a/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/BVGraphTest.java b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/BVGraphTest.java
new file mode 100644
index 0000000..0e1019b
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/BVGraphTest.java
@@ -0,0 +1,214 @@
+package it.unimi.dsi.big.webgraph;
+
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.bits.Fast;
+import it.unimi.dsi.fastutil.longs.LongBigArrays;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Properties;
+import java.util.zip.GZIPInputStream;
+
+import org.junit.Test;
+
+public class BVGraphTest extends WebGraphTestCase {
+
+ public static File storeTempGraph(final ImmutableGraph g) throws IOException, IllegalArgumentException, SecurityException {
+ File basename = File.createTempFile(BVGraphTest.class.getSimpleName(), "test");
+ BVGraph.store(g, basename.toString());
+ return basename;
+ }
+
+ public static File storeTempGraph(final ImmutableGraph g, int windowSize, int maxRefCount, int minIntervalLength, int flags) throws IOException, IllegalArgumentException, SecurityException {
+ File basename = File.createTempFile(BVGraphTest.class.getSimpleName(), "test");
+ BVGraph.store(g, basename.toString(), windowSize, maxRefCount, minIntervalLength, 3, flags);
+ return basename;
+ }
+
+ @Test
+ public void testCompression() throws IOException, IllegalArgumentException, SecurityException {
+ for(int n = 1; n < 8; n++) { // Graph construction parameter
+ for(int type = 1; type < 3; type++) {
+ final ImmutableGraph g = ImmutableGraph.wrap(type == 0 ? ArrayListMutableGraph.newCompleteGraph(n, false).immutableView() :
+ type == 1 ? ArrayListMutableGraph.newCompleteBinaryIntree(n).immutableView() :
+ ArrayListMutableGraph.newCompleteBinaryOuttree(n).immutableView());
+ for(int w = 0; w < 3; w++) { // Window size
+ for(int r = 0; r < (w == 0 ? 1 : 3); r++) { // Max backward references
+ for(int i = 0; i < 4; i++) { // Minimum interval length; 0 is NO_INTERVALS
+ System.err.println("Testing type " + type + ", n=" + n + ", w=" + w + ", r=" + r + ", i=" + i + "...");
+ final File basename = BVGraphTest.storeTempGraph(g, w, r, i, 0);
+ final Properties properties = new Properties();
+ final FileInputStream propertyFile = new FileInputStream(basename + BVGraph.PROPERTIES_EXTENSION);
+ properties.load(propertyFile);
+ propertyFile.close();
+ assertEquals(new File(basename + BVGraph.GRAPH_EXTENSION).length(),
+ (Long.parseLong(properties.getProperty("bitsforoutdegrees"))+
+ Long.parseLong(properties.getProperty("bitsforreferences"))+
+ Long.parseLong(properties.getProperty("bitsforblocks"))+
+ Long.parseLong(properties.getProperty("bitsforintervals"))+
+ Long.parseLong(properties.getProperty("bitsforresiduals")) + 7) / 8
+ );
+
+ assertEquals(g.numArcs(), Long.parseLong(properties.getProperty("copiedarcs")) + Long.parseLong(properties.getProperty("intervalisedarcs")) + Long.parseLong(properties.getProperty("residualarcs")));
+ ImmutableGraph h;
+
+ System.err.println("Testing offline...");
+ h = BVGraph.loadOffline(basename.toString());
+ WebGraphTestCase.assertGraph(h);
+ assertEquals(g, h);
+
+ // We try to force deallocation of memory-mapped graphs
+ System.gc();
+
+ System.err.println("Testing mapped...");
+ h = BVGraph.loadMapped(basename.toString());
+ WebGraphTestCase.assertGraph(h);
+ assertEquals(g, h);
+
+ System.err.println("Testing standard...");
+ h = BVGraph.load(basename.toString());
+ WebGraphTestCase.assertGraph(h);
+ assertEquals(g, h);
+
+ deleteGraph(basename);
+ }
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testLarge() throws IOException {
+ ASCIIGraph asciiGraph = ASCIIGraph.loadOnce(new GZIPInputStream(getClass().getResourceAsStream("cnr-2000.graph-txt.gz")));
+ String path = getGraphPath("cnr-2000");
+ ImmutableGraph g = ImmutableGraph.load(path);
+ assertEquals(asciiGraph, g);
+
+ asciiGraph = ASCIIGraph.loadOnce(new GZIPInputStream(getClass().getResourceAsStream("cnr-2000.graph-txt.gz")));
+ NodeIterator nodeIterator = asciiGraph.nodeIterator();
+ for(int i = 0; i < g.numNodes(); i++) {
+ nodeIterator.nextLong();
+ int d = (int)nodeIterator.outdegree();
+ assertEquals(d, g.outdegree(i));
+ LazyLongIterator asciiSuccessors = nodeIterator.successors(), successors = g.successors(i);
+ for(int j = 0; j <= d; j++) assertEquals(asciiSuccessors.nextLong(), successors.nextLong());
+ }
+
+ deleteGraph(path);
+ }
+
+ @Test
+ public void testStats() throws IOException {
+ String path = getGraphPath("cnr-2000");
+ ImmutableGraph g = ImmutableGraph.load(path);
+ // We overwrite the previously created temporary graph
+ BVGraph.store(g, path + "2");
+
+ // Test statistics
+ final int[] bin = new int[32];
+ NodeIterator nodeIterator = g.nodeIterator();
+ for(int i = 0; i < g.numNodes(); i++) {
+ nodeIterator.nextLong();
+ int d = (int)nodeIterator.outdegree();
+ long[][] a = nodeIterator.successorBigArray();
+ if (d > 0) {
+ for(int j = d - 1; j-- != 0;) bin[Fast.mostSignificantBit(LongBigArrays.get(a, j + 1) - LongBigArrays.get(a, j))]++;
+ final int msb = Fast.mostSignificantBit(Fast.int2nat(LongBigArrays.get(a, 0) - i));
+ if (msb >= 0) bin[msb]++;
+ }
+ }
+
+ Properties properties = new Properties();
+ final FileInputStream inStream = new FileInputStream(path + "2" + BVGraph.PROPERTIES_EXTENSION);
+ properties.load(inStream);
+ inStream.close();
+ String stats = properties.getProperty("successorexpstats");
+ String[] s = stats.split(",");
+ for(int i = s.length; i-- != 0;) assertEquals(bin[i], Integer.parseInt(s[i]));
+
+ long gap = 1, totGap = 0, tot = 0;
+ double totLogGap = 0;
+ for(int i = 0; i < s.length; i++) {
+ totGap += (gap * 2 + gap - 1) * Integer.parseInt(s[i]);
+ totLogGap += (Fast.log2(gap * 2 + gap + 1) - 1) * Integer.parseInt(s[i]);
+ tot += Integer.parseInt(s[i]);
+ gap *= 2;
+ }
+
+ assertEquals((double)totGap / (tot * 2), Double.parseDouble(properties.getProperty("successoravggap")), 1E-3);
+ assertEquals(totLogGap / tot, Double.parseDouble(properties.getProperty("successoravgloggap")), 1E-3);
+
+ assertEquals(new File(path + "2" + BVGraph.GRAPH_EXTENSION).length(),
+ (Long.parseLong(properties.getProperty("bitsforoutdegrees"))+
+ Long.parseLong(properties.getProperty("bitsforreferences"))+
+ Long.parseLong(properties.getProperty("bitsforblocks"))+
+ Long.parseLong(properties.getProperty("bitsforintervals"))+
+ Long.parseLong(properties.getProperty("bitsforresiduals")) + 7) / 8
+ );
+
+ assertEquals(g.numArcs(), Long.parseLong(properties.getProperty("copiedarcs")) + Long.parseLong(properties.getProperty("intervalisedarcs")) + Long.parseLong(properties.getProperty("residualarcs")));
+
+ // To test residual stats, we compress with no intervalisation etc.
+ BVGraph.store(g, path + "2", 0, 0, 0, 3, 0);
+
+ // Test statistics
+ Arrays.fill(bin, 0);
+ nodeIterator = g.nodeIterator();
+ for(int i = 0; i < g.numNodes(); i++) {
+ nodeIterator.nextLong();
+ int d = (int)nodeIterator.outdegree();
+ long[][] a = nodeIterator.successorBigArray();
+ if (d > 0) {
+ for(int j = d - 1; j-- != 0;) bin[Fast.mostSignificantBit(LongBigArrays.get(a, j + 1) - LongBigArrays.get(a, j))]++;
+ final int msb = Fast.mostSignificantBit(Fast.int2nat(LongBigArrays.get(a, 0) - i));
+ if (msb >= 0) bin[msb]++;
+ }
+ }
+
+ /* TODO: write test for residuals
+ stats = properties.getProperty("residualexpstats");
+ s = stats.split(",");
+ for(int i = s.length; i-- != 0;) assertEquals(bin[i], Integer.parseInt(s[i]));
+
+
+ gap = 1;
+ totGap = 0;
+ tot = 0;
+ totLogGap = 0;
+ for(int i = 0; i < s.length; i++) {
+ totGap += (gap * 2 + gap - 1) * Integer.parseInt(s[i]);
+ totLogGap += (Fast.log2(gap * 2 + gap + 1) - 1) * Integer.parseInt(s[i]);
+ tot += Integer.parseInt(s[i]);
+ gap *= 2;
+ }
+ assertEquals((double)totGap / (tot * 2), Double.parseDouble(properties.getProperty("residualavggap")), 1E-3);
+ assertEquals(totLogGap / tot, Double.parseDouble(properties.getProperty("residualavgloggap")), 1E-3);
+ */
+
+ deleteGraph(path);
+ deleteGraph(path + "2");
+ }
+}
diff --git a/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/BuildHostMapTest.java b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/BuildHostMapTest.java
new file mode 100644
index 0000000..5f4f991
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/BuildHostMapTest.java
@@ -0,0 +1,143 @@
+package it.unimi.dsi.big.webgraph;
+
+/*
+ * Copyright (C) 2010-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertArrayEquals;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.io.FastByteArrayInputStream;
+import it.unimi.dsi.fastutil.io.FastByteArrayOutputStream;
+import it.unimi.dsi.fastutil.longs.LongIterators;
+import it.unimi.dsi.logging.ProgressLogger;
+
+import java.io.BufferedReader;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.io.InputStreamReader;
+import java.io.PrintStream;
+import java.io.StringReader;
+import java.net.URISyntaxException;
+
+import org.junit.Test;
+
+public class BuildHostMapTest extends WebGraphTestCase {
+
+ @Test
+ public void testSimpleNoLogger() throws IOException, URISyntaxException {
+ BufferedReader br = new BufferedReader(new StringReader("http://a/b\nhttp://c\nhttp://a.b:81/\nhttp://c/c\nhttp://a:80/\nhttps://a/\nhttps://a.b"));
+ FastByteArrayOutputStream mapFbaos = new FastByteArrayOutputStream();
+ FastByteArrayOutputStream countFbaos = new FastByteArrayOutputStream();
+ FastByteArrayOutputStream hostsStream = new FastByteArrayOutputStream();
+ PrintStream hosts = new PrintStream(hostsStream);
+ DataOutputStream mapDos = new DataOutputStream(mapFbaos);
+ DataOutputStream countDos = new DataOutputStream(countFbaos);
+ BuildHostMap.run(br, hosts, mapDos, countDos, false, null);
+ mapDos.close();
+ hosts.close();
+ DataInputStream dis = new DataInputStream(new FastByteArrayInputStream(mapFbaos.array, 0, mapFbaos.length));
+ assertEquals(0, dis.readLong());
+ assertEquals(1, dis.readLong());
+ assertEquals(2, dis.readLong());
+ assertEquals(1, dis.readLong());
+ assertEquals(0, dis.readLong());
+ assertEquals(0, dis.readLong());
+ assertEquals(2, dis.readLong());
+ assertEquals(0, dis.available());
+ dis.close();
+ BufferedReader hostsIn = new BufferedReader(new InputStreamReader(new FastByteArrayInputStream(hostsStream.array, 0, hostsStream.length)));
+ assertEquals("a", hostsIn.readLine());
+ assertEquals("c", hostsIn.readLine());
+ assertEquals("a.b", hostsIn.readLine());
+ assertEquals(null, hostsIn.readLine());
+ hostsIn.close();
+ assertArrayEquals(new long[] { 3, 2, 2 }, LongIterators.unwrap(BinIO.asLongIterator(new DataInputStream(new FastByteArrayInputStream(countFbaos.array, 0, countFbaos.length)))));
+ }
+
+ @Test
+ public void testSimpleLogger() throws IOException, URISyntaxException {
+ BufferedReader br = new BufferedReader(new StringReader("http://a/b\nhttp://c\nhttp://a.b/\nhttp://c/c\nhttp://a/\nhttps://a/\nhttps://a.b"));
+ FastByteArrayOutputStream mapFbaos = new FastByteArrayOutputStream();
+ FastByteArrayOutputStream countFbaos = new FastByteArrayOutputStream();
+ FastByteArrayOutputStream hostsStream = new FastByteArrayOutputStream();
+ PrintStream hosts = new PrintStream(hostsStream);
+ DataOutputStream mapDos = new DataOutputStream(mapFbaos);
+ DataOutputStream countDos = new DataOutputStream(countFbaos);
+ BuildHostMap.run(br, hosts, mapDos, countDos, false, new ProgressLogger());
+ mapDos.close();
+ hosts.close();
+ DataInputStream dis = new DataInputStream(new FastByteArrayInputStream(mapFbaos.array, 0, mapFbaos.length));
+ assertEquals(0, dis.readLong());
+ assertEquals(1, dis.readLong());
+ assertEquals(2, dis.readLong());
+ assertEquals(1, dis.readLong());
+ assertEquals(0, dis.readLong());
+ assertEquals(0, dis.readLong());
+ assertEquals(2, dis.readLong());
+ assertEquals(0, dis.available());
+ dis.close();
+ BufferedReader hostsIn = new BufferedReader(new InputStreamReader(new FastByteArrayInputStream(hostsStream.array, 0, hostsStream.length)));
+ assertEquals("a", hostsIn.readLine());
+ assertEquals("c", hostsIn.readLine());
+ assertEquals("a.b", hostsIn.readLine());
+ assertEquals(null, hostsIn.readLine());
+ hostsIn.close();
+ assertArrayEquals(new long[] { 3, 2, 2 }, LongIterators.unwrap(BinIO.asLongIterator(new DataInputStream(new FastByteArrayInputStream(countFbaos.array, 0, countFbaos.length)))));
+ }
+
+ @Test
+ public void testTopPrivateDomainNoLogger() throws IOException, URISyntaxException {
+ BufferedReader br = new BufferedReader(new StringReader("http://b.a.co.uk/b\nhttp://c.a.co.uk\nhttp://a.b.co.uk\nhttp://159.149.130.49/"));
+ FastByteArrayOutputStream mapFbaos = new FastByteArrayOutputStream();
+ FastByteArrayOutputStream countFbaos = new FastByteArrayOutputStream();
+ FastByteArrayOutputStream hostsStream = new FastByteArrayOutputStream();
+ PrintStream hosts = new PrintStream(hostsStream);
+ DataOutputStream mapDos = new DataOutputStream(mapFbaos);
+ DataOutputStream countDos = new DataOutputStream(countFbaos);
+ BuildHostMap.run(br, hosts, mapDos, countDos, true, null);
+ mapDos.close();
+ hosts.close();
+ DataInputStream dis = new DataInputStream(new FastByteArrayInputStream(mapFbaos.array, 0, mapFbaos.length));
+ assertEquals(0, dis.readLong());
+ assertEquals(0, dis.readLong());
+ assertEquals(1, dis.readLong());
+ assertEquals(2, dis.readLong());
+ assertEquals(0, dis.available());
+ dis.close();
+ BufferedReader hostsIn = new BufferedReader(new InputStreamReader(new FastByteArrayInputStream(hostsStream.array, 0, hostsStream.length)));
+ assertEquals("a.co.uk", hostsIn.readLine());
+ assertEquals("b.co.uk", hostsIn.readLine());
+ assertEquals("159.149.130.49", hostsIn.readLine());
+ assertEquals(null, hostsIn.readLine());
+ hostsIn.close();
+ assertArrayEquals(new long[] { 2, 1, 1 }, LongIterators.unwrap(BinIO.asLongIterator(new DataInputStream(new FastByteArrayInputStream(countFbaos.array, 0, countFbaos.length)))));
+ }
+
+ @Test(expected=IllegalArgumentException.class)
+ public void testMalformed() throws IOException, URISyntaxException {
+ BufferedReader br = new BufferedReader(new StringReader("http://a/b\nhttp://c\nhttp//a.b/\nhttp://c/c\nhttp://a/\nhttps://a/\nhttps://a.b"));
+ FastByteArrayOutputStream mapFbaos = new FastByteArrayOutputStream();
+ FastByteArrayOutputStream countFbaos = new FastByteArrayOutputStream();
+ FastByteArrayOutputStream hostsStream = new FastByteArrayOutputStream();
+ PrintStream hosts = new PrintStream(hostsStream);
+ DataOutputStream mapDos = new DataOutputStream(mapFbaos);
+ DataOutputStream countDos = new DataOutputStream(countFbaos);
+ BuildHostMap.run(br, hosts, mapDos, countDos, false, null);
+ }
+}
diff --git a/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/EFGraphTest.java b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/EFGraphTest.java
new file mode 100644
index 0000000..3e31257
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/EFGraphTest.java
@@ -0,0 +1,173 @@
+package it.unimi.dsi.big.webgraph;
+
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.fail;
+import it.unimi.dsi.fastutil.longs.LongOpenHashSet;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.examples.ErdosRenyiGraph;
+
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.nio.ByteOrder;
+import java.util.Properties;
+
+import org.junit.Test;
+
+public class EFGraphTest extends WebGraphTestCase {
+
+ public static File storeTempGraph(final ImmutableGraph g) throws IOException, IllegalArgumentException, SecurityException {
+ File basename = File.createTempFile(EFGraphTest.class.getSimpleName(), "test");
+ EFGraph.store(g, basename.toString());
+ return basename;
+ }
+
+ public static File storeTempGraph(final ImmutableGraph g, final int log2Quantum, final int cacheSize, final ByteOrder byteOrder) throws IOException, IllegalArgumentException, SecurityException {
+ File basename = File.createTempFile(EFGraphTest.class.getSimpleName(), "test");
+ EFGraph.store(g, basename.toString(), log2Quantum, cacheSize, byteOrder, null);
+ return basename;
+ }
+
+ @Test
+ public void testCompression() throws IOException, IllegalArgumentException, SecurityException {
+ for(int n = 1; n < 10; n++) { // Graph construction parameter
+ for(int type = 1; type < 3; type++) {
+ final ImmutableGraph g = ImmutableGraph.wrap(type == 0 ? ArrayListMutableGraph.newCompleteGraph(n, false).immutableView() :
+ type == 1 ? ArrayListMutableGraph.newCompleteBinaryIntree(n).immutableView() :
+ ArrayListMutableGraph.newCompleteBinaryOuttree(n).immutableView());
+
+ for(ByteOrder byteOrder: new ByteOrder[] { ByteOrder.LITTLE_ENDIAN, ByteOrder.BIG_ENDIAN }) {
+ for(int cacheSize = 1; cacheSize < 128 * 1024; cacheSize *= 2) {
+ for(int log2Quantum = 0; log2Quantum < 8; log2Quantum++) {
+ System.err.println("Testing type " + type + ", n=" + n + ", byteOrder=" + byteOrder + ", cacheSize=" + cacheSize + ", log2Quantum=" + log2Quantum + "...");
+ final File basename = EFGraphTest.storeTempGraph(g, log2Quantum, cacheSize, byteOrder);
+ final Properties properties = new Properties();
+ final FileInputStream propertyFile = new FileInputStream(basename + EFGraph.PROPERTIES_EXTENSION);
+ properties.load(propertyFile);
+ propertyFile.close();
+
+ //System.err.println(properties);
+
+ ImmutableGraph h;
+
+ System.err.println("Testing standard...");
+ h = EFGraph.load(basename.toString());
+ WebGraphTestCase.assertGraph(h);
+ assertEquals(g, h);
+
+ System.err.println("Testing mapped...");
+ h = EFGraph.loadMapped(basename.toString());
+ WebGraphTestCase.assertGraph(h);
+ assertEquals(g, h);
+
+ basename.delete();
+ deleteGraph(basename);
+ }
+ }
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testErdosRenyi() throws IOException {
+ for(int size: new int[] { 10, 100, 1000, 10000 }) {
+ for(boolean upperBound: new boolean[] { false, true }) {
+ final String basename = File.createTempFile(getClass().getSimpleName(), "test").toString();
+ final ImmutableGraph g = ImmutableGraph.wrap(new ArrayListMutableGraph(new ErdosRenyiGraph(size, .001, 0, false)).immutableView());
+ EFGraph.store(g, upperBound ? size * size : size, basename, 3, 1024, ByteOrder.nativeOrder(), null);
+ final EFGraph efGraph = (EFGraph)ImmutableGraph.load(basename);
+ assertEquals(g, efGraph);
+
+ for(int i = 0; i < size; i++) {
+ for(int j = i + 1; j < size; j++) {
+ LongOpenHashSet a = new LongOpenHashSet();
+ LongOpenHashSet b = new LongOpenHashSet();
+ LazyLongIterator sa = g.successors(i);
+ LazyLongIterator sb = g.successors(j);
+ for(long s; (s = sa.nextLong()) != -1;) a.add(s);
+ for(long t; (t = sb.nextLong()) != -1;) b.add(t);
+
+ a.retainAll(b);
+ b.clear();
+ final LazyLongSkippableIterator sx = efGraph.successors(i);
+ final LazyLongSkippableIterator sy = efGraph.successors(j);
+
+ long x = sx.nextLong();
+ long y = sy.nextLong();
+
+ while(x != -1 && x != LazyLongSkippableIterator.END_OF_LIST && y != -1 && y != LazyLongSkippableIterator.END_OF_LIST) {
+ if (x == y) {
+ b.add(x);
+ x = sx.nextLong();
+ }
+ else if(x < y) x = sx.skipTo(y);
+ else y = sy.skipTo(x);
+ }
+
+ assertEquals(a, b);
+ }
+ }
+
+ new File(basename).delete();
+ new File(basename + EFGraph.GRAPH_EXTENSION).delete();
+ new File(basename + EFGraph.OFFSETS_EXTENSION).delete();
+ new File(basename + EFGraph.PROPERTIES_EXTENSION).delete();
+ }
+ }
+ }
+
+
+ @Test
+ public void testSkipFirst() throws IOException {
+ final String basename = File.createTempFile(getClass().getSimpleName(), "test").toString();
+ final ImmutableGraph g = ImmutableGraph.wrap(new ArrayListMutableGraph(new ErdosRenyiGraph(1000, .01, 0, false)).immutableView());
+ EFGraph.store(g, 1000, basename, 3, 1024, ByteOrder.nativeOrder(), null);
+ final EFGraph efGraph = (EFGraph)ImmutableGraph.load(basename);
+ assertEquals(g, efGraph);
+
+ for(int i = 0; i < 1000; i++) {
+ for(int j = 0; j < 1000; j++) {
+ LazyLongSkippableIterator sa = efGraph.successors(i);
+ final long x = sa.skipTo(j);
+ sa = efGraph.successors(i);
+ for(;;) {
+ final long y = sa.nextLong();
+ if (y >= j) {
+ assertEquals(y, x);
+ break;
+ }
+ else if (y == -1) {
+ if (x != LazyLongSkippableIterator.END_OF_LIST) fail();
+ break;
+ }
+ }
+ }
+ }
+ new File(basename).delete();
+ new File(basename + EFGraph.GRAPH_EXTENSION).delete();
+ new File(basename + EFGraph.OFFSETS_EXTENSION).delete();
+ new File(basename + EFGraph.PROPERTIES_EXTENSION).delete();
+
+ }
+}
+
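The intersection loop in `testErdosRenyi` relies on a skippable successor iterator. `nextLong()` advances by one element. `skipTo(t)` jumps to the first element not smaller than `t`, which is the property `testSkipFirst` checks. The sketch below reproduces that loop over plain sorted `long[]` arrays.

The `Cursor` class and the `END_OF_LIST` value are hypothetical stand-ins, not the WebGraph `LazyLongSkippableIterator` API. Like the test, the sketch treats both `-1` and `END_OF_LIST` as exhaustion. Its `skipTo` is a linear scan; an Elias–Fano-backed iterator would be expected to skip faster, and this sketch does not attempt that.

```java
import java.util.ArrayList;
import java.util.List;

public class SkipIntersectionSketch {
    // Hypothetical sentinel returned by skipTo() when no element >= target is left.
    static final long END_OF_LIST = Long.MAX_VALUE;

    // Hypothetical cursor over a sorted long[] (not the WebGraph LazyLongSkippableIterator).
    static final class Cursor {
        private final long[] a;
        private int pos;

        Cursor(long[] a) { this.a = a; }

        // Returns the next element, or -1 once the array is exhausted.
        long nextLong() { return pos < a.length ? a[pos++] : -1; }

        // Consumes and returns the first element >= target, or END_OF_LIST if there is none.
        long skipTo(long target) {
            while (pos < a.length && a[pos] < target) pos++;
            return pos < a.length ? a[pos++] : END_OF_LIST;
        }
    }

    private static boolean exhausted(long v) { return v == -1 || v == END_OF_LIST; }

    // Same loop shape as testErdosRenyi: advance on a match, otherwise skip the smaller side forward.
    static List<Long> intersect(long[] p, long[] q) {
        Cursor sx = new Cursor(p), sy = new Cursor(q);
        List<Long> common = new ArrayList<>();
        long x = sx.nextLong(), y = sy.nextLong();
        while (!exhausted(x) && !exhausted(y)) {
            if (x == y) {
                common.add(x);
                x = sx.nextLong();
            } else if (x < y) x = sx.skipTo(y);
            else y = sy.skipTo(x);
        }
        return common;
    }

    public static void main(String[] args) {
        System.out.println(intersect(new long[] { 1, 3, 5, 7, 9 }, new long[] { 2, 3, 4, 9, 10 })); // [3, 9]
    }
}
```

On a match only `x` advances. The next iteration then finds `x > y` and skips `y` forward, so each list is traversed at most once.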
diff --git a/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/IncrementalImmutableSequentialGraphTest.java b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/IncrementalImmutableSequentialGraphTest.java
new file mode 100644
index 0000000..c491bf8
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/IncrementalImmutableSequentialGraphTest.java
@@ -0,0 +1,64 @@
+package it.unimi.dsi.big.webgraph;
+
+/*
+ * Copyright (C) 2013-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.examples.ErdosRenyiGraph;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.concurrent.Callable;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+
+import org.junit.Test;
+
+public class IncrementalImmutableSequentialGraphTest extends WebGraphTestCase {
+
+ @Test
+ public void testErdosRenyi() throws IOException, InterruptedException, ExecutionException {
+ final String basename = File.createTempFile(IncrementalImmutableSequentialGraph.class.getSimpleName() + "-", "-temp").toString();
+ for(int size: new int[] { 10, 100, 1000, 10000 }) {
+ final ImmutableGraph g = ImmutableGraph.wrap(new ArrayListMutableGraph(new ErdosRenyiGraph(size, .001, 0, false)).immutableView());
+ final IncrementalImmutableSequentialGraph incrementalImmutableSequentialGraph = new IncrementalImmutableSequentialGraph();
+ final Future<Void> future = Executors.newSingleThreadExecutor().submit(new Callable<Void>() {
+ @Override
+ public Void call() throws IOException {
+ BVGraph.store(incrementalImmutableSequentialGraph, basename);
+ return null;
+ }
+ });
+
+ for(NodeIterator nodeIterator = g.nodeIterator(); nodeIterator.hasNext();) {
+ nodeIterator.nextLong();
+ incrementalImmutableSequentialGraph.add(nodeIterator.successorBigArray(), 0, nodeIterator.outdegree());
+ }
+
+ incrementalImmutableSequentialGraph.add(IncrementalImmutableSequentialGraph.END_OF_GRAPH);
+
+ future.get();
+ assertEquals(g, ImmutableGraph.load(basename));
+ }
+
+ deleteGraph(basename);
+ }
+
+}
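In `IncrementalImmutableSequentialGraphTest`, one thread runs `BVGraph.store()` on the incremental graph. Meanwhile the test thread pushes successor arrays with `add()` and finishes with `END_OF_GRAPH`. The producer/consumer handoff can be sketched with a `java.util.concurrent` blocking queue. This is an assumption about the general shape only, not how `IncrementalImmutableSequentialGraph` is actually implemented.

In the sketch:
- The consumer merely counts arcs.
- `END` is a hypothetical marker compared by identity, so an empty successor list is never mistaken for the end of the stream.
- The executor is shut down at the end, which the test above does not do.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class IncrementalHandoffSketch {
    // Hypothetical end-of-stream marker (plays the role of END_OF_GRAPH); compared by identity.
    static final long[] END = new long[0];

    // Streams adjacency lists through a bounded queue to a consumer thread that counts arcs.
    static long countArcs(long[][] adjacency) throws Exception {
        BlockingQueue<long[]> queue = new ArrayBlockingQueue<>(16);
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // Consumer: stands in for the thread running BVGraph.store(...) in the test.
            Future<Long> consumer = executor.submit(() -> {
                long arcs = 0;
                for (long[] successors; (successors = queue.take()) != END;) arcs += successors.length;
                return arcs;
            });
            // Producer: one successor array per node, in node order, then the end marker.
            // Cloning guarantees no element is the END instance itself.
            for (long[] successors : adjacency) queue.put(successors.clone());
            queue.put(END);
            return consumer.get();
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countArcs(new long[][] { { 1, 2 }, {}, { 0 } })); // 3
    }
}
```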
diff --git a/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/MaskedLongIteratorTest.java b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/MaskedLongIteratorTest.java
new file mode 100644
index 0000000..1dd4c68
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/MaskedLongIteratorTest.java
@@ -0,0 +1,107 @@
+package it.unimi.dsi.big.webgraph;
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.fastutil.longs.LongArrayList;
+import it.unimi.dsi.fastutil.longs.LongIterator;
+import it.unimi.dsi.fastutil.longs.LongIterators;
+import it.unimi.dsi.util.XoRoShiRo128PlusRandom;
+
+import org.junit.Test;
+
+public class MaskedLongIteratorTest {
+
+ public void test(final int length, final int numberOfZeroes) {
+ final XoRoShiRo128PlusRandom random = new XoRoShiRo128PlusRandom(0);
+ // Values, keep mask, expected result and run-length blocks
+ final long x[] = new long[length];
+ boolean keep[] = new boolean[length];
+ LongArrayList res = new LongArrayList();
+ LongArrayList blocks = new LongArrayList();
+ int i, j;
+ long p = 0;
+ boolean dep;
+
+ // Generate
+ for (i = 0; i < length; i++) p = x[i] = p + (random.nextLong() & 0x7FFFFFFFFFFFFFFFL) % 1000;
+ for (i = 0; i < length-numberOfZeroes; i++) keep[i] = true;
+ for (i = 0; i < length; i++) {
+ j = i + random.nextInt(length - i);
+ dep = keep[i]; keep[i] = keep[j]; keep[j] = dep;
+ }
+
+ // Compute result
+ for (i = 0; i < length; i++) if (keep[i]) res.add(x[i]);
+ res.trim();
+ long result[] = res.elements();
+
+ // Prepare blocks
+ boolean lookAt = true;
+ int curr = 0;
+ for (i = 0; i < length; i++) {
+ if (keep[i] == lookAt) curr++;
+ else {
+ blocks.add(curr);
+ lookAt = !lookAt;
+ curr = 1;
+ }
+ }
+ blocks.trim();
+ final long bs[] = blocks.elements();
+
+ // Output
+ System.out.println("GENERATED:");
+ for (i = 0; i < length; i++) {
+ if (keep[i]) System.out.print("*");
+ System.out.print(x[i] + " ");
+ }
+ System.out.println("\nBLOCKS:");
+ for (i = 0; i < bs.length; i++)
+ System.out.print(bs[i] + " ");
+ System.out.println("\nEXPECTED RESULT:");
+ for (i = 0; i < result.length; i++)
+ System.out.print(result[i] + " ");
+ System.out.println();
+
+ LazyLongIterator maskedIterator = new MaskedLongIterator(bs, LazyLongIterators.lazy(new LongArrayList(x).iterator()));
+
+ for (i = 0; i < result.length; i++) assertEquals(i + ": ", result[i], maskedIterator.nextLong());
+ assertEquals(-1, maskedIterator.nextLong());
+
+ // Test skips
+ maskedIterator = new MaskedLongIterator(bs, LazyLongIterators.lazy(new LongArrayList(x).iterator()));
+ LongIterator results = LongIterators.wrap(result);
+
+ for (i = 0; i < result.length; i++) {
+ int toSkip = random.nextInt(5);
+ assertEquals(results.skip(toSkip), maskedIterator.skip(toSkip));
+ if (results.hasNext()) assertEquals(i + ": ", results.nextLong(), maskedIterator.nextLong());
+ }
+ assertEquals(-1, maskedIterator.nextLong());
+
+ }
+
+ @Test
+ public void test() {
+ for(int i = 0; i < 20; i++)
+ for(int j = 0; j < 20; j++)
+ test(i, j);
+ }
+}
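`MaskedLongIteratorTest` encodes a keep/drop mask as alternating run lengths. The first entry counts kept elements (possibly zero), the next counts dropped ones, and so on. The final run is never appended to `blocks`.

Judging from how the test builds `bs`, the elements after the last explicit block are therefore kept when the number of blocks is even and dropped when it is odd. The sketch below spells out that interpretation on plain arrays. `blocksFor` and `applyMask` are hypothetical helpers, not methods of `MaskedLongIterator`.

```java
import java.util.ArrayList;
import java.util.List;

public class MaskSketch {
    // Builds alternating keep/drop run lengths exactly as the test does; the last run stays implicit.
    static long[] blocksFor(boolean[] keep) {
        List<Long> blocks = new ArrayList<>();
        boolean lookAt = true;
        long curr = 0;
        for (boolean k : keep) {
            if (k == lookAt) curr++;
            else {
                blocks.add(curr);
                lookAt = !lookAt;
                curr = 1;
            }
        }
        long[] result = new long[blocks.size()];
        for (int i = 0; i < result.length; i++) result[i] = blocks.get(i);
        return result;
    }

    // Even-indexed blocks are kept and odd-indexed blocks are dropped;
    // the tail after the last block is kept only if the number of blocks is even.
    static List<Long> applyMask(long[] blocks, long[] x) {
        List<Long> out = new ArrayList<>();
        int pos = 0;
        for (int b = 0; b < blocks.length; b++)
            for (long c = 0; c < blocks[b]; c++, pos++)
                if (b % 2 == 0) out.add(x[pos]);
        if (blocks.length % 2 == 0) while (pos < x.length) out.add(x[pos++]);
        return out;
    }

    public static void main(String[] args) {
        boolean[] keep = { true, false, false, true, true };
        long[] bs = blocksFor(keep); // [1, 2]: keep 1, drop 2, then keep the implicit tail
        System.out.println(applyMask(bs, new long[] { 10, 20, 30, 40, 50 })); // [10, 40, 50]
    }
}
```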
diff --git a/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/MergedLongIteratorTest.java b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/MergedLongIteratorTest.java
new file mode 100644
index 0000000..24c9745
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/MergedLongIteratorTest.java
@@ -0,0 +1,64 @@
+package it.unimi.dsi.big.webgraph;
+
+/*
+ * Copyright (C) 2003-2017 Paolo Boldi
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.fastutil.longs.LongAVLTreeSet;
+import it.unimi.dsi.fastutil.longs.LongIterator;
+
+import java.util.Random;
+
+import org.junit.Test;
+
+public class MergedLongIteratorTest {
+
+ public void testMerge(int n0, int n1) {
+ long x0[] = new long[n0];
+ long x1[] = new long[n1];
+ int i;
+ long p = 0;
+ final Random random = new Random(0);
+
+ // Generate
+ for (i = 0; i < n0; i++) p = x0[i] = p + random.nextInt(10);
+ p = 0;
+ for (i = 0; i < n1; i++) p = x1[i] = p + random.nextInt(10);
+
+ LongAVLTreeSet s0 = new LongAVLTreeSet(x0);
+ LongAVLTreeSet s1 = new LongAVLTreeSet(x1);
+ LongAVLTreeSet res = new LongAVLTreeSet(s0);
+ res.addAll(s1);
+
+ MergedLongIterator m = new MergedLongIterator(LazyLongIterators.lazy(s0.iterator()), LazyLongIterators.lazy(s1.iterator()));
+ LongIterator it = res.iterator();
+
+ long x;
+ while ((x = m.nextLong()) != -1) assertEquals(it.nextLong(), x);
+ assertEquals(Boolean.valueOf(it.hasNext()), Boolean.valueOf(m.nextLong() != -1));
+ }
+
+ @Test
+ public void testMerge() {
+ for(int i = 0; i < 10; i++) {
+ testMerge(i, i);
+ testMerge(i, i + 1);
+ testMerge(i, i * 2);
+ }
+ }
+}
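`MergedLongIteratorTest` checks the merged iterator against the union of two `LongAVLTreeSet`s, that is, against a sorted sequence without duplicates. A minimal two-pointer version of that contract on sorted `long[]` arrays might look as follows. `mergeDistinct` is a hypothetical helper. It also drops duplicates inside each input, which the tree sets in the test have already removed.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeSketch {
    // Merges two nondecreasing arrays into a strictly increasing list (their sorted union).
    static List<Long> mergeDistinct(long[] a, long[] b) {
        List<Long> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length || j < b.length) {
            // Take from a when b is exhausted or a's head is not larger than b's.
            long next = (j == b.length || (i < a.length && a[i] <= b[j])) ? a[i++] : b[j++];
            // Skip a value equal to the last one emitted.
            if (out.isEmpty() || out.get(out.size() - 1) != next) out.add(next);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(mergeDistinct(new long[] { 1, 3, 3, 5 }, new long[] { 2, 3, 6 })); // [1, 2, 3, 5, 6]
    }
}
```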
diff --git a/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/ScatteredArcsASCIIGraphTest.java b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/ScatteredArcsASCIIGraphTest.java
new file mode 100644
index 0000000..59239f2
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/ScatteredArcsASCIIGraphTest.java
@@ -0,0 +1,154 @@
+package it.unimi.dsi.big.webgraph;
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertArrayEquals;
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.fastutil.io.FastByteArrayInputStream;
+import it.unimi.dsi.fastutil.longs.LongBigArrays;
+import it.unimi.dsi.fastutil.objects.Object2LongArrayMap;
+import it.unimi.dsi.fastutil.objects.Object2LongFunction;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.Transform;
+
+import java.io.IOException;
+import java.io.UnsupportedEncodingException;
+
+import org.junit.Test;
+
+public class ScatteredArcsASCIIGraphTest extends WebGraphTestCase {
+
+ @Test
+ public void testConstructor() throws UnsupportedEncodingException, IOException {
+
+ ScatteredArcsASCIIGraph g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("0 1\n0 2\n1 0\n1 2\n2 0\n2 1".getBytes("ASCII")));
+ assertEquals(ArrayListMutableGraph.newCompleteGraph(3, false).immutableView(), new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView());
+ assertArrayEquals(LongBigArrays.wrap(new long[] { 0, 1, 2 }), g.ids);
+
+ g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("-1 15\n15 2\n2 -1\nOOPS!\n-1 2".getBytes("ASCII")));
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{0,1},{0,2},{1,2},{2,0}}).immutableView(), new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView());
+ assertArrayEquals(LongBigArrays.wrap(new long[] { -1, 15, 2 }), g.ids);
+
+ g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("0 1\n0 2\n1 0\n1 \t 2\n2 0\n2 1".getBytes("ASCII")));
+ assertEquals(ArrayListMutableGraph.newCompleteGraph(3, false).immutableView(), new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView());
+ assertArrayEquals(LongBigArrays.wrap(new long[] { 0, 1, 2 }), g.ids);
+
+ g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("2 0\n2 1".getBytes("ASCII")));
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{0,1},{0,2}}).immutableView(), new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView());
+ assertArrayEquals(LongBigArrays.wrap(new long[] { 2, 0, 1 }), g.ids);
+
+ g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("1 2".getBytes("ASCII")));
+ assertEquals(new ArrayListMutableGraph(2, new int[][] {{0,1}}).immutableView(), new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView());
+ assertArrayEquals(LongBigArrays.wrap(new long[] { 1, 2 }), g.ids);
+
+ g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("2 1".getBytes("ASCII")));
+ assertEquals(new ArrayListMutableGraph(2, new int[][] {{0,1}}).immutableView(), new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView());
+ assertArrayEquals(LongBigArrays.wrap(new long[] { 2, 1 }), g.ids);
+
+ g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("0 1\n2 1".getBytes("ASCII")));
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{0,1},{2,1}}).immutableView(), new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView());
+ assertArrayEquals(LongBigArrays.wrap(new long[] { 0, 1, 2 }), g.ids);
+
+ g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("\n0 1\n\n2 1".getBytes("ASCII")));
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{0,1},{2,1}}).immutableView(), new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView());
+ assertArrayEquals(LongBigArrays.wrap(new long[] { 0, 1, 2 }), g.ids);
+
+ g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("\n0 1\n# comment\n2\n2 1\n2 X".getBytes("ASCII")));
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{0,1},{2,1}}).immutableView(), new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView());
+ assertArrayEquals(LongBigArrays.wrap(new long[] { 0, 1, 2 }), g.ids);
+
+ g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("0 1\n0 2\n1 0\n1 2\n2 0\n2 1".getBytes("ASCII")), true, false, 1);
+ assertEquals(ArrayListMutableGraph.newCompleteGraph(3, false).immutableView(), new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView());
+ assertArrayEquals(LongBigArrays.wrap(new long[] { 0, 1, 2 }), g.ids);
+
+ g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("0 1\n0 2\n1 0\n1 \t 2\n2 0\n2 1".getBytes("ASCII")), true, false, 1);
+ assertEquals(ArrayListMutableGraph.newCompleteGraph(3, false).immutableView(), new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView());
+ assertArrayEquals(LongBigArrays.wrap(new long[] { 0, 1, 2 }), g.ids);
+
+ g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("2 0\n2 1".getBytes("ASCII")), true, false, 1);
+ assertEquals(Transform.symmetrize(new ArrayListMutableGraph(3, new int[][] {{0,1},{0,2}}).immutableView()), new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView());
+ assertArrayEquals(LongBigArrays.wrap(new long[] { 2, 0, 1 }), g.ids);
+
+ g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("1 2".getBytes("ASCII")), true, false, 1);
+ assertEquals(Transform.symmetrize(new ArrayListMutableGraph(2, new int[][] {{0,1}}).immutableView()), new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView());
+ assertArrayEquals(LongBigArrays.wrap(new long[] { 1, 2 }), g.ids);
+
+ g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("2 1".getBytes("ASCII")), true, false, 1);
+ assertEquals(Transform.symmetrize(new ArrayListMutableGraph(2, new int[][] {{0,1}}).immutableView()), new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView());
+ assertArrayEquals(LongBigArrays.wrap(new long[] { 2, 1 }), g.ids);
+
+ g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("0 1\n2 1".getBytes("ASCII")), true, false, 1);
+ assertEquals(Transform.symmetrize(new ArrayListMutableGraph(3, new int[][] {{0,1},{2,1}}).immutableView()), new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView());
+ assertArrayEquals(LongBigArrays.wrap(new long[] { 0, 1, 2 }), g.ids);
+
+ g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("\n0 1\n\n2 1".getBytes("ASCII")), true, false, 1);
+ assertEquals(Transform.symmetrize(new ArrayListMutableGraph(3, new int[][] {{0,1},{2,1}}).immutableView()), new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView());
+ assertArrayEquals(LongBigArrays.wrap(new long[] { 0, 1, 2 }), g.ids);
+
+ g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("\n0 1\n# comment\n2\n2 1\n2 X".getBytes("ASCII")), true, false, 1);
+ assertEquals(Transform.symmetrize(new ArrayListMutableGraph(3, new int[][] {{0,1},{2,1}}).immutableView()), new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView());
+ assertArrayEquals(LongBigArrays.wrap(new long[] { 0, 1, 2 }), g.ids);
+
+ g = new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("0 0\n0 1\n0 2\n2 2\n1 0\n1 2\n2 0\n2 1".getBytes("ASCII")), true, true, 2);
+ assertEquals(ArrayListMutableGraph.newCompleteGraph(3, false).immutableView(), new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView());
+ assertArrayEquals(LongBigArrays.wrap(new long[] { 0, 1, 2 }), g.ids);
+
+ }
+
+
+ @Test
+ public void testConstructorWithStrings() throws UnsupportedEncodingException, IOException {
+ Object2LongFunction<String> map = new Object2LongArrayMap<>();
+ map.defaultReturnValue(-1);
+
+ map.clear();
+ map.put("0", 0);
+ map.put("1", 1);
+ map.put("2", 2);
+ assertEquals(ImmutableGraph.wrap(ArrayListMutableGraph.newCompleteGraph(3, false).immutableView()), new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("0 1\n0 2\n1 0\n1 2\n2 0\n2 1".getBytes("ASCII")), map, null, 3));
+
+ map.clear();
+ map.put("-1", 1);
+ map.put("15", 0);
+ map.put("2", 2);
+ final ImmutableGraph g = ImmutableGraph.wrap(new ArrayListMutableGraph(3, new int[][] {{0,2},{1,0},{1,2},{2,1}}).immutableView());
+ assertEquals(g, new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("-1 15\n15 2\n2 -1\nOOPS!\n-1 2".getBytes("ASCII")), map, null, 3));
+ assertEquals(g, new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("-1 15\n15 2\n2 -1\nOOPS!\n-1 2\n32 2\n2 32".getBytes("ASCII")), map, null, 3));
+ }
+
+ @Test(expected=IllegalArgumentException.class)
+ public void testTargetOutOfRange() throws UnsupportedEncodingException, IOException {
+ Object2LongFunction<String> map = new Object2LongArrayMap<>();
+ map.defaultReturnValue(-1);
+ map.put("0", 0);
+ map.put("1", 1);
+ map.put("2", 2);
+ assertEquals(ImmutableGraph.wrap(ArrayListMutableGraph.newCompleteGraph(3, false).immutableView()), new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("0 1\n0 2".getBytes("ASCII")), map, null, 2));
+ }
+
+ @Test(expected=IllegalArgumentException.class)
+ public void testSourceOutOfRange() throws UnsupportedEncodingException, IOException {
+ Object2LongFunction<String> map = new Object2LongArrayMap<>();
+ map.defaultReturnValue(-1);
+ map.put("0", 0);
+ map.put("1", 1);
+ map.put("2", 2);
+ assertEquals(ImmutableGraph.wrap(ArrayListMutableGraph.newCompleteGraph(3, false).immutableView()), new ScatteredArcsASCIIGraph(new FastByteArrayInputStream("0 1\n2 0".getBytes("ASCII")), map, null, 2));
+ }
+}
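The assertions on `g.ids` in `testConstructor` suggest that `ScatteredArcsASCIIGraph` numbers nodes in order of first appearance, source before target, and records the original label of node *k* in `ids[k]`. For example, the input `-1 15`, `15 2`, `2 -1`, `-1 2` yields `ids` = `{ -1, 15, 2 }`, and a line such as `OOPS!` contributes nothing.

The sketch below reproduces that numbering for the basic constructor case. It is an illustration under assumptions, not the class's actual parser. `FirstAppearanceSketch` is hypothetical, and it simply skips any line whose first two whitespace-separated tokens are not both integers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class FirstAppearanceSketch {
    // Returns "ids successors", e.g. "[-1, 15, 2] {0=[1, 2], 1=[2], 2=[0]}".
    static String build(String text) {
        List<Long> ids = new ArrayList<>();              // node number -> original label
        Map<Long, Integer> number = new HashMap<>();     // original label -> node number
        SortedMap<Integer, SortedSet<Integer>> successors = new TreeMap<>();
        for (String line : text.split("\n")) {
            String[] t = line.trim().split("\\s+");
            if (t.length < 2) continue;                  // empty line, "OOPS!", a lone "2"
            long source, target;
            try {
                source = Long.parseLong(t[0]);
                target = Long.parseLong(t[1]);
            } catch (NumberFormatException e) {
                continue;                                // e.g. "# comment" or "2 X"
            }
            // Number the source before the target, in order of first appearance.
            int u = number.computeIfAbsent(source, k -> { ids.add(k); return ids.size() - 1; });
            int v = number.computeIfAbsent(target, k -> { ids.add(k); return ids.size() - 1; });
            successors.computeIfAbsent(u, k -> new TreeSet<>()).add(v);
        }
        return ids + " " + successors;
    }

    public static void main(String[] args) {
        System.out.println(build("-1 15\n15 2\n2 -1\nOOPS!\n-1 2")); // [-1, 15, 2] {0=[1, 2], 1=[2], 2=[0]}
    }
}
```

Because both tokens are parsed before any number is assigned, a malformed line such as `2 X` never allocates a node.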
diff --git a/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/ShiftedByOneArcListASCIIGraphTest.java b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/ShiftedByOneArcListASCIIGraphTest.java
new file mode 100644
index 0000000..b54b663
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/ShiftedByOneArcListASCIIGraphTest.java
@@ -0,0 +1,103 @@
+package it.unimi.dsi.big.webgraph;
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.fastutil.io.FastByteArrayInputStream;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.UnsupportedEncodingException;
+import java.nio.charset.StandardCharsets;
+
+import org.apache.commons.io.FileUtils;
+import org.junit.Test;
+
+public class ShiftedByOneArcListASCIIGraphTest extends WebGraphTestCase {
+
+ @Test
+ public void testLoadOnce() throws UnsupportedEncodingException, IOException {
+
+ ArcListASCIIGraph g = ShiftedByOneArcListASCIIGraph.loadOnce(new FastByteArrayInputStream("1 3\n1 2\n2 1\n2 3\n3 1\n3 2".getBytes("ASCII")));
+ assertEquals(ArrayListMutableGraph.newCompleteGraph(3, false).immutableView(), new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView());
+
+ g = ShiftedByOneArcListASCIIGraph.loadOnce(new FastByteArrayInputStream("3 1\n3 2".getBytes("ASCII")));
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{2,0},{2,1}}).immutableView(), new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView());
+
+ g = ShiftedByOneArcListASCIIGraph.loadOnce(new FastByteArrayInputStream("2 3".getBytes("ASCII")));
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{1,2}}).immutableView(), new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView());
+
+ g = ShiftedByOneArcListASCIIGraph.loadOnce(new FastByteArrayInputStream("3 2".getBytes("ASCII")));
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{2,1}}).immutableView(), new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView());
+
+ g = ShiftedByOneArcListASCIIGraph.loadOnce(new FastByteArrayInputStream("1 2\n3 2".getBytes("ASCII")));
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{0,1},{2,1}}).immutableView(), new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView());
+ }
+
+ @Test
+ public void testLoad() throws UnsupportedEncodingException, IOException {
+ File file = File.createTempFile(ShiftedByOneArcListASCIIGraphTest.class.getSimpleName(), ".txt");
+ file.deleteOnExit();
+ FileUtils.writeStringToFile(file, "1 3\n1 2\n2 1\n2 3\n3 1\n3 2", StandardCharsets.US_ASCII);
+ ImmutableGraph g = ShiftedByOneArcListASCIIGraph.load(file.toString());
+ assertEquals(ArrayListMutableGraph.newCompleteGraph(3, false).immutableView(), new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView());
+
+ FileUtils.writeStringToFile(file, "3 1\n3 2", StandardCharsets.US_ASCII);
+ g = ShiftedByOneArcListASCIIGraph.load(file.toString());
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{2,0},{2,1}}).immutableView(), new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView());
+
+ FileUtils.writeStringToFile(file, "2 3", StandardCharsets.US_ASCII);
+ g = ShiftedByOneArcListASCIIGraph.load(file.toString());
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{1,2}}).immutableView(), new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView());
+
+ FileUtils.writeStringToFile(file, "3 2", StandardCharsets.US_ASCII);
+ g = ShiftedByOneArcListASCIIGraph.load(file.toString());
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{2,1}}).immutableView(), new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView());
+
+ FileUtils.writeStringToFile(file, "1 2\n3 2", StandardCharsets.US_ASCII);
+ g = ShiftedByOneArcListASCIIGraph.load(file.toString());
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{0,1},{2,1}}).immutableView(), new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView());
+ }
+
+ @Test
+ public void testLoadMapped() throws IOException {
+ File file = File.createTempFile(ShiftedByOneArcListASCIIGraphTest.class.getSimpleName(), ".txt");
+ file.deleteOnExit();
+ FileUtils.writeStringToFile(file, "1 3\n1 2\n2 1\n2 3\n3 1\n3 2", StandardCharsets.US_ASCII);
+ ImmutableGraph g = ShiftedByOneArcListASCIIGraph.loadMapped(file.toString());
+ assertEquals(ArrayListMutableGraph.newCompleteGraph(3, false).immutableView(), new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView());
+
+ FileUtils.writeStringToFile(file, "3 1\n3 2", StandardCharsets.US_ASCII);
+ g = ShiftedByOneArcListASCIIGraph.loadMapped(file.toString());
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{2,0},{2,1}}).immutableView(), new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView());
+
+ FileUtils.writeStringToFile(file, "2 3", StandardCharsets.US_ASCII);
+ g = ShiftedByOneArcListASCIIGraph.loadMapped(file.toString());
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{1,2}}).immutableView(), new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView());
+
+ FileUtils.writeStringToFile(file, "3 2", StandardCharsets.US_ASCII);
+ g = ShiftedByOneArcListASCIIGraph.loadMapped(file.toString());
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{2,1}}).immutableView(), new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView());
+
+ FileUtils.writeStringToFile(file, "1 2\n3 2", StandardCharsets.US_ASCII);
+ g = ShiftedByOneArcListASCIIGraph.loadMapped(file.toString());
+ assertEquals(new ArrayListMutableGraph(3, new int[][] {{0,1},{2,1}}).immutableView(), new ArrayListMutableGraph(ImmutableGraph.wrap(g)).immutableView());
+ }
+}
diff --git a/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/TransformTest.java b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/TransformTest.java
new file mode 100644
index 0000000..aa13adf
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/TransformTest.java
@@ -0,0 +1,457 @@
+package it.unimi.dsi.big.webgraph;
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertArrayEquals;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+import it.unimi.dsi.Util;
+import it.unimi.dsi.big.webgraph.examples.IntegerTriplesArcLabelledImmutableGraph;
+import it.unimi.dsi.big.webgraph.labelling.ArcLabelledImmutableGraph;
+import it.unimi.dsi.big.webgraph.labelling.ArcLabelledNodeIterator;
+import it.unimi.dsi.big.webgraph.labelling.ArcLabelledNodeIterator.LabelledArcIterator;
+import it.unimi.dsi.big.webgraph.labelling.BitStreamArcLabelledGraphTest;
+import it.unimi.dsi.big.webgraph.labelling.GammaCodedIntLabel;
+import it.unimi.dsi.big.webgraph.labelling.Label;
+import it.unimi.dsi.big.webgraph.labelling.LabelSemiring;
+import it.unimi.dsi.fastutil.longs.LongBigArrays;
+import it.unimi.dsi.util.XoRoShiRo128PlusRandom;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.ImmutableSequentialGraph;
+import it.unimi.dsi.webgraph.examples.ErdosRenyiGraph;
+
+import java.io.File;
+import java.io.IOException;
+
+import org.junit.Test;
+
+public class TransformTest extends WebGraphTestCase {
+
+ @Test
+ public void testMapExpand() throws IOException {
+ ImmutableGraph g;
+ ImmutableGraph g2;
+
+ g = ImmutableGraph.wrap(ArrayListMutableGraph.newCompleteGraph(4, false).immutableView());
+ g2 = Transform.mapOffline(g, LongBigArrays.wrap(new long[] { 0, 2, 4, 6 }), 10);
+ assertGraph(g2);
+ assertEquals(ImmutableGraph.wrap(new ArrayListMutableGraph(7, new it.unimi.dsi.webgraph.Transform.ArcFilter() {
+ @Override
+ public boolean accept(int i, int j) {
+ return i % 2 == 0 && j % 2 == 0 && i != j;
+ }
+
+ }).immutableView()), g2);
+
+ g = ImmutableGraph.wrap(ArrayListMutableGraph.newDirectedCycle(3).immutableView());
+ g2 = Transform.mapOffline(g, LongBigArrays.wrap(new long[] { 0, 3, 3 }), 10);
+ assertGraph(g2);
+ assertEquals(ImmutableGraph.wrap(new ArrayListMutableGraph(4, new int[][] { { 0, 3 }, { 3, 0 }, { 3, 3 } }).immutableView()), g2);
+
+ g = ImmutableGraph.wrap(ArrayListMutableGraph.newDirectedCycle(3).immutableView());
+ g2 = Transform.mapOffline(g, LongBigArrays.wrap(new long[] { 4, 4, 4 }), 10);
+ assertGraph(g2);
+ assertEquals(ImmutableGraph.wrap(new ArrayListMutableGraph(5, new int[][] { { 4, 4 } }).immutableView()), g2);
+
+ g = ImmutableGraph.wrap(ArrayListMutableGraph.newDirectedCycle(3).immutableView());
+ g2 = Transform.mapOffline(g, LongBigArrays.wrap(new long[] { 6, 5, 4 }), 10);
+ assertGraph(g2);
+ assertEquals(ImmutableGraph.wrap(new ArrayListMutableGraph(7, new int[][] { { 6, 5 }, { 5, 4 }, { 4, 6 } }).immutableView()), g2);
+
+ }
+
+ @Test
+ public void testMapPermutation() throws IOException {
+ ImmutableGraph g;
+ ImmutableGraph g2;
+
+ g = ImmutableGraph.wrap(ArrayListMutableGraph.newDirectedCycle(3).immutableView());
+ g2 = Transform.mapOffline(g, LongBigArrays.wrap(new long[] { 2, 1, 0 }), 10);
+ assertGraph(g2);
+ assertEquals(ImmutableGraph.wrap(new ArrayListMutableGraph(3, new int[][] { { 0, 2 }, { 2, 1 }, { 1, 0 } }).immutableView()), g2);
+ }
+
+ @Test
+ public void testInjective() throws IOException {
+ ImmutableGraph g;
+ ImmutableGraph g2;
+
+ g = ImmutableGraph.wrap(new ArrayListMutableGraph(3, new int[][] { { 0, 1 }, { 1, 2 }, { 0, 2 } }).immutableView());
+ g2 = Transform.mapOffline(g, LongBigArrays.wrap(new long[] { 2, -1, 0 }), 10);
+ assertGraph(g2);
+ assertEquals(ImmutableGraph.wrap(new ArrayListMutableGraph(3, new int[][] { { 2, 0 } }).immutableView()), g2);
+ }
+
+ @Test
+ public void testMapCollapse() throws IOException {
+ ImmutableGraph g;
+ ImmutableGraph g2;
+
+ g = ImmutableGraph.wrap(ArrayListMutableGraph.newDirectedCycle(3).immutableView());
+ g2 = Transform.mapOffline(g, LongBigArrays.wrap(new long[] { 0, 0, 0 }), 10);
+ assertGraph(g2);
+ assertEquals(1, g2.numNodes());
+ }
+
+ @Test
+ public void testMapClear() throws IOException {
+ ImmutableGraph g;
+ ImmutableGraph g2;
+
+ g = ImmutableGraph.wrap(ArrayListMutableGraph.newDirectedCycle(3).immutableView());
+ g2 = Transform.mapOffline(g, LongBigArrays.wrap(new long[] { -1, -1, -1 }), 10);
+ assertGraph(g2);
+ assertEquals(0, g2.numNodes());
+ }
+
+ @Test
+ public void testMapKeepMiddle() throws IOException {
+ ImmutableGraph g;
+ ImmutableGraph g2;
+
+ g = ImmutableGraph.wrap(ArrayListMutableGraph.newDirectedCycle(3).immutableView());
+ g2 = Transform.mapOffline(g, LongBigArrays.wrap(new long[] { -1, 0, -1 }), 10);
+ assertGraph(g2);
+ assertEquals(ImmutableGraph.wrap(ArrayListMutableGraph.newCompleteGraph(1, false).immutableView()), g2);
+
+ g = ImmutableGraph.wrap(ArrayListMutableGraph.newDirectedCycle(3).immutableView());
+ g2 = Transform.mapOffline(g, LongBigArrays.wrap(new long[] { -1, 2, -1 }), 10);
+ assertGraph(g2);
+ assertEquals(ImmutableGraph.wrap(new ArrayListMutableGraph(3, new int[][] {}).immutableView()), g2);
+ }
+
+ @Test
+ public void testLex() {
+ ImmutableGraph g = ImmutableGraph.wrap(new ArrayListMutableGraph(3, new int[][] { { 0, 2 }, { 1, 1 }, { 1, 2 }, { 2, 0 }, { 2, 1 }, { 2, 2 } }).immutableView());
+ long[][] p = Transform.lexicographicalPermutation(g);
+ assertArrayEquals(LongBigArrays.wrap(new long[] { 0, 1, 2 }), p);
+
+ g = ImmutableGraph.wrap(new ArrayListMutableGraph(3, new int[][] { { 0, 0 }, { 0, 1 }, { 0, 2 }, { 1, 1 }, { 1, 2 }, { 2, 2 } }).immutableView());
+ p = Transform.lexicographicalPermutation(g);
+ assertArrayEquals(LongBigArrays.wrap(new long[] { 2, 1, 0 }), p);
+ }
+
+
+ @Test
+ public void testFilters() throws IllegalArgumentException, SecurityException {
+ ImmutableGraph graph = ImmutableGraph.wrap(new ArrayListMutableGraph(6,
+ new int[][] {
+ { 0, 1 },
+ { 0, 2 },
+ { 1, 1 },
+ { 1, 3 },
+ { 2, 1 },
+ { 4, 5 },
+ }
+ ).immutableView());
+
+ ImmutableGraph filtered = Transform.filterArcs(graph, new Transform.ArcFilter() {
+ @Override
+ public boolean accept(long i, long j) {
+ return i < j;
+ }
+ }, null);
+
+ assertGraph(filtered);
+
+ NodeIterator nodeIterator = filtered.nodeIterator();
+ LazyLongIterator iterator;
+ assertTrue(nodeIterator.hasNext());
+ assertEquals(0, nodeIterator.nextLong());
+ iterator = nodeIterator.successors();
+ assertEquals(1, iterator.nextLong());
+ assertEquals(2, iterator.nextLong());
+ assertEquals(-1, iterator.nextLong());
+ assertTrue(nodeIterator.hasNext());
+ assertEquals(1, nodeIterator.nextLong());
+ iterator = nodeIterator.successors();
+ assertEquals(3, iterator.nextLong());
+ assertEquals(-1, iterator.nextLong());
+ assertTrue(nodeIterator.hasNext());
+ assertEquals(2, nodeIterator.nextLong());
+ iterator = nodeIterator.successors();
+ assertEquals(-1, iterator.nextLong());
+ assertTrue(nodeIterator.hasNext());
+ assertEquals(3, nodeIterator.nextLong());
+ iterator = nodeIterator.successors();
+ assertEquals(-1, iterator.nextLong());
+ assertEquals(4, nodeIterator.nextLong());
+ iterator = nodeIterator.successors();
+ assertEquals(5, iterator.nextLong());
+ assertEquals(-1, iterator.nextLong());
+ assertTrue(nodeIterator.hasNext());
+ assertEquals(5, nodeIterator.nextLong());
+ iterator = nodeIterator.successors();
+ assertEquals(-1, iterator.nextLong());
+ assertFalse(nodeIterator.hasNext());
+ }
+
+
+ @Test
+ public void testLabelledFilters() throws IllegalArgumentException, SecurityException, IOException {
+ IntegerTriplesArcLabelledImmutableGraph graph = new IntegerTriplesArcLabelledImmutableGraph(
+ new int[][] {
+ { 0, 1, 2 },
+ { 0, 2, 3 },
+ { 1, 1, 4 },
+ { 1, 3, 5 },
+ { 2, 1, 6 },
+ { 4, 5, 7 },
+ }
+ );
+
+ ArcLabelledImmutableGraph filtered = Transform.filterArcs(graph, new Transform.LabelledArcFilter() {
+ @Override
+ public boolean accept(long i, long j, Label label) {
+ return i < j;
+ }
+ }, null);
+
+ ArcLabelledNodeIterator nodeIterator = filtered.nodeIterator();
+ LabelledArcIterator iterator;
+ assertTrue(nodeIterator.hasNext());
+ assertEquals(0, nodeIterator.nextLong());
+ iterator = nodeIterator.successors();
+ assertEquals(1, iterator.nextLong());
+ assertEquals(2, iterator.label().getInt());
+ assertEquals(2, iterator.nextLong());
+ assertEquals(3, iterator.label().getInt());
+ assertEquals(-1, iterator.nextLong());
+ assertTrue(nodeIterator.hasNext());
+ assertEquals(1, nodeIterator.nextLong());
+ iterator = nodeIterator.successors();
+ assertEquals(3, iterator.nextLong());
+ assertEquals(5, iterator.label().getInt());
+ assertEquals(-1, iterator.nextLong());
+ assertTrue(nodeIterator.hasNext());
+ assertEquals(2, nodeIterator.nextLong());
+ iterator = nodeIterator.successors();
+ assertEquals(-1, iterator.nextLong());
+ assertTrue(nodeIterator.hasNext());
+ assertEquals(3, nodeIterator.nextLong());
+ iterator = nodeIterator.successors();
+ assertEquals(-1, iterator.nextLong());
+ assertEquals(4, nodeIterator.nextLong());
+ iterator = nodeIterator.successors();
+ assertEquals(5, iterator.nextLong());
+ assertEquals(7, iterator.label().getInt());
+ assertEquals(-1, iterator.nextLong());
+ assertTrue(nodeIterator.hasNext());
+ assertEquals(5, nodeIterator.nextLong());
+ iterator = nodeIterator.successors();
+ assertEquals(-1, iterator.nextLong());
+ assertFalse(nodeIterator.hasNext());
+
+ File file = BitStreamArcLabelledGraphTest.storeTempGraph(graph);
+ ArcLabelledImmutableGraph graph2 = ArcLabelledImmutableGraph.load(file.toString());
+
+ filtered = Transform.filterArcs(graph2, new Transform.LabelledArcFilter() {
+ @Override
+ public boolean accept(long i, long j, Label label) {
+ return i < j;
+ }
+ }, null);
+
+ iterator = filtered.successors(0);
+ assertEquals(1, iterator.nextLong());
+ assertEquals(2, iterator.label().getInt());
+ assertEquals(2, iterator.nextLong());
+ assertEquals(3, iterator.label().getInt());
+ assertEquals(-1, iterator.nextLong());
+ iterator = filtered.successors(1);
+ assertEquals(3, iterator.nextLong());
+ assertEquals(5, iterator.label().getInt());
+ assertEquals(-1, iterator.nextLong());
+ iterator = filtered.successors(2);
+ assertEquals(-1, iterator.nextLong());
+ iterator = filtered.successors(3);
+ assertEquals(-1, iterator.nextLong());
+ iterator = filtered.successors(4);
+ assertEquals(5, iterator.nextLong());
+ assertEquals(7, iterator.label().getInt());
+ assertEquals(-1, iterator.nextLong());
+ iterator = filtered.successors(5);
+ assertEquals(-1, iterator.nextLong());
+
+ }
+
+ @Test
+ public void testCompose() {
+ ImmutableGraph g0 = ImmutableGraph.wrap(new ArrayListMutableGraph(3, new int[][] { { 0, 1 }, { 0, 2 } }).immutableView());
+ ImmutableGraph g1 = ImmutableGraph.wrap(new ArrayListMutableGraph(3, new int[][] { { 1, 0 }, { 2, 1 } }).immutableView());
+
+ ImmutableGraph c = Transform.compose(g0, g1);
+
+ NodeIterator n = c.nodeIterator();
+ assertTrue(n.hasNext());
+ assertEquals(0, n.nextLong());
+ LazyLongIterator i = n.successors();
+ assertEquals(0, i.nextLong());
+ assertEquals(1, i.nextLong());
+ assertEquals(-1, i.nextLong());
+ assertEquals(1, n.nextLong());
+ i = n.successors();
+ assertEquals(-1, i.nextLong());
+ assertTrue(n.hasNext());
+ assertEquals(2, n.nextLong());
+ i = n.successors();
+ assertEquals(-1, i.nextLong());
+ assertFalse(n.hasNext());
+
+ assertEquals(c, c.copy());
+ assertEquals(c.copy(), c);
+ }
+
+ @Test
+ public void testLabelledCompose() throws IllegalArgumentException, SecurityException, IOException {
+ File file = BitStreamArcLabelledGraphTest.storeTempGraph(new IntegerTriplesArcLabelledImmutableGraph(
+ new int[][] {
+ { 0, 1, 2 },
+ { 0, 2, 10 },
+ { 0, 3, 1 },
+ { 1, 2, 4 },
+ { 3, 2, 1 },
+ }
+ ));
+ ArcLabelledImmutableGraph graph = ArcLabelledImmutableGraph.load(file.toString());
+
+ ArcLabelledImmutableGraph composed = Transform.compose(graph, graph, new LabelSemiring() {
+ private final GammaCodedIntLabel one = new GammaCodedIntLabel("FOO");
+ private final GammaCodedIntLabel zero = new GammaCodedIntLabel("FOO");
+ {
+ one.value = 0;
+ zero.value = Integer.MAX_VALUE;
+ }
+
+ @Override
+ public Label add(Label first, Label second) {
+ GammaCodedIntLabel result = new GammaCodedIntLabel("FOO");
+ result.value = Math.min(first.getInt(), second.getInt());
+ return result;
+ }
+
+ @Override
+ public Label multiply(Label first, Label second) {
+ GammaCodedIntLabel result = new GammaCodedIntLabel("FOO");
+ result.value = first.getInt() + second.getInt();
+ return result;
+ }
+
+ @Override
+ public Label one() {
+ return one;
+ }
+
+ @Override
+ public Label zero() {
+ return zero;
+ }
+ });
+
+ ArcLabelledNodeIterator n = composed.nodeIterator();
+ assertTrue(n.hasNext());
+ assertEquals(0, n.nextLong());
+ LabelledArcIterator i = n.successors();
+ assertEquals(2, i.nextLong());
+ assertEquals(2, i.label().getInt());
+ assertEquals(-1, i.nextLong());
+ assertEquals(1, n.nextLong());
+ i = n.successors();
+ assertEquals(-1, i.nextLong());
+ assertTrue(n.hasNext());
+ assertEquals(2, n.nextLong());
+ i = n.successors();
+ assertEquals(-1, i.nextLong());
+ assertTrue(n.hasNext());
+ assertEquals(3, n.nextLong());
+ i = n.successors();
+ assertEquals(-1, i.nextLong());
+ assertFalse(n.hasNext());
+ }
+
+ @Test
+ public void testMapOffline() throws IOException {
+ ImmutableSequentialGraph g = new ErdosRenyiGraph(10, .5, 0, false);
+ long[][] perm = Util.identity((long)g.numNodes());
+ LongBigArrays.shuffle(perm, new XoRoShiRo128PlusRandom(0));
+ long[][] inv = Util.invertPermutation(perm);
+ ImmutableGraph gm = Transform.mapOffline(ImmutableGraph.wrap(new ArrayListMutableGraph(g).immutableView()), perm, 100);
+ assertEquals(gm, Transform.mapOffline(ImmutableGraph.wrap(g), perm, 100));
+ assertEquals(ImmutableGraph.wrap(g), Transform.mapOffline(Transform.mapOffline(ImmutableGraph.wrap(g), perm, 100), inv, 100));
+ assertEquals(gm, gm.copy());
+
+ perm = Util.identity((long)g.numNodes());
+ LongBigArrays.set(perm, LongBigArrays.length(perm) - 1, -1);
+ gm = Transform.mapOffline(ImmutableGraph.wrap(new ArrayListMutableGraph(g).immutableView()), perm, 100);
+ assertEquals(gm, Transform.mapOffline(ImmutableGraph.wrap(g), perm, 100));
+ assertEquals(gm, gm.copy());
+
+ perm = Util.identity((long)g.numNodes());
+ LongBigArrays.shuffle(perm, new XoRoShiRo128PlusRandom(0));
+ LongBigArrays.set(perm, 0, -1); LongBigArrays.set(perm, LongBigArrays.length(perm) / 2, -1);
+ gm = Transform.mapOffline(ImmutableGraph.wrap(new ArrayListMutableGraph(g).immutableView()), perm, 100);
+ assertEquals(gm, Transform.mapOffline(ImmutableGraph.wrap(g), perm, 100));
+ assertEquals(gm, gm.copy());
+
+ perm = Util.identity((long)g.numNodes());
+ LongBigArrays.set(perm, 1, 0); LongBigArrays.set(perm, LongBigArrays.length(perm) - 2, LongBigArrays.length(perm) - 1);
+ gm = Transform.mapOffline(ImmutableGraph.wrap(new ArrayListMutableGraph(g).immutableView()), perm, 100);
+ assertEquals(gm, Transform.mapOffline(ImmutableGraph.wrap(g), perm, 100));
+ assertEquals(gm, gm.copy());
+
+ g = new ErdosRenyiGraph(1000, .2, 0, false);
+ perm = Util.identity((long)g.numNodes());
+ LongBigArrays.shuffle(perm, new XoRoShiRo128PlusRandom(0));
+ inv = Util.invertPermutation(perm);
+ gm = Transform.mapOffline(ImmutableGraph.wrap(new ArrayListMutableGraph(g).immutableView()), perm, 100);
+ assertEquals(gm, Transform.mapOffline(ImmutableGraph.wrap(g), perm, 1000000));
+ assertEquals(ImmutableGraph.wrap(g), Transform.mapOffline(Transform.mapOffline(ImmutableGraph.wrap(g), perm, 10000), inv, 10000));
+ assertEquals(gm, gm.copy());
+
+ perm = Util.identity((long)g.numNodes());
+ LongBigArrays.shuffle(perm, new XoRoShiRo128PlusRandom(0));
+ LongBigArrays.set(perm, 0, -1); LongBigArrays.set(perm, LongBigArrays.length(perm) / 2, -1);
+ LongBigArrays.set(perm, LongBigArrays.length(perm) / 4, -1); LongBigArrays.set(perm, 3 * LongBigArrays.length(perm) / 4, -1);
+ gm = Transform.mapOffline(ImmutableGraph.wrap(new ArrayListMutableGraph(g).immutableView()), perm, 100);
+ assertEquals(gm, Transform.mapOffline(ImmutableGraph.wrap(g), perm, 1000000));
+ assertEquals(gm, gm.copy());
+ }
+
+ @Test
+ public void testSymmetrizeOffline() throws IOException {
+ ImmutableGraph g = ImmutableGraph.wrap(new ErdosRenyiGraph(5, .5, 0, false));
+ ImmutableGraph gs = Transform.symmetrizeOffline(g, 10);
+ assertEquals(gs, Transform.symmetrizeOffline(g, 5));
+ assertEquals(gs, Transform.symmetrizeOffline(Transform.symmetrizeOffline(g, 100), 5));
+ assertEquals(gs, gs.copy());
+
+ g = ImmutableGraph.wrap(new ErdosRenyiGraph(100, .5, 0, false));
+ gs = Transform.symmetrizeOffline(g, 100);
+ assertEquals(gs, Transform.symmetrizeOffline(g, 1000));
+ assertEquals(gs, Transform.symmetrizeOffline(Transform.symmetrizeOffline(g, 100), 10000));
+ assertEquals(gs, gs.copy());
+
+ g = ImmutableGraph.wrap(new ErdosRenyiGraph(1000, .2, 0, false));
+ gs = Transform.symmetrizeOffline(g, 1000);
+ assertEquals(gs, Transform.symmetrizeOffline(g, 10000));
+ assertEquals(gs, Transform.symmetrizeOffline(Transform.symmetrizeOffline(g, 10000), 100000));
+ assertEquals(gs, gs.copy());
+ }
+}
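The mapping assertions in TransformTest (testInjective, testMapCollapse, testMapClear, testMapKeepMiddle) suggest the following semantics for Transform.mapOffline. Each arc (i, j) becomes (f(i), f(j)). Arcs touching a node mapped to -1 are dropped. The result has max f + 1 nodes. The class below is a hypothetical plain-Java model of that reading, not the WebGraph implementation. It also assumes that duplicate arcs are merged, which the tests only imply through numNodes().

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical model of node mapping as exercised by TransformTest (not WebGraph code). */
public final class NodeMapSketch {
    /** Number of nodes of the mapped graph plus its arcs, sorted and duplicate-free. */
    public static final class Mapped {
        public final long numNodes;
        public final List<long[]> arcs;

        Mapped(long numNodes, List<long[]> arcs) {
            this.numNodes = numNodes;
            this.arcs = arcs;
        }
    }

    /** Maps every arc (i, j) to (f[i], f[j]); arcs touching a node with f = -1 are dropped. */
    public static Mapped map(int[][] arcs, long[] f) {
        long max = -1;
        for (long v : f) max = Math.max(max, v);
        // Lexicographic order on (source, target); the TreeSet also merges duplicate arcs.
        TreeSet<long[]> result = new TreeSet<>(
                Comparator.<long[]>comparingLong(a -> a[0]).thenComparingLong(a -> a[1]));
        for (int[] arc : arcs) {
            long s = f[arc[0]], t = f[arc[1]];
            if (s >= 0 && t >= 0) result.add(new long[] { s, t });
        }
        // Assumption: the mapped graph has max(f) + 1 nodes (0 if every node is dropped).
        return new Mapped(max + 1, new ArrayList<>(result));
    }
}
```

A TreeSet keeps the sketch short. An offline implementation meant for graphs larger than memory would instead sort batches of arcs on disk, which is presumably what the batch-size argument (10, 100, ...) passed to mapOffline controls.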
diff --git a/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/WebGraphTestCase.java b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/WebGraphTestCase.java
new file mode 100644
index 0000000..127c03a
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/WebGraphTestCase.java
@@ -0,0 +1,187 @@
+package it.unimi.dsi.big.webgraph;
+
+/*
+ * Copyright (C) 2003-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import it.unimi.dsi.big.webgraph.labelling.ArcLabelledImmutableGraph;
+import it.unimi.dsi.big.webgraph.labelling.ArcLabelledNodeIterator;
+import it.unimi.dsi.big.webgraph.labelling.ArcLabelledNodeIterator.LabelledArcIterator;
+import it.unimi.dsi.big.webgraph.labelling.Label;
+import it.unimi.dsi.fastutil.longs.LongBigArrays;
+import it.unimi.dsi.fastutil.objects.ObjectBigArrays;
+import it.unimi.dsi.webgraph.BVGraph;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+
+/** A JUnit test case providing additional assertions
+ * for {@linkplain it.unimi.dsi.big.webgraph.ImmutableGraph immutable graphs}.
+ */
+
+public abstract class WebGraphTestCase {
+
+ private static void copy(InputStream in, OutputStream out) throws IOException {
+ int c;
+ while((c = in.read()) != -1) out.write(c);
+ out.close();
+ }
+
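+ /** Returns the basename of a temporary copy of the resource graph with the given basename.
+ *
+ * @param basename the basename of the resource graph.
+ * @return the basename of the temporary copy.
+ * @throws IOException if an I/O error occurs while copying the graph files.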
+ */
+ public String getGraphPath(final String basename) throws IOException {
+ File file = File.createTempFile(getClass().getSimpleName(), "graph");
+ file.delete();
+
+ copy(getClass().getResourceAsStream(basename + BVGraph.GRAPH_EXTENSION), new FileOutputStream(file.getCanonicalPath() + BVGraph.GRAPH_EXTENSION));
+ copy(getClass().getResourceAsStream(basename + BVGraph.OFFSETS_EXTENSION), new FileOutputStream(file.getCanonicalPath() + BVGraph.OFFSETS_EXTENSION));
+ copy(getClass().getResourceAsStream(basename + BVGraph.PROPERTIES_EXTENSION), new FileOutputStream(file.getCanonicalPath() + BVGraph.PROPERTIES_EXTENSION));
+
+ return file.getCanonicalPath();
+ }
+
+ /** Cleans up a temporary graph.
+ *
+ * @param basename the basename.
+ */
+
+ public static void deleteGraph(final String basename) {
+ deleteGraph(new File(basename));
+ }
+
+
+ /** Cleans up a temporary graph.
+ *
+ * @param basename the basename.
+ */
+ public static void deleteGraph(final File basename) {
+ new File(basename + BVGraph.GRAPH_EXTENSION).delete();
+ new File(basename + BVGraph.OFFSETS_EXTENSION).delete();
+ new File(basename + BVGraph.OFFSETS_BIG_LIST_EXTENSION).delete();
+ new File(basename + ImmutableGraph.PROPERTIES_EXTENSION).delete();
+ }
+
+ /** Performs a stress-test of an immutable graph. All available methods
+ * for accessing outdegrees and successors are cross-checked.
+ *
+ * @param g the immutable graph to be tested.
+ */
+
+ public static void assertGraph(ImmutableGraph g) {
+ NodeIterator nodeIterator0 = g.nodeIterator(), nodeIterator1 = g.nodeIterator();
+ long d;
+ long[][] s0;
+ Label[][] l0;
+ LazyLongIterator s1;
+ long m = 0;
+ long curr;
+ // Check that iterator and array methods return the same values in sequential scans.
+ for(long i = g.numNodes(); i-- != 0;) {
+ curr = nodeIterator0.nextLong();
+ assertEquals(curr, nodeIterator1.nextLong());
+ d = nodeIterator0.outdegree();
+ m += d;
+ assertEquals(d, nodeIterator1.outdegree());
+
+ s0 = nodeIterator0.successorBigArray();
+ s1 = nodeIterator1.successors();
+ for(long k = 0; k < d; k++) assertEquals(LongBigArrays.get(s0, k), s1.nextLong());
+ assertEquals(-1, s1.nextLong());
+
+ if (g instanceof ArcLabelledImmutableGraph) {
+ l0 = ((ArcLabelledNodeIterator)nodeIterator0).labelBigArray();
+ s1 = ((ArcLabelledNodeIterator)nodeIterator1).successors();
+ for(long k = 0; k < d; k++) {
+ s1.nextLong();
+ assertEquals(ObjectBigArrays.get(l0, k), ((LabelledArcIterator)s1).label());
+ }
+ }
+
+ assertEquals(-1, s1.nextLong());
+ }
+
+ try {
+ assertEquals(m, g.numArcs());
+ }
+ catch(UnsupportedOperationException ignore) {} // A graph might not support numArcs().
+ assertFalse(nodeIterator0.hasNext());
+ assertFalse(nodeIterator1.hasNext());
+
+ if (! g.randomAccess()) return;
+
+ // Check that sequential iterator methods and random-access methods coincide.
+ String msg;
+
+ for(long s = 0; s < g.numNodes() - 1; s++) {
+ nodeIterator1 = g.nodeIterator(s);
+ for(long i = g.numNodes() - s; i-- != 0;) {
+ curr = nodeIterator1.nextLong();
+ msg = "Node " + curr + ", starting from " + s + ":";
+ d = g.outdegree(curr);
+ assertEquals(msg, d, nodeIterator1.outdegree());
+ s0 = g.successorBigArray(curr);
+ s1 = nodeIterator1.successors();
+ for(long k = 0; k < d; k++) assertEquals(msg, LongBigArrays.get(s0, k), s1.nextLong());
+ s1 = g.successors(curr);
+ for(long k = 0; k < d; k++) assertEquals(msg, LongBigArrays.get(s0, k), s1.nextLong());
+ assertEquals(msg, -1, s1.nextLong());
+
+ if (g instanceof ArcLabelledImmutableGraph) {
+ l0 = ((ArcLabelledImmutableGraph)g).labelBigArray(curr);
+ s1 = ((ArcLabelledNodeIterator)nodeIterator1).successors();
+ for(long k = 0; k < d; k++) {
+ s1.nextLong();
+ assertEquals(msg, ObjectBigArrays.get(l0, k), ((LabelledArcIterator)s1).label());
+ }
+ s1 = g.successors(curr);
+ for(long k = 0; k < d; k++) {
+ s1.nextLong();
+ assertEquals(msg, ObjectBigArrays.get(l0, k), ((LabelledArcIterator)s1).label());
+ }
+ assertEquals(msg, -1, s1.nextLong());
+ }
+ }
+ }
+
+ // Check that cross-access works.
+
+ nodeIterator0 = g.nodeIterator();
+ for(long s = 0; s < g.numNodes(); s++) {
+ d = g.outdegree(s);
+ nodeIterator0.nextLong();
+ LazyLongIterator successors = g.successors(s);
+ long[][] succ = nodeIterator0.successorBigArray();
+ for(long i = 0; i < d; i++) {
+ final long t = successors.nextLong();
+ assertEquals(LongBigArrays.get(succ, i), t);
+ g.outdegree(t);
+ }
+
+ }
+ // Check copies
+ assertEquals(g, g.copy());
+ }
+}
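getGraphPath() copies each resource byte by byte through copy(), which closes only the output stream. It also fails with a NullPointerException if a resource is missing, because getResourceAsStream() returns null in that case. A possible alternative uses the standard java.nio.file.Files.copy(InputStream, Path, CopyOption...) method. The sketch below closes both ends and reports missing resources explicitly. ResourceCopySketch and copyTo are hypothetical names, not part of WebGraph.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: copies a (possibly missing) resource stream to a file. */
public final class ResourceCopySketch {
    public static long copyTo(InputStream in, Path target) throws IOException {
        // Class.getResourceAsStream() returns null for a missing resource.
        if (in == null) throw new IOException("Missing resource for " + target);
        // try-with-resources closes the input stream; Files.copy opens and closes the target itself.
        try (InputStream source = in) {
            // REPLACE_EXISTING is needed because File.createTempFile() has already created the file.
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

In getGraphPath() this would be called once per extension, for example with Paths.get(file.getCanonicalPath() + BVGraph.GRAPH_EXTENSION) as the target.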
diff --git a/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/algo/ConnectedComponentsTest.java b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/algo/ConnectedComponentsTest.java
new file mode 100644
index 0000000..1369386
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/algo/ConnectedComponentsTest.java
@@ -0,0 +1,66 @@
+package it.unimi.dsi.big.webgraph.algo;
+
+/*
+ * Copyright (C) 2011-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import it.unimi.dsi.big.webgraph.ImmutableGraph;
+import it.unimi.dsi.big.webgraph.WebGraphTestCase;
+import it.unimi.dsi.fastutil.longs.LongBigArrays;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.Transform;
+import it.unimi.dsi.webgraph.examples.ErdosRenyiGraph;
+
+import org.junit.Test;
+
+
+public class ConnectedComponentsTest extends WebGraphTestCase {
+ public static void sameComponents(ImmutableGraph g) {
+ StronglyConnectedComponentsTarjan stronglyConnectedComponents = StronglyConnectedComponentsTarjan.compute(g, false, new ProgressLogger());
+ long[][] size2 = stronglyConnectedComponents.computeSizes();
+ stronglyConnectedComponents.sortBySize(size2);
+
+ for(int t = 0; t < 3; t++) {
+ ConnectedComponents connectedComponents = ConnectedComponents.compute(g, t, new ProgressLogger());
+ long[][] size = connectedComponents.computeSizes();
+ connectedComponents.sortBySize(size);
+ for(long i = g.numNodes(); i-- != 0;)
+ for(long j = i; j-- != 0;)
+ org.junit.Assert.assertTrue((LongBigArrays.get(connectedComponents.component, i) == LongBigArrays.get(connectedComponents.component, j))
+ == (LongBigArrays.get(stronglyConnectedComponents.component, i) == LongBigArrays.get(stronglyConnectedComponents.component, j)));
+ }
+ }
+
+ @Test
+ public void testSmall() {
+ sameComponents(ImmutableGraph.wrap(ArrayListMutableGraph.newBidirectionalCycle(40).immutableView()));
+ }
+
+ @Test
+ public void testBinaryTree() {
+ sameComponents(ImmutableGraph.wrap(Transform.symmetrize(ArrayListMutableGraph.newCompleteBinaryIntree(10).immutableView())));
+ }
+
+ @Test
+ public void testErdosRenyi() {
+ for(int size: new int[] { 10, 100, 1000 })
+ for(int attempt = 0; attempt < 5; attempt++)
+ sameComponents(ImmutableGraph.wrap(Transform.symmetrize(new ArrayListMutableGraph(new ErdosRenyiGraph(size, .001, attempt + 1, true)).immutableView())));
+ }
+}
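sameComponents() checks with a quadratic double loop over node pairs that two labellings put the same pairs of nodes in the same component. The same property, that both labellings induce the same equivalence relation, can be checked in linear expected time. It suffices to verify that the correspondence between labels is a bijection. PartitionCheckSketch below is a hypothetical plain-array version of that idea, not WebGraph code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical check that two component labellings define the same partition. */
public final class PartitionCheckSketch {
    public static boolean samePartition(long[] a, long[] b) {
        if (a.length != b.length) return false;
        // Both maps must be well defined: a-label -> b-label and b-label -> a-label.
        Map<Long, Long> aToB = new HashMap<>();
        Map<Long, Long> bToA = new HashMap<>();
        for (int i = 0; i < a.length; i++) {
            Long prevB = aToB.putIfAbsent(a[i], b[i]);
            if (prevB != null && prevB.longValue() != b[i]) return false; // an a-block is split in b
            Long prevA = bToA.putIfAbsent(b[i], a[i]);
            if (prevA != null && prevA.longValue() != a[i]) return false; // a b-block is split in a
        }
        return true;
    }
}
```

With big arrays, as in sameComponents(), the labels would be read with LongBigArrays.get(..., i) instead of direct indexing.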
diff --git a/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/algo/EliasFanoCumulativeOutdegreeListTest.java b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/algo/EliasFanoCumulativeOutdegreeListTest.java
new file mode 100644
index 0000000..8e1c6b4
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/algo/EliasFanoCumulativeOutdegreeListTest.java
@@ -0,0 +1,81 @@
+package it.unimi.dsi.big.webgraph.algo;
+
+/*
+ * Copyright (C) 2010-2017 Paolo Boldi & Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.big.webgraph.ImmutableGraph;
+import it.unimi.dsi.big.webgraph.WebGraphTestCase;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.examples.ErdosRenyiGraph;
+
+import org.junit.Test;
+
+
+public class EliasFanoCumulativeOutdegreeListTest extends WebGraphTestCase {
+
+ @Test
+ public void testEliasFano() {
+ final ImmutableGraph graph = ImmutableGraph.wrap(new ArrayListMutableGraph(new ErdosRenyiGraph(10000, .001, 0, false)).immutableView());
+ for(long mask: new long[] { 0, 1, 3 }) {
+ final EliasFanoCumulativeOutdegreeList eliasFanoMonotoneLongBigList = new EliasFanoCumulativeOutdegreeList(graph, graph.numArcs(), mask);
+ final long n = graph.numNodes();
+ final long m = graph.numArcs();
+
+ for(long i = 1; i < m;) {
+ final long s = eliasFanoMonotoneLongBigList.skipTo(i);
+ assertEquals(0, eliasFanoMonotoneLongBigList.currentIndex() & mask);
+ long j = 0, c = 0;
+ while(j < n) if ((c += graph.outdegree(j++)) >= i && (j & mask) == 0) break;
+ assertEquals(j, eliasFanoMonotoneLongBigList.currentIndex());
+ assertEquals(c, s);
+ i = c + 1;
+ }
+
+ for(long i = 1; i < m;) {
+ final long s = eliasFanoMonotoneLongBigList.skipTo(i);
+ assertEquals(0, eliasFanoMonotoneLongBigList.currentIndex() & mask);
+ long j = 0, c = 0;
+ while(j < n) if ((c += graph.outdegree(j++)) >= i && (j & mask) == 0) break;
+ assertEquals(j, eliasFanoMonotoneLongBigList.currentIndex());
+ assertEquals(c, s);
+ i = c + (m - c) / 2;
+ }
+
+ if (mask == 0) {
+ long c = 0;
+ for(long i = 0; i < n - 1; i++) {
+ c += graph.outdegree(i);
+ long s = eliasFanoMonotoneLongBigList.skipTo(c);
+ assertEquals(i + 1, eliasFanoMonotoneLongBigList.currentIndex());
+ assertEquals(c, s);
+ s = eliasFanoMonotoneLongBigList.skipTo(c + 1);
+ assertEquals(i + 2, eliasFanoMonotoneLongBigList.currentIndex());
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testZeroLength() {
+ final ImmutableGraph graph = ImmutableGraph.wrap(new ArrayListMutableGraph().immutableView());
+ final EliasFanoCumulativeOutdegreeList eliasFanoMonotoneLongBigList = new EliasFanoCumulativeOutdegreeList(graph, graph.numArcs(), 0);
+ assertEquals(-1, eliasFanoMonotoneLongBigList.currentIndex());
+ }
+}
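EliasFanoCumulativeOutdegreeList stores the cumulative outdegrees of the graph, a non-decreasing sequence bounded by the number of arcs. The test checks its skipTo() behaviour against a linear scan. As background, the sketch below shows the basic Elias-Fano representation of a non-decreasing sequence. Each value is split into l low bits, stored verbatim, and a high part, stored in unary in a bit array. It is a hypothetical minimal version with a linear-time select, not the library class. Its indexing structures and the mask-aware skipTo() exercised above are not reproduced here.

```java
/** Hypothetical minimal Elias-Fano encoding of a non-decreasing sequence of longs. */
public final class EliasFanoSketch {
    private final int n;        // number of elements
    private final int l;        // low bits per element
    private final long[] lower; // low parts, packed l bits per element
    private final long[] upper; // element i sets bit (value >>> l) + i

    /** @param values non-decreasing, each in [0, universe). */
    public EliasFanoSketch(long[] values, long universe) {
        n = values.length;
        // l = floor(log2(universe / n)), or 0 when universe / n is 0.
        l = n == 0 ? 0 : Math.max(0, 63 - Long.numberOfLeadingZeros(universe / n));
        lower = new long[(int) (((long) n * l + 63) / 64)];
        upper = new long[(int) ((n + (universe >>> l) + 1 + 63) / 64)];
        for (int i = 0; i < n; i++) {
            long low = values[i] & ((1L << l) - 1);
            for (int b = 0; b < l; b++)
                if ((low >>> b & 1) != 0) setBit(lower, (long) i * l + b);
            setBit(upper, (values[i] >>> l) + i);
        }
    }

    private static void setBit(long[] bits, long pos) {
        bits[(int) (pos >>> 6)] |= 1L << (pos & 63);
    }

    private static boolean getBit(long[] bits, long pos) {
        return (bits[(int) (pos >>> 6)] >>> (pos & 63) & 1) != 0;
    }

    /** Returns the i-th value, for 0 <= i < size(). */
    public long get(int i) {
        // Select: find the position of the (i + 1)-th set bit of upper (linear scan for simplicity).
        long pos = -1;
        for (int seen = -1; seen < i; ) if (getBit(upper, ++pos)) seen++;
        long high = pos - i;
        long low = 0;
        for (int b = 0; b < l; b++)
            if (getBit(lower, (long) i * l + b)) low |= 1L << b;
        return high << l | low;
    }

    public int size() {
        return n;
    }
}
```

Choosing l = floor(log2(u/n)) keeps the total space at roughly 2 + log2(u/n) bits per element. This compactness is what makes the representation attractive for cumulative outdegrees, where u is about the number of arcs.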
diff --git a/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/algo/HyperBallTest.java b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/algo/HyperBallTest.java
new file mode 100644
index 0000000..8ea8f4a
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/algo/HyperBallTest.java
@@ -0,0 +1,482 @@
+package it.unimi.dsi.big.webgraph.algo;
+
+/*
+ * Copyright (C) 2010-2017 Paolo Boldi & Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+import it.unimi.dsi.big.webgraph.ImmutableGraph;
+import it.unimi.dsi.big.webgraph.WebGraphTestCase;
+import it.unimi.dsi.fastutil.floats.FloatBigArrays;
+import it.unimi.dsi.fastutil.ints.Int2DoubleFunction;
+import it.unimi.dsi.fastutil.ints.IntArrayFIFOQueue;
+import it.unimi.dsi.fastutil.longs.LongBigList;
+import it.unimi.dsi.util.HyperLogLogCounterArray;
+import it.unimi.dsi.util.XoRoShiRo128PlusRandom;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.Transform;
+import it.unimi.dsi.webgraph.examples.ErdosRenyiGraph;
+
+import java.io.IOException;
+import java.util.Arrays;
+
+import org.junit.Test;
+
+
+public class HyperBallTest extends WebGraphTestCase {
+ // A lower threshold would report spurious failures caused by rounding in block-by-block summing.
+ public static final double THRESHOLD = 1E-9;
+
+ /** Checks that the states of two HyperBall implementations (as
+ * returned by {@link HyperLogLogCounterArray#registers()}) are exactly the same. */
+ public final static void assertState(final long size, final int log2m, final LongBigList[] a, final LongBigList[] b) {
+ final int m = 1 << log2m;
+ for(long i = 0; i < size; i++) {
+ for(int j = 0; j < m; j++) {
+ final long index = (i << log2m) + j;
+ final int chunk = (int)(index >>> HyperLogLogCounterArray.CHUNK_SHIFT);
+ final long offset = index & HyperLogLogCounterArray.CHUNK_MASK;
+ assertEquals("Counter " + i + ", register " + j + ": ", a[chunk].getLong(offset), b[chunk].getLong(offset));
+ }
+ }
+ }
+
+ @Test
+ public void testTrivial() throws IOException {
+ ImmutableGraph g = ImmutableGraph.wrap(ArrayListMutableGraph.newCompleteBinaryIntree(10).immutableView());
+ HyperBall hyperBall = new HyperBall(g, g, 7, null, 0, 0, 0, false, false, false, null, 0);
+ hyperBall.run(Long.MAX_VALUE, -1);
+ hyperBall.run(Long.MAX_VALUE, -1);
+ hyperBall.close();
+
+ hyperBall = new HyperBall(g, g, 7, null, 0, 0, 0, true, false, false, null, 0);
+ hyperBall.run(Long.MAX_VALUE, -1);
+ hyperBall.run(Long.MAX_VALUE, -1);
+ hyperBall.close();
+
+ }
+
+ public static void assertRelativeError(double sequentialCurrent, double current, double threshold) {
+ assertTrue(sequentialCurrent + " != " + current + ", " + Math.abs(current - sequentialCurrent) / current + " > " + threshold, Math.abs(current - sequentialCurrent) / current <= threshold);
+ }
+
+ /* The accuracy tests below check that the estimate lies within twice the theoretical relative standard deviation
+ * in at least 9 trials out of 10. For a unimodal estimator, the Vysochanskii-Petunin inequality bounds the probability
+ * of a larger deviation by 4/(9 * 2^2) = 1/9, so the estimate should be this close at least about 89% of the time. */
+
+ @Test
+ public void testClique() throws IOException {
+ for(int log2m: new int[] { 4, 5, 6, 8 }) {
+ final double rsd = HyperBall.relativeStandardDeviation(log2m);
+ for(int size: new int[] { 10, 100, 500 }) {
+ int correct = 0;
+ for(int attempt = 0; attempt < 10; attempt++) {
+ System.err.println("log2m: " + log2m + " size: " + size + " attempt: " + attempt);
+ final it.unimi.dsi.webgraph.ImmutableGraph view = ArrayListMutableGraph.newCompleteGraph(size, false).immutableView();
+ ImmutableGraph g = ImmutableGraph.wrap(view);
+ ImmutableGraph gt = ImmutableGraph.wrap(it.unimi.dsi.webgraph.Transform.transpose(view));
+ HyperBall hyperBall = new HyperBall(g, attempt % 3 == 0 ? null : gt, log2m, null, 0, 10, 10, attempt % 2 == 0, false, false, null, attempt);
+ SequentialHyperBall sequentialHyperBall = new SequentialHyperBall(g, log2m, null, attempt);
+ hyperBall.init();
+ sequentialHyperBall.init();
+ hyperBall.iterate();
+ final double current = hyperBall.neighbourhoodFunction.getDouble(1);
+ final double sequentialCurrent = sequentialHyperBall.iterate();
+
+ assertState(size, log2m, sequentialHyperBall.registers(), hyperBall.registers());
+
+ if (Math.abs(size * size - current) <= 2 * rsd * size * size) correct++;
+
+ assertRelativeError(sequentialCurrent, current, THRESHOLD);
+
+ hyperBall.close();
+ sequentialHyperBall.close();
+ }
+ assertTrue(size + ":" + rsd + " " + correct + " < " + 9, correct >= 9);
+ }
+ }
+ }
+
+ @Test
+ public void testErdosRenyi() throws IOException {
+ for(int log2m: new int[] { 4, 5, 6, 8 }) {
+ for(int size: new int[] { 10, 100, 500 }) {
+ for(int attempt = 0; attempt < 10; attempt++) {
+ System.err.println("log2m: " + log2m + " size: " + size + " attempt: " + attempt);
+ final it.unimi.dsi.webgraph.ImmutableGraph view = new ArrayListMutableGraph(new ErdosRenyiGraph(size, .1, attempt, false)).immutableView();
+ ImmutableGraph g = ImmutableGraph.wrap(view);
+ ImmutableGraph gt = ImmutableGraph.wrap(it.unimi.dsi.webgraph.Transform.transpose(view));
+ HyperBall hyperBall = new HyperBall(g, attempt % 3 == 0 ? null : gt, log2m, null, 0, 10 * (attempt % 3), 10, attempt % 2 == 0, false, false, null, attempt);
+ SequentialHyperBall sequentialHyperBall = new SequentialHyperBall(g, log2m, null, attempt);
+ hyperBall.init();
+ sequentialHyperBall.init();
+ do {
+ hyperBall.iterate();
+ final double current = hyperBall.neighbourhoodFunction.getDouble(hyperBall.neighbourhoodFunction.size() - 1);
+ final double sequentialCurrent = sequentialHyperBall.iterate();
+ assertState(size, log2m, sequentialHyperBall.registers(), hyperBall.registers());
+ assertRelativeError(sequentialCurrent, current, THRESHOLD);
+ } while(hyperBall.modified() != 0);
+
+ hyperBall.init();
+ sequentialHyperBall.init();
+ do {
+ hyperBall.iterate();
+ final double current = hyperBall.neighbourhoodFunction.getDouble(hyperBall.neighbourhoodFunction.size() - 1);
+ final double sequentialCurrent = sequentialHyperBall.iterate();
+ assertState(size, log2m, sequentialHyperBall.registers(), hyperBall.registers());
+ assertRelativeError(sequentialCurrent, current, THRESHOLD);
+ } while(hyperBall.modified() != 0);
+
+ hyperBall.close();
+ sequentialHyperBall.close();
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testCycle() throws IOException {
+ for(int log2m: new int[] { 4, 5, 6 }) {
+ final double rsd = HyperBall.relativeStandardDeviation(log2m);
+ for(int size: new int[] { 100, 500, 1000 }) {
+ final int[] correct = new int[size + 1];
+ for(int attempt = 0; attempt < 10; attempt++) {
+ System.err.println("log2m: " + log2m + " size: " + size + " attempt: " + attempt);
+ final it.unimi.dsi.webgraph.ImmutableGraph view = ArrayListMutableGraph.newDirectedCycle(size).immutableView();
+ ImmutableGraph g = ImmutableGraph.wrap(view);
+ ImmutableGraph gt = ImmutableGraph.wrap(it.unimi.dsi.webgraph.Transform.transpose(view));
+ HyperBall hyperBall = new HyperBall(g, attempt % 3 == 0 ? null : gt, log2m, null, 0, 10 * (attempt % 3), 10, attempt % 2 == 0, false, false, null, attempt);
+ SequentialHyperBall sequentialHyperBall = new SequentialHyperBall(g, log2m, null, attempt);
+ hyperBall.init();
+ sequentialHyperBall.init();
+ for(int i = 2; i <= size; i++) {
+ hyperBall.iterate();
+ final double current = hyperBall.neighbourhoodFunction.getDouble(hyperBall.neighbourhoodFunction.size() - 1);
+ final double sequentialCurrent = sequentialHyperBall.iterate();
+ assertState(size, log2m, sequentialHyperBall.registers(), hyperBall.registers());
+ assertRelativeError(sequentialCurrent, current, THRESHOLD);
+ if (Math.abs(size * i - current) <= 2 * rsd * size * i) correct[i]++;
+ }
+ hyperBall.close();
+ sequentialHyperBall.close();
+ }
+ for(int i = 2; i <= size; i++) assertTrue(size + ":" + rsd + " " + correct[i] + " < " + 9, correct[i] >= 9);
+ }
+ }
+
+ }
+
+ @Test
+ public void testLine() throws IOException {
+ for(int log2m: new int[] { 4, 5, 6 }) {
+ final double rsd = HyperBall.relativeStandardDeviation(log2m);
+ for(int size: new int[] { 100, 500, 1000 }) {
+ final int[] correct = new int[size + 1];
+ for(int attempt = 0; attempt < 10; attempt++) {
+ System.err.println("log2m: " + log2m + " size: " + size + " attempt: " + attempt);
+ ArrayListMutableGraph directedCycle = ArrayListMutableGraph.newDirectedCycle(size);
+ directedCycle.removeArc(0, 1);
+ it.unimi.dsi.webgraph.ImmutableGraph view = directedCycle.immutableView();
+ ImmutableGraph g = ImmutableGraph.wrap(view);
+ ImmutableGraph gt = ImmutableGraph.wrap(Transform.transpose(view));
+ HyperBall hyperBall = new HyperBall(g, attempt % 3 == 0 ? null : gt, log2m, null, 0, 10 * (attempt % 3), 10, attempt % 2 == 0, false, false, null, attempt);
+ SequentialHyperBall sequentialHyperBall = new SequentialHyperBall(g, log2m, null, attempt);
+ hyperBall.init();
+ sequentialHyperBall.init();
+ for(int i = 2; i <= size; i++) {
+ hyperBall.iterate();
+ final double current = hyperBall.neighbourhoodFunction.getDouble(hyperBall.neighbourhoodFunction.size() - 1);
+ final double sequentialCurrent = sequentialHyperBall.iterate();
+ assertState(size, log2m, sequentialHyperBall.registers(), hyperBall.registers());
+ assertRelativeError(sequentialCurrent, current, THRESHOLD);
+ long result = 0;
+ for(int j = 0; j < i; j++) result += (size - j);
+ if (Math.abs(result - current) <= 2 * rsd * size * i) correct[i]++;
+ }
+ hyperBall.close();
+ sequentialHyperBall.close();
+ }
+ for(int i = 2; i <= size; i++) assertTrue(size + ":" + rsd + " " + correct[i] + " < " + 9, correct[i] >= 9);
+ }
+ }
+
+ }
+
+ @Test
+ public void testOutdirectedStar() throws IOException {
+ for(int log2m: new int[] { 4, 5, 6 }) {
+ final double rsd = HyperBall.relativeStandardDeviation(log2m);
+ for(int size: new int[] { 100, 500, 1000 }) {
+ int correct = 0;
+ for(int attempt = 0; attempt < 10; attempt++) {
+ System.err.println("log2m: " + log2m + " size: " + size + " attempt: " + attempt);
+ ArrayListMutableGraph mg = new ArrayListMutableGraph(size);
+ for(int i = 1; i < size; i++) mg.addArc(0, i);
+ final it.unimi.dsi.webgraph.ImmutableGraph view = mg.immutableView();
+ ImmutableGraph g = ImmutableGraph.wrap(view);
+ ImmutableGraph gt = ImmutableGraph.wrap(it.unimi.dsi.webgraph.Transform.transpose(view));
+ HyperBall hyperBall = new HyperBall(g, attempt % 3 == 0 ? null : gt, log2m, null, 0, 10 * (attempt % 3), 10, attempt % 2 == 0, false, false, null, attempt);
+ SequentialHyperBall sequentialHyperBall = new SequentialHyperBall(g, log2m, null, attempt);
+ hyperBall.init();
+ sequentialHyperBall.init();
+ hyperBall.iterate();
+ final double current = hyperBall.neighbourhoodFunction.getDouble(hyperBall.neighbourhoodFunction.size() - 1);
+ final double sequentialCurrent = sequentialHyperBall.iterate();
+ assertState(size, log2m, sequentialHyperBall.registers(), hyperBall.registers());
+ assertRelativeError(sequentialCurrent, current, THRESHOLD);
+ if (Math.abs(size * 2 - 1 - current) <= 2 * rsd * (size * 2 - 1)) correct++;
+ hyperBall.close();
+ sequentialHyperBall.close();
+ }
+ assertTrue(size + ":" + rsd + " " + correct + " < " + 9, correct >= 9);
+ }
+ }
+ }
+
+ @Test
+ public void testTree() throws IOException {
+ for(int log2m: new int[] { 4, 5, 6, 7, 8, 10, 12 }) {
+ double rsd = HyperBall.relativeStandardDeviation(log2m);
+ final it.unimi.dsi.webgraph.ImmutableGraph view = ArrayListMutableGraph.newCompleteBinaryIntree(3).immutableView();
+ ImmutableGraph g = ImmutableGraph.wrap(view);
+ ImmutableGraph gt = ImmutableGraph.wrap(it.unimi.dsi.webgraph.Transform.transpose(view));
+ final int[] correct = new int[3];
+ for(int attempt = 0; attempt < 10; attempt++) {
+ System.err.println("log2m: " + log2m + " attempt: " + attempt);
+ HyperBall hyperBall = new HyperBall(g, attempt % 3 == 0 ? null : gt, log2m, null, 0, 10 * (attempt % 3), 10, attempt % 2 == 0, false, false, null, attempt);
+ SequentialHyperBall sequentialHyperBall = new SequentialHyperBall(g, log2m, null, attempt);
+ hyperBall.init();
+ sequentialHyperBall.init();
+
+ hyperBall.iterate();
+ if (Math.abs(hyperBall.neighbourhoodFunction.getDouble(hyperBall.neighbourhoodFunction.size() - 1) - 29) <= 2 * rsd * 29) correct[0]++;
+ sequentialHyperBall.iterate();
+ assertState(g.numNodes(), log2m, sequentialHyperBall.registers(), hyperBall.registers());
+
+ hyperBall.iterate();
+ if (Math.abs(hyperBall.neighbourhoodFunction.getDouble(hyperBall.neighbourhoodFunction.size() - 1) - 41) <= 2 * rsd * 41) correct[1]++;
+ sequentialHyperBall.iterate();
+ assertState(g.numNodes(), log2m, sequentialHyperBall.registers(), hyperBall.registers());
+
+ hyperBall.iterate();
+ if (Math.abs(hyperBall.neighbourhoodFunction.getDouble(hyperBall.neighbourhoodFunction.size() - 1) - 49) <= 2 * rsd * 49) correct[2]++;
+ sequentialHyperBall.iterate();
+ assertState(g.numNodes(), log2m, sequentialHyperBall.registers(), hyperBall.registers());
+
+ // Test that you can reuse the object
+
+ hyperBall.init();
+ sequentialHyperBall.init();
+
+ hyperBall.iterate();
+ if (Math.abs(hyperBall.neighbourhoodFunction.getDouble(hyperBall.neighbourhoodFunction.size() - 1) - 29) <= 2 * rsd * 29) correct[0]++;
+ sequentialHyperBall.iterate();
+ assertState(g.numNodes(), log2m, sequentialHyperBall.registers(), hyperBall.registers());
+
+ hyperBall.iterate();
+ if (Math.abs(hyperBall.neighbourhoodFunction.getDouble(hyperBall.neighbourhoodFunction.size() - 1) - 41) <= 2 * rsd * 41) correct[1]++;
+ sequentialHyperBall.iterate();
+ assertState(g.numNodes(), log2m, sequentialHyperBall.registers(), hyperBall.registers());
+
+ hyperBall.iterate();
+ if (Math.abs(hyperBall.neighbourhoodFunction.getDouble(hyperBall.neighbourhoodFunction.size() - 1) - 49) <= 2 * rsd * 49) correct[2]++;
+ sequentialHyperBall.iterate();
+ assertState(g.numNodes(), log2m, sequentialHyperBall.registers(), hyperBall.registers());
+
+ hyperBall.close();
+ sequentialHyperBall.close();
+ }
+ //System.err.println(Arrays.toString(correct));
+ for(int i = 0; i < 3; i++) assertTrue(rsd + " " + correct[i] + " < " + 9, correct[i] >= 9);
+ }
+ }
+
+ @Test(expected=IllegalStateException.class)
+ public void testInitClosed() throws IOException {
+ ImmutableGraph g = ImmutableGraph.wrap(ArrayListMutableGraph.newCompleteBinaryIntree(3).immutableView());
+ HyperBall hyperBall = new HyperBall(g, 8);
+ hyperBall.close();
+ hyperBall.init();
+ }
+
+ @Test(expected=IllegalStateException.class)
+ public void testInitIterate() throws IOException {
+ ImmutableGraph g = ImmutableGraph.wrap(ArrayListMutableGraph.newCompleteBinaryIntree(3).immutableView());
+ HyperBall hyperBall = new HyperBall(g, 8);
+ hyperBall.close();
+ hyperBall.iterate();
+ }
+
+ private int[] distancesFrom(final it.unimi.dsi.webgraph.ImmutableGraph graph, final int from) {
+ final IntArrayFIFOQueue queue = new IntArrayFIFOQueue();
+ final int n = graph.numNodes();
+ final int[] dist = new int[n];
+ Arrays.fill(dist, Integer.MAX_VALUE); // Initially, all distances are infinity.
+
+ queue.enqueue(from);
+ dist[from] = 0;
+
+ it.unimi.dsi.webgraph.LazyIntIterator successors;
+
+ while(! queue.isEmpty()) {
+ int curr = queue.dequeueInt();
+ successors = graph.successors(curr);
+ int d = graph.outdegree(curr);
+ while(d-- != 0) {
+ int succ = successors.nextInt();
+ if (dist[succ] == Integer.MAX_VALUE) {
+ dist[succ] = dist[curr] + 1;
+ queue.enqueue(succ);
+ }
+ }
+ }
+
+ return dist;
+ }
+
+ @Test
+ public void testErdosRenyiEccentricity() throws IOException {
+ XoRoShiRo128PlusRandom rand = new XoRoShiRo128PlusRandom(1);
+ for(int log2m: new int[] { 15 }) {
+ for(int size: new int[] { 10, 100, 500 }) {
+ for(int attempt = 0; attempt < 5; attempt++) {
+ System.err.println("log2m: " + log2m + " size: " + size + " attempt: " + attempt);
+ final it.unimi.dsi.webgraph.ImmutableGraph view = new ArrayListMutableGraph(new ErdosRenyiGraph(size, .1, attempt + 1, false)).immutableView();
+
+ ImmutableGraph g = ImmutableGraph.wrap(view);
+ ImmutableGraph gt = ImmutableGraph.wrap(it.unimi.dsi.webgraph.Transform.transpose(view));
+
+ HyperBall hyperBall =
+ new HyperBall(g, attempt % 3 == 0 ? null : gt, log2m, null, 0, 10 * (attempt % 3), 10, attempt % 2 == 0, true, false, null, attempt);
+ hyperBall.init();
+ do {
+ hyperBall.iterate();
+ } while(hyperBall.modified() != 0);
+
+ int n = (int)g.numNodes();
+ for (int i = 0; i < 10; i++) {
+ int from = rand.nextInt(n);
+ int dist[] = distancesFrom(view, from);
+ long totDist = 0;
+ int reachable = 0;
+ for (int k = 0; k < n; k++)
+ if (dist[k] < Integer.MAX_VALUE) {
+ reachable++;
+ totDist += dist[k];
+ }
+ assertEquals(1.0, reachable / hyperBall.count(from), 0.20);
+
+ double expEcc = (double)totDist / reachable;
+ double computedEcc = FloatBigArrays.get(hyperBall.sumOfDistances, from) / hyperBall.count(from);
+ if (expEcc == 0) assertEquals(0.0, computedEcc, 1E-3);
+ else assertEquals(1.0, expEcc / computedEcc, 0.15);
+ }
+
+ hyperBall.close();
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testErdosRenyiHarmonic() throws IOException {
+ XoRoShiRo128PlusRandom rand = new XoRoShiRo128PlusRandom(1);
+ for(int log2m: new int[] { 15 }) {
+ for(int size: new int[] { 10, 100, 500 }) {
+ for(int attempt = 0; attempt < 5; attempt++) {
+ System.err.println("log2m: " + log2m + " size: " + size + " attempt: " + attempt);
+
+ final it.unimi.dsi.webgraph.ImmutableGraph view = new ArrayListMutableGraph(new ErdosRenyiGraph(size, .1, attempt, false)).immutableView();
+ ImmutableGraph g = ImmutableGraph.wrap(view);
+ ImmutableGraph gt = ImmutableGraph.wrap(it.unimi.dsi.webgraph.Transform.transpose(view));
+
+ HyperBall hyperBall =
+ new HyperBall(g, attempt % 3 == 0 ? null : gt, log2m, null, 0, 10 * (attempt % 3), 10, attempt % 2 == 0, true, true, null, attempt);
+ hyperBall.init();
+ do {
+ hyperBall.iterate();
+ } while(hyperBall.modified() != 0);
+
+ int n = (int)g.numNodes();
+ for (int i = 0; i < 10; i++) {
+ int from = rand.nextInt(n);
+ int dist[] = distancesFrom(view, from);
+ double totDist = 0;
+ for (int k = 0; k < n; k++)
+ if (dist[k] < Integer.MAX_VALUE && dist[k] > 0)
+ totDist += 1.0 / dist[k];
+ double expHarm = n / totDist;
+ double computedHarm = n / FloatBigArrays.get(hyperBall.sumOfInverseDistances, from);
+ if (totDist != 0) assertEquals(1.0, expHarm / computedHarm, 0.1);
+ }
+
+ hyperBall.close();
+ }
+ }
+ }
+ }
+
+
+ @Test
+ public void testErdosRenyiGain() throws IOException {
+ for(int log2m: new int[] { 15 }) {
+ for(int size: new int[] { 10, 100, 500 }) {
+ for(int attempt = 0; attempt < 5; attempt++) {
+ System.err.println("log2m: " + log2m + " size: " + size + " attempt: " + attempt);
+
+ final it.unimi.dsi.webgraph.ImmutableGraph view = new ArrayListMutableGraph(new ErdosRenyiGraph(size, .1, attempt, false)).immutableView();
+ ImmutableGraph g = ImmutableGraph.wrap(view);
+ ImmutableGraph gt = ImmutableGraph.wrap(it.unimi.dsi.webgraph.Transform.transpose(view));
+
+ HyperBall hyperBall =
+ new HyperBall(g, attempt % 3 == 0 ? null : gt, log2m, null, 0, 10 * (attempt % 3), 10, attempt % 2 == 0, true, true, new Int2DoubleFunction[] {
+ new it.unimi.dsi.webgraph.algo.HyperBall.AbstractDiscountFunction() {
+ private static final long serialVersionUID = 1L;
+ @Override
+ public double get(int distance) {
+ return distance;
+ }
+ },
+ new it.unimi.dsi.webgraph.algo.HyperBall.AbstractDiscountFunction() {
+ private static final long serialVersionUID = 1L;
+ @Override
+ public double get(int distance) {
+ return 1. / distance;
+ }
+ }
+ },
+ attempt);
+ hyperBall.init();
+ do {
+ hyperBall.iterate();
+ } while(hyperBall.modified() != 0);
+
+ int n = (int)g.numNodes();
+ for (int i = 0; i < n; i++) {
+ assertEquals(FloatBigArrays.get(hyperBall.sumOfDistances, i), FloatBigArrays.get(hyperBall.discountedCentrality[0], i), 1E-5);
+ assertEquals(FloatBigArrays.get(hyperBall.sumOfInverseDistances, i), FloatBigArrays.get(hyperBall.discountedCentrality[1], i), 1E-5);
+ }
+ hyperBall.close();
+ }
+ }
+ }
+ }
+}
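The cycle, line, out-star and tree tests above check HyperBall's estimates against closed-form values of the neighbourhood function N(t), the number of ordered pairs (x, y) with d(x, y) ≤ t, where each node counts itself at distance 0. The following is a minimal brute-force sketch, assuming a plain `int[][]` successor-list graph rather than the WebGraph API. It computes N(t) exactly with one BFS per node and reproduces the constants the tests use: 29, 41 and 49 for the complete binary in-tree of height 3, 2n − 1 for the out-star, and n(t + 1) for the directed cycle. The class and method names (`ExactNeighbourhoodFunction`, `neighbourhoodFunction`, `completeBinaryIntree`, `outStar`, `directedCycle`) are hypothetical. The in-tree uses heap numbering (child i points to parent (i − 1) / 2), which is also an assumption. Only the number of nodes at each depth affects N(t).

```java
import java.util.ArrayDeque;
import java.util.Arrays;

public class ExactNeighbourhoodFunction {
    /** Returns N(0..maxT), where N(t) is the number of ordered pairs (x, y) with d(x, y) <= t. */
    static long[] neighbourhoodFunction(final int[][] successors, final int maxT) {
        final int n = successors.length;
        final long[] nf = new long[maxT + 1];
        final int[] dist = new int[n];
        final ArrayDeque<Integer> queue = new ArrayDeque<>();
        for (int source = 0; source < n; source++) {
            Arrays.fill(dist, Integer.MAX_VALUE); // all nodes initially unreachable
            dist[source] = 0;
            queue.add(source);
            while (!queue.isEmpty()) {
                final int curr = queue.poll();
                // Count the pair (source, curr) at its exact distance.
                if (dist[curr] <= maxT) nf[dist[curr]]++;
                for (int succ : successors[curr]) {
                    if (dist[succ] == Integer.MAX_VALUE) {
                        dist[succ] = dist[curr] + 1;
                        queue.add(succ);
                    }
                }
            }
        }
        // Turn counts at exact distance t into cumulative counts at distance <= t.
        for (int t = 1; t <= maxT; t++) nf[t] += nf[t - 1];
        return nf;
    }

    /** Directed cycle: i -> (i + 1) mod n. */
    static int[][] directedCycle(final int n) {
        final int[][] s = new int[n][];
        for (int i = 0; i < n; i++) s[i] = new int[] { (i + 1) % n };
        return s;
    }

    /** Complete binary in-tree of the given height, heap-numbered: each node i > 0 points to (i - 1) / 2. */
    static int[][] completeBinaryIntree(final int height) {
        final int n = (1 << (height + 1)) - 1;
        final int[][] s = new int[n][];
        s[0] = new int[0];
        for (int i = 1; i < n; i++) s[i] = new int[] { (i - 1) / 2 };
        return s;
    }

    /** Out-star: node 0 points to every other node. */
    static int[][] outStar(final int n) {
        final int[][] s = new int[n][];
        s[0] = new int[n - 1];
        for (int i = 1; i < n; i++) {
            s[0][i - 1] = i;
            s[i] = new int[0];
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(neighbourhoodFunction(completeBinaryIntree(3), 3))); // prints [15, 29, 41, 49]
        System.out.println(neighbourhoodFunction(outStar(100), 1)[1]); // prints 199
        System.out.println(neighbourhoodFunction(directedCycle(100), 4)[4]); // prints 500
    }
}
```

Brute force costs one BFS per node. That is fine for test-sized graphs and is exactly the cost HyperBall's probabilistic counters are designed to avoid.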
diff --git a/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/algo/ParallelBreadthFirstVisitTest.java b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/algo/ParallelBreadthFirstVisitTest.java
new file mode 100644
index 0000000..fda9849
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/algo/ParallelBreadthFirstVisitTest.java
@@ -0,0 +1,67 @@
+package it.unimi.dsi.big.webgraph.algo;
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.big.webgraph.ImmutableGraph;
+import it.unimi.dsi.bits.Fast;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+
+import org.junit.Test;
+import org.slf4j.helpers.NOPLogger;
+
+/*
+ * Copyright (C) 2011-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+public class ParallelBreadthFirstVisitTest {
+ private final ProgressLogger pl = new ProgressLogger(NOPLogger.NOP_LOGGER);
+
+ @Test
+ public void testTree() {
+ ImmutableGraph graph = ImmutableGraph.wrap(ArrayListMutableGraph.newCompleteBinaryOuttree(10).immutableView());
+ ParallelBreadthFirstVisit visit = new ParallelBreadthFirstVisit(graph, 0, false, pl);
+ visit.visit(0);
+ final int d[] = new int[(int)graph.numNodes()];
+ for(int i = 0; i < visit.cutPoints.size64() - 1; i++)
+ for(long j = visit.cutPoints.getLong(i); j < visit.cutPoints.getLong(i + 1); j++) d[(int)visit.queue.getLong(j)] = i;
+ for(int i = 0; i < graph.numNodes(); i++) assertEquals(Integer.toString(i), Fast.mostSignificantBit(i + 1), d[i]);
+ }
+
+ @Test
+ public void testStar() {
+ ArrayListMutableGraph graph = new ArrayListMutableGraph(1 + 10 + 100 + 1000);
+ for(int i = 1; i <= 10; i++) {
+ graph.addArc(0, i);
+ graph.addArc(i, 0);
+ for(int j = 1; j <= 10; j++) {
+ graph.addArc(i, i * 10 + j);
+ graph.addArc(i * 10 + j, i);
+ for(int k = 1; k <= 10; k++) {
+ graph.addArc(i * 10 + j, (i * 10 + j) * 10 + k);
+ graph.addArc((i * 10 + j) * 10 + k, i * 10 + j);
+ }
+ }
+ }
+
+ ParallelBreadthFirstVisit visit = new ParallelBreadthFirstVisit(ImmutableGraph.wrap(graph.immutableView()), 0, false, pl);
+ long componentSize = visit.visit(0);
+ for(int i = 1; i < graph.numNodes(); i++) {
+ visit.clear();
+ assertEquals("Source: " + i, componentSize, visit.visit(i));
+ }
+ }
+}
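`ParallelBreadthFirstVisitTest.testTree` rebuilds each node's distance from `visit.queue` and `visit.cutPoints`. The nodes at distance d occupy queue positions `cutPoints[d]` up to `cutPoints[d + 1] - 1`. The test then expects node i of the complete binary out-tree to sit at depth `Fast.mostSignificantBit(i + 1)`, that is, ⌊log2(i + 1)⌋. The following is a minimal sequential sketch of that queue/cut-point layout. It is not the parallel implementation. All names (`LevelCutPoints`, `visit`, `depths`, `completeBinaryOuttree`) are hypothetical. The tree is assumed to be heap-numbered (children 2i + 1 and 2i + 2), which is what the expected depth formula implies. `31 - Integer.numberOfLeadingZeros(x)` serves as a plain-JDK stand-in for `Fast.mostSignificantBit` on positive ints.

```java
import java.util.ArrayList;
import java.util.List;

public class LevelCutPoints {
    /**
     * Sequential BFS from source. On return, queue holds the visited nodes in visit order and the
     * returned cut points delimit the levels: the nodes at distance d are queue[cut[d]] .. queue[cut[d + 1] - 1].
     */
    static List<Integer> visit(final int[][] successors, final int source, final List<Integer> queue) {
        final boolean[] seen = new boolean[successors.length];
        final List<Integer> cutPoints = new ArrayList<>();
        queue.add(source);
        seen[source] = true;
        cutPoints.add(0);
        int start = 0;
        while (start < queue.size()) {
            final int end = queue.size(); // the current level is queue[start, end)
            for (int j = start; j < end; j++)
                for (int succ : successors[queue.get(j)])
                    if (!seen[succ]) {
                        seen[succ] = true;
                        queue.add(succ);
                    }
            cutPoints.add(end);
            start = end;
        }
        return cutPoints;
    }

    /** Complete binary out-tree of the given height, heap-numbered: node i points to 2i + 1 and 2i + 2. */
    static int[][] completeBinaryOuttree(final int height) {
        final int n = (1 << (height + 1)) - 1;
        final int[][] s = new int[n][];
        for (int i = 0; i < n; i++) s[i] = 2 * i + 2 < n ? new int[] { 2 * i + 1, 2 * i + 2 } : new int[0];
        return s;
    }

    /** Rebuilds each node's distance from the queue and the cut points, mirroring the test's double loop. */
    static int[] depths(final int n, final List<Integer> queue, final List<Integer> cutPoints) {
        final int[] d = new int[n];
        for (int i = 0; i < cutPoints.size() - 1; i++)
            for (int j = cutPoints.get(i); j < cutPoints.get(i + 1); j++) d[queue.get(j)] = i;
        return d;
    }

    public static void main(String[] args) {
        final int[][] tree = completeBinaryOuttree(10);
        final List<Integer> queue = new ArrayList<>();
        final List<Integer> cutPoints = visit(tree, 0, queue);
        final int[] d = depths(tree.length, queue, cutPoints);
        for (int i = 0; i < tree.length; i++)
            if (d[i] != 31 - Integer.numberOfLeadingZeros(i + 1)) throw new AssertionError("node " + i);
        System.out.println("levels: " + (cutPoints.size() - 1)); // prints "levels: 11"
    }
}
```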
diff --git a/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/algo/SequentialHyperBall.java b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/algo/SequentialHyperBall.java
new file mode 100644
index 0000000..1d75f24
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/algo/SequentialHyperBall.java
@@ -0,0 +1,415 @@
+package it.unimi.dsi.big.webgraph.algo;
+
+/*
+ * Copyright (C) 2010-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import it.unimi.dsi.Util;
+import it.unimi.dsi.big.webgraph.GraphClassParser;
+import it.unimi.dsi.big.webgraph.ImmutableGraph;
+import it.unimi.dsi.big.webgraph.NodeIterator;
+import it.unimi.dsi.bits.LongArrayBitVector;
+import it.unimi.dsi.fastutil.doubles.DoubleArrayList;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.io.FastBufferedInputStream;
+import it.unimi.dsi.fastutil.io.FastBufferedOutputStream;
+import it.unimi.dsi.fastutil.io.TextIO;
+import it.unimi.dsi.fastutil.longs.LongBigArrays;
+import it.unimi.dsi.fastutil.longs.LongBigList;
+import it.unimi.dsi.io.SafelyCloseable;
+import it.unimi.dsi.lang.ObjectParser;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.util.HyperLogLogCounterArray;
+
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.lang.reflect.InvocationTargetException;
+import java.util.Arrays;
+import java.util.concurrent.TimeUnit;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+/** <p>Computes the approximate neighbourhood function of a graph using a sequential version of HyperBall.
+ *
+ * @author Paolo Boldi and Sebastiano Vigna
+ */
+
+public class SequentialHyperBall extends HyperLogLogCounterArray implements SafelyCloseable {
+ private static final Logger LOGGER = LoggerFactory.getLogger(SequentialHyperBall.class);
+ private static final boolean ASSERTS = true;
+
+ private static final long serialVersionUID = 1L;
+
+ protected final static int NODE_MASK = (int)(CHUNK_MASK >>> 6);
+
+ /** The graph whose neighbourhood function we are going to approximate. */
+ private final ImmutableGraph g;
+ /** The number of nodes of {@link #g}, cached. */
+ private final long numNodes;
+ /** The square of {@link #numNodes}, cached. */
+ private final double squareNumNodes;
+ /** The temporary file that will be used to dump the new set of counters. */
+ private final File tempFile;
+ /** The file output stream on {@link #tempFile} for writing newly computed registers. */
+ private final FileOutputStream fos;
+ /** A buffered data output stream wrapping {@link #fos}. */
+ private final DataOutputStream dos;
+ /** An input stream on {@link #tempFile} for reading newly computed registers. */
+ private final FastBufferedInputStream fbis;
+ /** A progress logger, or <code>null</code>. */
+ private final ProgressLogger pl;
+ /** A temporary array used by {@link #max(long[], long[], int)}. */
+ private final long accumulator[];
+ /** A temporary array used by {@link #max(long[], long[], int)}. */
+ private final long mask[];
+ /** The value computed by the last call to {@link #iterate()}. */
+ private double last;
+ /** Whether this approximator has been already closed. */
+ private boolean closed;
+
+ private final static int ensureEnoughRegisters(final int log2m) {
+ if (log2m < 4) throw new IllegalArgumentException("There must be at least 16 registers per counter");
+ return log2m;
+ }
+
+ /** Creates a new approximator for the neighbourhood function.
+ *
+ * @param g the graph whose neighbourhood function you want to compute.
+ * @param log2m the logarithm of the number of registers per counter.
+ * @param pl a progress logger, or <code>null</code>.
+ * @param seed the random seed passed to the underlying {@link HyperLogLogCounterArray}.
+ */
+ public SequentialHyperBall(final ImmutableGraph g, final int log2m, final ProgressLogger pl, final long seed) throws IOException {
+ super(g.numNodes(), g.numNodes(), ensureEnoughRegisters(log2m), seed);
+
+ if (pl != null) pl.logger().info("Precision: " + Util.format(100 * HyperLogLogCounterArray.relativeStandardDeviation(log2m)) + "% (" + m + " registers/counter, " + registerSize + " bits/counter)");
+
+ this.g = g;
+ this.pl = pl;
+
+ numNodes = g.numNodes();
+ squareNumNodes = (double)numNodes * numNodes;
+
+ tempFile = File.createTempFile(SequentialHyperBall.class.getName(), "temp");
+ tempFile.deleteOnExit();
+ dos = new DataOutputStream(new FastBufferedOutputStream(fos = new FileOutputStream(tempFile)));
+ fbis = new FastBufferedInputStream(new FileInputStream(tempFile));
+
+ accumulator = new long[counterLongwords];
+ mask = new long[counterLongwords];
+ }
+
+ /** Initialises the approximator.
+ *
+ * <p>This method must be called before a series of {@linkplain #iterate() iterations}.
+ */
+ public void init() {
+ if (pl != null) {
+ pl.itemsName = "iterates";
+ pl.start("Iterating...");
+ }
+ for(long[] a: bits) Arrays.fill(a, 0);
+ for(long i = numNodes; i-- != 0;) add(i, i);
+ last = numNodes;
+ }
+
+ @Override
+ public void close() throws IOException {
+ if (closed) return;
+ closed = true;
+ dos.close();
+ fbis.close();
+ tempFile.delete();
+ }
+
+ @Override
+ protected void finalize() throws Throwable {
+ try {
+ if (! closed) {
+ LOGGER.warn("This " + this.getClass().getName() + " [" + toString() + "] should have been closed.");
+ close();
+ }
+ }
+ finally {
+ super.finalize();
+ }
+ }
+
+
+ /** Performs a multiple precision subtraction, leaving the result in the first operand.
+ *
+ * @param x a vector of longs.
+ * @param y a vector of longs that will be subtracted from <code>x</code>.
+ * @param l the length of <code>x</code> and <code>y</code>.
+ */
+ private final static void subtract(final long[] x, final long[] y, final int l) {
+ boolean borrow = false;
+
+ for(int i = 0; i < l; i++) {
+ if (! borrow || x[i]-- != 0) borrow = x[i] < y[i] ^ x[i] < 0 ^ y[i] < 0; // This expression returns the result of an unsigned strict comparison.
+ x[i] -= y[i];
+ }
+ }
+
+ /** Computes the register-by-register maximum of two bit vectors.
+ *
+ * @param x first vector of longs, representing a bit vector in {@link LongArrayBitVector} format, where the result will be stored.
+ * @param y a second vector of longs, representing a bit vector in {@link LongArrayBitVector} format, that will be maximised with <code>x</code>.
+ * @param r the register size.
+ */
+
+ private final void max(final long[] x, final long[] y, final int r) {
+ final int l = x.length;
+
+ // Local copies of vectors used to store intermediate results.
+ final long[] accumulator = this.accumulator;
+ final long[] mask = this.mask;
+ final long[] msbMask = this.msbMask;
+
+ /* We work in two phases. Let H_r (msbMask) be the mask with the
+ * highest bit of each register (of size r) set, and L_r (lsbMask)
+ * be the mask with the lowest bit of each register set.
+ * We describe the algorithm on a single word.
+ *
+ * In the first phase we perform an unsigned strict register-by-register
+ * comparison of x and y, using the formula
+ *
+ * z = ((((x | H_r) - (y & ~H_r)) | (x ^ y))^ (x | ~y)) & H_r
+ *
+ * Then, we generate a register-by-register mask of all ones or
+ * all zeroes, depending on the result of the comparison, using the
+ * formula
+ *
+ * (((z >> r-1 | H_r) - L_r) | H_r) ^ z
+ *
+ * At that point, it is trivial to select from x and y the right values.
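+ *
+ * A worked example (illustrative only: 8-bit words instead of 64-bit
+ * longs, and r = 4, so H_r = 0x88 and L_r = 0x11). Let x = 0x35
+ * (registers 3 and 5) and y = 0x72 (registers 7 and 2). Then
+ * x | H_r = 0xBD, y & ~H_r = 0x72, 0xBD - 0x72 = 0x4B, x ^ y = 0x47
+ * and x | ~y = 0xBD, so
+ *
+ * z = ((0x4B | 0x47) ^ 0xBD) & 0x88 = 0xF2 & 0x88 = 0x80
+ *
+ * (only in the high register is x < y). With z >> 3 = 0x10 the mask is
+ *
+ * (((0x10 | 0x88) - 0x11) | 0x88) ^ 0x80 = 0x8F ^ 0x80 = 0x0F,
+ *
+ * which keeps x in the low register and takes y in the high one:
+ * (0x0F & x) | (0xF0 & y) = 0x05 | 0x70 = 0x75, that is,
+ * max(3, 7) = 7 and max(5, 2) = 5.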
+ */
+
+ // We load x | H_r into the accumulator
+ for(int i = l; i-- != 0;) accumulator[i] = x[i] | msbMask[i];
+ // We subtract y & ~H_r, using mask as temporary storage
+ for(int i = l; i-- != 0;) mask[i] = y[i] & ~msbMask[i];
+ subtract(accumulator, mask, l);
+
+ // We OR with x ^ y, XOR with (x | ~y), and finally AND with H_r.
+ for(int i = l; i-- != 0;) accumulator[i] = ((accumulator[i] | (x[i] ^ y[i])) ^ (x[i] | ~y[i])) & msbMask[i];
+
+ if (ASSERTS) {
+ final LongBigList a = LongArrayBitVector.wrap(x).asLongBigList(r);
+ final LongBigList b = LongArrayBitVector.wrap(y).asLongBigList(r);
+ for(int i = 0; i < a.size(); i++) {
+ long pos = (i + 1) * (long)r - 1;
+ assert (a.getLong(i) < b.getLong(i)) == ((accumulator[(int)(pos / Long.SIZE)] & 1L << pos % Long.SIZE) != 0);
+ }
+ }
+
+ // We shift by r - 1 places and put the result into mask
+ final int rMinus1 = r - 1;
+ for(int i = l - 1; i-- != 0;) mask[i] = accumulator[i] >>> rMinus1 | accumulator[i + 1] << (Long.SIZE - rMinus1) | msbMask[i];
+ mask[l - 1] = accumulator[l - 1] >>> rMinus1 | msbMask[l - 1];
+
+ // We subtract L_r from mask
+ subtract(mask, lsbMask, l);
+
+ // We OR with H_r and XOR with the accumulator
+ for(int i = l; i-- != 0;) mask[i] = (mask[i] | msbMask[i]) ^ accumulator[i];
+
+ if (ASSERTS) {
+ final long[] t = x.clone();
+ LongBigList a = LongArrayBitVector.wrap(t).asLongBigList(r);
+ LongBigList b = LongArrayBitVector.wrap(y).asLongBigList(r);
+ for(int i = 0; i < Long.SIZE * l / r; i++) a.set(i, Math.max(a.getLong(i), b.getLong(i)));
+ // Note: this must be kept in sync with the line computing the result.
+ for(int i = l; i-- != 0;) assert t[i] == (mask[i] & x[i] | ~mask[i] & y[i]);
+ }
+
+ // Finally, we use mask to select the right bits from x and y and store the result.
+ for(int i = l; i-- != 0;) x[i] = mask[i] & x[i] | ~mask[i] & y[i];
+
+ }
+
+ private final void copyToLocal(final LongArrayBitVector chunk, final long[] t, final long node) {
+ // Offset in bits
+ final long counterLongwords = t.length;
+ long offset = (node << log2m & CHUNK_MASK) * registerSize;
+ // Note that we might copy a few bits in excess, but they will not be used anyway.
+ for(int i = 0; i < counterLongwords; i++, offset += Long.SIZE) t[i] = chunk.getLong(offset, Math.min(offset + Long.SIZE, chunk.length()));
+ }
+
+ /** Performs a new iteration of HyperBall.
+ *
+ * @return an approximation of the following value of the neighbourhood function (the
+ * first returned value is for distance one).
+ */
+ public double iterate() throws IOException {
+ final LongArrayBitVector bitVector[] = new LongArrayBitVector[bits.length];
+ for(int i = bits.length; i-- != 0;) bitVector[i] = LongArrayBitVector.wrap(bits[i]);
+
+ final NodeIterator nodeIterator = g.nodeIterator();
+ final int counterBits = registerSize << log2m;
+ final int nodeShift = this.counterShift;
+
+ final long t[] = new long[counterLongwords];
+ final long u[] = new long[counterLongwords];
+
+ final ProgressLogger nodeProgressLogger = pl == null ? null : new ProgressLogger(LOGGER, 10, TimeUnit.MINUTES, "nodes");
+
+ fbis.flush();
+ dos.flush();
+ fos.getChannel().position(0);
+
+ if (nodeProgressLogger != null) {
+ nodeProgressLogger.expectedUpdates = numNodes;
+ nodeProgressLogger.start("Scanning graph...");
+ }
+
+ for(long i = 0; i < numNodes; i++) {
+ nodeIterator.nextLong();
+ long d = nodeIterator.outdegree();
+ final long[][] successor = nodeIterator.successorBigArray();
+ copyToLocal(bitVector[(int)(i >>> nodeShift)], t, i);
+ while(d-- != 0) {
+ final long s = LongBigArrays.get(successor, d);
+ if (s != i) { // Self-loops do not influence the computation
+ copyToLocal(bitVector[(int)(s >>> nodeShift)], u, s);
+ max(t, u, registerSize);
+ }
+ }
+
+ if (ASSERTS) {
+ LongBigList test = LongArrayBitVector.wrap(t).asLongBigList(registerSize);
+ for(int rr = 0; rr < m; rr++) {
+ int max = (int)registers[(int)(((i << log2m) + rr) >> CHUNK_SHIFT)].getLong(((i << log2m) + rr) & CHUNK_MASK);
+ for(long j = nodeIterator.outdegree(); j-- != 0;) max = Math.max(max, (int)registers[(int)(((LongBigArrays.get(successor, j) << log2m) + rr) >> CHUNK_SHIFT)].getLong(((LongBigArrays.get(successor, j) << log2m) + rr) & CHUNK_MASK));
+ assert max == test.getLong(rr) : max + "!=" + test.getLong(rr) + " [" + rr + "]";
+ }
+ }
+
+ // We store long-size padded bits.
+ BinIO.storeLongs(t, dos);
+
+ if (nodeProgressLogger != null) nodeProgressLogger.lightUpdate();
+ }
+
+ if (nodeProgressLogger != null) nodeProgressLogger.done();
+
+ dos.flush();
+ fbis.position(0);
+ final DataInputStream dis = new DataInputStream(fbis);
+
+ for(int i = 0; i < bitVector.length; i++) {
+ final int numCounters = (int)(registers[i].size64() >> log2m);
+ bitVector[i].clear();
+ for(int j = 0; j < numCounters; j++) {
+ // We read long-size padded bits and store just the useful part.
+ BinIO.loadLongs(dis, t);
+ bitVector[i].append(LongArrayBitVector.wrap(t).subVector(0, counterBits));
+ }
+ }
+
+ double result = 0, c = 0, y, z;
+ // Kahan summation
+ for(long i = numNodes; i-- != 0;) {
+ y = count(i) - c;
+ z = result + y;
+ c = (z - result) - y;
+ result = z;
+ }
+
+ if (pl != null) {
+ pl.update();
+ pl.logger().info("Pairs: " + result + " (" + 100.0 * result / squareNumNodes + "%)");
+ }
+
+ if (result < last) result = last;
+ last = result;
+ return result;
+ }
+
+ /** Returns an approximation of the neighbourhood function.
+ *
+ * @param upperBound an upper bound to the number of iterations.
+ * @param threshold a value that will be used to stop the computation either by absolute or relative increment.
+ * @return an approximation of the neighbourhood function.
+ */
+ public double[] approximateNeighbourhoodFunction(long upperBound, double threshold) throws IOException {
+ DoubleArrayList approximateNeighbourhoodFunction = new DoubleArrayList();
+ upperBound = Math.min(upperBound, numNodes);
+ double last;
+ approximateNeighbourhoodFunction.add(last = numNodes);
+ init();
+
+ for(long i = 0; i < upperBound; i++) {
+ final double current = iterate();
+ LOGGER.info("Absolute increment: " + (current - last));
+ if (current - last <= threshold) {
+ LOGGER.info("Terminating approximation after " + i + " iteration(s) by absolute bound");
+ break;
+ }
+
+ LOGGER.info("Relative increment: " + (current / last));
+ if (i > 3 && current / last < (1 + threshold)) {
+ LOGGER.info("Terminating approximation after " + i + " iteration(s) by relative bound");
+ break;
+ }
+ approximateNeighbourhoodFunction.add(last = current);
+ }
+
+ if (pl != null) pl.done();
+ return approximateNeighbourhoodFunction.toDoubleArray();
+ }
+
+ public static void main(String arg[]) throws IOException, JSAPException, IllegalArgumentException, ClassNotFoundException, IllegalAccessException, InvocationTargetException, InstantiationException, NoSuchMethodException {
+ SimpleJSAP jsap = new SimpleJSAP(SequentialHyperBall.class.getName(), "Prints an approximation of the neighbourhood function.",
+ new Parameter[] {
+ new FlaggedOption("log2m", JSAP.INTEGER_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, 'l', "log2m", "The logarithm of the number of registers."),
+ new FlaggedOption("upperBound", JSAP.LONGSIZE_PARSER, Long.toString(Long.MAX_VALUE), JSAP.NOT_REQUIRED, 'u', "upper-bound", "An upper bound to the number of iterations (default: the graph size)."),
+ new FlaggedOption("threshold", JSAP.DOUBLE_PARSER, Double.toString(1E-3), JSAP.NOT_REQUIRED, 't', "threshould", "A threshold that will be used to stop the computation by absolute or relative increment."),
+ new Switch("spec", 's', "spec", "The source is not a basename but rather a specification of the form <ImmutableGraphImplementation>(arg,arg,...)."),
+ new UnflaggedOption("basename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the graph."),
+ }
+ );
+
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final boolean spec = jsapResult.getBoolean("spec");
+ final String basename = jsapResult.getString("basename");
+ final ProgressLogger pl = new ProgressLogger(LOGGER);
+ final int log2m = jsapResult.getInt("log2m");
+
+ final ImmutableGraph graph = spec ? ObjectParser.fromSpec(basename, ImmutableGraph.class, GraphClassParser.PACKAGE) : ImmutableGraph.loadOffline(basename);
+
+ SequentialHyperBall shb = new SequentialHyperBall(graph, log2m, pl, Util.randomSeed());
+ TextIO.storeDoubles(shb.approximateNeighbourhoodFunction(jsapResult.getLong("upperBound"), jsapResult.getDouble("threshold")), System.out);
+ shb.close();
+ }
+}
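The Kahan (compensated) summation in iterate() keeps a running correction c, so that the low-order bits lost when a small count is added to a large running total are fed back into the next addition. The standalone sketch below reproduces the same loop on a plain double[] and contrasts it with plain summation in the same (reverse) order. KahanSum, sum, naiveSum and example are hypothetical names, not part of WebGraph. The example relies on the fact that the ulp of 1e16 is 2: adding 1.0 to 1e16 is a tie that rounds back to 1e16, so plain summation drops every 1.0, while the compensated sum recovers them.

```java
import java.util.Arrays;

/** Compensated (Kahan) summation, mirroring the loop in iterate() (hypothetical standalone class). */
public final class KahanSum {
    private KahanSum() {}

    /** Sums values from last to first, carrying a compensation term for lost low-order bits. */
    public static double sum(final double[] values) {
        double result = 0, c = 0, y, z;
        for (int i = values.length; i-- != 0;) {
            y = values[i] - c;    // correct the next term by the error of the previous addition
            z = result + y;       // low-order bits of y may be lost here
            c = (z - result) - y; // algebraically zero; in floating point, the rounding error of the line above
            result = z;
        }
        return result;
    }

    /** Plain summation in the same (reverse) order, for comparison. */
    public static double naiveSum(final double[] values) {
        double result = 0;
        for (int i = values.length; i-- != 0;) result += values[i];
        return result;
    }

    /** Builds `ones` copies of 1.0 followed by 1e16, which is therefore added first. */
    public static double[] example(final int ones) {
        final double[] values = new double[ones + 1];
        Arrays.fill(values, 0, ones, 1.0);
        values[ones] = 1e16;
        return values;
    }

    public static void main(final String[] args) {
        final double[] values = example(1_000_000);
        System.out.println("naive: " + naiveSum(values)); // prints "naive: 1.0E16"
        System.out.println("kahan: " + sum(values));      // prints "kahan: 1.0000001E16"
    }
}
```

In iterate() the summands are per-node counter estimates rather than this contrived vector, but the effect is the same: the total stays accurate even when it grows much larger than the individual terms.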
diff --git a/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/algo/StronglyConnectedComponentsTarjan.java b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/algo/StronglyConnectedComponentsTarjan.java
new file mode 100644
index 0000000..7a7dc88
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/algo/StronglyConnectedComponentsTarjan.java
@@ -0,0 +1,394 @@
+package it.unimi.dsi.big.webgraph.algo;
+
+/*
+ * Copyright (C) 2007-2017 Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import java.io.IOException;
+import java.util.concurrent.TimeUnit;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.martiansoftware.jsap.FlaggedOption;
+import com.martiansoftware.jsap.JSAP;
+import com.martiansoftware.jsap.JSAPException;
+import com.martiansoftware.jsap.JSAPResult;
+import com.martiansoftware.jsap.Parameter;
+import com.martiansoftware.jsap.SimpleJSAP;
+import com.martiansoftware.jsap.Switch;
+import com.martiansoftware.jsap.UnflaggedOption;
+
+import it.unimi.dsi.Util;
+import it.unimi.dsi.big.webgraph.GraphClassParser;
+import it.unimi.dsi.big.webgraph.ImmutableGraph;
+import it.unimi.dsi.big.webgraph.LazyLongIterator;
+import it.unimi.dsi.big.webgraph.Transform.LabelledArcFilter;
+import it.unimi.dsi.big.webgraph.labelling.ArcLabelledImmutableGraph;
+import it.unimi.dsi.big.webgraph.labelling.ArcLabelledNodeIterator.LabelledArcIterator;
+import it.unimi.dsi.bits.LongArrayBitVector;
+import it.unimi.dsi.fastutil.io.BinIO;
+import it.unimi.dsi.fastutil.longs.LongBigArrayBigList;
+import it.unimi.dsi.fastutil.longs.LongBigArrays;
+import it.unimi.dsi.lang.ObjectParser;
+import it.unimi.dsi.logging.ProgressLogger;
+
+/** Computes the strongly connected components (and optionally the buckets) of an immutable graph.
+ *
+ * <p>This class is a second, recursive implementation of {@link StronglyConnectedComponents}, kept for debugging purposes.
+ *
+ * <p>The {@link #compute(ImmutableGraph, boolean, ProgressLogger)} method of this class will return
+ * an instance that contains the data computed by running a variant of Tarjan's algorithm on an immutable graph.
+ * Besides the usual strongly connected components, it is possible to compute the <em>buckets</em> of the
+ * graph, that is, nodes belonging to components that are terminal, but not dangling, in the component DAG.
+ *
+ * <p>After getting an instance, it is possible to run the {@link #computeSizes()} and {@link #sortBySize(long[][])}
+ * methods to obtain further information. This scheme has been devised to exploit the available memory as much
+ * as possible&mdash;after the components have been computed, the returned instance keeps no track of
+ * the graph, and the related memory can be freed by the garbage collector.
+ *
+ * <h2>Stack size</h2>
+ *
+ * <p>The method {@link #compute(ImmutableGraph, boolean, ProgressLogger)} might require a large stack size,
+ * which should be set using suitable JVM options (e.g., <code>-Xss</code>). Note, however,
+ * that the stack size must be enlarged also on the operating-system side&mdash;for instance, using <code>ulimit -s unlimited</code>.
+ */
+
+
+public class StronglyConnectedComponentsTarjan {
+ private static final Logger LOGGER = LoggerFactory.getLogger(StronglyConnectedComponentsTarjan.class);
+ /** The number of strongly connected components. */
+ final public long numberOfComponents;
+ /** The component of each node. */
+ final public long[][] component;
+ /** The bit vector for buckets, or <code>null</code>, in which case buckets have not been computed. */
+ final public LongArrayBitVector buckets;
+
+ protected StronglyConnectedComponentsTarjan(final long numberOfComponents, final long[][] component, final LongArrayBitVector buckets) {
+ this.numberOfComponents = numberOfComponents;
+ this.component = component;
+ this.buckets = buckets;
+ }
+
+ private final static class Visit {
+ /** The graph. */
+ private final ImmutableGraph graph;
+ /** The number of nodes in {@link #graph}. */
+ private final long n;
+ /** A progress logger. */
+ private final ProgressLogger pl;
+ /** Whether we should compute buckets. */
+ private final boolean computeBuckets;
+ /** For non-visited nodes, 0. For visited, non-emitted nodes, the visit time (possibly lowered during the visit). For emitted nodes, -c-1, where c is the component number. */
+ private final long[][] status;
+ /** The buckets. */
+ private final LongArrayBitVector buckets;
+ /** The component stack. */
+ private final LongBigArrayBigList stack;
+
+ /** The first-visit clock (incremented at each visited node). */
+ private long clock;
+ /** The number of components already output. */
+ private long numberOfComponents;
+
+ private Visit(final ImmutableGraph graph, final long[][] status, final LongArrayBitVector buckets, ProgressLogger pl) {
+ this.graph = graph;
+ this.buckets = buckets;
+ this.status = status;
+ this.pl = pl;
+ this.computeBuckets = buckets != null;
+ this.n = graph.numNodes();
+ stack = new LongBigArrayBigList(n);
+ }
+
+ /** Visits a node.
+ *
+ * @param x the node to visit.
+ * @return true if <code>x</code> is a bucket.
+ */
+ private boolean visit(final long x) {
+ final long[][] status = this.status;
+ if (pl != null) pl.lightUpdate();
+ long statusX;
+ LongBigArrays.set(status, x, statusX = ++clock);
+ stack.add(x);
+
+ long d = graph.outdegree(x);
+ boolean noOlderNodeFound = true, isBucket = d != 0; // If we're dangling we're certainly not a bucket.
+
+ if (d != 0) {
+ final LazyLongIterator successors = graph.successors(x);
+ while(d-- != 0) {
+ final long s = successors.nextLong();
+ // If we can reach a non-bucket or another component we are not a bucket.
+ if (LongBigArrays.get(status, s) == 0 && ! visit(s) || LongBigArrays.get(status, s) < 0) isBucket = false;
+ final long statusS = LongBigArrays.get(status, s); // Might have changed during the visit.
+ if (statusS > 0 && statusS < statusX) {
+ LongBigArrays.set(status, x, statusX = statusS);
+ noOlderNodeFound = false;
+ }
+ }
+ }
+
+ if (noOlderNodeFound) {
+ numberOfComponents++;
+ long z;
+ do {
+ z = stack.removeLong(stack.size64() - 1);
+ // Component markers are -c-1, where c is the component number.
+ LongBigArrays.set(status, z, -numberOfComponents);
+ if (isBucket && computeBuckets) buckets.set(z, true);
+ } while(z != x);
+ }
+
+ return isBucket;
+ }
+
+
+ public void run() {
+ if (pl != null) {
+ pl.itemsName = "nodes";
+ pl.expectedUpdates = n;
+ pl.displayFreeMemory = true;
+ pl.start("Computing strongly connected components...");
+ }
+ for (long x = 0; x < n; x++) if (LongBigArrays.get(status, x) == 0) visit(x);
+ if (pl != null) pl.done();
+
+ // Turn component markers into component numbers.
+ for(int i = status.length; i-- != 0;) {
+ final long[] t = status[i];
+ for(int d = t.length; d-- != 0;) t[d] = -t[d] - 1;
+ }
+
+ stack.add(numberOfComponents); // Horrible kluge to return the number of components.
+ }
+ }
+
+ /** Computes the strongly connected components of a given graph.
+ *
+ * @param graph the graph whose strongly connected components are to be computed.
+ * @param computeBuckets if true, buckets will be computed.
+ * @param pl a progress logger, or <code>null</code>.
+ * @return an instance of this class containing the computed components.
+ */
+ public static StronglyConnectedComponentsTarjan compute(final ImmutableGraph graph, final boolean computeBuckets, final ProgressLogger pl) {
+ final long n = graph.numNodes();
+ final Visit visit = new Visit(graph, LongBigArrays.newBigArray(n), computeBuckets ? LongArrayBitVector.ofLength(n) : null, pl);
+ visit.run();
+ return new StronglyConnectedComponentsTarjan(visit.numberOfComponents, visit.status, visit.buckets);
+ }
+
+
+ private final static class FilteredVisit {
+ /** The graph. */
+ private final ArcLabelledImmutableGraph graph;
+ /** The number of nodes in {@link #graph}. */
+ private final long n;
+ /** A progress logger. */
+ private final ProgressLogger pl;
+ /** A filter on arc labels. */
+ private final LabelledArcFilter filter;
+ /** Whether we should compute buckets. */
+ private final boolean computeBuckets;
+ /** For non-visited nodes, 0. For visited, non-emitted nodes, the visit time (possibly lowered during the visit). For emitted nodes, -c-1, where c is the component number. */
+ private final long[][] status;
+ /** The buckets. */
+ private final LongArrayBitVector buckets;
+ /** The component stack. */
+ private final LongBigArrayBigList stack;
+
+
+ /** The first-visit clock (incremented at each visited node). */
+ private long clock;
+ /** The number of components already output. */
+ private long numberOfComponents;
+
+ private FilteredVisit(final ArcLabelledImmutableGraph graph, final LabelledArcFilter filter, final long[][] status, final LongArrayBitVector buckets, ProgressLogger pl) {
+ this.graph = graph;
+ this.filter = filter;
+ this.buckets = buckets;
+ this.status = status;
+ this.pl = pl;
+ this.computeBuckets = buckets != null;
+ this.n = graph.numNodes();
+ stack = new LongBigArrayBigList(n);
+ }
+
+ /** Visits a node.
+ *
+ * @param x the node to visit.
+ * @return true if <code>x</code> is a bucket.
+ */
+ private boolean visit(final long x) {
+ final long[][] status = this.status;
+ if (pl != null) pl.lightUpdate();
+ LongBigArrays.set(status, x, ++clock);
+ stack.add(x);
+ long statusX = LongBigArrays.get(status, x);
+
+ long d = graph.outdegree(x);
+ long filteredDegree = 0;
+ boolean noOlderNodeFound = true, isBucket = true;
+
+ if (d != 0) {
+ final LabelledArcIterator successors = graph.successors(x);
+ while(d-- != 0) {
+ final long s = successors.nextLong();
+ if (! filter.accept(x, s, successors.label())) continue;
+ filteredDegree++;
+ // If we can reach a non-bucket or another component we are not a bucket.
+ if (LongBigArrays.get(status, s) == 0 && ! visit(s) || LongBigArrays.get(status, s) < 0) isBucket = false;
+ final long statusS = LongBigArrays.get(status, s); // Might have changed during the visit.
+ if (statusS > 0 && statusS < statusX) {
+ LongBigArrays.set(status, x, statusX = statusS);
+ noOlderNodeFound = false;
+ }
+ }
+ }
+
+ if (filteredDegree == 0) isBucket = false;
+
+ if (noOlderNodeFound) {
+ numberOfComponents++;
+ long z;
+ do {
+ z = stack.removeLong(stack.size64() - 1);
+ // Component markers are -c-1, where c is the component number.
+ LongBigArrays.set(status, z, -numberOfComponents);
+ if (isBucket && computeBuckets) buckets.set(z);
+ } while(z != x);
+ }
+
+ return isBucket;
+ }
+
+
+ public void run() {
+ if (pl != null) {
+ pl.itemsName = "nodes";
+ pl.expectedUpdates = n;
+ pl.displayFreeMemory = true;
+ pl.start("Computing strongly connected components...");
+ }
+ for (long x = 0; x < n; x++) if (LongBigArrays.get(status, x) == 0) visit(x);
+ if (pl != null) pl.done();
+
+ // Turn component markers into component numbers.
+ for(int i = status.length; i-- != 0;) {
+ final long[] t = status[i];
+ for(int d = t.length; d-- != 0;) t[d] = -t[d] - 1;
+ }
+
+ stack.add(numberOfComponents); // Horrible kluge to return the number of components.
+ }
+ }
+
+ /** Computes the strongly connected components of a given arc-labelled graph, filtering its arcs.
+ *
+ * @param graph the arc-labelled graph whose strongly connected components are to be computed.
+ * @param filter a filter selecting the arcs that must be taken into consideration.
+ * @param computeBuckets if true, buckets will be computed.
+ * @param pl a progress logger, or <code>null</code>.
+ * @return an instance of this class containing the computed components.
+ */
+ public static StronglyConnectedComponentsTarjan compute(final ArcLabelledImmutableGraph graph, final LabelledArcFilter filter, final boolean computeBuckets, final ProgressLogger pl) {
+ final long n = graph.numNodes();
+ FilteredVisit filteredVisit = new FilteredVisit(graph, filter, LongBigArrays.newBigArray(n), computeBuckets ? LongArrayBitVector.ofLength(n) : null, pl);
+ filteredVisit.run();
+ return new StronglyConnectedComponentsTarjan(filteredVisit.numberOfComponents, filteredVisit.status, filteredVisit.buckets);
+ }
+
+
+ /** Returns the size big array for this set of strongly connected components.
+ *
+ * @return the size big array for this set of strongly connected components.
+ */
+ public long[][] computeSizes() {
+ final long[][] size = LongBigArrays.newBigArray(numberOfComponents);
+ for(int i = component.length; i-- != 0;) {
+ final long[] t = component[i];
+ for(int d = t.length; d-- != 0;) LongBigArrays.incr(size, t[d]);
+ }
+ return size;
+ }
+
+ /** Renumbers by decreasing size the components of this set.
+ *
+ * <p>After a call to this method, both the internal status of this class and the argument
+ * array are permuted so that the sizes of strongly connected components are decreasing
+ * in the component index.
+ *
+ * @param size the component sizes, as returned by {@link #computeSizes()}.
+ */
+ public void sortBySize(final long[][] size) {
+ final long[][] perm = Util.identity(LongBigArrays.length(size));
+ LongBigArrays.quickSort(perm, 0, LongBigArrays.length(perm), (x, y) -> Long.compare(LongBigArrays.get(size, y), LongBigArrays.get(size, x)));
+ final long[][] copy = LongBigArrays.copy(size);
+
+ for(int i = size.length; i-- != 0;) {
+ final long[] t = size[i];
+ final long[] u = perm[i];
+ for(int d = t.length; d-- != 0;) t[d] = LongBigArrays.get(copy, u[d]);
+ }
+ Util.invertPermutationInPlace(perm);
+
+ for(int i = component.length; i-- != 0;) {
+ final long[] t = component[i];
+ for(int d = t.length; d-- != 0;) t[d] = LongBigArrays.get(perm, t[d]);
+ }
+ }
+
+ public static void main(String arg[]) throws IOException, JSAPException {
+ SimpleJSAP jsap = new SimpleJSAP(StronglyConnectedComponentsTarjan.class.getName(),
+ "Computes the strongly connected components (and optionally the buckets) of a graph of given basename. The resulting data is saved " +
+ "in files stemmed from the results basename (by default, the graph basename) with extension .scc (a list of binary longs specifying the " +
+ "component of each node), .sccsizes (a list of binary longs specifying the size of each component) and .buckets " +
+ "(a serialised LongArrayBitVector specifying buckets). Please use suitable JVM options to set a large stack size.",
+ new Parameter[] {
+ new Switch("sizes", 's', "sizes", "Compute component sizes."),
+ new Switch("renumber", 'r', "renumber", "Renumber components in decreasing-size order."),
+ new Switch("buckets", 'b', "buckets", "Compute buckets (nodes belonging to a bucket component, i.e., a terminal nondangling component)."),
+ new FlaggedOption("filter", new ObjectParser(LabelledArcFilter.class, GraphClassParser.PACKAGE), JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, 'f', "filter", "A filter for labelled arcs; requires the provided graph to be arc labelled."),
+ new FlaggedOption("logInterval", JSAP.LONG_PARSER, Long.toString(ProgressLogger.DEFAULT_LOG_INTERVAL), JSAP.NOT_REQUIRED, 'l', "log-interval", "The minimum time interval between activity logs in milliseconds."),
+ new UnflaggedOption("basename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.REQUIRED, JSAP.NOT_GREEDY, "The basename of the graph."),
+ new UnflaggedOption("resultsBasename", JSAP.STRING_PARSER, JSAP.NO_DEFAULT, JSAP.NOT_REQUIRED, JSAP.NOT_GREEDY, "The basename of the resulting files."),
+ }
+ );
+
+ JSAPResult jsapResult = jsap.parse(arg);
+ if (jsap.messagePrinted()) System.exit(1);
+
+ final String basename = jsapResult.getString("basename");
+ final String resultsBasename = jsapResult.getString("resultsBasename", basename);
+ final LabelledArcFilter filter = (LabelledArcFilter)jsapResult.getObject("filter");
+ ProgressLogger pl = new ProgressLogger(LOGGER, jsapResult.getLong("logInterval"), TimeUnit.MILLISECONDS);
+
+ final StronglyConnectedComponentsTarjan components =
+ filter != null ? StronglyConnectedComponentsTarjan.compute(ArcLabelledImmutableGraph.load(basename), filter, jsapResult.getBoolean("buckets"), pl)
+ : StronglyConnectedComponentsTarjan.compute(ImmutableGraph.load(basename), jsapResult.getBoolean("buckets"), pl);
+
+ if (jsapResult.getBoolean("sizes") || jsapResult.getBoolean("renumber")) {
+ final long[][] size = components.computeSizes();
+ if (jsapResult.getBoolean("renumber")) components.sortBySize(size);
+ if (jsapResult.getBoolean("sizes")) BinIO.storeLongs(size, resultsBasename + ".sccsizes");
+ }
+ BinIO.storeLongs(components.component, resultsBasename + ".scc");
+ if (components.buckets != null) BinIO.storeObject(components.buckets, resultsBasename + ".buckets");
+ }
+}
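The recursive visit above can be tried out on small graphs without WebGraph. The following is a minimal sketch, assuming plain int[][] adjacency lists instead of an ImmutableGraph and plain arrays instead of big arrays; the class TarjanSketch and its methods are hypothetical. It keeps the same status encoding as Visit (0 for unvisited nodes, the possibly lowered visit time for nodes on the stack, -c-1 for emitted nodes) and, like Visit.visit, re-reads the status of a successor after the recursive call, since that call may have lowered it or emitted its component.

```java
/** A recursive Tarjan sketch on int[][] adjacency lists, with the status encoding of Visit (hypothetical class). */
public final class TarjanSketch {
    private final int[][] succ;     // succ[x] lists the successors of x
    private final long[] status;    // 0: unvisited; > 0: visit time, possibly lowered; < 0: -c-1 once emitted
    private final boolean[] bucket; // bucket[x]: x lies in a terminal, non-dangling component
    private final int[] stack;      // the component stack
    private int sp;                 // stack pointer
    private long clock;             // first-visit clock
    private int numberOfComponents;

    private TarjanSketch(final int[][] succ) {
        this.succ = succ;
        final int n = succ.length;
        status = new long[n];
        bucket = new boolean[n];
        stack = new int[n];
    }

    /** Visits x recursively; returns whether x may still be a bucket. */
    private boolean visit(final int x) {
        long statusX = status[x] = ++clock;
        stack[sp++] = x;
        boolean noOlderNodeFound = true, isBucket = succ[x].length != 0; // dangling nodes are never buckets
        for (final int s : succ[x]) {
            // Reaching a non-bucket, or an already emitted component, rules out a bucket.
            if (status[s] == 0 && !visit(s) || status[s] < 0) isBucket = false;
            final long statusS = status[s]; // re-read: the recursive visit may have changed it
            if (statusS > 0 && statusS < statusX) { // s is on the stack and older than x
                status[x] = statusX = statusS;
                noOlderNodeFound = false;
            }
        }
        if (noOlderNodeFound) { // x is the root of a component: pop the component off the stack
            numberOfComponents++;
            int z;
            do {
                z = stack[--sp];
                status[z] = -numberOfComponents; // equals -c-1 with c = numberOfComponents - 1
                bucket[z] = isBucket;
            } while (z != x);
        }
        return isBucket;
    }

    public static TarjanSketch compute(final int[][] succ) {
        final TarjanSketch t = new TarjanSketch(succ);
        for (int x = 0; x < succ.length; x++) if (t.status[x] == 0) t.visit(x);
        return t;
    }

    public int numberOfComponents() { return numberOfComponents; }

    /** Turns the -c-1 marker back into the component number c. */
    public int component(final int x) { return (int) (-status[x] - 1); }

    public boolean isBucket(final int x) { return bucket[x]; }

    public static void main(final String[] args) {
        // The graph of testBuckets2: 0 -> 1 -> 2 -> 0, 1 -> 3, 3 -> 3.
        final TarjanSketch t = compute(new int[][] { { 1 }, { 2, 3 }, { 0 }, { 3 } });
        System.out.println(t.numberOfComponents() + " components");
        for (int x = 0; x < 4; x++) System.out.println(x + ": component " + t.component(x) + ", bucket " + t.isBucket(x));
    }
}
```

A component is marked as a bucket only if no node in it found an arc leading out of it; since every node of a component is a DFS descendant of its root, a false result from any recursive call propagates back to the root. As with the original, deep graphs need a large thread stack.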
diff --git a/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/algo/StronglyConnectedComponentsTest.java b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/algo/StronglyConnectedComponentsTest.java
new file mode 100644
index 0000000..e94a621
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/algo/StronglyConnectedComponentsTest.java
@@ -0,0 +1,121 @@
+package it.unimi.dsi.big.webgraph.algo;
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.big.webgraph.ImmutableGraph;
+import it.unimi.dsi.big.webgraph.WebGraphTestCase;
+import it.unimi.dsi.bits.LongArrayBitVector;
+import it.unimi.dsi.fastutil.longs.LongBigArrays;
+import it.unimi.dsi.fastutil.longs.LongOpenHashSet;
+import it.unimi.dsi.fastutil.objects.ObjectOpenHashSet;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+import it.unimi.dsi.webgraph.examples.ErdosRenyiGraph;
+
+import org.junit.Test;
+
+public class StronglyConnectedComponentsTest extends WebGraphTestCase {
+
+ public static void sameComponents(final long l, final StronglyConnectedComponentsTarjan componentsRecursive, final StronglyConnectedComponents componentsIterative) {
+ final LongOpenHashSet[] recursiveComponentsSet = new LongOpenHashSet[(int)componentsRecursive.numberOfComponents];
+ final LongOpenHashSet[] iterativeComponentsSet = new LongOpenHashSet[(int)componentsIterative.numberOfComponents];
+
+ for(int i = recursiveComponentsSet.length; i-- != 0;) {
+ recursiveComponentsSet[i] = new LongOpenHashSet();
+ iterativeComponentsSet[i] = new LongOpenHashSet();
+ }
+
+ for(long i = l; i-- != 0;) {
+ recursiveComponentsSet[(int)LongBigArrays.get(componentsRecursive.component, i)].add(i);
+ iterativeComponentsSet[(int)LongBigArrays.get(componentsIterative.component, i)].add(i);
+ }
+
+ assertEquals(new ObjectOpenHashSet<>(recursiveComponentsSet), new ObjectOpenHashSet<>(iterativeComponentsSet));
+ }
+
+ @Test
+ public void testBuckets() {
+ final ImmutableGraph g = ImmutableGraph.wrap(new ArrayListMutableGraph(9,
+ new int[][] { { 0, 0 }, { 1, 0 }, { 1, 2 },
+ { 2, 1 }, { 2, 3 }, { 2, 4 }, { 2, 5 },
+ { 3, 4 }, { 4, 3 },
+ { 5, 5 }, { 5, 6 }, { 5, 7 }, { 5, 8 },
+ { 6, 7 },
+ { 8, 7 } }
+ ).immutableView());
+
+ StronglyConnectedComponents components = StronglyConnectedComponents.compute(g, true, null);
+
+ LongArrayBitVector buckets = LongArrayBitVector.ofLength(g.numNodes());
+ buckets.set(0, true);
+ buckets.set(3, true);
+ buckets.set(4, true);
+ assertEquals(buckets, components.buckets);
+ assertEquals(3, buckets.count());
+
+ final long[][] size = components.computeSizes();
+ components.sortBySize(size);
+
+ assertEquals(2, LongBigArrays.get(size, 0));
+ assertEquals(2, LongBigArrays.get(size, 1));
+ assertEquals(1, LongBigArrays.get(size, 2));
+ assertEquals(1, LongBigArrays.get(size, 3));
+ assertEquals(1, LongBigArrays.get(size, 4));
+ assertEquals(1, LongBigArrays.get(size, 5));
+ assertEquals(1, LongBigArrays.get(size, 6));
+
+ StronglyConnectedComponents.compute(g, false, null); // To increase coverage
+ }
+
+ @Test
+ public void testBuckets2() {
+ final ImmutableGraph g = ImmutableGraph.wrap(new ArrayListMutableGraph(4,
+ new int[][] { { 0, 1 }, { 1, 2 }, { 2, 0 }, { 1, 3 }, { 3, 3 } }
+ ).immutableView());
+
+ StronglyConnectedComponents components = StronglyConnectedComponents.compute(g, true, null);
+
+ LongArrayBitVector buckets = LongArrayBitVector.ofLength(g.numNodes());
+ buckets.set(3);
+ assertEquals(buckets, components.buckets);
+ assertEquals(1, buckets.count());
+ }
+
+
+ @Test
+ public void testCompleteGraph() {
+ StronglyConnectedComponents components = StronglyConnectedComponents.compute(ImmutableGraph.wrap(ArrayListMutableGraph.newCompleteGraph(5, false).immutableView()), true, null);
+ assertEquals(5, components.buckets.count());
+ for(int i = 5; i-- != 0;) assertEquals(0, LongBigArrays.get(components.component, i));
+ assertEquals(5, components.computeSizes()[0][0]);
+ }
+
+ @Test
+ public void testNoBuckets() {
+ StronglyConnectedComponentsTarjan.compute(ImmutableGraph.wrap(ArrayListMutableGraph.newCompleteGraph(5, false).immutableView()), false, null);
+ }
+
+ @Test
+ public void testWithProgressLogger() {
+ StronglyConnectedComponentsTarjan.compute(ImmutableGraph.wrap(ArrayListMutableGraph.newCompleteGraph(5, false).immutableView()), true, new ProgressLogger());
+ }
+
+ @Test
+ public void testTree() {
+ StronglyConnectedComponents components = StronglyConnectedComponents.compute(ImmutableGraph.wrap(ArrayListMutableGraph.newCompleteBinaryIntree(3).immutableView()), true, null);
+ assertEquals(0, components.buckets.count());
+ assertEquals(15, components.numberOfComponents);
+ }
+
+ @Test
+ public void testErdosRenyi() {
+ for(int size: new int[] { 10, 100, 1000 }) {
+ for(int attempt = 0; attempt < 5; attempt++) {
+ final ImmutableGraph view = ImmutableGraph.wrap(new ArrayListMutableGraph(new ErdosRenyiGraph(size, .05, attempt + 1, false)).immutableView());
+ final StronglyConnectedComponentsTarjan componentsRecursive = StronglyConnectedComponentsTarjan.compute(view, true, null);
+ final StronglyConnectedComponents componentsIterative = StronglyConnectedComponents.compute(view, true, null);
+ assertEquals(componentsRecursive.numberOfComponents, componentsIterative.numberOfComponents);
+ sameComponents(size, componentsRecursive, componentsIterative);
+ }
+ }
+ }
+}
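The helper sameComponents compares the two labellings as sets of node sets, so that the numbering of the components does not matter. An equivalent check, sketched here on plain long[] arrays rather than big arrays (SamePartition and its methods are hypothetical names), renumbers each labelling in order of first appearance: two labellings describe the same partition exactly when the renumbered arrays are equal, which also implies that they have the same number of components.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Compares two component labellings as partitions (hypothetical helper on plain arrays). */
public final class SamePartition {
    private SamePartition() {}

    /** Renumbers components in order of first appearance, so equal partitions yield equal arrays. */
    public static int[] canonical(final long[] component) {
        final Map<Long, Integer> firstSeen = new HashMap<>();
        final int[] result = new int[component.length];
        for (int x = 0; x < component.length; x++) {
            Integer label = firstSeen.get(component[x]);
            if (label == null) { // first node of a new component: give it the next free number
                label = firstSeen.size();
                firstSeen.put(component[x], label);
            }
            result[x] = label;
        }
        return result;
    }

    /** Returns true iff a and b group the nodes in the same way, regardless of component numbering. */
    public static boolean samePartition(final long[] a, final long[] b) {
        return a.length == b.length && Arrays.equals(canonical(a), canonical(b));
    }

    public static void main(final String[] args) {
        System.out.println(samePartition(new long[] { 0, 0, 1, 2 }, new long[] { 2, 2, 0, 1 })); // true
        System.out.println(samePartition(new long[] { 0, 0, 1 }, new long[] { 0, 1, 1 }));       // false
    }
}
```

This runs in linear expected time and avoids allocating one hash set per component, at the cost of a map from old to new component numbers.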
diff --git a/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/examples/IntegerTriplesArcLabelledImmutableGraphTest.java b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/examples/IntegerTriplesArcLabelledImmutableGraphTest.java
new file mode 100644
index 0000000..86b33e4
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/examples/IntegerTriplesArcLabelledImmutableGraphTest.java
@@ -0,0 +1,29 @@
+package it.unimi.dsi.big.webgraph.examples;
+
+import it.unimi.dsi.big.webgraph.ImmutableGraph;
+import it.unimi.dsi.big.webgraph.WebGraphTestCase;
+
+import org.junit.Test;
+
+public class IntegerTriplesArcLabelledImmutableGraphTest extends WebGraphTestCase {
+
+ @Test
+ public void testEmpty() {
+ ImmutableGraph g = new IntegerTriplesArcLabelledImmutableGraph(new int[][] {});
+
+ assertGraph(g);
+ }
+
+ @Test
+ public void testCycle() {
+ ImmutableGraph g = new IntegerTriplesArcLabelledImmutableGraph(new int[][] {
+ { 0, 1, 2 },
+ { 1, 2, 0 },
+ { 2, 0, 1 },
+
+ });
+
+ assertGraph(g);
+ }
+
+}
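The triples in testCycle appear to be of the form {source, target, label}, which makes the three arcs 0 -> 1 -> 2 -> 0 a cycle. Under that assumption, grouping the triples into sorted successor and label lists can be sketched as follows. TriplesSketch is a hypothetical class, and taking the number of nodes to be the largest node mentioned plus one is an assumption of this sketch, not necessarily how IntegerTriplesArcLabelledImmutableGraph behaves.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Groups {source, target, label} triples into sorted successor and label lists (hypothetical sketch). */
public final class TriplesSketch {
    public final int[][] successors;
    public final int[][] labels;

    public TriplesSketch(final int[][] triples) {
        // Sort by source, then by target, so that successor lists come out sorted.
        final int[][] sorted = triples.clone();
        Arrays.sort(sorted, Comparator.<int[]>comparingInt(t -> t[0]).thenComparingInt(t -> t[1]));
        // Assumption: the number of nodes is one more than the largest node mentioned.
        int n = 0;
        for (final int[] t : sorted) n = Math.max(n, Math.max(t[0], t[1]) + 1);
        final int[] outdegree = new int[n];
        for (final int[] t : sorted) outdegree[t[0]]++;
        successors = new int[n][];
        labels = new int[n][];
        for (int x = 0; x < n; x++) {
            successors[x] = new int[outdegree[x]];
            labels[x] = new int[outdegree[x]];
        }
        final int[] filled = new int[n]; // next free slot in each list
        for (final int[] t : sorted) {
            final int i = filled[t[0]]++;
            successors[t[0]][i] = t[1];
            labels[t[0]][i] = t[2];
        }
    }

    public static void main(final String[] args) {
        // The triples of testCycle.
        final TriplesSketch g = new TriplesSketch(new int[][] { { 0, 1, 2 }, { 1, 2, 0 }, { 2, 0, 1 } });
        System.out.println(Arrays.deepToString(g.successors)); // [[1], [2], [0]]
        System.out.println(Arrays.deepToString(g.labels));     // [[2], [0], [1]]
    }
}
```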
diff --git a/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/labelling/BitStreamArcLabelledGraphTest.java b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/labelling/BitStreamArcLabelledGraphTest.java
new file mode 100644
index 0000000..9f6c048
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/labelling/BitStreamArcLabelledGraphTest.java
@@ -0,0 +1,345 @@
+package it.unimi.dsi.big.webgraph.labelling;
+
+/*
+ * Copyright (C) 2003-2017 Paolo Boldi and Sebastiano Vigna
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.big.webgraph.BVGraph;
+import it.unimi.dsi.big.webgraph.BVGraphTest;
+import it.unimi.dsi.big.webgraph.ImmutableGraph;
+import it.unimi.dsi.big.webgraph.LazyLongIterator;
+import it.unimi.dsi.big.webgraph.LazyLongIterators;
+import it.unimi.dsi.big.webgraph.NodeIterator;
+import it.unimi.dsi.big.webgraph.Transform;
+import it.unimi.dsi.big.webgraph.WebGraphTestCase;
+import it.unimi.dsi.fastutil.longs.LongBigArrays;
+import it.unimi.dsi.fastutil.longs.LongIterator;
+import it.unimi.dsi.fastutil.objects.ObjectBigArrays;
+import it.unimi.dsi.io.OutputBitStream;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+
+import java.io.File;
+import java.io.FileNotFoundException;
+import java.io.FileWriter;
+import java.io.IOException;
+import java.io.PrintWriter;
+
+import org.junit.Test;
+
+public class BitStreamArcLabelledGraphTest extends WebGraphTestCase {
+
+ private static final int[] SIZES = { 0, 1, 2, 3, 4, 7 };
+ private static final int MAX_WIDTH_FOR_FIXED = 32;
+ private static final int[] WIDTHS = { -1, 0, 1, 2, 3, 8, 32, 40, 41, 63 };
+ private static final int[] BATCH_SIZES = { 1, 2, 4, 5, 16 };
+
+ public static File storeTempGraph(final ArcLabelledImmutableGraph g) throws IOException, IllegalArgumentException, SecurityException {
+ File basename = File.createTempFile(BitStreamArcLabelledGraphTest.class.getSimpleName(), "test");
+ BitStreamArcLabelledImmutableGraph.store(g, basename.toString(), basename.toString() + "-underlying");
+ BVGraph.store(g, basename.toString() + "-underlying");
+ return basename;
+ }
+
+ private static OutputBitStream createTempBitStream(final String name) throws FileNotFoundException {
+ File f = new File(name);
+ f.deleteOnExit();
+ return new OutputBitStream(f.getAbsolutePath());
+ }
+
+ public String createGraphWithFixedWidthLabels(File basename, ImmutableGraph g, int width) throws IllegalArgumentException, SecurityException, IOException {
+ final int n = (int)g.numNodes();
+ System.err.println("Testing " + n + " nodes, width " + width + ", basename " + basename);
+
+ OutputBitStream labels = createTempBitStream(basename + "-fixedlabel" + BitStreamArcLabelledImmutableGraph.LABELS_EXTENSION);
+ OutputBitStream offsets = createTempBitStream(basename + "-fixedlabel" + BitStreamArcLabelledImmutableGraph.LABEL_OFFSETS_EXTENSION);
+ offsets.writeGamma(0);
+ for(int i = 0; i < n; i++) {
+ int bits = 0;
+ for(LongIterator j = LazyLongIterators.eager(g.successors(i)); j.hasNext();) bits += labels.writeInt(i * (int)j.nextLong() + i, width);
+ offsets.writeGamma(bits);
+ }
+ labels.close();
+ offsets.close();
+
+ PrintWriter pw = new PrintWriter(new FileWriter(basename + "-fixedlabel.properties"));
+ pw.println(ImmutableGraph.GRAPHCLASS_PROPERTY_KEY + " = " + BitStreamArcLabelledImmutableGraph.class.getName());
+ pw.println(BitStreamArcLabelledImmutableGraph.LABELSPEC_PROPERTY_KEY + " = " + FixedWidthIntLabel.class.getName() + "(TEST," + width + ")");
+ pw.println(ArcLabelledImmutableGraph.UNDERLYINGGRAPH_PROPERTY_KEY + " = " + basename.getName());
+ pw.close();
+
+ return basename + "-fixedlabel";
+ }
+
+ public String createGraphWithFixedWidthListLabels(File basename, ImmutableGraph g, int width) throws IllegalArgumentException, SecurityException, IOException {
+ final int n = (int)g.numNodes();
+ System.err.println("Testing " + n + " nodes, element width " + width + ", basename " + basename);
+
+ OutputBitStream labels = createTempBitStream(basename + "-fixedlistlabel" + BitStreamArcLabelledImmutableGraph.LABELS_EXTENSION);
+ OutputBitStream offsets = createTempBitStream(basename + "-fixedlistlabel" + BitStreamArcLabelledImmutableGraph.LABEL_OFFSETS_EXTENSION);
+ offsets.writeGamma(0);
+ for(int i = 0; i < n; i++) {
+ int bits = 0;
+ for(LongIterator j = LazyLongIterators.eager(g.successors(i)); j.hasNext();) {
+ int succ = (int)j.nextLong();
+ bits += labels.writeGamma((succ + 1) * 2); // list length
+ for(int k = 0; k < (succ + 1) * 2 ; k++) bits += labels.writeInt(i * k + i, width);
+ }
+ offsets.writeGamma(bits);
+ }
+ labels.close();
+ offsets.close();
+
+ PrintWriter pw = new PrintWriter(new FileWriter(basename + "-fixedlistlabel" + ImmutableGraph.PROPERTIES_EXTENSION));
+ pw.println(ImmutableGraph.GRAPHCLASS_PROPERTY_KEY + " = " + BitStreamArcLabelledImmutableGraph.class.getName());
+ pw.println(BitStreamArcLabelledImmutableGraph.LABELSPEC_PROPERTY_KEY + " = " + FixedWidthIntListLabel.class.getName() + "(TEST," + width + ")");
+ pw.println(ArcLabelledImmutableGraph.UNDERLYINGGRAPH_PROPERTY_KEY + " = " + basename.getName());
+ pw.close();
+
+ return basename + "-fixedlistlabel";
+ }
+
+ public String createGraphWithGammaLabels(File basename, ImmutableGraph g) throws IllegalArgumentException, SecurityException, IOException {
+ // We label each arc i -> j of the given graph with the gamma-coded integer i * j + i
+ final int n = (int)g.numNodes();
+ System.err.println("Testing " + n + " nodes, gamma coding, basename " + basename);
+
+ OutputBitStream labels = createTempBitStream(basename + "-gammalabel" + BitStreamArcLabelledImmutableGraph.LABELS_EXTENSION);
+ OutputBitStream offsets = createTempBitStream(basename + "-gammalabel" + BitStreamArcLabelledImmutableGraph.LABEL_OFFSETS_EXTENSION);
+ offsets.writeGamma(0);
+ for(int i = 0; i < n; i++) {
+ int bits = 0;
+ for(LongIterator j = LazyLongIterators.eager(g.successors(i)); j.hasNext();) bits += labels.writeGamma(i * (int)j.nextLong() + i);
+ offsets.writeGamma(bits);
+ }
+ labels.close();
+ offsets.close();
+
+ PrintWriter pw = new PrintWriter(new FileWriter(basename + "-gammalabel" + ImmutableGraph.PROPERTIES_EXTENSION));
+ pw.println(ImmutableGraph.GRAPHCLASS_PROPERTY_KEY + " = " + BitStreamArcLabelledImmutableGraph.class.getName());
+ pw.println(BitStreamArcLabelledImmutableGraph.LABELSPEC_PROPERTY_KEY + " = " + GammaCodedIntLabel.class.getName() + "(TEST)");
+ pw.println(ArcLabelledImmutableGraph.UNDERLYINGGRAPH_PROPERTY_KEY + " = " + basename.getName());
+ pw.close();
+
+ return basename + "-gammalabel";
+ }
+
+ public void testLabels(ArcLabelledImmutableGraph alg, final int width) {
+
+ final int mask = (int)(width == MAX_WIDTH_FOR_FIXED ? -1 : (1L << width) - 1);
+
+ // Sequential access, iterators
+ for(ArcLabelledNodeIterator nodeIterator = alg.nodeIterator(); nodeIterator.hasNext();) {
+ int curr = (int)nodeIterator.nextLong();
+ ArcLabelledNodeIterator.LabelledArcIterator l = nodeIterator.successors();
+ int d = (int)nodeIterator.outdegree();
+ while(d-- != 0) {
+ int succ = (int)l.nextLong();
+ if (l.label() instanceof AbstractIntLabel)
+ assertEquals(curr + " -> " + succ,(curr * succ + curr) & mask, l.label().getInt());
+ else {
+ int[] value = (int[]) l.label().get();
+ assertEquals((succ + 1) * 2, value.length);
+ for(int i = 0; i < value.length; i++) assertEquals("Successor of index " + i + " of " + curr + "(" + succ + ")", (curr * i + curr) & mask, value[i]);
+ }
+ }
+ }
+
+ // Sequential access, arrays
+ for(ArcLabelledNodeIterator nodeIterator = alg.nodeIterator(); nodeIterator.hasNext();) {
+ long curr = nodeIterator.nextLong();
+ long d = nodeIterator.outdegree();
+ long[][] succ = nodeIterator.successorBigArray();
+ Label[][] label = nodeIterator.labelBigArray();
+ for(int i = 0; i < d; i++) {
+ if (ObjectBigArrays.get(label, i) instanceof AbstractIntLabel)
+ assertEquals(curr + " -> " + LongBigArrays.get(succ, i), (curr * LongBigArrays.get(succ, i) + curr) & mask, ObjectBigArrays.get(label, i).getInt());
+ else {
+ int[] value = (int[]) ObjectBigArrays.get(label, i).get();
+ assertEquals((LongBigArrays.get(succ, i) + 1) * 2, value.length);
+ for(int j = 0; j < value.length; j++) assertEquals((curr * j + curr) & mask, value[j]);
+ }
+ }
+ }
+
+ if (! alg.randomAccess()) return;
+
+ // Random access, iterators
+ for(int curr = 0; curr < alg.numNodes(); curr++) {
+ ArcLabelledNodeIterator.LabelledArcIterator l = alg.successors(curr);
+ int d = (int)alg.outdegree(curr);
+ while(d-- != 0) {
+ int succ = (int)l.nextLong();
+ if (l.label() instanceof AbstractIntLabel)
+ assertEquals(curr + " -> " + succ ,(curr * succ + curr) & mask, l.label().getInt());
+ else {
+ int[] value = (int[]) l.label().get();
+ assertEquals((succ + 1) * 2, value.length);
+ for(int i = 0; i < value.length; i++) assertEquals((curr * i + curr) & mask, value[i]);
+ }
+ }
+ }
+
+ // Random access, arrays
+ for(int curr = 0; curr < alg.numNodes(); curr++) {
+ int d = (int)alg.outdegree(curr);
+ long[][] succ = alg.successorBigArray(curr);
+ Label[][] label = alg.labelBigArray(curr);
+ for(int i = 0; i < d; i++) {
+ if (ObjectBigArrays.get(label, i) instanceof AbstractIntLabel)
+ assertEquals(curr + " -> " + LongBigArrays.get(succ, i), (curr * LongBigArrays.get(succ, i) + curr) & mask, ObjectBigArrays.get(label, i).getInt());
+ else {
+ int[] value = (int[]) ObjectBigArrays.get(label, i).get();
+ assertEquals((LongBigArrays.get(succ, i) + 1) * 2, value.length);
+ for(int j = 0; j < value.length; j++) assertEquals((curr * j + curr) & mask, value[j]);
+ }
+ }
+ }
+ }
+
+ @Test
+ public void testLabels() throws IOException, IllegalArgumentException, SecurityException {
+ for(int n: SIZES) {
+ for(int type = 0; type < 3; type++) {
+ System.err.println("Testing type " + type + "...");
+ final ImmutableGraph g = ImmutableGraph.wrap(type == 0 ? ArrayListMutableGraph.newCompleteGraph(n, false).immutableView() :
+ type == 1 ? ArrayListMutableGraph.newCompleteBinaryIntree(n).immutableView() :
+ ArrayListMutableGraph.newCompleteBinaryOuttree(n).immutableView());
+ final File basename = BVGraphTest.storeTempGraph(g);
+ // -1 means gamma coding
+ for(int width: WIDTHS) {
+ final String basenameLabel = width == -1 ?
+ createGraphWithGammaLabels(basename, g) :
+ width < MAX_WIDTH_FOR_FIXED ? createGraphWithFixedWidthLabels(basename, g, width) :
+ createGraphWithFixedWidthListLabels(basename, g, width - MAX_WIDTH_FOR_FIXED);
+
+ System.err.println("Testing offline...");
+ testLabels(BitStreamArcLabelledImmutableGraph.loadOffline(basenameLabel), width % MAX_WIDTH_FOR_FIXED);
+ System.err.println("Testing standard...");
+ testLabels(BitStreamArcLabelledImmutableGraph.load(basenameLabel), width % MAX_WIDTH_FOR_FIXED);
+
+ new File(basenameLabel + ImmutableGraph.PROPERTIES_EXTENSION).delete();
+ new File(basenameLabel + BitStreamArcLabelledImmutableGraph.LABELS_EXTENSION).delete();
+ new File(basenameLabel + BitStreamArcLabelledImmutableGraph.LABEL_OFFSETS_EXTENSION).delete();
+ }
+ basename.delete();
+ deleteGraph(basename);
+ }
+ }
+ }
+
+ @Test
+ // Proceeds with the same test as before, but with a graph obtained as a union
+ public void testUnion() throws IllegalArgumentException, SecurityException, IOException {
+ for(int n: SIZES) {
+ for(int type = 0; type < 3; type++) {
+ System.err.println("Testing arc-labelled union type " + type + "...");
+ final ImmutableGraph g = ImmutableGraph.wrap(type == 0 ? ArrayListMutableGraph.newCompleteGraph(n, false).immutableView() :
+ type == 1 ? ArrayListMutableGraph.newCompleteBinaryIntree(n).immutableView() :
+ ArrayListMutableGraph.newCompleteBinaryOuttree(n).immutableView());
+
+ // Now split the graph g into two (possibly non-disjoint) graphs
+ ArrayListMutableGraph g0mut = new ArrayListMutableGraph();
+ ArrayListMutableGraph g1mut = new ArrayListMutableGraph();
+ g0mut.addNodes((int)g.numNodes()); g1mut.addNodes((int)g.numNodes());
+ NodeIterator nit = g.nodeIterator();
+ while (nit.hasNext()) {
+ int from = (int)nit.nextLong();
+ LazyLongIterator succ = nit.successors();
+ int d = (int)nit.outdegree();
+ while (d-- != 0) {
+ int to = (int)succ.nextLong();
+ if (Math.random() < .5) g0mut.addArc(from, to);
+ else if (Math.random() < .5) g1mut.addArc(from, to);
+ else { g0mut.addArc(from, to); g1mut.addArc(from, to); }
+ }
+ }
+ ImmutableGraph g0 = ImmutableGraph.wrap(g0mut.immutableView());
+ ImmutableGraph g1 = ImmutableGraph.wrap(g1mut.immutableView());
+
+ final File basename0 = BVGraphTest.storeTempGraph(g0);
+ final File basename1 = BVGraphTest.storeTempGraph(g1);
+ // -1 means gamma coding
+ for(int width: WIDTHS) {
+ final String basenameLabel0 = width == -1 ?
+ createGraphWithGammaLabels(basename0, g0) :
+ width < MAX_WIDTH_FOR_FIXED ? createGraphWithFixedWidthLabels(basename0, g0, width) :
+ createGraphWithFixedWidthListLabels(basename0, g0, width - MAX_WIDTH_FOR_FIXED);
+ final String basenameLabel1 = width == -1 ?
+ createGraphWithGammaLabels(basename1, g1) :
+ width < MAX_WIDTH_FOR_FIXED ? createGraphWithFixedWidthLabels(basename1, g1, width) :
+ createGraphWithFixedWidthListLabels(basename1, g1, width - MAX_WIDTH_FOR_FIXED);
+
+
+ System.err.println("Testing arc-labelled union offline...");
+ testLabels((ArcLabelledImmutableGraph) Transform.union(BitStreamArcLabelledImmutableGraph.loadOffline(basenameLabel0), BitStreamArcLabelledImmutableGraph.loadOffline(basenameLabel1)), width % MAX_WIDTH_FOR_FIXED);
+ System.err.println("Testing arc-labelled union standard...");
+ testLabels((ArcLabelledImmutableGraph) Transform.union(BitStreamArcLabelledImmutableGraph.load(basenameLabel0), BitStreamArcLabelledImmutableGraph.load(basenameLabel1)), width % MAX_WIDTH_FOR_FIXED);
+
+ new File(basenameLabel0 + ImmutableGraph.PROPERTIES_EXTENSION).delete();
+ new File(basenameLabel0 + BitStreamArcLabelledImmutableGraph.LABELS_EXTENSION).delete();
+ new File(basenameLabel0 + BitStreamArcLabelledImmutableGraph.LABEL_OFFSETS_EXTENSION).delete();
+ new File(basenameLabel1 + ImmutableGraph.PROPERTIES_EXTENSION).delete();
+ new File(basenameLabel1 + BitStreamArcLabelledImmutableGraph.LABELS_EXTENSION).delete();
+ new File(basenameLabel1 + BitStreamArcLabelledImmutableGraph.LABEL_OFFSETS_EXTENSION).delete();
+ }
+ basename0.delete();
+ deleteGraph(basename0);
+ basename1.delete();
+ deleteGraph(basename1);
+ }
+ }
+ }
+
+ @Test
+ public void testTransposition() throws IOException, IllegalArgumentException, SecurityException {
+ for(int n: new int[] {7}) {
+ for(int type = 0; type < 3; type++) {
+ System.err.println("Testing arc-labelled transposition type " + type + "...");
+ final ImmutableGraph g = ImmutableGraph.wrap(type == 0 ? ArrayListMutableGraph.newCompleteGraph(n, false).immutableView() :
+ type == 1 ? ArrayListMutableGraph.newCompleteBinaryIntree(n).immutableView() :
+ ArrayListMutableGraph.newCompleteBinaryOuttree(n).immutableView());
+ final File basename = BVGraphTest.storeTempGraph(g);
+ // -1 means gamma coding
+ for(int width: WIDTHS) {
+ final String basenameLabel;
+
+ if (width == -1) basenameLabel = createGraphWithGammaLabels(basename, g);
+ else if (width < MAX_WIDTH_FOR_FIXED) basenameLabel = createGraphWithFixedWidthLabels(basename, g, width);
+ else basenameLabel = createGraphWithFixedWidthListLabels(basename, g, width - MAX_WIDTH_FOR_FIXED);
+
+ for (int batchSize: BATCH_SIZES) {
+ ArcLabelledImmutableGraph gt = Transform.transposeOffline(BitStreamArcLabelledImmutableGraph.loadOffline(basenameLabel),
+ batchSize, new File(System.getProperty("java.io.tmpdir")), null);
+
+ ArcLabelledImmutableGraph gtt = Transform.transposeOffline(gt,
+ batchSize, new File(System.getProperty("java.io.tmpdir")), null);
+ System.err.println("Testing with batch size " + batchSize + "...");
+ testLabels(gtt, width % MAX_WIDTH_FOR_FIXED);
+ }
+
+ new File(basenameLabel + ImmutableGraph.PROPERTIES_EXTENSION).delete();
+ new File(basenameLabel + BitStreamArcLabelledImmutableGraph.LABELS_EXTENSION).delete();
+ new File(basenameLabel + BitStreamArcLabelledImmutableGraph.LABEL_OFFSETS_EXTENSION).delete();
+ }
+ basename.delete();
+ deleteGraph(basename);
+ }
+ }
+ }
+
+}
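In `createGraphWithGammaLabels`, the offsets stream starts with `writeGamma(0)` and then records, for each node, the number of label bits written for that node (the sum of the values returned by `writeGamma`). The sketch below recomputes those per-node counts and turns them into absolute bit positions. It is a library-free illustration, not WebGraph code: `GammaOffsetsSketch` and its methods are hypothetical, and it assumes that the dsi-utils gamma code writes a natural number `x` as the Elias gamma code of `x + 1` (so that 0 is representable), which gives a length of `2 * floor(log2(x + 1)) + 1` bits. It also assumes that a reader recovers absolute offsets by summing the stored per-node counts.

```java
import java.util.Arrays;

/** Hypothetical sketch: recomputes how many label bits createGraphWithGammaLabels
 *  writes per node, and the absolute offsets that those per-node counts encode. */
public class GammaOffsetsSketch {

    /** Length in bits of the gamma code of a natural number x >= 0,
     *  assuming x is coded as x + 1 (assumption about the bit-stream library). */
    static int gammaLength(int x) {
        int msb = 31 - Integer.numberOfLeadingZeros(x + 1); // floor(log2(x + 1))
        return 2 * msb + 1;
    }

    /** Bits written for node i: one gamma-coded label i * s + i per successor s. */
    static int labelBits(int i, int[] successors) {
        int bits = 0;
        for (int s : successors) bits += gammaLength(i * s + i);
        return bits;
    }

    /** Turns per-node bit counts into absolute bit offsets; node 0 starts at bit 0. */
    static long[] cumulativeOffsets(int[] bitsPerNode) {
        long[] offsets = new long[bitsPerNode.length + 1];
        for (int i = 0; i < bitsPerNode.length; i++) offsets[i + 1] = offsets[i] + bitsPerNode[i];
        return offsets;
    }

    public static void main(String[] args) {
        // Complete graph on three nodes, without loops.
        int[][] successors = { { 1, 2 }, { 0, 2 }, { 0, 1 } };
        int[] bits = new int[successors.length];
        for (int i = 0; i < successors.length; i++) bits[i] = labelBits(i, successors[i]);
        System.out.println(Arrays.toString(bits));                    // [2, 8, 8]
        System.out.println(Arrays.toString(cumulativeOffsets(bits))); // [0, 2, 10, 18]
    }
}
```

Storing per-node counts rather than absolute positions keeps the gamma-coded numbers small, which is why the offsets file stays compact even for large label streams.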
diff --git a/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/labelling/CSSerializationTest.java b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/labelling/CSSerializationTest.java
new file mode 100644
index 0000000..2e7560b
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/labelling/CSSerializationTest.java
@@ -0,0 +1,147 @@
+package it.unimi.dsi.big.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.big.webgraph.BVGraphTest;
+import it.unimi.dsi.big.webgraph.ImmutableGraph;
+import it.unimi.dsi.big.webgraph.LazyLongIterators;
+import it.unimi.dsi.big.webgraph.WebGraphTestCase;
+import it.unimi.dsi.fastutil.longs.LongBigArrays;
+import it.unimi.dsi.fastutil.longs.LongIterator;
+import it.unimi.dsi.fastutil.objects.ObjectBigArrays;
+import it.unimi.dsi.io.OutputBitStream;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+
+import java.io.File;
+import java.io.FileWriter;
+import java.io.IOException;
+import java.io.PrintWriter;
+
+import org.junit.Test;
+
+public class CSSerializationTest extends WebGraphTestCase {
+
+ private static final int[] SIZES = { 0, 1, 2, 3, 4 };
+ private static final int[] WIDTHS = { 20, 21, 30, 31 };
+
+ public String createGraph(File basename, ImmutableGraph g, int width) throws IllegalArgumentException, SecurityException, IOException {
+ final int n = (int)g.numNodes();
+ System.err.println("Testing " + n + " nodes, width " + width + ", basename " + basename);
+
+ OutputBitStream labels = new OutputBitStream(basename + "-fixedlabel" + BitStreamArcLabelledImmutableGraph.LABELS_EXTENSION);
+ OutputBitStream offsets = new OutputBitStream(basename + "-fixedlabel" + BitStreamArcLabelledImmutableGraph.LABEL_OFFSETS_EXTENSION);
+ Label lab;
+ offsets.writeGamma(0);
+ for(int i = 0; i < n; i++) {
+ int bits = 0;
+ for(LongIterator j = LazyLongIterators.eager(g.successors(i)); j.hasNext();) {
+ int succ = (int)j.nextLong();
+ lab = new FakeCSFixedWidthIntLabel("TEST", width, i * succ + i);
+ bits += lab.toBitStream(labels, i);
+ }
+ offsets.writeGamma(bits);
+ }
+ labels.close();
+ offsets.close();
+
+ PrintWriter pw = new PrintWriter(new FileWriter(basename + "-fixedlabel" + ImmutableGraph.PROPERTIES_EXTENSION));
+ pw.println(ImmutableGraph.GRAPHCLASS_PROPERTY_KEY + " = " + BitStreamArcLabelledImmutableGraph.class.getName());
+ pw.println(BitStreamArcLabelledImmutableGraph.LABELSPEC_PROPERTY_KEY + " = " + FakeCSFixedWidthIntLabel.class.getName() + "(TEST," + width + ")");
+ pw.println(ArcLabelledImmutableGraph.UNDERLYINGGRAPH_PROPERTY_KEY + " = " + basename.getName());
+ pw.close();
+
+ return basename + "-fixedlabel";
+ }
+
+ public void testLabels(ArcLabelledImmutableGraph alg, final int width) {
+
+ final int mask = (int)((1L << width) - 1);
+
+ // Sequential access, iterators
+ for(ArcLabelledNodeIterator nodeIterator = alg.nodeIterator(); nodeIterator.hasNext();) {
+ int curr = (int)nodeIterator.nextLong();
+ ArcLabelledNodeIterator.LabelledArcIterator l = nodeIterator.successors();
+ int d = (int)nodeIterator.outdegree();
+ while(d-- != 0) {
+ int succ = (int)l.nextLong();
+ assertEquals(curr + " -> " + succ,(curr * succ + curr) & mask, l.label().getInt());
+ }
+ }
+
+ // Sequential access, arrays
+ for(ArcLabelledNodeIterator nodeIterator = alg.nodeIterator(); nodeIterator.hasNext();) {
+ int curr = (int)nodeIterator.nextLong();
+ int d = (int)nodeIterator.outdegree();
+ long[][] succ = nodeIterator.successorBigArray();
+ Label[][] label = nodeIterator.labelBigArray();
+ for(int i = 0; i < d; i++)
+ assertEquals(curr + " -> " + LongBigArrays.get(succ, i), (curr * LongBigArrays.get(succ, i) + curr) & mask, ObjectBigArrays.get(label, i).getInt());
+ }
+
+ if (! alg.randomAccess()) return;
+
+ // Random access, iterators
+ for(int curr = 0; curr < alg.numNodes(); curr++) {
+ ArcLabelledNodeIterator.LabelledArcIterator l = alg.successors(curr);
+ int d = (int)alg.outdegree(curr);
+ while(d-- != 0) {
+ int succ = (int)l.nextLong();
+ assertEquals(curr + " -> " + succ ,(curr * succ + curr) & mask, l.label().getInt());
+ }
+ }
+
+ // Random access, arrays
+ for(int curr = 0; curr < alg.numNodes(); curr++) {
+ int d = (int)alg.outdegree(curr);
+ long[][] succ = alg.successorBigArray(curr);
+ Label[][] label = alg.labelBigArray(curr);
+ for(int i = 0; i < d; i++) {
+ assertEquals(curr + " -> " + LongBigArrays.get(succ, i), (curr * LongBigArrays.get(succ, i) + curr) & mask, ObjectBigArrays.get(label, i).getInt());
+ }
+ }
+ }
+
+ @Test
+ public void testLabels() throws IOException, IllegalArgumentException, SecurityException {
+ for(int n: SIZES) {
+ for(int type = 0; type < 3; type++) {
+ System.err.println("Testing type " + type + "...");
+ final ImmutableGraph g = type == 0 ? ImmutableGraph.wrap(ArrayListMutableGraph.newCompleteGraph(n, false).immutableView()) :
+ type == 1 ? ImmutableGraph.wrap(ArrayListMutableGraph.newCompleteBinaryIntree(n).immutableView()) :
+ ImmutableGraph.wrap(ArrayListMutableGraph.newCompleteBinaryOuttree(n).immutableView());
+ final File basename = BVGraphTest.storeTempGraph(g);
+ for(int width: WIDTHS) {
+ final String basenameLabel = createGraph(basename, g, width);
+
+ System.err.println("Testing offline...");
+ testLabels(BitStreamArcLabelledImmutableGraph.loadOffline(basenameLabel), width);
+ System.err.println("Testing standard...");
+ testLabels(BitStreamArcLabelledImmutableGraph.load(basenameLabel), width);
+
+ new File(basenameLabel + ImmutableGraph.PROPERTIES_EXTENSION).delete();
+ new File(basenameLabel + BitStreamArcLabelledImmutableGraph.LABELS_EXTENSION).delete();
+ new File(basenameLabel + BitStreamArcLabelledImmutableGraph.LABEL_OFFSETS_EXTENSION).delete();
+ }
+ basename.delete();
+ deleteGraph(basename);
+ }
+ }
+ }
+}
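The `testLabels` methods compare each decoded label with `(curr * succ + curr) & mask`, so only the low `width` bits of the expected value are checked. A subtle case arises in the labelled-graph test, where gamma-coded labels use `width == -1` and the method receives `width % MAX_WIDTH_FOR_FIXED` (the constant's value is not shown in this excerpt). Because Java's `%` keeps the sign of the dividend, that expression is still `-1`, and because Java uses only the six low bits of a `long` shift distance, `1L << -1` is `1L << 63`; after subtracting 1 and casting to `int`, the mask is all ones. The hypothetical `LabelMaskSketch` below reproduces just the mask arithmetic outside any test framework.

```java
/** Hypothetical sketch of the label masks computed by the testLabels methods. */
public class LabelMaskSketch {

    /** Low-bit mask for a label of the given width. Long shift distances are
     *  taken modulo 64, so width -1 shifts by 63; for widths -1 and 32 the
     *  cast to int keeps 32 one bits, i.e. the mask is -1. */
    static int maskFor(int width) {
        return (int) ((1L << width) - 1);
    }

    /** The value a reader of width-bit labels should return for arc (curr, succ). */
    static int expectedLabel(int curr, int succ, int width) {
        return (curr * succ + curr) & maskFor(width);
    }

    public static void main(String[] args) {
        System.out.println(maskFor(20));             // 1048575
        System.out.println(maskFor(31));             // 2147483647
        System.out.println(maskFor(32));             // -1
        System.out.println(maskFor(-1));             // -1
        System.out.println(expectedLabel(3, 5, 4));  // 18 & 15 = 2
    }
}
```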
diff --git a/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/labelling/FakeCSFixedWidthIntLabel.java b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/labelling/FakeCSFixedWidthIntLabel.java
new file mode 100644
index 0000000..c7f1505
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/labelling/FakeCSFixedWidthIntLabel.java
@@ -0,0 +1,102 @@
+package it.unimi.dsi.big.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+
+import it.unimi.dsi.io.InputBitStream;
+import it.unimi.dsi.io.OutputBitStream;
+
+import java.io.IOException;
+
+/** An integer represented in fixed width that simulates context sensitivity:
+ * when storing label <var>v</var> onto the arc (<var>x</var>,<var>y</var>),
+ * the value <var>v</var>*(<var>x</var>+1) is stored instead, and it is divided by <var>x</var>+1 when read back.
+ * The provided width must be smaller than 32, and the stored value must fit in it.
+ */
+
+public class FakeCSFixedWidthIntLabel extends AbstractIntLabel {
+ /** The bit width used to represent the value of this label. */
+ private final int width;
+
+ /** Creates a new fixed-width int label.
+ *
+ * @param key the (only) key of this label.
+ * @param width the label width (in bits).
+ * @param value the value of this label.
+ */
+ public FakeCSFixedWidthIntLabel(String key, int width, int value) {
+ super(key, value);
+ if (width < 0 || width > 31) throw new IllegalArgumentException("Width out of range: " + width);
+ if (value < 0 || value >= 1L << width) throw new IllegalArgumentException("Value out of range: " + Integer.toString(value));
+ this.width = width;
+ }
+
+ /** Creates a new fixed-width int label of value 0.
+ *
+ * @param key the (only) key of this label.
+ * @param width the label width (in bits).
+ */
+ public FakeCSFixedWidthIntLabel(String key, int width) {
+ this(key, width, 0);
+ }
+
+ /** Creates a new fixed-width integer label using the given key and width
+ * with value 0.
+ *
+ * @param arg two strings containing the key and the width of this label.
+ */
+ public FakeCSFixedWidthIntLabel(String... arg) {
+ this(arg[0], Integer.parseInt(arg[1]));
+ }
+
+ @Override
+ public Label copy() {
+ return new FakeCSFixedWidthIntLabel(key, width, value);
+ }
+
+ /** Returns the width of this label (as provided at construction time).
+ * @return the width of this label.
+ */
+ @Override
+ public int fixedWidth() {
+ return width;
+ }
+
+ @Override
+ public String toString() {
+ return key + ":" + value + " (width:" + width + ")";
+ }
+
+ @Override
+ public int fromBitStream(InputBitStream inputBitStream, long source) throws IOException, UnsupportedOperationException {
+ int v = inputBitStream.readInt(width);
+ value = (int)(v / (source + 1));
+ return width;
+ }
+
+ @Override
+ public int toBitStream(OutputBitStream outputBitStream, long source) throws IOException, UnsupportedOperationException {
+ return outputBitStream.writeInt((int)(source + 1) * value, width);
+ }
+
+ @Override
+ public String toSpec() {
+ return this.getClass().getName() + "(" + key + "," + width + ")";
+ }
+}
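`FakeCSFixedWidthIntLabel` writes `(source + 1) * value` and its `fromBitStream` divides by `source + 1`, so a label decodes correctly only if the reader passes the same source node that the writer used; this is what lets `CSSerializationTest` detect a source mix-up. The library-free sketch below mirrors that arithmetic. `FakeContextSensitiveSketch` is hypothetical, and unlike the original `toBitStream`, which does not check the range, its `encode` rejects stored values that would not fit in `width` bits, a check added here to make the width constraint explicit.

```java
/** Hypothetical, library-free sketch of the FakeCSFixedWidthIntLabel encoding:
 *  value v on an arc leaving node x is stored as v * (x + 1) in `width` bits. */
public class FakeContextSensitiveSketch {

    /** Encodes value for the given source node; rejects results needing more than width bits. */
    static int encode(int value, long source, int width) {
        long stored = (source + 1) * (long) value;
        if (stored >= 1L << width)
            throw new IllegalArgumentException("Stored value " + stored + " needs more than " + width + " bits");
        return (int) stored;
    }

    /** Decodes by dividing by (source + 1), as fromBitStream does. */
    static int decode(int stored, long source) {
        return (int) (stored / (source + 1));
    }

    public static void main(String[] args) {
        int width = 20;
        long source = 3;
        int value = 3 * 2 + 3;                        // label i * succ + i for the arc (3, 2)
        int stored = encode(value, source, width);
        System.out.println(stored);                   // 36
        System.out.println(decode(stored, source));   // 9
    }
}
```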
diff --git a/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/labelling/MoreLabelledTransformTest.java b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/labelling/MoreLabelledTransformTest.java
new file mode 100644
index 0000000..a35a16c
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/labelling/MoreLabelledTransformTest.java
@@ -0,0 +1,147 @@
+package it.unimi.dsi.big.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.big.webgraph.BVGraph;
+import it.unimi.dsi.big.webgraph.ImmutableGraph;
+import it.unimi.dsi.big.webgraph.Transform;
+import it.unimi.dsi.big.webgraph.labelling.ArcLabelledNodeIterator.LabelledArcIterator;
+import it.unimi.dsi.io.OutputBitStream;
+import it.unimi.dsi.logging.ProgressLogger;
+import it.unimi.dsi.webgraph.ArrayListMutableGraph;
+
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.PrintWriter;
+
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+public class MoreLabelledTransformTest {
+
+ private static final Logger LOGGER = LoggerFactory.getLogger(MoreLabelledTransformTest.class);
+
+ @Test
+ public void testTransform() throws IOException, IllegalArgumentException, SecurityException {
+ File f = File.createTempFile("test", "transform");
+ f.delete();
+ f.mkdir();
+ f.deleteOnExit();
+ System.out.println(f);
+ ProgressLogger pl = new ProgressLogger(LOGGER);
+ pl.logInterval = 1;
+
+ // Creates an arc-labelled graph
+ int[][] arcs;
+ ArrayListMutableGraph under = new ArrayListMutableGraph(6, arcs = new int[][] {
+ { 0, 3 }, { 1, 3 }, { 1, 4 }, { 2, 4 }, { 5, 4 }
+ });
+ BVGraph.store(ImmutableGraph.wrap(under.immutableView()), new File(f, "original" + BitStreamArcLabelledImmutableGraph.UNDERLYINGGRAPH_SUFFIX).toString());
+ OutputBitStream obs = new OutputBitStream(new File(f, "original" + BitStreamArcLabelledImmutableGraph.LABELS_EXTENSION).toString());
+ OutputBitStream labobs = new OutputBitStream(new FileOutputStream(new File(f, "original" + BitStreamArcLabelledImmutableGraph.LABEL_OFFSETS_EXTENSION).toString()));
+ long prev = 0;
+ int curr = -1;
+ for (int[] arc: arcs) {
+ while (arc[0] != curr) {
+ labobs.writeGamma((int)(obs.writtenBits() - prev));
+ prev = obs.writtenBits();
+ curr++;
+ }
+ new FixedWidthIntLabel("fake", 8, arc[0] * arc[1]).toBitStream(obs, arc[0]);
+ }
+ labobs.writeGamma((int)(obs.writtenBits() - prev));
+ obs.close();
+ labobs.close();
+ String graphBasename = new File(f, "original").toString();
+ PrintWriter pw = new PrintWriter(graphBasename + ArcLabelledImmutableGraph.PROPERTIES_EXTENSION);
+ pw.println(BitStreamArcLabelledImmutableGraph.UNDERLYINGGRAPH_PROPERTY_KEY + "=original" + BitStreamArcLabelledImmutableGraph.UNDERLYINGGRAPH_SUFFIX);
+ pw.println(ArcLabelledImmutableGraph.GRAPHCLASS_PROPERTY_KEY + "=" + BitStreamArcLabelledImmutableGraph.class.getName());
+ pw.println(BitStreamArcLabelledImmutableGraph.LABELSPEC_PROPERTY_KEY + "=" + FixedWidthIntLabel.class.getName() + "(fake,8,0)");
+ pw.close();
+
+ // We transpose it
+ ArcLabelledImmutableGraph graph = ArcLabelledImmutableGraph.load(graphBasename, pl);
+ ArcLabelledImmutableGraph gT = Transform.transposeOffline(graph, 50, null, null);
+ String baseNameT = graphBasename + "t";
+ BVGraph.store(gT, baseNameT + "-underlying");
+ BitStreamArcLabelledImmutableGraph.store(gT, baseNameT, baseNameT + "-underlying");
+
+ // We reload the transpose
+ gT = ArcLabelledImmutableGraph.load(baseNameT, pl);
+
+ // We merge it with the original one
+ LabelMergeStrategy mergeStrategy = null;
+ ArcLabelledImmutableGraph gU = Transform.union(graph, gT, mergeStrategy);
+ String baseNameU = graphBasename + "u";
+ BVGraph.store(gU, baseNameU + "-underlying");
+ BitStreamArcLabelledImmutableGraph.store(gU, baseNameU, baseNameU + "-underlying");
+
+ // We reload it
+ gU = BitStreamArcLabelledImmutableGraph.load(baseNameU, pl);
+ System.out.println(gU);
+
+ // Here is what we expect to find
+ int[][] expectedSuccessors = new int[][] {
+ { 3 }, // successors of 0
+ { 3, 4 }, // successors of 1
+ { 4 }, // successors of 2
+ { 0, 1 }, // successors of 3
+ { 1, 2, 5 }, // successors of 4
+ { 4 }, // successors of 5
+ };
+ int[][] expectedLabels = new int[][] {
+ { 0 }, // labels of the arcs leaving 0
+ { 3, 4 }, // labels of the arcs leaving 1
+ { 8 }, // labels of the arcs leaving 2
+ { 0, 3 }, // labels of the arcs leaving 3
+ { 4, 8, 20 }, // labels of the arcs leaving 4
+ { 20 }, // labels of the arcs leaving 5
+ };
+ ArcLabelledNodeIterator nit = gU.nodeIterator();
+ while (nit.hasNext()) {
+ int node = (int)nit.nextLong();
+ assertEquals(expectedSuccessors[node].length, nit.outdegree());
+ LabelledArcIterator ait = nit.successors();
+ int d = (int)nit.outdegree();
+ int k = 0;
+ while (d-- != 0) {
+ assertEquals(expectedSuccessors[node][k], ait.nextLong());
+ assertEquals(expectedLabels[node][k], ait.label().getInt());
+ k++;
+ }
+ }
+
+ // Same test, but with iterators requested randomly
+ for (int node = (int)(gU.numNodes() - 1); node >= 0; node--) {
+ LabelledArcIterator ait = gU.successors(node);
+ assertEquals(expectedSuccessors[node].length, gU.outdegree(node));
+ int k = 0;
+ int d = (int)gU.outdegree(node);
+ while (d-- != 0) {
+ assertEquals(expectedSuccessors[node][k], ait.nextLong());
+ assertEquals(expectedLabels[node][k], ait.label().getInt());
+ k++;
+ }
+ }
+ }
+}
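The expected arrays in `testTransform` follow from a simple rule: every original arc (x, y) carries the label x*y, its transposed arc (y, x) carries the same label, and the union contains both. Since no arc in `arcs` has its reverse also present, no two labels ever compete for the same arc, which is consistent with the test passing a `null` `LabelMergeStrategy`. The hypothetical sketch below rebuilds the expected successor lists and labels with plain `TreeMap`s instead of WebGraph classes; note that its `put` would silently overwrite a label on such a collision, whereas a real union would need a merge strategy.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: derives the successors and labels that testTransform
 *  expects for the union of a labelled graph with its transpose. */
public class SymmetrizedLabelsSketch {

    /** Maps each node to a sorted map successor -> label. */
    static Map<Integer, TreeMap<Integer, Integer>> unionWithTranspose(int numNodes, int[][] arcs) {
        Map<Integer, TreeMap<Integer, Integer>> adj = new TreeMap<>();
        for (int v = 0; v < numNodes; v++) adj.put(v, new TreeMap<>());
        for (int[] arc : arcs) {
            int label = arc[0] * arc[1];          // label written by the test for this arc
            adj.get(arc[0]).put(arc[1], label);   // original arc
            adj.get(arc[1]).put(arc[0], label);   // transposed arc keeps the same label
        }
        return adj;
    }

    public static void main(String[] args) {
        int[][] arcs = { { 0, 3 }, { 1, 3 }, { 1, 4 }, { 2, 4 }, { 5, 4 } };
        System.out.println(unionWithTranspose(6, arcs));
        // {0={3=0}, 1={3=3, 4=4}, 2={4=8}, 3={0=0, 1=3}, 4={1=4, 2=8, 5=20}, 5={4=20}}
    }
}
```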
diff --git a/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/labelling/RelabellingTest.java b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/labelling/RelabellingTest.java
new file mode 100644
index 0000000..8995283
--- /dev/null
+++ b/third_party/webgraph-big-3.5.0/test/it/unimi/dsi/big/webgraph/labelling/RelabellingTest.java
@@ -0,0 +1,70 @@
+package it.unimi.dsi.big.webgraph.labelling;
+
+/*
+ * Copyright (C) 2007-2017 Paolo Boldi
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 3 of the License, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+ * for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+import static org.junit.Assert.assertEquals;
+import it.unimi.dsi.big.webgraph.WebGraphTestCase;
+import it.unimi.dsi.big.webgraph.examples.IntegerTriplesArcLabelledImmutableGraph;
+
+import org.junit.Test;
+
+public class RelabellingTest extends WebGraphTestCase {
+
+ @Test
+ public void testIntRelabelling() {
+ // Take a graph and convert from gamma to fixed-width
+ ArcLabelledImmutableGraph gorig = new IntegerTriplesArcLabelledImmutableGraph(new int[][]
+ {
+ { 0, 1, 203 }, { 0, 2, 104 }, { 1, 3, 102 }
+ });
+ ArcLabelledImmutableGraph gfixed = new ArcRelabelledImmutableGraph(gorig, new FixedWidthIntLabel("FOO", 15), ArcRelabelledImmutableGraph.INT_LABEL_CONVERSION_STRATEGY);
+ assertGraph(gorig);
+ assertGraph(gfixed);
+ assertEquals(gorig, gfixed);
+
+ // Convert its labels to lists, digitwise; e.g. 203-> [2,0,3]...
+ ArcLabelledImmutableGraph glist = new ArcRelabelledImmutableGraph(gorig, new FixedWidthIntListLabel("FOO", 15), new ArcRelabelledImmutableGraph.LabelConversionStrategy() {
+ @Override
+ public void convert(Label from, Label to, long source, long target) {
+ String sValue = Integer.toString(((AbstractIntLabel)from).value);
+ int[] s = new int[sValue.length()];
+ for (int i = 0; i < sValue.length(); i++) s[i] = sValue.charAt(i) - '0';
+ ((AbstractIntListLabel)to).value = s;
+ }
+ });
+ // ...and then back to integer, but backwards; e.g. [2,0,3] -> 302...
+ ArcLabelledImmutableGraph grevert = new ArcRelabelledImmutableGraph(glist, new FixedWidthIntLabel("FOO", 15), new ArcRelabelledImmutableGraph.LabelConversionStrategy() {
+ @Override
+ public void convert(Label from, Label to, long source, long target) {
+ int[] v = ((AbstractIntListLabel)from).value;
+ int tot = 0;
+ for (int i = v.length - 1; i >= 0; i--)
+ tot = tot * 10 + v[i];
+ ((AbstractIntLabel)to).value = tot;
+ }
+ });
+ assertGraph(glist);
+ assertGraph(grevert);
+ // Check the result is correct
+ assertEquals(grevert, new IntegerTriplesArcLabelledImmutableGraph(new int[][]
+ {
+ { 0, 1, 302 }, { 0, 2, 401 }, { 1, 3, 201 }
+ }));
+ }
+}
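`testIntRelabelling` chains two `LabelConversionStrategy` instances: the first splits an integer label into its decimal digits, the second reads those digits back from last to first, so 203, 104 and 102 become 302, 401 and 201. The hypothetical `DigitRelabellingSketch` below isolates that arithmetic from the WebGraph label classes. Like the test's conversions, it assumes non-negative labels (a leading `-` would turn into the digit -3), and a label ending in 0 loses that digit when reversed (for example, 120 becomes 21).

```java
import java.util.Arrays;

/** Hypothetical sketch of the two label conversions in testIntRelabelling,
 *  without the WebGraph label classes. Assumes non-negative values. */
public class DigitRelabellingSketch {

    /** 203 -> [2, 0, 3]: one list element per decimal digit. */
    static int[] toDigits(int value) {
        String s = Integer.toString(value);
        int[] digits = new int[s.length()];
        for (int i = 0; i < s.length(); i++) digits[i] = s.charAt(i) - '0';
        return digits;
    }

    /** [2, 0, 3] -> 302: reads the digits from last to first. */
    static int fromDigitsReversed(int[] digits) {
        int total = 0;
        for (int i = digits.length - 1; i >= 0; i--) total = total * 10 + digits[i];
        return total;
    }

    public static void main(String[] args) {
        for (int label : new int[] { 203, 104, 102 }) {
            int[] digits = toDigits(label);
            // Prints 203 -> [2, 0, 3] -> 302, then 104 -> ... -> 401, then 102 -> ... -> 201
            System.out.println(label + " -> " + Arrays.toString(digits) + " -> " + fromDigitsReversed(digits));
        }
    }
}
```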
