diff --git a/NEWS b/NEWS
index 330f8db6..26d04f81 100644
--- a/NEWS
+++ b/NEWS
@@ -1,1704 +1,1707 @@
0.19.0 UNRELEASED
BUG FIXES
* Make `dulwich.archive` set the gzip header file modification time so that
archives created from the same Git tree are always identical.
(#577, Jonas Haag)
* Allow comment characters (#, ;) within configuration file strings
(Daniel Andersson, #579)
* Raise exception when passing in invalid author/committer values
to Repo.do_commit(). (Jelmer Vernooij, #602)
IMPROVEMENTS
* Add a fastimport ``extra``. (Jelmer Vernooij)
* Start writing reflog entries. (Jelmer Vernooij)
* Add ability to use password and keyfile ssh options with SSHVendor. (Filipp Kucheryavy)
+ * Add ``change_type_same`` flag to ``tree_changes``.
+ (Jelmer Vernooij)
+
API CHANGES
* ``GitClient.send_pack`` now accepts a ``generate_pack_data`` function
rather than a ``generate_pack_contents`` function, for performance
reasons. (Jelmer Vernooij)
* Dulwich now uses urllib3 internally for HTTP requests.
The `opener` argument to `dulwich.client.HttpGitClient` that took a
`urllib2` opener instance has been replaced by a `pool_manager` argument
that takes a `urllib3` pool manager instance.
(Daniel Andersson)
0.18.6 2017-11-11
BUG FIXES
* Fix handling of empty repositories in ``porcelain.clone``.
(#570, Jelmer Vernooij)
* Raise an error when attempting to add paths that are not under the
repository. (Jelmer Vernooij)
* Fix error message for missing trailing ]. (Daniel Andersson)
* Raise EmptyFileException when corruption (in the form of an empty
file) is detected. (Antoine R. Dumont, #582)
IMPROVEMENTS
* Enforce date field parsing consistency. This also adds checks on
those date fields for potential overflow.
(Antoine R. Dumont, #567)
0.18.5 2017-10-29
BUG FIXES
* Fix cwd for hooks. (Fabian Grünbichler)
* Fix setting of origin in config when non-standard origin is passed into
``Repo.clone``. (Kenneth Lareau, #565)
* Prevent setting SSH arguments from SSH URLs when using SSH through a
subprocess. Note that Dulwich doesn't support cloning submodules.
(CVE-2017-16228) (Jelmer Vernooij)
IMPROVEMENTS
* Silently ignore directories in ``Repo.stage``.
(Jelmer Vernooij, #564)
API CHANGES
* GitFile now raises ``FileLocked`` when encountering a lock
rather than OSError(EEXIST). (Jelmer Vernooij)
0.18.4 2017-10-01
BUG FIXES
* Make default User-Agent start with "git/" because GitHub won't respond to
HTTP smart server requests otherwise (and replies with a 404 instead).
(Jelmer Vernooij, #562)
0.18.3 2017-09-03
BUG FIXES
* Read config during porcelain operations that involve remotes.
(Jelmer Vernooij, #545)
* Fix headers of empty chunks in unified diffs. (Taras Postument, #543)
* Properly follow redirects over HTTP. (Jelmer Vernooij, #117)
IMPROVEMENTS
* Add ``dulwich.porcelain.update_head``. (Jelmer Vernooij, #439)
* ``GitClient.fetch_pack`` now returns symrefs.
(Jelmer Vernooij, #485)
* The server now supports providing symrefs.
(Jelmer Vernooij, #485)
* Add ``dulwich.object_store.commit_tree_changes`` to incrementally
commit changes to a tree structure. (Jelmer Vernooij)
* Add basic ``PackBasedObjectStore.repack`` method.
(Jelmer Vernooij, Earl Chew, #296, #549, #552)
0.18.2 2017-08-01
TEST FIXES
* Use constant timestamp so tests pass in all timezones, not just BST.
(Jelmer Vernooij)
0.18.1 2017-07-31
BUG FIXES
* Fix syntax error in dulwich.contrib.test_swift_smoke.
(Jelmer Vernooij)
0.18.0 2017-07-31
BUG FIXES
* Fix remaining tests on Windows. (Jelmer Vernooij, #493)
* Fix build of C extensions with Python 3 on Windows.
(Jelmer Vernooij)
* Pass 'mkdir' argument onto Repo.init_bare in Repo.clone.
(Jelmer Vernooij, #504)
* In ``dulwich.porcelain.add``, if no files are specified,
add from current working directory rather than repository root.
(Jelmer Vernooij, #521)
* Properly deal with submodules in 'porcelain.status'.
(Jelmer Vernooij, #517)
* ``dulwich.porcelain.remove`` now actually removes files from
disk, not just from the index. (Jelmer Vernooij, #488)
* Fix handling of "reset" command with markers and without
"from". (Antoine Pietri)
* Fix handling of "merge" command with markers. (Antoine Pietri)
* Support treeish argument to porcelain.reset(), rather than
requiring a ref/commit id. (Jelmer Vernooij)
* Handle race condition when mtime doesn't change between writes/reads.
(Jelmer Vernooij, #541)
* Fix ``dulwich.porcelain.show`` on commits with Python 3.
(Jelmer Vernooij, #532)
IMPROVEMENTS
* Add basic support for reading ignore files in ``dulwich.ignore``.
``dulwich.porcelain.add`` and ``dulwich.porcelain.status`` now honor
ignores. (Jelmer Vernooij, Segev Finer, #524, #526)
* New ``dulwich.porcelain.check_ignore`` command.
(Jelmer Vernooij)
* ``dulwich.porcelain.status`` now supports an ``ignored`` argument.
(Jelmer Vernooij)
DOCUMENTATION
* Clarified docstrings for Client.{send_pack,fetch_pack} implementations.
(Jelmer Vernooij, #523)
0.17.3 2017-03-20
PLATFORM SUPPORT
* List Python 3.3 as supported. (Jelmer Vernooij, #513)
BUG FIXES
* Fix compatibility with pypy 3. (Jelmer Vernooij)
0.17.2 2017-03-19
BUG FIXES
* Add workaround for
https://bitbucket.org/pypy/pypy/issues/2499/cpyext-pystring_asstring-doesnt-work,
fixing Dulwich when used with C extensions on pypy < 5.6. (Victor Stinner)
* Properly quote config values with a '#' character in them.
(Jelmer Vernooij, #511)
0.17.1 2017-03-01
IMPROVEMENTS
* Add basic 'dulwich pull' command. (Jelmer Vernooij)
BUG FIXES
* Cope with existing submodules during pull.
(Jelmer Vernooij, #505)
0.17.0 2017-03-01
TEST FIXES
* Skip test that requires sync to synchronize filesystems if os.sync is
not available. (Koen Martens)
IMPROVEMENTS
* Implement MemoryRepo.{set_description,get_description}.
(Jelmer Vernooij)
* Raise exception in Repo.stage() when absolute paths are
passed in. Allow passing in relative paths to
porcelain.add(). (Jelmer Vernooij)
BUG FIXES
* Handle multi-line quoted values in config files.
(Jelmer Vernooij, #495)
* Allow porcelain.clone of repository without HEAD.
(Jelmer Vernooij, #501)
* Support passing tag ids to Walker()'s include argument.
(Jelmer Vernooij)
* Don't strip trailing newlines from extra headers.
(Nicolas Dandrimont)
* Set bufsize=0 for subprocess interaction with SSH client.
Fixes hangs on Python 3. (René Stern, #434)
* Don't drop first slash for SSH paths, except for those
starting with "~". (Jelmer Vernooij, René Stern, #463)
* Properly log off after retrieving just refs.
(Jelmer Vernooij)
0.16.3 2017-01-14
TEST FIXES
* Remove racy check that relies on clock time changing between writes.
(Jelmer Vernooij)
IMPROVEMENTS
* Add porcelain.remote_add. (Jelmer Vernooij)
0.16.2 2017-01-14
IMPROVEMENTS
* Fixed failing test cases on Windows.
(Koen Martens)
API CHANGES
* Repo is now a context manager, so that it can be easily
closed using a ``with`` statement. (Søren Løvborg)
TEST FIXES
* Only run worktree list compat tests against git 2.7.0,
when 'git worktree list' was introduced. (Jelmer Vernooij)
BUG FIXES
* Ignore filemode when building index when core.filemode
is false.
(Koen Martens)
* Initialize core.filemode configuration setting by
probing the filesystem for trustable permissions.
(Koen Martens)
* Fix ``porcelain.reset`` to respect the committish argument.
(Koen Martens)
* Fix dulwich.porcelain.ls_remote() on Python 3.
(#471, Jelmer Vernooij)
* Allow both unicode and byte strings for host paths
in dulwich.client. (#435, Jelmer Vernooij)
* Add remote from porcelain.clone. (#466, Jelmer Vernooij)
* Fix unquoting of credentials before passing to urllib2.
(#475, Volodymyr Holovko)
* Cope with submodules in `build_index_from_tree`.
(#477, Jelmer Vernooij)
* Handle deleted files in `get_unstaged_changes`.
(#483, Doug Hellmann)
* Don't overwrite files when they haven't changed in
`build_file_from_blob`.
(#479, Benoît HERVIER)
* Check for existence of index file before opening pack.
Fixes a race when new packs are being added.
(#482, wme)
0.16.1 2016-12-25
BUG FIXES
* Fix python3 compatibility for dulwich.contrib.release_robot.
(Jelmer Vernooij)
0.16.0 2016-12-24
IMPROVEMENTS
* Add support for worktrees. See `git-worktree(1)` and
`gitrepository-layout(5)`. (Laurent Rineau)
* Add support for `commondir` file in Git control
directories. (Laurent Rineau)
* Add support for passwords in HTTP URLs.
(Jon Bain, Mika Mäenpää)
* Add `release_robot` script to contrib,
allowing easy finding of current version based on Git tags.
(Mark Mikofski)
* Add ``Blob.splitlines`` method.
(Jelmer Vernooij)
BUG FIXES
* Fix handling of ``Commit.tree`` being set to an actual
tree object rather than a tree id. (Jelmer Vernooij)
* Return remote refs from LocalGitClient.fetch_pack(),
consistent with the documentation for that method.
(#461, Jelmer Vernooij)
* Fix handling of unknown URL schemes in get_transport_and_path.
(#465, Jelmer Vernooij)
0.15.0 2016-10-09
BUG FIXES
* Allow missing trailing LF when reading service name from
HTTP servers. (Jelmer Vernooij, Andrew Shadura, #442)
* Fix dulwich.porcelain.pull() on Python3. (Jelmer Vernooij, #451)
* Properly pull in tags during dulwich.porcelain.clone.
(Jelmer Vernooij, #408)
CHANGES
* Changed license from "GNU General Public License, version 2.0 or later"
to "Apache License, version 2.0 or later or GNU General Public License,
version 2.0 or later". (#153)
IMPROVEMENTS
* Add ``dulwich.porcelain.ls_tree`` implementation. (Jelmer Vernooij)
0.14.1 2016-07-05
BUG FIXES
* Fix regression removing untouched refs when pushing over SSH.
(Jelmer Vernooij #441)
* Skip Python3 tests for SWIFT contrib module, as it has not yet
been ported.
0.14.0 2016-07-03
BUG FIXES
* Fix ShaFile.id after modification of a copied ShaFile.
(Félix Mattrat, Jelmer Vernooij)
* Support removing refs from porcelain.push.
(Jelmer Vernooij, #437)
* Stop magic protocol ref `capabilities^{}` from leaking out
to clients. (Jelmer Vernooij, #254)
IMPROVEMENTS
* Add `dulwich.config.parse_submodules` function.
* Add `RefsContainer.follow` method. (#438)
0.13.0 2016-04-24
IMPROVEMENTS
* Support `ssh://` URLs in get_transport_and_path_from_url().
(Jelmer Vernooij, #402)
* Support missing empty line after headers in Git commits and tags.
(Nicolas Dandrimont, #413)
* Fix `dulwich.porcelain.status` when used in empty trees.
(Jelmer Vernooij, #415)
* Return copies of objects in MemoryObjectStore rather than
references, making the behaviour more consistent with that of
DiskObjectStore. (Félix Mattrat, Jelmer Vernooij)
* Fix ``dulwich.web`` on Python3. (#295, Jonas Haag)
CHANGES
* Drop support for Python 2.6.
* Fix python3 client web support. (Jelmer Vernooij)
BUG FIXES
* Fix hang on Gzip decompression. (Jonas Haag)
* Don't rely on working tell() and seek() methods
on wsgi.input. (Jonas Haag)
* Support fastexport/fastimport functionality on python3 with newer
versions of fastimport (>= 0.9.5). (Jelmer Vernooij, Félix Mattrat)
0.12.0 2015-12-13
IMPROVEMENTS
* Add a `dulwich.archive` module that can create tarballs.
Based on code from Jonas Haag in klaus.
* Add a `dulwich.reflog` module for reading and writing reflogs.
(Jelmer Vernooij)
* Fix handling of ambiguous refs in `parse_ref` to make
it match the behaviour described in https://git-scm.com/docs/gitrevisions.
(Chris Bunney)
* Support Python3 in C modules. (Lele Gaifax)
BUG FIXES
* Simplify handling of SSH command invocation.
Fixes quoting of paths. Thanks, Thomas Liebetraut. (#384)
* Fix inconsistent handling of trailing slashes for DictRefsContainer. (#383)
* Add hack to support thin packs during fetch(), albeit while requiring the
entire pack file to be loaded into memory. (jsbain)
CHANGES
* This will be the last release to support Python 2.6.
0.11.2 2015-09-18
IMPROVEMENTS
* Add support for agent= capability. (Jelmer Vernooij, #298)
* Add support for quiet capability. (Jelmer Vernooij)
CHANGES
* The ParamikoSSHVendor class has been moved to
dulwich.contrib.paramiko_vendor, as it's currently untested.
(Jelmer Vernooij, #364)
0.11.1 2015-09-13
Fix-up release to exclude broken blame.py file.
0.11.0 2015-09-13
IMPROVEMENTS
* Extended Python3 support to most of the codebase.
(Gary van der Merwe, Jelmer Vernooij)
* The `Repo` object has a new `close` method that can be called to close any
open resources. (Gary van der Merwe)
* Support 'git.bat' in SubprocessGitClient on Windows.
(Stefan Zimmermann)
* Advertise 'ofs-delta' capability in receive-pack server side
capabilities. (Jelmer Vernooij)
* Switched `default_local_git_client_cls` to `LocalGitClient`.
(Gary van der Merwe)
* Add `porcelain.ls_remote` and `GitClient.get_refs`.
(Michael Edgar)
* Add `Repo.discover` method. (B. M. Corser)
* Add `dulwich.objectspec.parse_refspec`. (Jelmer Vernooij)
* Add `porcelain.pack_objects` and `porcelain.repack`.
(Jelmer Vernooij)
BUG FIXES
* Fix handling of 'done' in graph walker and implement the
'no-done' capability. (Tommy Yu, #88)
* Avoid recursion limit issues resolving deltas. (William Grant, #81)
* Allow arguments in local client binary path overrides.
(Jelmer Vernooij)
* Fix handling of commands with arguments in paramiko SSH
client. (Andreas Klöckner, Jelmer Vernooij, #363)
* Fix parsing of quoted strings in configs. (Jelmer Vernooij, #305)
0.10.1 2015-03-25
BUG FIXES
* Return `ApplyDeltaError` when encountering delta errors
in both C extensions and native delta application code.
(Jelmer Vernooij, #259)
0.10.0 2015-03-22
BUG FIXES
* In dulwich.index.build_index_from_tree, by default
refuse to create entries that start with .git/.
* Fix running of testsuite when installed.
(Jelmer Vernooij, #223)
* Use a block cache in _find_content_rename_candidates(),
improving performance. (Mike Williams)
* Add support for ``core.protectNTFS`` setting.
(Jelmer Vernooij)
* Fix TypeError when fetching empty updates.
(Hwee Miin Koh)
* Resolve delta refs when pulling into a MemoryRepo.
(Max Shawabkeh, #256)
* Fix handling of tags of non-commits in missing object finder.
(Augie Fackler, #211)
* Explicitly disable mmap on plan9 where it doesn't work.
(Jeff Sickel)
IMPROVEMENTS
* New public method `Repo.reset_index`. (Jelmer Vernooij)
* Prevent duplicate parsing of loose files in objects
directory when reading. Thanks to David Keijser for the
report. (Jelmer Vernooij, #231)
0.9.9 2015-03-20
SECURITY BUG FIXES
* Fix buffer overflow in C implementation of pack apply_delta().
(CVE-2015-0838)
Thanks to Ivan Fratric of the Google Security Team for
reporting this issue.
(Jelmer Vernooij)
0.9.8 2014-11-30
BUG FIXES
* Various fixes to improve test suite running on Windows.
(Gary van der Merwe)
* Limit delta copy length to 64K in v2 pack files. (Robert Brown)
* Strip newline from final ACKed SHA while fetching packs.
(Michael Edgar)
* Remove assignment to PyList_SIZE() that was causing segfaults on
pypy. (Jelmer Vernooij, #196)
IMPROVEMENTS
* Add porcelain 'receive-pack' and 'upload-pack'. (Jelmer Vernooij)
* Handle SIGINT signals in bin/dulwich. (Jelmer Vernooij)
* Add 'status' support to bin/dulwich. (Jelmer Vernooij)
* Add 'branch_create', 'branch_list', 'branch_delete' porcelain.
(Jelmer Vernooij)
* Add 'fetch' porcelain. (Jelmer Vernooij)
* Add 'tag_delete' porcelain. (Jelmer Vernooij)
* Add support for serializing/deserializing 'gpgsig' attributes in Commit.
(Jelmer Vernooij)
CHANGES
* dul-web is now available as 'dulwich web-daemon'.
(Jelmer Vernooij)
* dulwich.porcelain.tag has been renamed to tag_create.
dulwich.porcelain.list_tags has been renamed to tag_list.
(Jelmer Vernooij)
API CHANGES
* Restore support for Python 2.6. (Jelmer Vernooij, Gary van der Merwe)
0.9.7 2014-06-08
BUG FIXES
* Fix tests dependent on hash ordering. (Michael Edgar)
* Support staging symbolic links in Repo.stage.
(Robert Brown)
* Ensure that all file objects are closed when running the test suite.
(Gary van der Merwe)
* When writing OFS_DELTA pack entries, write correct offset.
(Augie Fackler)
* Fix handling of larger copy operations in packs. (Augie Fackler)
* Various fixes to improve test suite running on Windows.
(Gary van der Merwe)
* Fix logic for extra adds of identical files in rename detector.
(Robert Brown)
IMPROVEMENTS
* Add porcelain 'status'. (Ryan Faulkner)
* Add porcelain 'daemon'. (Jelmer Vernooij)
* Add `dulwich.greenthreads` module which provides support
for concurrency of some object store operations.
(Fabien Boucher)
* Various changes to improve compatibility with Python 3.
(Gary van der Merwe, Hannu Valtonen, michael-k)
* Add OpenStack Swift backed repository implementation
in dulwich.contrib. See README.swift for details. (Fabien Boucher)
API CHANGES
* An optional close function can be passed to the Protocol class. This will
be called by its close method. (Gary van der Merwe)
* All classes with close methods are now context managers, so that they can
be easily closed using a `with` statement. (Gary van der Merwe)
* Remove deprecated `num_objects` argument to `write_pack` methods.
(Jelmer Vernooij)
OTHER CHANGES
* The 'dul-daemon' script has been removed. The same functionality
is now available as 'dulwich daemon'. (Jelmer Vernooij)
0.9.6 2014-04-23
IMPROVEMENTS
* Add support for recursive add in 'git add'.
(Ryan Faulkner, Jelmer Vernooij)
* Add porcelain 'list_tags'. (Ryan Faulkner)
* Add porcelain 'push'. (Ryan Faulkner)
* Add porcelain 'pull'. (Ryan Faulkner)
* Support 'http.proxy' in HttpGitClient.
(Jelmer Vernooij, #1096030)
* Support 'http.useragent' in HttpGitClient.
(Jelmer Vernooij)
* In server, wait for clients to send empty list of
wants when talking to empty repository.
(Damien Tournoud)
* Various changes to improve compatibility with
Python 3. (Gary van der Merwe)
BUG FIXES
* Support unseekable 'wsgi.input' streams.
(Jonas Haag)
* Raise TypeError when passing unicode() object
to Repo.__getitem__.
(Jonas Haag)
* Fix handling of `reset` command in dulwich.fastexport.
(Jelmer Vernooij, #1249029)
* In client, don't wait for server to close connection
first. Fixes hang when used against GitHub
server implementation. (Siddharth Agarwal)
* DeltaChainIterator: fix a corner case where an object is inflated as an
object already in the repository.
(Damien Tournoud, #135)
* Stop leaking file handles during pack reload. (Damien Tournoud)
* Avoid reopening packs during pack cache reload. (Jelmer Vernooij)
API CHANGES
* Drop support for Python 2.6. (Jelmer Vernooij)
0.9.5 2014-02-23
IMPROVEMENTS
* Add porcelain 'tag'. (Ryan Faulkner)
* New module `dulwich.objectspec` for parsing strings referencing
objects and commit ranges. (Jelmer Vernooij)
* Add shallow branch support. (milki)
* Allow passing urllib2 `opener` into HttpGitClient.
(Dov Feldstern, #909037)
CHANGES
* Drop support for Python 2.4 and 2.5. (Jelmer Vernooij)
API CHANGES
* Remove long deprecated ``Repo.commit``, ``Repo.get_blob``,
``Repo.tree`` and ``Repo.tag``. (Jelmer Vernooij)
* Remove long deprecated ``Repo.revision_history`` and ``Repo.ref``.
(Jelmer Vernooij)
* Remove long deprecated ``Tree.entries``. (Jelmer Vernooij)
BUG FIXES
* Raise KeyError rather than TypeError when passing in
unicode object of length 20 or 40 to Repo.__getitem__.
(Jelmer Vernooij)
* Use 'rm' rather than 'unlink' in tests, since the latter
does not exist on OpenBSD and other platforms.
(Dmitrij D. Czarkoff)
0.9.4 2013-11-30
IMPROVEMENTS
* Add ssh_kwargs attribute to ParamikoSSHVendor. (milki)
* Add Repo.set_description(). (Víðir Valberg Guðmundsson)
* Add a basic `dulwich.porcelain` module. (Jelmer Vernooij, Marcin Kuzminski)
* Various performance improvements for object access.
(Jelmer Vernooij)
* New function `get_transport_and_path_from_url`,
similar to `get_transport_and_path` but only
supports URLs.
(Jelmer Vernooij)
* Add support for file:// URLs in `get_transport_and_path_from_url`.
(Jelmer Vernooij)
* Add LocalGitClient implementation.
(Jelmer Vernooij)
BUG FIXES
* Support filesystems with 64-bit inode and device numbers.
(André Roth)
CHANGES
* Ref handling has been moved to dulwich.refs.
(Jelmer Vernooij)
API CHANGES
* Remove long deprecated RefsContainer.set_ref().
(Jelmer Vernooij)
* Repo.ref() is now deprecated in favour of Repo.refs[].
(Jelmer Vernooij)
FEATURES
* Add support for graftpoints. (milki)
0.9.3 2013-09-27
BUG FIXES
* Fix path for stdint.h in MANIFEST.in. (Jelmer Vernooij)
0.9.2 2013-09-26
BUG FIXES
* Include stdint.h in MANIFEST.in (Mark Mikofski)
0.9.1 2013-09-22
BUG FIXES
* Support lookups of 40-character refs in BaseRepo.__getitem__. (Chow Loong Jin, Jelmer Vernooij)
* Fix fetching packs with side-band-64k capability disabled. (David Keijser, Jelmer Vernooij)
* Several fixes in send-pack protocol behaviour - handling of empty pack files and deletes.
(milki, #1063087)
* Fix capability negotiation when fetching packs over HTTP.
(#1072461, William Grant)
* Enforce determine_wants returning an empty list rather than None. (Fabien Boucher, Jelmer Vernooij)
* In the server, support pushes just removing refs. (Fabien Boucher, Jelmer Vernooij)
IMPROVEMENTS
* Support passing a single revision to BaseRepo.get_walker() rather than a list of revisions.
(Alberto Ruiz)
* Add `Repo.get_description` method. (Jelmer Vernooij)
* Support thin packs in Pack.iterobjects() and Pack.get_raw().
(William Grant)
* Add `MemoryObjectStore.add_pack` and `MemoryObjectStore.add_thin_pack` methods.
(David Bennett)
* Add paramiko-based SSH vendor. (Aaron O'Mullan)
* Support running 'dulwich.server' and 'dulwich.web' using 'python -m'.
(Jelmer Vernooij)
* Add ObjectStore.close(). (Jelmer Vernooij)
* Raise appropriate NotImplementedError when encountering dumb HTTP servers.
(Jelmer Vernooij)
API CHANGES
* SSHVendor.connect_ssh has been renamed to SSHVendor.run_command.
(Jelmer Vernooij)
* ObjectStore.add_pack() now returns a 3-tuple. The last element will be an
abort() method that can be used to cancel the pack operation.
(Jelmer Vernooij)
0.9.0 2013-05-31
BUG FIXES
* Push efficiency - report missing objects only. (#562676, Artem Tikhomirov)
* Use indentation consistent with C Git in config files.
(#1031356, Curt Moore, Jelmer Vernooij)
* Recognize and skip binary files in diff function.
(Takeshi Kanemoto)
* Fix handling of relative paths in dulwich.client.get_transport_and_path.
(Brian Visel, #1169368)
* Preserve ordering of entries in configuration.
(Benjamin Pollack)
* Support ~ expansion in SSH client paths. (milki, #1083439)
* Support relative paths in alternate paths.
(milki, Michel Lespinasse, #1175007)
* Log all error messages from wsgiref server to the logging module. This
makes the test suite quiet again. (Gary van der Merwe)
* Support passing None for empty tree in changes_from_tree.
(Kevin Watters)
* Support fetching empty repository in client. (milki, #1060462)
IMPROVEMENTS
* Add optional honor_filemode flag to build_index_from_tree.
(Mark Mikofski)
* Support the core.filemode setting when building trees. (Jelmer Vernooij)
* Add chapter on tags in tutorial. (Ryan Faulkner)
FEATURES
* Add support for mergetags. (milki, #963525)
* Add support for posix shell hooks. (milki)
0.8.7 2012-11-27
BUG FIXES
* Fix use of alternates in ``DiskObjectStore``.{__contains__,__iter__}.
(Dmitriy)
* Fix compatibility with Python 2.4. (David Carr)
0.8.6 2012-11-09
API CHANGES
* dulwich.__init__ no longer imports client, protocol, repo and
server modules. (Jelmer Vernooij)
FEATURES
* ConfigDict now behaves more like a dictionary.
(Adam 'Cezar' Jenkins, issue #58)
* HTTPGitApplication now takes an optional
`fallback_app` argument. (Jonas Haag, issue #67)
* Support for large pack index files. (Jameson Nash)
TESTING
* Make index entry tests a little bit less strict, to cope with
slightly different behaviour on various platforms.
(Jelmer Vernooij)
* ``setup.py test`` (available when setuptools is installed) now
runs all tests, not just the basic unit tests.
(Jelmer Vernooij)
BUG FIXES
* Commit._deserialize now actually deserializes the current state rather than
the previous one. (Yifan Zhang, issue #59)
* Handle None elements in lists of TreeChange objects. (Alex Holmes)
* Support cloning repositories without HEAD set.
(D-Key, Jelmer Vernooij, issue #69)
* Support ``MemoryRepo.get_config``. (Jelmer Vernooij)
* In ``get_transport_and_path``, pass extra keyword arguments on to
HttpGitClient. (Jelmer Vernooij)
0.8.5 2012-03-29
BUG FIXES
* Avoid use of 'with' in dulwich.index. (Jelmer Vernooij)
* Be a little bit strict about OS behaviour in index tests.
Should fix the tests on Debian GNU/kFreeBSD. (Jelmer Vernooij)
0.8.4 2012-03-28
BUG FIXES
* Options on the same line as sections in config files are now supported.
(Jelmer Vernooij, #920553)
* Only negotiate capabilities that are also supported by the server.
(Rod Cloutier, Risto Kankkunen)
* Fix parsing of invalid timezone offsets with two minus signs.
(Jason R. Coombs, #697828)
* Reset environment variables during tests, to avoid
test isolation leaks reading ~/.gitconfig. (Risto Kankkunen)
TESTS
* $HOME is now explicitly specified for tests that use it to read
``~/.gitconfig``, to prevent test isolation issues.
(Jelmer Vernooij, #920330)
FEATURES
* Additional arguments to get_transport_and_path are now passed
on to the constructor of the transport. (Sam Vilain)
* The WSGI server now transparently handles when a git client submits data
using Content-Encoding: gzip.
(David Blewett, Jelmer Vernooij)
* Add dulwich.index.build_index_from_tree(). (milki)
0.8.3 2012-01-21
FEATURES
* The config parser now supports the git-config file format as
described in git-config(1) and can write git config files.
(Jelmer Vernooij, #531092, #768687)
* ``Repo.do_commit`` will now use the user identity from
.git/config or ~/.gitconfig if none was explicitly specified.
(Jelmer Vernooij)
BUG FIXES
* Allow ``determine_wants`` methods to include the zero sha in their
return value. (Jelmer Vernooij)
0.8.2 2011-12-18
BUG FIXES
* Cope with different zlib buffer sizes in sha1 file parser.
(Jelmer Vernooij)
* Fix get_transport_and_path for HTTP/HTTPS URLs.
(Bruno Renié)
* Avoid calling free_objects() on NULL in error cases. (Chris Eberle)
* Fix use of the --bare argument to 'dulwich init'. (Chris Eberle)
* Properly abort connections when the determine_wants function
raises an exception. (Jelmer Vernooij, #856769)
* Tweak xcodebuild hack to deal with more error output.
(Jelmer Vernooij, #903840)
FEATURES
* Add support for retrieving tarballs from remote servers.
(Jelmer Vernooij, #379087)
* New method ``update_server_info`` which generates data
for dumb server access. (Jelmer Vernooij, #731235)
0.8.1 2011-10-31
FEATURES
* Repo.do_commit has a new argument 'ref'.
* Repo.do_commit has a new argument 'merge_heads'. (Jelmer Vernooij)
* New ``Repo.get_walker`` method. (Jelmer Vernooij)
* New ``Repo.clone`` method. (Jelmer Vernooij, #725369)
* ``GitClient.send_pack`` now supports the 'side-band-64k' capability.
(Jelmer Vernooij)
* ``HttpGitClient`` which supports the smart server protocol over
HTTP. "dumb" access is not yet supported. (Jelmer Vernooij, #373688)
* Add basic support for alternates. (Jelmer Vernooij, #810429)
CHANGES
* unittest2 or python >= 2.7 is now required for the testsuite.
testtools is no longer supported. (Jelmer Vernooij, #830713)
BUG FIXES
* Fix compilation with older versions of MSVC. (Martin gz)
* Special case 'refs/stash' as a valid ref. (Jelmer Vernooij, #695577)
* Smart protocol clients can now change refs even if they are
not uploading new data. (Jelmer Vernooij, #855993)
* Don't compile C extensions when running in pypy.
(Ronny Pfannschmidt, #881546)
* Use different name for strnlen replacement function to avoid clashing
with system strnlen. (Jelmer Vernooij, #880362)
API CHANGES
* ``Repo.revision_history`` is now deprecated in favor of ``Repo.get_walker``.
(Jelmer Vernooij)
0.8.0 2011-08-07
FEATURES
* New DeltaChainIterator abstract class for quickly iterating all objects in
a pack, with implementations for pack indexing and inflation.
(Dave Borowitz)
* New walk module with a Walker class for customizable commit walking.
(Dave Borowitz)
* New tree_changes_for_merge function in diff_tree. (Dave Borowitz)
* Easy rename detection in RenameDetector even without find_copies_harder.
(Dave Borowitz)
BUG FIXES
* Avoid storing all objects in memory when writing pack.
(Jelmer Vernooij, #813268)
* Support IPv6 for git:// connections. (Jelmer Vernooij, #801543)
* Improve performance of Repo.revision_history(). (Timo Schmid, #535118)
* Fix use of SubprocessWrapper on Windows. (Paulo Madeira, #670035)
* Fix compilation on newer versions of Mac OS X (Lion and up). (Ryan McKern, #794543)
* Prevent raising ValueError for correct refs in RefsContainer.__delitem__.
* Correctly return a tuple from MemoryObjectStore.get_raw. (Dave Borowitz)
* Fix a bug in reading the pack checksum when there are fewer than 20 bytes
left in the buffer. (Dave Borowitz)
* Support ~ in git:// URL paths. (Jelmer Vernooij, #813555)
* Make ShaFile.__eq__ work when other is not a ShaFile. (Dave Borowitz)
* ObjectStore.get_graph_walker() now no longer yields the same
revision more than once. This has a significant improvement for
performance when wide revision graphs are involved.
(Jelmer Vernooij, #818168)
* Teach ReceivePackHandler how to read empty packs. (Dave Borowitz)
* Don't send a pack with duplicates of the same object. (Dave Borowitz)
* Teach the server how to serve a clone of an empty repo. (Dave Borowitz)
* Correctly advertise capabilities during receive-pack. (Dave Borowitz)
* Fix add/add and add/rename conflicts in tree_changes_for_merge.
(Dave Borowitz)
* Use correct MIME types in web server. (Dave Borowitz)
API CHANGES
* write_pack no longer takes the num_objects argument and requires an object
to be passed in that is iterable (rather than an iterator) and that
provides __len__. (Jelmer Vernooij)
* write_pack_data has been renamed to write_pack_objects and no longer takes a
num_objects argument. (Jelmer Vernooij)
* take_msb_bytes, read_zlib_chunks, unpack_objects, and
PackStreamReader.read_objects now take an additional argument indicating a
crc32 to compute. (Dave Borowitz)
* PackObjectIterator was removed; its functionality is still exposed by
PackData.iterobjects. (Dave Borowitz)
* Add a sha arg to write_pack_object to incrementally compute a SHA.
(Dave Borowitz)
* Include offset in PackStreamReader results. (Dave Borowitz)
* Move PackStreamReader from server to pack. (Dave Borowitz)
* Extract a check_length_and_checksum, compute_file_sha, and
pack_object_header pack helper functions. (Dave Borowitz)
* Extract a compute_file_sha function. (Dave Borowitz)
* Remove move_in_thin_pack as a separate method; add_thin_pack now completes
the thin pack and moves it in in one step. Remove ThinPackData as well.
(Dave Borowitz)
* Custom buffer size in read_zlib_chunks. (Dave Borowitz)
* New UnpackedObject data class that replaces ad-hoc tuples in the return
value of unpack_object and various DeltaChainIterator methods.
(Dave Borowitz)
* Add a lookup_path convenience method to Tree. (Dave Borowitz)
* Optionally create RenameDetectors without passing in tree SHAs.
(Dave Borowitz)
* Optionally include unchanged entries in RenameDetectors. (Dave Borowitz)
* Optionally pass a RenameDetector to tree_changes. (Dave Borowitz)
* Optionally pass a request object through to server handlers. (Dave Borowitz)
TEST CHANGES
* If setuptools is installed, "python setup.py test" will now run the testsuite.
(Jelmer Vernooij)
* Add a new build_pack test utility for building packs from a simple spec.
(Dave Borowitz)
* Add a new build_commit_graph test utility for building commits from a
simple spec. (Dave Borowitz)
0.7.1 2011-04-12
BUG FIXES
* Fix double decref in _diff_tree.c. (Ted Horst, #715528)
* Fix the build on Windows. (Pascal Quantin)
* Fix get_transport_and_path compatibility with pre-2.6.5 versions of Python.
(Max Bowsher, #707438)
* BaseObjectStore.determine_wants_all no longer breaks on zero SHAs.
(Jelmer Vernooij)
* write_tree_diff() now supports submodules.
(Jelmer Vernooij)
* Fix compilation for XCode 4 and older versions of distutils.sysconfig.
(Daniele Sluijters)
IMPROVEMENTS
* Sphinxified documentation. (Lukasz Balcerzak)
* Add Pack.keep. (Marc Brinkmann)
API CHANGES
* The order of the parameters to Tree.add(name, mode, sha) has changed, and
is now consistent with the rest of Dulwich. Existing code will still
work but print a DeprecationWarning. (Jelmer Vernooij, #663550)
* Tree.entries() is now deprecated in favour of Tree.items() and
Tree.iteritems(). (Jelmer Vernooij)
0.7.0 2011-01-21
FEATURES
* New `dulwich.diff_tree` module for simple content-based rename detection.
(Dave Borowitz)
* Add Tree.items(). (Jelmer Vernooij)
* Add eof() and unread_pkt_line() methods to Protocol. (Dave Borowitz)
* Add write_tree_diff(). (Jelmer Vernooij)
* Add `serve_command` function for git server commands as executables.
(Jelmer Vernooij)
* dulwich.client.get_transport_and_path now supports rsync-style repository URLs.
(Dave Borowitz, #568493)
BUG FIXES
* Correct short-circuiting operation for no-op fetches in the server.
(Dave Borowitz)
* Support parsing git mbox patches without a version tail, as generated by
Mercurial. (Jelmer Vernooij)
* Fix dul-receive-pack and dul-upload-pack. (Jelmer Vernooij)
* Zero-padded file modes in Tree objects no longer trigger an exception but
the check code warns about them. (Augie Fackler, #581064)
* Repo.init() now honors the mkdir flag. (#671159)
* The ref format is now checked when setting a ref rather than when reading it back.
(Dave Borowitz, #653527)
* Make sure pack files are closed correctly. (Tay Ray Chuan)
DOCUMENTATION
* Run the tutorial inside the test suite. (Jelmer Vernooij)
* Reorganized and updated the tutorial. (Jelmer Vernooij, Dave Borowitz, #610550,
#610540)
0.6.2 2010-10-16
BUG FIXES
* HTTP server correctly handles empty CONTENT_LENGTH. (Dave Borowitz)
* Don't error when creating GitFiles with the default mode. (Dave Borowitz)
* ThinPackData.from_file now works with resolve_ext_ref callback.
(Dave Borowitz)
* Provide strnlen() on mingw32 which doesn't have it. (Hans Kolek)
* Set bare=true in the configuration for bare repositories. (Dirk Neumann)
FEATURES
* Use slots for core objects to reduce memory usage. (Jelmer Vernooij)
* Web server supports streaming progress/pack output. (Dave Borowitz)
* New public function dulwich.pack.write_pack_header. (Dave Borowitz)
* Distinguish between missing files and read errors in HTTP server.
(Dave Borowitz)
* Initial work on support for fastimport using python-fastimport.
(Jelmer Vernooij)
* New dulwich.pack.MemoryPackIndex class. (Jelmer Vernooij)
* Delegate SHA peeling to the object store. (Dave Borowitz)
TESTS
* Use GitFile when modifying packed-refs in tests. (Dave Borowitz)
* New tests in test_web with better coverage and fewer ad-hoc mocks.
(Dave Borowitz)
* Standardize quote delimiters in test_protocol. (Dave Borowitz)
* Fix use when testtools is installed. (Jelmer Vernooij)
* Add trivial test for write_pack_header. (Jelmer Vernooij)
* Refactor some of dulwich.tests.compat.server_utils. (Dave Borowitz)
* Allow overwriting id property of objects in test utils. (Dave Borowitz)
* Use real in-memory objects rather than stubs for server tests.
(Dave Borowitz)
* Clean up MissingObjectFinder. (Dave Borowitz)
API CHANGES
* ObjectStore.iter_tree_contents now walks contents in depth-first, sorted
order. (Dave Borowitz)
* ObjectStore.iter_tree_contents can optionally yield tree objects as well.
(Dave Borowitz).
* Add side-band-64k support to ReceivePackHandler. (Dave Borowitz)
* Change server capabilities methods to classmethods. (Dave Borowitz)
* Tweak server handler injection. (Dave Borowitz)
* PackIndex1 and PackIndex2 now subclass FilePackIndex, which is
itself a subclass of PackIndex. (Jelmer Vernooij)
DOCUMENTATION
* Add docstrings for various functions in dulwich.objects. (Jelmer Vernooij)
* Clean up docstrings in dulwich.protocol. (Dave Borowitz)
* Explicitly specify allowed protocol commands to
ProtocolGraphWalker.read_proto_line. (Dave Borowitz)
* Add utility functions to DictRefsContainer. (Dave Borowitz)
0.6.1 2010-07-22
BUG FIXES
* Fix memory leak in C implementation of sorted_tree_items. (Dave Borowitz)
* Use correct path separators for named repo files. (Dave Borowitz)
* python > 2.7 and testtools-based test runners will now also pick up skipped
tests correctly. (Jelmer Vernooij)
FEATURES
* Move named file initialization to BaseRepo. (Dave Borowitz)
* Add logging utilities and git/HTTP server logging. (Dave Borowitz)
* The GitClient interface has been cleaned up and instances are now reusable.
(Augie Fackler)
* Allow overriding paths to executables in GitSSHClient.
(Ross Light, Jelmer Vernooij, #585204)
* Add PackBasedObjectStore.pack_loose_objects(). (Jelmer Vernooij)
TESTS
* Add tests for sorted_tree_items and C implementation. (Dave Borowitz)
* Add a MemoryRepo that stores everything in memory. (Dave Borowitz)
* Quiet logging output from web tests. (Dave Borowitz)
* More flexible version checking for compat tests. (Dave Borowitz)
* Compat tests for servers with and without side-band-64k. (Dave Borowitz)
CLEANUP
* Clean up file headers. (Dave Borowitz)
TESTS
* Use GitFile when modifying packed-refs in tests. (Dave Borowitz)
API CHANGES
* dulwich.pack.write_pack_index_v{1,2} now take a file-like object
rather than a filename. (Jelmer Vernooij)
* Make dul-daemon/dul-web trivial wrappers around server functionality.
(Dave Borowitz)
* Move reference WSGI handler to web.py. (Dave Borowitz)
* Factor out _report_status in ReceivePackHandler. (Dave Borowitz)
* Factor out a function to convert a line to a pkt-line. (Dave Borowitz)
0.6.0 2010-05-22
note: This list is most likely incomplete for 0.6.0.
BUG FIXES
* Fix ReceivePackHandler to disallow removing refs without delete-refs.
(Dave Borowitz)
* Deal with capabilities required by the client, even if they
can not be disabled in the server. (Dave Borowitz)
* Fix trailing newlines in generated patch files.
(Jelmer Vernooij)
* Implement RefsContainer.__contains__. (Jelmer Vernooij)
* Cope with \r in ref files on Windows.
(http://github.com/jelmer/dulwich/issues/#issue/13, Jelmer Vernooij)
* Fix GitFile breakage on Windows. (Anatoly Techtonik, #557585)
* Support packed ref deletion with no peeled refs. (Augie Fackler)
* Fix send pack when there is nothing to fetch. (Augie Fackler)
* Fix fetch if no progress function is specified. (Augie Fackler)
* Allow double-staging of files that are deleted in the index.
(Dave Borowitz)
* Fix RefsContainer.add_if_new to support dangling symrefs.
(Dave Borowitz)
* Non-existent index files in non-bare repositories are now treated as
empty. (Dave Borowitz)
* Always update ShaFile.id when the contents of the object get changed.
(Jelmer Vernooij)
* Various Python2.4-compatibility fixes. (Dave Borowitz)
* Fix thin pack handling. (Dave Borowitz)
FEATURES
* Add include-tag capability to server. (Dave Borowitz)
* New dulwich.fastexport module that can generate fastexport
streams. (Jelmer Vernooij)
* Implemented BaseRepo.__contains__. (Jelmer Vernooij)
* Add __setitem__ to DictRefsContainer. (Dave Borowitz)
* Overall improvements to checking Git objects. (Dave Borowitz)
* Packs are now verified while they are received. (Dave Borowitz)
TESTS
* Add framework for testing compatibility with C Git. (Dave Borowitz)
* Add various tests for the use of non-bare repositories. (Dave Borowitz)
* Cope with diffstat not being available on all platforms.
(Tay Ray Chuan, Jelmer Vernooij)
* Add make_object and make_commit convenience functions to test utils.
(Dave Borowitz)
API BREAKAGES
* The 'committer' and 'message' arguments to Repo.do_commit() have
been swapped. 'committer' is now optional. (Jelmer Vernooij)
* Repo.get_blob, Repo.commit, Repo.tag and Repo.tree are now deprecated.
(Jelmer Vernooij)
* RefsContainer.set_ref() was renamed to RefsContainer.set_symbolic_ref(),
for clarity. (Jelmer Vernooij)
API CHANGES
* The primary serialization APIs in dulwich.objects now work
with chunks of strings rather than with full-text strings.
(Jelmer Vernooij)
0.5.0 2010-03-03
BUG FIXES
* Support custom fields in commits (readonly). (Jelmer Vernooij)
* Improved ref handling. (Dave Borowitz)
* Rework server protocol to be smarter and interoperate with cgit client.
(Dave Borowitz)
* Add a GitFile class that uses the same locking protocol for writes as
cgit. (Dave Borowitz)
* Cope with forward slashes correctly in the index on Windows.
(Jelmer Vernooij, #526793)
FEATURES
* --pure option to setup.py to allow building/installing without the C
extensions. (Hal Wine, Anatoly Techtonik, Jelmer Vernooij, #434326)
* Implement Repo.get_config(). (Jelmer Vernooij, Augie Fackler)
* HTTP dumb and smart server. (Dave Borowitz)
* Add abstract baseclass for Repo that does not require file system
operations. (Dave Borowitz)
0.4.1 2010-01-03
FEATURES
* Add ObjectStore.iter_tree_contents(). (Jelmer Vernooij)
* Add Index.changes_from_tree(). (Jelmer Vernooij)
* Add ObjectStore.tree_changes(). (Jelmer Vernooij)
* Add functionality for writing patches in dulwich.patch.
(Jelmer Vernooij)
0.4.0 2009-10-07
DOCUMENTATION
* Added tutorial.
API CHANGES
* dulwich.object_store.tree_lookup_path will now return the mode and
sha of the object found rather than the object itself.
BUG FIXES
* Use binascii.hexlify / binascii.unhexlify for better performance.
* Cope with extra unknown data in index files by ignoring it (for now).
* Add proper error message when server unexpectedly hangs up. (#415843)
* Correctly write opcode for equal in create_delta.
0.3.3 2009-07-23
FEATURES
* Implement ShaFile.__hash__().
* Implement Tree.__len__().
BUG FIXES
* Check for 'objects' and 'refs' directories
when looking for a Git repository. (#380818)
0.3.2 2009-05-20
BUG FIXES
* Support the encoding field in Commits.
* Some Windows compatibility fixes.
* Fixed several issues in commit support.
FEATURES
* Basic support for handling submodules.
0.3.1 2009-05-13
FEATURES
* Implemented Repo.__getitem__, Repo.__setitem__ and Repo.__delitem__ to
access content.
API CHANGES
* Removed Repo.set_ref, Repo.remove_ref, Repo.tags, Repo.get_refs and
Repo.heads in favor of Repo.refs, a dictionary-like object for accessing
refs.
BUG FIXES
* Removed import of 'sha' module in objects.py, which was causing
deprecation warnings on Python 2.6.
0.3.0 2009-05-10
FEATURES
* A new function 'commit_tree' has been added that can commit a tree
based on an index.
BUG FIXES
* The memory usage when generating indexes has been significantly reduced.
* A memory leak in the C implementation of parse_tree has been fixed.
* The send-pack smart server command now works. (Thanks Scott Chacon)
* The handling of short timestamps (less than 10 digits) has been fixed.
* The handling of timezones has been fixed.
0.2.1 2009-04-30
BUG FIXES
* Fix compatibility with Python2.4.
0.2.0 2009-04-30
FEATURES
* Support for activity reporting in smart protocol client.
* Optional C extensions for better performance in a couple of
places that are performance-critical.
0.1.1 2009-03-13
BUG FIXES
* Fixed regression in Repo.find_missing_objects()
* Don't fetch ^{} objects from remote hosts, as requesting them
causes a hangup.
* Always write pack to disk completely before calculating checksum.
FEATURES
* Allow disabling thin packs when talking to remote hosts.
0.1.0 2009-01-24
* Initial release.
diff --git a/dulwich/diff_tree.py b/dulwich/diff_tree.py
index fb0704e4..167cb28e 100644
--- a/dulwich/diff_tree.py
+++ b/dulwich/diff_tree.py
@@ -1,604 +1,608 @@
# diff_tree.py -- Utilities for diffing files and trees.
# Copyright (C) 2010 Google, Inc.
#
# Dulwich is dual-licensed under the Apache License, Version 2.0 and the GNU
# General Public License as public by the Free Software Foundation; version 2.0
# or (at your option) any later version. You can redistribute it and/or
# modify it under the terms of either of these two licenses.
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# You should have received a copy of the licenses; if not, see
# <http://www.gnu.org/licenses/> for a copy of the GNU General Public License
# and <http://www.apache.org/licenses/LICENSE-2.0> for a copy of the Apache
# License, Version 2.0.
#
"""Utilities for diffing files and trees."""
import sys
from collections import (
defaultdict,
namedtuple,
)
from io import BytesIO
from itertools import chain
import stat
from dulwich.objects import (
S_ISGITLINK,
TreeEntry,
)
# TreeChange type constants.
CHANGE_ADD = 'add'
CHANGE_MODIFY = 'modify'
CHANGE_DELETE = 'delete'
CHANGE_RENAME = 'rename'
CHANGE_COPY = 'copy'
CHANGE_UNCHANGED = 'unchanged'
RENAME_CHANGE_TYPES = (CHANGE_RENAME, CHANGE_COPY)
_NULL_ENTRY = TreeEntry(None, None, None)
_MAX_SCORE = 100
RENAME_THRESHOLD = 60
MAX_FILES = 200
REWRITE_THRESHOLD = None
class TreeChange(namedtuple('TreeChange', ['type', 'old', 'new'])):
"""Named tuple a single change between two trees."""
@classmethod
def add(cls, new):
return cls(CHANGE_ADD, _NULL_ENTRY, new)
@classmethod
def delete(cls, old):
return cls(CHANGE_DELETE, old, _NULL_ENTRY)
def _tree_entries(path, tree):
result = []
if not tree:
return result
for entry in tree.iteritems(name_order=True):
result.append(entry.in_path(path))
return result
def _merge_entries(path, tree1, tree2):
"""Merge the entries of two trees.
:param path: A path to prepend to all tree entry names.
:param tree1: The first Tree object to iterate, or None.
:param tree2: The second Tree object to iterate, or None.
:return: A list of pairs of TreeEntry objects for each pair of entries in
the trees. If an entry exists in one tree but not the other, the other
entry will have all attributes set to None. If neither entry's path is
None, they are guaranteed to match.
"""
entries1 = _tree_entries(path, tree1)
entries2 = _tree_entries(path, tree2)
i1 = i2 = 0
len1 = len(entries1)
len2 = len(entries2)
result = []
while i1 < len1 and i2 < len2:
entry1 = entries1[i1]
entry2 = entries2[i2]
if entry1.path < entry2.path:
result.append((entry1, _NULL_ENTRY))
i1 += 1
elif entry1.path > entry2.path:
result.append((_NULL_ENTRY, entry2))
i2 += 1
else:
result.append((entry1, entry2))
i1 += 1
i2 += 1
for i in range(i1, len1):
result.append((entries1[i], _NULL_ENTRY))
for i in range(i2, len2):
result.append((_NULL_ENTRY, entries2[i]))
return result
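
# Illustrative sketch of _merge_entries (a hedged example, not upstream code;
# the demo name is hypothetical). Entries are paired by path in name order,
# and a path present on only one side is paired with the all-None entry.
def _demo_merge_entries():
    from dulwich.objects import Blob, Tree
    blob = Blob.from_string(b'data')
    tree1 = Tree()
    tree1.add(b'a', 0o100644, blob.id)
    tree1.add(b'b', 0o100644, blob.id)
    tree2 = Tree()
    tree2.add(b'b', 0o100644, blob.id)
    tree2.add(b'c', 0o100644, blob.id)
    # Expected pairs: (a, None), (b, b), (None, c).
    for entry1, entry2 in _merge_entries(b'', tree1, tree2):
        print((entry1.path, entry2.path))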
def _is_tree(entry):
mode = entry.mode
if mode is None:
return False
return stat.S_ISDIR(mode)
def walk_trees(store, tree1_id, tree2_id, prune_identical=False):
"""Recursively walk all the entries of two trees.
Iteration is depth-first pre-order, as in e.g. os.walk.
:param store: An ObjectStore for looking up objects.
:param tree1_id: The SHA of the first Tree object to iterate, or None.
:param tree2_id: The SHA of the second Tree object to iterate, or None.
:param prune_identical: If True, identical subtrees will not be walked.
    :return: Iterator over pairs of TreeEntry objects for each pair of entries
in the trees and their subtrees recursively. If an entry exists in one
tree but not the other, the other entry will have all attributes set
to None. If neither entry's path is None, they are guaranteed to
match.
"""
# This could be fairly easily generalized to >2 trees if we find a use
# case.
mode1 = tree1_id and stat.S_IFDIR or None
mode2 = tree2_id and stat.S_IFDIR or None
todo = [(TreeEntry(b'', mode1, tree1_id), TreeEntry(b'', mode2, tree2_id))]
while todo:
entry1, entry2 = todo.pop()
is_tree1 = _is_tree(entry1)
is_tree2 = _is_tree(entry2)
if prune_identical and is_tree1 and is_tree2 and entry1 == entry2:
continue
tree1 = is_tree1 and store[entry1.sha] or None
tree2 = is_tree2 and store[entry2.sha] or None
path = entry1.path or entry2.path
todo.extend(reversed(_merge_entries(path, tree1, tree2)))
yield entry1, entry2
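
# Illustrative sketch of walk_trees against an empty right-hand side (a hedged
# example, not upstream code; the demo name is hypothetical). It shows the
# depth-first pre-order: the root pair first, then b'dir', then b'dir/file'.
def _demo_walk_trees():
    from dulwich.object_store import MemoryObjectStore
    from dulwich.objects import Blob, Tree
    store = MemoryObjectStore()
    blob = Blob.from_string(b'content')
    subtree = Tree()
    subtree.add(b'file', 0o100644, blob.id)
    root = Tree()
    root.add(b'dir', stat.S_IFDIR, subtree.id)
    for obj in (blob, subtree, root):
        store.add_object(obj)
    for entry1, entry2 in walk_trees(store, root.id, None):
        # entry2 is always the all-None entry here, since tree2_id is None.
        print(entry1.path)  # b'', then b'dir', then b'dir/file'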
def _skip_tree(entry, include_trees):
if entry.mode is None or (not include_trees and stat.S_ISDIR(entry.mode)):
return _NULL_ENTRY
return entry
def tree_changes(store, tree1_id, tree2_id, want_unchanged=False,
- rename_detector=None, include_trees=False):
+ rename_detector=None, include_trees=False,
+ change_type_same=False):
"""Find the differences between the contents of two trees.
:param store: An ObjectStore for looking up objects.
:param tree1_id: The SHA of the source tree.
:param tree2_id: The SHA of the target tree.
:param want_unchanged: If True, include TreeChanges for unmodified entries
as well.
    :param rename_detector: RenameDetector object for detecting renames.
    :param include_trees: Whether to include tree entries themselves in the
        reported changes.
+    :param change_type_same: Whether to report a change of entry type (e.g.
+        regular file to symlink) as a single modify of the same entry rather
+        than as a delete plus an add.
:return: Iterator over TreeChange instances for each change between the
source and target tree.
"""
if include_trees and rename_detector is not None:
raise NotImplementedError(
'rename_detector and include_trees are mutually exclusive')
if (rename_detector is not None and tree1_id is not None and
tree2_id is not None):
for change in rename_detector.changes_with_renames(
tree1_id, tree2_id, want_unchanged=want_unchanged):
yield change
return
entries = walk_trees(store, tree1_id, tree2_id,
prune_identical=(not want_unchanged))
for entry1, entry2 in entries:
if entry1 == entry2 and not want_unchanged:
continue
# Treat entries for trees as missing.
entry1 = _skip_tree(entry1, include_trees)
entry2 = _skip_tree(entry2, include_trees)
if entry1 != _NULL_ENTRY and entry2 != _NULL_ENTRY:
- if stat.S_IFMT(entry1.mode) != stat.S_IFMT(entry2.mode):
+ if (stat.S_IFMT(entry1.mode) != stat.S_IFMT(entry2.mode)
+ and not change_type_same):
# File type changed: report as delete/add.
yield TreeChange.delete(entry1)
entry1 = _NULL_ENTRY
change_type = CHANGE_ADD
elif entry1 == entry2:
change_type = CHANGE_UNCHANGED
else:
change_type = CHANGE_MODIFY
elif entry1 != _NULL_ENTRY:
change_type = CHANGE_DELETE
elif entry2 != _NULL_ENTRY:
change_type = CHANGE_ADD
else:
# Both were None because at least one was a tree.
continue
yield TreeChange(change_type, entry1, entry2)
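
# Illustrative sketch of the new change_type_same flag (a hedged example, not
# upstream code; the demo name is hypothetical). A regular file replaced by a
# symlink at the same path is reported as delete+add by default, but as a
# single modify when change_type_same=True.
def _demo_change_type_same():
    from dulwich.object_store import MemoryObjectStore
    from dulwich.objects import Blob, Tree
    store = MemoryObjectStore()
    blob = Blob.from_string(b'contents')
    link = Blob.from_string(b'target')
    tree1 = Tree()
    tree1.add(b'a', 0o100644, blob.id)  # regular file
    tree2 = Tree()
    tree2.add(b'a', 0o120000, link.id)  # symlink at the same path
    for obj in (blob, link, tree1, tree2):
        store.add_object(obj)
    print([c.type for c in tree_changes(store, tree1.id, tree2.id)])
    # ['delete', 'add']
    print([c.type for c in tree_changes(store, tree1.id, tree2.id,
                                        change_type_same=True)])
    # ['modify']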
def _all_eq(seq, key, value):
for e in seq:
if key(e) != value:
return False
return True
def _all_same(seq, key):
return _all_eq(seq[1:], key, key(seq[0]))
def tree_changes_for_merge(store, parent_tree_ids, tree_id,
rename_detector=None):
"""Get the tree changes for a merge tree relative to all its parents.
:param store: An ObjectStore for looking up objects.
:param parent_tree_ids: An iterable of the SHAs of the parent trees.
:param tree_id: The SHA of the merge tree.
:param rename_detector: RenameDetector object for detecting renames.
:return: Iterator over lists of TreeChange objects, one per conflicted path
in the merge.
Each list contains one element per parent, with the TreeChange for that
    path relative to that parent. An element may be None if the path did
    not change relative to that parent, e.g. if it never existed in one
    parent and was deleted in the two others.
A path is only included in the output if it is a conflict, i.e. its SHA
in the merge tree is not found in any of the parents, or in the case of
deletes, if not all of the old SHAs match.
"""
all_parent_changes = [tree_changes(store, t, tree_id,
rename_detector=rename_detector)
for t in parent_tree_ids]
num_parents = len(parent_tree_ids)
changes_by_path = defaultdict(lambda: [None] * num_parents)
# Organize by path.
for i, parent_changes in enumerate(all_parent_changes):
for change in parent_changes:
if change.type == CHANGE_DELETE:
path = change.old.path
else:
path = change.new.path
changes_by_path[path][i] = change
def old_sha(c):
return c.old.sha
def change_type(c):
return c.type
# Yield only conflicting changes.
for _, changes in sorted(changes_by_path.items()):
assert len(changes) == num_parents
have = [c for c in changes if c is not None]
if _all_eq(have, change_type, CHANGE_DELETE):
if not _all_same(have, old_sha):
yield changes
elif not _all_same(have, change_type):
yield changes
elif None not in changes:
# If no change was found relative to one parent, that means the SHA
# must have matched the SHA in that parent, so it is not a
# conflict.
yield changes
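
# Illustrative sketch of tree_changes_for_merge (a hedged example, not
# upstream code; the demo name is hypothetical). A path whose merged SHA
# matches neither parent is yielded as a conflict, with one TreeChange per
# parent.
def _demo_tree_changes_for_merge():
    from dulwich.object_store import MemoryObjectStore
    from dulwich.objects import Blob, Tree
    store = MemoryObjectStore()
    ours = Blob.from_string(b'ours')
    theirs = Blob.from_string(b'theirs')
    merged = Blob.from_string(b'merged')
    parent1, parent2, merge = Tree(), Tree(), Tree()
    for tree, blob in [(parent1, ours), (parent2, theirs), (merge, merged)]:
        tree.add(b'f', 0o100644, blob.id)
    for obj in (ours, theirs, merged, parent1, parent2, merge):
        store.add_object(obj)
    for conflict in tree_changes_for_merge(
            store, [parent1.id, parent2.id], merge.id):
        print([c.type for c in conflict])  # ['modify', 'modify']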
_BLOCK_SIZE = 64
def _count_blocks(obj):
"""Count the blocks in an object.
    Splits the data into blocks, breaking at newlines or after 64 bytes,
    whichever comes first.
:param obj: The object to count blocks for.
:return: A dict of block hashcode -> total bytes occurring.
"""
block_counts = defaultdict(int)
block = BytesIO()
n = 0
# Cache attrs as locals to avoid expensive lookups in the inner loop.
block_write = block.write
block_seek = block.seek
block_truncate = block.truncate
block_getvalue = block.getvalue
for c in chain(*obj.as_raw_chunks()):
if sys.version_info[0] == 3:
c = c.to_bytes(1, 'big')
block_write(c)
n += 1
if c == b'\n' or n == _BLOCK_SIZE:
value = block_getvalue()
block_counts[hash(value)] += len(value)
block_seek(0)
block_truncate()
n = 0
if n > 0:
last_block = block_getvalue()
block_counts[hash(last_block)] += len(last_block)
return block_counts
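
# Illustrative sketch of _count_blocks (a hedged example, not upstream code;
# the demo name is hypothetical). The blob below splits into the two 3-byte
# blocks b'ab\n' and b'cd\n', keyed by their hash().
def _demo_count_blocks():
    from dulwich.objects import Blob
    counts = _count_blocks(Blob.from_string(b'ab\ncd\n'))
    assert sorted(counts.values()) == [3, 3]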
def _common_bytes(blocks1, blocks2):
"""Count the number of common bytes in two block count dicts.
    :param blocks1: The first dict of block hashcode -> total bytes.
    :param blocks2: The second dict of block hashcode -> total bytes.
:return: The number of bytes in common between blocks1 and blocks2. This is
only approximate due to possible hash collisions.
"""
# Iterate over the smaller of the two dicts, since this is symmetrical.
if len(blocks1) > len(blocks2):
blocks1, blocks2 = blocks2, blocks1
score = 0
for block, count1 in blocks1.items():
count2 = blocks2.get(block)
if count2:
score += min(count1, count2)
return score
def _similarity_score(obj1, obj2, block_cache=None):
"""Compute a similarity score for two objects.
:param obj1: The first object to score.
:param obj2: The second object to score.
:param block_cache: An optional dict of SHA to block counts to cache
results between calls.
:return: The similarity score between the two objects, defined as the
number of bytes in common between the two objects divided by the
maximum size, scaled to the range 0-100.
"""
if block_cache is None:
block_cache = {}
if obj1.id not in block_cache:
block_cache[obj1.id] = _count_blocks(obj1)
if obj2.id not in block_cache:
block_cache[obj2.id] = _count_blocks(obj2)
common_bytes = _common_bytes(block_cache[obj1.id], block_cache[obj2.id])
max_size = max(obj1.raw_length(), obj2.raw_length())
if not max_size:
return _MAX_SCORE
return int(float(common_bytes) * _MAX_SCORE / max_size)
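
# Illustrative worked example for _similarity_score (a hedged sketch, not
# upstream code; the demo name is hypothetical). The blobs share the 2-byte
# block b'a\n' and the larger object is 4 bytes, so the score is
# int(2 * 100 / 4) == 50.
def _demo_similarity_score():
    from dulwich.objects import Blob
    obj1 = Blob.from_string(b'a\nb\n')
    obj2 = Blob.from_string(b'a\nc\n')
    assert _similarity_score(obj1, obj2) == 50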
def _tree_change_key(entry):
# Sort by old path then new path. If only one exists, use it for both keys.
path1 = entry.old.path
path2 = entry.new.path
if path1 is None:
path1 = path2
if path2 is None:
path2 = path1
return (path1, path2)
class RenameDetector(object):
"""Object for handling rename detection between two trees."""
def __init__(self, store, rename_threshold=RENAME_THRESHOLD,
max_files=MAX_FILES,
rewrite_threshold=REWRITE_THRESHOLD,
find_copies_harder=False):
"""Initialize the rename detector.
:param store: An ObjectStore for looking up objects.
:param rename_threshold: The threshold similarity score for considering
an add/delete pair to be a rename/copy; see _similarity_score.
:param max_files: The maximum number of adds and deletes to consider,
or None for no limit. The detector is guaranteed to compare no more
than max_files ** 2 add/delete pairs. This limit is provided
because rename detection can be quadratic in the project size. If
the limit is exceeded, no content rename detection is attempted.
:param rewrite_threshold: The threshold similarity score below which a
modify should be considered a delete/add, or None to not break
modifies; see _similarity_score.
:param find_copies_harder: If True, consider unmodified files when
detecting copies.
"""
self._store = store
self._rename_threshold = rename_threshold
self._rewrite_threshold = rewrite_threshold
self._max_files = max_files
self._find_copies_harder = find_copies_harder
self._want_unchanged = False
def _reset(self):
self._adds = []
self._deletes = []
self._changes = []
def _should_split(self, change):
if (self._rewrite_threshold is None or change.type != CHANGE_MODIFY or
change.old.sha == change.new.sha):
return False
old_obj = self._store[change.old.sha]
new_obj = self._store[change.new.sha]
return _similarity_score(old_obj, new_obj) < self._rewrite_threshold
def _add_change(self, change):
if change.type == CHANGE_ADD:
self._adds.append(change)
elif change.type == CHANGE_DELETE:
self._deletes.append(change)
elif self._should_split(change):
self._deletes.append(TreeChange.delete(change.old))
self._adds.append(TreeChange.add(change.new))
elif ((self._find_copies_harder and change.type == CHANGE_UNCHANGED)
or change.type == CHANGE_MODIFY):
# Treat all modifies as potential deletes for rename detection,
# but don't split them (to avoid spurious renames). Setting
# find_copies_harder means we treat unchanged the same as
# modified.
self._deletes.append(change)
else:
self._changes.append(change)
def _collect_changes(self, tree1_id, tree2_id):
want_unchanged = self._find_copies_harder or self._want_unchanged
for change in tree_changes(self._store, tree1_id, tree2_id,
want_unchanged=want_unchanged):
self._add_change(change)
def _prune(self, add_paths, delete_paths):
self._adds = [a for a in self._adds if a.new.path not in add_paths]
self._deletes = [d for d in self._deletes
if d.old.path not in delete_paths]
def _find_exact_renames(self):
add_map = defaultdict(list)
for add in self._adds:
add_map[add.new.sha].append(add.new)
delete_map = defaultdict(list)
for delete in self._deletes:
# Keep track of whether the delete was actually marked as a delete.
# If not, it needs to be marked as a copy.
is_delete = delete.type == CHANGE_DELETE
delete_map[delete.old.sha].append((delete.old, is_delete))
add_paths = set()
delete_paths = set()
for sha, sha_deletes in delete_map.items():
sha_adds = add_map[sha]
for (old, is_delete), new in zip(sha_deletes, sha_adds):
if stat.S_IFMT(old.mode) != stat.S_IFMT(new.mode):
continue
if is_delete:
delete_paths.add(old.path)
add_paths.add(new.path)
new_type = is_delete and CHANGE_RENAME or CHANGE_COPY
self._changes.append(TreeChange(new_type, old, new))
num_extra_adds = len(sha_adds) - len(sha_deletes)
# TODO(dborowitz): Less arbitrary way of dealing with extra copies.
old = sha_deletes[0][0]
if num_extra_adds > 0:
for new in sha_adds[-num_extra_adds:]:
add_paths.add(new.path)
self._changes.append(TreeChange(CHANGE_COPY, old, new))
self._prune(add_paths, delete_paths)
def _should_find_content_renames(self):
return len(self._adds) * len(self._deletes) <= self._max_files ** 2
def _rename_type(self, check_paths, delete, add):
if check_paths and delete.old.path == add.new.path:
# If the paths match, this must be a split modify, so make sure it
# comes out as a modify.
return CHANGE_MODIFY
elif delete.type != CHANGE_DELETE:
# If it's in deletes but not marked as a delete, it must have been
# added due to find_copies_harder, and needs to be marked as a
# copy.
return CHANGE_COPY
return CHANGE_RENAME
def _find_content_rename_candidates(self):
candidates = self._candidates = []
# TODO: Optimizations:
# - Compare object sizes before counting blocks.
# - Skip if delete's S_IFMT differs from all adds.
# - Skip if adds or deletes is empty.
# Match C git's behavior of not attempting to find content renames if
# the matrix size exceeds the threshold.
if not self._should_find_content_renames():
return
block_cache = {}
check_paths = self._rename_threshold is not None
for delete in self._deletes:
if S_ISGITLINK(delete.old.mode):
continue # Git links don't exist in this repo.
old_sha = delete.old.sha
old_obj = self._store[old_sha]
block_cache[old_sha] = _count_blocks(old_obj)
for add in self._adds:
if stat.S_IFMT(delete.old.mode) != stat.S_IFMT(add.new.mode):
continue
new_obj = self._store[add.new.sha]
score = _similarity_score(old_obj, new_obj,
block_cache=block_cache)
if score > self._rename_threshold:
new_type = self._rename_type(check_paths, delete, add)
rename = TreeChange(new_type, delete.old, add.new)
candidates.append((-score, rename))
def _choose_content_renames(self):
# Sort scores from highest to lowest, but keep names in ascending
# order.
self._candidates.sort()
delete_paths = set()
add_paths = set()
for _, change in self._candidates:
new_path = change.new.path
if new_path in add_paths:
continue
old_path = change.old.path
orig_type = change.type
if old_path in delete_paths:
change = TreeChange(CHANGE_COPY, change.old, change.new)
# If the candidate was originally a copy, that means it came from a
# modified or unchanged path, so we don't want to prune it.
if orig_type != CHANGE_COPY:
delete_paths.add(old_path)
add_paths.add(new_path)
self._changes.append(change)
self._prune(add_paths, delete_paths)
def _join_modifies(self):
if self._rewrite_threshold is None:
return
modifies = {}
delete_map = dict((d.old.path, d) for d in self._deletes)
for add in self._adds:
path = add.new.path
delete = delete_map.get(path)
if (delete is not None and
stat.S_IFMT(delete.old.mode) == stat.S_IFMT(add.new.mode)):
modifies[path] = TreeChange(CHANGE_MODIFY, delete.old, add.new)
self._adds = [a for a in self._adds if a.new.path not in modifies]
        self._deletes = [d for d in self._deletes
                         if d.old.path not in modifies]
self._changes += modifies.values()
def _sorted_changes(self):
result = []
result.extend(self._adds)
result.extend(self._deletes)
result.extend(self._changes)
result.sort(key=_tree_change_key)
return result
def _prune_unchanged(self):
if self._want_unchanged:
return
self._deletes = [
d for d in self._deletes if d.type != CHANGE_UNCHANGED]
def changes_with_renames(self, tree1_id, tree2_id, want_unchanged=False):
"""Iterate TreeChanges between two tree SHAs, with rename detection."""
self._reset()
self._want_unchanged = want_unchanged
self._collect_changes(tree1_id, tree2_id)
self._find_exact_renames()
self._find_content_rename_candidates()
self._choose_content_renames()
self._join_modifies()
self._prune_unchanged()
return self._sorted_changes()
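# Example: typical use of RenameDetector. An illustrative sketch; `store`,
# `tree1_id` and `tree2_id` are placeholders for an object store and two
# tree SHAs it contains.
#
#     detector = RenameDetector(store)
#     for change in detector.changes_with_renames(tree1_id, tree2_id):
#         print(change.type, change.old.path, change.new.path)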
# Hold on to the pure-python implementations for testing.
_is_tree_py = _is_tree
_merge_entries_py = _merge_entries
_count_blocks_py = _count_blocks
try:
# Try to import C versions
from dulwich._diff_tree import _is_tree, _merge_entries, _count_blocks
except ImportError:
pass
diff --git a/dulwich/object_store.py b/dulwich/object_store.py
index b34ea9c8..fac3e291 100644
--- a/dulwich/object_store.py
+++ b/dulwich/object_store.py
@@ -1,1303 +1,1306 @@
# object_store.py -- Object store for git objects
# Copyright (C) 2008-2013 Jelmer Vernooij
# and others
#
# Dulwich is dual-licensed under the Apache License, Version 2.0 and the GNU
# General Public License as published by the Free Software Foundation; version 2.0
# or (at your option) any later version. You can redistribute it and/or
# modify it under the terms of either of these two licenses.
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# You should have received a copy of the licenses; if not, see
# <http://www.gnu.org/licenses/> for a copy of the GNU General Public License
# and <http://www.apache.org/licenses/LICENSE-2.0> for a copy of the Apache
# License, Version 2.0.
#
"""Git object store interfaces and implementation."""
from io import BytesIO
import errno
from itertools import chain
import os
import stat
import sys
import tempfile
import time
from dulwich.diff_tree import (
tree_changes,
walk_trees,
)
from dulwich.errors import (
NotTreeError,
)
from dulwich.file import GitFile
from dulwich.objects import (
Commit,
ShaFile,
Tag,
Tree,
ZERO_SHA,
hex_to_sha,
sha_to_hex,
hex_to_filename,
S_ISGITLINK,
object_class,
)
from dulwich.pack import (
Pack,
PackData,
PackInflater,
iter_sha1,
pack_objects_to_data,
write_pack_header,
write_pack_index_v2,
write_pack_data,
write_pack_object,
compute_file_sha,
PackIndexer,
PackStreamCopier,
)
INFODIR = 'info'
PACKDIR = 'pack'
class BaseObjectStore(object):
"""Object store interface."""
def determine_wants_all(self, refs):
return [sha for (ref, sha) in refs.items()
if sha not in self and not ref.endswith(b"^{}") and
not sha == ZERO_SHA]
def iter_shas(self, shas):
"""Iterate over the objects for the specified shas.
:param shas: Iterable object with SHAs
:return: Object iterator
"""
return ObjectStoreIterator(self, shas)
def contains_loose(self, sha):
"""Check if a particular object is present by SHA1 and is loose."""
raise NotImplementedError(self.contains_loose)
def contains_packed(self, sha):
"""Check if a particular object is present by SHA1 and is packed."""
raise NotImplementedError(self.contains_packed)
def __contains__(self, sha):
"""Check if a particular object is present by SHA1.
This method makes no distinction between loose and packed objects.
"""
return self.contains_packed(sha) or self.contains_loose(sha)
@property
def packs(self):
"""Iterable of pack objects."""
raise NotImplementedError
def get_raw(self, name):
"""Obtain the raw text for an object.
:param name: sha for the object.
:return: tuple with numeric type and object contents.
"""
raise NotImplementedError(self.get_raw)
def __getitem__(self, sha):
"""Obtain an object by SHA1."""
type_num, uncomp = self.get_raw(sha)
return ShaFile.from_raw_string(type_num, uncomp, sha=sha)
def __iter__(self):
"""Iterate over the SHAs that are present in this store."""
raise NotImplementedError(self.__iter__)
def add_object(self, obj):
"""Add a single object to this object store.
"""
raise NotImplementedError(self.add_object)
def add_objects(self, objects):
"""Add a set of objects to this object store.
:param objects: Iterable over a list of (object, path) tuples
"""
raise NotImplementedError(self.add_objects)
def add_pack_data(self, count, pack_data):
"""Add pack data to this object store.
        :param count: Number of items to add
:param pack_data: Iterator over pack data tuples
"""
if count == 0:
# Don't bother writing an empty pack file
return
f, commit, abort = self.add_pack()
try:
write_pack_data(f, count, pack_data)
except BaseException:
abort()
raise
else:
return commit()
def tree_changes(self, source, target, want_unchanged=False,
- include_trees=False):
+ include_trees=False, change_type_same=False):
"""Find the differences between the contents of two trees
:param source: SHA1 of the source tree
:param target: SHA1 of the target tree
:param want_unchanged: Whether unchanged files should be reported
:param include_trees: Whether to include trees
+ :param change_type_same: Whether to report files changing
+ type in the same entry.
:return: Iterator over tuples with
(oldpath, newpath), (oldmode, newmode), (oldsha, newsha)
"""
for change in tree_changes(self, source, target,
want_unchanged=want_unchanged,
- include_trees=include_trees):
+ include_trees=include_trees,
+ change_type_same=change_type_same):
yield ((change.old.path, change.new.path),
(change.old.mode, change.new.mode),
(change.old.sha, change.new.sha))
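    # Example: effect of change_type_same. An illustrative sketch; `store`,
    # `tree1_id` and `tree2_id` are placeholders. With the flag set, a blob
    # that becomes e.g. a symlink at the same path is reported as a single
    # entry instead of a delete plus an add.
    #
    #     for (paths, modes, shas) in store.tree_changes(
    #             tree1_id, tree2_id, change_type_same=True):
    #         print(paths, modes, shas)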
def iter_tree_contents(self, tree_id, include_trees=False):
"""Iterate the contents of a tree and all subtrees.
Iteration is depth-first pre-order, as in e.g. os.walk.
:param tree_id: SHA1 of the tree.
:param include_trees: If True, include tree objects in the iteration.
:return: Iterator over TreeEntry namedtuples for all the objects in a
tree.
"""
for entry, _ in walk_trees(self, tree_id, None):
if ((entry.mode is not None and
not stat.S_ISDIR(entry.mode)) or include_trees):
yield entry
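    # Example: walking a tree recursively. An illustrative sketch; `store`
    # and `tree_id` are placeholders.
    #
    #     for entry in store.iter_tree_contents(tree_id):
    #         print(entry.path, oct(entry.mode), entry.sha)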
def find_missing_objects(self, haves, wants, progress=None,
get_tagged=None,
get_parents=lambda commit: commit.parents):
"""Find the missing objects required for a set of revisions.
:param haves: Iterable over SHAs already in common.
:param wants: Iterable over SHAs of objects to fetch.
:param progress: Simple progress function that will be called with
updated progress strings.
:param get_tagged: Function that returns a dict of pointed-to sha ->
tag sha for including tags.
:param get_parents: Optional function for getting the parents of a
commit.
:return: Iterator over (sha, path) pairs.
"""
finder = MissingObjectFinder(self, haves, wants, progress, get_tagged,
get_parents=get_parents)
return iter(finder.next, None)
def find_common_revisions(self, graphwalker):
"""Find which revisions this store has in common using graphwalker.
:param graphwalker: A graphwalker object.
:return: List of SHAs that are in common
"""
haves = []
sha = next(graphwalker)
while sha:
if sha in self:
haves.append(sha)
graphwalker.ack(sha)
sha = next(graphwalker)
return haves
def generate_pack_contents(self, have, want, progress=None):
"""Iterate over the contents of a pack file.
:param have: List of SHA1s of objects that should not be sent
:param want: List of SHA1s of objects that should be sent
:param progress: Optional progress reporting method
"""
return self.iter_shas(self.find_missing_objects(have, want, progress))
def generate_pack_data(self, have, want, progress=None, ofs_delta=True):
"""Generate pack data objects for a set of wants/haves.
:param have: List of SHA1s of objects that should not be sent
:param want: List of SHA1s of objects that should be sent
:param ofs_delta: Whether OFS deltas can be included
:param progress: Optional progress reporting method
"""
# TODO(jelmer): More efficient implementation
return pack_objects_to_data(
self.generate_pack_contents(have, want, progress))
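    # Example: moving objects between stores. An illustrative sketch;
    # `store`, `other_store`, `haves` and `wants` are placeholders. The
    # (count, iterator) pair returned here is what add_pack_data expects.
    #
    #     count, pack_data = store.generate_pack_data(haves, wants)
    #     other_store.add_pack_data(count, pack_data)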
def peel_sha(self, sha):
"""Peel all tags from a SHA.
:param sha: The object SHA to peel.
:return: The fully-peeled SHA1 of a tag object, after peeling all
intermediate tags; if the original ref does not point to a tag,
this will equal the original SHA1.
"""
obj = self[sha]
obj_class = object_class(obj.type_name)
while obj_class is Tag:
obj_class, sha = obj.object
obj = self[sha]
return obj
def _collect_ancestors(self, heads, common=set(),
get_parents=lambda commit: commit.parents):
"""Collect all ancestors of heads up to (excluding) those in common.
:param heads: commits to start from
:param common: commits to end at, or empty set to walk repository
completely
:param get_parents: Optional function for getting the parents of a
commit.
:return: a tuple (A, B) where A - all commits reachable
from heads but not present in common, B - common (shared) elements
that are directly reachable from heads
"""
bases = set()
commits = set()
queue = []
queue.extend(heads)
while queue:
e = queue.pop(0)
if e in common:
bases.add(e)
elif e not in commits:
commits.add(e)
cmt = self[e]
queue.extend(get_parents(cmt))
return (commits, bases)
def close(self):
"""Close any files opened by this object store."""
# Default implementation is a NO-OP
class PackBasedObjectStore(BaseObjectStore):
def __init__(self):
self._pack_cache = {}
@property
def alternates(self):
return []
def contains_packed(self, sha):
"""Check if a particular object is present by SHA1 and is packed.
This does not check alternates.
"""
for pack in self.packs:
if sha in pack:
return True
return False
def __contains__(self, sha):
"""Check if a particular object is present by SHA1.
This method makes no distinction between loose and packed objects.
"""
if self.contains_packed(sha) or self.contains_loose(sha):
return True
for alternate in self.alternates:
if sha in alternate:
return True
return False
def _pack_cache_stale(self):
"""Check whether the pack cache is stale."""
raise NotImplementedError(self._pack_cache_stale)
def _add_known_pack(self, base_name, pack):
"""Add a newly appeared pack to the cache by path.
"""
prev_pack = self._pack_cache.get(base_name)
if prev_pack is not pack:
self._pack_cache[base_name] = pack
if prev_pack:
prev_pack.close()
def _flush_pack_cache(self):
pack_cache = self._pack_cache
self._pack_cache = {}
while pack_cache:
(name, pack) = pack_cache.popitem()
pack.close()
def close(self):
self._flush_pack_cache()
@property
def packs(self):
"""List with pack objects."""
if self._pack_cache is None or self._pack_cache_stale():
self._update_pack_cache()
return self._pack_cache.values()
def _iter_alternate_objects(self):
"""Iterate over the SHAs of all the objects in alternate stores."""
for alternate in self.alternates:
for alternate_object in alternate:
yield alternate_object
def _iter_loose_objects(self):
"""Iterate over the SHAs of all loose objects."""
raise NotImplementedError(self._iter_loose_objects)
def _get_loose_object(self, sha):
raise NotImplementedError(self._get_loose_object)
def _remove_loose_object(self, sha):
raise NotImplementedError(self._remove_loose_object)
def _remove_pack(self, name):
raise NotImplementedError(self._remove_pack)
def pack_loose_objects(self):
"""Pack loose objects.
:return: Number of objects packed
"""
objects = set()
for sha in self._iter_loose_objects():
objects.add((self._get_loose_object(sha), None))
self.add_objects(list(objects))
for obj, path in objects:
self._remove_loose_object(obj.id)
return len(objects)
def repack(self):
"""Repack the packs in this repository.
Note that this implementation is fairly naive and currently keeps all
objects in memory while it repacks.
"""
loose_objects = set()
for sha in self._iter_loose_objects():
loose_objects.add(self._get_loose_object(sha))
objects = {(obj, None) for obj in loose_objects}
old_packs = {p.name(): p for p in self.packs}
for name, pack in old_packs.items():
objects.update((obj, None) for obj in pack.iterobjects())
self._flush_pack_cache()
# The name of the consolidated pack might match the name of a
# pre-existing pack. Take care not to remove the newly created
# consolidated pack.
consolidated = self.add_objects(objects)
old_packs.pop(consolidated.name(), None)
for obj in loose_objects:
self._remove_loose_object(obj.id)
for name, pack in old_packs.items():
self._remove_pack(pack)
self._update_pack_cache()
return len(objects)
def __iter__(self):
"""Iterate over the SHAs that are present in this store."""
iterables = (list(self.packs) + [self._iter_loose_objects()] +
[self._iter_alternate_objects()])
return chain(*iterables)
def contains_loose(self, sha):
"""Check if a particular object is present by SHA1 and is loose.
This does not check alternates.
"""
return self._get_loose_object(sha) is not None
def get_raw(self, name):
"""Obtain the raw fulltext for an object.
:param name: sha for the object.
:return: tuple with numeric type and object contents.
"""
if len(name) == 40:
sha = hex_to_sha(name)
hexsha = name
elif len(name) == 20:
sha = name
hexsha = None
else:
raise AssertionError("Invalid object name %r" % name)
for pack in self.packs:
try:
return pack.get_raw(sha)
except KeyError:
pass
if hexsha is None:
hexsha = sha_to_hex(name)
ret = self._get_loose_object(hexsha)
if ret is not None:
return ret.type_num, ret.as_raw_string()
for alternate in self.alternates:
try:
return alternate.get_raw(hexsha)
except KeyError:
pass
raise KeyError(hexsha)
def add_objects(self, objects):
"""Add a set of objects to this object store.
:param objects: Iterable over (object, path) tuples, should support
__len__.
:return: Pack object of the objects written.
"""
return self.add_pack_data(*pack_objects_to_data(objects))
class DiskObjectStore(PackBasedObjectStore):
"""Git-style object store that exists on disk."""
def __init__(self, path):
"""Open an object store.
:param path: Path of the object store.
"""
super(DiskObjectStore, self).__init__()
self.path = path
self.pack_dir = os.path.join(self.path, PACKDIR)
self._pack_cache_time = 0
self._pack_cache = {}
self._alternates = None
def __repr__(self):
return "<%s(%r)>" % (self.__class__.__name__, self.path)
@property
def alternates(self):
if self._alternates is not None:
return self._alternates
self._alternates = []
for path in self._read_alternate_paths():
self._alternates.append(DiskObjectStore(path))
return self._alternates
def _read_alternate_paths(self):
try:
f = GitFile(os.path.join(self.path, INFODIR, "alternates"), 'rb')
except (OSError, IOError) as e:
if e.errno == errno.ENOENT:
return
raise
with f:
for line in f.readlines():
line = line.rstrip(b"\n")
if line[0] == b"#":
continue
if os.path.isabs(line):
yield line.decode(sys.getfilesystemencoding())
else:
yield os.path.join(self.path, line).decode(
sys.getfilesystemencoding())
def add_alternate_path(self, path):
"""Add an alternate path to this object store.
"""
try:
os.mkdir(os.path.join(self.path, INFODIR))
except OSError as e:
if e.errno != errno.EEXIST:
raise
alternates_path = os.path.join(self.path, INFODIR, "alternates")
with GitFile(alternates_path, 'wb') as f:
try:
orig_f = open(alternates_path, 'rb')
except (OSError, IOError) as e:
if e.errno != errno.ENOENT:
raise
else:
with orig_f:
f.write(orig_f.read())
f.write(path.encode(sys.getfilesystemencoding()) + b"\n")
if not os.path.isabs(path):
path = os.path.join(self.path, path)
self.alternates.append(DiskObjectStore(path))
def _update_pack_cache(self):
try:
pack_dir_contents = os.listdir(self.pack_dir)
except OSError as e:
if e.errno == errno.ENOENT:
self._pack_cache_time = 0
self.close()
return
raise
self._pack_cache_time = max(
os.stat(self.pack_dir).st_mtime, time.time())
pack_files = set()
for name in pack_dir_contents:
if name.startswith("pack-") and name.endswith(".pack"):
# verify that idx exists first (otherwise the pack was not yet
# fully written)
idx_name = os.path.splitext(name)[0] + ".idx"
if idx_name in pack_dir_contents:
pack_name = name[:-len(".pack")]
pack_files.add(pack_name)
# Open newly appeared pack files
for f in pack_files:
if f not in self._pack_cache:
self._pack_cache[f] = Pack(os.path.join(self.pack_dir, f))
# Remove disappeared pack files
for f in set(self._pack_cache) - pack_files:
self._pack_cache.pop(f).close()
def _pack_cache_stale(self):
try:
return os.stat(self.pack_dir).st_mtime >= self._pack_cache_time
except OSError as e:
if e.errno == errno.ENOENT:
return True
raise
def _get_shafile_path(self, sha):
# Check from object dir
return hex_to_filename(self.path, sha)
def _iter_loose_objects(self):
for base in os.listdir(self.path):
if len(base) != 2:
continue
for rest in os.listdir(os.path.join(self.path, base)):
yield (base+rest).encode(sys.getfilesystemencoding())
def _get_loose_object(self, sha):
path = self._get_shafile_path(sha)
try:
return ShaFile.from_path(path)
except (OSError, IOError) as e:
if e.errno == errno.ENOENT:
return None
raise
def _remove_loose_object(self, sha):
os.remove(self._get_shafile_path(sha))
def _remove_pack(self, pack):
os.remove(pack.data.path)
os.remove(pack.index.path)
def _get_pack_basepath(self, entries):
suffix = iter_sha1(entry[0] for entry in entries)
# TODO: Handle self.pack_dir being bytes
suffix = suffix.decode('ascii')
return os.path.join(self.pack_dir, "pack-" + suffix)
def _complete_thin_pack(self, f, path, copier, indexer):
"""Move a specific file containing a pack into the pack directory.
:note: The file should be on the same file system as the
packs directory.
:param f: Open file object for the pack.
:param path: Path to the pack file.
:param copier: A PackStreamCopier to use for writing pack data.
:param indexer: A PackIndexer for indexing the pack.
"""
entries = list(indexer)
# Update the header with the new number of objects.
f.seek(0)
write_pack_header(f, len(entries) + len(indexer.ext_refs()))
# Must flush before reading (http://bugs.python.org/issue3207)
f.flush()
# Rescan the rest of the pack, computing the SHA with the new header.
new_sha = compute_file_sha(f, end_ofs=-20)
# Must reposition before writing (http://bugs.python.org/issue3207)
f.seek(0, os.SEEK_CUR)
# Complete the pack.
for ext_sha in indexer.ext_refs():
assert len(ext_sha) == 20
type_num, data = self.get_raw(ext_sha)
offset = f.tell()
crc32 = write_pack_object(f, type_num, data, sha=new_sha)
entries.append((ext_sha, offset, crc32))
pack_sha = new_sha.digest()
f.write(pack_sha)
f.close()
# Move the pack in.
entries.sort()
pack_base_name = self._get_pack_basepath(entries)
if sys.platform == 'win32':
try:
os.rename(path, pack_base_name + '.pack')
except WindowsError:
os.remove(pack_base_name + '.pack')
os.rename(path, pack_base_name + '.pack')
else:
os.rename(path, pack_base_name + '.pack')
# Write the index.
index_file = GitFile(pack_base_name + '.idx', 'wb')
try:
write_pack_index_v2(index_file, entries, pack_sha)
index_file.close()
finally:
index_file.abort()
# Add the pack to the store and return it.
final_pack = Pack(pack_base_name)
final_pack.check_length_and_checksum()
self._add_known_pack(pack_base_name, final_pack)
return final_pack
def add_thin_pack(self, read_all, read_some):
"""Add a new thin pack to this object store.
Thin packs are packs that contain deltas with parents that exist
outside the pack. They should never be placed in the object store
directly, and always indexed and completed as they are copied.
:param read_all: Read function that blocks until the number of
requested bytes are read.
:param read_some: Read function that returns at least one byte, but may
not return the number of bytes requested.
:return: A Pack object pointing at the now-completed thin pack in the
objects/pack directory.
"""
fd, path = tempfile.mkstemp(dir=self.path, prefix='tmp_pack_')
with os.fdopen(fd, 'w+b') as f:
indexer = PackIndexer(f, resolve_ext_ref=self.get_raw)
copier = PackStreamCopier(read_all, read_some, f,
delta_iter=indexer)
copier.verify()
return self._complete_thin_pack(f, path, copier, indexer)
def move_in_pack(self, path):
"""Move a specific file containing a pack into the pack directory.
:note: The file should be on the same file system as the
packs directory.
:param path: Path to the pack file.
"""
with PackData(path) as p:
entries = p.sorted_entries()
basename = self._get_pack_basepath(entries)
with GitFile(basename+".idx", "wb") as f:
write_pack_index_v2(f, entries, p.get_stored_checksum())
if self._pack_cache is None or self._pack_cache_stale():
self._update_pack_cache()
        try:
            final_pack = self._pack_cache[basename]
        except KeyError:
            pass
        else:
            # The pack is already known; discard the temporary file.
            os.unlink(path)
            return final_pack
os.rename(path, basename + ".pack")
final_pack = Pack(basename)
self._add_known_pack(basename, final_pack)
return final_pack
def add_pack(self):
"""Add a new pack to this object store.
:return: Fileobject to write to, a commit function to
call when the pack is finished and an abort
function.
"""
fd, path = tempfile.mkstemp(dir=self.pack_dir, suffix=".pack")
f = os.fdopen(fd, 'wb')
def commit():
f.flush()
os.fsync(fd)
f.close()
if os.path.getsize(path) > 0:
return self.move_in_pack(path)
else:
os.remove(path)
return None
def abort():
f.close()
os.remove(path)
return f, commit, abort
def add_object(self, obj):
"""Add a single object to this object store.
:param obj: Object to add
"""
path = self._get_shafile_path(obj.id)
dir = os.path.dirname(path)
try:
os.mkdir(dir)
except OSError as e:
if e.errno != errno.EEXIST:
raise
if os.path.exists(path):
return # Already there, no need to write again
with GitFile(path, 'wb') as f:
f.write(obj.as_legacy_object())
@classmethod
def init(cls, path):
try:
os.mkdir(path)
except OSError as e:
if e.errno != errno.EEXIST:
raise
os.mkdir(os.path.join(path, "info"))
os.mkdir(os.path.join(path, PACKDIR))
return cls(path)
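# Example: creating a fresh on-disk store. An illustrative sketch; the path
# and `some_object` are placeholders. init() creates the standard info/ and
# pack/ subdirectories under the given path.
#
#     store = DiskObjectStore.init('/tmp/example-objects')
#     store.add_object(some_object)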
class MemoryObjectStore(BaseObjectStore):
"""Object store that keeps all objects in memory."""
def __init__(self):
super(MemoryObjectStore, self).__init__()
self._data = {}
def _to_hexsha(self, sha):
if len(sha) == 40:
return sha
elif len(sha) == 20:
return sha_to_hex(sha)
else:
raise ValueError("Invalid sha %r" % (sha,))
def contains_loose(self, sha):
"""Check if a particular object is present by SHA1 and is loose."""
return self._to_hexsha(sha) in self._data
def contains_packed(self, sha):
"""Check if a particular object is present by SHA1 and is packed."""
return False
def __iter__(self):
"""Iterate over the SHAs that are present in this store."""
return iter(self._data.keys())
@property
def packs(self):
"""List with pack objects."""
return []
def get_raw(self, name):
"""Obtain the raw text for an object.
:param name: sha for the object.
:return: tuple with numeric type and object contents.
"""
obj = self[self._to_hexsha(name)]
return obj.type_num, obj.as_raw_string()
def __getitem__(self, name):
return self._data[self._to_hexsha(name)].copy()
def __delitem__(self, name):
"""Delete an object from this store, for testing only."""
del self._data[self._to_hexsha(name)]
def add_object(self, obj):
"""Add a single object to this object store.
"""
self._data[obj.id] = obj.copy()
def add_objects(self, objects):
"""Add a set of objects to this object store.
:param objects: Iterable over a list of (object, path) tuples
"""
for obj, path in objects:
self.add_object(obj)
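    # Example: a basic in-memory round trip, as also exercised by the test
    # suite. An illustrative sketch.
    #
    #     from dulwich.objects import Blob
    #     store = MemoryObjectStore()
    #     blob = Blob.from_string(b'hello')
    #     store.add_object(blob)
    #     assert blob.id in store and store[blob.id] == blob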
def add_pack(self):
"""Add a new pack to this object store.
Because this object store doesn't support packs, we extract and add the
individual objects.
:return: Fileobject to write to and a commit function to
call when the pack is finished.
"""
f = BytesIO()
def commit():
p = PackData.from_file(BytesIO(f.getvalue()), f.tell())
f.close()
for obj in PackInflater.for_pack_data(p, self.get_raw):
self.add_object(obj)
def abort():
pass
return f, commit, abort
def _complete_thin_pack(self, f, indexer):
"""Complete a thin pack by adding external references.
:param f: Open file object for the pack.
:param indexer: A PackIndexer for indexing the pack.
"""
entries = list(indexer)
# Update the header with the new number of objects.
f.seek(0)
write_pack_header(f, len(entries) + len(indexer.ext_refs()))
# Rescan the rest of the pack, computing the SHA with the new header.
new_sha = compute_file_sha(f, end_ofs=-20)
# Complete the pack.
for ext_sha in indexer.ext_refs():
assert len(ext_sha) == 20
type_num, data = self.get_raw(ext_sha)
write_pack_object(f, type_num, data, sha=new_sha)
pack_sha = new_sha.digest()
f.write(pack_sha)
def add_thin_pack(self, read_all, read_some):
"""Add a new thin pack to this object store.
Thin packs are packs that contain deltas with parents that exist
outside the pack. Because this object store doesn't support packs, we
extract and add the individual objects.
:param read_all: Read function that blocks until the number of
requested bytes are read.
:param read_some: Read function that returns at least one byte, but may
not return the number of bytes requested.
"""
f, commit, abort = self.add_pack()
try:
indexer = PackIndexer(f, resolve_ext_ref=self.get_raw)
copier = PackStreamCopier(read_all, read_some, f,
delta_iter=indexer)
copier.verify()
self._complete_thin_pack(f, indexer)
except BaseException:
abort()
raise
else:
commit()
class ObjectIterator(object):
"""Interface for iterating over objects."""
def iterobjects(self):
raise NotImplementedError(self.iterobjects)
class ObjectStoreIterator(ObjectIterator):
"""ObjectIterator that works on top of an ObjectStore."""
def __init__(self, store, sha_iter):
"""Create a new ObjectIterator.
:param store: Object store to retrieve from
:param sha_iter: Iterator over (sha, path) tuples
"""
self.store = store
self.sha_iter = sha_iter
self._shas = []
def __iter__(self):
"""Yield tuple with next object and path."""
for sha, path in self.itershas():
yield self.store[sha], path
def iterobjects(self):
"""Iterate over just the objects."""
for o, path in self:
yield o
def itershas(self):
"""Iterate over the SHAs."""
for sha in self._shas:
yield sha
for sha in self.sha_iter:
self._shas.append(sha)
yield sha
def __contains__(self, needle):
"""Check if an object is present.
:note: This checks if the object is present in
the underlying object store, not if it would
be yielded by the iterator.
:param needle: SHA1 of the object to check for
"""
return needle in self.store
def __getitem__(self, key):
"""Find an object by SHA1.
:note: This retrieves the object from the underlying
object store. It will also succeed if the object would
not be returned by the iterator.
"""
return self.store[key]
def __len__(self):
"""Return the number of objects."""
return len(list(self.itershas()))
def empty(self):
iter = self.itershas()
try:
            next(iter)
except StopIteration:
return True
else:
return False
def __bool__(self):
"""Indicate whether this object has contents."""
return not self.empty()
def tree_lookup_path(lookup_obj, root_sha, path):
"""Look up an object in a Git tree.
:param lookup_obj: Callback for retrieving object by SHA1
:param root_sha: SHA1 of the root tree
:param path: Path to lookup
:return: A tuple of (mode, SHA) of the resulting path.
"""
tree = lookup_obj(root_sha)
if not isinstance(tree, Tree):
raise NotTreeError(root_sha)
return tree.lookup_path(lookup_obj, path)
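# Example: the object store's __getitem__ can serve as the lookup callback.
# An illustrative sketch; `store`, `tree_id` and the path are placeholders.
#
#     mode, sha = tree_lookup_path(
#         store.__getitem__, tree_id, b'docs/index.txt')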
def _collect_filetree_revs(obj_store, tree_sha, kset):
"""Collect SHA1s of files and directories for specified tree.
:param obj_store: Object store to get objects by SHA from
:param tree_sha: tree reference to walk
:param kset: set to fill with references to files and directories
"""
filetree = obj_store[tree_sha]
for name, mode, sha in filetree.iteritems():
if not S_ISGITLINK(mode) and sha not in kset:
kset.add(sha)
if stat.S_ISDIR(mode):
_collect_filetree_revs(obj_store, sha, kset)
def _split_commits_and_tags(obj_store, lst, ignore_unknown=False):
"""Split object id list into three lists with commit, tag, and other SHAs.
    Commits referenced by tags are included in the commits
    list as well. Only SHA1s known in this repository will get
    through, and unless the ignore_unknown argument is True, a KeyError
    is raised for any SHA1 missing from the repository.
:param obj_store: Object store to get objects by SHA1 from
:param lst: Collection of commit and tag SHAs
:param ignore_unknown: True to skip SHA1 missing in the repository
silently.
:return: A tuple of (commits, tags, others) SHA1s
"""
commits = set()
tags = set()
others = set()
for e in lst:
try:
o = obj_store[e]
except KeyError:
if not ignore_unknown:
raise
else:
if isinstance(o, Commit):
commits.add(e)
elif isinstance(o, Tag):
tags.add(e)
tagged = o.object[1]
c, t, o = _split_commits_and_tags(
obj_store, [tagged], ignore_unknown=ignore_unknown)
commits |= c
tags |= t
others |= o
else:
others.add(e)
return (commits, tags, others)
class MissingObjectFinder(object):
"""Find the objects missing from another object store.
:param object_store: Object store containing at least all objects to be
sent
:param haves: SHA1s of commits not to send (already present in target)
:param wants: SHA1s of commits to send
:param progress: Optional function to report progress to.
:param get_tagged: Function that returns a dict of pointed-to sha -> tag
sha for including tags.
:param get_parents: Optional function for getting the parents of a commit.
:param tagged: dict of pointed-to sha -> tag sha for including tags
"""
def __init__(self, object_store, haves, wants, progress=None,
get_tagged=None, get_parents=lambda commit: commit.parents):
self.object_store = object_store
self._get_parents = get_parents
# process Commits and Tags differently
        # Note: while haves may list commits/tags not available locally
        # (such SHAs get filtered out by _split_commits_and_tags), wants
        # must list only SHAs known to this store; otherwise
        # _split_commits_and_tags fails with KeyError.
have_commits, have_tags, have_others = (
_split_commits_and_tags(object_store, haves, True))
want_commits, want_tags, want_others = (
_split_commits_and_tags(object_store, wants, False))
# all_ancestors is a set of commits that shall not be sent
# (complete repository up to 'haves')
all_ancestors = object_store._collect_ancestors(
have_commits, get_parents=self._get_parents)[0]
        # all_missing - complete set of commits between haves and wants
        # common - commits from all_ancestors we hit while traversing
        # the parent hierarchy of wants
missing_commits, common_commits = object_store._collect_ancestors(
want_commits, all_ancestors, get_parents=self._get_parents)
self.sha_done = set()
# Now, fill sha_done with commits and revisions of
# files and directories known to be both locally
# and on target. Thus these commits and files
# won't get selected for fetch
for h in common_commits:
self.sha_done.add(h)
cmt = object_store[h]
_collect_filetree_revs(object_store, cmt.tree, self.sha_done)
# record tags we have as visited, too
for t in have_tags:
self.sha_done.add(t)
missing_tags = want_tags.difference(have_tags)
missing_others = want_others.difference(have_others)
# in fact, what we 'want' is commits, tags, and others
# we've found missing
wants = missing_commits.union(missing_tags)
wants = wants.union(missing_others)
self.objects_to_send = set([(w, None, False) for w in wants])
if progress is None:
self.progress = lambda x: None
else:
self.progress = progress
self._tagged = get_tagged and get_tagged() or {}
def add_todo(self, entries):
self.objects_to_send.update([e for e in entries
if not e[0] in self.sha_done])
def next(self):
while True:
if not self.objects_to_send:
return None
(sha, name, leaf) = self.objects_to_send.pop()
if sha not in self.sha_done:
break
if not leaf:
o = self.object_store[sha]
if isinstance(o, Commit):
self.add_todo([(o.tree, "", False)])
elif isinstance(o, Tree):
self.add_todo([(s, n, not stat.S_ISDIR(m))
for n, m, s in o.iteritems()
if not S_ISGITLINK(m)])
elif isinstance(o, Tag):
self.add_todo([(o.object[1], None, False)])
if sha in self._tagged:
self.add_todo([(self._tagged[sha], None, True)])
self.sha_done.add(sha)
self.progress(("counting objects: %d\r" %
len(self.sha_done)).encode('ascii'))
return (sha, name)
__next__ = next
class ObjectStoreGraphWalker(object):
"""Graph walker that finds what commits are missing from an object store.
:ivar heads: Revisions without descendants in the local repo
:ivar get_parents: Function to retrieve parents in the local repo
"""
def __init__(self, local_heads, get_parents):
"""Create a new instance.
:param local_heads: Heads to start search with
:param get_parents: Function for finding the parents of a SHA1.
"""
self.heads = set(local_heads)
self.get_parents = get_parents
self.parents = {}
def ack(self, sha):
"""Ack that a revision and its ancestors are present in the source."""
if len(sha) != 40:
raise ValueError("unexpected sha %r received" % sha)
ancestors = set([sha])
# stop if we run out of heads to remove
while self.heads:
for a in ancestors:
if a in self.heads:
self.heads.remove(a)
# collect all ancestors
new_ancestors = set()
for a in ancestors:
ps = self.parents.get(a)
if ps is not None:
new_ancestors.update(ps)
self.parents[a] = None
# no more ancestors; stop
if not new_ancestors:
break
ancestors = new_ancestors
def next(self):
"""Iterate over ancestors of heads in the target."""
if self.heads:
ret = self.heads.pop()
ps = self.get_parents(ret)
self.parents[ret] = ps
self.heads.update(
[p for p in ps if p not in self.parents])
return ret
return None
__next__ = next
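# Example: the walker protocol. An illustrative sketch; `store` and
# `head_sha` are placeholders.
#
#     walker = ObjectStoreGraphWalker(
#         [head_sha], lambda sha: store[sha].parents)
#     sha = walker.next()    # propose a candidate the other side may have
#     if sha is not None:
#         walker.ack(sha)    # the other side has it; prune its ancestors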
def commit_tree_changes(object_store, tree, changes):
"""Commit a specified set of changes to a tree structure.
This will apply a set of changes on top of an existing tree, storing new
objects in object_store.
    Changes are a list of tuples with (path, mode, object_sha). Paths
    can be both blobs and trees. Setting the mode and object sha to None
    deletes the path.
This method works especially well if there are only a small
number of changes to a big tree. For a large number of changes
to a large tree, use e.g. commit_tree.
:param object_store: Object store to store new objects in
and retrieve old ones from.
:param tree: Original tree root
:param changes: changes to apply
:return: New tree root object
"""
# TODO(jelmer): Save up the objects and add them using .add_objects
# rather than with individual calls to .add_object.
nested_changes = {}
for (path, new_mode, new_sha) in changes:
try:
(dirname, subpath) = path.split(b'/', 1)
except ValueError:
if new_sha is None:
del tree[path]
else:
tree[path] = (new_mode, new_sha)
else:
nested_changes.setdefault(dirname, []).append(
(subpath, new_mode, new_sha))
for name, subchanges in nested_changes.items():
try:
orig_subtree = object_store[tree[name][1]]
except KeyError:
orig_subtree = Tree()
subtree = commit_tree_changes(object_store, orig_subtree, subchanges)
if len(subtree) == 0:
del tree[name]
else:
tree[name] = (stat.S_IFDIR, subtree.id)
object_store.add_object(tree)
return tree
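# Example: applying a small set of changes to an existing tree. An
# illustrative sketch; `object_store` and `tree` are placeholders and the
# paths are arbitrary.
#
#     from dulwich.objects import Blob
#     blob = Blob.from_string(b'new content')
#     object_store.add_object(blob)
#     new_tree = commit_tree_changes(
#         object_store, tree,
#         [(b'docs/readme.txt', 0o100644, blob.id),
#          (b'old/file.txt', None, None)])  # None mode/sha deletes the path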
class OverlayObjectStore(BaseObjectStore):
"""Object store that can overlay multiple object stores."""
def __init__(self, bases, add_store=None):
self.bases = bases
self.add_store = add_store
def add_object(self, object):
if self.add_store is None:
raise NotImplementedError(self.add_object)
return self.add_store.add_object(object)
def add_objects(self, objects):
if self.add_store is None:
            raise NotImplementedError(self.add_objects)
return self.add_store.add_objects(objects)
@property
def packs(self):
ret = []
for b in self.bases:
ret.extend(b.packs)
return ret
def __iter__(self):
done = set()
for b in self.bases:
for o_id in b:
if o_id not in done:
yield o_id
done.add(o_id)
def get_raw(self, sha_id):
for b in self.bases:
try:
return b.get_raw(sha_id)
except KeyError:
pass
else:
raise KeyError(sha_id)
def contains_packed(self, sha):
for b in self.bases:
if b.contains_packed(sha):
return True
else:
return False
def contains_loose(self, sha):
for b in self.bases:
if b.contains_loose(sha):
return True
else:
return False
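# Example: composing stores. An illustrative sketch; `store_a`, `store_b`
# and `some_sha` are placeholders. Reads consult each base in order; writes
# go to add_store.
#
#     overlay = OverlayObjectStore([store_a, store_b], add_store=store_a)
#     obj = overlay[some_sha]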
diff --git a/dulwich/tests/test_diff_tree.py b/dulwich/tests/test_diff_tree.py
index c53cf7a1..cc5fbbeb 100644
--- a/dulwich/tests/test_diff_tree.py
+++ b/dulwich/tests/test_diff_tree.py
@@ -1,938 +1,948 @@
# test_diff_tree.py -- Tests for file and tree diff utilities.
# Copyright (C) 2010 Google, Inc.
#
# Dulwich is dual-licensed under the Apache License, Version 2.0 and the GNU
# General Public License as published by the Free Software Foundation; version 2.0
# or (at your option) any later version. You can redistribute it and/or
# modify it under the terms of either of these two licenses.
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# You should have received a copy of the licenses; if not, see
# <http://www.gnu.org/licenses/> for a copy of the GNU General Public License
# and <http://www.apache.org/licenses/LICENSE-2.0> for a copy of the Apache
# License, Version 2.0.
#
"""Tests for file and tree diff utilities."""
from itertools import permutations
from dulwich.diff_tree import (
CHANGE_MODIFY,
CHANGE_RENAME,
CHANGE_COPY,
CHANGE_UNCHANGED,
TreeChange,
_merge_entries,
_merge_entries_py,
tree_changes,
tree_changes_for_merge,
_count_blocks,
_count_blocks_py,
_similarity_score,
_tree_change_key,
RenameDetector,
_is_tree,
_is_tree_py
)
from dulwich.index import (
commit_tree,
)
from dulwich.object_store import (
MemoryObjectStore,
)
from dulwich.objects import (
ShaFile,
Blob,
TreeEntry,
Tree,
)
from dulwich.tests import (
TestCase,
)
from dulwich.tests.utils import (
F,
make_object,
functest_builder,
ext_functest_builder,
)
class DiffTestCase(TestCase):
def setUp(self):
super(DiffTestCase, self).setUp()
self.store = MemoryObjectStore()
self.empty_tree = self.commit_tree([])
def commit_tree(self, entries):
commit_blobs = []
for entry in entries:
if len(entry) == 2:
path, obj = entry
mode = F
else:
path, obj, mode = entry
if isinstance(obj, Blob):
self.store.add_object(obj)
sha = obj.id
else:
sha = obj
commit_blobs.append((path, sha, mode))
return self.store[commit_tree(self.store, commit_blobs)]
class TreeChangesTest(DiffTestCase):
def setUp(self):
super(TreeChangesTest, self).setUp()
self.detector = RenameDetector(self.store)
def assertMergeFails(self, merge_entries, name, mode, sha):
t = Tree()
t[name] = (mode, sha)
self.assertRaises((TypeError, ValueError), merge_entries, '', t, t)
def _do_test_merge_entries(self, merge_entries):
blob_a1 = make_object(Blob, data=b'a1')
blob_a2 = make_object(Blob, data=b'a2')
blob_b1 = make_object(Blob, data=b'b1')
blob_c2 = make_object(Blob, data=b'c2')
tree1 = self.commit_tree([(b'a', blob_a1, 0o100644),
(b'b', blob_b1, 0o100755)])
tree2 = self.commit_tree([(b'a', blob_a2, 0o100644),
(b'c', blob_c2, 0o100755)])
self.assertEqual([], merge_entries(b'', self.empty_tree,
self.empty_tree))
self.assertEqual(
[((None, None, None), (b'a', 0o100644, blob_a1.id)),
((None, None, None), (b'b', 0o100755, blob_b1.id)), ],
merge_entries(b'', self.empty_tree, tree1))
self.assertEqual(
[((None, None, None), (b'x/a', 0o100644, blob_a1.id)),
((None, None, None), (b'x/b', 0o100755, blob_b1.id)), ],
merge_entries(b'x', self.empty_tree, tree1))
self.assertEqual(
[((b'a', 0o100644, blob_a2.id), (None, None, None)),
((b'c', 0o100755, blob_c2.id), (None, None, None)), ],
merge_entries(b'', tree2, self.empty_tree))
self.assertEqual(
[((b'a', 0o100644, blob_a1.id), (b'a', 0o100644, blob_a2.id)),
((b'b', 0o100755, blob_b1.id), (None, None, None)),
((None, None, None), (b'c', 0o100755, blob_c2.id)), ],
merge_entries(b'', tree1, tree2))
self.assertEqual(
[((b'a', 0o100644, blob_a2.id), (b'a', 0o100644, blob_a1.id)),
((None, None, None), (b'b', 0o100755, blob_b1.id)),
((b'c', 0o100755, blob_c2.id), (None, None, None)), ],
merge_entries(b'', tree2, tree1))
self.assertMergeFails(merge_entries, 0xdeadbeef, 0o100644, '1' * 40)
self.assertMergeFails(merge_entries, b'a', b'deadbeef', '1' * 40)
self.assertMergeFails(merge_entries, b'a', 0o100644, 0xdeadbeef)
test_merge_entries = functest_builder(_do_test_merge_entries,
_merge_entries_py)
test_merge_entries_extension = ext_functest_builder(_do_test_merge_entries,
_merge_entries)
def _do_test_is_tree(self, is_tree):
self.assertFalse(is_tree(TreeEntry(None, None, None)))
self.assertFalse(is_tree(TreeEntry(b'a', 0o100644, b'a' * 40)))
self.assertFalse(is_tree(TreeEntry(b'a', 0o100755, b'a' * 40)))
self.assertFalse(is_tree(TreeEntry(b'a', 0o120000, b'a' * 40)))
self.assertTrue(is_tree(TreeEntry(b'a', 0o040000, b'a' * 40)))
self.assertRaises(TypeError, is_tree, TreeEntry(b'a', b'x', b'a' * 40))
self.assertRaises(AttributeError, is_tree, 1234)
test_is_tree = functest_builder(_do_test_is_tree, _is_tree_py)
test_is_tree_extension = ext_functest_builder(_do_test_is_tree, _is_tree)
def assertChangesEqual(self, expected, tree1, tree2, **kwargs):
actual = list(tree_changes(self.store, tree1.id, tree2.id, **kwargs))
self.assertEqual(expected, actual)
# For brevity, the following tests use tuples instead of TreeEntry objects.
def test_tree_changes_empty(self):
self.assertChangesEqual([], self.empty_tree, self.empty_tree)
def test_tree_changes_no_changes(self):
blob = make_object(Blob, data=b'blob')
tree = self.commit_tree([(b'a', blob), (b'b/c', blob)])
self.assertChangesEqual([], self.empty_tree, self.empty_tree)
self.assertChangesEqual([], tree, tree)
self.assertChangesEqual(
[TreeChange(CHANGE_UNCHANGED, (b'a', F, blob.id),
(b'a', F, blob.id)),
TreeChange(CHANGE_UNCHANGED, (b'b/c', F, blob.id),
(b'b/c', F, blob.id))],
tree, tree, want_unchanged=True)
def test_tree_changes_add_delete(self):
blob_a = make_object(Blob, data=b'a')
blob_b = make_object(Blob, data=b'b')
tree = self.commit_tree([(b'a', blob_a, 0o100644),
(b'x/b', blob_b, 0o100755)])
self.assertChangesEqual(
[TreeChange.add((b'a', 0o100644, blob_a.id)),
TreeChange.add((b'x/b', 0o100755, blob_b.id))],
self.empty_tree, tree)
self.assertChangesEqual(
[TreeChange.delete((b'a', 0o100644, blob_a.id)),
TreeChange.delete((b'x/b', 0o100755, blob_b.id))],
tree, self.empty_tree)
def test_tree_changes_modify_contents(self):
blob_a1 = make_object(Blob, data=b'a1')
blob_a2 = make_object(Blob, data=b'a2')
tree1 = self.commit_tree([(b'a', blob_a1)])
tree2 = self.commit_tree([(b'a', blob_a2)])
self.assertChangesEqual(
[TreeChange(CHANGE_MODIFY, (b'a', F, blob_a1.id),
(b'a', F, blob_a2.id))],
tree1, tree2)
def test_tree_changes_modify_mode(self):
blob_a = make_object(Blob, data=b'a')
tree1 = self.commit_tree([(b'a', blob_a, 0o100644)])
tree2 = self.commit_tree([(b'a', blob_a, 0o100755)])
self.assertChangesEqual(
[TreeChange(CHANGE_MODIFY, (b'a', 0o100644, blob_a.id),
(b'a', 0o100755, blob_a.id))],
tree1, tree2)
def test_tree_changes_change_type(self):
blob_a1 = make_object(Blob, data=b'a')
blob_a2 = make_object(Blob, data=b'/foo/bar')
tree1 = self.commit_tree([(b'a', blob_a1, 0o100644)])
tree2 = self.commit_tree([(b'a', blob_a2, 0o120000)])
self.assertChangesEqual(
[TreeChange.delete((b'a', 0o100644, blob_a1.id)),
TreeChange.add((b'a', 0o120000, blob_a2.id))],
tree1, tree2)
+ def test_tree_changes_change_type_same(self):
+ blob_a1 = make_object(Blob, data=b'a')
+ blob_a2 = make_object(Blob, data=b'/foo/bar')
+ tree1 = self.commit_tree([(b'a', blob_a1, 0o100644)])
+ tree2 = self.commit_tree([(b'a', blob_a2, 0o120000)])
+ self.assertChangesEqual(
+ [TreeChange(CHANGE_MODIFY, (b'a', 0o100644, blob_a1.id),
+ (b'a', 0o120000, blob_a2.id))],
+ tree1, tree2, change_type_same=True)
+
def test_tree_changes_to_tree(self):
blob_a = make_object(Blob, data=b'a')
blob_x = make_object(Blob, data=b'x')
tree1 = self.commit_tree([(b'a', blob_a)])
tree2 = self.commit_tree([(b'a/x', blob_x)])
self.assertChangesEqual(
[TreeChange.delete((b'a', F, blob_a.id)),
TreeChange.add((b'a/x', F, blob_x.id))],
tree1, tree2)
def test_tree_changes_complex(self):
blob_a_1 = make_object(Blob, data=b'a1_1')
blob_bx1_1 = make_object(Blob, data=b'bx1_1')
blob_bx2_1 = make_object(Blob, data=b'bx2_1')
blob_by1_1 = make_object(Blob, data=b'by1_1')
blob_by2_1 = make_object(Blob, data=b'by2_1')
tree1 = self.commit_tree([
(b'a', blob_a_1),
(b'b/x/1', blob_bx1_1),
(b'b/x/2', blob_bx2_1),
(b'b/y/1', blob_by1_1),
(b'b/y/2', blob_by2_1),
])
blob_a_2 = make_object(Blob, data=b'a1_2')
blob_bx1_2 = blob_bx1_1
blob_by_2 = make_object(Blob, data=b'by_2')
blob_c_2 = make_object(Blob, data=b'c_2')
tree2 = self.commit_tree([
(b'a', blob_a_2),
(b'b/x/1', blob_bx1_2),
(b'b/y', blob_by_2),
(b'c', blob_c_2),
])
self.assertChangesEqual(
[TreeChange(CHANGE_MODIFY, (b'a', F, blob_a_1.id),
(b'a', F, blob_a_2.id)),
TreeChange.delete((b'b/x/2', F, blob_bx2_1.id)),
TreeChange.add((b'b/y', F, blob_by_2.id)),
TreeChange.delete((b'b/y/1', F, blob_by1_1.id)),
TreeChange.delete((b'b/y/2', F, blob_by2_1.id)),
TreeChange.add((b'c', F, blob_c_2.id))],
tree1, tree2)
def test_tree_changes_name_order(self):
blob = make_object(Blob, data=b'a')
tree1 = self.commit_tree([(b'a', blob), (b'a.', blob), (b'a..', blob)])
# Tree order is the reverse of this, so if we used tree order, 'a..'
# would not be merged.
tree2 = self.commit_tree(
[(b'a/x', blob), (b'a./x', blob), (b'a..', blob)])
self.assertChangesEqual(
[TreeChange.delete((b'a', F, blob.id)),
TreeChange.add((b'a/x', F, blob.id)),
TreeChange.delete((b'a.', F, blob.id)),
TreeChange.add((b'a./x', F, blob.id))],
tree1, tree2)
def test_tree_changes_prune(self):
blob_a1 = make_object(Blob, data=b'a1')
blob_a2 = make_object(Blob, data=b'a2')
blob_x = make_object(Blob, data=b'x')
tree1 = self.commit_tree([(b'a', blob_a1), (b'b/x', blob_x)])
tree2 = self.commit_tree([(b'a', blob_a2), (b'b/x', blob_x)])
# Remove identical items so lookups will fail unless we prune.
subtree = self.store[tree1[b'b'][1]]
for entry in subtree.items():
del self.store[entry.sha]
del self.store[subtree.id]
self.assertChangesEqual(
[TreeChange(CHANGE_MODIFY, (b'a', F, blob_a1.id),
(b'a', F, blob_a2.id))],
tree1, tree2)
def test_tree_changes_rename_detector(self):
blob_a1 = make_object(Blob, data=b'a\nb\nc\nd\n')
blob_a2 = make_object(Blob, data=b'a\nb\nc\ne\n')
blob_b = make_object(Blob, data=b'b')
tree1 = self.commit_tree([(b'a', blob_a1), (b'b', blob_b)])
tree2 = self.commit_tree([(b'c', blob_a2), (b'b', blob_b)])
detector = RenameDetector(self.store)
self.assertChangesEqual(
[TreeChange.delete((b'a', F, blob_a1.id)),
TreeChange.add((b'c', F, blob_a2.id))],
tree1, tree2)
self.assertChangesEqual(
[TreeChange.delete((b'a', F, blob_a1.id)),
TreeChange(CHANGE_UNCHANGED, (b'b', F, blob_b.id),
(b'b', F, blob_b.id)),
TreeChange.add((b'c', F, blob_a2.id))],
tree1, tree2, want_unchanged=True)
self.assertChangesEqual(
[TreeChange(CHANGE_RENAME, (b'a', F, blob_a1.id),
(b'c', F, blob_a2.id))],
tree1, tree2, rename_detector=detector)
self.assertChangesEqual(
[TreeChange(CHANGE_RENAME, (b'a', F, blob_a1.id),
(b'c', F, blob_a2.id)),
TreeChange(CHANGE_UNCHANGED, (b'b', F, blob_b.id),
(b'b', F, blob_b.id))],
tree1, tree2, rename_detector=detector, want_unchanged=True)
def assertChangesForMergeEqual(self, expected, parent_trees, merge_tree,
**kwargs):
parent_tree_ids = [t.id for t in parent_trees]
actual = list(tree_changes_for_merge(
self.store, parent_tree_ids, merge_tree.id, **kwargs))
self.assertEqual(expected, actual)
parent_tree_ids.reverse()
expected = [list(reversed(cs)) for cs in expected]
actual = list(tree_changes_for_merge(
self.store, parent_tree_ids, merge_tree.id, **kwargs))
self.assertEqual(expected, actual)
def test_tree_changes_for_merge_add_no_conflict(self):
blob = make_object(Blob, data=b'blob')
parent1 = self.commit_tree([])
parent2 = merge = self.commit_tree([(b'a', blob)])
self.assertChangesForMergeEqual([], [parent1, parent2], merge)
self.assertChangesForMergeEqual([], [parent2, parent2], merge)
def test_tree_changes_for_merge_add_modify_conflict(self):
blob1 = make_object(Blob, data=b'1')
blob2 = make_object(Blob, data=b'2')
parent1 = self.commit_tree([])
parent2 = self.commit_tree([(b'a', blob1)])
merge = self.commit_tree([(b'a', blob2)])
self.assertChangesForMergeEqual(
[[TreeChange.add((b'a', F, blob2.id)),
TreeChange(CHANGE_MODIFY, (b'a', F, blob1.id),
(b'a', F, blob2.id))]],
[parent1, parent2], merge)
def test_tree_changes_for_merge_modify_modify_conflict(self):
blob1 = make_object(Blob, data=b'1')
blob2 = make_object(Blob, data=b'2')
blob3 = make_object(Blob, data=b'3')
parent1 = self.commit_tree([(b'a', blob1)])
parent2 = self.commit_tree([(b'a', blob2)])
merge = self.commit_tree([(b'a', blob3)])
self.assertChangesForMergeEqual(
[[TreeChange(CHANGE_MODIFY, (b'a', F, blob1.id),
(b'a', F, blob3.id)),
TreeChange(CHANGE_MODIFY, (b'a', F, blob2.id),
(b'a', F, blob3.id))]],
[parent1, parent2], merge)
def test_tree_changes_for_merge_modify_no_conflict(self):
blob1 = make_object(Blob, data=b'1')
blob2 = make_object(Blob, data=b'2')
parent1 = self.commit_tree([(b'a', blob1)])
parent2 = merge = self.commit_tree([(b'a', blob2)])
self.assertChangesForMergeEqual([], [parent1, parent2], merge)
def test_tree_changes_for_merge_delete_delete_conflict(self):
blob1 = make_object(Blob, data=b'1')
blob2 = make_object(Blob, data=b'2')
parent1 = self.commit_tree([(b'a', blob1)])
parent2 = self.commit_tree([(b'a', blob2)])
merge = self.commit_tree([])
self.assertChangesForMergeEqual(
[[TreeChange.delete((b'a', F, blob1.id)),
TreeChange.delete((b'a', F, blob2.id))]],
[parent1, parent2], merge)
def test_tree_changes_for_merge_delete_no_conflict(self):
blob = make_object(Blob, data=b'blob')
has = self.commit_tree([(b'a', blob)])
doesnt_have = self.commit_tree([])
self.assertChangesForMergeEqual([], [has, has], doesnt_have)
self.assertChangesForMergeEqual([], [has, doesnt_have], doesnt_have)
def test_tree_changes_for_merge_octopus_no_conflict(self):
r = list(range(5))
blobs = [make_object(Blob, data=bytes(i)) for i in r]
parents = [self.commit_tree([(b'a', blobs[i])]) for i in r]
for i in r:
# Take the SHA from each of the parents.
self.assertChangesForMergeEqual([], parents, parents[i])
def test_tree_changes_for_merge_octopus_modify_conflict(self):
# Because the octopus merge strategy is limited, I doubt it's possible
# to create this with the git command line. But the output is well-
# defined, so test it anyway.
r = list(range(5))
parent_blobs = [make_object(Blob, data=bytes(i)) for i in r]
merge_blob = make_object(Blob, data=b'merge')
parents = [self.commit_tree([(b'a', parent_blobs[i])]) for i in r]
merge = self.commit_tree([(b'a', merge_blob)])
expected = [[TreeChange(CHANGE_MODIFY, (b'a', F, parent_blobs[i].id),
(b'a', F, merge_blob.id)) for i in r]]
self.assertChangesForMergeEqual(expected, parents, merge)
def test_tree_changes_for_merge_octopus_delete(self):
blob1 = make_object(Blob, data=b'1')
blob2 = make_object(Blob, data=b'3')
parent1 = self.commit_tree([(b'a', blob1)])
parent2 = self.commit_tree([(b'a', blob2)])
parent3 = merge = self.commit_tree([])
self.assertChangesForMergeEqual([], [parent1, parent1, parent1], merge)
self.assertChangesForMergeEqual([], [parent1, parent1, parent3], merge)
self.assertChangesForMergeEqual([], [parent1, parent3, parent3], merge)
self.assertChangesForMergeEqual(
[[TreeChange.delete((b'a', F, blob1.id)),
TreeChange.delete((b'a', F, blob2.id)),
None]],
[parent1, parent2, parent3], merge)
def test_tree_changes_for_merge_add_add_same_conflict(self):
blob = make_object(Blob, data=b'a\nb\nc\nd\n')
parent1 = self.commit_tree([(b'a', blob)])
parent2 = self.commit_tree([])
merge = self.commit_tree([(b'b', blob)])
add = TreeChange.add((b'b', F, blob.id))
self.assertChangesForMergeEqual(
[[add, add]], [parent1, parent2], merge)
def test_tree_changes_for_merge_add_exact_rename_conflict(self):
blob = make_object(Blob, data=b'a\nb\nc\nd\n')
parent1 = self.commit_tree([(b'a', blob)])
parent2 = self.commit_tree([])
merge = self.commit_tree([(b'b', blob)])
self.assertChangesForMergeEqual(
[[TreeChange(CHANGE_RENAME, (b'a', F, blob.id),
(b'b', F, blob.id)),
TreeChange.add((b'b', F, blob.id))]],
[parent1, parent2], merge, rename_detector=self.detector)
def test_tree_changes_for_merge_add_content_rename_conflict(self):
blob1 = make_object(Blob, data=b'a\nb\nc\nd\n')
blob2 = make_object(Blob, data=b'a\nb\nc\ne\n')
parent1 = self.commit_tree([(b'a', blob1)])
parent2 = self.commit_tree([])
merge = self.commit_tree([(b'b', blob2)])
self.assertChangesForMergeEqual(
[[TreeChange(CHANGE_RENAME, (b'a', F, blob1.id),
(b'b', F, blob2.id)),
TreeChange.add((b'b', F, blob2.id))]],
[parent1, parent2], merge, rename_detector=self.detector)
def test_tree_changes_for_merge_modify_rename_conflict(self):
blob1 = make_object(Blob, data=b'a\nb\nc\nd\n')
blob2 = make_object(Blob, data=b'a\nb\nc\ne\n')
parent1 = self.commit_tree([(b'a', blob1)])
parent2 = self.commit_tree([(b'b', blob1)])
merge = self.commit_tree([(b'b', blob2)])
self.assertChangesForMergeEqual(
[[TreeChange(CHANGE_RENAME, (b'a', F, blob1.id),
(b'b', F, blob2.id)),
TreeChange(CHANGE_MODIFY, (b'b', F, blob1.id),
(b'b', F, blob2.id))]],
[parent1, parent2], merge, rename_detector=self.detector)


class RenameDetectionTest(DiffTestCase):
def _do_test_count_blocks(self, count_blocks):
blob = make_object(Blob, data=b'a\nb\na\n')
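        # count_blocks maps the hash of each line to the total number of
        # bytes of that line in the blob: b'a\n' twice (4 bytes), b'b\n'
        # once (2 bytes).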
self.assertEqual({hash(b'a\n'): 4, hash(b'b\n'): 2},
count_blocks(blob))
test_count_blocks = functest_builder(_do_test_count_blocks,
_count_blocks_py)
test_count_blocks_extension = ext_functest_builder(_do_test_count_blocks,
_count_blocks)
def _do_test_count_blocks_no_newline(self, count_blocks):
blob = make_object(Blob, data=b'a\na')
        self.assertEqual({hash(b'a\n'): 2, hash(b'a'): 1}, count_blocks(blob))
test_count_blocks_no_newline = functest_builder(
_do_test_count_blocks_no_newline, _count_blocks_py)
test_count_blocks_no_newline_extension = ext_functest_builder(
_do_test_count_blocks_no_newline, _count_blocks)
def _do_test_count_blocks_chunks(self, count_blocks):
blob = ShaFile.from_raw_chunks(Blob.type_num, [b'a\nb', b'\na\n'])
        self.assertEqual({hash(b'a\n'): 4, hash(b'b\n'): 2},
                         count_blocks(blob))
test_count_blocks_chunks = functest_builder(_do_test_count_blocks_chunks,
_count_blocks_py)
test_count_blocks_chunks_extension = ext_functest_builder(
_do_test_count_blocks_chunks, _count_blocks)
def _do_test_count_blocks_long_lines(self, count_blocks):
a = b'a' * 64
data = a + b'xxx\ny\n' + a + b'zzz\n'
blob = make_object(Blob, data=data)
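        # Lines are hashed in blocks of at most 64 bytes, so each long line
        # contributes a hash(b'a' * 64) block plus the hash of its short
        # tail; the 64-byte run appears in both long lines, giving 128 bytes.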
        self.assertEqual({hash(b'a' * 64): 128, hash(b'xxx\n'): 4,
                          hash(b'y\n'): 2, hash(b'zzz\n'): 4},
                         count_blocks(blob))
test_count_blocks_long_lines = functest_builder(
_do_test_count_blocks_long_lines, _count_blocks_py)
test_count_blocks_long_lines_extension = ext_functest_builder(
_do_test_count_blocks_long_lines, _count_blocks)
def assertSimilar(self, expected_score, blob1, blob2):
self.assertEqual(expected_score, _similarity_score(blob1, blob2))
self.assertEqual(expected_score, _similarity_score(blob2, blob1))
def test_similarity_score(self):
blob0 = make_object(Blob, data=b'')
blob1 = make_object(Blob, data=b'ab\ncd\ncd\n')
blob2 = make_object(Blob, data=b'ab\n')
blob3 = make_object(Blob, data=b'cd\n')
blob4 = make_object(Blob, data=b'cd\ncd\n')
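        # The score is the byte count common to both blobs as a percentage of
        # the larger blob, truncated: blob1 and blob2 share only b'ab\n', so
        # 3 * 100 // 9 == 33; blob1 and blob4 share b'cd\n' twice, so
        # 6 * 100 // 9 == 66.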
self.assertSimilar(100, blob0, blob0)
self.assertSimilar(0, blob0, blob1)
self.assertSimilar(33, blob1, blob2)
self.assertSimilar(33, blob1, blob3)
self.assertSimilar(66, blob1, blob4)
self.assertSimilar(0, blob2, blob3)
self.assertSimilar(50, blob3, blob4)
def test_similarity_score_cache(self):
blob1 = make_object(Blob, data=b'ab\ncd\n')
blob2 = make_object(Blob, data=b'ab\n')
block_cache = {}
self.assertEqual(
50, _similarity_score(blob1, blob2, block_cache=block_cache))
self.assertEqual(set([blob1.id, blob2.id]), set(block_cache))
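        # With both blobs' block counts cached, a second call should only
        # need raw_length(); stub out as_raw_chunks() to prove the blob
        # contents are never re-read.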
def fail_chunks():
self.fail('Unexpected call to as_raw_chunks()')
blob1.as_raw_chunks = blob2.as_raw_chunks = fail_chunks
blob1.raw_length = lambda: 6
blob2.raw_length = lambda: 3
self.assertEqual(
50, _similarity_score(blob1, blob2, block_cache=block_cache))
def test_tree_entry_sort(self):
        sha = b'abcd' * 10
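        # _tree_change_key sorts on (old path, new path), falling back to
        # whichever side is present for adds and deletes, so the add at
        # b'aaa' sorts first and the delete at b'ccc' last.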
expected_entries = [
TreeChange.add(TreeEntry(b'aaa', F, sha)),
TreeChange(CHANGE_COPY, TreeEntry(b'bbb', F, sha),
TreeEntry(b'aab', F, sha)),
TreeChange(CHANGE_MODIFY, TreeEntry(b'bbb', F, sha),
TreeEntry(b'bbb', F, b'dabc' * 10)),
TreeChange(CHANGE_RENAME, TreeEntry(b'bbc', F, sha),
TreeEntry(b'ddd', F, sha)),
TreeChange.delete(TreeEntry(b'ccc', F, sha)),
]
for perm in permutations(expected_entries):
self.assertEqual(expected_entries,
sorted(perm, key=_tree_change_key))
def detect_renames(self, tree1, tree2, want_unchanged=False, **kwargs):
detector = RenameDetector(self.store, **kwargs)
return detector.changes_with_renames(tree1.id, tree2.id,
want_unchanged=want_unchanged)
def test_no_renames(self):
blob1 = make_object(Blob, data=b'a\nb\nc\nd\n')
blob2 = make_object(Blob, data=b'a\nb\ne\nf\n')
blob3 = make_object(Blob, data=b'a\nb\ng\nh\n')
tree1 = self.commit_tree([(b'a', blob1), (b'b', blob2)])
tree2 = self.commit_tree([(b'a', blob1), (b'b', blob3)])
self.assertEqual(
[TreeChange(CHANGE_MODIFY, (b'b', F, blob2.id),
(b'b', F, blob3.id))],
self.detect_renames(tree1, tree2))
def test_exact_rename_one_to_one(self):
blob1 = make_object(Blob, data=b'1')
blob2 = make_object(Blob, data=b'2')
tree1 = self.commit_tree([(b'a', blob1), (b'b', blob2)])
tree2 = self.commit_tree([(b'c', blob1), (b'd', blob2)])
self.assertEqual(
[TreeChange(CHANGE_RENAME, (b'a', F, blob1.id),
(b'c', F, blob1.id)),
TreeChange(CHANGE_RENAME, (b'b', F, blob2.id),
(b'd', F, blob2.id))],
self.detect_renames(tree1, tree2))
def test_exact_rename_split_different_type(self):
blob = make_object(Blob, data=b'/foo')
tree1 = self.commit_tree([(b'a', blob, 0o100644)])
tree2 = self.commit_tree([(b'a', blob, 0o120000)])
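        # A change in entry type (regular file to symlink) is always split
        # into an add plus a delete, even though the sha is identical.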
self.assertEqual(
[TreeChange.add((b'a', 0o120000, blob.id)),
TreeChange.delete((b'a', 0o100644, blob.id))],
self.detect_renames(tree1, tree2))
def test_exact_rename_and_different_type(self):
blob1 = make_object(Blob, data=b'1')
blob2 = make_object(Blob, data=b'2')
tree1 = self.commit_tree([(b'a', blob1)])
tree2 = self.commit_tree([(b'a', blob2, 0o120000), (b'b', blob1)])
self.assertEqual(
[TreeChange.add((b'a', 0o120000, blob2.id)),
TreeChange(CHANGE_RENAME, (b'a', F, blob1.id),
(b'b', F, blob1.id))],
self.detect_renames(tree1, tree2))
def test_exact_rename_one_to_many(self):
blob = make_object(Blob, data=b'1')
tree1 = self.commit_tree([(b'a', blob)])
tree2 = self.commit_tree([(b'b', blob), (b'c', blob)])
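        # One source with two identical targets: the first target in path
        # order takes the rename and the remaining target becomes a copy.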
self.assertEqual(
[TreeChange(CHANGE_RENAME, (b'a', F, blob.id), (b'b', F, blob.id)),
TreeChange(CHANGE_COPY, (b'a', F, blob.id), (b'c', F, blob.id))],
self.detect_renames(tree1, tree2))
def test_exact_rename_many_to_one(self):
blob = make_object(Blob, data=b'1')
tree1 = self.commit_tree([(b'a', blob), (b'b', blob)])
tree2 = self.commit_tree([(b'c', blob)])
self.assertEqual(
[TreeChange(CHANGE_RENAME, (b'a', F, blob.id), (b'c', F, blob.id)),
TreeChange.delete((b'b', F, blob.id))],
self.detect_renames(tree1, tree2))
def test_exact_rename_many_to_many(self):
blob = make_object(Blob, data=b'1')
tree1 = self.commit_tree([(b'a', blob), (b'b', blob)])
tree2 = self.commit_tree([(b'c', blob), (b'd', blob), (b'e', blob)])
self.assertEqual(
[TreeChange(CHANGE_RENAME, (b'a', F, blob.id),
(b'c', F, blob.id)),
TreeChange(CHANGE_COPY, (b'a', F, blob.id),
(b'e', F, blob.id)),
TreeChange(CHANGE_RENAME, (b'b', F, blob.id),
(b'd', F, blob.id))],
self.detect_renames(tree1, tree2))
def test_exact_copy_modify(self):
blob1 = make_object(Blob, data=b'a\nb\nc\nd\n')
blob2 = make_object(Blob, data=b'a\nb\nc\ne\n')
tree1 = self.commit_tree([(b'a', blob1)])
tree2 = self.commit_tree([(b'a', blob2), (b'b', blob1)])
self.assertEqual(
[TreeChange(CHANGE_MODIFY, (b'a', F, blob1.id),
(b'a', F, blob2.id)),
TreeChange(CHANGE_COPY, (b'a', F, blob1.id),
(b'b', F, blob1.id))],
self.detect_renames(tree1, tree2))
def test_exact_copy_change_mode(self):
blob = make_object(Blob, data=b'a\nb\nc\nd\n')
tree1 = self.commit_tree([(b'a', blob)])
tree2 = self.commit_tree([(b'a', blob, 0o100755), (b'b', blob)])
self.assertEqual(
[TreeChange(CHANGE_MODIFY, (b'a', F, blob.id),
(b'a', 0o100755, blob.id)),
TreeChange(CHANGE_COPY, (b'a', F, blob.id), (b'b', F, blob.id))],
self.detect_renames(tree1, tree2))
def test_rename_threshold(self):
blob1 = make_object(Blob, data=b'a\nb\nc\n')
blob2 = make_object(Blob, data=b'a\nb\nd\n')
tree1 = self.commit_tree([(b'a', blob1)])
tree2 = self.commit_tree([(b'b', blob2)])
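        # blob1 and blob2 share 4 of their 6 bytes, a similarity of 66, so
        # the rename is detected with a threshold of 50 but not with 75.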
self.assertEqual(
[TreeChange(CHANGE_RENAME, (b'a', F, blob1.id),
(b'b', F, blob2.id))],
self.detect_renames(tree1, tree2, rename_threshold=50))
self.assertEqual(
[TreeChange.delete((b'a', F, blob1.id)),
TreeChange.add((b'b', F, blob2.id))],
self.detect_renames(tree1, tree2, rename_threshold=75))
def test_content_rename_max_files(self):
blob1 = make_object(Blob, data=b'a\nb\nc\nd')
blob4 = make_object(Blob, data=b'a\nb\nc\ne\n')
blob2 = make_object(Blob, data=b'e\nf\ng\nh\n')
blob3 = make_object(Blob, data=b'e\nf\ng\ni\n')
tree1 = self.commit_tree([(b'a', blob1), (b'b', blob2)])
tree2 = self.commit_tree([(b'c', blob3), (b'd', blob4)])
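        # max_files caps how many adds and deletes content rename detection
        # will compare; with max_files=1 the two deletes and two adds exceed
        # the budget and are left unmatched.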
self.assertEqual(
[TreeChange(CHANGE_RENAME, (b'a', F, blob1.id),
(b'd', F, blob4.id)),
TreeChange(CHANGE_RENAME, (b'b', F, blob2.id),
(b'c', F, blob3.id))],
self.detect_renames(tree1, tree2))
self.assertEqual(
[TreeChange.delete((b'a', F, blob1.id)),
TreeChange.delete((b'b', F, blob2.id)),
TreeChange.add((b'c', F, blob3.id)),
TreeChange.add((b'd', F, blob4.id))],
self.detect_renames(tree1, tree2, max_files=1))
def test_content_rename_one_to_one(self):
b11 = make_object(Blob, data=b'a\nb\nc\nd\n')
b12 = make_object(Blob, data=b'a\nb\nc\ne\n')
        b21 = make_object(Blob, data=b'e\nf\ng\nh\n')
        b22 = make_object(Blob, data=b'e\nf\ng\ni\n')
tree1 = self.commit_tree([(b'a', b11), (b'b', b21)])
tree2 = self.commit_tree([(b'c', b12), (b'd', b22)])
self.assertEqual(
[TreeChange(CHANGE_RENAME, (b'a', F, b11.id), (b'c', F, b12.id)),
TreeChange(CHANGE_RENAME, (b'b', F, b21.id), (b'd', F, b22.id))],
self.detect_renames(tree1, tree2))
def test_content_rename_one_to_one_ordering(self):
blob1 = make_object(Blob, data=b'a\nb\nc\nd\ne\nf\n')
blob2 = make_object(Blob, data=b'a\nb\nc\nd\ng\nh\n')
        # blob3 shares 8/12 bytes with blob1 and 10/12 with blob2.
blob3 = make_object(Blob, data=b'a\nb\nc\nd\ng\ni\n')
tree1 = self.commit_tree([(b'a', blob1), (b'b', blob2)])
tree2 = self.commit_tree([(b'c', blob3)])
self.assertEqual(
[TreeChange.delete((b'a', F, blob1.id)),
TreeChange(CHANGE_RENAME, (b'b', F, blob2.id),
(b'c', F, blob3.id))],
self.detect_renames(tree1, tree2))
tree3 = self.commit_tree([(b'a', blob2), (b'b', blob1)])
tree4 = self.commit_tree([(b'c', blob3)])
self.assertEqual(
[TreeChange(CHANGE_RENAME, (b'a', F, blob2.id),
(b'c', F, blob3.id)),
TreeChange.delete((b'b', F, blob1.id))],
self.detect_renames(tree3, tree4))
def test_content_rename_one_to_many(self):
blob1 = make_object(Blob, data=b'aa\nb\nc\nd\ne\n')
blob2 = make_object(Blob, data=b'ab\nb\nc\nd\ne\n') # 8/11 match
blob3 = make_object(Blob, data=b'aa\nb\nc\nd\nf\n') # 9/11 match
tree1 = self.commit_tree([(b'a', blob1)])
tree2 = self.commit_tree([(b'b', blob2), (b'c', blob3)])
self.assertEqual(
[TreeChange(CHANGE_COPY, (b'a', F, blob1.id), (b'b', F, blob2.id)),
TreeChange(CHANGE_RENAME, (b'a', F, blob1.id),
(b'c', F, blob3.id))],
self.detect_renames(tree1, tree2))
def test_content_rename_many_to_one(self):
blob1 = make_object(Blob, data=b'a\nb\nc\nd\n')
blob2 = make_object(Blob, data=b'a\nb\nc\ne\n')
blob3 = make_object(Blob, data=b'a\nb\nc\nf\n')
tree1 = self.commit_tree([(b'a', blob1), (b'b', blob2)])
tree2 = self.commit_tree([(b'c', blob3)])
self.assertEqual(
[TreeChange(CHANGE_RENAME, (b'a', F, blob1.id),
(b'c', F, blob3.id)),
TreeChange.delete((b'b', F, blob2.id))],
self.detect_renames(tree1, tree2))
def test_content_rename_many_to_many(self):
blob1 = make_object(Blob, data=b'a\nb\nc\nd\n')
blob2 = make_object(Blob, data=b'a\nb\nc\ne\n')
blob3 = make_object(Blob, data=b'a\nb\nc\nf\n')
blob4 = make_object(Blob, data=b'a\nb\nc\ng\n')
tree1 = self.commit_tree([(b'a', blob1), (b'b', blob2)])
tree2 = self.commit_tree([(b'c', blob3), (b'd', blob4)])
# TODO(dborowitz): Distribute renames rather than greedily choosing
# copies.
self.assertEqual(
[TreeChange(CHANGE_RENAME, (b'a', F, blob1.id),
(b'c', F, blob3.id)),
TreeChange(CHANGE_COPY, (b'a', F, blob1.id), (b'd', F, blob4.id)),
TreeChange.delete((b'b', F, blob2.id))],
self.detect_renames(tree1, tree2))
def test_content_rename_with_more_deletions(self):
blob1 = make_object(Blob, data=b'')
tree1 = self.commit_tree([(b'a', blob1), (b'b', blob1), (b'c', blob1),
(b'd', blob1)])
tree2 = self.commit_tree([(b'e', blob1), (b'f', blob1), (b'g', blob1)])
self.maxDiff = None
self.assertEqual(
[TreeChange(CHANGE_RENAME, (b'a', F, blob1.id), (b'e', F, blob1.id)),
TreeChange(CHANGE_RENAME, (b'b', F, blob1.id), (b'f', F, blob1.id)),
TreeChange(CHANGE_RENAME, (b'c', F, blob1.id), (b'g', F, blob1.id)),
TreeChange.delete((b'd', F, blob1.id))],
self.detect_renames(tree1, tree2))
def test_content_rename_gitlink(self):
blob1 = make_object(Blob, data=b'blob1')
blob2 = make_object(Blob, data=b'blob2')
link1 = b'1' * 40
link2 = b'2' * 40
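        # Gitlink (submodule) entries have no blob content to compare, so
        # they are never rename candidates, and the two blobs here share no
        # lines; everything stays a plain add or delete.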
tree1 = self.commit_tree([(b'a', blob1), (b'b', link1, 0o160000)])
tree2 = self.commit_tree([(b'c', blob2), (b'd', link2, 0o160000)])
self.assertEqual(
[TreeChange.delete((b'a', 0o100644, blob1.id)),
TreeChange.delete((b'b', 0o160000, link1)),
TreeChange.add((b'c', 0o100644, blob2.id)),
TreeChange.add((b'd', 0o160000, link2))],
self.detect_renames(tree1, tree2))
def test_exact_rename_swap(self):
blob1 = make_object(Blob, data=b'1')
blob2 = make_object(Blob, data=b'2')
tree1 = self.commit_tree([(b'a', blob1), (b'b', blob2)])
tree2 = self.commit_tree([(b'a', blob2), (b'b', blob1)])
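        # By default a changed path keeps its CHANGE_MODIFY; with
        # rewrite_threshold=50 these completely dissimilar modifies are split
        # into add/delete pairs, which exact rename detection then re-joins
        # across the swapped paths.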
self.assertEqual(
[TreeChange(CHANGE_MODIFY, (b'a', F, blob1.id),
(b'a', F, blob2.id)),
TreeChange(CHANGE_MODIFY, (b'b', F, blob2.id),
(b'b', F, blob1.id))],
self.detect_renames(tree1, tree2))
self.assertEqual(
[TreeChange(CHANGE_RENAME, (b'a', F, blob1.id),
(b'b', F, blob1.id)),
TreeChange(CHANGE_RENAME, (b'b', F, blob2.id),
(b'a', F, blob2.id))],
self.detect_renames(tree1, tree2, rewrite_threshold=50))
def test_content_rename_swap(self):
blob1 = make_object(Blob, data=b'a\nb\nc\nd\n')
blob2 = make_object(Blob, data=b'e\nf\ng\nh\n')
blob3 = make_object(Blob, data=b'a\nb\nc\ne\n')
blob4 = make_object(Blob, data=b'e\nf\ng\ni\n')
tree1 = self.commit_tree([(b'a', blob1), (b'b', blob2)])
tree2 = self.commit_tree([(b'a', blob4), (b'b', blob3)])
self.assertEqual(
[TreeChange(CHANGE_RENAME, (b'a', F, blob1.id),
(b'b', F, blob3.id)),
TreeChange(CHANGE_RENAME, (b'b', F, blob2.id),
(b'a', F, blob4.id))],
self.detect_renames(tree1, tree2, rewrite_threshold=60))
def test_rewrite_threshold(self):
blob1 = make_object(Blob, data=b'a\nb\nc\nd\n')
blob2 = make_object(Blob, data=b'a\nb\nc\ne\n')
blob3 = make_object(Blob, data=b'a\nb\nf\ng\n')
tree1 = self.commit_tree([(b'a', blob1)])
tree2 = self.commit_tree([(b'a', blob3), (b'b', blob2)])
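        # blob1 and blob3 share 4 of 8 bytes (similarity 50): at a rewrite
        # threshold of 40, or by default, the modify at b'a' survives, while
        # at 80 it is split, freeing blob1 to be matched as a rename of b'b'
        # (75 similar).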
no_renames = [
TreeChange(CHANGE_MODIFY, (b'a', F, blob1.id),
(b'a', F, blob3.id)),
TreeChange(CHANGE_COPY, (b'a', F, blob1.id), (b'b', F, blob2.id))]
self.assertEqual(
no_renames, self.detect_renames(tree1, tree2))
self.assertEqual(
no_renames, self.detect_renames(
tree1, tree2, rewrite_threshold=40))
self.assertEqual(
[TreeChange.add((b'a', F, blob3.id)),
TreeChange(CHANGE_RENAME, (b'a', F, blob1.id),
(b'b', F, blob2.id))],
self.detect_renames(tree1, tree2, rewrite_threshold=80))
def test_find_copies_harder_exact(self):
blob = make_object(Blob, data=b'blob')
tree1 = self.commit_tree([(b'a', blob)])
tree2 = self.commit_tree([(b'a', blob), (b'b', blob)])
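        # Unmodified files are only considered as copy sources when
        # find_copies_harder is set, so b'b' is a plain add by default.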
self.assertEqual([TreeChange.add((b'b', F, blob.id))],
self.detect_renames(tree1, tree2))
self.assertEqual(
[TreeChange(CHANGE_COPY, (b'a', F, blob.id), (b'b', F, blob.id))],
self.detect_renames(tree1, tree2, find_copies_harder=True))
def test_find_copies_harder_content(self):
blob1 = make_object(Blob, data=b'a\nb\nc\nd\n')
blob2 = make_object(Blob, data=b'a\nb\nc\ne\n')
tree1 = self.commit_tree([(b'a', blob1)])
tree2 = self.commit_tree([(b'a', blob1), (b'b', blob2)])
self.assertEqual([TreeChange.add((b'b', F, blob2.id))],
self.detect_renames(tree1, tree2))
self.assertEqual(
[TreeChange(CHANGE_COPY, (b'a', F, blob1.id),
(b'b', F, blob2.id))],
self.detect_renames(tree1, tree2, find_copies_harder=True))
def test_find_copies_harder_with_rewrites(self):
blob_a1 = make_object(Blob, data=b'a\nb\nc\nd\n')
blob_a2 = make_object(Blob, data=b'f\ng\nh\ni\n')
blob_b2 = make_object(Blob, data=b'a\nb\nc\ne\n')
tree1 = self.commit_tree([(b'a', blob_a1)])
tree2 = self.commit_tree([(b'a', blob_a2), (b'b', blob_b2)])
self.assertEqual(
[TreeChange(CHANGE_MODIFY, (b'a', F, blob_a1.id),
(b'a', F, blob_a2.id)),
TreeChange(CHANGE_COPY, (b'a', F, blob_a1.id),
(b'b', F, blob_b2.id))],
self.detect_renames(tree1, tree2, find_copies_harder=True))
self.assertEqual(
[TreeChange.add((b'a', F, blob_a2.id)),
TreeChange(CHANGE_RENAME, (b'a', F, blob_a1.id),
(b'b', F, blob_b2.id))],
self.detect_renames(tree1, tree2, rewrite_threshold=50,
find_copies_harder=True))
def test_reuse_detector(self):
blob = make_object(Blob, data=b'blob')
tree1 = self.commit_tree([(b'a', blob)])
tree2 = self.commit_tree([(b'b', blob)])
detector = RenameDetector(self.store)
changes = [TreeChange(CHANGE_RENAME, (b'a', F, blob.id),
(b'b', F, blob.id))]
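        # The same detector instance must give identical results when run
        # twice over the same trees, i.e. its state resets between calls.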
self.assertEqual(changes,
detector.changes_with_renames(tree1.id, tree2.id))
self.assertEqual(changes,
detector.changes_with_renames(tree1.id, tree2.id))
def test_want_unchanged(self):
blob_a1 = make_object(Blob, data=b'a\nb\nc\nd\n')
blob_b = make_object(Blob, data=b'b')
blob_c2 = make_object(Blob, data=b'a\nb\nc\ne\n')
tree1 = self.commit_tree([(b'a', blob_a1), (b'b', blob_b)])
tree2 = self.commit_tree([(b'c', blob_c2), (b'b', blob_b)])
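        # want_unchanged=True additionally reports a CHANGE_UNCHANGED entry
        # for b'b', which is identical on both sides.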
self.assertEqual(
[TreeChange(CHANGE_RENAME, (b'a', F, blob_a1.id),
(b'c', F, blob_c2.id))],
self.detect_renames(tree1, tree2))
self.assertEqual(
[TreeChange(CHANGE_RENAME, (b'a', F, blob_a1.id),
(b'c', F, blob_c2.id)),
TreeChange(CHANGE_UNCHANGED, (b'b', F, blob_b.id),
(b'b', F, blob_b.id))],
self.detect_renames(tree1, tree2, want_unchanged=True))