INFO:swh.loader.git.loader.GitLoader:Load origin 'https://github.com/kubernetes/kubernetes' with type 'git'
Enumerating objects: 1187812, done.
Counting objects: 100% (666/666), done.
Compressing objects: 100% (354/354), done.
Total 1187812 (delta 376), reused 358 (delta 292), pack-reused 1187146

Filename: /home/swhworker/profiling/lib/python3.7/site-packages/swh/loader/git/loader.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
    67     93.3 MiB     93.3 MiB           1   @profile
    68                                         def determine_wants(self, refs: Dict[bytes, HexBytes]) -> List[HexBytes]:
    69                                             """Get the list of bytehex sha1s that the git loader should fetch.
    70
    71                                             This compares the remote refs sent by the server with the base snapshot
    72                                             provided by the loader.
    73
    74                                             """
    75     93.3 MiB      0.0 MiB           1       if not refs:
    76                                                 return []
    77
    78                                             # Cache existing heads
    79     93.3 MiB      0.0 MiB           1       local_heads: Set[HexBytes] = set()
    80     99.3 MiB      0.0 MiB       87800       for branch_name, branch in self.base_snapshot.branches.items():
    81     99.3 MiB      0.0 MiB       87799           if not branch or branch.target_type == TargetType.ALIAS:
    82     93.3 MiB      0.0 MiB           1               continue
    83     99.3 MiB      6.0 MiB       87798           local_heads.add(hashutil.hash_to_hex(branch.target).encode())
    84
    85     99.3 MiB      0.0 MiB           1       self.heads = local_heads
    86
    87                                             # Get the remote heads that we want to fetch
    88     99.3 MiB      0.0 MiB           1       remote_heads: Set[HexBytes] = set()
    89    101.4 MiB      0.0 MiB       97855       for ref_name, ref_target in refs.items():
    90    101.4 MiB      0.0 MiB       97854           if utils.ignore_branch_name(ref_name):
    91    101.4 MiB      0.0 MiB        9900               continue
    92    101.4 MiB      2.1 MiB       87954           remote_heads.add(ref_target)
    93
    94    101.4 MiB      0.0 MiB           1       logger.debug("local_heads_count=%s", len(local_heads))
    95    101.4 MiB      0.0 MiB           1       logger.debug("remote_heads_count=%s", len(remote_heads))
    96    101.4 MiB      0.0 MiB           1       wanted_refs = list(remote_heads - local_heads)
    97    101.4 MiB      0.0 MiB           1       logger.debug("wanted_refs_count=%s", len(wanted_refs))
    98    101.4 MiB      0.0 MiB           1       return wanted_refs
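The table above shows determine_wants building two in-memory sets of roughly 88,000 hex-encoded sha1s (the 6.0 MiB and 2.1 MiB increments on lines 83 and 92) and returning their difference. Below is a minimal, self-contained sketch of that set-difference step only, using plain bytes values and hypothetical names rather than the loader's real Snapshot and ref types.

# sketch_wanted_refs.py -- illustrative only, not swh code; names here are hypothetical
from typing import Dict, List, Set

HexBytes = bytes  # a 40-character hex sha1, e.g. b"3f786850e387550fdab836ed7e6dc881de23001b"

def wanted_refs(remote_refs: Dict[bytes, HexBytes], local_heads: Set[HexBytes]) -> List[HexBytes]:
    """Return the remote heads that are not already present among the local heads."""
    remote_heads: Set[HexBytes] = set(remote_refs.values())  # ~88k entries for kubernetes/kubernetes
    return list(remote_heads - local_heads)                  # same set difference as line 96 above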
Filename: /home/swhworker/profiling/lib/python3.7/site-packages/swh/loader/git/loader.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
   150     70.5 MiB     70.5 MiB           1   @profile
   151                                         def fetch_pack_from_origin(
   152                                             self,
   153                                             origin_url: str,
   154                                             base_repo: RepoRepresentation,
   155                                             do_activity: Callable[[bytes], None],
   156                                         ) -> FetchPackReturn:
   157                                             """Fetch a pack from the origin"""
   158
   159     70.5 MiB      0.0 MiB           1       pack_buffer = SpooledTemporaryFile(max_size=self.temp_file_cutoff)
   160
   161                                             # Hardcode the use of the tcp transport (for GitHub origins)
   162
   163                                             # Even if the Dulwich API lets us process the packfile in chunks as it's
   164                                             # received, the HTTP transport implementation needs to entirely allocate
   165                                             # the packfile in memory *twice*, once in the HTTP library, and once in
   166                                             # a BytesIO managed by Dulwich, before passing chunks to the `do_pack`
   167                                             # method. Overall this triples the memory usage before we can even try to
   168                                             # interrupt the loader before it overruns its memory limit.
   169
   170                                             # In contrast, the Dulwich TCP transport just gives us the read handle
   171                                             # on the underlying socket, doing no processing or copying of the bytes.
   172                                             # We can interrupt it as soon as we've received too many bytes.
   173
   174     70.5 MiB      0.0 MiB           1       transport_url = origin_url
   175     70.5 MiB      0.0 MiB           1       if transport_url.startswith("https://github.com/"):
   176     70.5 MiB      0.0 MiB           1           transport_url = "git" + transport_url[5:]
   177
   178     70.5 MiB      0.0 MiB           1       logger.debug("Transport url to communicate with server: %s", transport_url)
   179
   180     70.5 MiB      0.0 MiB           1       client, path = dulwich.client.get_transport_and_path(
   181     70.6 MiB      0.0 MiB           1           transport_url, thin_packs=False
   182                                             )
   183
   184     70.6 MiB      0.0 MiB           1       logger.debug("Client %s to fetch pack at %s", client, path)
   185
   186     70.6 MiB      0.0 MiB           1       size_limit = self.pack_size_bytes
   187
   188    204.8 MiB -8280777.4 MiB       95468       def do_pack(data: bytes) -> None:
   189    204.8 MiB -8280808.2 MiB       95467           cur_size = pack_buffer.tell()
   190    204.8 MiB -8280808.2 MiB       95467           would_write = len(data)
   191    204.8 MiB -8280808.2 MiB       95467           if cur_size + would_write > size_limit:
   192                                                       raise IOError(
   193                                                           f"Pack file too big for repository {origin_url}, "
   194                                                           f"limit is {size_limit} bytes, current size is {cur_size}, "
   195                                                           f"would write {would_write}"
   196                                                       )
   197
   198    204.8 MiB -8280805.0 MiB       95467           pack_buffer.write(data)
   199
   200     70.6 MiB      0.0 MiB           1       pack_result = client.fetch_pack(
   201     70.6 MiB      0.0 MiB           1           path,
   202     70.6 MiB      0.0 MiB           1           base_repo.determine_wants,
   203     70.6 MiB      0.0 MiB           1           base_repo.graph_walker(),
   204     70.6 MiB      0.0 MiB           1           do_pack,
   205    104.7 MiB   -100.2 MiB           1           progress=do_activity,
   206                                             )
   207
   208    104.7 MiB      0.0 MiB           1       remote_refs = pack_result.refs or {}
   209    104.7 MiB      0.0 MiB           1       symbolic_refs = pack_result.symrefs or {}
   210
   211    104.7 MiB      0.0 MiB           1       pack_buffer.flush()
   212    104.7 MiB      0.0 MiB           1       pack_size = pack_buffer.tell()
   213    104.7 MiB      0.0 MiB           1       pack_buffer.seek(0)
   214
   215    104.7 MiB      0.0 MiB           1       logger.debug("fetched_pack_size=%s", pack_size)
   216
   217                                             # check if repository only supports git dumb transfer protocol,
   218                                             # fetched pack file will be empty in that case as dulwich do
   219                                             # not support it and do not fetch any refs
   220    104.7 MiB      0.0 MiB           1       self.dumb = transport_url.startswith("http") and client.dumb
   221
   222    104.7 MiB      0.0 MiB           1       return FetchPackReturn(
   223    107.1 MiB      2.4 MiB           1           remote_refs=utils.filter_refs(remote_refs),
   224    107.1 MiB      0.0 MiB           1           symbolic_refs=utils.filter_refs(symbolic_refs),
   225    107.1 MiB      0.0 MiB           1           pack_buffer=pack_buffer,
   226    107.1 MiB      0.0 MiB           1           pack_size=pack_size,
   227                                             )

INFO:swh.loader.git.loader:Listed 87954 refs for repo https://github.com/kubernetes/kubernetes
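fetch_pack_from_origin streams the pack into a SpooledTemporaryFile, so at most temp_file_cutoff bytes stay in RAM before the buffer spills to disk, and do_pack aborts the fetch as soon as writing the next chunk would exceed pack_size_bytes. Below is a minimal, self-contained sketch of that capped-buffer pattern; the limits are made up and no Dulwich client is involved.

# sketch_capped_buffer.py -- illustrative only, not swh code; both limits below are made up
from tempfile import SpooledTemporaryFile

TEMP_FILE_CUTOFF = 100 * 1024 * 1024      # keep at most 100 MiB in RAM, then spill to disk
PACK_SIZE_LIMIT = 4 * 1024 * 1024 * 1024  # refuse to buffer more than 4 GiB in total

pack_buffer = SpooledTemporaryFile(max_size=TEMP_FILE_CUTOFF)

def do_pack(data: bytes) -> None:
    # same shape as the do_pack closure profiled above: check the limit, then append
    cur_size = pack_buffer.tell()
    if cur_size + len(data) > PACK_SIZE_LIMIT:
        raise IOError(f"pack too big: {cur_size} + {len(data)} > {PACK_SIZE_LIMIT}")
    pack_buffer.write(data)

# a chunk callback of this shape is what the profiled code passes as the fourth
# argument to client.fetch_pack (line 204 above)
do_pack(b"\x00" * 1024)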
Filename: /home/swhworker/profiling/lib/python3.7/site-packages/swh/loader/git/loader.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
   254     70.5 MiB     70.5 MiB           1   @profile
   255                                         def fetch_data(self) -> bool:
   256     70.5 MiB      0.0 MiB           1       assert self.origin is not None
   257
   258     70.5 MiB      0.0 MiB           1       base_repo = self.repo_representation(
   259     70.5 MiB      0.0 MiB           1           storage=self.storage,
   260     70.5 MiB      0.0 MiB           1           base_snapshot=self.base_snapshot,
   261     70.5 MiB      0.0 MiB           1           ignore_history=self.ignore_history,
   262                                             )
   263
   264    104.7 MiB     34.1 MiB         192       def do_progress(msg: bytes) -> None:
   265    104.7 MiB      0.0 MiB         191           sys.stderr.buffer.write(msg)
   266    104.7 MiB      0.0 MiB         191           sys.stderr.flush()
   267
   268     70.5 MiB      0.0 MiB           1       try:
   269     70.5 MiB      0.0 MiB           1           fetch_info = self.fetch_pack_from_origin(
   270    107.1 MiB      2.4 MiB           1               self.origin.url, base_repo, do_progress
   271                                                 )
   272                                             except NotGitRepository as e:
   273                                                 raise NotFound(e)
   274                                             except GitProtocolError as e:
   275                                                 # unfortunately, that kind of error is not specific to a not found
   276                                                 # scenario... It depends on the value of message within the exception.
   277                                                 for msg in [
   278                                                     "Repository unavailable",  # e.g DMCA takedown
   279                                                     "Repository not found",
   280                                                     "unexpected http resp 401",
   281                                                 ]:
   282                                                     if msg in e.args[0]:
   283                                                         raise NotFound(e)
   284                                                 # otherwise transmit the error
   285                                                 raise
   286                                             except (AttributeError, NotImplementedError, ValueError):
   287                                                 # with old dulwich versions, those exceptions types can be raised
   288                                                 # by the fetch_pack operation when encountering a repository with
   289                                                 # dumb transfer protocol so we check if the repository supports it
   290                                                 # here to continue the loading if it is the case
   291                                                 self.dumb = dumb.check_protocol(self.origin_url)
   292                                                 if not self.dumb:
   293                                                     raise
   294
   295    107.1 MiB      0.0 MiB           1       logger.debug(
   296    107.1 MiB      0.0 MiB           1           "Protocol used for communication: %s", "dumb" if self.dumb else "smart"
   297                                             )
   298    107.1 MiB      0.0 MiB           1       if self.dumb:
   299                                                 self.dumb_fetcher = dumb.GitObjectsFetcher(self.origin_url, base_repo)
   300                                                 self.dumb_fetcher.fetch_object_ids()
   301                                                 self.remote_refs = utils.filter_refs(self.dumb_fetcher.refs)  # type: ignore
   302                                                 self.symbolic_refs = self.dumb_fetcher.head
   303                                             else:
   304    107.1 MiB      0.0 MiB           1           self.pack_buffer = fetch_info.pack_buffer
   305    107.1 MiB      0.0 MiB           1           self.pack_size = fetch_info.pack_size
   306    107.1 MiB      0.0 MiB           1           self.remote_refs = fetch_info.remote_refs
   307    107.1 MiB      0.0 MiB           1           self.symbolic_refs = fetch_info.symbolic_refs
   308
   309    107.1 MiB      0.0 MiB       87957       self.ref_object_types = {sha1: None for sha1 in self.remote_refs.values()}
   310
   311    107.1 MiB      0.0 MiB           1       logger.info(
   312    107.1 MiB      0.0 MiB           1           "Listed %d refs for repo %s",
   313    107.1 MiB      0.0 MiB           1           len(self.remote_refs),
   314    107.1 MiB      0.0 MiB           1           self.origin.url,
   315                                                 extra={
   316    107.1 MiB      0.0 MiB           1               "swh_type": "git_repo_list_refs",
   317    107.1 MiB      0.0 MiB           1               "swh_repo": self.origin.url,
   318    107.1 MiB      0.0 MiB           1               "swh_num_refs": len(self.remote_refs),
   319                                                 },
   320                                             )
   321
   322                                             # No more data to fetch
   323    107.1 MiB      0.0 MiB           1       return False

{'status': 'eventful'}
mprof: Sampling memory every 0.1s
running new process
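The line-by-line tables above are memory_profiler reports produced by the @profile decorator visible on each function, and the trailing mprof lines come from wrapping the run with mprof. The exact driver script is not shown in this log; the following is a minimal, self-contained sketch that produces the same kind of report with a toy workload instead of the real loader.

# sketch_profile.py -- illustrative only; toy workload, not the swh git loader
from memory_profiler import profile

@profile
def build_refs(n: int) -> dict:
    # allocate n fake refs so the per-line memory columns actually move
    return {b"refs/heads/branch-%d" % i: b"%040x" % i for i in range(n)}

if __name__ == "__main__":
    build_refs(90_000)

Running it with "python -m memory_profiler sketch_profile.py" prints a Line # / Mem usage / Increment table like the ones above, while "mprof run sketch_profile.py" samples the whole process (hence the "Sampling memory every 0.1s" message) for later plotting with "mprof plot".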