
A public git repository is even more interesting for both ArchiveTeam Codearchiver and Software Heritage. The latter offers an interface for saving code automatically.

https://wiki.archiveteam.org/index.php/Codearchiver https://wiki.archiveteam.org/index.php/Software_Heritage https://archive.softwareheritage.org/save/



After the initial save, do they perform automatic git pulls? What happens if there are conflicts? I wonder how it all works under the hood. I know I have run into issues with "git pull --all" before, for example. And what if it is public software that is not mine? I saved some git repositories (should I also save the .tar.gz for the same project? Does it know anything about versions?).


On Software Heritage: for forges (GitLab, cgit, etc.), SWH lists all repos every couple of months and pulls the new or updated ones. I think if you save an individual repo, it gets pulled again later too, but I'm not sure of the schedule. They have custom open-source tooling for importing repos, tarballs and other things. They deduplicate on the backend, so if some of the saved repos are clones of each other, the files and commits they share are stored only once. They import the git tags (and other refs) too.
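To make the deduplication idea concrete, here is a minimal sketch of content-addressed storage. It is not SWH's actual code: `ContentStore` and `archive_repo` are hypothetical names, and the only real-world detail borrowed is how git itself derives a blob ID (SHA-1 over a `blob <size>\0` header plus the file content).

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the ID git gives a file with this content (SHA-1 object format)."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ContentStore:
    """Hypothetical toy store: objects are keyed by their content hash."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def add(self, content: bytes) -> str:
        oid = git_blob_id(content)
        # Identical content maps to the same key, so it is kept only once.
        self.objects.setdefault(oid, content)
        return oid


def archive_repo(store: ContentStore, files: dict[str, bytes]) -> dict[str, str]:
    """Store every file and return a path -> object ID manifest."""
    return {path: store.add(data) for path, data in files.items()}


store = ContentStore()
upstream = {"README": b"hello\n", "main.c": b"int main(void) { return 0; }\n"}
fork = {"README": b"hello\n", "main.c": b"int main(void) { return 1; }\n"}

upstream_manifest = archive_repo(store, upstream)
fork_manifest = archive_repo(store, fork)

# Four files were archived, but the shared README is stored only once.
print(len(store.objects))
```

Each repo keeps its own manifest, so both can be reconstructed in full even though the shared README occupies a single slot in the store.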

ArchiveTeam Codearchiver is quite a bit different: it does one-shot archiving of repos into VCS-native export formats, like git bundles. I think there is some deduplication based on commit hashes.
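For anyone who has not used git bundles, here is a rough sketch of a one-shot export using the standard `git bundle create` command. This is not Codearchiver's own code: `bundle_command` and `create_bundle` are hypothetical helpers, and the paths in the usage comment are placeholders.

```python
import subprocess
from pathlib import Path


def bundle_command(bundle_path) -> list[str]:
    """Build the git command that writes every ref (branches, tags) to one file."""
    return ["git", "bundle", "create", str(bundle_path), "--all"]


def create_bundle(repo_dir, bundle_path) -> None:
    """Run `git bundle create` inside an existing local clone (requires git)."""
    # Resolve first so the output path does not depend on the cwd change below.
    target = Path(bundle_path).resolve()
    subprocess.run(bundle_command(target), cwd=repo_dir, check=True)


# Example usage (placeholder paths, assumes a local clone already exists):
#   create_bundle("path/to/clone", "project.bundle")
# The resulting file can later be cloned from like a remote:
#   git clone project.bundle restored-project
```

A bundle is a single file holding the repository's objects and refs, which is why it works well as an archival format for one-off snapshots.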



