git init --bare
will give you a git repo without a working tree (just the contents typically found in the .git directory). This allows you to create things like `foo.git` instead of `foo/.git`.

"origin" is also just the default name for the cloned remote. It could be called anything, and you can have as many remotes as you'd like. You can even namespace what you push back to the same remotes by changing fetch and push paths. At one company it was common to push back to `$user/$feature` to avoid polluting the root namespace with personal branches. It was also common to have `backup/$user` for pushing a backup of an entire local repo.
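A minimal sketch of both ideas, assuming a host called server and a user called alice (both hypothetical):

# create a bare repo (no working tree) on the server
ssh server 'git init --bare ~/projects/foo.git'

# add it as a remote and push a personal, namespaced branch
git remote add server server:projects/foo.git
git push server my-feature:refs/heads/alice/my-feature

# push a backup of every local branch under backup/alice/*
git push server 'refs/heads/*:refs/heads/backup/alice/*'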
I often add a hostname namespace when I’m working from multiple hosts and then push between them directly to another instead of going back to a central server.
For a small static site repo that has documents and server config, I have a remote like:
[remote "my-server"]
    url = ssh+git://…/deploy/path.git
    fetch = +refs/heads/*:refs/remotes/my-server/*
    push = +refs/heads/*:refs/remotes/my-laptop/*
So I can push from my computer directly to that server, but those branches won't overwrite the server's branches. It acts like a reverse `git pull`, which can be useful for firewalls and other situations where my laptop wouldn't be routable.
git clone --mirror <remote>

is another good one to know; it also makes a bare repository that is an exact clone (including all branches, tags, notes, etc.) of a remote repo, unlike a normal clone, which is set up with local tracking branches of the remote.

It doesn't include pull requests when cloning from GitHub, though.
Because GitHub pull requests are a proprietary, centralized, cloud-dependent reimplementation of `git request-pull`.
How the "free software" world slid head first into a proprietary cloud-based "open source" world still boils my blood. Congrats, Microsoft loves and owns it all, isn't that what what we always wanted?
git was designed to facilitate the collaboration scheme of the Linux Kernel Mailing List, which is, as you might guess... a mailing list.
Rather than a pull-request (which tries to repurpose git's branching infrastructure to support collaboration), the intended unit of in-the-large contribution / collaboration in git is supposed to be the patch.
The patch contribution workflow is entirely CLI-based... if you use a CLI mail client (like Linus Torvalds did at the time git was designed.)
The core "technology" of this is, on the contributor side:
1. "trailer" fields on commits (for things like `Fixes`, `Link`, `Reported-By`, etc)
2. `git format-patch`, with flags like `--cover-letter` (this is where the thing you'd think of as the "PR description" goes), `--reroll-count`, etc.
3. a codebase-specific script like Linux's `./scripts/get_maintainer.pl`, to parse out (from source-file-embedded headers) the set of people to notify explicitly about the patch — this is analogous to a PR's concept of "Assignees" + "Reviewers"
4. `git send-email`, feeding in the patch-series generated in step 2, and targeting the recipients list from step 3. (This sends out a separate email for each patch in the series, but in such a way that the messages get threaded to appear as a single conversation thread in modern email clients.)
And on the maintainer side:
5. `s ~/patches/patch-foo.mbox` (i.e. a command in a CLI email client like mutt(1), in the context of the patch-series thread, to save the thread to an .mbox file)
6. `git am -3 --scissors ~/patches/patch-foo.mbox` to split the patch-series mbox file back into individual patches, convert them back into an annotated commit-series, and build that into a topic branch for testing and merging.
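Put together, a rough sketch of both sides (the paths, addresses, and reroll count below are just illustrative assumptions):

# contributor side: turn the last 3 commits into a v2 patch series with a cover letter
git format-patch -3 --cover-letter --reroll-count=2 -o outgoing/

# mail the series (recipients here are placeholders; a real project would
# get them from something like ./scripts/get_maintainer.pl)
git send-email --to=maintainer@example.org --cc=list@example.org outgoing/*.patch

# maintainer side: apply a saved mbox of the series as individual commits
# onto a topic branch, with a 3-way fallback
git checkout -b topic/foo
git am -3 --scissors ~/patches/patch-foo.mbox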
Subsystem maintainers, meanwhile, didn't use patches to get topic branches "upstream" [= in Linus's git repo]. Linus just had the subsystem maintainers as git-remotes, and then, when nudged, fetched their integration branches, reviewed them, and merged them, with any communication about this occurring informally out-of-band. In other words, the patch flow was for low-trust collaboration, while direct fetch was for high-trust collaboration.
Interestingly, in the LKML context, `git request-pull` is simply a formalization of the high-trust collaboration workflow (specifically, the out-of-band "hey, fetch my branches and review them" nudge email). It's not used for contribution, only integration; and it doesn't really do anything you can't do with an email — its only real advantages are in keeping the history of those requests within the repo itself, and for forcing requests to be specified in terms of exact git refs to prevent any confusion.
* Pile of commits - each individual commit doesn't matter as much as they all work combined. As a general rule, the only requirement for a valid patch is that the final version does what you say it does. Either the final result is squashed together entirely and then merged onto "master" (or whatever branch you've set up to be the "stable" one) or it's all piled together. Keeping the commit history one linear sequence of events is the single most important element here - if you submit a patch, you will not be updating the git hashes, because that could force people to re-clone your version of the code, and that makes things complicated. This is pretty easy to wrap your head around for a small project, but for larger projects it quickly fills a lot of the organizational tools git gives you with junk commits that you have to filter through. Most git forges encourage this PR system because, again, it's newbie-friendly.
* Patch series. Here, a patch isn't so much a series of commits you keep adding onto, but is instead a much smaller set of commits that you curate into its "most perfect form" - each individual commit has its own purpose and they don't/shouldn't bleed into each other. It's totally okay to change the contents of a patch series, because until it's merged, the history of the patch series is irrelevant as far as git is concerned. This is basically how the LKML (and other mailing list based) software development works, but it can be difficult to wrap your head around (+years of advice that "changing history" is the biggest sin you can commit with git, so don't you dare!). It tends to work best with larger projects, while being completely overkill for a smaller tool. Most forges usually offer poor support for patch series based development, unless the forge is completely aimed at doing it that way.
Under the original paradigm, the email list itself — and a (pretty much expected/required) public archive of such, e.g. https://lore.kernel.org for LKML — serves the same history-preserving function for the patch series themselves (and all the other emails that go back and forth discussing them!) that the upstream git repo does for the final patches-turned-commits. The commits that make it into the repo reference URLs of threads on the public mailing-list archive, and vice-versa.
Fun fact: in the modern era where ~nobody uses CLI email clients any more, a tool called b4 (https://b4.docs.kernel.org/) is used to facilitate the parts of the git workflow that interact with the mailing list. The subcommand that pulls patches out of the list (`b4 mbox`) actually relies on the public web archive of the mailing list, rather than relying on you to have an email account with a subscription to the mailing list yourself (let alone a locally-synced mail database for such an account.)
The second one makes sense, but I can't imagine actually working that way on any of the projects I've been in. The amount of work it would take just doesn't make sense. Can totally understand why it would be useful on something like the Linux Kernel though.
As per [0] merge commits are dropped:
> Note that format-patch will omit merge commits from the output, even if they are part of the requested range. A simple "patch" does not include enough information for the receiving end to reproduce the same merge commit.
I originally thought it would use --first-parent (so just diff vs the first parent, which is what I would want) but apparently no! It is possible to get this behaviour using git log as detailed in this great write-up [1].
[0] https://git-scm.com/docs/git-format-patch#_caveats
[1] https://stackoverflow.com/questions/2285699/git-how-to-creat...
I'm thinking in terms of what I often see from people I work with, where a PR is normally made up of lots of small commits.
I think the point I always get stuck on is how small is "small" when we're talking about commits/patches. Like if you're adding a new feature (to anything, not necessarily the Linux Kernel), should the entire feature be a single commit or several smaller commits? I go back and forth on this all the time, and if you research you're gonna see a ton of different opinions. I've seen some people argue a commit should basically only be a couple lines of code changed, and others argue it should be the entire feature.
You commonly hear Linus talk about commits/patches having very detailed descriptions attached to them. I have trouble believing people would have time for that if each commit was only a few lines, and larger features were spread out over hundreds of commits.
Often, a change to a new working state is necessarily bigger than a couple of lines, or one of the lines has to get removed later.
I don't want to have to say, "Hmm, I wonder if this will work at the end of the file?" and spend a long time figuring out that it won't, then see that the problem is fixed later in the patch series.
Other people may have other preferences.
My feeling is that devs in general are not into the "learning how to use tools" idea.
They don't want to learn the git basics, they don't want to learn the cmake basics, ...
I mean that as an observation more than a criticism. But to me, the fact that git was designed for those who want to learn powerful tools is a feature. Those who don't can use Microsoft. It all works in the end.
Fun fact: if I want to open source my code but not get contributions (or rather feature requests by people who probably won't ever contribute), I put my git repo on anything that is not GitHub. It feels like most professional devs don't know how to handle anything that is not on GitHub :-). Bonus point for SourceHut: if someone manages to send a proper patch on the mailing list, it usually means that they know what they are doing.
Well, the devs learnt how to use Github, didn't they? Seems like people CAN learn things that are useful. I can also make the argument that Github pull requests are actually more powerful than git request-pull in addition to having a nicer UI/UX.
Being upset that people aren't using git request-pull is like the creator of Brainfuck being upset that scientists aren't using Brainfuck instead of something more powerful and has a better UI/UX like Python. It's kinda obvious which one is better to use...
I didn't say they could not.
Given the number of vim, emacs, nix, git, i3, etc. users who are proud of it and all the customisations they do, I don't think so. Like, there will be a decent group, but not generalisable to "devs".
For me the largest advantage of Git was being able to easily switch branches. Previously I'd have to have multiple copies of an entire source repo on my machine if I wanted to work on multiple things at the same time. Likewise a patch set going through CR meant an entire folder on my machine was frozen until I got feedback.
Not having to email complex patches was another huge plus. I was at Microsoft at the time and they had home made scripts (probably Perl or VBS, but I forget what) that applied patches to a repo.
It sucked.
Git branch alone was worth the cost of changing over.
The other big point was local branches. Before DVCS, the concept of a "local branch" was generally not a thing. But now you could suddenly create a branch for each separate issue and easily switch between them while isolating unrelated changes.
It's not the interface, it's the web hosting. People want a free destination server that's up 24/7 to store their repository.
If it was only the web interface, people could locally install GitLab or Gitea to get a web browser UI. (Or use whatever modern IDE code editor to have a GUI instead of a CLI for git commands.) But doing that still doesn't solve what GitHub solves: a public server to host the files, issue tracking, etc.
Before git & Github, people put source code for public access on SourceForge and CodeProject. The reason was the same: a zero-cost way to share code with everybody.
A 24/7 repository and a 24/7 web URL for the code. Those two features together let devs inspect and download code, and open and discuss issues.
The URL also let automated tools download and install packages.
Familiar UI, network effects made the rest.
I'm really looking forward to federated forges.
One remote can also hold more URLs! This is arguably more obscure (Eclipse's EGit doesn't even support it), but works wonders for my workflow, since I want to push to multiple mirrors at the same time.
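For reference, a sketch of how that can be set up (the remote and URLs are made up):

# fetch from the primary, but push to several mirrors at once
git remote add origin git@primary.example.org:me/project.git
git remote set-url --add --push origin git@primary.example.org:me/project.git
git remote set-url --add --push origin git@codeberg.org:me/project.git
git remote set-url --add --push origin git@github.com:me/project.git

git push origin main   # goes to all three push URLs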
Multiple remotes is also how you can combine multiple repos into one monorepo by just fetching and pulling from each one, maybe into different subdirectories to avoid path collisions.
Can’t test it now, but I wonder whether changing this affects the remote name for fresh clones: https://git-scm.com/docs/git-config#Documentation/git-config...
Nevertheless, to avoid ambiguity I usually name my personal forks on GitHub gh-<username>.
And of course in Git every clone is a fork.
AGit seems to be a new alternative where apparently you can push a new branch to someone else's repository that you don't normally have access to, but that's never guaranteed to be possible, and is certainly very idiosyncratic.
That's backwards. In GitHub every fork is just a git clone. Before GitHub commandeered the term, "fork" was already in common use and it had a completely different meaning.
All those workflows are just as valid as the others, I was just pointing out that the way github does it is not the only way it can be done.
Ah yes, I'm sure the remote being called "origin" is what confuses people when they have to push to a refspec with push options. That's so much more straightforward than a button "create pull request".
It's like arguing that instead of having salad or fries on the menu with your entree they should only serve fries.
There was a thread not too long ago where people were conflating git with GitHub. Git is an incredible tool (after coming from SVN/CVS/p4/source safe) that stands on its own apart from hosting providers.
There's other stuff too, like git submodules can't be configured to reference another branch on the local repository and then be cloned correctly, only another remote.
When you clone you get the full remote history and all remote branches (by default). That’s painfully true when you have a repo with large binary blobs (and the reason git-lfs and others exist).
But a clone of your clone is not going to work the same way, since remote branches are not cloned by default, either. So it'll only have partial history. This is what I was thinking about.
git fetch origin refs/notes/*:refs/notes/*
is the command you have to run to actually clone remote refs if you're making a working-copy clone?

You may be thinking of the optional --depth switch, which allows you to create shallow clones that don't have the full history. If you don't include that, you'll get the full history when cloning.
But I get your larger point.
Sadly doing a monorepo this way with pnpm doesn't work, since pnpm doesn't enforce package version requirements inside of a pnpm workspace. And it doesn't record installed version information for linked packages either.
I don't have a central dotfiles repo anymore (that I would always forget to push to); I have SSH access to my devices - via tailscale - anyway so I'm doing
git remote add $hostname $hostname:.config
and can cd ~/.config && git fetch/pull/rebase $hostname anytime from anywhere.

I've been considering a bare repo + setting $GIT_DIR (e.g. via direnv), but somehow the dead-simple approach has trumped the lack of push ability.
I put my whole home folder in git and that has its benefits (being able to see changes to files as they happen) but if I'm just copying a file or two of config I'll just cat or scp it--introducing git seems needlessly complex if the branches are divergent
I don't have to remember which to copy
> rsync or scp
I don't have to remember which is most recent, nor even assume that "most recent" is a thing (i.e. it may be nonlinear)
It's all just:
- a git fetch --all away to get everything
- a git log --oneline --decorate --graph --all to find out who's where and when
- diff and whatchanged for contents if needed
- a cherry-pick / rebase away to get what I want, complete with automatic conflict resolution
I can also have local experiments in local topic branches, things I want to try out but not necessarily commit to across all of my machines yet.
Tangentially related: when you have multiple local checkouts, often `git worktree` is more convenient than having completely independent local repository. See https://git-scm.com/docs/git-worktree
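For example (branch and path names are arbitrary):

# add a second working tree for an existing branch next to the main checkout
git worktree add ../project-hotfix hotfix-1.2

# or create a new branch in its own working tree
git worktree add -b experiment ../project-experiment

# list and clean up
git worktree list
git worktree remove ../project-experiment

Each worktree shares the same object database, so it's cheaper than keeping a second full clone.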
Then, have separate prod and staging clones parallel to that.
Have a post-receive hook set on the bare repo that automatically pushes updates to the staging repo for testing.
When ready, then pull the updates into prod.
Might sound strange, but for certain clients hosting situations, I've found it allows for faster iterations. ymmv
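One way to wire that up is a hook in the bare repo that updates the staging clone on every push; a minimal sketch (paths are assumptions, not the exact setup described above):

#!/bin/sh
# hooks/post-receive in the bare repo
set -e
cd /srv/staging/site
env -u GIT_DIR git pull --ff-only origin main

That keeps the bare repo untouched while the staging checkout follows it automatically; prod is still updated by hand when you're ready.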
#!/bin/sh
set -e
echo -n 'updating... '
git update-server-info
echo 'done. going to dev3'
cd /home/kragen/public_html/sw/dev3
echo -n 'pulling... '
env -u GIT_DIR git pull
echo -n 'updating... '
env -u GIT_DIR git update-server-info
echo 'done.'
You can of course also run a local site generator here as well, although for dev3 I took a lighter-weight approach — I just checked in the HEADER.html file that Apache FancyIndexing defaults to including above the file directory listing and tweaked some content-types in the .htaccess file.

This could still fail to update the checkout if it has local changes, but only if they create a merge conflict, and it won't fail to update the bare repo, which is probably what your other checkouts are cloned from and therefore where they'll be pulling from.
I would think you'd want to
cd /home/kragen/public_html/sw/dev3
git update-server-info
git pull
...in that order.

And I wouldn't think you'd need to run git update-server-info again after git pull. My understanding is that update-server-info updates info/refs, which is necessary _after a push_.
What am I missing?
I'm not sure the second update-server-info is necessary.
If you're asking about the env -u, that's because Git sets that variable so that commands know which repo they're in even if you cd somewhere else, which is exactly what I don't want.
Git is especially prone to the sort of confusion where all the experts you know use it in slightly different ways so the culture is to just wing it until you're your own unique kind of wizard who can't tie his shoes because he favors sandals anyhow.
The solution is to set team/department standards inside companies or use whatever you need as a single contributor. I saw attempts to standardize across a company that is quite decentralized and it failed every time.
This is ultimately where, and why, github succeeded. It's not that it was free for open source. It's that it ironed out lots of kinks in a common group flow.
Git is a cultural miracle, and maybe it wouldn't have got its early traction if it had been overly prescriptive or proscriptive, but more focus on those workflows earlier on would have changed history.
Unlike calculus, though, you can learn enough about Git to use it usefully in ten minutes. Maybe this sets people up for disappointment when they find out that afterwards their progress isn't that fast.
An example of that is the suckless philosophy, where extra features come as patches and diffs.
(I started using Git in 02009, with networking strictly over ssh and, for pulls, HTTP.)
What? Knowing that a git repo is just a folder is nowhere near "expert" level. That's basic knowledge, just like knowing that the commits are nodes of a DAG. Sadly, most git users have no idea how the tool works. It's a strange situation, it'd be like if a majority of drivers didn't know how to change gears.
If you literally can't change gears then your choices are a) go nowhere (neutral), b) burn out your clutch (higher gears), or c) burn out your engine (1st gear). All are bad things. Even having an expert come along to put you in the correct gear once, twice, or even ten times won't improve things.
If a programmer doesn't know that git is a folder or that the commits are nodes of a DAG, nothing bad will happen in the short term. And if they have a git expert who can get them unstuck say, five times total, they can probably make it to the end of their career without having to learn those two details of git.
In short-- bad analogy.
You are simultaneously saying that something is not expert level knowledge while acknowledging that most people don’t know it. Strange.
I'm not sure that's true, unless you only take certain parts of the world into consideration.
> just like knowing that the commits are nodes of a DAG
Hello gatekeeping! I have used Git for more than 10 years. I could not explain all of the ins-and-outs of commits, especially that they are "nodes of a DAG". I do just fine, and Git is wonderful to me. Another related example: I would say that 90%+ of .NET and Java users don't intimately understand the virtual machine that runs their code. Hot take: that is fine in 2025; they are still very productive and add lots of value.

> Have you never used rebase or cherry-pick?

Of course. And when I forget how to do it, I ask Google, or ChatGPT. It works for me.

Depends on what you mean by "a project". If it's policy related, maybe it's the company's policy that all code that is written must be stored in a certain way, for a multitude of reasons.
It's not about how long the action takes, it's about how much the team responsible for that is loaded and can prioritize things. Every team needs more round tuits. Anyone who works in an IT support role knows this. The point is that they can self-service immediately and there is no actual dependency to start writing code and using revision control, but people will trot out any excuse.
Tons of people who DO use git cli don't know git init. Their whole life was create a project on github and clone it. Anyway initting new project isn't the most "basic" thing with git, it is used less than .01% of total git commands
if you combine the above easily MOST people have no idea about git init
In large corps you usually have policies to not leave your laptop unattended logged in, in the office, that would be potentially even worse than that.
It's a misguided policy that hurts morale and leaves a tremendous amount of productivity and value on the floor. And I suspect that many of the policies are in place simply because a number of the rule makers aren't aware of how easy it to share the code. Look how many in this thread alone weren't aware of inherent distributability of git repositories, and presumably they're developers. You really think some aging career dev ops that worked at Microsoft for 30 years is going to make sensical policies about some software that was shunned and forbidden only a decade ago?
With remote, if your company stubbornly refuses to use a modern vpn like tailscale, and you can't really network between two computers easily, git format-patch and git am, coupled with something like slack messages, works well enough, albeit moderately cumbersome.
That way, I
1. didn't have to worry about sync conflicts. Once complete, just push to origin
2. had my code backed up outside my computer
I can't exactly remember, if it saves space. I assumed it does, but not sure anymore. But I feel it was quite reliable.
I gave that way up with GitHub. But thinking of migrating to `Codeberg`
With `tailscale`, I feel we have so many options now, instead of putting our personal computer out on the Internet.
I mean, it works fine for a few days or weeks, but then it gets corrupted. Doesn't matter if you use Dropbox, Google Drive, OneDrive, whatever.
It's apparently something to do with the many hundreds of file operations git does in a basic operation, and somehow none of the sync implementations can quite handle it all 100.0000% correctly. I'm personally mystified as to why not, but can attest from personal experience (as many people can) that it will get corrupted. I've heard theories that somehow the file operations get applied out of order somewhere in the pipeline.
When it is a file conflict the sync engines will often drop multiple copies with names like "file (1)" and "file (2)" and so forth. It's sometimes possible to surgically fix a git repo in that state by figuring out which files need to be "file" or "file (1)" or "file (2)" or whatever, but it is not fun.
In theory, a loose objects-only bare repo with `git gc` disabled is more append-only and might be useful in file sync engines like that, but in practice a loose-objects-only bare repo with no `git gc` is not a great experience and certainly not recommended. It's probably better to use something like `git bundle` files in a sync engine context to avoid conflicts. I wonder if anyone has built a useful automation for that.
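For the record, a bundle-based sketch of what that might look like (file names are arbitrary):

# pack the whole repo into a single file, which sync engines handle safely
git bundle create ~/Dropbox/project.bundle --all

# on another machine: clone from the bundle, or fetch updates from a newer one
git clone ~/Dropbox/project.bundle project
cd project
git fetch ~/Dropbox/project.bundle 'refs/heads/*:refs/remotes/origin/*'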
Good luck with iCloud!
But have you ever found a cloud sync tool that doesn't eventually corrupt with git? I'm not aware of one existing, and I've looked.
Again, to be clear, I'm not talking about the occasional rsync, but rather an always-on tool that tries to sync changes as they happen.
What about SyncThing?
Granted I've never tried it so take it with a grain of salt.
https://www.theverge.com/22684730/students-file-folder-direc...
These guys won.
Now, if it is a growing misconception among cs students or anyone doing software development or operations, that's a cause for concern.
Clip from the interview: https://www.youtube.com/shorts/0wLidyXzFk8
Even someone who knows that git isn't GitHub might not be aware that ssh is enough to use git remotely. That's actually the case for me! I'm a HUGE fan of git, I mildly dislike GitHub, and I never knew that ssh was enough to push to a remote repo. Like, how does it even work, I don't need a server? I suspect this is due to my poor understanding of ssh, not my poor understanding of git.
You do, an SSH server needs to be running on the remote if you want to ssh into it, using your ssh client - the `ssh` command on your laptop. It's just not a http server is all.
You start that server using the `sshd` [systemd] service. On VPSs it's enabled by default.
Git supports both http and ssh as the "transport method". So, you can use either. Browsers OTOH only support http.
Edit: hey this is really exciting. For a long time one of the reasons I've loved git (not GitHub) is the elegance of being a piece of software which is decentralized and actually works well. But I'd never actually used the decentralized aspect of it, I've always had a local repo and then defaulted to use GitHub, bitbucket or whatever instead, because I always thought I'd need to install some "git daemon" in order to achieve this and I couldn't be bothered. But now, this is so much more powerful. Linus Torvalds best programmer alive, change my mind.
And most things are files.
From the git-fetch(1) manual page:
> Git supports ssh, git, http, and https protocols (in addition, ftp and ftps can be used for fetching, but this is inefficient and deprecated; do not use them).
You only need access to the other node's repo information. There's no server. You can also use a simple path and store the other repo on a drive.
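For example, a plain path works as a remote just like a URL does (paths here are made up):

# clone from a repo sitting on a mounted drive
git clone /mnt/usb/project.git

# or add it as an extra remote to an existing clone
git remote add usb /mnt/usb/project.git
git fetch usb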
At that point, may as well support raw serial too.
Supporting rlogin on the other hand is probably as simple as GIT_SSH=rlogin
There IS a server, it's the ssh daemon. That's the bit I had never thought about until now.
Read https://git-scm.com/docs/git-init#Documentation/git-init.txt...
Anyway it sounds like you have a lot of holes in your git-knowledge and should read some man pages
https://git-scm.com/book/ms/v2/Git-on-the-Server-The-Protoco...
Imagine what the equivalent argumentation for a lawyer or nurse would be. Those rules ought to apply for engineers, too.
If somebody asked me if it's possible to scp my git repo over to another box and use it there or vice versa, I would have said, yes, that is possible. Although I would've felt uneasy doing that.
If somebody asked me if git clone ssh:// ... would definitely work, I wouldn't have known out of the gate, although I would have thought it would be neat if it did and maybe it does. I may have thought that maybe there must be some sort of git server process running that would handle it, although it's plausible that it would be possible to just do a script that would handle it from the client side.
And finally, I would've never thought really to necessarily try it out like that, since I've always been using Github, Bitbucket, etc. I have thought of those as permanent, while any boxes I have could be temporary, so not a place where I'd want to store something as important to be under source control.
You’ve always used GitHub but never known it could work over ssh? Isn’t it the default method of cloning when you’re signed in and working on your own repository…?
E.g. maybe the host would have to have something like apt install git-server installed there for it to work. Maybe it wouldn't be available by default.
I do know however that all info required for git in general is available in the directory itself.
We’ve gone so far with elaborate environments and setups to make it easy to learn more advanced things, that many people never learn the very basics. I see this as a real problem.
But more importantly, I’m not sure why I would want to deploy something by pushing changes to the server. In my mental model the repo contains the SOT, and whatever’s running on the server is ephemeral, so I don’t want to mix those two things.
I guess it’s more comfortable than scp-ing individual files for a hotfix, but how does this beat pushing to the SOT, sshing into the server and pulling changes from there?
I’ve never worked with decentralized repos, patches and the like. I think it’s a good moment to grab a book and relearn git beyond shallow usage - and I suspect its interface is a bit too leaky to grok it without understanding the way it works under the hood.
I did a quick search for "post-receive hook ci" and found this one: https://gist.github.com/nonbeing/f3441c96d8577a734fa240039b7...
I think your age isn't the issue, but I suspect you're in a bubble.
Decades from now, git will be looked back at in a similar but worse version of the way SQL often is -- a terrible interface over a wonderful idea.
I don't think git would end up this popular if it didn't allow to be used in a basic way by just memorizing a few commands without having to understand its repository model (however simple) well.
It's mostly probably fine if that's the thing most of everybody wants to use and it works well; but also it's very unwise to forget that the point was NEVER to have a deeply centralized thing -- and that idea is BUILT into the very structure of all of it.
"In the 80s and 90s (and before), it was mainly academics working in the public interest, and hobbyist hackers. Think Tim Berners-Lee, Vint Cerf, IETF for web/internet standards, or Dave Winer with RSS. In the 00s onward, it was well-funded corporations and the engineers who worked for them. Think Google. So from the IETF, you have the email protocol standards, with the assumption everyone will run their own servers. But from Google, you get Gmail.
[The web] created a whole new mechanism for user comfort with proprietary fully-hosted software, e.g. Google Docs. This also sidelined many of the efforts to keep user-facing software open source. Such that even among the users who would be most receptive to a push for open protocols and open source software, you have strange compromises like GitHub: a platform that is built atop an open source piece of desktop software (git) and an open source storage format meant to be decentralized (git repo), but which is nonetheless 100% proprietary and centralized (e.g. GitHub.com repo hosting and GitHub Issues)." From: https://news.ycombinator.com/item?id=42760298
It was extremely unlikely that it would be some kind of free utopia; but also, it's extremely remarkable what we've been able to keep generally free, or at least with a free-enough option.
GitHub's value is in network effects and features like bug and issue tracking.
At least Gmail is still an email client that communicates with other systems.
It's like, a Redis cluster is distributed but not decentralized. The ssh protocol is not decentralized. XMPP, Matrix, and Bitcoin are decentralized protocols, first two via federation.
git clone ssh://username@hostname/path/to/repo
this is equivalent to:

git clone username@hostname:path/to/repo

and if your usernames match between local and remote:

git clone hostname:path/to/repo
(if the path has no leading /, it is relative to your home directory on the remote)

Host A cannot reach the official github.com, but Host B can and has a local copy of a repo cloned. So Host A can 'git clone ssh://' from Host B, which is essentially equivalent, just setting origin to Host B instead of github.com, sort of acting as a manual proxy?
What if Host A is natted, so Host B can ssh to Host A but not the reverse, can Host B ssh clone to Host A to push changes?
In the rare times I've needed this, I just 'rsync -av --delete' a repo from B->A.
This is the use case mentioned in the article and it wouldn't work with a bare repo. But if the server you're SSH'ing to is just a central point to sync code across machines, then you're right: multiple hoops mentioned in the article are solved by having the central repo bare.
Tip: create a `git` user on the server and set its shell to `git-shell`. E.g.:
sudo useradd -m -g git -d /home/git -s /usr/bin/git-shell git
You might also want to restrict its directory and command access in the sshd config for extra security.

Then, when you need to create a new repository you run:
sudo -u git git init --bare --initial-branch=main /home/git/myrepo.git
And use it like so:

git clone git@myserver:myrepo.git

Or:

git remote add myserver git@myserver:myrepo.git
git push -u myserver main
This has the exact same UX as any code forge.

I think that initializing a bare repository avoids the workarounds for pushing to a currently checked out branch.
However, this setup doesn't work with git-lfs (large file support). Or, at least I haven't been able to get it working.
PS: Even though git-shell is very restricted you can still put shell commands in ~/git-shell-commands
For an actually distributed large file tracking system on top of git you could take a look at git-annex. It works with standard ssh remotes as long as git-annex is installed on the remote too (it provides its own git-annex-shell instead of git-shell), and has a bunch of additional awesome features.
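A tiny sketch of what that looks like in practice (the remote name and file are placeholders):

# inside an existing git repo
git annex init "laptop"
git annex add big-dataset.tar
git commit -m "Add dataset via git-annex"

# sync metadata and file contents with an ssh remote that also has git-annex
git annex sync --content myserver

Regular git commands keep working as usual; only the large file contents are handled by the annex.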
git init
git commit -am Initial\ commit
git clone . ssh://server/path/to/repo
And it didn’t work. You have to ssh to the remote server and “git init” on a path first. How uncivilized.

Bitkeeper and a few other contemporaries would let you just push to a remote path that doesn’t exist yet and it’d create it. Maybe git added this since then, but at the time it seemed like a huge omission to me.
If you want a public facing "read only" ui to public repositories you can use cgit (https://git.zx2c4.com/cgit/about/) to expose them. That will enable others to git clone without using ssh.
I keep my private repositories private and expose a few public ones using cgit.
So many features and concepts; it's easy to think you understand the basics, but you need to dig deep into its origin and rationale to begin to grasp the way of thinking it is built around.
And the API surface area is much larger than one would think, like an iceberg
So I find it really weirdly low level in a way. Probably what is needed is a higher-level CLI to use it in the most sensible, default way, because certainly the mental model most people use it with is inadequate.
During the dev / compile / test flow, git makes for a lightweight CI that reduces the exposure of your primary repo. Just run `watch -n 60 make` on the target and push using `git push`. The target can run builds without having any access to your primary (GitHub) repo.
On your primary repo, create dirs for each machine e.g.
monorepo/
├─ machine1/
│ ├─ usr/share/www/
│ ├─ etc/
├─ machine2/
│ ├─ etc/
Create remotes for each machine+repo, e.g. `git remote add hosts/machine1/etc ssh://machine1/etc`, then `git fetch hosts/machine1/etc`.

Then add the subtree with `git subtree add -P machine1/etc hosts/machine1/etc master`
When you want to pull changes you can `git subtree pull` or `git subtree push …`
If you end up making changes in your monorepo, use push. If you make changes directly on the machine (or via terraform), pull
This way you can edit and manage dozens of machines using a single git repo, and git subtree push to deploy the changes. No deploy scripts.
The idea of a folder per machine is very good.
This is true, but I do also like having backups that are entirely decoupled from my own infrastructure. GitHub personal private accounts are free and I believe they store your data redundantly in more than one region.
I imagine there's a way to setup a hook on your own server such that any pushes are then pushed to a GitHub copy without you having to do anything else yourself.
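One way is a post-receive hook on the self-hosted bare repo that mirrors everything to a GitHub copy (the remote URL is an assumption):

#!/bin/sh
# hooks/post-receive: mirror every push to an off-site backup
git push --mirror git@github.com:me/project-backup.git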
For most people, that would defeat the purpose of self-hosting.
(There's also the "to comply with our legal obligations" bit, which is a concern if you're doing things that governments around the world may have issue with.)
I expect there are people who like to self-host so that they're not dependent on GitHub up-time or having their accounts banned and losing access to their data. For them, having a secondary GitHub backup should still be useful.
None of that protects against the U.S. laws or current political climate. There is no guarantee for an EU citizen that GitHub processes data only in the EU area or that data never leaves it. So it is all about the data privacy.
At that moment, if I had any important private repos, they would be gone.
We then went to using a central bare repo on a shared server, then to hosted GitLab (I think? It was Ruby and broke constantly), eventually landing on GitHub.
The thing that people really don't seem to get these days is how your master branch is a different branch from someone else's master branch. So pulling from one master to another was a normal thing. When you clone you get a copy of all the branches. You can commit to your master branch all you want, it's yours to use however you want to.
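Concretely, something like this (the host and path are made up):

# add a colleague's clone as a remote and pull their master into yours
git remote add alice ssh://alice-box/home/alice/project.git
git fetch alice
git merge alice/master

Each side keeps its own master; reconciling with someone else's is just another merge.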
In my 10+ year career, I'm not sure I've ever run "git fetch" manually. I've never had a time where I wanted to fetch upstream changes, but not merge (or rebase) them into my branch.
I got pwned this way before (by a pentester fortunately). I had to configure Apache to block the .git directory.
I usually throw `etc` and `log` directories at the top level as well and put my server config in etc, and have a gitignore rule to ignore everything in log, but it's there and ready for painless deployment.
Since the web root is already a sub directory, more sensitive things can go into the same repo without worrying about exposing them.
- Deleted files and development artifacts that were never meant to go public.
- My name and email address.
- Cringy commit messages.
I assumed these commits and their metadata would be private.
It was embarrassing. I was in high school, I was a noob.
The speed at which an accidentally committed (and reverted) key in a public GitHub repo is compromised and used to, say, launch a fleet of stolen VPSes nowadays is incredible. Fortunately most of the time your provider will cancel the charges...
This has always been the roughest part of git for me, the process to remove accidentally committed sensitive content. Sure we should all strive not to commit stupid things in the first place, and of course we have tools like gitignore, but we are all only human.
> https://docs.github.com/en/authentication/keeping-your-accou...
"Sensitive data can be removed from the history of a repository if you can carefully coordinate with everyone who has cloned it and you are willing to manage the side effects."
Don't worry, karma dictates when the interviewer goes looking they'll get rejected for not knowing some similarly esoteric graph theory equation or the internal workings of a NIC card.
Too much of our interviewing is reading the interviewer's mind or already knowing the answer to a trick question.
The field is way too vast for anyone to even know a majority, and realistically it's extremely difficult to assess if someone is an expert in a different 1%.
Sometimes I feel like we need a system for just paying folks to see if they can do the job. Or an actually trusted credentialing system where folks can show what they've earned with badges and such.
A better interview question about this subject doesn't assume they have it memorized, but if they can find the answer in a short time with the internet or get paralyzed and give up. It's a very important skill to be able to recognize you are missing information and researching it on the Internet.
For example, one of my most talented engineers didn't really know that much about CS/SWE. However, he had some very talented buddies on a big discord server who could help him figure out anything. I kid you not, this kid with no degree and no experience other than making a small hobby video game would regularly tackle the most challenging projects we had. He'd just ask his buddies when he got stuck and they'd point him to the right blog posts and books. It was like he had a real life TRRPG Contacts stat. He was that hungry and smart enough to listen to his buddies, and then actually clever enough to learn on the job to figure it out. He got done more in a week than the next three engineers of his cohort combined (and this was before LLMs).
So maybe what we should test isn't data stored in the brain but ability to solve a problem given internet access.
Same issues with git: they don't realise they can have multiple configs, multiple remotes, etc. Never mind knowing how to sign commits.........
They claim to be linux boffins but cannot initialise a git repo. This has nothing to do with elitism. This is basic stuff.
What's next, they don't know what a bootloader or a partition is? Or run database engine with default settings? Or install a server OS and never bother to look at firewall config?
I'm truly not trying to be cruel.
But that's why people don't know about it, because they skip past the basics because in practice you never use it or need to know about it.
This is the reality of software engineering and the like though - mostly you learn what you need to know, because learning everything is usually wasteful and never used, and there's a lot available.
(I haven't been able to read documentation or a software book end to end in 20 years)
I wrote a HOWTO a few weeks ago: http://mikhailian.mova.org/node/305
When you checkout (now switch) a branch, HEAD is now the same as the branch (they point to the same commit). When you do operations like commit, reset, etc., both HEAD and the branch are updated. So if a remote node tries to update the local node via a push on its end, that would mess up the local worktree. So you create a bare repo instead, which doesn't have a local worktree.
Note: a worktree's contents come from the commit currently identified by HEAD, whose history reaches back to the initial commit by following the parent information on each commit.
The staging area is like a stored snapshot of what you would like to commit. You can always create a patch between HEAD and the worktree, edit it, and then save the patch as the commit, then apply the leftover to the worktree. The staging area just makes that easier. It's a WIP patch for the next commit.
Actually, HEAD now points to that branch, which in turn points to the commit. It's a different state than when HEAD points directly to the same commit (called "detached HEAD" in Git's terminology) where this issue doesn't happen as you're not on a branch at all. It's a subtle, but important distinction when trying to understand what happens under the hood there.
You can push to a repo with a working copy, if you like; nothing will happen to that working copy unless you run `hg update`. Since you don’t need a working copy on the server’s repo, you never run `hg update` on it, and it’s effectively what git calls a bare repository.
Case one: WiP on a branch, code is pretty stable, but I want to do some experiments which will likely be deleted. I stage everything and then make (unstaged) changes which I can then undo with two keystrokes.
Case two: I'm reviewing a complex PR, so I first merge it with --no-commit, then unstage everything and then stage chunks (or even individual lines) I have already reviewed.
Case three: I was coding in the state of flow and now I have a lot of changes I want to commit, but I want to separate them into several atomic commits. I stage the first batch of changes, commit, then stage another, commit, etc.
There are probably more, but these three alone are worth having the staging area.
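For example, case three in command form:

# stage only some of the changes in the working tree, review, commit, repeat
git add -p            # pick individual hunks interactively
git diff --staged     # double-check what will go into the commit
git commit -m "First atomic change"

git add -p
git commit -m "Second atomic change"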
Case two: you don't need staging. You can stash, and then unstash incrementally (I would be shocked if git doesn't have the equivalent of `hg unshelve --interactive` and its keyboard friendly TUI)
Case three: you don't need staging. You can just edit/amend the series (rebase --interactive I believe you guys call that).
That is to say, all that you want to put in your stash, you could commit directly to the DAG, and edit to your convenience, with the regular history-rewriting tools already at your disposal. And the off-DAG stuff can be handled by stash (and even there, a normal commit that you would rebase to its destination would perfectly do).
And what I described is incidentally what https://github.com/jj-vcs/jj does, which can be pretty-well described as "taking the best of all major VCSes"
The immediate response from many git users when confronted to alternative VCSes is "well, it doesn't have a staging area, so it's obviously inferior" instead of going with "let's see how differently they approach this problem, and perhaps I will like it/git isn't all perfect after all".
Managing code is one of the cornerstones of software engineering. It would be like refusing to learn how to use a screwdriver because someone really just wants to hammer things together.
The great thing about $your-favorite-scm is that it transcends language or framework choices and is fungible for nearly any project, even outside of software. I’m surprised it isn’t part of more professional tools.
Ok, then make the commands make sense. For example, 90%+ of people have no idea what rebase does, yet it is a useful command.
People don't want to learn git beyond what works, because they can't experiment. The moment they "hold it wrong", the whole repo breaks into pieces, unable to go forward or backwards. Hopefully they did not commit.
Git feels like a hammer covered in razor blades. The moment you try to hold it differently, you will cut yourself and somebody else will need to stitch you up.
I used that all the time when I had to move private repositories back and forth from work, without ssh access.
For file protocol, just type a path.
For ssh, type a ssh connection string like lion@myserver.test:path/to/file
Another option would be to use the email-based workflow, but that's quite different from most people's expected git experience.
Match User gituser
ChrootDirectory /srv/git_chroot
ForceCommand internal-sftp
AllowTcpForwarding no
X11Forwarding no
PermitTTY no
But tbh sending patches is fun and easy! After you force yourself to do it a few times you might even prefer it to push/pull.

Or if you are using Linux, there is a httpd (web server) in the busybox package:

busybox httpd -f -p 0.0.0.0:10080 -h ~/public/

- a prod server (and a test server) with a git repo.
- a local machine with the git repo.
- a git server where code officially lives, nowadays just a github.
If I were to simplify and run my own git server of the third kind, I would probably not run a server for the sole purpose of hosting code, it would most likely run on the prod/test server.
So essentially I would be eliminating one node and simplifying. I don't know, maybe there's merits to having an official place for code to be in. Even if just semantics.
I know you can also use branches to have a "master" branch with code and then have migrations just be merging from master into a prod branch, but then again, I could have just master branches, but if it's on the test server then it's the test branch.
I don't know if spending time reinventing git workflows is a very efficient use of brain juice though.
I really preferred the idea of just paying for what I used -- rather than being on a "freemium" model with GitHub.
But -- as many things with Google -- it was shutdown. Probably because most other people do prefer the freemium model.
I wonder if this kind of thing will come back in style someday, or if we are stuck with freemium/pro "tiers" for everything.
TIL about the update options for a checked out branch. In practice, though, usually you just want the bare ".git" folder on the server.
(Use `git pull`? If the different people push to different branches, then there's no conflict and no problem. If you try to push different things into the same branch, the second person will get told their branch is out of date. They can either rebase or - if this is allowed by the repo config - force push over the previous changes...)
the vastly superior way is 'git init --bare', which is first-class supported without hacky settings.
mkdir -p repo/project
cd repo/project
git init --bare
cd ../..
git clone repo/project
cd project
(do git stuff)

It'd be great if there was more specific support. But in practice? No problems so far.
I sync my repos manually (using GitHub as the always-on remote, but I'm not particularly attached to it). This gives me more resilience should I blow up the repo completely (hard to do, I know).
The benefit is that git repos are essentially append only in this mode.
The folder itself is scheduled into an encrypted backblaze backup too.
$ git push heroku master
So you'll have to make sure to push to e.g. GitHub as well for version control.
https://gitolite.com/gitolite/
If you think the bare bones example is interesting and want something simple just for you or a small group of people, this is one step up. There's no web interface. The admin is a git repository that stores ssh public keys and a config file that defines repo names with an ACL. When you push, it updates the authorization and inits new repositories that you name.
I put everything in repos at home and a have multiple systems (because of VMs) so this cleaned things up for me considerably.
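The admin repo's config is just a small text file; a sketch of what it looks like (repo and user names are placeholders):

# conf/gitolite.conf in the gitolite-admin repo
# (public keys live in keydir/alice.pub, keydir/bob.pub, ...)
repo dotfiles
    RW+     =   alice

repo notes
    RW+     =   alice bob

Push that to the gitolite-admin repo and the new repositories and ACLs appear on the server.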
https://public-inbox.org/README.html
> public-inbox implements the sharing of an email inbox via git to complement or replace traditional mailing lists. Readers may read via NNTP, IMAP, POP3, Atom feeds or HTML archives.
> public-inbox stores mail in git repositories as documented in https://public-inbox.org/public-inbox-v2-format.txt and https://public-inbox.org/public-inbox-v1-format.txt
> By storing (and optionally) exposing an inbox via git, it is fast and efficient to host and mirror public-inboxes.
If I was just using it for git hosting I'd probably go for something more light weight to be honest.
Hard to justify using SSH for Git. Principle of least power and all that
TIL that people use non-bare git.
CSS needs some tweaking for iOS Safari, I guess.
Until you have more users than dollars that's all you need.
Why is GitHub popular? It's not because people are "dumb", as others think.
It's because GitHub "Just Works".
You don't need obscure tribal knowledge like seba_dos1 suggests [0] or this comment https://news.ycombinator.com/item?id=45711294
The official Git documentation, for example, has its own guide for setting up a server [1], which I failed to get to work (it is vastly different from what OP is suggesting).
The problem with software development is that not knowing such "tribal knowledge" is considered incompetence.
People don't need to deal with obscure error messages which is why they choose GitHub & why Github won.
Like the adage goes, "Technology is best when it is invisible"
[0] https://news.ycombinator.com/item?id=45711236
[1] https://git-scm.com/book/en/v2/Git-on-the-Server-Setting-Up-...
Yes the commit won't be on the branch you want, but you'd get about the same issue if the two repos had a bare upstream. The branch diverges and you need to merge. It's a bit less ergonomic here but could be improved. Git could use branch following improvements in general.
Not at all. The commit would have "landed" on the exact branch you thought it would. How it will be reconciled with a diverged remote branch is completely orthogonal and may not even be of concern in some use cases at all.
If git was a tiny bit smarter it could remember you were working on "foo" even after the ref changes.
But my real point is that refusing to act is not "the only sane [default] option" here. Mild ergonomic issues aren't a disqualifier.
> error: cannot delete branch 'main' used by worktree at '/tmp/git'
You could build the system differently and what seems like a sane default would be different there, but it would be a different system. In this system, HEAD isn't being manipulated by things not meant to manipulate it.
Still, I like the online browser, and pr workflow.
However, I would NEVER trust GitHub since the MS acquisition. Codeberg and https://forgejo.org are perfectly sound FOSS alternatives to GitHub and GitLab nowadays.
[0] Source: pulled out of my arse.
Good job, now you can't add it or remove it without manually removing it in the .git folder.
I think your comment shows some confusion that is either the result or the cause of some negative experiences.
Starting with GitHub. The primary reason it "just works" is because GitHub, like any SaaS offering, is taking care of basic things like managing servers, authorization, access control, etc.
Obviously, if you have to setup your own ssh server, things won't be as streamlined as clicking a button.
But that's obviously not the point of this post.
The point is that the work you need to do to setup a Git server is way less than you might expect because you already have most of the things already set, and the ones that aren't are actually low-hanging fruit.
This should not come as a surprise. Git was designed as a distributed version control system. Being able to easily set up a stand-alone repository was a design goal. This blog post covers providing access through ssh, but you can also create repositories in any mount point of your file system, including USB pens.
And, yes, "it just works".
> The official Git documentation for example has its own documentation that I failed to get work. (it is vastly different from what OP is suggesting)
I'm sorry, the inability to go through the how-to guide that you cited has nothing to do with Git. The guide only does three things: create a user account, setup ssh access to that account, and create a Git repository. If you fail to create a user account and setup ssh, your problems are not related to Git. If you created a user account and successfully setup ssh access, all that is missing is checking out the repo/adding a remote repo. If you struggle with this step, your issues are not related to Git.
The knowledge is neither obscure nor tribal, it is public and accessible. And likely already on your system, in the form of man-pages shipped with your git binaries.
> The problem with software development is that not knowing such "tribal knowledge" is considered incompetence.
Consequently.
For normal users. Having this tribal knowledge is basically what makes a developer, and it's their job to make technology invisible for others. Someone has to be behind the curtain.
IPv6 still doesn't work with GitHub: https://doesgithubhaveipv6yet.com/
At the very least, a huge part of the intent of Git's very design was decentralization; though as is the case with many good tools, people don't use them as they are designed.
Going further, simply because "deeply centralized Git" is very popular, does not AT ALL determine that "this is the better way to do things." Please don't frame it as if "popular equals ideal."
> Its because GitHub "Just Works".
Git also "just works". GitHub simply offers a higher level of abstraction, a graphical UI, and some value-add features on top of Git. How much better all this really is arguable. I would say that it's disastrous that most developers rely on a centralized service to use a distributed version control system. Nevermind the fact that the service is the single largest repository of open source software, owned and controlled by a giant corporation which has historically been hostile to OSS.
GitHub "won" because it came around at the right time, had "Git" in its name—which has been a perpetual cause of confusion w.r.t. its relation with Git—, and boosted by the success of Git itself largely due to the cult of personality around Linus Torvalds. Not because Git was technically superior, or because GitHub "just works".
> You don't need obscure tribal knowledge
As others have said, a bare repository is certainly not "tribal knowledge". Not anymore than knowing how to use basic Git features.
> Like the adge goes, "Technology is best when it is invisible"
Eh, all technology is an abstraction layer built on top of other technology. Whether it's "invisible" or not depends on the user, and their comfort level. I would argue that all abstractions also make users "dumber" when it comes to using the layers they abstract. Which is why people who only rely on GitHub lose the ability, or never learn, to use Git properly.
Not because it's hard or obscure to put git on your server.
Unfortunately (though expected), ever since Microsoft took over this has devolved into GitHub "sometimes works".
What are we supposed to do ... throw our hands up because GitHub won?
I'll be down voted, but I'll say it. If you hold that attitude and you don't learn the fundamentals, if you don't understand your tools, you're a bad developer and a poor craftsman. You're not someone I would hire or work with.
Git's difficulty is NOT intrinsic; it could be a much better tool if Torvalds were better at UX. In short, I don't blame people who don't want to "learn git". They shouldn't have to learn it anymore than one learns to use a browser or Google docs.
You're likely using a VCS, which is likely git (or jj, hg, fossil, tfs, etc).
Therefore, you should know how to use whatever you're using. It shouldn't be "invisible" to you.