Git Troubleshooting & Tips
Background
Some reasons why we now sometimes run into git issues:
- Over the past years we asked users (e.g. during trainings) to install git with only the minimum required guidance
- Even though more advanced tools (ARCitect) now bring their own git installation, interference with older installations may still occur
- Tools (e.g. ARCitect and ARC Commander), or different versions of those tools, may handle git-related tasks slightly differently or more / less strictly (e.g. things like main as the default branch)
- The current (versions of) tools were not really built for collaboration with many people on one ARC (at least not with default settings from the DataHUB side). So common errors are related to merge conflicts (multiple users changing files) and divergent branches (e.g. between local and remote clones of the ARC).
- Some behaviors are simply very use-case or setup specific and will in any case, even with the best tooling, require some stewardship
Debugging
- (if required) Install Git on the user's machine
- Navigate to the ARC in trouble (via one of the options below)
  - On macOS: drag and drop the ARC folder from Finder into a terminal
  - On macOS: right-click the ARC folder -> “Services” -> “New Terminal at Folder”
  - On Windows: open the folder in Explorer; type “cmd” or “powershell” into the address field at the top of Explorer
  - On Linux / macOS terminal: cd path/to/ARC
  - From inside ARCitect: Tools -> Command Window
- Try some of the git commands and debugging steps below
Error messages
error message* | possible reason | possible solution |
---|---|---|
remote: HTTP Basic: Access denied fatal: Authentication failed for 'https://git.nfdi4plants.org/UserName/ARCname' | Your computer is not “linked” to your DataHUB account | Access Denied |
error: failed to push some refs to 'https://git.nfdi4plants.org/UserName/ARCname' hint: Your push was rejected due to missing or corrupt local objects. | You tried to upload LFS-tracked files that are not present on your computer | Missing LFS Objects |
remote: GitLab: LFS objects are missing. Ensure LFS is properly set up or try a manual "git lfs push --all" | You tried to upload LFS-tracked files that are not present on your computer | Missing LFS Objects |
LFS: PUT "<https://git.nfdi4plants.org/.../...>" read tcp ... i/o timeout | You ran into a time out, likely due to very large single files | Prevent LFS time out error |
error: failed to push some refs to 'https://git.nfdi4plants.org/UserName/ARCname' hint: Updates were rejected because the remote contains work that you do not have locally. | Your local ARC is out of sync with the remote. | ARC not in sync with the DataHUB |
ERROR: Can not sync with remote as no remote repository address was specified. | There is no URL specified for your ARC’s remote | Git remote |
ERROR: GIT: fatal: repository 'https://git.nfdi4plants.org/UserName/ARCname.git' not found | The remote URL does not exist | Git remote |
ERROR: GIT: fatal: detected dubious ownership | This is an error typically seen when working on mounted network drives | Dubious ownership |
fatal: credential-cache unavailable; no unix socket support | Likely happens on Windows, if a gitconfig contains credential.helper=cache | Adjust the Git Credential helper setting |
fatal: Need to specify how to reconcile divergent branches. | Your ARC contains multiple branches that progressed independently and need to be merged | Contact a data steward. |
error: unable to create file <path/to/file> : Filename too long | Likely occurs on Windows, if your ARC or files in your ARC are stored in a deeply nested folder, i.e. a folder in a folder in a folder … | Allow very long file names |
Your two favorite Git commands: status and log
Whenever you're asked for ARC support that is likely related to a git issue, the first thing you want to explore is the state of the ARC.
git status
git status gives a good summary of the ARC, including
- the branch you are on
- files that were staged (added) since the last commit
- files that were modified, but not yet committed (tracked)
- typically anything buggy
If everything’s clear and committed, this should print something like
Your branch is up to date with … nothing to commit, working tree clean
git log
Now, to compare the status of the local clone vs. that of the remote (i.e. the DataHUB) with a bit more confidence and detail, use git log.
This displays the commit history (messages) of the ARC reverse-chronologically, i.e. top-most = latest. So if the top commit message of the local ARC is different from the last commit message displayed in the DataHUB, the ARC is out of sync.
If you like it prettier, remember “a dog”…
Hit q to close the log.
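“A dog” is presumably the usual mnemonic for the four git log options --all --decorate --oneline --graph. A minimal sketch, run here in a throwaway repository (an assumption made only so the commands work anywhere; inside a real ARC only the two git log lines are needed):

```shell
# Throwaway repository so the commands can be tried without touching a real ARC
demo="$(mktemp -d)"
cd "$demo"
git init --quiet
git -c user.name="Demo User" -c user.email="demo@example.org" \
    commit --quiet --allow-empty -m "Initial commit"

# Plain history, latest commit on top
git log

# "A dog": one line per commit, all branches, with branch labels and a graph
git log --all --decorate --oneline --graph
```

The compact form makes it easier to compare the top-most local commit with the last commit shown in the DataHUB.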
Git configuration
The gitconfig is basically the settings and preferences for your git installation. There are three types of gitconfigs. Depending on the tool (ARCitect, ARC Commander) and operating system (macOS, Linux, Windows), different git settings may be read from different config files.
flag | meaning |
---|---|
--global | current user on that computer |
--system | system-wide (all users) |
--local | current repository (ARC) |
Checking the git config
The following command lists all configurations, where they originate from (--show-origin) and what their scope is (--show-scope):
git config --list --show-origin --show-scope
To show only one scope, e.g. the global gitconfig, add the corresponding flag (here --global) to the command.
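A sketch of the scope-specific variants, using the flags from the table above. The fallback messages are additions for illustration: git exits with an error if the config file of the requested scope does not exist. --show-scope needs a reasonably recent git (2.26 or later).

```shell
# All settings git can see from the current folder, with origin file and scope
git config --list --show-origin --show-scope

# Only one scope at a time
git config --global --list --show-origin || echo "no global gitconfig found for this user"
git config --system --list --show-origin || echo "no system-wide gitconfig found"
```

Inside an ARC folder, --local works the same way and shows only the settings of that repository.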
Typical settings to explore and troubleshoot
- the default branch should be main: init.defaultbranch=main
- user.name and user.email should be defined
- if users keep being asked for passwords during sync with the DataHUB, they might not store their credentials via a credential.helper.
Changing git config
Editing the respective gitconfig is ideally done via command line (quick internet search helps).
Adapt user name and email
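A minimal sketch, assuming the settings should apply to the current user (--global). The name and e-mail address are placeholders.

```shell
# "Jane Doe" and the e-mail address are placeholders; use the details of the DataHUB account
git config --global user.name "Jane Doe"
git config --global user.email "jane.doe@example.org"

# Verify what was stored
git config --global --get user.name
git config --global --get user.email
```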
Set main as default branch
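A minimal sketch, again assuming the --global scope. Note that init.defaultBranch (git 2.28 or later) only affects repositories created afterwards; it does not rename branches of existing ARCs.

```shell
# New repositories created with "git init" will start on a branch called main
git config --global init.defaultBranch main

# Verify what was stored
git config --global --get init.defaultBranch
```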
Git Credential Helper
The gitconfig contains a setting called credential.helper that determines whether and how git credentials are saved on your machine.
On Windows, you might run into the error fatal: credential-cache unavailable; no unix socket support if it is set to credential.helper=cache.
This can be solved by either of the following:
- Remove “credential.helper=cache” via git config --global --unset credential.helper
- Overwrite the setting with “store” instead of “cache” via git config --global credential.helper store
Allow very long file names
Users (especially on Windows) run into errors with long overall file names (i.e. the full path). Enabling git's core.longpaths setting should fix it.
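A minimal sketch of that setting. Whether the original page applied it with --global or --system is an assumption; --system typically requires administrator rights.

```shell
# Allow paths longer than the classic Windows limit of 260 characters
git config --global core.longpaths true

# Verify what was stored
git config --global --get core.longpaths
```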
Git remote
For ARCs the “remote” is the DataHUB. The remote address (ARC URL) is stored in the git configuration of the local ARC. Display the URL to which the local ARC is connected via git remote -v.
Adding a remote during arc sync
A default remote is usually added by ARC Commander or ARCitect.
If the ARC does not yet exist in the DataHUB, and you created it via ARC Commander and synced it via arc sync, you will see an error stating that the repository was not found, followed by lines mentioning that the ARC was created automatically.
This is nothing to worry about: the ARC was created in the DataHUB during this process.
If you only see the error ERROR: GIT: fatal: repository 'https://git.nfdi4plants.org/UserName/ARCname.git/' not found, but not the following lines mentioning that the ARC was created automatically, make sure to use the “force” option, i.e. arc sync --force ...
Adding a remote via git
If the above command does not display any remote, you can add one via git remote add.
Editing a remote
You can change the URL of an existing remote via git remote set-url.
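The three remote operations of this section (display, add, edit) can be sketched as follows. The throwaway repository, the remote name origin and the URL ending in RenamedARC are assumptions for illustration; UserName/ARCname is the placeholder used in the error messages above.

```shell
# Throwaway repository standing in for a local ARC
demo="$(mktemp -d)"
cd "$demo"
git init --quiet

git remote -v        # prints nothing: no remote configured yet

# Add a remote called "origin"
git remote add origin "https://git.nfdi4plants.org/UserName/ARCname.git"

# Edit the URL of the existing remote, e.g. after the ARC was renamed or transferred
git remote set-url origin "https://git.nfdi4plants.org/UserName/RenamedARC.git"

git remote -v        # now lists the fetch and push URL of origin
```

Neither command contacts the DataHUB, so a typo in the URL only shows up at the next sync.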
Branches
As of now, the DataPLANT tools focus on working on a single branch (main).
It can still happen that your ARC has multiple branches, e.g. by accident (see git config -> init.defaultbranch) or because some git-affine collaborator knows how to create them.
To display the branches of the local ARC, use git branch.
If you also want to display branches that exist on the remote (but not locally), use git branch --all.
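A small sketch of both listings in a throwaway repository. The repository and the branch name accidental-branch are hypothetical and only serve to produce a second branch.

```shell
# Throwaway repository with one commit and a second, "accidental" branch
demo="$(mktemp -d)"
cd "$demo"
git init --quiet
git -c user.name="Demo User" -c user.email="demo@example.org" \
    commit --quiet --allow-empty -m "Initial commit"
git branch accidental-branch

git branch           # local branches; the current one is marked with *
git branch --all     # additionally lists remote branches as remotes/<remote>/<branch>
```

In this demo both listings are identical because no remote exists; in an ARC cloned from the DataHUB, the second one also shows the remote branches.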
Git LFS
Git LFS is basically the system in the background that simplifies working with git and (ARCs containing) large data files. ARC Commander and ARCitect offer options to download (clone) an ARC without large files, which speeds up the process and avoids wasting storage if you are only interested e.g. in the metadata.
In order to properly upload large(r) files to the DataHUB via “pure git” (i.e. on the command line) or via ARC Commander or ARCitect, Git LFS needs to be initiated on every computer (and user account) before using these tools.
Initiating git-lfs
Checking whether LFS (large file storage) works properly for your ARCs
- In ARCitect, you can see large files (defined by the threshold in the commit menu) flagged as LFS in the file tree
- In the DataHUB, LFS files are also flagged as LFS. In addition, you can click on “Project Storage” in the right sidebar of your ARC in the DataHUB. Here, the major part of your data should be stored in “LFS”, while only a minor part is stored in “Repository”.
Via command line
- If you have git-lfs installed and know how to use the command line, simply run git lfs install
- You can check for the proper configuration via git config --list --show-origin --show-scope. Amongst others, the config should contain the four filter.lfs entries (clean, smudge, process and required)
Manually
In your home folder (Windows: C:/Users/<UserName>, macOS: /Users/<UserName>), create or edit the file called .gitconfig so that it includes the filter "lfs" section that git lfs install would otherwise write.
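The lines in question are presumably the four filter.lfs entries that git lfs install writes. The sketch below sets the same entries with git config instead of a text editor (an alternative route, not necessarily the one the original page showed) and prints them. The git-lfs program itself still has to be installed for these filters to work.

```shell
# Equivalent to this section in ~/.gitconfig:
#   [filter "lfs"]
#       clean = git-lfs clean -- %f
#       smudge = git-lfs smudge -- %f
#       process = git-lfs filter-process
#       required = true
git config --global filter.lfs.clean "git-lfs clean -- %f"
git config --global filter.lfs.smudge "git-lfs smudge -- %f"
git config --global filter.lfs.process "git-lfs filter-process"
git config --global filter.lfs.required true

# Show the resulting entries
git config --global --get-regexp '^filter\.lfs\.'
```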
Prevent LFS Time out error
When users try to upload very large files (meaning single very large files, not the overall push size), they might run into a time out error. Raising the relevant git-lfs timeout setting should fix it.
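A hedged sketch: the setting meant here is most likely lfs.activitytimeout, which controls how long git-lfs waits for the next read or write on a connection. Both the choice of this setting and the value below are assumptions, not taken from the original page.

```shell
# Seconds git-lfs waits for the next read/write on a connection (default: 30).
# 3600 is an illustrative value; a value below 1 disables the timeout entirely.
git config --global lfs.activitytimeout 3600

# Verify what was stored
git config --global --get lfs.activitytimeout
```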
Missing LFS objects
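The following errors are related to missing LFS objects:
- error: failed to push some refs to 'https://git.nfdi4plants.org/UserName/ARCname' hint: Your push was rejected due to missing or corrupt local objects.
- remote: GitLab: LFS objects are missing. Ensure LFS is properly set up or try a manual "git lfs push --all"
Possible reasons why this happens: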
- you have downloaded (cloned) an ARC without the large files (i.e. only the pointer files) and try to upload it to another location on the DataHUB (i.e. new remote due to a transfer to other user, group, etc. or renamed ARC)
- you moved a pointer file (instead of an actual large file) from one ARC on your computer to another ARC and tried to upload
In this case you would have to download all LFS objects from the original remote first -> ask a data steward for help.
Step-by-step track large file(s) via LFS
Done in small steps plus logging. Note that this works in shells like the macOS terminal, a Linux terminal or Git Bash (available for Windows). It likely does not work in Windows PowerShell and definitely not in the Windows command prompt.
- Track files via LFS (this adds them to .gitattributes)
- Git add the .gitattributes file first
- Git add the large files
- Git commit (and write what’s happening to a log file)
- Git push (and write what’s happening to a log file)
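The steps above can be sketched as follows. The throwaway repository, the file name large_file.bin and the log file names are hypothetical; in a real ARC only the numbered commands are needed, and the push requires a configured remote. The LFS steps are skipped if git-lfs is not installed.

```shell
# Throwaway repository standing in for an ARC, with a 1 MiB dummy "large" file
demo="$(mktemp -d)"
cd "$demo"
git init --quiet
git config user.name "Demo User"
git config user.email "demo@example.org"
head -c 1048576 /dev/urandom > large_file.bin

if git lfs version >/dev/null 2>&1; then
  git lfs install --local                  # enable the LFS filters for this repository only
  git lfs track "large_file.bin"           # 1. writes the pattern to .gitattributes
  git add .gitattributes                   # 2. stage .gitattributes first
  git add large_file.bin                   # 3. stage the large file (stored as an LFS pointer)
  git commit -m "Track large_file.bin via LFS" > commit.log 2>&1   # 4. commit, logging to a file
  # 5. push, logging to a file; only possible once a remote is configured
  if git remote get-url origin >/dev/null 2>&1; then
    git push origin HEAD > push.log 2>&1
  fi
else
  echo "git-lfs is not installed; skipping the LFS steps"
fi
```

The redirection with > file 2>&1 is the part that depends on a POSIX-style shell, which matches the note about PowerShell and the command prompt.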
Check the status of LFS-tracked files
List LFS-tracked files
To get a list of LFS-tracked files including the size of the original file, run git lfs ls-files --size.
This will display the object ID (oid), the relative path to the file and the object size. The oid is also stored in the pointer file at the file’s position.
Debug LFS-tracked files
To get a report of all LFS-tracked files including their status, use git lfs ls-files --debug.
Amongst others, this report will print for every LFS file whether it is downloaded to the local ARC (checkout: true; download: true) or not (checkout: false; download: false).
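A small sketch combining both listings. The guard and the variable lfs_checked are additions for illustration, so that the snippet prints a hint instead of failing when it is run outside an ARC or without git-lfs.

```shell
if git lfs version >/dev/null 2>&1 && git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
  git lfs ls-files --size     # oid (shortened), path and size of every LFS-tracked file
  git lfs ls-files --debug    # per-file report including checkout / download status
  lfs_checked="yes"
else
  echo "Run this inside an ARC on a computer with git-lfs installed"
  lfs_checked="no"
fi
```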
Common issues and error messages
ARC files opened in multiple programs
A common source of issues are multiple programs that work on the ARC in parallel.
- In particular, working on the ARC with multiple software tools that have Git integration may lead to confusion. For instance, while you sync the ARC using ARCitect or ARC Commander, the changes may still be displayed as un-committed in VSCode, RStudio, PyCharm or other third-party software.
- Many programs produce hidden temporary files. By default these files are not shown or synced by ARCitect or ARC Commander. They might still sometimes lead to confusion, e.g. not being able to commit changes. This is especially the case for office software (Excel, Word, LibreOffice, etc.), where e.g. one of the ISA files (isa.investigation.xlsx, isa.study.xlsx, isa.assay.xlsx) or another office file stored in the ARC may be open. However, ARCs opened in Windows Explorer or macOS Finder have also sometimes led to issues.
- Before syncing an ARC, close all ARC files and Explorer / Finder windows
- Avoid editing, deleting, or moving files while the ARC is being synced to the DataHUB
ARC not in sync with the DataHUB
Your local ARC is likely out of sync with the remote. This happens if you or an invited colleague work(s) on the same ARC from a different location (e.g. the DataHUB or another computer). Before working on your ARC, make sure to update the local clone via one of these
- ARCitect -> Versioning -> Pull
- arc sync
- git pull (this would also prompt a message if changes need to be merged)
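To illustrate how a local clone falls behind, the sketch below simulates the situation with two clones of a local stand-in for the DataHUB. All folder names (datahub.git, laptop, colleague) and commit messages are hypothetical; with a real ARC only the last four git commands are relevant.

```shell
work="$(mktemp -d)"
git init --quiet --bare "$work/datahub.git"                      # stands in for the DataHUB

# First clone ("laptop") creates and uploads the first commit
git clone --quiet "$work/datahub.git" "$work/laptop" 2>/dev/null
cd "$work/laptop"
git config user.name "Demo User"
git config user.email "demo@example.org"
git commit --quiet --allow-empty -m "First commit"
git push --quiet -u origin HEAD

# A colleague works on the same ARC from a different location
git clone --quiet "$work/datahub.git" "$work/colleague"
cd "$work/colleague"
git config user.name "Colleague"
git config user.email "colleague@example.org"
git commit --quiet --allow-empty -m "Change made elsewhere"
git push --quiet origin HEAD

# Back on the laptop: fetch, inspect, then update
cd "$work/laptop"
git fetch --quiet
git status --short --branch                                      # reports "behind 1"
behind="$(git rev-list --count "HEAD..@{upstream}")"
echo "Local clone is $behind commit(s) behind the remote"
git pull --quiet --ff-only
```

Using --ff-only makes git pull stop instead of merging if the local clone also contains commits the remote does not have, which is the divergent-branches case that needs stewardship.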
Access denied
Sometimes you run into permission issues such as remote: HTTP Basic: Access denied fatal: Authentication failed for 'https://git.nfdi4plants.org/UserName/ARCname'.
This is due to missing or outdated DataHUB credentials on your computer. It usually helps to just retrieve new ones. If not, you might have to remove existing credentials stored on your computer.
Authenticate the computer
Option 1: via ARC Commander
Option 2: “by hand”
- Log in to the DataHUB
- Create a new Personal Access Token (PAT) with scope api
- Run a git command (e.g. arc sync, git pull) to trigger being asked for git credentials
  - Provide your DataHUB username
  - Use the token instead of your password
Delete stored credentials
If (new) authentication alone does not help, you might need to delete existing tokens or passwords first.
- Run git config --get-regexp "credential" to find out whether and where credentials are stored
- This typically displays one of the following
  credential.helper store
  credential.helper osxkeychain (only on macOS)
- If credential.helper store is displayed, the credentials are typically stored in ~/.git-credentials, a hidden text file in the user’s home folder. Edit this file and delete the row(s) containing “git.nfdi4plants.org” (https://<UserName>:<Token>@git.nfdi4plants.org).
- On macOS (if credential.helper osxkeychain is displayed) open the app “Keychain Access”, then search for and delete the passwords for “git.nfdi4plants.org”.
Dubious ownership
The error ERROR: GIT: fatal: detected dubious ownership typically occurs when working on a mounted network drive (Fileshare, File Server, NAS). Very simplified: the user on the computer and the owner of the network drive differ, and git tries to save you from working in a folder you do not own.
You can add the path to the ARC to git's list of safe directories (the safe.directory setting).
Alternatively, you can circumvent this error altogether by adding all directories to your list of safe directories (safe.directory set to *).
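A minimal sketch of both variants. /path/to/ARC is a placeholder for the real location on the network drive; the wildcard variant is left commented out because it switches off the ownership check for every folder.

```shell
# Mark one specific ARC as safe
git config --global --add safe.directory "/path/to/ARC"

# Broader alternative: treat every folder as safe
# git config --global --add safe.directory "*"

# Show all entries currently on the list
git config --global --get-all safe.directory
```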
Get more log
To help troubleshooting, set (some or all of) the environment variables GIT_CURL_VERBOSE=1, GIT_TRACE=1 and GIT_TRACE_PACKET=1 directly before your git command to get more info.
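A minimal sketch of the pattern. git version is only a harmless stand-in for the failing command (e.g. git pull or git push), and writing the trace to a temporary file is an addition for illustration that makes the output easier to share with a data steward.

```shell
# The trace is written to stderr; redirect it into a file to keep it
trace_log="$(mktemp)"
GIT_CURL_VERBOSE=1 GIT_TRACE=1 GIT_TRACE_PACKET=1 git version 2> "$trace_log"

# Inspect the collected trace
cat "$trace_log"
```

Trace output can contain URLs and user names, so it is worth checking the file before passing it on.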