Skip custom mirror config when keyring sync fails #4577
Check the exit state of archlinux-keyring-wkd-sync.service after waiting for it to finish. If it failed, skip set_mirrors() and continue installation with default mirrors instead of crashing with "GPGME error: No data" during pacman -Syy.
My thought is that perhaps there should be a fallback path that uses the steps in comment. And I don't think the way you handled it in code will help with actual state tracking of the service:
Instead of skipping mirror configuration when wkd-sync fails, catch the GPGME error at the point where it actually occurs - during pacman -Syy. If the sync fails with a keyring-related error, reinit the keyring with pacman-key --init/--populate and retry the sync.
Good point, thanks for linking #2213. You're right that checking the service state alone isn't enough - the keyring can be broken even when wkd-sync reports success (e.g. network wasn't ready when it ran). Updated the approach: instead of skipping mirrors, we now catch the error where it actually happens - in pacman -Syy. If the sync fails with a GPGME/keyring error, we automatically reinit the keyring (killall gpg-agent, pacman-key --init, pacman-key --populate archlinux) and retry the sync. The wkd-sync state check stays as an early warning in the log.
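For reference, the detect, reinit, and retry flow described above could look roughly like the sketch below. It is a standalone illustration, not the code in this PR: `run_command`, `is_keyring_error`, `sync_with_keyring_retry`, and the injectable `run` callable are hypothetical names, it uses `subprocess` instead of `SysCommand`, and it leaves out the gpg-agent kill step.

```python
import subprocess
from collections.abc import Callable, Sequence

# Hypothetical runner type: takes an argv list, returns (exit_code, combined_output).
Runner = Callable[[Sequence[str]], tuple[int, bytes]]


def run_command(argv: Sequence[str]) -> tuple[int, bytes]:
    """Run a command and return its exit code and combined stdout/stderr."""
    proc = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, check=False)
    return proc.returncode, proc.stdout


def is_keyring_error(output: bytes) -> bool:
    """Heuristic from the discussion: look for 'GPGME' or 'keyring' in the output."""
    return b'GPGME' in output or b'keyring' in output.lower()


def sync_with_keyring_retry(run: Runner = run_command) -> bool:
    """Sync package databases; on a keyring error, reinit the keyring and retry once."""
    code, output = run(['pacman', '-Syy'])
    if code == 0:
        return True
    if not is_keyring_error(output):
        # Not a keyring problem, so reinitializing the keyring would not help.
        return False
    # Reinitialize the keyring, then retry the sync a single time.
    for argv in (['pacman-key', '--init'], ['pacman-key', '--populate', 'archlinux']):
        if run(argv)[0] != 0:
            return False
    return run(['pacman', '-Syy'])[0] == 0
```

Passing the runner in as a parameter is only there so the flow can be exercised without a real pacman; the actual change goes through `self.run` and `self.ask`.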
    try:
        SysCommand('killall gpg-agent')
    except SysCallError as err:
        debug(f'killall gpg-agent failed (may not be running): {err}')
You should validate that it is actually running before killing it, rather than hoping the error is related to that state.
Hm... Maybe you're right. Done.
    try:
        self.run('-Syy', default_cmd='pacman')
    except SysCallError as err:
can be simplified to
    def sync(self) -> None:
        if self.synced:
            return
        try:
            self.run('-Syy')
        except SysCallError as err:
            if b'GPGME' in err.worker_log or b'keyring' in err.worker_log.lower():
                warn('Pacman sync failed with keyring error, attempting keyring reinit')
                self._reinit_keyring()
                msg = 'Could not sync a new package database after keyring reinit'
            else:
                msg = 'Could not sync a new package database'
            self.ask(msg, 'Could not sync mirrors', self.run, '-Syy', default_cmd='pacman')
        self.synced = True
Collapse the duplicated self.ask() blocks in sync() into a single call via a msg variable, and drop the redundant default_cmd='pacman' (it is already the default). In _reinit_keyring(), check that gpg-agent is actually running (pgrep -x) before killing it, instead of catching the killall error and assuming it was not running.
    @staticmethod
    def _is_running(process: str) -> bool:
        try:
            SysCommand(f'pgrep -x {process}')
Given that gpg-agent is a systemd service, we can use systemctl instead:

    import os

    def _is_running() -> bool:
        return os.system('systemctl is-active --quiet gpg-agent.service') == 0
Maybe even put that into the utils https://github.com/archlinux/archinstall/blob/af2120c0e94ada1bb7e9eea44283d19323ce2f27/archinstall/lib/utils/util.py
Good call on systemctl, that's cleaner and I see we already do exactly this for espeakup.
One thing I'm unsure about though: in the live ISO, gpg-agent is usually spawned on-demand by gpg/pacman-key rather than started as a systemd service, so systemctl is-active --quiet gpg-agent.service might report it inactive even while the process is actually running. In that case we'd skip the killall, and that's the one spot where killing it matters (a stale agent holding the old /etc/pacman.d/gnupg open before --init). pgrep catches the process directly regardless of how it was started, which is why I reached for it.
That said, I might be overthinking the ISO behavior here. Have you actually seen gpg-agent show up as an active unit in that context? If so I'm happy to switch to systemctl, otherwise maybe pgrep is the safer bet. What do you think?
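One way to sidestep the question would be to try both checks. The sketch below is only an illustration of that idea, not what the PR implements: `exit_code`, `is_running`, and the injectable `run` parameter are hypothetical names, and it uses `subprocess` rather than `SysCommand` or `os.system`.

```python
import subprocess
from collections.abc import Callable, Sequence
from typing import Optional

# Hypothetical runner type: takes an argv list and returns the exit code.
ExitCodeRunner = Callable[[Sequence[str]], int]


def exit_code(argv: Sequence[str]) -> int:
    """Run a command quietly and return its exit code (127 if the binary is missing)."""
    try:
        proc = subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=False)
    except FileNotFoundError:
        return 127
    return proc.returncode


def is_running(process: str, unit: Optional[str] = None, run: ExitCodeRunner = exit_code) -> bool:
    """Report whether a process is alive, as a systemd unit or as a plain process."""
    # `systemctl is-active --quiet` exits 0 when the unit is active.
    if unit is not None and run(['systemctl', 'is-active', '--quiet', unit]) == 0:
        return True
    # `pgrep -x` matches the exact process name and exits 0 when a match exists,
    # which also covers an agent that was spawned on demand.
    return run(['pgrep', '-x', process]) == 0
```

With this shape, `is_running('gpg-agent', unit='gpg-agent.service')` would report a running agent however it was started, at the cost of one extra command when the unit is inactive.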

Related to #4565
I couldn't reproduce the exact crash, but there is a clear gap in the code: _verify_service_stop() waits for archlinux-keyring-wkd-sync.service to reach a terminal state but never checks whether it succeeded or failed. If it failed, set_mirrors() triggers mirror speed sorting (downloading core.db from each mirror), followed by pacman -Syy, which can crash with "GPGME error: No data" due to an uninitialized keyring.
This change checks the service exit state after waiting. If it failed, custom mirror configuration is skipped and installation continues with default mirrors (from reflector or ISO).
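As a rough illustration of that check (a sketch under assumptions, not the code in this PR), the decision could be expressed as below. `exit_code`, `service_failed`, `mirror_plan`, and the injectable `run` parameter are hypothetical names; the sketch relies on `systemctl is-failed`, which may differ from how the actual change reads the service state.

```python
import subprocess
from collections.abc import Callable, Sequence

# Hypothetical runner type: takes an argv list and returns the exit code.
ExitCodeRunner = Callable[[Sequence[str]], int]

KEYRING_SYNC_UNIT = 'archlinux-keyring-wkd-sync.service'


def exit_code(argv: Sequence[str]) -> int:
    """Run a command quietly and return its exit code (127 if the binary is missing)."""
    try:
        proc = subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=False)
    except FileNotFoundError:
        return 127
    return proc.returncode


def service_failed(unit: str, run: ExitCodeRunner = exit_code) -> bool:
    """`systemctl is-failed` exits 0 only when the unit is in the failed state."""
    return run(['systemctl', 'is-failed', '--quiet', unit]) == 0


def mirror_plan(run: ExitCodeRunner = exit_code) -> str:
    """Return 'default' when the keyring sync failed, otherwise 'custom'."""
    if service_failed(KEYRING_SYNC_UNIT, run):
        return 'default'
    return 'custom'
```

A caller would run this only after the service has reached a terminal state, then skip `set_mirrors()` when the plan is `'default'`.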
What changed: