
Redis caching backend deadlocks on dispose when notification thread never started (unbounded wait) #24

@gfraiteur

Summary

PostSharp.Patterns.Caching.Backends.Redis can deadlock on dispose (unbounded, no timeout) when the Redis pub/sub notification-processing thread was never started. A transient Redis connect/subscribe hiccup is enough to trigger it. The stuck thread is the (single) finalizer thread, which then prevents the host process from exiting.

This is the product root cause behind the CI test-target hang reported in #22: the TimeSensitiveTest target runs the Caching/Redis tests, and a hung finalizer keeps the test host alive until the build is killed. The same source code passes or hangs depending only on the runtime Redis condition, so this is not a code regression but a latent unbounded-wait bug.

Affected component

Patterns/Caching/PostSharp.Patterns.Caching.Backends.Redis/RedisNotificationQueue.cs
(also the sync-over-async dispose in RedisCachingBackend.cs)

Root cause

In RedisNotificationQueue.InitAsync the channel-subscription loop runs before the processing thread is started:

  • RedisNotificationQueue.cs:86-114: SubscribeAsync + while (!IsConnected(channel)) (with a connectionTimeout) + PingAsync, all of which can throw on a transient hiccup or cancellation.
  • RedisNotificationQueue.cs:116: notificationProcessingThread.Start(...) only runs after the loop succeeds.

notificationProcessingThreadCompleted (a TaskCompletionSource) is signalled only in the processing thread's finally block (RedisNotificationQueue.cs:254-262). If the subscription loop throws, the thread never starts, so that TCS is never completed.
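The ordering problem can be reproduced in isolation. The following is a minimal sketch, not the actual RedisNotificationQueue source: `SketchQueue`, `SketchQueueDemo`, and the `subscribeAsync` delegate are hypothetical stand-ins, and only the field names are borrowed from the description above. It assumes the completion source is signalled nowhere except the thread's `finally` block.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the queue; it mirrors the ordering described above, not the real code.
public sealed class SketchQueue
{
    private readonly Thread notificationProcessingThread;
    private readonly TaskCompletionSource<bool> notificationProcessingThreadCompleted =
        new TaskCompletionSource<bool>();

    public SketchQueue()
    {
        this.notificationProcessingThread =
            new Thread(this.ProcessNotifications) { IsBackground = true };
    }

    public Task ProcessingCompleted => this.notificationProcessingThreadCompleted.Task;

    // Subscription runs first; the thread is started only if it succeeds.
    public async Task InitAsync(Func<Task> subscribeAsync)
    {
        await subscribeAsync();                    // may throw on a transient hiccup
        this.notificationProcessingThread.Start(); // skipped when the line above throws
    }

    private void ProcessNotifications()
    {
        try
        {
            // Message loop elided in this sketch.
        }
        finally
        {
            // The only place the completion source is signalled.
            this.notificationProcessingThreadCompleted.TrySetResult(true);
        }
    }
}

public static class SketchQueueDemo
{
    // Returns true if the completion task finished within the timeout after InitAsync ran.
    public static bool CompletesAfterInit(bool subscribeFails, TimeSpan timeout)
    {
        var queue = new SketchQueue();
        try
        {
            queue.InitAsync(() => subscribeFails
                ? Task.FromException(new TimeoutException("simulated subscribe failure"))
                : Task.CompletedTask).GetAwaiter().GetResult();
        }
        catch (TimeoutException)
        {
            // Init failed before Thread.Start, so the finally block above never runs.
        }

        // Bounded here only so the sketch terminates; an unbounded Wait() would block forever.
        return queue.ProcessingCompleted.Wait(timeout);
    }
}
```

With `subscribeFails: true` the bounded wait times out and the method returns `false`, which is the condition that becomes a permanent hang once the wait has no timeout.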

Any subsequent dispose then blocks forever:

  • RedisNotificationQueue.cs:352 (sync Dispose): notificationProcessingThreadCompleted.Task.Wait(), with no timeout and no cancellation.
  • RedisNotificationQueue.cs:353 / :426: notificationProcessingThread.Join(), unbounded.

Because a failed RedisCachingBackend.Create / CreateAsync does not dispose the partially-initialized backend (RedisCachingBackend.cs:240, :280), the orphaned RedisNotificationQueue is collected and its finalizer (~RedisNotificationQueue -> Dispose(false)) hits the unbounded wait on the finalizer thread. A permanently blocked finalizer thread stops all finalization and prevents clean process exit.
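One way to avoid leaving cleanup to the finalizer is for the factory to dispose the instance it just built when initialization throws. The sketch below illustrates that pattern only. `SketchBackend`, its `initAsync` delegate, and the `onDisposed` callback (present purely so the behaviour can be observed) are hypothetical and do not reflect the real RedisCachingBackend.Create signature.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical backend; illustrates deterministic cleanup on a failed create.
public sealed class SketchBackend : IDisposable
{
    private readonly Action onDisposed;

    private SketchBackend(Action onDisposed)
    {
        this.onDisposed = onDisposed;
    }

    public static async Task<SketchBackend> CreateAsync(Func<Task> initAsync, Action onDisposed)
    {
        var backend = new SketchBackend(onDisposed);
        try
        {
            await initAsync(); // stands in for the real initialization, which may throw
            return backend;
        }
        catch
        {
            // Clean up the partially initialized instance before propagating the failure.
            backend.Dispose();
            throw;
        }
    }

    public void Dispose()
    {
        this.onDisposed?.Invoke();
        // Standard dispose pattern: once disposed, no finalizer needs to run for this instance.
        GC.SuppressFinalize(this);
    }
}

public static class SketchBackendDemo
{
    // Returns true if the backend was disposed as part of the failed create.
    public static bool DisposedOnFailedCreate()
    {
        var disposed = false;
        try
        {
            SketchBackend.CreateAsync(
                () => Task.FromException(new TimeoutException("simulated connect failure")),
                () => disposed = true).GetAwaiter().GetResult();
        }
        catch (TimeoutException)
        {
            // The caller still sees the original failure.
        }

        return disposed;
    }
}
```

This pattern only helps if Dispose itself cannot block indefinitely. Otherwise it moves the hang from the finalizer thread to the caller of Create.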

Impact

  • A transient Redis pub/sub connect failure can hang the disposing/finalizing thread indefinitely.
  • In a host that depends on clean shutdown (e.g. a test host, or a short-lived process), this manifests as a process that never exits.
  • Customer-facing risk: applications that create Redis caching backends under flaky network conditions can leak a wedged finalizer thread.

Fix direction

  1. Always complete notificationProcessingThreadCompleted even when the processing thread never starts (e.g. set it on the init-failure path, or only wait when the thread was actually started).
  2. Bound every dispose wait (Task.Wait/Thread.Join) with a timeout; log and proceed on timeout rather than blocking forever.
  3. Dispose the partially-initialized backend/queue when Init/InitAsync throws, so failures clean up deterministically instead of relying on the finalizer.
  4. Add a deterministic regression test (init fails before thread start -> dispose must return promptly).
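Items 1, 2, and 4 above could look roughly like the following. This is a sketch under stated assumptions rather than a proposed patch: `BoundedQueueSketch`, `RegressionCheck`, the 5-second `DisposeTimeout`, and the `Console.Error` logging are all hypothetical, and the real queue's message loop is replaced by a wait on a cancellation token.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical queue showing the fix direction; not the real RedisNotificationQueue.
public sealed class BoundedQueueSketch : IDisposable
{
    private static readonly TimeSpan DisposeTimeout = TimeSpan.FromSeconds(5); // assumed value

    private readonly Thread notificationProcessingThread;
    private readonly TaskCompletionSource<bool> notificationProcessingThreadCompleted =
        new TaskCompletionSource<bool>();
    private readonly CancellationTokenSource cancellation = new CancellationTokenSource();
    private int threadStarted; // 0 = never started, 1 = started
    private int disposed;      // 0 = live, 1 = disposed

    public BoundedQueueSketch()
    {
        this.notificationProcessingThread =
            new Thread(this.ProcessNotifications) { IsBackground = true };
    }

    public async Task InitAsync(Func<Task> subscribeAsync)
    {
        try
        {
            await subscribeAsync();
        }
        catch
        {
            // Item 1: complete the source on the init-failure path.
            this.notificationProcessingThreadCompleted.TrySetResult(false);
            throw;
        }

        this.notificationProcessingThread.Start();
        Volatile.Write(ref this.threadStarted, 1);
    }

    private void ProcessNotifications()
    {
        try
        {
            // Stand-in for the message loop: block until dispose requests shutdown.
            this.cancellation.Token.WaitHandle.WaitOne();
        }
        finally
        {
            this.notificationProcessingThreadCompleted.TrySetResult(true);
        }
    }

    public void Dispose()
    {
        if (Interlocked.Exchange(ref this.disposed, 1) == 1)
        {
            return;
        }

        this.cancellation.Cancel();

        // Item 1 (alternative form): only wait when the thread was actually started.
        if (Volatile.Read(ref this.threadStarted) == 1)
        {
            // Item 2: every wait is bounded; log and proceed on timeout.
            if (!this.notificationProcessingThreadCompleted.Task.Wait(DisposeTimeout) ||
                !this.notificationProcessingThread.Join(DisposeTimeout))
            {
                Console.Error.WriteLine("Notification thread did not stop in time; continuing dispose.");
            }
        }
    }
}

public static class RegressionCheck
{
    // Item 4: init fails before thread start, then dispose must return promptly.
    public static TimeSpan DisposeAfterInit(bool subscribeFails)
    {
        var queue = new BoundedQueueSketch();
        try
        {
            queue.InitAsync(() => subscribeFails
                ? Task.FromException(new TimeoutException("simulated subscribe failure"))
                : Task.CompletedTask).GetAwaiter().GetResult();
        }
        catch (TimeoutException)
        {
            // Expected on the failure path.
        }

        var stopwatch = Stopwatch.StartNew();
        queue.Dispose();
        return stopwatch.Elapsed;
    }
}
```

A regression test would assert that the elapsed time returned for `subscribeFails: true` stays well under the dispose timeout. Simulating the failure with a faulted delegate keeps the test deterministic, since it does not depend on a live Redis instance.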

Relation

Labels: bug
