Skip to content

Problems with find_degrees_of_freedom for the noncentral t distribution. #1410

@WarrenWeckesser

Description

@WarrenWeckesser

(Note: in the following, I'm going to use the mnemonic names df and nc for degrees of freedom and noncentrality, respectively. The Greek letters aren't used consistently in the various references I've been using, and I got tired of having to remind myself which symbol is which parameter.)

These are some notes about the code that attempts to invert the CDF of the noncentral t distribution with respect to the degrees of freedom. The code was added in #1355 (or more accurately, reenabled). The old comment said "This code is disabled, since there can be multiple answers to the question, and it's not clear how to find the 'right' one." As noted in that comment, there are cases where the function being inverted is not monotone; in particular the function is not monotone when x and nc have the same sign and |nc| is sufficiently large.

The limits as df -> 0 and df -> inf of the noncentral t CDF are Phi(-nc) and Phi(x - nc), respectively, where Phi(z) is the CDF of the standard normal distribution. So in the monotone case, these value determine the domain of the inverse. Outside of these values, the inverse function should probably trigger a domain error. Even if the function is not monotone, one of the bounds is still correct.

The "problems" referred to in the issue title are:

  • The solver typically fails in the input regions where the inverse is multivalued.
  • The solver doesn't trigger domain errors in regions where there is no inverse. For example, with nc = -2.5, x = 1.25 and p = 0.99, find_degrees_of_freedom(nc, x, p) returns 5.30499e-313, when instead it should trigger a domain error. (See the fourth plot below.)

It would be useful to investigate:
(1) Under what conditions are we assured that the function is monotone? In this case, we know the domain of the inverse, so inputs to the inverse that are outside the domain can trigger a domain error.
(2) Can we be smarter in the non-monotone case? How hard would it be to ensure that the return value is always the lower branch (i.e. the smaller df), and the solver succeeds on the full extent of this branch?

Currently, the solver typically fails when the inverse is multivalued. Running a grid of p inputs shows that the solver will find some solutions a small way along the lower branch, but then starts failing as p moves further into the region of multivalued results.

Getting (2) working reliably might be a lot of work. An alternative would be to define the behavior such that the domain of the inverse is restricted to the region where the inverse is single valued. (This is almost what is happening now anyway, since the solver often fails in the multivalued region.) This would be easy to implement, because we know the domain in that case. Of course, this is a bit of a cop-out, but it would provide a well-defined and unambiguous behavior. The example added in a follow-up comment makes this idea less useful.


Here are plots of the function to be inverted that include various combinations of x and nc. The first and last plots show examples where the function is not monotone. For all of them the df axis uses log scaling. For the last plot, the vertical axis also uses a log-scale. The label of the vertical axis, nctdtr(df, nc, x), uses the (somewhat cryptic) SciPy name of the CDF of the noncentral t distribution.

The first three plots have x < 0:

Image Image Image

The next three have x > 0:

Image Image

This one uses a log-scale on the vertical axis to make clearer the behavior where df is very small:

Image

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions