(Note: in the following, I'm going to use the mnemonic names df and nc for degrees of freedom and noncentrality, respectively. The Greek letters aren't used consistently in the various references I've been using, and I got tired of having to remind myself which symbol is which parameter.)
These are some notes about the code that attempts to invert the CDF of the noncentral t distribution with respect to the degrees of freedom. The code was added in #1355 (or more accurately, reenabled). The old comment said "This code is disabled, since there can be multiple answers to the question, and it's not clear how to find the 'right' one." As noted in that comment, there are cases where the function being inverted is not monotone; in particular the function is not monotone when x and nc have the same sign and |nc| is sufficiently large.
The limits as df -> 0 and df -> inf of the noncentral t CDF are Phi(-nc) and Phi(x - nc), respectively, where Phi(z) is the CDF of the standard normal distribution. So in the monotone case, these value determine the domain of the inverse. Outside of these values, the inverse function should probably trigger a domain error. Even if the function is not monotone, one of the bounds is still correct.
The "problems" referred to in the issue title are:
- The solver typically fails in the input regions where the inverse is multivalued.
- The solver doesn't trigger domain errors in regions where there is no inverse. For example, with
nc = -2.5, x = 1.25 and p = 0.99, find_degrees_of_freedom(nc, x, p) returns 5.30499e-313, when instead it should trigger a domain error. (See the fourth plot below.)
It would be useful to investigate:
(1) Under what conditions are we assured that the function is monotone? In this case, we know the domain of the inverse, so inputs to the inverse that are outside the domain can trigger a domain error.
(2) Can we be smarter in the non-monotone case? How hard would it be to ensure that the return value is always the lower branch (i.e. the smaller df), and the solver succeeds on the full extent of this branch?
Currently, the solver typically fails when the inverse is multivalued. Running a grid of p inputs shows that the solver will find some solutions a small way along the lower branch, but then starts failing as p moves further into the region of multivalued results.
Getting (2) working reliably might be a lot of work. An alternative would be to define the behavior such that the domain of the inverse is restricted to the region where the inverse is single valued. (This is almost what is happening now anyway, since the solver often fails in the multivalued region.) This would be easy to implement, because we know the domain in that case. Of course, this is a bit of a cop-out, but it would provide a well-defined and unambiguous behavior. The example added in a follow-up comment makes this idea less useful.
Here are plots of the function to be inverted that include various combinations of x and nc. The first and last plots show examples where the function is not monotone. For all of them the df axis uses log scaling. For the last plot, the vertical axis also uses a log-scale. The label of the vertical axis, nctdtr(df, nc, x), uses the (somewhat cryptic) SciPy name of the CDF of the noncentral t distribution.
The first three plots have x < 0:
The next three have x > 0:
This one uses a log-scale on the vertical axis to make clearer the behavior where df is very small:

(Note: in the following, I'm going to use the mnemonic names
dfandncfor degrees of freedom and noncentrality, respectively. The Greek letters aren't used consistently in the various references I've been using, and I got tired of having to remind myself which symbol is which parameter.)These are some notes about the code that attempts to invert the CDF of the noncentral t distribution with respect to the degrees of freedom. The code was added in #1355 (or more accurately, reenabled). The old comment said "This code is disabled, since there can be multiple answers to the question, and it's not clear how to find the 'right' one." As noted in that comment, there are cases where the function being inverted is not monotone; in particular the function is not monotone when
xandnchave the same sign and|nc|is sufficiently large.The limits as
df -> 0anddf -> infof the noncentral t CDF arePhi(-nc)andPhi(x - nc), respectively, wherePhi(z)is the CDF of the standard normal distribution. So in the monotone case, these value determine the domain of the inverse. Outside of these values, the inverse function should probably trigger a domain error. Even if the function is not monotone, one of the bounds is still correct.The "problems" referred to in the issue title are:
nc = -2.5,x = 1.25andp = 0.99,find_degrees_of_freedom(nc, x, p)returns5.30499e-313, when instead it should trigger a domain error. (See the fourth plot below.)It would be useful to investigate:
(1) Under what conditions are we assured that the function is monotone? In this case, we know the domain of the inverse, so inputs to the inverse that are outside the domain can trigger a domain error.
(2) Can we be smarter in the non-monotone case? How hard would it be to ensure that the return value is always the lower branch (i.e. the smaller
df), and the solver succeeds on the full extent of this branch?Currently, the solver typically fails when the inverse is multivalued. Running a grid of
pinputs shows that the solver will find some solutions a small way along the lower branch, but then starts failing aspmoves further into the region of multivalued results.Getting (2) working reliably might be a lot of work. An alternative would be to define the behavior such that the domain of the inverse is restricted to the region where the inverse is single valued. (This is almost what is happening now anyway, since the solver often fails in the multivalued region.) This would be easy to implement, because we know the domain in that case. Of course, this is a bit of a cop-out, but it would provide a well-defined and unambiguous behavior.The example added in a follow-up comment makes this idea less useful.Here are plots of the function to be inverted that include various combinations of
xandnc. The first and last plots show examples where the function is not monotone. For all of them thedfaxis uses log scaling. For the last plot, the vertical axis also uses a log-scale. The label of the vertical axis,nctdtr(df, nc, x), uses the (somewhat cryptic) SciPy name of the CDF of the noncentral t distribution.The first three plots have
x < 0:The next three have
x > 0:This one uses a log-scale on the vertical axis to make clearer the behavior where
dfis very small: