.. _likelihood_module:

Likelihood module
#################

The :py:class:`asteca.Likelihood` class is used to define an object that can
compute how similar your observed cluster is, stored in
a :py:class:`asteca.Cluster` object, compared to a given
synthetic cluster, generated by the :py:meth:`asteca.Synthetic.generate` method.

This basically compares observed photometry versus synthetic photometry.

Currently, the only function implemented is the :ref:`plr`, as
described below. The user can of course choose to implement their own
likelihood functions, but this one is recommended because it is thoroughly tested,
well documented, and has been used in the literature. It is also
the one used in the examples and tutorials provided in this documentation.


To define a :py:class:`asteca.Likelihood` object, we need to provide a
:py:class:`asteca.Cluster` object which will be used to calibrate the likelihood
function:

.. code-block:: python

    # Define a likelihood object
    likelihood = asteca.Likelihood(my_cluster)
 
After this, we can evaluate the likelihood of a given synthetic cluster
using the :py:meth:`asteca.Likelihood.get` method:

.. code-block:: python

    # Generate an array with synthetic data, as described in previous sections
    synth_cluster_arr = synthcl.generate(params)

    # Evaluate versus a synthetic cluster
    lkl = likelihood.get(synth_cluster_arr)

The result is a **normalized and inverted** likelihood which serves as a distance
measure between the observed and the synthetic data, where the best value is then ``0``.




.. _plr:


Poisson likelihood ratio
========================


This is the Poisson likelihood ratio as defined in `Tremmel et al (2013)`_. Starting
from Eq 10 with :math:`v_{i,j}=1` (i.e.: equal volumes for all cells), we have:

.. math::

    p(d|\theta) = \prod_i^B \frac{\Gamma(n_i+m_i+\frac{1}{2})}
    {2^{n_i+m_i+\frac{1}{2}} n_i!\Gamma(m_i+\frac{1}{2}))}

where :math:`n_i` and :math:`m_i` are the number of observed and synthetic stars in 
bin :math:`i`, respectively, and :math:`B` is the total number of bins.
Applying the logarithm:

.. math::

    \log(p) = \sum_i^B \left[\log\Gamma(n_i+m_i+\frac{1}{2})
    - (m_i+n_i+\frac{1}{2})\log2 -\log n_i!
    - \log \Gamma(m_i+\frac{1}{2}) \right]

re-arranging:

.. math::

    \log(p) = \sum_i^B \left[\log\Gamma(n_i+m_i+\frac{1}{2})-
    \log \Gamma(m_i+\frac{1}{2}) \right]
    - (M+N+\frac{1}{2}B) \log 2 - \sum_i^B \log n_i!

employing the :math:`LogGamma()` function;

.. math::

    \log(p) = \sum_i^B \left[LogGamma(n_i+m_i+\frac{1}{2}) - LogGamma(m_i+\frac{1}{2}) \right] - \\ (N+\frac{1}{2}B) \log 2 - \sum_i^N \log n_i! -M  \log 2

.. note::

    The :math:`LogGamma()` function is used instead of the :math:`\Gamma()`
    function because it is numerically more stable, especially when
    dealing with large numbers. This is implemented in `scipy.special.gammaln`_.


We can discard the second and third terms as they depend on :math:`N` and :math:`B`
only (i.e.: the number of stars in observed cluster and the number of bins), and hence
do not change for different synthetic data.

The last term :math:`M` is the total number of synthetic stars. This value is fixed
in the arrays of synthetic data generated by :py:meth:`asteca.Synthetic.generate`,
and thus we can also discard it.

.. important::

    The fact that :math:`M` is fixed is true due to how **ASteCA** generates
    synthetic clusters. This might not be the case for other implementations.

Finally, discarding the last three terms, we have the expression for the Poisson
likelihood ratio:

.. math::

    \log(p)\approx \sum_i^B \left[LogGamma(n_i+m_i+\frac{1}{2}) - LogGamma(m_i+\frac{1}{2}) \right]


This likelihood is normalized and inverted so that the best value is ``0`` and the
worst value is ``1``, as mentioned above.


.. _Tremmel et al (2013): https://ui.adsabs.harvard.edu/abs/2013ApJ...766...19T/abstract
.. _scipy.special.gammaln: https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.gammaln.html