
r"""
The ``distributions`` package contains parameterizable probability distributions
and sampling functions. This allows the construction of stochastic computation
graphs and stochastic gradient estimators for optimization. This package
generally follows the design of the `TensorFlow Distributions`_ package.

.. _`TensorFlow Distributions`:
    https://arxiv.org/abs/1711.10604

It is not possible to directly backpropagate through random samples. However,
there are two main methods for creating surrogate functions that can be
backpropagated through. These are the score function estimator/likelihood ratio
estimator/REINFORCE and the pathwise derivative estimator. REINFORCE is commonly
seen as the basis for policy gradient methods in reinforcement learning, and the
pathwise derivative estimator is commonly seen in the reparameterization trick
in variational autoencoders. Whilst the score function only requires the value
of samples :math:`f(x)`, the pathwise derivative requires the derivative
:math:`f'(x)`. The next sections discuss these two in a reinforcement learning
example. For more details see
`Gradient Estimation Using Stochastic Computation Graphs`_ .

.. _`Gradient Estimation Using Stochastic Computation Graphs`:
     https://arxiv.org/abs/1506.05254
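
As a minimal standalone sketch of this difference (the ``Normal`` distribution here
is only an example), compare the two sampling methods::

    import torch
    from torch.distributions import Normal

    loc = torch.tensor(0.0, requires_grad=True)
    m = Normal(loc, 1.0)
    m.sample().requires_grad   # False: sample() detaches the graph from loc
    m.rsample().requires_grad  # True: the reparameterized sample keeps the graph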

Score function
^^^^^^^^^^^^^^

When the probability density function is differentiable with respect to its
parameters, we only need :meth:`~torch.distributions.Distribution.sample` and
:meth:`~torch.distributions.Distribution.log_prob` to implement REINFORCE:

.. math::

    \Delta\theta  = \alpha r \frac{\partial\log p(a|\pi^\theta(s))}{\partial\theta}

where :math:`\theta` are the parameters, :math:`\alpha` is the learning rate,
:math:`r` is the reward and :math:`p(a|\pi^\theta(s))` is the probability of
taking action :math:`a` in state :math:`s` given policy :math:`\pi^\theta`.

In practice we would sample an action from the output of a network, apply this
action in an environment, and then use ``log_prob`` to construct an equivalent
loss function. Note that we use a negative because optimizers use gradient
descent, whilst the rule above assumes gradient ascent. With a categorical
policy, the code for implementing REINFORCE would be as follows::

    probs = policy_network(state)
    # Note that this is equivalent to what used to be called multinomial
    m = Categorical(probs)
    action = m.sample()
    next_state, reward = env.step(action)
    loss = -m.log_prob(action) * reward
    loss.backward()
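
Since ``reward`` does not depend on the policy parameters, the gradient of this
surrogate loss is :math:`-r\,\partial\log p(a|\pi^\theta(s))/\partial\theta`, so a
gradient-descent step on ``loss`` performs the ascent update above. A minimal,
self-contained sketch of the surrogate (the scalar ``reward`` and the ``logits``
tensor are hypothetical stand-ins for the environment and the policy output)::

    import torch
    from torch.distributions import Categorical

    logits = torch.zeros(3, requires_grad=True)  # stands in for policy_network(state)
    m = Categorical(logits=logits)
    action = m.sample()
    reward = 1.0                                 # stands in for the environment reward
    loss = -m.log_prob(action) * reward
    loss.backward()
    # logits.grad now holds -reward * d log p(action) / d logits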

Pathwise derivative
^^^^^^^^^^^^^^^^^^^

The other way to implement these stochastic/policy gradients would be to use the
reparameterization trick from the
:meth:`~torch.distributions.Distribution.rsample` method, where the
parameterized random variable can be constructed via a parameterized
deterministic function of a parameter-free random variable. The reparameterized
sample therefore becomes differentiable. The code for implementing the pathwise
derivative would be as follows::

    params = policy_network(state)
    m = Normal(*params)
    # Any distribution with .has_rsample == True could work based on the application
    action = m.rsample()
    next_state, reward = env.step(action)  # Assuming that reward is differentiable
    loss = -reward
    loss.backward()
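
For a ``Normal`` policy, ``rsample`` amounts to drawing parameter-free noise and
transforming it deterministically; a sketch of the equivalent manual
reparameterization (``mu`` and ``sigma`` are hypothetical stand-ins for the network
outputs unpacked from ``params`` above)::

    eps = torch.randn_like(sigma)  # parameter-free noise, no gradient required
    action = mu + sigma * eps      # deterministic in (mu, sigma); gradients flow through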
   )
transforms)	Bernoulli)Beta)Binomial)Categorical)Cauchy)Chi2)	biject_totransform_to)ContinuousBernoulli)	Dirichlet)Distribution)ExponentialFamily)Exponential)FisherSnedecor)Gamma)	Geometric)Gumbel)
HalfCauchy)
HalfNormal)Independent)InverseGamma)_add_kl_infokl_divergenceregister_kl)Kumaraswamy)Laplace)LKJCholesky)	LogNormal)LogisticNormal)LowRankMultivariateNormal)MixtureSameFamily)Multinomial)MultivariateNormal)NegativeBinomial)Normal)OneHotCategorical OneHotCategoricalStraightThrough)Pareto)Poisson)RelaxedBernoulli)RelaxedOneHotCategorical)StudentT)TransformedDistribution)*)Uniform)VonMises)Weibull)Wishart).r   r   r   r   r   r	   r   r   r   r   r   r   r   r   r   r   r   r   r   r   r   r   r   r    r!   r"   r#   r$   r%   r&   r'   r(   r)   r+   r,   r-   r*   r0   r1   r2   r3   r.   r
   r   r   r   N)___doc__ r   	bernoullir   betar   binomialr   categoricalr   cauchyr   chi2r	   constraint_registryr
   r   continuous_bernoullir   	dirichletr   distributionr   
exp_familyr   exponentialr   fishersnedecorr   gammar   	geometricr   gumbelr   half_cauchyr   half_normalr   independentr   inverse_gammar   klr   r   r   kumaraswamyr   laplacer   lkj_choleskyr   
log_normalr   logistic_normalr    lowrank_multivariate_normalr!   mixture_same_familyr"   multinomialr#   multivariate_normalr$   negative_binomialr%   normalr&   one_hot_categoricalr'   r(   paretor)   poissonr*   relaxed_bernoullir+   relaxed_categoricalr,   studentTr-   transformed_distributionr.   uniformr0   	von_misesr1   weibullr2   wishartr3   __all__extend     d/home/mcse/projects/flask_80/flask-venv/lib/python3.12/site-packages/torch/distributions/__init__.py<module>rf      s   GR      $   8 5   & ) $ *     # # $ ' 8 8 $  % ! + B 2 $ 3 /  T   / 9  =      /` z!! "rd   