
tensorflow/probability 3245

Probabilistic reasoning and statistical analysis in TensorFlow

langmore/statsmodels 2

A slight modification of statsmodels that allows for l1 regularization of LikelihoodModel. See README.txt

langmore/parallel_easy 1

No longer maintained. See https://github.com/columbia-applied-data-science/rosetta/tree/master/rosetta/parallel

langmore/boot-camps 0

Software Carpentry boot camp material

langmore/cmd_datatools 0

No longer maintained. See https://github.com/columbia-applied-data-science/rosetta/tree/master/rosetta/cmd

langmore/generic 0

Generic starting repo for projects

langmore/gensim 0

Vector Space Modelling for Humans

langmore/swc-teaching 0

Homework associated with software carpentry

issue comment tensorflow/probability

On the predictive distribution of a structural time-series model

@davmre, sorry for dragging you into the conversation like this. You might be the only one who is able to answer.

IvanUkhov

comment created time in 11 hours

push event tensorflow/probability

Srinivas Vasudevan

commit sha eefb4dfb3805cdcd2cf34b387ffb3ad943312d92

Implement @tfp.math.custom_gradient for Bessel functions. PiperOrigin-RevId: 361146735

view details

push time in a day

push event tensorflow/probability

Srinivas Vasudevan

commit sha ee5b9d6f48c10314296a6d6d8fbc802e6a69a8b5

Fix second derivatives for Dawsn, Erfcx and LambertW. PiperOrigin-RevId: 361145787

view details

push time in a day

push event tensorflow/probability

bjp

commit sha 12b50fa584ef347b0c34b9cc9912a1b7ac6fb698

Adds a tfd.Masked distribution, which masks an underlying distribution in the batch dimensions. PiperOrigin-RevId: 361138088

view details

push time in a day

issue opened tensorflow/probability

Running MC methods until convergence

Hi!

I am trying to implement a simple scheme to decide how many iterations of MCMC/HMC to run:

do
    new_samples = run_MC_method()
    all_samples.extend(new_samples)
    gelman_rubin = max(potential_scale_reduction(all_samples))
while gelman_rubin > CONST

I would like this scheme to run under XLA compilation. If I have understood the intent of the API correctly, sample_chain should be called repeatedly in such a case, with previous_kernel_results set and current_state taken from the previous call's result.

Doing this by hand seems quite complicated, though. To compile it under XLA, every loop variable would need its shape_invariants defined. Below I define shape_invariants only for the samples, but more are needed - if I read the stack trace correctly, I would also have to define shape invariants for all the tensors returned by the trace_fn, and perhaps for some used internally by DualAveragingStepSizeAdaptation.

I believe there must be a simpler way to achieve what I want but I can't find it documented. Could you tell me if:

  • my understanding above is correct?
  • the current_state for the next sample_chain should be the last sample from the previous run?
  • DualAveragingStepSizeAdaptation can be used in such a way? I'm worried that it might not function correctly in all the sample_chain executions other than the first one.
  • there is a simpler alternative?

thanks!

Here is an example of what I tried, based on the linear regression tutorial. The code runs fine without XLA compilation, though I don't know whether the step size adaptation and the internal NUTS state remain correct when the next sample_chain is run.

from pprint import pprint
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import arviz as az

import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions
tfb = tfp.bijectors

NUM_CHAINS = 9

dtype = tf.float64

dfhogg = pd.DataFrame(np.array([[1, 201, 592, 61, 9, -0.84],
                                 [2, 244, 401, 25, 4, 0.31],
                                 [3, 47, 583, 38, 11, 0.64],
                                 [4, 287, 402, 15, 7, -0.27],
                                 [5, 203, 495, 21, 5, -0.33],
                                 [6, 58, 173, 15, 9, 0.67],
                                 [7, 210, 479, 27, 4, -0.02],
                                 [8, 202, 504, 14, 4, -0.05],
                                 [9, 198, 510, 30, 11, -0.84],
                                 [10, 158, 416, 16, 7, -0.69],
                                 [11, 165, 393, 14, 5, 0.30],
                                 [12, 201, 442, 25, 5, -0.46],
                                 [13, 157, 317, 52, 5, -0.03],
                                 [14, 131, 311, 16, 6, 0.50],
                                 [15, 166, 400, 34, 6, 0.73],
                                 [16, 160, 337, 31, 5, -0.52],
                                 [17, 186, 423, 42, 9, 0.90],
                                 [18, 125, 334, 26, 8, 0.40],
                                 [19, 218, 533, 16, 6, -0.78],
                                 [20, 146, 344, 22, 5, -0.56]]),
                   columns=['id','x','y','sigma_y','sigma_x','rho_xy'])


## for convenience zero-base the 'id' and use as index
dfhogg['id'] = dfhogg['id'] - 1
dfhogg.set_index('id', inplace=True)

## standardize (mean center and divide by 1 sd)
dfhoggs = (dfhogg[['x','y']] - dfhogg[['x','y']].mean(0)) / dfhogg[['x','y']].std(0)
dfhoggs['sigma_y'] = dfhogg['sigma_y'] / dfhogg['y'].std(0)
dfhoggs['sigma_x'] = dfhogg['sigma_x'] / dfhogg['x'].std(0)

X_np = dfhoggs['x'].values
sigma_y_np = dfhoggs['sigma_y'].values
Y_np = dfhoggs['y'].values

def gen_ols_batch_model(X, sigma, hyperprior_mean=0, hyperprior_scale=1):
    hyper_mean = tf.cast(hyperprior_mean, dtype)
    hyper_scale = tf.cast(hyperprior_scale, dtype)
    return tfd.JointDistributionSequential([
        # b0
        tfd.Sample(tfd.Normal(loc=hyper_mean, scale=hyper_scale), sample_shape=1),
        # b1
        tfd.Sample(tfd.Normal(loc=hyper_mean, scale=hyper_scale), sample_shape=1),
        # likelihood
        lambda b1, b0: tfd.Independent(
          tfd.Normal(
              # Parameter transformation
              loc=b0 + b1*X,
              scale=sigma),
          reinterpreted_batch_ndims=1
        ),
    ], validate_args=True)

mdl_ols_batch = gen_ols_batch_model(X_np[tf.newaxis, ...],
                                    sigma_y_np[tf.newaxis, ...])
                                    
def get_max_gelman_rubin(all_states):
    return tf.reduce_max(
        [tf.reduce_max(i) for i in tfp.mcmc.potential_scale_reduction(all_states)]
    )

@tf.function(experimental_compile=True)
def run_chain(init_state, step_size, target_log_prob_fn, unconstraining_bijectors,
              num_steps=10, burnin=50):

    def trace_fn(_, pkr):
        return (
            pkr.inner_results.inner_results.target_log_prob,
            pkr.inner_results.inner_results.leapfrogs_taken,
            pkr.inner_results.inner_results.has_divergence,
            pkr.inner_results.inner_results.energy,
            pkr.inner_results.inner_results.log_accept_ratio
        )

    kernel = tfp.mcmc.TransformedTransitionKernel(
        inner_kernel=tfp.mcmc.NoUTurnSampler(
            target_log_prob_fn,
            step_size=step_size),
        bijector=unconstraining_bijectors)

    hmc = tfp.mcmc.DualAveragingStepSizeAdaptation(
        inner_kernel=kernel,
        num_adaptation_steps=burnin,
        step_size_setter_fn=lambda pkr, new_step_size: pkr._replace(
            inner_results=pkr.inner_results._replace(step_size=new_step_size)),
        step_size_getter_fn=lambda pkr: pkr.inner_results.step_size,
        log_accept_prob_getter_fn=lambda pkr: pkr.inner_results.log_accept_ratio
    )

    # Sampling from the chain.
    all_states, all_sampler_stat, fkr = tfp.mcmc.sample_chain(
      num_results=num_steps,
      num_burnin_steps=burnin,
      current_state=init_state,
      kernel=hmc,
      trace_fn=trace_fn,
      return_final_kernel_results=True
    )
    
    gelman_rubin = get_max_gelman_rubin(all_states)

    
    while tf.abs(gelman_rubin - 1) > 0.001:
        tf.autograph.experimental.set_loop_options(shape_invariants=[
            (all_states[0], tf.TensorShape([None, NUM_CHAINS, 1])),
            (all_states[1], tf.TensorShape([None, NUM_CHAINS, 1]))
        ])
        chain_states, sampler_stat, fkr = tfp.mcmc.sample_chain(
            num_results=num_steps,
            num_burnin_steps=0,
            previous_kernel_results=fkr,
            current_state=[d[-1] for d in all_states],
            kernel=hmc,
            trace_fn=trace_fn,
            return_final_kernel_results=True
        )
        all_states = [tf.concat((all_states[i], chain_states[i]), axis=0) for i in [0,1]]
        gelman_rubin = get_max_gelman_rubin(all_states)

    # TODO: all_sampler_stat should gather sampler_stats from all the iterations.
    return all_states, all_sampler_stat


nchain = NUM_CHAINS
b0, b1, _ = mdl_ols_batch.sample(nchain)
init_state = [b0, b1]
step_size = [tf.cast(i, dtype=dtype) for i in [.1, .1]]
target_log_prob_fn = lambda *x: mdl_ols_batch.log_prob(x + (Y_np, ))

# bijectors to map constrained parameters to the real line
unconstraining_bijectors = [
    tfb.Identity(),
    tfb.Identity(),
]

samples, sampler_stat = run_chain(
    init_state,
    step_size,
    target_log_prob_fn,
    unconstraining_bijectors
)

created time in a day

issue opened tensorflow/probability

`tfb.real_nvp_default_template` legacy layers throw error

Hi,

I am trying to use RealNVP in a Keras functional API model (tf=2.4.0 and tfp=0.12.1), but this error is thrown:

TypeError: The following are legacy tf.layers.Layers:
  <tensorflow.python.keras.legacy_tf_layers.core.Dense object at 0x7fe2636d1310>
  <tensorflow.python.keras.legacy_tf_layers.core.Dense object at 0x7fe263910520>
  <tensorflow.python.keras.legacy_tf_layers.core.Dense object at 0x7fe263904250>
  <tensorflow.python.keras.legacy_tf_layers.core.Dense object at 0x7fe243c185e0>
  <tensorflow.python.keras.legacy_tf_layers.core.Dense object at 0x7fe263531430>
  <tensorflow.python.keras.legacy_tf_layers.core.Dense object at 0x7fe243f5fc40>
  <tensorflow.python.keras.legacy_tf_layers.core.Dense object at 0x7fe263974eb0>
  <tensorflow.python.keras.legacy_tf_layers.core.Dense object at 0x7fe243ed20a0>
  <tensorflow.python.keras.legacy_tf_layers.core.Dense object at 0x7fe243c3f130>
  <tensorflow.python.keras.legacy_tf_layers.core.Dense object at 0x7fe243ed2b80>
  <tensorflow.python.keras.legacy_tf_layers.core.Dense object at 0x7fe243c18a30>
  <tensorflow.python.keras.legacy_tf_layers.core.Dense object at 0x7fe243ed26a0>
  <tensorflow.python.keras.legacy_tf_layers.core.Dense object at 0x7fe24410b670>
  <tensorflow.python.keras.legacy_tf_layers.core.Dense object at 0x7fe243ef82b0>
  <tensorflow.python.keras.legacy_tf_layers.core.Dense object at 0x7fe243fb9700>
To use keras as a framework (for instance using the Network, Model, or Sequential classes), please use the tf.keras.layers implementation instead. (Or, if writing custom layers, subclass from tf.keras.layers rather than tf.layers)

I believe these are the lines that should be changed: https://github.com/tensorflow/probability/blob/92af38538e215063777e531380ca84c076fe1134/tensorflow_probability/python/bijectors/real_nvp.py#L396 https://github.com/tensorflow/probability/blob/92af38538e215063777e531380ca84c076fe1134/tensorflow_probability/python/bijectors/real_nvp.py#L402

It shouldn't be a problem to just convert the layers to tf.keras.layers.Dense, right? Happy to open a PR if needed.

created time in a day

issue opened tensorflow/probability

Permute(np.random.permutation(event_size).astype('int32')) still not a reliable parameterization on TF 2?

According to the documentation, one should not use NumPy to initialize a bijector's permutation, but the suggested alternative based on tf.get_variable is no longer possible on TF2. What should I do?

"Permute(np.random.permutation(event_size)).astype('int32')) is not a reliable parameterization (nor would it be even if using tf.constant). A safe alternative is to use tf.get_variable to achieve "init once" behavior"

created time in a day

push event tensorflow/probability

Srinivas Vasudevan

commit sha 92af38538e215063777e531380ca84c076fe1134

Fix second derivatives for igammainv. - Specifically allow for more numerically stable gradients with respect to the second parameter. PiperOrigin-RevId: 361010353

view details

push time in 2 days

push event tensorflow/probability

bjp

commit sha 38c1e0c8693592d1337696c4049265cc08044329

Add a name arg to log_prob_ratio. PiperOrigin-RevId: 361008471

view details

push time in 2 days

push event tensorflow/probability

Googler

commit sha ea08ffbad4accc1f667e8b479d73dc9af2833c4b

Disables dpp_test. PiperOrigin-RevId: 360994693

view details

push time in 2 days

pull request comment tensorflow/probability

Introduce Kendall's Tau computation.

Not yet, but the code has diverged a lot from this PR so thought I should close. Will ping this thread when it's made part of a future release.

sorensenjs

comment created time in 2 days

pull request comment tensorflow/probability

Introduce Kendall's Tau computation.

Is there another PR?

sorensenjs

comment created time in 2 days

PR closed tensorflow/probability

Introduce Kendall's Tau computation. cla: yes

Migrated and updated https://github.com/tensorflow/addons/pull/2169/files

+285 -0

1 comment

4 changed files

sorensenjs

pr closed time in 2 days

pull request comment tensorflow/probability

Introduce Kendall's Tau computation.

Working on integrating via tensorflow probability.

sorensenjs

comment created time in 2 days

push event tensorflow/probability

Srinivas Vasudevan

commit sha bcb4e4e56367e95d1b9b14d8dcc46b3f61a69d74

Allow Invert bijector to do no jacobian reductions when `event_ndims` is not passed in. PiperOrigin-RevId: 360945859

view details

push time in 2 days

issue opened tensorflow/probability

distributions.Poisson.quantile not implemented but exists in documentation

Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
TensorFlow installed from (source or binary): pip
TensorFlow version (use command below): v2.4.0-49-g85c8b2a817f 2.4.1
Python version: 3.8.8

Running tfp.distributions.Poisson(rate=1).quantile(0.5) raises

NotImplementedError: quantile is not implemented: Poisson

even though that method is present in the documentation.

This seems to be the same issue as #20208, but for Poisson. There is a Poisson quantile implementation in scipy, which makes use of some Poisson survival functions from scipy.special.
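As a stopgap, the quantile can be computed with scipy (illustration only; scipy.stats.poisson.ppf is scipy's Poisson quantile/inverse-CDF):

```python
from scipy import stats

# Poisson quantile via scipy, until tfp.distributions.Poisson implements it.
rate = 1.0
q = stats.poisson.ppf(0.5, mu=rate)
# q is the smallest integer k with CDF(k) >= 0.5; for rate=1 this is 1.0,
# since CDF(0) = exp(-1) ~ 0.368 and CDF(1) = 2*exp(-1) ~ 0.736.
```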

created time in 2 days

issue comment tensorflow/probability

log probability of Truncated Normal gives NAN as gradient

@srvasude Could we reopen this? Below is another example that you could try running locally to reproduce the problem.

import tensorflow_probability as tfp
import tensorflow as tf

mu = tf.Variable(0.0)
sigma = tf.constant(1.0)
lower = tf.constant(6.0)
upper = lower + 5
x = lower + 2

with tf.GradientTape() as tape:
    dist = tfp.distributions.TruncatedNormal(mu, sigma, lower, upper)
    log_prob = dist.log_prob(x)
print(tape.gradient(log_prob, [mu]))

I would like to know a workaround for this problem. I tried to use tf.where, with a threshold to decide whether or not to take the gradient of the TruncatedNormal; however, there is an issue with tf.where itself: https://github.com/tensorflow/tensorflow/issues/38349
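For reference, the usual form of that workaround is the "double-where" trick: sanitize the input with an inner tf.where so the unselected branch cannot produce NaN/inf gradients (a generic sketch using tf.math.log, not TruncatedNormal):

```python
import tensorflow as tf

def safe_log(x):
    # Gradient of tf.math.log is 1/x, which is inf at x <= 0 even when that
    # branch is never selected, and tf.where propagates NaN/inf gradients
    # from the unselected branch.
    safe_x = tf.where(x > 0.0, x, tf.ones_like(x))        # inner where: sanitize input
    return tf.where(x > 0.0, tf.math.log(safe_x),         # outer where: select result
                    tf.zeros_like(x))

x = tf.Variable([-1.0, 2.0])
with tf.GradientTape() as tape:
    y = tf.reduce_sum(safe_log(x))
grad = tape.gradient(y, x)
# grad is [0.0, 0.5]: nothing leaks from the x <= 0 branch.
```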

Nephalen

comment created time in 2 days

push event tensorflow/probability

Srinivas Vasudevan

commit sha ed6909252198f48ac68978576c682054b08f7d93

Change normed to density as normed is deprecated for matplotlib plotting. PiperOrigin-RevId: 360763882

view details

push time in 3 days

push event tensorflow/probability

sharadmv

commit sha a0e9e6a186237b7cf29ef1c6494525828efee561

[Oryx] Add term-rewriting system PiperOrigin-RevId: 360762780

view details

push time in 3 days

push event tensorflow/probability

Srinivas Vasudevan

commit sha d33ab2add39c5e615dfe81267eafee3db72c4654

Test Truncated distribgutions in distribution_bijectors_test.py PiperOrigin-RevId: 360730757

view details

push time in 3 days

push event tensorflow/probability

emilyaf

commit sha 44bdd36fc776d909dff7b47b03398b8e2e31f866

Light refactoring of `surrogate_posteriors.py` and tests. PiperOrigin-RevId: 360711023

view details

push time in 3 days

pull request comment tensorflow/probability

Fix numpy 1.20 deprecation warnings

Regarding JAX though, they made the same changes a while ago (https://github.com/google/jax/pull/3525).

NeilGirdhar

comment created time in 3 days

pull request comment tensorflow/probability

Fix numpy 1.20 deprecation warnings

@emilyfertig Sure, sounds good.

NeilGirdhar

comment created time in 3 days

Pull request review comment tensorflow/probability

Add experimental `GibbsKernel`

"""Gibbs sampling kernel"""
import collections
import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow_probability.python.mcmc.internal import util as mcmc_util
from tensorflow_probability.python.internal import unnest
from tensorflow_probability.python.internal import prefer_static

tfd = tfp.distributions  # pylint: disable=no-member
tfb = tfp.bijectors  # pylint: disable=no-member
mcmc = tfp.mcmc  # pylint: disable=no-member


class GibbsKernelResults(
    mcmc_util.PrettyNamedTupleMixin,
    collections.namedtuple(
        "GibbsKernelResults",
        [
            "target_log_prob",
            "inner_results",
        ],
    ),
):
    __slots__ = ()


def _flatten_results(results):
    """Results structures from nested Gibbs samplers sometimes
    need flattening for writing out purposes.
    """

    def recurse(r):
        for i in iter(r):
            if isinstance(i, list):
                for j in _flatten_results(i):
                    yield j
            else:
                yield i

    return [r for r in recurse(results)]


def _has_gradients(results):
    return unnest.has_nested(results, "grads_target_log_prob")


def _get_target_log_prob(results):
    """Fetches a target log prob from a results structure"""
    return unnest.get_innermost(results, "target_log_prob")


def _update_target_log_prob(results, target_log_prob):
    """Puts a target log prob into a results structure"""
    if isinstance(results, GibbsKernelResults):
        replace_fn = unnest.replace_outermost
    else:
        replace_fn = unnest.replace_innermost
    return replace_fn(results, target_log_prob=target_log_prob)


def _maybe_transform_value(tlp, state, kernel, direction):
    if not isinstance(kernel, tfp.mcmc.TransformedTransitionKernel):
        return tlp

    tlp_rank = prefer_static.rank(tlp)
    event_ndims = prefer_static.rank(state) - tlp_rank

    if direction == "forward":
        return tlp + kernel.bijector.inverse_log_det_jacobian(
            state, event_ndims=event_ndims
        )
    if direction == "inverse":
        return tlp - kernel.bijector.inverse_log_det_jacobian(
            state, event_ndims=event_ndims
        )
    raise AttributeError("`direction` must be `forward` or `inverse`")


class GibbsKernel(mcmc.TransitionKernel):
    """Gibbs Sampling Algorithm.

    Gibbs sampling may be useful when the joint distribution is explicitly
    unknown or difficult to sample from directly, but the conditional
    distribution of each variable is known and can be sampled from directly.
    The Gibbs sampling algorithm generates a realisation of each variable in
    turn from its conditional distribution, given the current realisations of
    the other variables. The resulting sequence of samples forms a Markov
    chain whose stationary distribution is the joint distribution.

    In pseudocode the algorithm is:
    ```
      Inputs:
        D number of dimensions (i.e. number of parameters)
        D' = D - 1
        X1, X2, ..., XD', XD random variables
        x1[0], x2[0], ..., xD'[0], xD[0] initial chain state
        i iteration index
        N total number of steps
        pi(.) denotes the probability distribution of its argument

      for i = 1, ..., N do
        x1[i] ~ pi(X1=x1 | X2=x2[i-1], X3=x3[i-1], ..., XD=xD[i-1])
        x2[i] ~ pi(X2=x2 | X1=x1[i], X3=x3[i-1], ..., XD=xD[i-1])
        ...
        xD[i] ~ pi(XD=xD | X1=x1[i], X2=x2[i], ..., XD'=xD'[i])
      end
    ```

    #### Example 1: 2-variate MVN
    ```python
        import numpy as np
        import tensorflow as tf
        import tensorflow_probability as tfp
        from gemlib.mcmc.gibbs_kernel import GibbsKernel

        tfd = tfp.distributions

        dtype = np.float32
        true_mean = dtype([1, 1])
        true_cov = dtype([[1, 0.5], [0.5, 1]])
        target = tfd.MultivariateNormalTriL(
            loc=true_mean,
            scale_tril=tf.linalg.cholesky(true_cov)
        )


        def log_prob(x1, x2):
            return target.log_prob([x1, x2])


        def kernel_make_fn(target_log_prob_fn, state):
            return tfp.mcmc.RandomWalkMetropolis(target_log_prob_fn=target_log_prob_fn)


        @tf.function
        def posterior(iterations, burnin, initial_state):
            kernel_list = [(0, kernel_make_fn),
                           (1, kernel_make_fn)]
            kernel = GibbsKernel(
                target_log_prob_fn=log_prob,
                kernel_list=kernel_list
            )
            return tfp.mcmc.sample_chain(
                num_results=iterations,
                current_state=initial_state,
                kernel=kernel,
                num_burnin_steps=burnin,
                trace_fn=None)


        samples = posterior(
            iterations=10000,
            burnin=1000,
            initial_state=[dtype(1), dtype(1)])

        tf.print('sample_mean', tf.math.reduce_mean(samples, axis=1))
        tf.print('sample_cov', tfp.stats.covariance(tf.transpose(samples)))
    ```

    #### Example 2: linear model
    ```python
        import numpy as np
        import tensorflow as tf
        import tensorflow_probability as tfp
        from gemlib.mcmc.gibbs_kernel import GibbsKernel

        tfd = tfp.distributions

        dtype = np.float32

        # data
        x = dtype([2.9, 4.2, 8.3, 1.9, 2.6, 1.0, 8.4, 8.6, 7.9, 4.3])
        y = dtype([6.2, 7.8, 8.1, 2.7, 4.8, 2.4, 10.7, 9.0, 9.6, 5.7])


        # define linear regression model
        def Model(x):
            def alpha():
                return tfd.Normal(loc=dtype(0.), scale=dtype(1000.))

            def beta():
                return tfd.Normal(loc=dtype(0.), scale=dtype(100.))

            def sigma():
                return tfd.Gamma(concentration=dtype(0.1), rate=dtype(0.1))

            def y(alpha, beta, sigma):
                mu = alpha + beta * x
                return tfd.Normal(mu, scale=sigma)

            return tfd.JointDistributionNamed(dict(
                alpha=alpha,
                beta=beta,
                sigma=sigma,
                y=y))


        # target log probability of linear model
        def log_prob(alpha, beta, sigma):
            lp = model.log_prob({'alpha': alpha,
                                 'beta': beta,
                                 'sigma': sigma,
                                 'y': y})
            return tf.reduce_sum(lp)


        # random walk Markov chain function
        def kernel_make_fn(target_log_prob_fn, state):
            return tfp.mcmc.RandomWalkMetropolis(target_log_prob_fn=target_log_prob_fn)


        # posterior distribution MCMC chain
        @tf.function
        def posterior(iterations, burnin, thinning, initial_state):
            kernel_list = [(0, kernel_make_fn),  # conditional probability for zeroth parameter alpha
                           (1, kernel_make_fn),  # conditional probability for first parameter beta
                           (2, kernel_make_fn)]  # conditional probability for second parameter sigma
            kernel = GibbsKernel(
                target_log_prob_fn=log_prob,
                kernel_list=kernel_list
            )
            return tfp.mcmc.sample_chain(
                num_results=iterations,
                current_state=initial_state,
                kernel=kernel,
                num_burnin_steps=burnin,
                num_steps_between_results=thinning,
                parallel_iterations=1,
                trace_fn=None)


        # initialize model
        model = Model(x)
        initial_state = [dtype(0.1), dtype(0.1), dtype(0.1)]  # start chain at alpha=0.1, beta=0.1, sigma=0.1

        # estimate posterior distribution
        samples = posterior(
            iterations=10000,
            burnin=1000,
            thinning=0,
            initial_state=initial_state)

        tf.print('alpha samples:', samples[0])
        tf.print('beta  samples:', samples[1])
        tf.print('sigma samples:', samples[2])
        tf.print('sample means: [alpha, beta, sigma] =', tf.math.reduce_mean(samples, axis=1))
    ```
    """

    def __init__(self, target_log_prob_fn, kernel_list, name=None):
        """Build a Gibbs sampling scheme from component kernels.

        :param target_log_prob_fn: a function that takes `state` arguments
                                   and returns the target log probability
                                   density.
        :param kernel_list: a list of tuples `(state_part_idx, kernel_make_fn)`.
                            `state_part_idx` denotes the index (relative to
                            positional args in `target_log_prob_fn`) of the
                            state the kernel updates. `kernel_make_fn` takes
                            arguments `target_log_prob_fn` and `state`,
                            returning a `tfp.mcmc.TransitionKernel`.
        :returns: an instance of `GibbsKernel`
        """
        # TODO: check that every component kernel's is_calibrated is True.
        self._parameters = dict(
            target_log_prob_fn=target_log_prob_fn,
            kernel_list=kernel_list,
            name=name,
        )

    @property
    def is_calibrated(self):
        return True

Presumably this is only calibrated as long as each kernel returned by `kernel_make_fn` is also calibrated?

And yes, we should try to instantiate each kernel by calling each `kernel_make_fn` and take the logical AND of all the `is_calibrated` flags.

chrism0dwk

comment created time in 3 days

`kernel_make_fn` takes+                            arguments `target_log_prob_fn` and `state`, returning+                            a `tfp.mcmc.TransitionKernel`.+        :returns: an instance of `GibbsKernel`+        """+        # Require to check if all kernel.is_calibrated is True+        self._parameters = dict(+            target_log_prob_fn=target_log_prob_fn,+            kernel_list=kernel_list,+            name=name,+        )++    @property+    def is_calibrated(self):+        return True

@brianwa84 Just a few preliminary comments on the above:

  1. That's a really good point, we'd thought about that too. I'd vote for the SamplingKernel approach, as it simplifies GibbsKernel and explicitly marks the step out as a full conditional. How about a FullConditionalKernel, which takes a make_distribution_fn returning the correct conditional distribution for that step of the MCMC chain?
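For concreteness, here's what such a full-conditional step boils down to, sketched in plain NumPy for a bivariate normal target whose conditionals are known in closed form (FullConditionalKernel and make_distribution_fn above are hypothetical names, not part of TFP or gemlib):

```python
import numpy as np

def sample_bivariate_normal_gibbs(n_samples, mu, rho, seed=42):
    """Gibbs sampler for a bivariate normal with unit marginal variances,
    drawing each coordinate exactly from its full conditional."""
    rng = np.random.default_rng(seed)
    cond_sd = np.sqrt(1.0 - rho ** 2)  # sd of x_i given x_j
    x1, x2 = 0.0, 0.0
    out = np.empty((n_samples, 2))
    for i in range(n_samples):
        # Full conditional: x1 | x2 ~ N(mu1 + rho * (x2 - mu2), 1 - rho^2)
        x1 = rng.normal(mu[0] + rho * (x2 - mu[1]), cond_sd)
        # Full conditional: x2 | x1 ~ N(mu2 + rho * (x1 - mu1), 1 - rho^2)
        x2 = rng.normal(mu[1] + rho * (x1 - mu[0]), cond_sd)
        out[i] = x1, x2
    return out

samples = sample_bivariate_normal_gibbs(20000, mu=(1.0, 1.0), rho=0.5)
```

The hypothetical kernel would encapsulate exactly this kind of exact draw: make_distribution_fn returns the conditional distribution object for the current state, and one_step samples from it with no accept/reject correction needed (hence trivially calibrated).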
  2. Fair point. I'll give this some careful thought -- probably along the lines of Fonnesbeck 2013 (Stan uses a marginalised implementation, but as far as I understand the original implementation mixed over a discrete space).
  3. (and 6) I guess this one's up to how the TFP semantics evolve. For a JointDistribution*, the state parts map onto stochastic nodes in the respective DAG describing the model, which I think makes sense for most applications. The exception would be on-the-fly non-centering of models (e.g. Xiao-Li Meng's ASIS algorithm or some of the partially-non-centred epidemics work such as Neal and Roberts 2005). What's the direction of travel with the semantics?
  4. Yeah, couldn't see a way around that, for the same reasons as (5) next...
  5. Unwrapping TransformedTransitionKernels. If I've understood your question correctly... Consider the sequence of Gibbs steps for single-parameter updates of alpha, beta, and sigma in the linear model
y = Normal(loc=alpha + beta*x, scale=sigma)

alpha and beta exist on an unconstrained space, whereas sigma is strictly positive and must be transformed to an unconstrained space for effective sampling. The conditional posteriors for all parameters do not factorise into independent terms, so rather than re-calculating the current_target_log_prob at each Gibbs step, it's quicker just to recognise the presence of a transformed kernel and add or subtract the Jacobian term as required.
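Concretely, the bookkeeping looks like this (a minimal NumPy sketch, assuming a scalar exp bijector for sigma; the function names are illustrative, not the gemlib implementation):

```python
import numpy as np

def constrained_log_prob(sigma):
    # Stand-in for the (unnormalised) conditional log posterior of sigma,
    # here a Gamma(0.1, 0.1) kernel on the constrained (positive) space.
    return (0.1 - 1.0) * np.log(sigma) - 0.1 * sigma

# exp bijector: sigma = exp(z) maps unconstrained z to positive sigma,
# with forward log-det-Jacobian log|d sigma / d z| = z.
z = 0.3
sigma = np.exp(z)
fldj = z

# The transformed kernel caches the log prob on the unconstrained space:
unconstrained_lp = constrained_log_prob(sigma) + fldj

# Rather than re-evaluating the full joint at the next Gibbs step, we can
# recover the constrained-space value by subtracting the Jacobian term ...
recovered_lp = unconstrained_lp - fldj

# ... which matches a direct evaluation on the constrained space.
assert np.isclose(recovered_lp, constrained_log_prob(sigma))
```

The same adjustment runs in reverse when handing a cached constrained-space value to a transformed kernel, which is why the helper above branches on a `forward`/`inverse` direction flag.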
  6. (see 3)
  7. GibbsKernelResults.inner_results is of type list (of inner TransitionKernel results namedtuples). At the time we wrote the code, the unnest recursive-descent algorithm wasn't able to iterate over a list structure when it met one, and so threw an exception. Not sure if this has changed? One possibility might be to dynamically construct a namedtuple for GibbsKernelResults which replaces the list with a bunch of fields.

We'll do the easy bits first, and address (2) then (1).

Maybe I could ask the team for their thoughts on how state_parts semantics might play into the future of this approach? It seems a good time to talk about it, given the commit of @ColCarroll's turnkey windowed sampling solution (which AFAIU uses restructuring and shape bijectors to concatenate state parts).

chrism0dwk

comment created time in 3 days

issue commenttensorflow/probability

Fitting an arbitrary number of component distributions using tfp.layers.MixtureSameFamily

One approach would be to

  1. Set num_components to a high number, say 10-15, and
  2. Add a penalty term to the loss function to encourage using only as many components as required, e.g. k * sum[-p_i^2] or k * sum[-p_i log(p_i)], where the p_i are the probabilities assigned to each component and k is a tunable weight.

The number of components necessary (to fit the given data) can only be defined in terms of a compromise between the goodness-of-fit (higher is better) and the complexity of the model (lower is better). This approach would capture that compromise.

It will require some changes to the code to get the model to output the component probabilities. The first num_components elements of the input received by the MixtureSameFamily layer are the logits; applying softmax to them gives the component probabilities.
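A sketch of how such a penalty could be computed from the model's raw output (plain NumPy; `num_components`, the slicing convention, and the penalty weight `k` are assumptions to be checked against the MixtureSameFamily layer in use):

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sparsity_penalty(network_output, num_components, k=0.1):
    """Entropy penalty k * sum[-p_i log(p_i)] on the mixture weights."""
    # Assumed convention: the first num_components outputs are the
    # mixture logits; the remainder parameterise the components.
    logits = network_output[..., :num_components]
    p = softmax(logits)
    return k * np.sum(-p * np.log(p + 1e-12), axis=-1)
```

The penalty peaks at k * log(num_components) for uniform weights and shrinks toward zero as the mass concentrates on fewer components, so adding it to the negative log-likelihood nudges the fit to switch off unneeded components.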

shardie1992

comment created time in 3 days

push eventtensorflow/probability

siege

commit sha 64bbef9fc24393231cb9ae8612694c884a6d765d

Make tfd.HiddenMarkovModel work under jax.jit. PiperOrigin-RevId: 360550263

view details

push time in 4 days

issue commenttensorflow/probability

TypeError when deepcopying models that reference (and have sampled from) LogNormal

This was resolved when we changed bijector caching from instance-level to global (so it's fixed in TFP 0.12 and later).

Awesome, thanks!

elvijs

comment created time in 4 days

push eventtensorflow/probability

bjp

commit sha 866a3f11a459791791a8ad5d59d09b62c4509eaf

Modifies PHMC to support momentum distribution with batch size different from chain shape. PiperOrigin-RevId: 360498534

view details

push time in 4 days