# Queries

In this section, we go over the most common probabilistic reasoning tasks and provide code snippets to compute those queries.

### Setup

First, we load a pretrained PC and the corresponding dataset.

```
using LogicCircuits
using ProbabilisticCircuits
pc = zoo_psdd("plants.psdd")
data, _, _ = twenty_datasets("plants"); # keep only the train split
println("circuit with $(num_nodes(pc)) nodes and $(num_parameters(pc)) parameters.")
println("dataset with $(num_features(data)) features and $(num_examples(data)) examples.")
```

```
circuit with 153021 nodes and 91380 parameters.
dataset with 69 features and 17412 examples.
```

## Full Evidence (EVI)

EVI refers to computing the probability when full evidence is given, i.e., when $x$ is fully observed, the output is $p(x)$. We can use the `EVI` method to compute $\log{p(x)}$:

```
probs = EVI(pc, data);
probs[1:3]
```

```
3-element Vector{Float64}:
  -7.533388371782998
 -16.534175321651873
 -27.80940132993066
```

Computing the `EVI` of a mixture of circuits works the same way. You may either pass weights for the weighted mixture probability, or pass a component index to individually evaluate each component.

```
mix, mix_weights, _ = learn_strudel(data; num_mix = 10, init_maxiter = 10, em_maxiter = 100)
# This computes the weighted probability
probs = EVI(mix, data, mix_weights);
# Alternatively, we may want to compute the probability of a single component
c_prob = EVI(mix, data; component_idx = 1);
c_prob[1:3]
```

```
3-element Vector{Float64}:
 -16.41705785046347
 -68.71509574089467
 -38.71104227796572
```
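As a sanity check on what the weighted score means, we can reproduce it from the mixture definition $p(x) = \sum_i w_i \, p_i(x)$ in log space. The following is a minimal sketch, assuming `mix_weights` holds the (linear-domain) component weights; `logsumexp` comes from the StatsFuns package:

```
using StatsFuns: logsumexp
# log p(x) = logsumexp_i( log w_i + log p_i(x) )
comp_lls = hcat([EVI(mix, data; component_idx = i) for i in 1:10]...)
manual = [logsumexp(comp_lls[j, :] .+ log.(mix_weights)) for j in 1:num_examples(data)]
isapprox(manual, probs; rtol = 1e-4)
```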

## Partial Evidence (MAR)

In this case we have some missing values. Let $x^o$ denote the observed features, and $x^m$ the missing features. We would like to compute $p(x^o)$ which is defined as $p(x^o) = \sum_{x^m} p(x^o, x^m)$. Of course, computing this directly by summing over all possible ways to fill the missing values is not tractable.

The good news is that given a **smooth** and **decomposable** PC, the marginal can be computed exactly, in time linear in the size of the PC.

First, we randomly make some features go `missing` (you can also use `make_missing_mcar` from the LogicCircuits library):

```
using DataFrames
using Tables
function make_missing(d::DataFrame; keep_prob=0.8)
    # start from an all-missing Bool matrix of the same shape as the data
    m = missings(Bool, num_examples(d), num_features(d))
    # keep each entry with probability `keep_prob`
    flag = rand(num_examples(d), num_features(d)) .<= keep_prob
    m[flag] .= Tables.matrix(d)[flag]
    DataFrame(m, :auto)
end;
data_miss = make_missing(data[1:1000,:]);
```
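As a quick check, roughly 20% of the entries (i.e. `1 - keep_prob`) should now be missing:

```
# fraction of missing entries, expected to be about 1 - keep_prob = 0.2
count(ismissing, Tables.matrix(data_miss)) / length(Tables.matrix(data_miss))
```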

Now, we can use `MAR` to compute the marginal queries:

```
probs = MAR(pc, data_miss);
probs[1:3]
```

```
3-element Vector{Float32}:
  -7.53333
 -13.310469
 -20.256458
```
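To connect this back to the definition, the sketch below compares `MAR` against the explicit sum $\sum_{x^m} p(x^o, x^m)$ on a single row with just two features hidden. The naive sum is exponential in the number of missing values, so this is for intuition only; `logsumexp` again comes from StatsFuns:

```
using StatsFuns: logsumexp
# hide two features of the first training example
row = Vector{Union{Missing,Bool}}(Vector(data[1, :]))
row[[1, 2]] .= missing
partial = DataFrame(permutedims(row), :auto)
# naive marginal: sum p(x^o, x^m) over the 4 completions of the hidden bits
lls = Float64[]
for (a, b) in Iterators.product((false, true), (false, true))
    filled = copy(row)
    filled[1] = a
    filled[2] = b
    push!(lls, EVI(pc, DataFrame(permutedims(Vector{Bool}(filled)), :auto))[1])
end
isapprox(logsumexp(lls), MAR(pc, partial)[1]; atol = 1e-4)
```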

Note that `MAR` can also be used to compute probabilities even when all data is observed; in that case it gives the same results as `EVI`. However, if we know all features are observed, we suggest using `EVI`, as it is faster in general.

```
probs_mar = MAR(pc, data);
probs_evi = EVI(pc, data);
probs_mar ≈ probs_evi
```

```
true
```

Just like `EVI`, `MAR` works the same way for mixtures.

```
# Full weighted marginal probability
probs_mar = MAR(mix, data, mix_weights);
# Individual component's marginal probability
c_probs_mar = MAR(mix, data; component_idx = 1);
c_probs_mar[1:3]
```

```
3-element Vector{Float32}:
 -16.417055
 -68.715096
 -38.711044
```

## Conditionals (CON)

In this case, given observed features $x^o$, we would like to compute $p(Q \mid x^o)$, where $Q$ is a subset of features disjoint with $x^o$. We can use Bayes rule to compute the conditional as two separate MAR queries:

$p(Q \mid x^o) = \frac{p(Q, x^o)}{p(x^o)}$

Currently, this has to be done manually by the user, as in the sketch below. We plan to add a simple API for this case in the future.
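Here is a minimal sketch of that manual computation, using the `data_miss` from above and picking the hypothetical query $Q = \{x_1 = \text{true}\}$ (note this only makes sense for rows where $x_1$ is among the missing features):

```
# log p(Q | x^o) = log p(Q, x^o) - log p(x^o), computed as two MAR queries
evidence = data_miss               # x^o only
joint = copy(data_miss)
joint[!, 1] .= true                # additionally assert the query Q: x_1 = true
log_cond = MAR(pc, joint) .- MAR(pc, evidence);
```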

## Maximum a posteriori (MAP, MPE)

In this case, given the observed features $x^o$, the goal is to fill in the missing features $x^m$ such that $p(x^m, x^o)$ is maximized.

We can use the `MAP` method to compute MAP, which outputs both the states that maximize the probability and the log-likelihood of each state.

```
data_miss = make_missing(data; keep_prob=0.5);
states, probs = MAP(pc, data_miss);
probs[1:3]
```

```
3-element Vector{Float32}:
  -6.598743
 -15.989449
 -21.956953
```
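Since each returned state is a complete assignment, evaluating it should reproduce the reported log-likelihoods. A sketch, assuming `states` can be fed back into `EVI` directly:

```
# each MAP state is a full assignment, so its joint probability
# should match the log-likelihood returned by MAP
isapprox(EVI(pc, states), probs; rtol = 1e-4)
```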

## Sampling

We can also sample from the distrubtion $p(x)$ defined by a Probabilistic Circuit. You can use `sample`

to achieve this task.

```
samples, _ = sample(pc, 100);
size(samples)
```

```
(100, 69)
```
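Because these are i.i.d. samples from $p(x)$, they can be used for Monte-Carlo estimates, which we can compare against exact marginals. A sketch, assuming `sample` returns a Bool matrix with one row per sample:

```
# Monte-Carlo estimate of p(x_1 = true)
n = 10_000
mc_samples, _ = sample(pc, n);
mc_est = count(mc_samples[:, 1]) / n
# exact answer: marginalize out every feature except x_1
ev = Vector{Union{Missing,Bool}}(missing, num_variables(pc))
ev[1] = true
exact = exp(MAR(pc, DataFrame(permutedims(ev), :auto))[1])
isapprox(mc_est, exact; atol = 0.02)
```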

Additionally, we can draw conditional samples $x \sim p(x \mid x^o)$, where $x^o$ denotes the observed features ($x^o \subseteq x$) and can be an arbitrary subset of the features.

```
# two rows of random partial evidence; draw 3 samples per row
evidence = DataFrame(rand((missing, true, false), (2, num_variables(pc))), :auto)
samples, _ = sample(pc, 3, evidence);
size(samples)
```

```
(3, 2, 69)
```

## Expected Prediction (EXP)

Expected Prediction (EXP) is the task of taking the expectation of a discriminative model $f$ w.r.t. a generative model $p$, conditioned on evidence (a subset of the features being observed):

$\mathbb{E}_{x^m \sim p(x^m \mid x^o)} [ f(x^o, x^m) ]$

When $f$ and $p$ are both circuits satisfying certain structural constraints on the pair, this expectation, as well as higher moments, can be computed tractably. You can use `Expectation` and `Moment` to compute them.

```
using DiscriminativeCircuits
using DataFrames
pc = zoo_psdd("insurance.psdd")
rc = zoo_dc("insurance.circuit")
# Using samples from circuit for the example; replace with real data
data, _ = sample(pc, 10);
data = make_missing(DataFrame(data, :auto));
exps, exp_cache = Expectation(pc, rc, data)
exps[1:3]
```

```
3-element Vector{Float32}:
 12886.046
 13234.583
 10953.938
```

```
second_moments, moment_cache = Moment(pc, rc, data, 2);
```

```
# Var[f] = E[f^2] - (E[f])^2
stds = sqrt.( second_moments - exps.^2 );
stds[1:3]
```

```
3-element Vector{Float32}:
  278.6539
 1155.2455
 3938.1763
```