What is a Tensor Train?
A Tensor Train is a type of tensor factorization involving the product of 3-index tensors organized on a one-dimensional chain. In the context of function approximation and probability, a function of $L$ discrete variables is in Tensor Train format if it is written as
\[f(x^1, x^2, \ldots, x^L) = \sum_{a^1,a^2,\ldots,a^{L-1}} [A^1(x^1)]_{a^1}[A^2(x^2)]_{a^1,a^2}\cdots [A^{L-1}(x^{L-1})]_{a^{L-2},a^{L-1}}[A^L(x^L)]_{a^{L-1}}\]
where, for every choice of $x^l$, $A^l(x^l)$ is a real-valued matrix and the matrix sizes must be compatible. The first matrix must have 1 row and the last matrix should have 1 column, such that the product correctly returns a scalar.
The Tensor Train factorization can be used to parametrize probability distributions, which is the main focus of this package. In this case, $f$ should be properly normalized and always return a non-negative value.
Tensor Trains with Periodic Boundary Conditions
A slight generalization, useful to describe systems with periodic boundary conditions is the following:
\[f(x^1, x^2, \ldots, x^L) = \sum_{a^1,a^2,\ldots,a^{L}} [A^1(x^1)]_{a^1,a^2}[A^2(x^2)]_{a^2,a^3}\cdots [A^{L-1}(x^{L-1})]_{a^{L-1},a^{L}}[A^L(x^L)]_{a^{L},a^1}\]
In other words, to evaluate $f$ one takes the trace of the product of matrices.
Notation and terminology
Tensor Trains are the most basic type of Tensor Network. Tensor networks are a large family of tensor factorizations which are often best represented in diagrammatic notation. For this reason, the term bond is used interchangeably as index. The indices $a^1,a^2,\ldots,a^{L-1}$ are usually called the virtual indices, while $x^1, x^2, \ldots, x^L$ are the physical indices.
Tensor Trains are used to parametrize wavefunctions in many-body quantum physics. The resulting quantum state is called Matrix Product State. In such context, the entries are generally complex numbers, and a probability can be obtained for a given state by taking the squared absolute value of the wavefunction.
In this package we focus on the "classical" case where the Tensor Train directly represents a probability distribution $p(x^1, x^2, \ldots, x^L)$.
Efficient computation
Given a Tensor Train some simple recursive strategies can be employed to
Compute the normalization
\[Z = \sum_{x^1, x^2, \ldots, x^L} \sum_{a^1,a^2,\ldots,a^{L-1}} [A^1(x^1)]_{a^1}[A^2(x^2)]_{a^1,a^2}\cdots [A^{L-1}(x^{L-1})]_{a^{L-2},a^{L-1}}[A^L(x^L)]_{a^{L-1}}\]
such that
\[\begin{aligned} 1&=\sum_{x^1, x^2, \ldots, x^L}p(x^1, x^2, \ldots, x^L)\\&=\sum_{x^1, x^2, \ldots, x^L}\frac1Z \sum_{a^1,a^2,\ldots,a^{L-1}} [A^1(x^1)]_{a^1}[A^2(x^2)]_{a^1,a^2}\cdots [A^{L-1}(x^{L-1})]_{a^{L-2},a^{L-1}}[A^L(x^L)]_{a^{L-1}} \end{aligned}\]
Compute marginals
Single-variable
\[p(x^l=x) = \sum_{x^1, x^2, \ldots, x^L} p(x^1, x^2, \ldots, x^L) \delta(x^l,x)\]
and two-variable
\[p(x^l=x, x^m=x') = \sum_{x^1, x^2, \ldots, x^L} p(x^1, x^2, \ldots, x^L) \delta(x^l,x)\delta(x^m,x')\]
Extract exact samples
Via hierarchical sampling
\[p(x^1, x^2, \ldots, x^L) = p(x^1)p(x^2|x^1)p(x^3|x^1,x^2)\cdots p(x^L|x^1,x^2,\ldots,x^{L-1})\]
by first sampling $x^1\sim p(x^1)$, then $x^2\sim p(x^2|x^1)$ and so on.
What can this package do?
This small package provides some utilities for creating, manipulating and evaluating Tensor Trains interpreted as functions, with a focus on the probabilistic side. Each variable $x^l$ is assumed to be multivariate. Whenever performing some probability-related operation, it is responsability of the user to make sure that the Tensor Train always represents a valid probability distribution.
Common operations are:
evaluate
a Tensor Train at a given set of indicesorthogonalize_left!
,orthogonalize_right!
: bring a Tensor Train to left/right orthogonal formcompress!
a Tensor Train using SVD-based truncationsnormalize!
a Tensor Train in the probability sense (not in the $L_2$ norm sense!), see abovesample
from a Tensor Train intended as a probability ditribution, see above+
,-
: take the sum/difference of two TensorTrains
Example
Let's construct and initialize at random a Tensor Train of the form
\[f\left((x^1,y^1), (x^2,y^2), (x^3,y^3)\right) = \sum_{a^1,a^2} [A^1(x^1,y^1)]_{a^1}[A^2(x^2,y^2)]_{a^1,a^2}[A^3(x^3,y^3)]_{a^2}\]
where $x^l\in\{1,2\}, y^l\in\{1,2,3\}$.
using TensorTrains
L = 3 # length
q = (2, 3) # number of values taken by x, y
d = 5 # bond dimension
A = rand_tt(d, L, q...) # construct Tensor Train with random positive entries
xy = [[rand(1:qi) for qi in q] for _ in 1:L] # random set of indices
p = evaluate(A, xy) # evaluate `A` at `xy`
compress!(A; svd_trunc=TruncThresh(1e-8)); # compress `A` to reduce the bond dimension
pnew = evaluate(A, xy)
ε = abs((p-pnew)/p)
References
- https://tensornetwork.org: "an open-source review article focused on tensor network algorithms, applications, and software"
- Oseledets, I.V., 2011. Tensor-train decomposition. SIAM Journal on Scientific Computing, 33(5).