We will consider two independent models:
Merge trees form a complete invariant of a continuous function, up to reparametrization.
The collection of embeddings $X_s\hookrightarrow X_t$, $s\lt t$ (here $X_s=\{f\leq s\}$), lifts to a collection of consistent homomorphisms $m_{st}:H_k(X_s)\to H_k(X_t)$, $k=0,1,\ldots$. These homomorphisms decompose in a natural way, pairing births and deaths of transient cycles.
We will identify the persistence diagram with a sum of delta-functions, one for each bar (weighted by multiplicity). The resulting counting measure is a convenient tool to describe properties of the persistence diagram.
However, the contents of the displaced quadrants $\{b\leq b_*, d\geq d_*\}$ (i.e., the numbers of bars straddling the interval $[b_*,d_*]$) are finite for all $b_*\lt d_*$ for the so-called tame filtrations.
Continuous functions on compact metric spaces are tame; Lipschitz or Hölder functions come with quantitative tameness guarantees.
Another source of ambiguity (for univariate functions) is the fact that a stem can be attached to its parent on the right or on the left side.
Same bar, different chirality.
Tameness of this point process follows from the general result that the persistence diagram of any $\alpha$-Hölder function in dimension $d$ has PH-dimension $d/\alpha$.
In particular, near the diagonal the density of this measure explodes as $1/(d-b)^3$.
Conditioned on the number $k$ of bars straddling an interval $I$ in a trajectory of Brownian motion with drift, $\mathrm{BM}+mt$, $m\gt 0$, the expected excess of F over M is equal to the expected number of records in a random $k$-permutation, that is, the $k$-th harmonic sum, \[ \ex(\mathbf{F}-\mathbf{M})=H_k=1+1/2+\ldots+1/k. \] For Brownian motion with drift $m$, the expected excess $\ex(\mathbf{F}-\mathbf{M})$ in the bars straddling an interval of length $\Delta$ is \[ -\log(1-\exp(-2m\Delta)). \]
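The combinatorial ingredient above, that the expected number of records in a random $k$-permutation is $H_k$, can be verified by brute force. A minimal sketch in Python (the helper names `expected_records` and `harmonic` are ours):

```python
from fractions import Fraction
from itertools import permutations

def expected_records(k):
    """Average number of left-to-right records (running maxima)
    over all k! permutations of {1, ..., k}, as an exact fraction."""
    perms = list(permutations(range(k)))
    total = Fraction(0)
    for p in perms:
        best, records = -1, 0
        for x in p:
            if x > best:
                best, records = x, records + 1
        total += records
    return total / len(perms)

def harmonic(k):
    """H_k = 1 + 1/2 + ... + 1/k, exactly."""
    return sum(Fraction(1, j) for j in range(1, k + 1))

# The two quantities agree exactly for every small k.
for k in range(1, 7):
    assert expected_records(k) == harmonic(k)
```

The agreement is exact in rational arithmetic, reflecting the classical fact that position $j$ of a uniformly random permutation is a record with probability $1/j$.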
As is well known, stock valuations cannot, perhaps, be modeled as exponentials of Brownian motions with drift. If they could, however, the merge tree distribution of the detrended log-trajectories would exhibit time-reversal invariance. So Rachel Levanger (at U Penn at the time) analyzed hundreds of stock traces.
It would be interesting to understand what makes a firm more or less F-over-M prone. Many other questions remain open as well...
All periodic phenomena are cyclic, but not vice versa.
The distinction is important. There are plenty of processes that are manifestly cyclic but not periodic, in the sense that no period \(P\) can be found such that a shift of the process along the time axis by $P$ leaves the process, or its parameters, invariant.
Examples of cyclic yet non-periodic processes include the cardiac cycle; the musculoskeletal movements exercised during gait; population dynamics in closed ecosystems; business cycles, and many others.
Coupled dynamics of the moose and wolf populations on Isle Royale
What we observe and measure is, however, parameterized not by the internal clocks of the systems we study: the observations are parameterized by physical time. The internal clocks do not necessarily coincide with the physical time progression, not even after a linear reparametrization.
In general, if the internal clock cannot be represented as the sum of a linear and a periodic function of time, the cyclic phenomenon cannot be made periodic by any reparametrization.
The predominant focus on periodic processes can be explained by the availability of powerful tools exploiting harmonic analysis (Fourier and cosine transforms, wavelets, etc.).
These tools are, however, intrinsically limited by the fact that they rely on the structure of the timeline as a homogeneous space; without this structure, the natural bases for the decompositions (consisting of characters of the shift group) just do not exist.
From now on, we will assume that the range of the observed processes is a $d$-dimensional Euclidean space $V\cong\Real^d$, with coordinates $x_j, j=1,\ldots,d$.
Before attacking the specific problem of recovering the cyclic nature of the observed processes, we will address the general question of reparametrization-invariant functionals of trajectories (that is, mappings $\curve:I=[0,1]\to V$): what are the functions of trajectories that do not depend on how one traverses them?
This question is classical, and has an answer in terms of iterated path integrals, a notion going back to Picard and reintroduced into modern mathematical practice by K.-T. Chen.
Iterated integrals are functions of the trajectories, defined recursively; the linear space of iterated integrals is filtered by their order. The integrals of order $0$ are constants. The integrals of order $k$ are given by
\[
a(\curve)=\sum_l \int_0^1 a_l(\curve([0,t]))d\curve_l(t),
\]
where all functions $a_l$ are iterated integrals of order $\lt k$.
Equivalently, the space of iterated path integrals (for smooth trajectories) is spanned by \[ a(\curve):=\int\cdots\int_{0\lt t_1\lt\ldots\lt t_k\lt 1} \dot{\curve}_{l_{1}}(t_1)\cdots \dot{\curve}_{l_k}(t_k)dt_1\ldots dt_k. \]
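For sampled, piecewise-linear trajectories, these integrals can be evaluated in closed form. The sketch below (the helper name `signature2` is ours) computes the order-2 iterated integral and checks the classical shuffle identity: splitting the square $\{0\lt t_1,t_2\lt 1\}$ into the two simplices $t_1\lt t_2$ and $t_2\lt t_1$ shows that the two order-2 integrals sum to the product of the total increments.

```python
import numpy as np

rng = np.random.default_rng(1)

def signature2(x, y):
    """Order-2 iterated integral of a piecewise-linear path (x, y):
    the integral of dx(t1) dy(t2) over 0 < t1 < t2 < 1, exactly."""
    dx, dy = np.diff(x), np.diff(y)
    cross = np.sum(np.cumsum(dx)[:-1] * dy[1:])  # t1 and t2 in different segments
    within = 0.5 * np.sum(dx * dy)               # both times in the same linear segment
    return cross + within

# A random piecewise-linear trajectory in the plane.
x = np.cumsum(rng.standard_normal(500))
y = np.cumsum(rng.standard_normal(500))

s_xy, s_yx = signature2(x, y), signature2(y, x)
# Shuffle identity: the two order-2 integrals sum to the product of increments.
assert np.isclose(s_xy + s_yx, (x[-1] - x[0]) * (y[-1] - y[0]))
```

The skew part of the pair $(s_{xy}, s_{yx})$ is exactly the oriented area discussed below; the symmetric part carries no information beyond the endpoints.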
Iterated integrals are reparametrization invariant.
They are also unaffected by detours. (We say that a path $\curve$ has a detour if it can be represented as
\[
\curve_s\curve_d\curve_d^{-1}\curve_f,
\]
i.e., contains a piece of trajectory traveled back and forth.)
Two $\Real^d$-valued trajectories (without detours) have the same iterated integrals of all orders if and only if the trajectories coincide up to reparametrization and shift.
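Reparametrization invariance is easy to observe numerically: the discrete oriented area of a sampled trajectory (a second-order iterated integral, discussed below) does not change when the same curve is traversed with a highly non-uniform clock. A minimal sketch, with illustrative function names:

```python
import numpy as np

def signed_area(x, y):
    """Oriented area 1/2 * integral of (x dy - y dx), midpoint rule."""
    dx, dy = np.diff(x), np.diff(y)
    xm, ym = 0.5 * (x[:-1] + x[1:]), 0.5 * (y[:-1] + y[1:])
    return 0.5 * np.sum(xm * dy - ym * dx)

t = np.linspace(0.0, 1.0, 20001)
s = t ** 3                       # a monotone reparametrization of [0, 1]

x, y = np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)    # unit circle, uniform clock
xr, yr = np.cos(2 * np.pi * s), np.sin(2 * np.pi * s)  # same curve, distorted clock

a, ar = signed_area(x, y), signed_area(xr, yr)
assert abs(a - ar) < 1e-4        # invariant under reparametrization
assert abs(a - np.pi) < 1e-4     # the unit circle encloses area pi
```

The two discretizations place samples very differently along the curve, yet the computed areas agree to discretization accuracy.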
This suggests that any exploratory analysis of parametrization-independent features of time series can rely on these functionals as a sufficiently rich tool.
Over the past few years, the idea of using iterated integrals as a tool in data analysis has been promoted by T. Lyons and his students.
The first-order iterated integrals just measure the total increments. The second-order integrals turned out to be very helpful in the problem of recovery of cyclic phenomena.
Recall that second-order iterated integrals measure the algebraic areas encompassed by the trajectory.
The space of iterated integrals of the second order is spanned by \[ A_{kl}:=\frac{1}{2}\int \curve_l\, d\curve_k - \curve_k\, d\curve_l. \] Clearly, these integrals are equal to the oriented areas of the two-dimensional projections of the trajectory $\curve$ onto the coordinate planes (with starting and ending points connected by a straight segment).
One of the patterns we aim to capture is the leaderfollower relationship.
Plotting the functions shown on the left against each other produces a curve encircling a positively oriented area (on the right).
The key observation is the following: if one effect precedes another, the parametric joint plot of the corresponding functions will enclose a positive algebraic area.
Of course, this semantics assumes that $k$ and $l$ are consubstantial; if their natures are antagonistic, the lead-follow relation flips.
We arrange the iterated integrals \[ A_{kl}:=\frac{1}{2}\int_{t} \curve_l\, d\curve_k - \curve_k\, d\curve_l \] into a skew-symmetric matrix $A$.
The skew-symmetric matrix $$ A=(A_{kl})_{1\leq k,l\leq d} $$ is called the lead matrix.
An interpretation of the lead matrix coefficients is that an element $A_{kl}$ is positive if $l$ follows $k$.
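The mechanics can be illustrated on a pair of sinusoids, one a delayed copy of the other. The sketch below (names and the particular ordering of the terms in the integrand are ours; references differ on the ordering, which flips the sign of the entries, while the magnitude and the behavior under time reversal do not depend on it) builds the sampled lead matrix for such a pair:

```python
import numpy as np

def lead_matrix(traj):
    """Sampled lead matrix: A[k, l] = 1/2 * sum(g_l dg_k - g_k dg_l),
    midpoint rule; traj has shape (n_samples, d)."""
    diffs = np.diff(traj, axis=0)
    mids = 0.5 * (traj[:-1] + traj[1:])
    return 0.5 * (diffs.T @ mids - mids.T @ diffs)

delta = 0.5
t = np.linspace(0.0, 2 * np.pi, 4001)
leader = np.sin(t)
follower = np.sin(t - delta)       # a delayed copy: component 1 follows component 0
A = lead_matrix(np.column_stack([leader, follower]))
A_rev = lead_matrix(np.column_stack([leader[::-1], follower[::-1]]))

assert abs(A[0, 1] + A[1, 0]) < 1e-12                      # skew-symmetry
assert abs(abs(A[0, 1]) - np.pi * np.sin(delta)) < 1e-3    # one-period oriented area
assert A[0, 1] * A_rev[0, 1] < 0                           # time reversal flips the sign
```

The off-diagonal entry has magnitude $\pi\sin\delta$ over one period, and its sign encodes which component leads (relative to the chosen ordering convention); reversing time exchanges leader and follower.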
Specializing, we concentrate on a class of cyclic phenomena described by the chain of offsets model.
Under this model, the observed variables are (noisy and perhaps rescaled versions of) the same function on the circle $\circle$: \[ \curve_l(\tau)=c_l\,\phi(\tau-\alpha_l),\quad l=1,\ldots,d, \] where $\phi$ is the archetypal behavior, $c_l\gt 0$ are some scaling parameters, and $\alpha_l$ are some phases.
Informally, under the chain of offsets model, the observables are retracing the same behavior, with some delays.
It is important to emphasize that even when assuming the circle as the state space, we rely on explicitly reparametrization-invariant methods, making the approach stable with respect to arbitrary changes of the time variable.
The central question we will address now is the recovery of the cyclic order of the functions $\curve_l, l=1,\ldots,d$ (or, equivalently, of the cyclic order of the phases $\alpha_l$) from a (perhaps noisy) version of the trajectory, $\curve+\noise$, via the sampled lead matrix.
Expand the primary function $\phi$ (defined on the circle $\circle$ parameterized by the internal clock $\icl$) into the Fourier series, \begin{equation} \label{eq:phi} \phi(\tau)=\sum_k c_k \exp(2\pi i k \tau). \end{equation}
Then it is immediate that the lead matrix over one period is given by $$ A=\left(\psi(\alpha_k-\alpha_l)\right)_{kl}, $$ where \[ \psi(t)=2\pi\sum_m m|c_m|^2\sin(mt). \]
To be able to use the lead matrix in applications, one needs some statistical guarantees of its recoverability from noisy observation, \[ \curve^\noise(\tau)=\curve(\tau)+\noise(\tau). \]
A naive procedure of taking the sampled lead matrix with coefficients \[ \left(A(\curve^\noise_{a,b})\right)_{kl}:=\frac12\int_a^b \curve^\noise_l(t)\, d\curve^\noise_k(t)-\curve^\noise_k(t)\, d\curve^\noise_l(t) \] and averaging over time might easily fail.
The reason is, essentially, the linear growth of the sampled algebraic areas under many models of noise (such as Brownian loops): Lévy's area distribution formula \[ p(a)=\frac{\pi}{2T}\cosh^{-2}(\pi a/T) \] shows that the algebraic area of a Brownian loop grows linearly in time, just as the algebraic area of a periodic process does.
Still, a modification of the procedure can overcome this difficulty.
If the noise process is time-reversible (this is sufficient, not necessary) and its algebraic area process is sufficiently ergodic (say, has exponential decay of correlations), then the partitioned sampled lead matrix converges to its noiseless version.
Let $l_1, l_2, l_3,\ldots$ be a sequence of interval lengths such that $l_k\to\infty$ and $l_k/L_k\to 0$, where $L_k=\sum_{j\leq k} l_j$. Define \[ \bar{A}^\noise(k)=\frac{1}{L_k}\sum_{j\leq k} A(\curve^\noise_{L_{j-1},L_j}). \] Then the normalized sampled lead matrices converge to the normalized lead matrix over one period: \[ \bar{A}^\noise(k)/\parallel \bar{A}^\noise(k)\parallel \to A^p/\parallel A^p\parallel. \]
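A sketch of this partitioned scheme on a synthetic chain-of-offsets signal with i.i.d. measurement noise (a simpler noise model than the Brownian loops above; all names, seeds, and parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def lead_matrix(traj):
    """Sampled lead matrix A[k, l] = 1/2 * sum(g_l dg_k - g_k dg_l)."""
    diffs = np.diff(traj, axis=0)
    mids = 0.5 * (traj[:-1] + traj[1:])
    return 0.5 * (diffs.T @ mids - mids.T @ diffs)

def partitioned_lead_matrix(traj, lengths):
    """Sum of per-window lead matrices over consecutive windows of
    growing lengths (l_j -> infinity, l_j / L_j -> 0), normalized by L_k."""
    A = np.zeros((traj.shape[1], traj.shape[1]))
    start = 0
    for n in lengths:
        A += lead_matrix(traj[start:start + n + 1])
        start += n
    return A / start

# Chain-of-offsets signal: three delayed copies of sin, plus i.i.d. noise.
dt = 0.01
t = np.arange(0.0, 400.0 * np.pi, dt)
phases = np.array([0.0, 0.5, 1.0])
clean = np.sin(t[:, None] - phases[None, :])
noisy = clean + 0.2 * rng.standard_normal(clean.shape)

lengths = [1000 * j for j in range(1, 16)]             # growing windows
A_hat = partitioned_lead_matrix(noisy, lengths)
A_one = lead_matrix(clean[:int(2 * np.pi / dt) + 1])   # noiseless, one period

# Compare directions (Frobenius cosine), as in the normalized limit above.
cos = np.sum(A_hat * A_one) / (np.linalg.norm(A_hat) * np.linalg.norm(A_one))
assert cos > 0.9
```

After normalization, the noisy partitioned estimate aligns closely with the one-period lead matrix of the clean signal.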
This law of large numbers establishes the consistency of the partitioned sampled lead matrix estimator in our model. We will therefore use the lead matrices derived from the observed traces as approximations to the actual ones.
A natural simplifying assumption, not implausible in applications, is that the underlying signal is dominated by the leading coefficient: $c_1^2$ is much larger than $\sum_{k\geq 2} c_k^2$ (we ignore the constant term, as it is irrelevant to the cyclic behavior).
If $\phi$ is just a single harmonic, then the skew-symmetric matrix $A$ has rank two: it is spanned by the two rank-one matrices $A^\pm$, with $$ A^+_{kl}=a_ka_l \sin(\alpha_k)\cos(\alpha_l) $$ and $$ A^-_{kl}=a_ka_l \cos(\alpha_k)\sin(\alpha_l). $$
Even if the function $\phi$ is not harmonic, and the observations are noisy, one can still expect that if the sampled lead matrix approximates the one-period lead matrix well, and the leading harmonic dominates $\phi$, then $\bar{A}$ is well approximated (in the Frobenius norm) by the rank-2 matrix $P\bar{A}P$. (Here $P$ is the projection onto the 2-plane spanned by the eigenvectors corresponding to the largest, in absolute value, purely imaginary eigenvalues.)
One can prove that the components of the nontrivial eigenvectors of this rank 2 matrix are given by a (real linear) transformation of \[ v_1=e^{i\alpha}\left(e^{2\pi i\alpha_1},e^{2\pi i\alpha_2},\ldots,e^{2\pi i\alpha_d}\right), v_2=\overline{v}_1 \] (for some phase $\alpha$).
In other words, the arguments of the components of the leading eigenvectors of the lead matrix reconstruct the cyclic order of the components.
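Under a single-harmonic chain of offsets model, the whole recovery step fits in a few lines. A sketch (names are ours) that checks the planted cyclic order is recovered up to a rotation and a possible reversal, the latter reflecting the choice between the two conjugate leading eigenvectors:

```python
import numpy as np

def cyclic_order(A):
    """Indices sorted by the phases of the components of the eigenvector
    of the skew-symmetric matrix A with the largest |eigenvalue|."""
    vals, vecs = np.linalg.eig(A)
    u = vecs[:, np.argmax(np.abs(vals.imag))]
    return list(np.argsort(np.angle(u)))

# Single-harmonic chain-of-offsets traces with a planted cyclic order.
alphas = np.array([0.0, 0.7, 1.3, 2.1, 2.9, 4.0, 4.8, 5.5])
t = np.linspace(0.0, 2 * np.pi, 8001)
traj = np.sin(t[:, None] - alphas[None, :])

diffs = np.diff(traj, axis=0)
mids = 0.5 * (traj[:-1] + traj[1:])
A = 0.5 * (diffs.T @ mids - mids.T @ diffs)    # sampled lead matrix

order = cyclic_order(A)
d = len(alphas)
doubled = order + order
forward = any(doubled[i:i + d] == list(range(d)) for i in range(d))
backward = any(doubled[i:i + d] == list(range(d - 1, -1, -1)) for i in range(d))
assert forward or backward    # planted order recovered up to rotation/reflection
```

The eigenvector components trace an ellipse as the phase varies, so sorting by their arguments reproduces the cyclic order of the $\alpha_l$ exactly, up to the rotation/reflection ambiguity noted above.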
The spectrum of the sampled lead matrix by itself can serve as an indicator of the explanatory power of the cyclicity algorithm.
In general, the spectrum of the skew-symmetric lead matrix consists of purely imaginary pairs $\pm i\lambda_j$ and zeros. We assume that the absolute values of the spectral pairs are ordered, $$ \lambda_1=\lambda_2\geq \lambda_3=\lambda_4\geq\ldots $$ The ratio $\lambda_1/\lambda_3$ indicates how well the lead matrix is described by the noiseless harmonic model (for which the ratio is $+\infty$).
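As an illustration (the matrix and the perturbation are synthetic), the ratio $\lambda_1/\lambda_3$ degrades as a rank-2 single-harmonic lead matrix is contaminated by a generic skew-symmetric perturbation:

```python
import numpy as np

rng = np.random.default_rng(2)

def quality_ratio(A):
    """lambda_1 / lambda_3 for a real skew-symmetric A: the absolute values
    of the eigenvalues, sorted decreasingly, come in equal pairs."""
    lam = np.sort(np.abs(np.linalg.eigvals(A).imag))[::-1]
    return lam[0] / lam[2]

alphas = np.array([0.0, 1.0, 2.5, 4.0, 5.2])
A_pure = np.sin(alphas[:, None] - alphas[None, :])   # single harmonic: rank 2
G = rng.standard_normal((5, 5))
P = G - G.T                                          # generic skew perturbation

r_small = quality_ratio(A_pure + 0.01 * P)
r_big = quality_ratio(A_pure + 0.5 * P)
assert r_small > 10.0     # nearly rank 2: one dominant spectral pair
assert r_small > r_big    # stronger contamination lowers the ratio
```

For the pure harmonic matrix the third eigenvalue vanishes, so the ratio diverges; perturbations push it down toward order one.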
We combined the ideas outlined above into a pipeline, resulting in a software tool for recovering the cyclic structure among observed time series, under the assumption of the chain of offsets model.
The workflow consists of computing the partitioned sampled lead matrix from the observed traces, extracting the leading eigenvector, and reading off the cyclic order from the phases of its components. Some examples follow.
We generated $d=12$ traces of harmonic functions $\sin(t-\alpha_k)$ with random phase offsets (cyclically ordered), and sampled them at $625$ samples per period for $16$ periods. We can see the banded, nearly cyclic lead matrix; the spectrum with just two nontrivial (complex conjugate) eigenvalues; and the components of the leading eigenvector close to the circle of radius $1/\sqrt{d}$.
Next, consider a similar example, adding to each of the samples an i.i.d. normally distributed value with $\sigma\approx 1.42$.
The recovery of the cyclic ordering is almost perfect.
The US economy goes through highly aperiodic cycles of business activity. At the aggregate level, the key indicator is GDP growth, but many correlates of business activity exist, most notably the unemployment data, bond returns, etc.
The data are notoriously noisy.
Investors have long sought to exploit the dependence of stock returns on the stage of the business cycle.
We considered the sectoral indices (as reported by Standard & Poor's) for the daily closings, over the past ten years.
These are aggregates corresponding to subsets of stock values tracked by S&P as part of their industrial portfolio.
It is "well known" that these industrial sectors exhibit some kind of cyclic behavior (some showing better returns at the start of a business cycle, others towards its end).
Here are the standard definitions of the industries:
Assuming the chain of offsets model, we analyzed 10 time series of the industrial sectors. The data were downloaded from
The resulting (cyclic) order, shown below, is in close agreement with the financial "folklore knowledge":
Cyclicity   Fidelity
---------   --------
FN          FN
IN          CY
CY          IN
UT          TC
EN          BM
NC          NC
BM          EN
HC          HC
TL          TL
TC          UT
We used the fMRI data supplied by the Human Connectome Project, a large consortium of research centers based at Washington University in St. Louis, aiming to aggregate high-quality annotated data. In this study, we pulled 200 fMRI traces of resting subjects.
The subjects were scanned using a customized Siemens MAGNETOM Skyra 3T at Washington University in St. Louis, Missouri, using the following parameters
For each of the subjects, the data were then aggregated into 33 voxel groups, the ROIs, corresponding to anatomically identified regions of the brain, often with well-defined associated functions.
For each of the subjects, our cyclicity analysis was run, yielding a collection of lead matrices ($33\times 33$ skew-symmetric matrices), the implied cyclic permutations, and the vectors of amplitudes of the components of the leading eigenvectors, serving as a proxy for signal strength.
The predictive power of the leading eigenvector in describing the cyclic structure of the ROIs in the time series can be deduced, heuristically, from the ratio of the absolute value of the leading eigenvalue to the largest of the absolute values of the remaining eigenvalues: if the reordered absolute values are $\lambda_1=\lambda_2>\lambda_3=\lambda_4>\ldots$, then the quality of the ordering defined by the components of $u_1$ is $\lambda_1/\lambda_3$.
Two representative sets of results (out of 200) are shown here. The first is a subject with a high $\lambda_1/\lambda_3$ ratio and, correspondingly, a strongly banded lead matrix after reordering the ROIs according to their phases in the leading eigenvector.
The results of another subject, shown below, are far less clear, with a relatively low $\lambda_1/\lambda_3$ ratio and an essentially absent banded structure of the lead matrix in the reordered basis.
Still, the majority of subjects showed a large $\lambda_1/\lambda_3$ ratio.
Our goal in this study was to understand whether there exists some common cyclic process, a consciousness pattern, that can be detected in a large number of the subjects.
We identified a group of ROIs consistently exhibiting a high signal. These results were then used to generate the list of the 10 most pronounced ROIs:
A striking pattern in the data is the total dominance of the left primary auditory cortex as compared to the right primary auditory cortex, which hardly shows up at all. Other symmetric regions appear in pairs, so this contrast is indicative of a strong asymmetry either in the functionality (implausible) or in the stimuli.
The lead matrix provides insight into the temporal, cyclic behavior of the interrelated time series. The patterns of cyclic behavior that are observed universally, or that are prevalent in a population, are of especial interest. Again, the left primary auditory cortex is a dominant ROI, represented in most of the frequent triples.
These results strongly suggest that the presumed default-network manifestations are triggered by an external stimulus.