[ascl:2105.006]
The Sequencer: Detect one-dimensional sequences in complex datasets
The Sequencer reveals the main sequence in a dataset, if one exists. To do so, it reorders the objects within a set to produce the most elongated manifold describing their similarities, which are measured in a multi-scale manner using a collection of metrics. To be generic, it combines information from four different metrics: the Euclidean distance, the Kullback-Leibler divergence, the Monge-Wasserstein (Earth Mover) distance, and the energy distance. It considers different scales of the data by dividing each object in the input data into separate parts (chunks) and estimating pairwise similarities between the chunks. It then aggregates the information from the chunks into a single estimator for each combination of metric and scale.
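The chunking step described above can be illustrated with a minimal sketch. It is not The Sequencer's own code: the function name `chunked_euclidean_distance`, the use of only the Euclidean metric, and the plain mean used to combine chunks are all assumptions made for illustration, and the package's actual aggregation may differ.

```python
import numpy as np

def chunked_euclidean_distance(data, n_chunks):
    """Hypothetical helper: one distance matrix for one (metric, scale) pair.

    data     : array of shape (n_objects, n_features), one row per object
    n_chunks : number of parts each object is split into (the "scale")
    """
    data = np.asarray(data, dtype=float)
    # Split every object into the same n_chunks parts along the feature axis.
    chunks = np.array_split(data, n_chunks, axis=1)
    per_chunk = []
    for chunk in chunks:
        # Pairwise Euclidean distance between objects, within this chunk only.
        diff = chunk[:, None, :] - chunk[None, :, :]
        per_chunk.append(np.sqrt((diff ** 2).sum(axis=-1)))
    # Assumption: combine the chunks with a plain, unweighted mean.
    return np.mean(per_chunk, axis=0)

# Toy input: 5 objects with 8 features each, evaluated at three scales.
rng = np.random.default_rng(0)
objects = rng.random((5, 8))
distance_by_scale = {n: chunked_euclidean_distance(objects, n) for n in (1, 2, 4)}
```

With `n_chunks=1` the result is the ordinary pairwise Euclidean distance matrix. Larger values emphasize differences that are localized in part of each object.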
[ascl:1705.015]
WeirdestGalaxies: Outlier Detection Algorithm on Galaxy Spectra
WeirdestGalaxies finds the weirdest galaxies in the Sloan Digital Sky Survey (SDSS) using a basic outlier detection algorithm. It uses an unsupervised Random Forest (RF) algorithm to assign a similarity measure (or distance) to every pair of galaxy spectra in the SDSS. It then uses the distance matrix to find the galaxies that have the largest average distance from the rest of the galaxies in the sample and defines them as outliers.
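The final ranking step can be sketched as follows, assuming a pairwise distance matrix is already available. The sketch does not reproduce the Random Forest step or the WeirdestGalaxies code. The function name `rank_outliers` and the toy one-dimensional data are hypothetical and serve only to show how an average-distance score singles out an outlier.

```python
import numpy as np

def rank_outliers(distance_matrix, n_outliers):
    """Hypothetical helper: rank objects by mean distance to all other objects."""
    d = np.asarray(distance_matrix, dtype=float)
    n = d.shape[0]
    # Average distance to the rest of the sample, excluding the self-distance.
    mean_distance = (d.sum(axis=1) - np.diag(d)) / (n - 1)
    # Largest mean distance first.
    order = np.argsort(mean_distance)[::-1]
    return order[:n_outliers], mean_distance

# Toy stand-in for a distance matrix: four points on a line, one far away.
points = np.array([0.0, 1.0, 2.0, 10.0])
toy_distances = np.abs(points[:, None] - points[None, :])
top, scores = rank_outliers(toy_distances, 1)
```

In this toy case the isolated point at 10.0 (index 3) receives the highest score. In practice the number of objects to flag, or a threshold on the score, is a choice left to the user.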