FLOPART: Functional Labeled Optimal Partitioning

https://github.com/tdhock/FLOPART/workflows/R-CMD-check/badge.svg

Functional Labeled Optimal Partitioning (FLOPART) is an optimal peak detection algorithm with label constraints. The dynamic programming algorithm computes a segmentation and corresponding set of peaks which minimizes the penalized Poisson loss (for non-negative integer count data), subject to the following constraints:

Installation

if(require("remotes"))install.packages("remotes")
remotes::install_github("tdhock/FLOPART")

Usage

The main driver function is FLOPART, which takes three arguments * coverage is a data table with columns chromStart, chromEnd, count. * label is a data table with columns chromStart, chromEnd, annotation. * penalty is a non-negative numeric value, larger for fewer changepoints/peaks.

The constraints for the different label types are as follows:

For a simple real data example,

> library(data.table) # for print method.
> data("Mono27ac.simple", package="FLOPART")
> Mono27ac.simple
$coverage
      chrom chromStart chromEnd count
   1: chr11     145000   146765     0
   2: chr11     146765   146807     1
   3: chr11     146807   175254     0
   4: chr11     175254   175296     1
   5: chr11     175296   175738     0
  ---                                
2975: chr11     326980   326981     5
2976: chr11     326981   326983     6
2977: chr11     326983   326985     5
2978: chr11     326985   326992     4
2979: chr11     326992   327000     3

$label
   chrom chromStart chromEnd annotation
1: chr11     180000   200000    noPeaks
2: chr11     208000   220000    peakEnd
3: chr11     300000   308250  peakStart
4: chr11     308260   320000    peakEnd

To run FLOPART on these data,

fit <- with(Mono27ac.simple, FLOPART::FLOPART(coverage, label, penalty=1400))

The fit is a list of results; we can use head to look at the first few items in each of these results.

> lapply(fit, head)
$coverage_dt
   chromStart chromEnd count weight
1:     145000   146765     0   1765
2:     146765   146807     1     42
3:     146807   175254     0  28447
4:     175254   175296     1     42
5:     175296   175738     0    442
6:     175738   175780     1     42

$label_dt
   chromStart chromEnd annotation type firstRow lastRow
1:     180000   200000    noPeaks    0       14     118
2:     208000   220000    peakEnd   -1      725    1322
3:     300000   308250  peakStart    1     2723    2770
4:     308260   320000    peakEnd   -1     2773    2871

$cost_mat
           [,1]       [,2]
[1,] 0.00000000 0.00000000
[2,] 0.11067717 0.11067717
[3,] 0.01052251 0.01052251
[4,] 0.01909784 0.01909784
[5,] 0.01886280 0.01886280
[6,] 0.02660139 0.02660139

$intervals_mat
     [,1] [,2]
[1,]    1    1
[2,]    2    1
[3,]    3    3
[4,]    3    5
[5,]    4    4
[6,]    4    5

$segments_dt
          mean firstRow lastRow state chromStart chromEnd     status
1:  0.06556501        1     197     0     145000   206725 background
2: 13.17743878      198    1132  2986     206725   209216       peak
3:  0.22431609     1133    1423     0     209216   236120 background
4:  7.38781362     1424    1792  2986     236120   237515       peak
5:  0.19459495     1793    2079     0     237515   267598 background
6:  3.68970814     2080    2552  2986     267598   270853       peak

The main output of the algorithm is segments_dt (shown above) which is a data table of optimal segment means, given the penalty and constraints. Each row of this data table describes a segment in the optimal model:

Other outputs which could be of interest:

Related work