Introduction to rosette simulation

Example

Again, we use the oct1 dataset as our example.

Read Processed Rosace Object

library(rosace)
data("oct1_rosace")
key <- "1SM73"
type <- "growth"

Create Naive Score

Rosette learns the distributional properties of variant scores from score estimates. The score estimates can be naive (e.g., simple linear regression) or more complicated (e.g., rosace).

oct1_rosace <- runSLR(oct1_rosace, name = "1SM73_2", type = "Assay")
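To make the idea of a naive score concrete, here is a minimal base-R sketch of a simple-linear-regression score for a single variant. It is an illustration only: the toy counts, the pseudocount of 0.5, and the normalization against a control pool are assumptions, and runSLR may compute its score differently.

```r
# Toy counts for one variant and a control pool over four rounds.
# All numbers are made up for illustration.
rounds <- 0:3
variant_count <- c(100, 80, 64, 51)
control_count <- c(1000, 1000, 1000, 1000)

# Log ratio of variant to control counts; the 0.5 pseudocount
# (an assumption) avoids log(0) for unobserved variants.
log_ratio <- log((variant_count + 0.5) / (control_count + 0.5))

# Naive score: slope of the log ratio regressed on the round index.
fit <- lm(log_ratio ~ rounds)
naive_score <- unname(coef(fit)["rounds"])
naive_score
```

A negative slope indicates a variant that is depleted relative to the control pool over the rounds; a slope near zero indicates a neutral variant.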

Create Rosette Object

rosette <- CreateRosetteObject(object = oct1_rosace,
                               score.name = "1SM73_2_SLR",
                               pos.col = "position", mut.col = "mutation",
                               ctrl.col = "type", ctrl.name = "synonymous",
                               project.name = "1SM73_2_SLR")

Generate Summary Statistics

Dispersion

Two dispersion parameters, one for the sequencing counts and one for the variant library, are calculated from the raw counts. The former measures how much variability in variant representation there is before and during sequencing, and the latter measures how much variability in variant representation there is before cell selection. Both dispersion parameters are inferred automatically when “CreateRosetteObject” is called.

rosette@disp

## [1] 10.31088

rosette@disp.start

## [1] 8.114383

Mutant Group Label

To account for similar functional effects among mutants (substitutions, insertions, or deletions of amino acids), we categorize them into mutant groups using hierarchical clustering.

hclust <- HclustMutant(rosette, save.plot = FALSE)
rosette <- GenMutLabel(rosette, hclust = hclust, Gmut = 4, save.plot = FALSE)
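The grouping step can be pictured with base R alone. The sketch below clusters mutants by the similarity of their score profiles across positions and cuts the tree into four groups, mirroring Gmut = 4. The toy score matrix, the correlation-based distance, and the average linkage are all assumptions made for illustration; HclustMutant and GenMutLabel may use different choices.

```r
set.seed(1)

# Toy score matrix: rows are positions, columns are mutant amino acids.
aa <- c("A", "D", "E", "K", "L", "R", "V", "W")
scores <- matrix(rnorm(50 * length(aa)), nrow = 50,
                 dimnames = list(NULL, aa))

# Distance between mutants: one minus the correlation of their
# score profiles across positions (an assumed choice of distance).
mut_dist <- as.dist(1 - cor(scores))

# Agglomerative clustering, then cut the tree into four mutant groups.
tree <- stats::hclust(mut_dist, method = "average")
mut_group <- cutree(tree, k = 4)
mut_group
```

The result is a named integer vector that assigns each mutant to one of the four groups.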

Variant Group Label

Within each mutant group, a variant can have a neutral, loss-of-function, or gain-of-function effect. We therefore categorize the variants into variant groups and estimate the score distribution parameters for each group. The number of groups is set by Gvar: up to three for the effect classes above, while this example uses Gvar = 2.

PlotScoreHist(rosette, var.group = FALSE, mut.group = FALSE)

rosette <- GenVarLabel(rosette, Gvar = 2)
PlotScoreHist(rosette, var.group = TRUE, mut.group = TRUE)

Weight of variant group within mutant group

Then, infer the distribution of the number of variants in each variant group and mutant group at each position. ‘pos.missing’ specifies the fraction of missing variants allowed at each position (0.2 corresponds to 20%).

rosette <- PMVCountDist(rosette, pos.missing = 0.2)
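As a rough picture of what is being counted here, the base-R sketch below tabulates a toy variant table by position, mutant group, and variant group, then converts the counts to within-position proportions. The data frame, its column names, and the group labels are hypothetical, and this is not a description of how PMVCountDist works internally.

```r
# Toy variant table with hypothetical group labels.
variants <- data.frame(
  position  = c(1, 1, 1, 2, 2, 2, 2),
  mut_group = c("mut1", "mut1", "mut2", "mut1", "mut2", "mut2", "mut2"),
  var_group = c("var1", "var2", "var1", "var1", "var1", "var2", "var2")
)

# Number of variants in each (mutant group, variant group) cell
# at each position.
counts <- table(variants$position, variants$mut_group, variants$var_group)

# Within-position proportions: each position's cells sum to 1.
props <- prop.table(counts, margin = 1)
props
```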

Create config for simulation

Next, create a config with the remaining user-defined simulation properties: the number of rounds and replicates, the experiment type (growth or binding), the wild-type effect (binding) or doubling rate (growth), the sequencing depth, the shrinkage factors for library and sequencing dispersion, and the simulation mode (clean or with replication error).

cfg <- CreateConfig(rosette,
                    n.sim = 2, save.sim = "../tests/results/sim", type.sim = "growth",
                    n.rep = 3, n.round = 3,
                    null.var.group = "var1", wt.effect = -2,
                    seq.shrink = 1.2, seq.depth = 100,
                    lib.shrink = 2,
                    var.shrink = 1, pos.flag = TRUE,
                    mode.sim = "clean")

Run simulation

Finally, run the simulation and save the results in the desired output formats.

runRosette(config = cfg, save.tsv = TRUE, save.rosace = TRUE, save.enrich2 = TRUE)