sigshared contains general utilities used by many other R packages in the sigverse.
Unless you’re developing or maintaining a sigverse-compatible package, there is probably very little reason to ever download this package explicitly. We recommend just installing the full sigverse as described here.
You can install the development version of sigshared from GitHub with:
# install.packages("devtools")
devtools::install_github("selkamand/sigshared")

| Data Structure | Requirements |
|---|---|
| Signature | The profile of a mutational signature. data.frames with 3 columns: `channel`, `type`, `fraction` |
| Signature Collections | Lists of signature data.frames, where name of list entry is the name of the signature |
| Signature Annotations | Signature-level annotations. data.frames with 4 required columns: `signature`, `aetiology`, `class`, `subclass`. The class and subclass of an aetiology do not need to conform to any specific ontology. However, we include the data dictionary used by sigstash collections below (see Signature Aetiology Classes) |
| Catalogue | The mutational profile of a sample, described by tallying mutations belonging to each mutational channel. Catalogues are not always observational; they can also be simulated from signature models. data.frames with 4 required columns: `channel`, `type`, `fraction`, `count` |
| Catalogue Collections | Lists of catalogue data.frames (1 per sample) where name represents a sample identifier. |
| Cohort Signature Analysis Results | data.frame with 4 columns: `sample`, `signature`, `contribution_absolute`, `contribution`, plus a `p_value` column (see `…te_experimental_p_value()`) |
| Bootstraps | data.frame with 1 row per signature per bootstrap. Columns: `bootstrap`, `signature`, `contribution_absolute`, `contribution` |
| Model Specification | Named numeric vector where names represent signatures and values represent their proportional contribution to the model. |
| Cohort Metadata | Data frames describing sample-level metadata with required columns: `sample`, `disease`. Can include additional columns with other metadata. |
| UMAP | Data frames representing UMAP coordinates with at least 3 columns: `sample`, `dim1`, `dim2` |
| Similarity Against Cohort | data.frame that describes how similar a sample catalogue is to others in the cohort. Contains 2 columns: `sample`, `cosine_similarity` |
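
As a rough illustration of the shapes described in the table, the sketch below builds a catalogue and a catalogue collection by hand in base R. The channel names and counts are made up for illustration, and `fraction` is assumed to be each count divided by the total count, which matches the toy example output shown later in this README.

```r
# Hand-built toy catalogue: one row per mutational channel.
counts <- c(5, 10, 12)
catalogue <- data.frame(
  channel  = c("A[T>C]G", "A[T>C]C", "A[T>C]T"),
  type     = c("T>C", "T>C", "T>C"),
  count    = counts,
  fraction = counts / sum(counts),  # assumption: fraction = count / total count
  stringsAsFactors = FALSE
)

# A catalogue collection is a named list of such data.frames
# (names act as sample identifiers).
catalogue_collection <- list(sample1 = catalogue, sample2 = catalogue)

# Check that the required columns from the table are present.
required <- c("channel", "type", "fraction", "count")
all(required %in% colnames(catalogue))
```

The same pattern (a plain data.frame plus a check on column names) applies to the other data.frame-based structures in the table.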
You can assert that an object conforms to each of these data types:
library(sigshared)
# Generate Example Datatypes
# Signatures
signature = example_signature()
signature_collection = example_signature_collection()
signature_annotations = example_annotations()
# Catalogues
catalogue = example_catalogue()
catalogue_collection = example_catalogue_collection()
# Cohort Analysis Results
cohort_analysis = example_cohort_analysis()
# Model
model = example_model()
# Assert Signatures
assert_signature(signature)
assert_signature_collection(signature_collection)
assert_signature_annotations(signature_annotations)
# Assert catalogues
assert_catalogue(catalogue)
assert_catalogue_collection(catalogue_collection)
# Assert Analyses
assert_cohort_analysis(cohort_analysis)
# Assert Model
assert_model(model, signature_collection, arg_name = "bob")
library(knitr)
kable(sig_aetiology_classes())

| class | subclass |
|---|---|
| artefact | 8-oxo-guanine |
| artefact | sequencing_artefact |
| artefact | germline_contamination |
| artefact | oversegmentation |
| clock-like | clock-like |
| dysfunctional_dna_repair | MMR |
| dysfunctional_dna_repair | HR |
| dysfunctional_dna_repair | NER |
| dysfunctional_dna_repair | BER |
| dysfunctional_dna_repair | NHEJ |
| dysfunctional_dna_replication | proofreading |
| dysfunctional_dna_replication | polymerase_mutations |
| treatment_associated | chemotherapy_platinum |
| treatment_associated | chemotherapy_thiopurine |
| treatment_associated | chemotherapy_pyrimidine_antagonists |
| treatment_associated | chemotherapy_unknown |
| treatment_associated | chemotherapy_nitrogen_mustards |
| treatment_associated | triazenes |
| treatment_associated | immunosuppression |
| environmental_mutagens | tobacco |
| environmental_mutagens | haloalkanes |
| environmental_mutagens | UV |
| environmental_mutagens | aristolochic_acid |
| plants_and_microbes | aflatoxin |
| plants_and_microbes | colibactin |
| adenosine_deamination | adenosine_deaminases |
| cytosine_deamination | cytidine_deaminases |
| cytosine_deamination | cytosine_deamination |
| immune | ROS |
| dysfunctional_epigenetics | topology |
| chromosomal | chromosomal_losses |
| chromosomal | chromosomal_instability |
| chromosomal | chromothripsis |
| ploidy | diploid |
| ploidy | tetraploid |
| unknown | unknown |
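
Because annotations are not required to follow this dictionary, it can be useful to see which of your own class/subclass pairs do. The sketch below is one possible check in base R. It uses a hand-typed subset of the table above so that it runs without sigshared installed, and the `annotations` data and the `pair()` helper are hypothetical.

```r
# Toy subset of the class/subclass dictionary (values copied from the table above).
dictionary <- data.frame(
  class    = c("clock-like", "dysfunctional_dna_repair", "cytosine_deamination"),
  subclass = c("clock-like", "MMR", "cytidine_deaminases"),
  stringsAsFactors = FALSE
)

# Hypothetical annotations to check (not real sigshared output).
annotations <- data.frame(
  signature = c("sigA", "sigB"),
  aetiology = c("A clock-like signature", "A made-up signature"),
  class     = c("clock-like", "my_custom_class"),
  subclass  = c("clock-like", "my_custom_subclass"),
  stringsAsFactors = FALSE
)

# Flag rows whose class/subclass pair appears in the dictionary.
pair <- function(df) paste(df$class, df$subclass, sep = "::")
annotations$in_dictionary <- pair(annotations) %in% pair(dictionary)
annotations[, c("signature", "class", "subclass", "in_dictionary")]
```

With the package installed, the full dictionary returned by `sig_aetiology_classes()` could presumably be used in place of the hand-typed subset.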
- `catalogue`: A single mutational catalogue for a sample. A `data.frame` with columns: `channel`, `type`, `fraction`, `count`. Catalogues may be empirical (observed) or simulated. See: `sigshared::example_catalogue()`
- `signature`: A mutational signature profile. A `data.frame` with columns: `type`, `channel`, `fraction`. See: `sigshared::example_signature()`
- `signatures`: A collection of signatures. A named list of `signature` data.frames. See: `sigshared::example_signature_collection()`
- `catalogues`: A collection of catalogues. A named list of `catalogue` data.frames. See: `sigshared::example_catalogue_collection()`
- `model`: A named numeric vector describing a signature mixture. Names are signature IDs, values are their proportional contributions (e.g., `c(SBS1 = 0.6, SBS5 = 0.4)`). See: `sigshared::example_model()`
- `cohort`: A data.frame describing signature contributions per sample. Columns: `sample`, `signature`, `contribution_absolute`, `contribution`. See: `sigshared::example_cohort_analysis()`
- `cohort_metadata`: Sample-level metadata as a data.frame. Must include columns: `sample`, `disease`. Can include others. See: `sigshared::example_cohort_metadata()`
- `similarity_against_cohort`: A data.frame summarizing the similarity between a sample and each of the other samples in the cohort. Columns: `sample`, `cosine_similarity`. See: `sigshared::example_similarity_against_cohort()`
- `umap`: A 2D UMAP projection of catalogue similarities. A data.frame with columns: `sample`, `dim1`, `dim2`. See: `sigshared::example_umap()`
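
To show how a `model`, a set of `signatures`, and a simulated `catalogue` relate, here is a minimal base R sketch that mixes two toy signatures according to a model. This is not sigshared's own simulation code. The signature fractions and the total of 1000 mutations are assumptions chosen for illustration, and the resulting counts are expected values with no sampling noise.

```r
# Two toy signatures sharing the same channels (fractions sum to 1 within each).
channels <- c("A[T>C]G", "A[T>C]C", "A[T>C]T")
signatures <- list(
  sig1 = data.frame(channel = channels, type = "T>C", fraction = c(0.4, 0.1, 0.5)),
  sig2 = data.frame(channel = channels, type = "T>C", fraction = c(0.2, 0.6, 0.2))
)

# Model: named numeric vector of proportional contributions.
model <- c(sig1 = 0.3, sig2 = 0.7)

# Weighted sum of signature fractions gives the expected profile of the mixture.
fraction_matrix <- sapply(signatures[names(model)], function(s) s$fraction)
expected_fraction <- as.vector(fraction_matrix %*% model)

# Scale to a hypothetical total of 1000 mutations to get expected counts.
simulated_catalogue <- data.frame(
  channel  = channels,
  type     = "T>C",
  fraction = expected_fraction,
  count    = expected_fraction * 1000
)
simulated_catalogue
```

Indexing `signatures` by `names(model)` keeps the matrix columns in the same order as the model weights, which is what makes the matrix product valid.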
For each data structure used by the sigverse we include a toy example.
example_annotations()
#> signature aetiology class
#> 1 sig1 A clock like signature clock-like
#> 2 sig2 An AID/APOBEC related signature cytidine deaminases
#> subclass
#> 1 clock-like
#> 2 cytidine deaminases
example_bootstraps()
#> bootstrap signature contribution_absolute contribution
#> 1 1 Signature1 300 0.30
#> 2 1 Signature2 690 0.69
#> 3 1 Signature3 10 0.01
#> 4 2 Signature1 440 0.44
#> 5 2 Signature2 500 0.50
#> 6 2 Signature3 60 0.06
example_catalogue()
#> channel type count fraction
#> 1 A[T>C]G T>C 5 0.1851852
#> 2 A[T>C]C T>C 10 0.3703704
#> 3 A[T>C]T T>C 12 0.4444444
example_catalogue_collection()
#> $catalogue1
#> channel type count fraction
#> 1 A[T>C]G T>C 5 0.1851852
#> 2 A[T>C]C T>C 10 0.3703704
#> 3 A[T>C]T T>C 12 0.4444444
#>
#> $catalogue2
#> channel type count fraction
#> 1 A[T>C]G T>C 5 0.1851852
#> 2 A[T>C]C T>C 10 0.3703704
#> 3 A[T>C]T T>C 12 0.4444444
#>
#> $catalogue3
#> channel type count fraction
#> 1 A[T>C]G T>C 5 0.1851852
#> 2 A[T>C]C T>C 10 0.3703704
#> 3 A[T>C]T T>C 12 0.4444444
example_signature()
#> channel type fraction
#> 1 A[T>C]G T>C 0.4
#> 2 A[T>C]C T>C 0.1
#> 3 A[T>C]T T>C 0.5
example_signature_collection()
#> $sig1
#> channel type fraction
#> 1 A[T>C]G T>C 0.4
#> 2 A[T>C]C T>C 0.1
#> 3 A[T>C]T T>C 0.5
#>
#> $sig2
#> channel type fraction
#> 1 A[T>C]G T>C 0.4
#> 2 A[T>C]C T>C 0.1
#> 3 A[T>C]T T>C 0.5
example_model()
#> sig1 sig2
#> 0.3 0.7
example_cohort_analysis()
#> sample signature contribution_absolute contribution p_value
#> 1 sample1 sig1 3 0.3 0.05
#> 2 sample1 sig2 7 0.7 0.10
#> 3 sample2 sig1 40 0.4 0.20
#> 4 sample2 sig2 60 0.6 0.15
example_similarity_against_cohort()
#> sample cosine_similarity
#> 1 sample1 0.95
#> 2 sample2 0.89
#> 3 sample3 0.78
#> 4 sample4 0.85
#> 5 sample5 0.92
#> 6 sample6 0.88
#> 7 sample7 0.90
#> 8 sample8 0.86
#> 9 sample9 0.80
#> 10 sample10 0.84
example_cohort_metadata()
#> sample disease
#> 1 sample1 Melanoma
#> 2 sample2 Melanoma
#> 3 sample3 Melanoma
#> 4 sample4 Melanoma
#> 5 sample5 Melanoma
#> 6 sample6 Lung Cancer
#> 7 sample7 Lung Cancer
#> 8 sample8 Lung Cancer
#> 9 sample9 Lung Cancer
#> 10 sample10 Lung Cancer
example_umap()
#> sample dim1 dim2
#> 1 sample1 0.5 -0.4
#> 2 sample2 1.2 0.6
#> 3 sample3 -0.7 1.3
#> 4 sample4 2.3 -0.8
#> 5 sample5 -1.5 2.0
#> 6 sample6 0.0 -1.2
#> 7 sample7 1.8 0.5
#> 8 sample8 -0.3 1.1
#> 9 sample9 0.9 -0.9
#> 10 sample10 -1.1 0.0

We also include examples from real SBS mutational signature analysis of the colo829 melanoma cell line:
head(example_catalogue_colo829())
#> channel type fraction count
#> 1 A[C>A]A C>A 0.0035802073 134
#> 2 A[C>A]C C>A 0.0016565138 62
#> 3 A[C>A]G C>A 0.0004542054 17
#> 4 A[C>A]T C>A 0.0015229240 57
#> 5 A[C>G]A C>G 0.0016565138 62
#> 6 A[C>G]C C>G 0.0011488725 43
head(example_bootstraps_colo829())
#> signature bootstrap contribution_absolute contribution
#> 1 SBS1 Rep_1 0.0000 0.000000000
#> 2 SBS2 Rep_1 271.9152 0.007265021
#> 3 SBS3 Rep_1 0.0000 0.000000000
#> 4 SBS4 Rep_1 0.0000 0.000000000
#> 5 SBS5 Rep_1 0.0000 0.000000000
#> 6 SBS6 Rep_1 0.0000 0.000000000

# List of signatures -> matrix (rows = channels; columns = signatures; values = fractions)
sig_collection_reformat_list_to_matrix(example_signature_collection())
#> sig1 sig2
#> A[T>C]G 0.4 0.4
#> A[T>C]C 0.1 0.1
#> A[T>C]T 0.5 0.5
#> attr(,"type")
#> [1] "T>C" "T>C" "T>C"
# Matrix -> List of signatures
sig_collection_reformat_matrix_to_list(example_signature_collection_matrix(),
values = "fraction")
#> $sig1
#> channel type fraction
#> 1 A[T>C]G T>C 0.4
#> 2 A[T>C]C T>C 0.1
#> 3 A[T>C]T T>C 0.5
#>
#> $sig2
#> channel type fraction
#> 1 A[T>C]G T>C 0.4
#> 2 A[T>C]C T>C 0.1
#> 3 A[T>C]T T>C 0.5
# List of signatures -> tidy data.frame
sig_collection_reformat_list_to_tidy(example_signature_collection())
#> signature type channel fraction
#> 1 sig1 T>C A[T>C]G 0.4
#> 2 sig1 T>C A[T>C]C 0.1
#> 3 sig1 T>C A[T>C]T 0.5
#> 4 sig2 T>C A[T>C]G 0.4
#> 5 sig2 T>C A[T>C]C 0.1
#> 6 sig2 T>C A[T>C]T 0.5
# Tidy data.frame -> list of signatures
sig_collection_reformat_tidy_to_list(example_signature_collection_tidy())
#> $sig1
#> type channel fraction
#> 1 T>C A[T>C]G 0.4
#> 2 T>C A[T>C]C 0.1
#> 3 T>C A[T>C]T 0.5
#>
#> $sig2
#> type channel fraction
#> 4 T>C A[T>C]G 0.4
#> 5 T>C A[T>C]C 0.1
#> 6 T>C A[T>C]T 0.5
# All the above methods work with catalogues
sig_collection_reformat_list_to_matrix(example_catalogue_collection(), values = "count")
#> catalogue1 catalogue2 catalogue3
#> A[T>C]G 5 5 5
#> A[T>C]C 10 10 10
#> A[T>C]T 12 12 12
#> attr(,"type")
#> [1] "T>C" "T>C" "T>C"
sig_collection_reformat_matrix_to_list(example_catalogue_collection_matrix(),
values = "count")
#> $catalogue1
#> channel type count fraction
#> 1 A[T>C]G T>C 5 0.1851852
#> 2 A[T>C]C T>C 10 0.3703704
#> 3 A[T>C]T T>C 12 0.4444444
#>
#> $catalogue2
#> channel type count fraction
#> 1 A[T>C]G T>C 5 0.1851852
#> 2 A[T>C]C T>C 10 0.3703704
#> 3 A[T>C]T T>C 12 0.4444444
#>
#> $catalogue3
#> channel type count fraction
#> 1 A[T>C]G T>C 5 0.1851852
#> 2 A[T>C]C T>C 10 0.3703704
#> 3 A[T>C]T T>C 12 0.4444444

sigverse dependencies are kept minimal. To mitigate the cost to readability, sigshared includes some base R implementations of common data.frame manipulation functions and other utilities.
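
To give a sense of what such base R helpers involve, the sketch below shows stand-ins for a rename helper and a fraction helper. `rename_cols()` and `fraction_of_total()` are hypothetical names written for this illustration and are not sigshared's implementations. The handling of all-zero input is likewise an assumption.

```r
# Hypothetical stand-in for a rename helper: `new_names` maps new name -> old name.
rename_cols <- function(df, new_names) {
  idx <- match(unname(new_names), colnames(df))
  if (anyNA(idx)) {
    stop("Column(s) not found: ", paste(new_names[is.na(idx)], collapse = ", "))
  }
  colnames(df)[idx] <- names(new_names)
  df
}

# Hypothetical stand-in for a fraction helper: each value divided by the total.
fraction_of_total <- function(x) {
  total <- sum(x)
  if (total == 0) return(rep(0, length(x)))  # assumption: all-zero input maps to zeros
  x / total
}

df <- data.frame(mpg = c(21, 22.8), cyl = c(6, 4))
colnames(rename_cols(df, c(miles_per_gallon = "mpg")))
fraction_of_total(c(1, 100, 10, 40))
```

The helpers that sigshared actually exports are demonstrated next.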
# Setup data
mtcars <- mtcars[1:5,1:5]
# Rename a dataframe
brename(mtcars, c(miles_per_gallon = "mpg"))
#> miles_per_gallon cyl disp hp drat
#> Mazda RX4 21.0 6 160 110 3.90
#> Mazda RX4 Wag 21.0 6 160 110 3.90
#> Datsun 710 22.8 4 108 93 3.85
#> Hornet 4 Drive 21.4 6 258 110 3.08
#> Hornet Sportabout 18.7 8 360 175 3.15
# Select a subset of columns
bselect(mtcars, c("mpg"))
#> mpg
#> Mazda RX4 21.0
#> Mazda RX4 Wag 21.0
#> Datsun 710 22.8
#> Hornet 4 Drive 21.4
#> Hornet Sportabout 18.7
# Evaluate code with a specific random seed
with_seed(seed = 123, { runif(1) })
#> [1] 0.2875775
# Compute fraction from count vector
compute_fraction(c(1, 100, 10, 40))
#> [1] 0.006622517 0.662251656 0.066225166 0.264900662

Most types of data used by the sigverse are simple enough that we can just expect data.frames/lists and use custom assertions to ensure they match expectations. There are a couple of exceptions, however, where we provide S3 classes to abstract away complexity in the data stores. For example, we use 2 different S3 objects to store signature analysis results.
- `signature_analysis_results`: The numeric results of signature analysis (no visualisation). Designed to allow creation of all sigstory visualisations EXCLUSIVELY from data in the object.
- `sigstory_visualisations`: Contains mostly visualisations, plus the metrics we display in sigstory reports. This helps us keep sigstory very light (logic is hard to debug in knitted Quarto templates). Additionally, this object type allows us to easily write all visualisations to disk so they can be pulled into non-sigstory reporting tools.
Unless you’re developing a sigstory-compatible R package, you probably won’t ever need to think about either of these object types.
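
For readers unfamiliar with S3, the sketch below shows the general pattern of wrapping plain data in a classed list with a validating constructor and a print method. The class name `toy_results` and its fields are hypothetical and do not reflect the actual structure of the sigverse objects.

```r
# Hypothetical constructor: validates inputs, then tags a list with an S3 class.
new_toy_results <- function(sample, contributions) {
  stopifnot(is.character(sample), length(sample) == 1)
  stopifnot(is.numeric(contributions), !is.null(names(contributions)))
  structure(
    list(sample = sample, contributions = contributions),
    class = "toy_results"
  )
}

# Print method: callers see a short summary rather than the underlying list.
print.toy_results <- function(x, ...) {
  cat("Toy signature analysis results for", x$sample, "\n")
  cat("Signatures:", paste(names(x$contributions), collapse = ", "), "\n")
  invisible(x)
}

res <- new_toy_results("sample1", c(sig1 = 0.3, sig2 = 0.7))
print(res)
inherits(res, "toy_results")
```

Keeping validation in the constructor means downstream code can rely on the class tag instead of re-checking the contents each time.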