Journal
eLife
Publication Date
2021
Volume
10
First Page
e67403
Document Type
Open Access Publication
DOI
10.7554/eLife.67403
Rights and Permissions
eLife 2021;10:e67403 DOI: 10.7554/eLife.67403. Copyright Friedman et al. This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.
Recommended Citation
Friedman, Ryan Z.; Granas, David M.; Myers, Connie A.; Corbo, Joseph C.; Cohen, Barak A.; and White, Michael A., "Information content differentiates enhancers from silencers in mouse photoreceptors." eLife. 10, e67403 (2021).
https://digitalcommons.wustl.edu/open_access_pubs/10945
Sequences were named using the following nomenclature: ‘chrom-start-stop_annotations_variant’. ‘Chrom’, ‘start’, and ‘stop’ correspond to the mm10 genomic coordinates of the sequence in BED format. ‘Annotations’ is a four-letter string in which the first position indicates CRX-binding status (ChIP-seq peak or Unbound), the second position indicates CRX motif status (PWM hit, Shape motif, or Both PWM and shape motif), the third position indicates ATAC-seq status (peak in Rods but not cones, peak in Cones but not rods, peak in both rod and cone Photoreceptors, or peak in None of the above), and the fourth position indicates histone ChIP-seq status (‘Enhancer marked’ with H3K27Ac+H3K4me3-, ‘Promoter marked’ with H3K27Ac+H3K4me3+, Q for H3K27Ac-H3K4me3+, or Neither mark). ‘Variant’ indicates whether the entry is the genomic sequence (‘WT’), the sequence with all CRX motifs mutated (‘MUT-allCrxSites’), the sequence with its shape motif scrambled (‘MUT-shape’), or a scrambled control (‘scrambled’).
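A minimal sketch of how a sequence name could be split into the fields described above. The single-letter codes in `ANNOTATION_CODES` are hypothetical: they are inferred from the capitalized words in the description (for example "ChIP-seq peak" as C, "Unbound" as U) and should be checked against the actual FASTA headers. The parser also assumes chromosome names contain no underscores, which holds for the main mm10 chromosomes. `parse_name` and `strip_variant` are illustrative names, not part of the published materials.

```python
from dataclasses import dataclass

# HYPOTHETICAL letter codes inferred from the capitalized words in the
# nomenclature description; verify against the real sequence names.
ANNOTATION_CODES = (
    {"C": "CRX ChIP-seq peak", "U": "unbound"},
    {"P": "PWM hit", "S": "shape motif", "B": "both PWM and shape motif"},
    {"R": "rods only", "C": "cones only", "P": "rods and cones", "N": "none"},
    {"E": "enhancer marked", "P": "promoter marked",
     "Q": "H3K27Ac-H3K4me3+", "N": "neither mark"},
)


@dataclass
class SequenceName:
    chrom: str
    start: int  # BED-style 0-based start
    stop: int   # BED-style exclusive end
    annotations: tuple
    variant: str


def parse_name(name: str) -> SequenceName:
    """Split 'chrom-start-stop_annotations_variant' into its fields."""
    # At most two splits, so a variant such as 'MUT-allCrxSites' stays intact.
    coords, annot, variant = name.split("_", 2)
    # Split coordinates from the right so only the last two hyphens are used.
    chrom, start, stop = coords.rsplit("-", 2)
    if len(annot) != 4:
        raise ValueError(f"expected a four-letter annotation, got {annot!r}")
    decoded = tuple(
        codes.get(letter, f"unrecognized code {letter!r}")
        for codes, letter in zip(ANNOTATION_CODES, annot)
    )
    return SequenceName(chrom, int(start), int(stop), decoded, variant)


def strip_variant(name: str) -> str:
    """Drop the '_variant' suffix, giving the 'chrom-start-stop_annotations'
    label used for rows of the expression table."""
    return "_".join(name.split("_", 2)[:2])
```

For a made-up name such as `chr1-1000-1164_CPRE_MUT-allCrxSites`, `parse_name` yields chromosome `chr1`, coordinates 1000 to 1164, and variant `MUT-allCrxSites`. `strip_variant` returns `chr1-1000-1164_CPRE`, which is the form used to name rows in the expression table.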
Supplementary file 2 FASTA file of all sequences in library 2.txt (995 kB)
Sequences were named as in Supplementary file 1.
Supplementary file 3 Expression measurements and annotations of all sequences.txt (1884 kB)
Values are tab-delimited. Rows are named by the sequence name from Supplementary files 1 and 2 without the ‘variant’ information. Columns ending in ‘_WT’ refer to the wild-type sequence with the Rho promoter, ‘_MUT’ to the CRX motif mutant sequence with the Rho promoter, and ‘_POLY’ to the wild-type sequence with the Polylinker. Sequences with the scrambled shape motif were excluded from the ‘_MUT’ columns. Columns are named as follows:
label: the sequence name from Supplementary files 1 and 2 without the ‘variant’ information.
expression: average activity of the sequence (NaN indicates that the sequence was missing from the plasmid pool).
expression_std: standard deviation of activity.
expression_reps: number of replicates in which the sequence was measured.
expression_pvalue: p-value from Welch’s t-test on log-normal data for the null hypothesis that the activity of the sequence with Rho is no different from that of the Rho promoter alone.
expression_qvalue: FDR-corrected p-value.
library: the library that contains the sequence.
expression_log2: log2 of the average activity of the sequence.
group_name: activity classification of the sequence with the Rho promoter.
plot_color: hex color code used for visualization.
variant: the ‘variant’ portion of the sequence identifier.
wt_vs_mut_log2: log2 fold change between the wild-type and mutant versions of the sequence (NaN indicates that the wild-type and/or mutant version was not measured).
wt_vs_mut_pvalue: p-value from Welch’s t-test for the null hypothesis that the wild-type and mutant sequences have the same activity.
wt_vs_mut_qvalue: FDR-corrected p-value.
autonomous_activity: Boolean indicating whether the wild-type sequence has autonomous activity with the Polylinker.
crx_bound, nrl_bound, and mef2d_bound: Booleans indicating whether the sequence overlaps a ChIP-seq peak for the corresponding TF.
binding_group: string denoting which of the eight possible combinations of CRX, NRL, and MEF2D binding applies to the sequence.
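The description says only that the q-value columns are "FDR-corrected" p-values and does not name the procedure. The sketch below assumes the Benjamini-Hochberg method, a common choice, and is not presented as the authors' code. It treats NaN p-values (for example, sequences missing from the plasmid pool) as untested: they stay NaN and are excluded from the number of tests. `benjamini_hochberg` is an illustrative name.

```python
import math


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (an ASSUMED FDR method).

    NaN entries are returned as NaN and do not count toward the number
    of tests."""
    # Pair each finite p-value with its original position, sorted ascending.
    ranked = sorted((p, i) for i, p in enumerate(pvalues) if not math.isnan(p))
    n = len(ranked)
    qvalues = [math.nan] * len(pvalues)
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(n, 0, -1):
        p, i = ranked[rank - 1]
        running_min = min(running_min, p * n / rank)
        qvalues[i] = running_min
    return qvalues
```

For the p-values `[0.01, nan, 0.04, 0.03, 0.005]`, the four finite values are ranked, scaled by n/rank (with n = 4), and made monotone, giving approximately `[0.02, nan, 0.04, 0.04, 0.02]`.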
Supplementary file 4 Predicted occupancy scores for each transcription factor (TF) and each sequence.txt (1962 kB)
Values are tab-delimited. Rows are named based on the sequence name from Supplementary files 1 and 2 including the ‘variant’ information. Columns are the predicted occupancy scores for the denoted TF.
Supplementary file 5 Information content and related metrics for each sequence.txt (757 kB)
Values are tab-delimited. Rows are named by the sequence name from Supplementary files 1 and 2, including the ‘variant’ information. Columns are named as follows: total_occupancy, total predicted occupancy of all eight transcription factors (TFs); diversity, number of TFs with predicted occupancy above 0.5; entropy, information content (the two terms are used interchangeably here).
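The total_occupancy and diversity columns follow directly from the per-TF predicted occupancy scores in Supplementary file 4. A minimal sketch, assuming that file has one label column followed by one column per TF (the exact header layout is an assumption). The entropy column is not reproduced because its formula is not given in this description. The function names and the short example table are hypothetical.

```python
import csv
import io


def occupancy_summary(occupancies, threshold=0.5):
    """total_occupancy = sum of predicted occupancies across TFs;
    diversity = number of TFs whose occupancy is strictly above threshold."""
    total = sum(occupancies.values())
    diversity = sum(1 for score in occupancies.values() if score > threshold)
    return {"total_occupancy": total, "diversity": diversity}


def summarize_table(handle):
    """Summarize every row of a tab-delimited occupancy table.

    ASSUMES the first column holds the sequence name and every remaining
    column holds one TF's predicted occupancy."""
    reader = csv.reader(handle, delimiter="\t")
    tf_names = next(reader)[1:]
    results = {}
    for row in reader:
        scores = dict(zip(tf_names, (float(v) for v in row[1:])))
        results[row[0]] = occupancy_summary(scores)
    return results


if __name__ == "__main__":
    # Made-up two-TF table for illustration only.
    example = "\tCRX\tNRL\nseqA_WT\t0.9\t0.4\nseqB_WT\t0.5\t0.7\n"
    print(summarize_table(io.StringIO(example)))
```

Because the description specifies occupancy "above 0.5", a score of exactly 0.5 is not counted toward diversity in this sketch.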
Supplementary file 6 Primers used in this study.xlsx (10 kB)
Transparent reporting form.docx (175 kB)
Figures and figure supplements.pdf (5900 kB)