Alignment files of high-throughput sequencing data against the UCSC mm9 genome

Note that the reads in these alignments were collapsed with fastx_collapser from the FASTX-Toolkit, so that each entry represents one unique sequence. The number of duplicated reads is therefore encoded in the name of each read (e.g. "51782-15" means that sequence no. 51782 was read 15 times).
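
As a minimal sketch of how the duplicate count could be recovered from a read name, the following Python helper splits the name on its last hyphen. The function name parse_read_id is hypothetical and not part of the distributed software; it assumes every read name follows the "{sequence no.}-{count}" pattern shown above.

```python
from typing import Tuple


def parse_read_id(read_id: str) -> Tuple[str, int]:
    """Split a collapsed read name into (sequence number, read count)."""
    # Assumption: the count is always the field after the last hyphen.
    seq_no, sep, count = read_id.rpartition("-")
    if not sep or not seq_no or not count.isdigit():
        raise ValueError(f"unexpected read name: {read_id!r}")
    return seq_no, int(count)
```

For the name "51782-15" this returns the sequence number "51782" and the count 15. When computing read depth from the alignments, each alignment would then be weighted by this count.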

Example of the format used in the accompanying annotation files:

# {read id} {number of matches in genome} {representative class} {mapped transcripts}
606-781 3       LINE    LINE:L1_Mur1,CDS:Rps24,UTR3:Rps24
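
A record like the one above could be read with a small parser along these lines. This is only a sketch: the names Annotation and parse_annotation_line are hypothetical, and it assumes that the four columns are separated by whitespace and that the last column is a comma-separated list of "class:name" pairs, as in the example.

```python
from typing import List, NamedTuple, Tuple


class Annotation(NamedTuple):
    read_id: str
    copies: int                    # duplicate count taken from the read id
    genome_matches: int            # number of matches in the genome
    representative_class: str
    mapped: List[Tuple[str, str]]  # (class, transcript or element name)


def parse_annotation_line(line: str) -> Annotation:
    """Parse one non-comment line of an annotation file."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
    read_id, n_matches, rep_class, mapped_field = fields
    # The read id ends with "-{count}", e.g. "606-781" was read 781 times.
    copies = int(read_id.rpartition("-")[2])
    mapped = []
    for item in mapped_field.split(","):
        cls, _, name = item.partition(":")
        mapped.append((cls, name))
    return Annotation(read_id, copies, int(n_matches), rep_class, mapped)
```

Applied to the example line, this yields a record with 781 copies, 3 genomic matches, the representative class LINE, and three mapped entries (LINE:L1_Mur1, CDS:Rps24, UTR3:Rps24). Header lines starting with "#" should be skipped before calling the parser.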
  

Custom SNP-corrected non-redundant transcriptome

This sequence set was prepared for direct alignment to the transcriptome, which allows consistent alignment around exon-exon junctions in alternatively spliced genes.

Alignment files of high-throughput sequencing data against SNP-corrected non-redundant transcriptome

Short tag alignment files of high-throughput sequencing data against SNP-corrected non-redundant transcriptome

These alignments were performed using the first 27 nucleotides of each read rather than the sequence remaining after 3'-adapter clipping.
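
To illustrate what a short tag is, the sketch below takes the leading 27 bases of a raw read. The helper name short_tag is hypothetical, and how reads shorter than 27 nucleotides were handled in the original analysis is not stated here; this version simply returns whatever bases are available.

```python
def short_tag(read_seq: str, length: int = 27) -> str:
    """Return the leading `length` bases of a raw (unclipped) read."""
    # Assumption: no adapter clipping or quality trimming is applied first.
    return read_seq[:length]
```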

Detected LIN28A binding sites and scores

These binding sites were detected at a cutoff of FDR < 0.001. Example:

# {transcript} {0-based coordinate} {base} {CLIP-seq depth} {Shannon's entropy}
NM_001014974 2930 G 215 1.389300743
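
A line in this format could be loaded as follows. This is a sketch under the assumption that the five columns are whitespace-separated, as in the example; BindingSite and parse_binding_site are hypothetical names, not part of the distributed software.

```python
from typing import NamedTuple


class BindingSite(NamedTuple):
    transcript: str
    position: int   # 0-based coordinate on the transcript
    base: str
    depth: int      # CLIP-seq depth
    entropy: float  # Shannon's entropy


def parse_binding_site(line: str) -> BindingSite:
    """Parse one non-comment line of the binding site file."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 columns, got {len(fields)}: {line!r}")
    transcript, position, base, depth, entropy = fields
    return BindingSite(transcript, int(position), base, int(depth), float(entropy))
```

Because the coordinate is 0-based, the example site at 2930 corresponds to the 2931st nucleotide of NM_001014974 in 1-based numbering (position + 1).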
  

Source code written for data analysis

  • Full source code and revision history are available from GitHub.
  • Most of the software is distributed under the MIT license.
  • Requires a modern POSIX-compatible operating system with Python and the GNU toolchain. See the README file included in the tarball for instructions.
  • Contact Hyeshik Chang <hyeshik@snu.ac.kr> with any requests or questions about the software.

Raw and processed sequence read data

Sequence files can be downloaded from NCBI Gene Expression Omnibus (GSE37114) or NCBI Sequence Read Archive (SRP012118).