1、ChIP-seq analysis with MACS2 Tips and tricks,Sami Heikkinen, PhD Docent in Molecular Bioinformatics Institute of Biomedicine, UEF,Schmidt et al, Methods, 2009,ChIP-Seq simplified,Where?,Park, Nat Rev Genetics, 2009,From binding to binding sites,Typically millions of reads per sample,Park, Nat Rev Ge
2、netics, 2009,ChIP-seq,200 bp,36-50 bp,Control sample: “Input” or “IgG” Input: sonicated chromatin without immunoprecipitation IgG: “unspecific” IP,MACS2,Model-based Analysis of ChIP-Seq Original version published by Yong Zhang and Tao Liu from the lab of X. Shirley Liu at the Dana-Farber Cancer Inst
3、itute, Boston Genome Biology 2008, 9:R137 now at version 2.1.0.20140616, developed and maintained by Tao Liu at https:/ https:/ of command line programs to call peaks in ChIP-seq data Much improved since v1.x!,diffpeak,bdgdiff,bdgcmp,bdgbroadcall,MACS2 program(s),peaks.narrowPeak,callpeak,summits.be
4、d,peaks.xls,model.r,model.pdf,INPUT DATA: aligned sequence reads,OUTPUT FILEs,treat_pileup.bdg,control_lambda.bdg,refinepeaks,refinepeak.bed,randsample,filterdup,predictd,pileup,pileup.bdg,bdgpeakcall,OUTPUT,callpeak - Options,Various options to indicate/control input, output, peak modelling and pea
5、k callingmacs2 callpeak usage: macs2 callpeak -h -t TFILE TFILE . -c CFILE CFILE .-f AUTO,BAM,SAM,BED,ELAND,ELANDMULTI,ELANDEXPORT,BOWTIE,BAMPE-g GSIZE -keep-dup KEEPDUPLICATES-buffer-size BUFFER_SIZE -outdir OUTDIR -n NAME-B -verbose VERBOSE -trackline -SPMR-s TSIZE -bw BW -m MFOLD MFOLD -fix-bimod
6、al-nomodel -shift SHIFT -extsize EXTSIZE-q QVALUE -p PVALUE -to-large -ratio RATIO-down-sample -seed SEED -nolambda-slocal SMALLLOCAL -llocal LARGELOCAL -broad-broad-cutoff BROADCUTOFF -call-summits-t/-treatment FILENAME This is the only REQUIRED parameter for MACS.,Using MACS connect to server,Open
7、 the SSH client at Win All programs SSH Secure shell Secure shell client “Quick connect” connection : intron.uef.fi username : password: ,Unix 101,pwd show Present Working Directory cd Change Directory e.g. cd /home/work/public to get to the folder we use today (from wherever you are) or, to get bac
8、k to your home directory: cd $HOME or, back one step cd , or two steps cd / Usage tip: use up/down arrow keys to move in command history ls LiSt files in directory e.g. ls -l to show file and folder names AND other info (Long format) head / tail show first/last lines of a (text) file e.g. head -20 r
9、ef_hg19.txt Usage tip: use the TAB key to fill in available file/folder names,Using MACS - setup,cd /home/work/public mkdir macsout_: e.g. spheikki for me each student MUST have their own folder!to avoid overlapping MACS outputs checks on seq files ls l seq head seq/* check that macs2 works macs2 ca
10、llpeak,callpeak - Options,Various options to indicate/control input, output, peak modelling and peak callingmacs2 callpeak usage: macs2 callpeak -h -t TFILE TFILE . -c CFILE CFILE .-f AUTO,BAM,SAM,BED,ELAND,ELANDMULTI,ELANDEXPORT,BOWTIE,BAMPE-g GSIZE -keep-dup KEEPDUPLICATES-buffer-size BUFFER_SIZE
11、-outdir OUTDIR -n NAME-B -verbose VERBOSE -trackline -SPMR-s TSIZE -bw BW -m MFOLD MFOLD -fix-bimodal-nomodel -shift SHIFT -extsize EXTSIZE-q QVALUE -p PVALUE -to-large -ratio RATIO-down-sample -seed SEED -nolambda-slocal SMALLLOCAL -llocal LARGELOCAL -broad-broad-cutoff BROADCUTOFF -call-summits,ca
12、llpeak Options - Input,Input files arguments:-t TFILE TFILE ., -treatment TFILE TFILE .ChIP-seq treatment file. If multiple files are given as -t A B C, then they will all be read and combined. REQUIRED.-c CFILE CFILE ., -control CFILE CFILE .Control file. If multiple files are given as -c A B C, th
13、en they will all be read and combined.-f AUTO,BAM,SAM,BED,ELAND,ELANDMULTI,ELANDEXPORT,BOWTIE,BAMPE, -format AUTO,BAM,SAM,BED,ELAND,ELANDMULTI,ELANDEXPORT,BOWTIE,BAMPEFormat of tag file, “AUTO“, “BED“ or “ELAND“ or “ELANDMULTI“ or “ELANDEXPORT“ or “SAM“ or “BAM“ or “BOWTIE“ or “BAMPE“. The default A
14、UTO option will let MACS decide which format the file is. Please check the definition in README file if you choose ELAND/ELANDMULTI/ELANDEXPORT/SAM/BAM/BOWTIE. DEFAULT: “AUTO“-g GSIZE, -gsize GSIZEEffective genome size. It can be 1.0e+9 or 1000000000, or shortcuts:hs for human (2.7e9), mm for mouse
15、(1.87e9), ce for C. elegans (9e7) anddm for fruitfly (1.2e8), Default:hs-keep-dup KEEPDUPLICATESIt controls the MACS behavior towards duplicate tags at the exact same location - the same coordination and the same strand. The auto option makes MACScalculate the maximum tags at the exact same location
16、 based on binomal distribution using 1e-5 as pvalue cutoff; and the all option keeps every tags. If aninteger is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location. Default: 1-buffer-size BUFFER_SIZEBuffer size for incrementally
17、increasing internal array size to store reads alignment information. In most cases, you dont have to change this parameter.However, if there are large number of chromosomes/contigs/scaffolds in your alignment, its recommended to specify a smaller buffer size in orderto decrease memory usage (but it
18、will take longer time to read alignment files). Minimum memory requested for reading an alignment file isabout # of CHROMOSOME * BUFFER_SIZE * 2 Bytes. DEFAULT: 100000,callpeak Options - Output,Output arguments:-outdir OUTDIRIf specified all output files will be written to that directory. Default: t
19、he present working directory-n NAME, -name NAMEExperiment name, which will be used to generate output file names. DEFAULT: “NA“-B, -bdgWhether or not to save extended fragment pileup, and local lambda tracks (two files) at every bp into a bedGraph file. DEFAULT: False-verbose VERBOSESet verbose leve
20、l of runtime message. 0: only show critical message, 1: show additional warning message, 2: show process information,3: show debug messages. DEFAULT:2-tracklineTells MACS to include trackline with bedGraph files. To include this trackline while displaying bedGraph at UCSC genome browser, can show na
21、meand description of the file as well. However my suggestion is to convert bedGraph to bigWig, then show the smaller and faster binary bigWig fileat UCSC genome browser, as well as downstream analysis. Require -B to be set. Default: Not include trackline.-SPMRIf True, MACS will save signal per milli
22、on reads for fragment pileup profiles. Require -B to be set. Default: False,Using MACS test different settings,Run 1: Using default settings Run 2: Call summits Run 3: Adjust model band width Run 4: Adjust mfold limitsmacs2 callpeak -t seq/treat_chr3.sam -c seq/input_chr3.sam -outdir macsout_ -n def
23、aults,Using MACS test different settings,Run 1: Using default settings Run 2: Call summits Run 3: Adjust model band width Run 4: Adjust mfold limitsmacs2 callpeak -t seq/treat_chr3.sam -c seq/input_chr3.sam -outdir macsout_ -n defaults ls l macsout_ head -40 macsout_/*,callpeak Options Peak calling
24、1,Peak calling arguments 2: -nolambdaIf True (=set), MACS will use fixed background lambda as local lambda for every peak region. Normally, MACS calculates a dynamic local lambda to reflect the local bias due to potential chromatin structure.-slocal SMALLLOCALThe small nearby region in basepairs to
25、calculate dynamic lambda. This is used to capture the bias near the peak summit region. Invalid if there is no controldata. If you set this to 0, MACS will skip slocal lambda calculation. *Note* that MACS will always perform a d-size local lambda calculation. The finallocal bias should be the maximu
26、m of the lambda value from d, slocal, and llocal size windows. DEFAULT: 1000-llocal LARGELOCALThe large nearby region in basepairs to calculate dynamic lambda. This is used to capture the surround bias. If you set this to 0, MACS will skip llocallambda calculation. *Note* that MACS will always perfo
27、rm a d-size local lambda calculation. The final local bias should be the maximum of the lambda valuefrom d, slocal, and llocal size windows. DEFAULT: 10000.-broadIf set, MACS will try to call broad peaks by linking nearby highly enriched regions. The linking region is controlled byanother cutoff thr
28、ough -linking-cutoff. The maximum linking region length is 4 times of d from MACS. DEFAULT: False-broad-cutoff BROADCUTOFFCutoff for broad region. This option is not available unless -broad is set. If -p is set, this is a pvalue cutoff,otherwise, its a qvalue cutoff. DEFAULT: 0.1-call-summitsIf set,
29、 MACS will use a more sophisticated signal processing approach to find subpeak summits in each enriched peak region.DEFAULT: False,-call-summits,Using MACS test different settings,Run 1: Using default settings Run 2: Call summits Run 3: Adjust model band width Run 4: Adjust mfold limitsFrom command
30、history, find the previous macs2 command and edit the red parts: macs2 callpeak -t seq/treat_chr3.sam -c seq/input_chr3.sam -outdir macsout_ -call-summits -n cs.defaults,callpeak Options Peak calling 2,Peak calling arguments 1:-q QVALUE, -qvalue QVALUEMinimum FDR (q-value) cutoff for peak detection.
31、 DEFAULT: 0.05. -q, and -p are mutually exclusive.-p PVALUE, -pvalue PVALUEPvalue cutoff for peak detection. DEFAULT: not set. -q, and -p are mutually exclusive. If pvalue cutoff is set, qvalue will not be calculated and reported as -1in the final .xls file.-to-largeWhen set, scale the small sample
32、up to the bigger sample. By default, the bigger dataset will be scaled down towards the smaller dataset, which will lead tosmaller p/qvalues and more specific results. Keep in mind that scaling down will bring down background noise more. DEFAULT: False-ratio RATIOWhen set, use a custom scaling ratio
33、 of ChIP/control (e.g. calculated using NCIS) for linear scaling. DEFAULT: ingore-down-sampleWhen set, random sampling method will scale down the bigger sample. By default, MACS uses linear scaling. Warning: This option will make yourresult unstable and irreproducible since each time, random reads w
34、ould be selected. Consider to use randsample script instead. If used togetherwith SPMR, 1 million unique reads will be randomly picked. Caution: due to the implementation, the final number of selected reads may not be as youexpected! DEFAULT: False-seed SEEDSet the random seed while down sampling da
35、ta. Must be a non-negative integer in order to be effective. DEFAULT: not set,callpeak Options The Model,Shifting model arguments:-s TSIZE, -tsize TSIZETag size (=read length). This will overide the auto detected tag size. DEFAULT: Not set-bw BWBand width for picking regions to compute fragment size
36、. This value is only used while building the shifting model. DEFAULT: 300-m MFOLD MFOLD, -mfold MFOLD MFOLDSelect the regions within MFOLD range of high-confidence enrichment ratio against background to build model. Fold-enrichmentin regions must be lower than upper limit, and higher than the lower
37、limit. Use as “-m 10 30“. DEFAULT:5 50-fix-bimodalWhether turn on the auto pair model process. If set, when MACS failed to build paired model, it will use the nomodel settings, the -exsize parameter to extend each tags towards 3 direction. Not to use this automate fixation is a default behavior now.
38、 DEFAULT: False-nomodelWhether or not to build the shifting model. If True, MACS will not build model. by default it means shifting size = 100, try to set extsize to change it.DEFAULT: False-shift SHIFT(NOT the legacy -shiftsize option!) The arbitrary shift in bp. Use discretion while setting it oth
39、er than default value. When NOMODEL is set, MACS will usethis value to move cutting ends (5) towards 5-3 direction then apply EXTSIZE to extend them to fragments. When this value is negative, ends will bemoved toward 3-5 direction. Recommended to keep it as default 0 for ChIP-Seq datasets, or -1 * h
40、alf of EXTSIZE together with EXTSIZE option for detecting enriched cutting loci such as certain DNAseI-Seq datasets. Note, you cant set values other than 0 if format is BAMPE for paired-end data.DEFAULT: 0.-extsize EXTSIZEThe arbitrary extension size in bp. When nomodel is true, MACS will use this v
41、alue as fragment size to extend each read towards 3 end, then pile them up.Its exactly twice the number of obsolete SHIFTSIZE. In previous language, each read is moved 5-3 direction to middle of fragment by 1/2 d, thenextended to both direction with 1/2 d. This is equivalent to say each read is exte
42、nded towards 5-3 into a d size fragment. DEFAULT: 200. EXTSIZE andSHIFT can be combined when necessary. Check SHIFT option.,macs model: the d (page 1),macs model: the d (page 2),Using MACS test different settings,Run 1: Using default settings Run 2: Call summits Run 3: Adjust model band width Run 4:
43、 Adjust mfold limitsFrom command history, find the previous macs2 command and edit the red parts: macs2 callpeak -t seq/treat_chr3.sam -c seq/input_chr3.sam -outdir macsout_ -call-summits -bw 200 n bw200.cs,Using MACS test different settings,Run 1: Using default settings Run 2: Call summits Run 3: A
44、djust model band width Run 4: Adjust mfold limitsFrom command history, find the previous macs2 command and edit the red parts: macs2 callpeak -t seq/treat_chr3.sam -c seq/input_chr3.sam -outdir macsout_ -call-summits -bw 200 -m 40 80 n m40.80.bw200.cs,ChIP-seq results compared,ChIP-seq results: run
45、3 vs 1-2,ChIP-seq results: run 4 vs 1-3,Summary,MACS v2 is easy to use even on command line Has many settings, but finally only a few that really need to be used, and fewer still that (typically) need optimizing for you project BUT, if you need to optimize, there are many options for doing it Can also detect broad peak regions and allows for custom analysis protocols via the bedGraph outputs,