Snakemake Combine analysis of different input types in one workflow
I am sorry for the newbie question regarding Snakemake.
Generally put:
What is the most elegant way to build a workflow that combines two different input types?
Let's say I have a number of samples with different input types. Type a) is raw data in FASTQ format. Type b) is already assembled.
Now I want a pipeline that runs assembly for all samples of type a. Next, it should run annotation on all samples (a and b).
More concretely:
Currently, I have a config file with the entries "samples" (type a) and "genomes" (type b).
I can write a rule spades for samples and a follow-up rule prokka for samples.
I could of course add a second rule prokka2 for genomes, but how can I have a single combined rule prokka for both types?
snakemake
asked Nov 23 '18 at 15:18
JoergL
3 Answers
Snakemake will figure out by itself that some samples are already partially processed, and it will take them forward as required. For example, given these input files:
touch s1.fq.gz s2.fq.gz s3.bam s4.bam
this workflow will apply rule assemble to s1.fq.gz and s2.fq.gz only, and rule annotate to all four:
samples = ['s1', 's2', 's3', 's4']

# Target rule: request the annotated file for every sample.
rule all:
    input:
        expand('{sample}.annotated.bam', sample=samples)

# Runs only for samples whose .bam is missing but whose .fq.gz exists.
rule assemble:
    input:
        fq='{sample}.fq.gz'
    output:
        bam='{sample}.bam'
    shell:
        r"""
        my_assembler {input.fq} > {output.bam}
        """

# Runs for every sample, whether its .bam was assembled or supplied.
rule annotate:
    input:
        bam='{sample}.bam'
    output:
        bam='{sample}.annotated.bam'
    shell:
        r"""
        my_annotator {input.bam} > {output.bam}
        """
You can test the execution with a dry run that prints the shell commands: snakemake -p -n
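The selection described above can be pictured with a small plain-Python sketch. This is only a toy model of working backwards from the targets, not Snakemake's actual scheduler: the function name `plan_jobs` is made up for illustration, and file timestamps (which Snakemake also takes into account) are ignored.

```python
def plan_jobs(samples, existing_files):
    """Toy backward chaining over the two rules of the example workflow.

    Returns the (rule, sample) jobs needed to build every
    '{sample}.annotated.bam', given the set of files already present.
    Hypothetical helper; timestamps are deliberately ignored.
    """
    jobs = []
    for sample in samples:
        bam = f"{sample}.bam"
        fastq = f"{sample}.fq.gz"
        if bam not in existing_files:
            # The .bam has to be produced, which requires the FASTQ input.
            if fastq not in existing_files:
                raise FileNotFoundError(f"no input found for sample {sample}")
            jobs.append(("assemble", sample))
        # Annotation is needed for every sample.
        jobs.append(("annotate", sample))
    return jobs


existing = {"s1.fq.gz", "s2.fq.gz", "s3.bam", "s4.bam"}
jobs = plan_jobs(["s1", "s2", "s3", "s4"], existing)
for rule, sample in jobs:
    print(rule, sample)
```

In this model s3 and s4 never reach the assemble step because their .bam files already exist, which mirrors what the dry run reports for the workflow above.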
answered Nov 26 '18 at 8:59
dariober
Hey, wonderful! Thank you very much. How would this work with a config.yaml file? See answer below.
– JoergL
Nov 27 '18 at 10:53
Sorry for the inconvenience. config.yaml appeared just below my initial question.
– JoergL
Nov 27 '18 at 11:37
Snakemake reads the config file given with the --configfile option, and the content of the file is automatically stored in the variable config, which is accessible within the Snakefile. You can use print(config) towards the top of the Snakefile to see the structure of config (which is basically a dictionary).
– dariober
Nov 27 '18 at 14:59
Of course. My question is just how to solve the problem when using a config.yaml file. I have an example above.
– JoergL
Nov 28 '18 at 15:03
This is what my config.yaml looks like. I just do not know how to handle files from samples and genomes in one rule (or find a better config solution).
samples:
  SRR653893:
    fw: SRR653893_1.fastq.gz
    rv: SRR653893_2.fastq.gz
genomes:
  GCF:
    fasta: GCF_000008985.1_ASM898v1_genomic.fna
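One possible way to serve both entry types from a single annotation rule is an input function that looks the ID up in the config. The following is a minimal sketch, not a tested pipeline: the dictionary is written out as a literal so the code runs on its own, and the names `ALL_IDS`, `assembly_for`, and the path `assembly/{id}/contigs.fasta` are assumptions made for illustration.

```python
from types import SimpleNamespace

# Same structure Snakemake would expose as `config` after reading the
# config.yaml shown above (written as a literal so the sketch runs alone).
config = {
    "samples": {
        "SRR653893": {
            "fw": "SRR653893_1.fastq.gz",
            "rv": "SRR653893_2.fastq.gz",
        }
    },
    "genomes": {
        "GCF": {"fasta": "GCF_000008985.1_ASM898v1_genomic.fna"}
    },
}

# Every ID that should be annotated, whether it needs assembly or not.
ALL_IDS = sorted(set(config["samples"]) | set(config["genomes"]))


def assembly_for(wildcards):
    """Return the FASTA file to annotate for one ID.

    Genomes use the FASTA listed in the config; samples use the output of
    the assembly rule. The 'assembly/<id>/contigs.fasta' layout is a
    hypothetical naming scheme, not something fixed by Snakemake.
    """
    sample = wildcards.sample
    if sample in config["genomes"]:
        return config["genomes"][sample]["fasta"]
    if sample in config["samples"]:
        return f"assembly/{sample}/contigs.fasta"
    raise KeyError(f"unknown ID: {sample}")


# SimpleNamespace stands in for the wildcards object that Snakemake
# passes to input functions.
print(assembly_for(SimpleNamespace(sample="GCF")))
print(assembly_for(SimpleNamespace(sample="SRR653893")))
```

In a Snakefile, a function like this could be given as the input of the single prokka rule (for example `input: assembly_for`), with the target rule expanding over `ALL_IDS`. The spades rule would then be triggered only for IDs whose input resolves to an assembly output.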
answered Nov 27 '18 at 11:01
JoergL
@user:1114453
The example works fine, apparently because the output files of rule “assemble” for samples 3 and 4 were created beforehand.
I tried to add some structure for the input data and results. This is my input folder:
.
`-- input
    |-- s1.fq.gz
    |-- s2.fq.gz
    |-- s3.bam
    `-- s4.bam
Using Snakemake (example below), I try to run the assembly rule assemble for s1 and s2, and to copy the already assembled s3 and s4 to the assembly folder with rule cp_assemblies. Then, from the assembly folder, I run the annotation of all samples. How can I improve the code to deal with such a situation?
samples = ['s1', 's2', 's3', 's4']
assemblies = ['s3', 's4']
input_dir = "./input/"
results_dir = "./results/"

rule all:
    input:
        expand(results_dir + 'annotation/{sample}.annotated.bam', sample=samples)

# Copy assemblies that were supplied as input into the assembly folder.
rule cp_assemblies:
    input:
        fa=input_dir + '{sample}.bam'
    output:
        bam=results_dir + 'assembly/{sample}.bam'
    # cp takes the destination as its second argument, not via a redirect.
    shell:
        """
        cp -v -f {input.fa} {output.bam}
        """

# Assemble samples that were supplied as raw reads.
rule assemble:
    input:
        fq=input_dir + '{sample}.fq.gz'
    output:
        bam=results_dir + 'assembly/{sample}.bam'
    shell:
        """
        my_assembler {input.fq} > {output.bam}
        """

# Annotate everything found in the assembly folder.
rule annotate:
    input:
        bam=results_dir + 'assembly/{sample}.bam'
    output:
        bam=results_dir + 'annotation/{sample}.annotated.bam'
    shell:
        """
        my_annotator {input.bam} > {output.bam}
        """
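Rather than hard-coding the `assemblies` list, the two groups could be derived from what is actually in the input folder. The sketch below is one way to do that in plain Python; the helper name `classify_inputs` is hypothetical, and the decision to raise an error when a sample has both a FASTQ and a BAM is an assumption (in that case both rules above could produce the same output, and Snakemake's `ruleorder` directive would be another way to settle it).

```python
from pathlib import Path
import tempfile


def classify_inputs(samples, input_dir):
    """Split sample IDs by the kind of input file found in input_dir.

    Returns (to_assemble, to_copy). Raises ValueError for a sample with no
    input at all or with both a FASTQ and a BAM, since either case needs a
    decision by the user. Hypothetical helper for illustration.
    """
    input_dir = Path(input_dir)
    to_assemble, to_copy = [], []
    for sample in samples:
        has_fastq = (input_dir / f"{sample}.fq.gz").exists()
        has_bam = (input_dir / f"{sample}.bam").exists()
        if has_fastq and has_bam:
            raise ValueError(f"{sample}: both .fq.gz and .bam present")
        if not (has_fastq or has_bam):
            raise ValueError(f"{sample}: no input file found")
        (to_assemble if has_fastq else to_copy).append(sample)
    return to_assemble, to_copy


# Demo on a throwaway directory mirroring the input folder above.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["s1.fq.gz", "s2.fq.gz", "s3.bam", "s4.bam"]:
        (Path(tmp) / name).touch()
    to_assemble, to_copy = classify_inputs(["s1", "s2", "s3", "s4"], tmp)

print(to_assemble)
print(to_copy)
```

Because a Snakefile is Python, a check like this could run at the top of the workflow so that a missing or duplicated input is reported before any job starts.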
answered Mar 25 at 11:16
mya