Snakemake Combine analysis of different input types in one workflow


I am sorry for the newbie question regarding Snakemake:

Generally put:
What is the most elegant way to build a workflow that handles two different input types together?

Let's say I have a number of samples with different input types. Type a) is raw data in FASTQ format. Type b) is already assembled.

Now I want a pipeline that does assembly for all samples of type a. Next, it should annotate all samples (a and b).

More concretely:
Currently, I have a config file with the entries "samples" (type a) and "genomes" (type b).

I can write a rule spades for samples and a follow-up rule prokka for samples.
I could of course add a second rule prokka2 for genomes, but how can I have a single combined rule prokka for both types?

asked Nov 23 '18 at 15:18

JoergL


3 Answers

Snakemake will figure out by itself that some samples are already partially processed and it will take them forward as required. For example, given these input files:

touch s1.fq.gz s2.fq.gz s3.bam s4.bam

This workflow will apply rule "assemble" only to s1.fq.gz and s2.fq.gz, and rule "annotate" to all four samples:

samples = ['s1', 's2', 's3', 's4']

rule all:
    input:
        expand('{sample}.annotated.bam', sample=samples)

rule assemble:
    input:
        fq='{sample}.fq.gz'
    output:
        bam='{sample}.bam'
    shell:
        r"""
        my_assembler {input.fq} > {output.bam}
        """

rule annotate:
    input:
        bam='{sample}.bam'
    output:
        bam='{sample}.annotated.bam'
    shell:
        r"""
        my_annotator {input.bam} > {output.bam}
        """

You can test the execution with a dry run: snakemake -p -n

answered Nov 26 '18 at 8:59

dariober
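As a rough illustration of the behaviour described in the answer above, the following Python sketch models the decision for each sample: if the assembly file already exists, only annotation is needed; otherwise the reads have to be assembled first. This is a simplified toy model, not Snakemake's actual scheduler (which, among other things, also compares file modification times). The function name `plan` and the `existing` set are made up for this example.

```python
# Toy model of the per-sample decision; NOT Snakemake's real scheduler.
def plan(sample, existing):
    """Return the rules needed to build '<sample>.annotated.bam'.

    `existing` is the set of file names already present on disk.
    """
    steps = []
    if f"{sample}.bam" not in existing:
        # No assembly yet, so the reads must be available to create one.
        if f"{sample}.fq.gz" not in existing:
            raise FileNotFoundError(f"no input found for sample {sample}")
        steps.append("assemble")
    # Every sample is annotated, whichever way its assembly was obtained.
    steps.append("annotate")
    return steps


existing = {"s1.fq.gz", "s2.fq.gz", "s3.bam", "s4.bam"}
for sample in ["s1", "s2", "s3", "s4"]:
    print(sample, plan(sample, existing))
```

With the four files from the answer, s1 and s2 go through both rules, while s3 and s4 only need the annotation step.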


Hey, wonderful! Thank you very much. How would this work with a config.yaml file? See my answer below.

– JoergL
Nov 27 '18 at 10:53

Sorry for the inconvenience. config.yaml appeared just below my initial question.

– JoergL
Nov 27 '18 at 11:37

Snakemake reads the config file given with the --configfile option and automatically stores its content in the variable config, which is accessible within the Snakefile. You can put print(config) near the top of the Snakefile to see the structure of config (which is basically a dictionary).

– dariober
Nov 27 '18 at 14:59
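To make the comment above concrete, here is a minimal sketch of how the `config` dictionary might look for the config.yaml posted in this thread, assuming `samples` and `genomes` are both top-level entries. Inside a Snakefile `config` already exists; it is written out as a literal here only so the snippet runs on its own. The names `read_ids`, `genome_ids` and `all_ids` are illustrative.

```python
# Stand-in for the `config` dictionary that Snakemake fills from config.yaml.
# In a real Snakefile this variable is provided by Snakemake itself.
config = {
    "samples": {
        "SRR653893": {
            "fw": "SRR653893_1.fastq.gz",
            "rv": "SRR653893_2.fastq.gz",
        },
    },
    "genomes": {
        "GCF": {"fasta": "GCF_000008985.1_ASM898v1_genomic.fna"},
    },
}

read_ids = list(config["samples"])    # type a: need assembly first
genome_ids = list(config["genomes"])  # type b: already assembled
all_ids = read_ids + genome_ids       # everything that should be annotated

print(config)   # same idea as print(config) at the top of the Snakefile
print(all_ids)
```

A combined list such as `all_ids` is what the final target rule could expand over, so that annotation is requested for both input types.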

Of course. My question is just how to solve the problem when using a config.yaml file. I have an example above.

– JoergL
Nov 28 '18 at 15:03


This is what my config.yaml looks like. I just do not know how to handle files from samples and genomes in one rule (or how to find a better config solution).

samples:
  SRR653893:
    fw: SRR653893_1.fastq.gz
    rv: SRR653893_2.fastq.gz
genomes:
  GCF:
    fasta: GCF_000008985.1_ASM898v1_genomic.fna

answered Nov 27 '18 at 11:01

JoergL
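One possible way to get a single annotation rule for both config entries is an input function that decides, per id, which FASTA file to use. The sketch below is only an illustration under some assumptions: `samples` and `genomes` are top-level config entries, the assembler writes to the hypothetical path `assembly/{sample}/contigs.fasta`, and the function name `annotation_input` is made up. `SimpleNamespace` merely imitates the wildcards object that Snakemake would pass in.

```python
from types import SimpleNamespace

# Stand-in for the dictionary Snakemake loads from config.yaml.
config = {
    "samples": {
        "SRR653893": {
            "fw": "SRR653893_1.fastq.gz",
            "rv": "SRR653893_2.fastq.gz",
        },
    },
    "genomes": {
        "GCF": {"fasta": "GCF_000008985.1_ASM898v1_genomic.fna"},
    },
}

# Hypothetical location of the assembler output for type-a samples.
ASSEMBLY_PATTERN = "assembly/{sample}/contigs.fasta"


def annotation_input(wildcards):
    """Return the FASTA file the annotation rule should read for one id."""
    sample = wildcards.sample
    if sample in config.get("genomes", {}):
        # Type b: an assembled genome was supplied, use it directly.
        return config["genomes"][sample]["fasta"]
    if sample in config.get("samples", {}):
        # Type a: request the assembler output so the assembly rule runs first.
        return ASSEMBLY_PATTERN.format(sample=sample)
    raise KeyError(f"unknown id: {sample}")


# Quick check outside Snakemake.
for name in ["SRR653893", "GCF"]:
    print(name, "->", annotation_input(SimpleNamespace(sample=name)))

# In the Snakefile, the combined rule would then declare, roughly:
#
# rule prokka:
#     input:
#         annotation_input
```

With this layout only the ids listed under `samples` ever request an assembly, so a second rule such as prokka2 should not be needed.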


@user:1114453

The example works fine, apparently because the output files of rule "assemble" for samples s3 and s4 were created beforehand.

I tried to add some structure to the input data and results. This is my input folder:

.
`-- input
    |-- s1.fq.gz
    |-- s2.fq.gz
    |-- s3.bam
    `-- s4.bam

Using Snakemake (example below), I try to run the assembly (rule assemble) for s1 and s2 and to copy the already assembled s3 and s4 (rule cp_assemblies) to the assembly folder. Then I run the annotation of all samples from the assembly folder. How can I improve the code to deal with such a situation?

samples = ['s1', 's2', 's3', 's4']
assemblies = ['s3', 's4']

input_dir = "./input/"
results_dir = "./results/"

rule all:
    input:
        expand(results_dir + 'annotation/{sample}.annotated.bam', sample=samples)

# Copy samples that are already assembled into the assembly folder.
rule cp_assemblies:
    input:
        fa=input_dir + '{sample}.bam'
    output:
        bam=results_dir + 'assembly/{sample}.bam'
    shell:
        """
        cp -v -f {input.fa} {output.bam}
        """

# Assemble samples that are only available as raw reads.
rule assemble:
    input:
        fq=input_dir + '{sample}.fq.gz'
    output:
        bam=results_dir + 'assembly/{sample}.bam'
    shell:
        """
        my_assembler {input.fq} > {output.bam}
        """

# Annotate every assembly, whichever rule produced it.
rule annotate:
    input:
        bam=results_dir + 'assembly/{sample}.bam'
    output:
        bam=results_dir + 'annotation/{sample}.annotated.bam'
    shell:
        """
        my_annotator {input.bam} > {output.bam}
        """

answered Mar 25 at 11:16

mya
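Because cp_assemblies and assemble declare the same output pattern, one way to make the split explicit is to restrict each rule to its own ids with wildcard constraints. The sketch below only builds the regular expressions; the helper name `one_of` and the list `raw` are illustrative and not part of the original post, and the rule fragments are shown as comments because rules are not plain Python.

```python
import re

samples = ['s1', 's2', 's3', 's4']
assemblies = ['s3', 's4']
# Samples that still have to be assembled from reads.
raw = [s for s in samples if s not in assemblies]


def one_of(ids):
    """Build a regex that matches exactly one of the given sample ids."""
    return "|".join(re.escape(i) for i in ids)


print(one_of(assemblies))
print(one_of(raw))

# In the Snakefile, each rule could then be limited to its own ids:
#
# rule cp_assemblies:
#     wildcard_constraints:
#         sample=one_of(assemblies)
#     ...
#
# rule assemble:
#     wildcard_constraints:
#         sample=one_of(raw)
#     ...
```

With these constraints in place, a missing input file for s3 would be reported as a missing input of cp_assemblies rather than leading Snakemake to try the assembler on it.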
