Python microbiorust-py
For ease of use, you can call microbiorust directly from Python
🦀🐍
No need to install Rust
Install microbiorust-py using pip:
pip install microbiorust
Example script for converting GenBank to protein FASTA:
import microbiorust as mb
#to use json.loads:
import json
# convert GenBank to GFF3 (writes directly to a file)
# dna=True also includes the DNA sequence in the GFF3 file; dna=False writes only the feature table data
mb.gbk_to_gff("filename.gbk", dna=True)
faa_collection = mb.gbk_to_faa("filename.gbk")
# print each locus tag and protein sequence in protein FASTA format
for info in faa_collection.values():
    print(f">{info.locus_tag}\n{info.faa}\n")
#export directly to JSON (zero dependencies, handled by Rust)
json_str = faa_collection.to_json()
print(json_str)
#capture the json in a Python dictionary
data = json.loads(json_str)
# write the JSON to a file; NB: a JSON file is larger than the equivalent gbk, embl or gff3 file
with open("output.json", "w") as f:
    f.write(json_str)
#to write directly to a file in a different format
records = mb.parse_gbk("filename.gbk")
records.write_faa("output.faa") # writes protein fasta
records.write_fna("output.fna") # writes DNA sequence contig fasta
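To sanity-check a file written by write_faa, the records can be read back with plain Python. The following is a minimal sketch that assumes standard FASTA layout (a `>` header line followed by one or more sequence lines); `read_fasta` is a hypothetical helper for illustration and is not part of microbiorust.

```python
from pathlib import Path


def read_fasta(path):
    """Read a FASTA file into a dict mapping record ID to sequence.

    Hypothetical helper, not part of microbiorust. The record ID is the
    first whitespace-delimited token of each header line.
    """
    records = {}
    current_id = None
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            records[current_id] = []
        elif current_id is not None:
            # sequence lines may be wrapped, so collect and join them
            records[current_id].append(line)
    return {rec_id: "".join(parts) for rec_id, parts in records.items()}
```

For example, `len(read_fasta("output.faa"))` gives the number of protein records in the file written above.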
microbiorust-py contains the following features:
GenBank input:
parse_gbk # parse GenBank format into a Python dictionary-like object (Collection)
gbk_to_faa # converts gbk format to protein fasta as above
gbk_to_fna # converts gbk format to DNA sequence contig fasta
gbk_to_ffn # converts gbk format to DNA gene sequence fasta
gbk_to_gff # converts gbk format to GFF3
gbk_to_faa_count # counts the protein fasta records converted from gbk
EMBL input as above:
parse_embl # parse EMBL format into a Python dictionary
embl_to_faa
embl_to_fna
embl_to_gff
BLAST input:
parse_tabular # streaming parse BLAST input as -outfmt 6
parse_XML # streaming BLAST parser for -outfmt 5
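For orientation, BLAST `-outfmt 6` output is tab-separated, with twelve columns by default. The sketch below parses a single line in plain Python to show the field layout. It is not how parse_tabular is implemented; `BlastHit` and `parse_outfmt6_line` are hypothetical names, and the sketch assumes the default column set (the first two columns may be labelled qaccver/saccver by recent BLAST+ releases).

```python
from typing import NamedTuple


class BlastHit(NamedTuple):
    """One row of default BLAST -outfmt 6 output (hypothetical type)."""
    qseqid: str
    sseqid: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


def parse_outfmt6_line(line):
    """Parse one tab-separated line with the twelve default columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        raise ValueError(f"expected 12 columns, got {len(fields)}")
    return BlastHit(
        fields[0],
        fields[1],
        float(fields[2]),
        *(int(x) for x in fields[3:10]),  # length .. send
        float(fields[10]),
        float(fields[11]),
    )
```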
Alignment input (MSA):
subset_msa_alignment # subset the alignment by row and column
get_consensus # get a consensus sequence for your MSA
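As a rough illustration of what these two operations mean, here is a plain-Python sketch. It is not the library's implementation: the function names are hypothetical, the ranges are assumed to be 0-based and half-open, and the consensus is a simple per-column majority vote, which may differ from the rule microbiorust applies.

```python
from collections import Counter


def subset_alignment(rows, row_range, col_range):
    """Keep rows[r0:r1] and, within each, columns [c0:c1] (hypothetical)."""
    r0, r1 = row_range
    c0, c1 = col_range
    return [seq[c0:c1] for seq in rows[r0:r1]]


def majority_consensus(rows):
    """Return the most common character in each alignment column."""
    if not rows:
        return ""
    width = len(rows[0])
    if any(len(seq) != width for seq in rows):
        raise ValueError("all aligned sequences must have the same length")
    # zip(*rows) iterates over the alignment column by column
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*rows))
```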
Calculate sequence metrics:
hydrophobicity
amino_counts # counts of each amino acid in the provided sequence
amino_percentage # percentage of each amino acid in the provided sequence
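To make the two amino acid metrics concrete, the sketch below computes them in plain Python. The function names are hypothetical and this is not the library's code; how microbiorust treats gaps, stop symbols, or ambiguity codes is not covered here.

```python
from collections import Counter


def amino_counts_sketch(sequence):
    """Count each residue in a protein sequence (case-insensitive)."""
    return Counter(sequence.upper())


def amino_percentage_sketch(sequence):
    """Percentage of each residue relative to the sequence length."""
    counts = amino_counts_sketch(sequence)
    total = sum(counts.values())
    if total == 0:
        return {}  # empty sequence: nothing to report
    return {aa: 100.0 * n / total for aa, n in counts.items()}
```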