Bio.Alphabet to molecule type transition for file output #3156

peterjc · 2020-07-27T12:00:51Z

Cross-referencing #2046: we are removing Bio.Alphabet. There are relatively few use cases where it is needed, one being file formats which track DNA vs RNA vs protein. For example, the SeqXML format needs to know the kind of sequence.

I wanted to write down some examples showing how the code looked the old way, how it should look the new way, and whether and how a backward compatible version might look. This should influence the documentation for Biopython 1.78, and perhaps the code itself.

Old fashioned, worked up to Biopython 1.77:

from Bio.Alphabet import generic_dna
from Bio import SeqIO
# This file has a single record only
record = SeqIO.read("Tests/Fasta/wisteria.nu", "fasta", generic_dna)
rec_start = record[:20]
SeqIO.write(rec_start, "start_only.xml", "seqxml")

Current style (with the code right now on master):

from Bio import SeqIO
# This file has a single record only
record = SeqIO.read("Tests/Fasta/wisteria.nu", "fasta")
rec_start = record[:20]
rec_start.annotations["molecule_type"] = "DNA"
SeqIO.write(rec_start, "start_only.xml", "seqxml")

Possible backward compatible form to run on both, assuming Bio.Alphabet is still here but deprecated - this does not look very nice:

import warnings
from Bio import BiopythonDeprecationWarning
with warnings.catch_warnings():
    warnings.simplefilter("ignore", BiopythonDeprecationWarning)
    from Bio.Alphabet import generic_dna

from Bio import SeqIO
record = SeqIO.read("Tests/Fasta/wisteria.nu", "fasta", generic_dna)
rec_start = record[:20]
rec_start.annotations["molecule_type"] = "DNA"  # harmless on older Biopython
SeqIO.write(rec_start, "start_only.xml", "seqxml")

Possible backward compatible form to run on both, assuming Bio.Alphabet is gone:

try:
    from Bio.Alphabet import generic_dna
except ImportError:
    generic_dna = None

from Bio import SeqIO
# This file has a single record only
record = SeqIO.read("Tests/Fasta/wisteria.nu", "fasta", generic_dna)
rec_start = record[:20]
rec_start.annotations["molecule_type"] = "DNA"
SeqIO.write(rec_start, "start_only.xml", "seqxml")

Both examples assume that SeqIO.read retains a stub alphabet argument (or at least, an optional third argument defaulting to None).

Hopefully only a minority of Biopython usage will require backward compatibility like this.

We could make molecule type an optional argument to SeqIO.read and SeqIO.parse (setting at input time), perhaps replacing the former optional alphabet argument.
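Until something like that exists, the same effect can be had outside SeqIO. The following is only a sketch: `with_molecule_type` is a hypothetical helper name, not part of Biopython, and it assumes nothing about the records beyond an `annotations` dictionary (as on SeqRecord).

```python
def with_molecule_type(records, molecule_type):
    """Yield each record after setting annotations["molecule_type"].

    Hypothetical helper, not part of Biopython. It works on any iterable
    of objects with an ``annotations`` dict, such as SeqRecord objects,
    and overwrites any molecule type already present.
    """
    for record in records:
        record.annotations["molecule_type"] = molecule_type
        yield record


# Intended use with Biopython (not run here):
#     from Bio import SeqIO
#     records = SeqIO.parse("Tests/Fasta/wisteria.nu", "fasta")
#     for record in with_molecule_type(records, "DNA"):
#         ...
```

Because it is a generator, records are still handled one at a time, as with SeqIO.parse itself.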

We might make molecule type an optional argument to SeqIO.write (setting at output time), perhaps allowing:

from Bio import SeqIO
# This file has a single record only
record = SeqIO.read("Tests/Fasta/wisteria.nu", "fasta")
rec_start = record[:20]
SeqIO.write(rec_start, "start_only.xml", "seqxml", "DNA")
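For comparison, roughly the same behaviour can be sketched today as a wrapper around SeqIO.write. Both function names below are hypothetical and not part of Biopython. Filling in the molecule type only when it is missing (rather than overwriting it) is an assumption about what setting it at output time should mean.

```python
def ensure_molecule_type(records, molecule_type):
    """Return the records as a list, filling in a missing molecule type.

    Hypothetical helper, not part of Biopython. Accepts a single record
    or an iterable of records; each needs an ``annotations`` dict.
    An existing "molecule_type" annotation is left untouched.
    """
    if hasattr(records, "annotations"):
        records = [records]  # a single record was given
    else:
        records = list(records)
    for record in records:
        record.annotations.setdefault("molecule_type", molecule_type)
    return records


def write_with_molecule_type(records, handle, fmt, molecule_type):
    """Hypothetical wrapper: set the molecule type, then call SeqIO.write."""
    from Bio import SeqIO  # imported lazily, only needed when writing

    return SeqIO.write(ensure_molecule_type(records, molecule_type), handle, fmt)


# Intended use (not run here):
#     write_with_molecule_type(rec_start, "start_only.xml", "seqxml", "DNA")
```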

peterjc · 2020-07-29T08:08:40Z

Example based on #2488, old code using Bio.Alphabet:

from Bio.Alphabet import generic_dna
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio import SeqIO
seq = Seq("ATGCGTGCAT", generic_dna)
record = SeqRecord(seq, id="test")
SeqIO.write(record, "test_write.gb", "genbank")

New version:

from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio import SeqIO
seq = Seq("ATGCGTGCAT")
record = SeqRecord(seq, id="test", annotations={"molecule_type": "DNA"})
SeqIO.write(record, "test_write.gb", "genbank")

Possible backward compatible version assuming Bio.Alphabet is simply removed:

try:
    from Bio.Alphabet import generic_dna
except ImportError:
    generic_dna = None
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio import SeqIO
if generic_dna is not None:
    # Older Biopython: pass the alphabet (newer Biopython refuses a second argument)
    seq = Seq("ATGCGTGCAT", generic_dna)
else:
    seq = Seq("ATGCGTGCAT")
record = SeqRecord(seq, id="test", annotations={"molecule_type": "DNA"})
SeqIO.write(record, "test_write.gb", "genbank")

There is more than one way to write this - hopefully we can agree on something we all find simple and clear to recommend as best practice?
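One further variant, offered only as a sketch, moves the try/except into a small function so the calling code has a single value to test. The name `legacy_alphabet` is hypothetical and not part of Biopython. It relies on importing Bio.Alphabet raising ImportError wherever the module is unavailable.

```python
def legacy_alphabet(name):
    """Return Bio.Alphabet.<name> if available, otherwise None.

    Hypothetical helper, not part of Biopython. Returns None when
    Bio.Alphabet cannot be imported (or Biopython is not installed),
    and also when the module exists but has no attribute ``name``.
    """
    try:
        from Bio import Alphabet
    except ImportError:
        return None
    return getattr(Alphabet, name, None)


# Intended use (not run here):
#     from Bio.Seq import Seq
#     alphabet = legacy_alphabet("generic_dna")
#     if alphabet is not None:
#         seq = Seq("ATGCGTGCAT", alphabet)
#     else:
#         seq = Seq("ATGCGTGCAT")
```

The molecule_type annotation would still be set on the record afterwards, as in the examples above.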

peterjc · 2020-11-25T14:15:06Z

Belatedly closing with https://biopython.org/wiki/Alphabet

This was referenced Jul 27, 2020

Discard alphabet when adding Seq objects #3153

Merged

No more Bio.Alphabet in SeqIO #3166

Merged

Should we remove/replace Bio.Alphabet? #2046

Closed

This was referenced Jul 29, 2020

Alphabet PendingDeprecationWarning when I didn't use Alphabet #2488

Closed

remove Bio.Alphabet and use an informative error message #3173

Merged

peterjc mentioned this issue Sep 2, 2020

Page about Bio.Alphabet removal biopython/biopython.github.io#173

Merged

peterjc closed this as completed Nov 25, 2020

oschwengers mentioned this issue Aug 5, 2022

ValueError: Need a Nucleotide or Protein alphabet oschwengers/bakta#116

Closed
