New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle feature references another sequence #2334
Handle feature references another sequence #2334
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2334 +/- ##
=======================================
Coverage 83.88% 83.89%
=======================================
Files 317 317
Lines 51548 51556 +8
=======================================
+ Hits 43242 43251 +9
+ Misses 8306 8305 -1
Continue to review full report at Codecov.
|
Rebasing and dealing with conflicts and new release... |
026f378
to
4d87f65
Compare
I haven't forgotten about this, but we've been distracted with the Python 2/3 cleanup in the meantime. |
Rebasing again (and leaving out the dual licensing changes)... |
4f3320d
to
07d000d
Compare
Code seems to be working fine, but have not found a good example yet - for instance none of the features here give the expected translation: from Bio import SeqIO
files = ("FP885871.gbk", "FP885876.gbk")
references = SeqIO.to_dict(SeqIO.read(_, "gb") for _ in files)
for acc, ref in references.items():
print(f"{acc} length {len(ref)} has {len(ref.features)} features")
for f in ref.features:
if f.type == "CDS" and len(f.location.parts) > 1:
print(f"{f.type} feature at {f.location}")
#print(f)
expected = f.qualifiers["translation"][0]
try:
translation = f.extract(ref.seq, references).translate(cds=True)
except:
translation = f.extract(ref.seq, references).translate(to_stop=True)
print(expected + "<-- expected")
print(translation + "<-- computed", expected == translation)
if "exception" in f.qualifiers and f.qualifiers["exception"] == ["RNA editing"]:
continue
assert expected == translation, f"{expected} vs {translation}" Some (all?) of them have human readable comments explaining why (in more detail than just "RNA editing") at least. |
67bd109
to
90e50a6
Compare
Sadly this example from ftp://ftp.ncbi.nlm.nih.gov/genomes/archive/old_refseq/Arabidopsis_thaliana/ used in the Tutorial from drawing tRNA on chromosomes does not appear to have any clear trans-splicing annotation: filenames = [
"CHR_I/NC_003070.gbk",
"CHR_II/NC_003071.gbk",
"CHR_III/NC_003074.gbk",
"CHR_IV/NC_003075.gbk",
"CHR_V/NC_003076.gbk",
]
references = SeqIO.index_db("Arabidopsis.idx", filenames, "genbank") Perhaps a newer version might... or maybe this is just not an appropriate example. |
Zebrafish to the rescue - a well behaved example from Adam Sjøgren, BX465837.11 which has a couple of CDS with trans-splicing with BX537337.9 https://www.ncbi.nlm.nih.gov/nuccore/BX465837.11 Should work in EMBL format too: https://www.ebi.ac.uk/ena/browser/api/embl/BX465837.11 |
@MarkusPiotrowski - would you be a good person to review this? I'm hoping to turn the Zebrafish example into something for the Tutorial and/or docstrings. |
This is going to need some merge conflicts resolved now... |
If the location refers to other records, those records can be supplied in an optional references dictionary, where the records will be looked up by the ref (key) and the value is expected to be the same type as the parent_sequence parameter (and thus the type extract() returns). Refs biopython#808
This is to faciliate using SeqIO.to_dict or SeqIO.index etc with the extract method. In that use case, want to focus on the annotation of the feature, not the parent records.
90e50a6
to
1fdc9ab
Compare
@mdehoon do you want to review? |
I guess these exceptions:
in |
Yes, as per my comment #808 (comment) in those cases |
304dcd3
to
585fc9a
Compare
Thank you. |
This pull request addresses issue #808, including the work of Adam Sjøgren via the mailing list.
I hereby agree to dual licence this and any previous contributions under both
the Biopython License Agreement AND the BSD 3-Clause License.
I have read the
CONTRIBUTING.rst
file, have runflake8
locally, andunderstand that AppVeyor and TravisCI will be used to confirm the Biopython unit
tests and style checks pass with these changes.
I have added my name to the alphabetical contributors listings in the files
NEWS.rst
andCONTRIB.rst
as part of this pull request, am listedalready, or do not wish to be listed. (This acknowledgement is optional.)