UniProt release 2019_11 changes FT and CC lines #2417
Comments
I can reproduce this, and if I tweak the assert statement to report the problem line:
Looking at the data via https://www.uniprot.org:443/uniprot/P49798.txt (e.g.
This does not look like the old SwissProt style feature lines, but more like the EMBL (and GenBank) format features. Something strange here...
Hmm, https://web.expasy.org/docs/userman.html#FT_line describes this new feature line (it currently says Release 2019_11). Biopython expects the older style, which was still used in Release 2019_04 at least; see the most recent snapshot of the documentation on archive.org: https://web.archive.org/web/20190528232014/https://web.expasy.org/docs/userman.html#FT_line There ought to be an announcement about this change somewhere...
Found it, https://www.uniprot.org/news/2019/12/18/release - they changed the CC lines too.
@sdecesco in the short term, I would suggest you use the UniProt XML output instead, which can be parsed with
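As a rough illustration of the XML workaround, here is a minimal sketch that pulls feature locations out of UniProt XML using only the Python standard library. This is not the parser referred to above; it is a swapped-in `xml.etree.ElementTree` approach. The helper name `iter_features` is hypothetical, the element and attribute names reflect my understanding of the UniProt XML schema, and the sample entry in the usage note is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by UniProt XML documents (assumed from the public schema).
NS = {"up": "http://uniprot.org/uniprot"}


def _position(location, tag):
    """Return the 'position' attribute of a child of <location>, or None."""
    if location is None:
        return None
    node = location.find("up:" + tag, NS)
    return node.get("position") if node is not None else None


def iter_features(xml_text):
    """Yield (accession, type, description, begin, end) for each feature.

    Hypothetical helper: positions are returned as strings (or None when
    the XML gives no explicit position), with no further interpretation.
    """
    root = ET.fromstring(xml_text)
    for entry in root.findall("up:entry", NS):
        accession = entry.findtext("up:accession", namespaces=NS)
        for feature in entry.findall("up:feature", NS):
            location = feature.find("up:location", NS)
            single = _position(location, "position")
            if single is not None:
                # Single-residue features use <position> instead of begin/end.
                begin = end = single
            else:
                begin = _position(location, "begin")
                end = _position(location, "end")
            yield (accession, feature.get("type"),
                   feature.get("description"), begin, end)
```

Calling `list(iter_features(xml_text))` on the text of a downloaded entry should give one tuple per feature. Positions are left as strings so that the caller decides how to treat missing or uncertain values.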
See GitHub issue #2417, UniProt release 2019_11 changed the FT and CC lines. [ci skip]
Thanks @peterjc, I tried Bio.SeqIO but it didn't really provide me with the info I wanted. I wrote a bit of code to extract the way it used to in I haven't tested it extensively though.
@sdecesco The problem though is that according to the description of the new FT line, the location is not necessarily a simple integer or pair of integers, but it could be things like Perhaps the best way forward is to define a new
The changes are deliberately following the GenBank/EMBL/INSDC style, so the location object in However, the low level
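To make the discussion of the new layout concrete, here is a minimal sketch of a standalone parser for the new-style FT lines. It is not Biopython's implementation and not the fix from any pull request. The function names `parse_location` and `parse_ft_lines` are hypothetical, and the layout it assumes (feature key and location on the first line, `/qualifier="value"` lines after it, optional `<`, `>` or `?` markers and an optional isoform prefix in the location) is my reading of the user manual linked above.

```python
import re

# Assumed location grammar: optional "ISOFORM:" prefix, then a start that may
# carry a "<", ">" or "?" marker, then an optional "..end" part.
_LOCATION = re.compile(
    r"^(?:(?P<isoform>[A-Za-z0-9-]+):)?"
    r"(?P<start>[<>?]?\d*)"
    r"(?:\.\.(?P<end>[<>?]?\d*))?$"
)


def parse_location(text):
    """Split a location string into (isoform, start, end) strings.

    Hypothetical helper: markers such as "<" or "?" are kept in the
    returned strings rather than interpreted.
    """
    match = _LOCATION.match(text.strip())
    if match is None or not match.group("start") or match.group("end") == "":
        raise ValueError("Unrecognised feature location: %r" % text)
    start = match.group("start")
    end = match.group("end") if match.group("end") is not None else start
    return match.group("isoform"), start, end


def parse_ft_lines(lines):
    """Collect new-style FT lines into a list of feature dictionaries."""
    features = []
    last_qualifier = None
    for line in lines:
        if not line.startswith("FT"):
            continue
        content = line[2:].strip()
        if not content:
            continue
        if len(line) > 5 and line[5] != " ":
            # A feature key in the key column starts a new feature.
            key, location = content.split(None, 1)
            features.append({"key": key,
                             "location": parse_location(location),
                             "qualifiers": {}})
            last_qualifier = None
        elif not features:
            raise ValueError("Qualifier line before any feature: %r" % line)
        elif content.startswith("/"):
            name, _, value = content[1:].partition("=")
            features[-1]["qualifiers"][name] = value
            last_qualifier = name
        elif last_qualifier is not None:
            # Continuation of a qualifier value wrapped over several lines.
            features[-1]["qualifiers"][last_qualifier] += " " + content
    for feature in features:
        feature["qualifiers"] = {name: value.strip('"')
                                 for name, value in feature["qualifiers"].items()}
    return features
```

Keeping the location as strings avoids the old assumption that start and end are always plain integers. A real parser would presumably map them onto proper location objects instead.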
Using the XML is not always an option, since UniProt doesn't provide it for older versions of an entry. Those older versions are useful if you want to compare annotations between different versions.
See #2484 for a bug fix.
Can we close this issue with #2484 applied?
Yes, #2484 fixed this issue.
Setup
I am reporting a problem with Biopython version, Python version, and operating
system as follows:
3.5.5 |Anaconda custom (64-bit)| (default, Mar 12 2018, 23:12:44)
[GCC 7.2.0]
CPython
Linux-3.10.0-862.14.4.el7.x86_64-x86_64-with-centos-7.7.1908-Core
1.75
Expected behaviour
When reading a SwissProt record, the features should be parsed correctly.
Actual behaviour
Parsing fails at the feature parsing step, raising an AssertionError.
Steps to reproduce