We made it! Well, actually, as usual, Peter made it. His cleverly worded blog item about the “Doktor Who” model of Open Source development caught the attention of Glyn Moody, who then managed to get it accepted as a Slashdot news item. All you geeks out there know that this is a great honor. None of my three attempts to get something accepted in the last ten years made it 🙂
This particular patch was submitted by Mark Rijnbeek in my team via the CDK patch tracker. I first go to the patch page, assign the patch to myself and set the patch status to “pending” to indicate that it is being worked on. The patch touches a piece of code which uses the VFLIB graph matching library to provide substructure searches with an Ullmann and a VF2 algorithm.
Again, for my own records:
The executive summary of the reviewing task goes like this:
browse the code
mark up code you think is buggy
note missing unit tests
note missing JavaDoc
warn about PMD warnings raised by the patch
optionally note other problems
optionally any other comment you have
And this is how it went – I’m leaving out things that were not applicable:
Browse the code and mark-up buggy parts
Egon had made an older version of this code available via Git. I checked it out and looked at the code, which looked horrible because it was a 1:1 translation of horrible-looking C code. Clearly, decent variable names would greatly improve the code, but I remember a statement that the translator himself could not make sense of it, so the original author is to blame :-). I do not get the impression that this problem can be rectified quickly. In fact, it took Mark a few days to debug this code by adding a rich collection of debug messages. I’m not sure that this is how it should be. The code is essentially unreadable.
Note missing unit tests and javadoc
Mark and Rajarshi supplied a number of unit tests for the code and they all pass. The code itself has javadoc and there are usage examples – very good.
Meanwhile, Egon, our CDK Uberworker, has posted the following:
Hi Rajarshi, Mark,
I have had a look at the vflib branch, and note that the code is aimed
at the standard module; like all new code, but for code in this module
in particular, should adhere to CDK's 'stable' standards...
(BTW, there is a Nightly at [0] which has been running on an older
version of the patch)
The below are some guidelines, please feel free to ask me or search
the cdk-devel archives for the details.
1. clean JavaDoc
You can use DocCheck to check that your code has clean JavaDoc:
ant -f javadoc.xml doccheck
A common error is missing periods at the end of first sentences in the
JavaDoc. The first sentence is important to get right, per JavaDoc
standards.
2. no PMD warning (or with a good excuse)
ant -f pmd.xml
3. unit test coverage
Each module has a test suite MfooTests, which points to a Test class
doing coverage testing... new unit tests classes must be added to this
suite, MstandardTests for the vflib patch. The coverage testing class
will then check that all new code is tested.
I note missing tests of NodePair and State.
When these issues have been resolved, I'll look at the
code/functionality itself.
Right – so that is what happens when you are not fast enough. And I stopped being fast enough long ago, so I guess I’ll just leave it where Egon is leaving it. So, guys, go and fix that stuff and then we’ll look at it again 🙂
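For illustration only, the first-sentence rule from Egon's first guideline can be approximated in a few lines of Python. This is not DocCheck and not part of the CDK build; the function name `first_sentence_ends_with_period` is made up for this sketch, and it assumes the usual JavaDoc convention that the first sentence ends at the first period followed by whitespace or the end of the description.

```python
import re


def first_sentence_ends_with_period(javadoc: str) -> bool:
    """Return True if the description of a JavaDoc comment contains a
    properly terminated first sentence (hypothetical helper, not DocCheck)."""
    # Strip the opening '/**' and the closing '*/'.
    body = re.sub(r"^\s*/\*\*|\*/\s*$", "", javadoc.strip())
    # Remove the decorative leading '*' on each line.
    lines = [re.sub(r"^\s*\*\s?", "", line) for line in body.splitlines()]
    text = " ".join(line.strip() for line in lines).strip()
    # The description stops where the first block tag (@param, @return, ...) starts.
    # Inline tags such as {@link Foo} are not preceded by whitespace and are kept.
    description = re.split(r"(?:^|\s)@\w+", text, maxsplit=1)[0].strip()
    # A first sentence ends at a period followed by whitespace or end of text.
    return re.search(r"\.(\s|$)", description) is not None


good = """/**
 * Returns the number of mapped nodes.
 * @return the node count
 */"""
bad = "/** Returns the number of mapped nodes */"

print(first_sentence_ends_with_period(good))
print(first_sentence_ends_with_period(bad))
```

The real check in the CDK build is the `doccheck` Ant target quoted in the mail; the sketch only shows what kind of mistake that target is meant to catch.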
If the pompous title caught your attention, and you are ashamed of that: Don’t worry. It is all true. My cheminformatics and metabolism group at the European Bioinformatics Institute (EBI) is looking for a PhD student this year and all you need to do is apply through the regular route. The range of possible topics is wide open, going from metabolomics via automated structure elucidation of metabolites to mining chemical information from the printed literature, and more. Your own suggestions are of course welcome.
The EBI is the world’s largest open provider of biological and chemical information. We are located, together with the Sanger Institute for Genome Research, on the beautiful campus of Hinxton Hall, a few miles south of Cambridge.
One of the small lakes on the Wellcome Trust Campus in Hinxton
Our PhD students are enrolled with the University of Cambridge.
The important part for now: The application deadline for the Fall PhD selection is July 15, 2009. And: Please drop me a note if you applied.
We are in the process of hiring five new ChEBI team members at the moment – one, our new curator Steve Turner, has arrived and our candidate for one of the software engineer positions accepted our offer today.
NMRShiftDB is a database of organic compounds and their nuclear magnetic resonance (NMR) data. At present, we hold 30.000 compounds and 1D NMR spectra for carbon, proton and some other nuclei.
NMRShiftDB developer Stefan Kuhn, in collaboration with the EBI systems group, has now established an NMRShiftDB node at the European Bioinformatics Institute (EBI), which brings us up to four working nodes in the NMRShiftDB network again.
Congratulations to the ChEBI team for publishing ChEBI version 57.
ChEBI Release 57 now contains links to NMRShiftDB. Search ChEBI for “caffeine”, for example, and you find the link to the carbon NMR spectrum of caffeine on the “automatic XREFs” page of ChEBI, in the “Small Molecules” section.
ChEBI now contains just under 17,963 manually annotated entries, of which 108 have been submitted via the ChEBI Submission tool (www.ebi.ac.uk/chebi/submissions). The next ChEBI Release will be on 24 June 2009.
We received our official award letter from BBSRC Tools and Resources Fund today for the ChEBI ontology development grant. Needless to say, we are thrilled. We are now going to work together with Michael Ashburner’s group at the University of Cambridge to align ChEBI with other OBO Foundry ontologies by adoption of the Basic Formal Ontology and the Relationship Types Ontology.
This will include extensive annotation of the ChEBI ontology required after adoption of BFO and RO. The adoption of the BFO will require a major reorganisation of the upper levels of the ChEBI ontology in order to allow it to align to the BFO. This reorganisation can only be achieved by manual annotation, although some semi-automatic means will be employed to aid the curator. In addition to the reorganisation of the upper levels, new relationships will be introduced semi-automatically, but as the ChEBI ethos requires that all data is manually checked to maintain ChEBI’s high standards of data quality, we expect a major annotation task. The project is funded for three years. Stay tuned. We’ll report on our progress on a regular basis.
The number of structures and spectra in NMRShiftDB now exceeds 31.000 and 35.000, respectively. The number of proton spectra alone is now 12.934. This is due to NMRShiftDB developer Stefan Kuhn in my group importing a recent donation from our collaborators Reinhard Dunkel and Heinz Kolshorn. Thanks to Heinz and Reinhard for their generosity.
It’s going to be all over the place soon anyway, so I’ll make it short: The Royal Society of Chemistry has announced that it has acquired ChemSpider. This is great news and I’m confident that it will be a move to even more openness in chemistry and cheminformatics. It will also allow the RSC to use Tony’s fantastic tools for even more semantic markup of articles. I’m looking forward to talking to everyone about the implications. For now, congratulations, Tony, and congratulations, RSC, for this great deal.
With ChEBI release 56 behind us, I thought I’d share some insight into how ChEBI is created and what we do to prepare a release. Over the last few years, the ChEBI team has on average consisted of two software engineers maintaining and improving the software and two to three curators doing the data entry and curation. It is remarkable that, by now, the question of which chemical compounds make it into ChEBI is completely community driven. Requests to enter compounds are submitted by users and other database maintainers via the ChEBI curator request tracker on SourceForge. Besides increasing the public knowledge of mankind, the biggest benefit and driving force for submitters is the assignment of a stable ChEBI identifier, which can then be cited and linked to from other resources.
With ChEBI release 55 we introduced the new submission tool, which allows our submitters to create ChEBI datasets themselves. This a) gives our users more control over what they want to see in ChEBI and b) saves our curators some duplicate work.
In preparation for a release, here is what the ChEBI team does.
Create automatic cross-references to PubChem, UniProt, IntEnz, BRENDA, SABIO-RK, ArrayExpress, IntAct, Patents etc. These are all run a week before the release and are based on ChEBI identifier matching or text matching.
Annotation of entity of the month
Submissions deposited directly into the database by users are processed by our annotators.
On the release day:
Data is exported overnight into multiple formats, OBO format, SDF, Oracle data dumps and PostgreSQL/MySQL dumps.
Public web site updated with the entity of the month.
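To give a feel for the difference between identifier matching and text matching in the cross-reference step, here is a minimal Python sketch. It is not the ChEBI pipeline: the function names, the record layout and all identifiers and accessions below are invented for illustration, and real text matching is certainly more involved than comparing normalised names.

```python
def normalise(name: str) -> str:
    """Lower-case a name and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


def build_xrefs(chebi_entries, external_records):
    """Link external records to ChEBI entries (hypothetical sketch).

    A record that already cites a known ChEBI identifier is linked by "id";
    otherwise its name is compared with the known names and linked by "text".
    """
    by_id = {entry["id"]: entry for entry in chebi_entries}
    by_name = {}
    for entry in chebi_entries:
        for name in entry["names"]:
            # Keep the first entry seen for a given normalised name.
            by_name.setdefault(normalise(name), entry["id"])

    xrefs = []
    for record in external_records:
        cited = record.get("chebi_id")
        if cited in by_id:
            xrefs.append((cited, record["accession"], "id"))
            continue
        hit = by_name.get(normalise(record.get("name", "")))
        if hit is not None:
            xrefs.append((hit, record["accession"], "text"))
    return xrefs


# Placeholder data; the identifiers and accessions are not real.
entries = [
    {"id": "CHEBI:0000001", "names": ["caffeine", "1,3,7-trimethylxanthine"]},
    {"id": "CHEBI:0000002", "names": ["theobromine"]},
]
records = [
    {"accession": "EXT-1", "chebi_id": "CHEBI:0000002"},
    {"accession": "EXT-2", "name": "  Caffeine "},
    {"accession": "EXT-3", "name": "unknown compound"},
]

for xref in build_xrefs(entries, records):
    print(xref)
```

Identifier matches are tried first because they are unambiguous; name matches are only a fallback and would need curator attention wherever two compounds share a synonym.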