This is a brief HOWTO on compiling OSRA (Optical Structure Recognition Application) on Ubuntu Jaunty. To quote the OSRA home page, OSRA
… is a utility designed to convert graphical representations of chemical structures, as they appear in journal articles, patent documents, textbooks, trade magazines etc., into SMILES (Simplified Molecular Input Line Entry Specification – see http://en.wikipedia.org/wiki/SMILES) or SD file – a computer recognizable molecular structure format. OSRA can read a document in any of the over 90 graphical formats parseable by ImageMagick – including GIF, JPEG, PNG, TIFF, PDF, PS etc., and generate the SMILES or SDF representation of the molecular structure images encountered within that document …
Update: I’ve written a newer post that shows how to install Osra on Ubuntu 11.10 (Oneiric): http://timony.com/mickzblog/2012/03/24/build-install-osra-1-3-8/
Make a directory to compile the source:
mkdir /tmp/OSRA; cd /tmp/OSRA;
Be careful doing this: /tmp is cleaned on reboot, so the directory may be removed.
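If losing the build tree on reboot worries you, one option is to build somewhere that persists. This is only a minimal sketch: the OSRA_BUILD_DIR variable and the default under $HOME are names I made up for illustration, and the rest of this post keeps using /tmp/OSRA.

```shell
# Create a build directory that survives a reboot and print its path.
# OSRA_BUILD_DIR is a hypothetical variable for this sketch; the default
# location under $HOME is an assumption, use whatever suits you.
make_build_dir() (
    dir="${OSRA_BUILD_DIR:-$HOME/src/OSRA}"
    mkdir -p "$dir" || exit 1
    printf '%s\n' "$dir"
)
```

You could then run cd "$(make_build_dir)" and substitute that path wherever /tmp/OSRA appears below.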
Install dependencies needed by the OS:
sudo apt-get install libgraphicsmagick1-dev libmagick++-dev libgraphicsmagick++1-dev potrace gocr libtclap-dev libopenbabel-dev libopenbabel3 openbabel libnetpbm10 libnetpbm10-dev
Don’t install ocrad and remove it if it’s on your system (you can probably reinstall if you need to after you get Osra to compile):
sudo apt-get remove --purge ocrad;
Source Code:
Instead of fetching the source packages by hand, download the sources Ubuntu used to build its packages, where available. Make sure the deb-src lines in your /etc/apt/sources.list are uncommented. The following command downloads and extracts the code into the current directory:
cd /tmp/OSRA; apt-get source gocr ocrad potrace;
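If apt-get source complains, it may be worth checking whether any source lines are actually active. A rough sketch of such a check follows; the function name has_deb_src is my own, and it only looks at the one file you give it (by default /etc/apt/sources.list), not at anything under /etc/apt/sources.list.d/.

```shell
# Succeed if the given sources file has at least one active
# (uncommented) deb-src line. Defaults to /etc/apt/sources.list.
has_deb_src() {
    grep -Eq '^[[:space:]]*deb-src[[:space:]]' "${1:-/etc/apt/sources.list}"
}
```

For example: has_deb_src || echo "no active deb-src lines found". After enabling them, run sudo apt-get update before trying apt-get source again.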
This downloads Gocr 0.46, which the OSRA docs say may not work:
– GOCR/JOCR, optical character recognition library, version 0.43 or later (version 0.45 recommended, do not use 0.46! See special instructions for 0.47 compilation below)
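Because the version you get depends on what the source package happens to contain, a quick look at the extracted directory name can flag the release the docs warn about. This is just a sketch that assumes the directory is named gocr-<version>, as apt-get source names it here; both function names are hypothetical.

```shell
# Print the version part of a Gocr source directory name,
# e.g. "/tmp/OSRA/gocr-0.46/" -> "0.46".
gocr_dir_version() (
    name=$(basename "$1")
    printf '%s\n' "${name#gocr-}"
)

# Succeed unless the directory is the 0.46 release the OSRA docs warn about.
gocr_version_ok() {
    [ "$(gocr_dir_version "$1")" != "0.46" ]
}
```

For example: gocr_version_ok /tmp/OSRA/gocr-0.46 || echo "OSRA docs advise against Gocr 0.46". It only reports; whether to carry on with that version is up to you.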
Get the Osra Source and extract it
cd /tmp/OSRA;
wget http://cactus.nci.nih.gov/osra/osra-1.2.1.tgz;
tar xzvf osra-1.2.1.tgz
cd /tmp/OSRA/osra-1.2.1;
Make a backup copy of the OSRA Makefile:
cp Makefile Makefile.bak;
Edit the Makefile
Change the following lines:
GOCR=../gocr-0.45/
to
GOCR=../gocr-0.46/
OPENBABEL=/usr/local/
to
OPENBABEL=/usr/
TCLAPINC=-I/usr/local/include/tclap/
to
TCLAPINC=-I/usr/include/tclap/
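If you fetched the recommended Gocr 0.45 yourself rather than using the 0.46 source package, leave GOCR pointing at that directory instead:
GOCR=../gocr-0.45/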
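The same edits can be scripted with sed rather than made by hand. The following is only a sketch: patch_osra_makefile is a made-up name, it assumes each variable starts at the beginning of a line in the Makefile, and you pass in whichever Gocr source directory you actually have.

```shell
# Rewrite the path variables in an OSRA Makefile.
#   $1 = path to the Makefile
#   $2 = Gocr source directory as seen from the Makefile, e.g. ../gocr-0.46/
# The original is kept next to it as <Makefile>.bak.
patch_osra_makefile() (
    mf=$1
    gocr=$2
    cp "$mf" "$mf.bak" || exit 1
    sed -e "s|^GOCR=.*|GOCR=$gocr|" \
        -e 's|^OPENBABEL=.*|OPENBABEL=/usr/|' \
        -e 's|^TCLAPINC=.*|TCLAPINC=-I/usr/include/tclap/|' \
        "$mf.bak" > "$mf"
)
```

For example: patch_osra_makefile /tmp/OSRA/osra-1.2.1/Makefile ../gocr-0.46/ and then diff Makefile.bak Makefile to see what changed.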
Compiling
Compile, but don’t install the potrace source:
cd /tmp/OSRA/potrace-1.8;
./configure;
make;
Compile the OSRA source:
cd /tmp/OSRA/osra-1.2.1;
make;
This produces a working OSRA binary:
./osra
./osra [-f <can/smi/sdf>] [-g] [-p] [-s <dimensions, 300x400>] [-n] [-r
<default: auto>] [-o <filename prefix>] [-t <0.2..0.8>] [--]
[--version] [-h] <filename>
Now I just need a file to test it against to see if it will run correctly.
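Once you have an image of a structure to hand, a first test could look something like the wrapper below. It is a sketch only: run_osra is a hypothetical helper, the image name is whatever file you supply, and the -f smi option is taken from the usage text above.

```shell
# Run an OSRA binary on one image and pass through whatever it prints.
#   $1 = path to the osra binary, $2 = image file
# -f smi comes from the usage text above (can/smi/sdf).
run_osra() {
    [ -x "$1" ] || { echo "not executable: $1" >&2; return 1; }
    [ -r "$2" ] || { echo "cannot read image: $2" >&2; return 1; }
    "$1" -f smi "$2"
}
```

For example: run_osra ./osra molecule.png, where molecule.png stands in for your own test image.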
If you want to build with Gocr 0.47 this step is required:
cd /tmp/OSRA/gocr-0.47;
./configure CPPFLAGS=-fPIC LDFLAGS=-fPIC;
make libs;
I followed your instructions but I get compilation errors. I tried with osra-1.3.5 and also with osra-1.2.1, but I get compilation errors with both.
patching file ../ocrad-0.17//character.h
g++ -g -O2 -fPIC -I../ocrad-0.17/ -D_LIB -D_MT -Wall -I../potrace-1.8//src/ -I../gocr-0.45//src/ -I../gocr-0.45//include/ -I/usr//include/openbabel-2.0/ -I/usr/include/tclap/ -I/usr/include/ImageMagick -g -O2 -Wall -W -c osra_ocr.cpp
In file included from /usr/include/c++/4.3/cwchar:52,
from /usr/include/c++/4.3/bits/postypes.h:47,
from /usr/include/c++/4.3/bits/char_traits.h:47,
from /usr/include/c++/4.3/string:47,
from ../ocrad-0.17/common.h:18,
from osra_ocr.cpp:34:
/usr/include/wchar.h:140: error: declaration of ‘wchar_t* wcscpy(wchar_t*, const wchar_t*) throw ()’ throws different exceptions
pgm2asc.h:36: error: from previous declaration ‘wchar_t* wcscpy(wchar_t*, const wchar_t*)’
/usr/include/wchar.h:208: error: declaration of ‘wchar_t* wcsdup(const wchar_t*) throw ()’ throws different exceptions
pgm2asc.h:40: error: from previous declaration ‘wchar_t* wcsdup(const wchar_t*)’
/usr/include/wchar.h:214: error: declaration of ‘wchar_t* wcschr(const wchar_t*, wchar_t) throw ()’ throws different exceptions
pgm2asc.h:35: error: from previous declaration ‘wchar_t* wcschr(const wchar_t*, wchar_t)’
/usr/include/wchar.h:249: error: declaration of ‘size_t wcslen(const wchar_t*) throw ()’ throws different exceptions
pgm2asc.h:37: error: from previous declaration ‘size_t wcslen(const wchar_t*)’
osra_ocr.cpp: In function ‘char get_atom_label(Magick::Image, Magick::ColorGray, int, int, int, int, double, int, int)’:
osra_ocr.cpp:56: warning: deprecated conversion from string constant to ‘char*’
osra_ocr.cpp: In function ‘bool detect_bracket(int, int, unsigned char*)’:
osra_ocr.cpp:202: warning: deprecated conversion from string constant to ‘char*’
make: *** [osra_ocr.o] Error 1
What am I doing wrong? Please help.
What version of Ubuntu are you on?
I just tried rebuilding 1.3.5 on Ubuntu Karmic and I had to make some changes to the steps above. Note, I had to use ocrad 0.19 from GNU and not the source package that Ubuntu has. The Osra website says you have to use 0.19 for Osra 1.3.5 to compile.
So do the following to compile 1.3.5 on Ubuntu Karmic:
cd /tmp/OSRA; wget http://cactus.nci.nih.gov/osra/osra-1.3.5.tgz; tar xzvf osra-1.3.5.tgz
cd /tmp/OSRA; wget http://ftp.gnu.org/gnu/ocrad/ocrad-0.19.tar.gz
tar xzvf ocrad-0.19.tar.gz; cd ocrad-0.19/; ./configure; make;
Make the following changes to the Osra Makefile (in /tmp/OSRA/osra-1.3.5):
POTRACE=../potrace-1.8/
GOCR=../gocr-0.46/
OCRAD=../ocrad-0.19/
OPENBABEL=/usr/
Then run make and it should work.
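Before running make, it can help to confirm that each of those variables points at a directory that really exists, since a wrong path tends to show up later as a confusing compile error. A minimal sketch, assuming the variables are written as NAME=value at the start of a line with nothing after the value; check_makefile_dirs is a name I made up.

```shell
# For each path variable, read its value from the Makefile and check that
# the directory exists. Relative paths are resolved against the Makefile's
# own directory. Prints one line per missing directory and fails if any
# is missing.
check_makefile_dirs() (
    mf=$1
    base=$(dirname "$mf")
    status=0
    for var in POTRACE GOCR OCRAD OPENBABEL; do
        val=$(sed -n "s|^$var=[[:space:]]*||p" "$mf" | head -n 1)
        [ -n "$val" ] || continue
        case $val in
            /*) path=$val ;;
            *)  path=$base/$val ;;
        esac
        if [ ! -d "$path" ]; then
            echo "$var points to a missing directory: $val"
            status=1
        fi
    done
    exit $status
)
```

For example: check_makefile_dirs /tmp/OSRA/osra-1.3.5/Makefile && make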
Let me know if this works for you and if it does I’ll do a new posting on how to do this on Karmic.
If you’re not on Karmic, let me know.
Also, this is late at night and there might be some typos or errors … 🙂
Thanks, it worked, however there was one more thing I had to do. I had to search and replace all the /usr/local/… occurrences in the Makefile and change them to /usr/ not just the 4 variables above.
BTW: I use Ubuntu 9.04 Jaunty but with the Karmic instruction it works fine.
Thank you very very much you are a life saver
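For anyone else who hits this, the search and replace described above could be done in one go with sed. This is a sketch under the assumption that every /usr/local/ in the Makefile should become /usr/, which is what worked in this case; use_system_prefix is a hypothetical name.

```shell
# Replace every /usr/local/ prefix in a Makefile with /usr/,
# keeping the original as <Makefile>.orig.
use_system_prefix() (
    mf=$1
    cp "$mf" "$mf.orig" || exit 1
    sed 's|/usr/local/|/usr/|g' "$mf.orig" > "$mf"
)
```

For example: use_system_prefix /tmp/OSRA/osra-1.3.5/Makefile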
Hi AM,
Glad you were able to get it to compile. 🙂
Cheers
Mick
I have some problems with compiling: gocr/pgm2asc.h is in the folder, but configure doesn’t like it.
Is there any hope of creating a deb file for people that don’t want to compile?
I have tried to install OSRA 1.3.8 on Ubuntu 10.10, but I was not lucky. When I tried to configure it I got an error about the gocr pgm2asc.h file. When I changed the path to the gocr source, it still couldn’t find pgm2asc.h.
Does somebody have an idea what I am doing wrong, or how I can compile OSRA under Ubuntu 10.10 or, if I’m lucky, under Ubuntu 11.10?
I’ll see if I can get it to work on 11.10 in the next week or so.
Thank you very much Mick!
I got it to build. I had to install the patched gocr that they mention in their install docs; pointing configure at the sources didn’t work.
Here are my quick tips; I’ll write a new posting about this later this weekend:
Get Osra, at time of writing it’s 1.3.8:
cd /tmp; mkdir OSRA; cd OSRA; wget http://cactus.nci.nih.gov/osra/osra-1.3.8.tgz ; wget http://cactus.nci.nih.gov/osra/gocr-0.50pre-patched.tgz ;
Extract both packages:
tar xzvf osra-1.3.8.tgz
tar xzvf gocr-0.50pre-patched.tgz
Install the following (you will also need some graphicsmagick packages: libgraphicsmagick1-dev, libgraphicsmagick3 and maybe some others):
sudo apt-get install openbabel potrace ocrad libtclap-dev libopenbabel-dev libpotrace-dev libocrad-dev
You need to make and install the patched Gocr:
cd /tmp/OSRA/gocr-0.50pre-patched
sudo make install;
This will install the patched gocr into /usr/local . This may cause conflicts if you’ve installed gocr using another method, for instance apt-get or aptitude.
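One way to spot such a conflict is to list every gocr executable on your PATH. A small sketch follows; list_gocr_on_path is a made-up helper, and it only looks at executables, not at libraries or headers.

```shell
# Print every executable file named gocr found in a colon-separated
# directory list (defaults to $PATH), in lookup order. More than one
# line of output suggests two installs are competing.
list_gocr_on_path() (
    set -f              # no globbing while splitting the list
    IFS=:
    for dir in ${1:-$PATH}; do
        [ -n "$dir" ] || dir=.
        if [ -f "$dir/gocr" ] && [ -x "$dir/gocr" ]; then
            printf '%s\n' "$dir/gocr"
        fi
    done
)
```

Running list_gocr_on_path with no argument checks your current PATH; dpkg -s gocr will also tell you whether the packaged version is installed.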
Next compile Osra:
cd /tmp/OSRA/osra-1.3.8
./configure
make all
To install Osra on your system, run the following using sudo:
sudo make install
Look for it in /usr/local/bin .
New posting about Osra 1.3.8 on Ubuntu 11.10:
http://timony.com/mickzblog/2012/03/24/build-install-osra-1-3-8/