openbabel-sys 0.5.4+openbabel-3.1.1

Native bindings to OpenBabel
/**********************************************************************
Copyright (C) 2007 by Chris Morley

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation version 2 of the License.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.
***********************************************************************/
#include <iostream>
#include <sstream>
#include <string>
#include <cstring> //for strlen
#include <algorithm>
#include <iterator>
#include <openbabel/babelconfig.h>
#include <openbabel/obconversion.h>
#include <openbabel/mol.h>

#include <zlib.h>

using namespace std;
namespace OpenBabel
{

class PNGFormat : public OBFormat
{
public:
  PNGFormat()
  {
    OBConversion::RegisterFormat("png",this);
    OBConversion::RegisterOptionParam("y", this, 1, OBConversion::INOPTIONS);
    OBConversion::RegisterOptionParam("y", this, 1, OBConversion::OUTOPTIONS);
  }

  virtual const char* Description()
  {
    return
    "PNG 2D depiction\n"
    "or add/extract chemical structures from a .png file\n\n"

    "The PNG format has several uses. The most common is to generate a\n"
    ":file:`.png` file for one or more molecules.\n"
    "2D coordinates are generated if not present::\n\n"
    "  obabel mymol.smi -O image.png\n\n"

    "Chemical structure data can be embedded in the :file:`.png` file\n"
    "(in a ``tEXt`` chunk)::\n\n"
    "  obabel mymol.mol -O image.png -xO molfile\n\n"

    "The parameter of the ``-xO`` option specifies the format (\"file\" can be added).\n"
    "Note that if you intend to embed a 2D or 3D format, you may have to call\n"
    "``--gen2d`` or ``--gen3d`` to generate the required coordinates if they are\n"
    "not present in the input.\n\n"

    "Molecules can also be embedded in an existing PNG file::\n\n"
    "  obabel existing.png mymol1.smi mymol2.mol -O augmented.png -xO mol\n\n"

    "Reading from a PNG file will extract any embedded chemical structure data::\n\n"
    "  obabel augmented.png -O contents.sdf\n\n"

    "Write Options e.g. -xp 500\n"
    " p <pixels> image size, default 300\n"
    " w <pixels> image width (or from image size)\n"
    " h <pixels> image height (or from image size)\n"
    " c# number of columns in table\n"
    " r# number of rows in table\n"
    " N# max number objects to be output\n"
    " u no element-specific atom coloring\n"
    "    Use this option to produce a black and white diagram\n"
    " U do not use internally-specified color\n"
    "    e.g. atom color read from cml or generated by internal code\n"
    " b <color> background color, default white\n"
    "    e.g. ``-xb yellow`` or ``-xb #88ff00``. ``-xb none`` is transparent.\n"
    "    Just ``-xb`` is black with white bonds.\n"
    "    The atom symbol colors work with black and white backgrounds,\n"
    "    but may not with other colors.\n"
    " B <color> bond color, default black\n"
    "    e.g. ``-xB yellow`` or ``-xB #88ff00``\n"
    " C do not draw terminal C (and attached H) explicitly\n"
    "    The default is to draw all hetero atoms and terminal C explicitly,\n"
    "    together with their attached hydrogens.\n"
    " a draw all carbon atoms\n"
    "    So propane would display as H3C-CH2-CH3\n"
    " d do not display molecule name\n"
    " m do not add margins to the image\n"
    "    This only applies if there is a single molecule to depict.\n"
    "    Implies -xd.\n"
    " s use asymmetric double bonds\n"
    " t use thicker lines\n"
    " A display aliases, if present\n"
    "    This applies to structures which have an alternative, usually\n"
    "    shorter, representation already present. This might have been input\n"
    "    from an A or S superatom entry in an sd or mol file, or can be\n"
    "    generated using the --genalias option. For example::\n \n"

    "      obabel -:\"c1cc(C=O)ccc1C(=O)O\" -O out.png\n"
    "             --genalias -xA\n \n"

    "    would add the aliases COOH and CHO to represent the carboxyl and\n"
    "    aldehyde groups and would display them as such in the png diagram.\n"
    "    The aliases which are recognized are in data/superatom.txt, which\n"
    "    can be edited.\n"
    " O <format ID> Format of embedded text\n"
    "      For example, ``molfile`` or ``smi``.\n"
    "      If there is no parameter, input format is used.\n"
    " y <additional chunk ID> Write to a chunk with specified ID\n\n"

    "Read Options e.g. -ay\n"
    " y <additional chunk ID> Look also in chunks with specified ID\n\n"

    "If Cairo was not found when Open Babel was compiled, then\n"
    "the 2D depiction will be unavailable. However, it will still be\n"
    "possible to extract and embed chemical data in :file:`.png` files.\n"
    "\n"

    ".. seealso::\n\n"

    "    :ref:`PNG_2D_depiction`\n\n"
    ;
  };

  virtual const char* TargetClassDescription()
  {
    static string txt;
    txt = " PNG_files\n"; //so reports "n PNG_files converted"
    txt += OBFormat::TargetClassDescription(); //to display OBMol options in GUI
    return txt.c_str();
  }

  virtual unsigned int Flags()
  {
      return READONEONLY | READBINARY | WRITEBINARY | DEPICTION2D;
  };

  virtual bool ReadChemObject(OBConversion* pConv)
  {
    bool ret = ReadMolecule(nullptr, pConv);
    pConv->GetChemObject(); //increments output index
    return ret;
  };
  virtual bool WriteChemObject(OBConversion* pConv)
  {
    //If there is a PNG input file, embed all the subsequent molecules in it
    if(!CopyOfInput.empty() && bytesToIEND>0)
    {
      OBBase* pOb = pConv->GetChemObject();
      return WriteMolecule(pOb, pConv);
    }
    else
    {
      _hasInputPngFile = false;
      //draw image in PNG2, which will test whether embedding is required
      OBFormat* ppng2 = OBConversion::FindFormat("_png2");
      if(!ppng2)
      {
        obErrorLog.ThrowError("PNG Format","PNG2Format not found. Probably the Cairo library is not loaded.", obError);
        return false;
      }
      bool ret = ppng2->WriteChemObject(pConv);
      if(pConv->IsLast())
        pConv->SetOutFormat(""); // report as output objects, not "PNG_files"
      return ret;
    }
  };

virtual bool ReadMolecule(OBBase* pOb, OBConversion* pConv);
virtual bool WriteMolecule(OBBase* pOb, OBConversion* pConv);

private:
  int _count; //number of chemical objects converted
  vector<char> CopyOfInput;
  unsigned bytesToIEND; //number of bytes up to but not including the IEND chunk.
  unsigned origBytesToIEND; //saved between WriteMolecule calls
  bool _hasInputPngFile;

  //Read and write number consisting of 4 bytes with most significant bytes first.
  //Should be independent of compiler and platform.
  unsigned long Read32(istream& ifs)
  {
    char ch;
    unsigned long val=0;
    for(int i=0; i<4; ++i)
    {
      if(!ifs.get(ch))
        return 0;
      val = val * 0x100 + (unsigned char)ch;
    }
    return val;
  }

  void Write32(unsigned long val, ostream& ofs)
  {
    char p[4];
    for(int i=0; i<4; ++i)
    {
      p[3-i] = (char)(val % 0x100); //take the low byte first, then narrow to char
      val /= 0x100;
    }
    ofs.write(p, 4);
  }
};

  ////////////////////////////////////////////////////

//Make an instance of the format class
PNGFormat thePNGFormat;

/////////////////////////////////////////////////////////////////

bool PNGFormat::ReadMolecule(OBBase* pOb, OBConversion* pConv)
{
  istream& ifs = *pConv->GetInStream();
  if(pConv->IsFirstInput())
  {
    _count=0;
    _hasInputPngFile=true;
  }
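  const unsigned char pngheader[] = {137,80,78,71,13,10,26,10,0};
  char readbytes[9];
  ifs.read(readbytes, 8);

  //Compare as unsigned bytes so that the result does not depend on whether
  //char is signed (the first header byte, 137, is above 127)
  if(!ifs || !equal(pngheader, pngheader+8, readbytes,
        [](unsigned char a, char b){ return a == static_cast<unsigned char>(b); }))
  {
    obErrorLog.ThrowError("PNG Format","Not a PNG file", obError);
    return false;
  }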

  //Loop through all the chunks
  while(ifs)
  {
    unsigned int len = Read32(ifs);
    ifs.read(readbytes,4);
    string chunkid(readbytes, readbytes+4);
    if(chunkid=="IEND")
    {
      bytesToIEND = ifs.tellg();
      bytesToIEND -= 8;
      break;
    }
    streampos pos = ifs.tellg();

    const char* altid = pConv->IsOption("y",OBConversion::INOPTIONS);
    if(chunkid=="tEXt" || chunkid=="zTXt" || (altid && chunkid==altid))
    {
      string keyword;
      getline(ifs, keyword, '\0');
      unsigned int datalength = len - keyword.size()-1;

      //remove "file" from end of keyword
      transform(keyword.begin(),keyword.end(),keyword.begin(),::tolower);
      string::size_type pos = keyword.find("file");
      if(pos!=string::npos)
        keyword.erase(pos);

      OBFormat* pFormat = OBConversion::FindFormat(keyword.c_str());
      if(pFormat)
      {
        //We have found embedded text that we need to extract
        stringstream ss;
        if(chunkid[0]!='z')
        {
          //Copy it to a stringstream
          istreambuf_iterator<char> initer(ifs);
          ostreambuf_iterator<char> outiter(ss);
          for (unsigned int i = 0; i < datalength; ++i)
            *outiter++ = *initer++;
        }

        else
        {
          //Needs to be uncompressed first
          Bytef* pCompTxt = new Bytef[datalength];
          ifs.read((char*)pCompTxt, datalength);
          --datalength; //for compression method byte
          //uncompress() takes the capacity of the output buffer in uncompLen
          //and replaces it with the actual uncompressed length
          uLongf uncompLen = datalength*6;//guess uncompressed length. NASTY!
          Bytef* pUncTxt = new Bytef[uncompLen+1];//+1 for the terminating null
          if(*pCompTxt!=0 /*compression method*/
            || uncompress(pUncTxt, &uncompLen, pCompTxt+1, datalength)!=Z_OK)
          {
            obErrorLog.ThrowError("PNG Format","Errors in decompression", obError);
            delete[] pUncTxt;
            delete[] pCompTxt;
            return false;
          }
          pUncTxt[uncompLen] = '\0';
          ss.str((char*)pUncTxt);
          delete[] pUncTxt;
          delete[] pCompTxt;
        }

        //Use a new OBConversion object to convert embedded text
        OBConversion conv2(&ss, pConv->GetOutStream());
        conv2.CopyOptions(pConv);
        conv2.SetInAndOutFormats(pFormat, pConv->GetOutFormat());
        _count += conv2.Convert();

        ifs.ignore(4);//CRC
        continue; //already at the end of the chunk
      }
    }
    //Move to end of chunk
    ifs.seekg(pos);
    ifs.ignore(len+4); //data + CRC
  }


  //if we will be writing a png file, read and save the whole input file.
  CopyOfInput.clear();
  if(pConv->GetOutFormat()==this)
  {
    ifs.seekg(0);
    copy(istreambuf_iterator<char>(ifs), istreambuf_iterator<char>(),back_inserter(CopyOfInput));
  }

  if(pConv->IsLastFile() && _count>0)
  {
    pConv->ReportNumberConverted(_count); //report the number of chemical objects
    pConv->SetOutFormat(this); //so that number of files is reported as "PNG_files"
  }

  return true;
}

/////////////////////////////////////////////////////////////////
bool PNGFormat::WriteMolecule(OBBase* pOb, OBConversion* pConv)
{
  // Embeds molecules into a png file in CopyOfInput
  ostream& ofs = *pConv->GetOutStream();

  if(!CopyOfInput.empty() && bytesToIEND>0)
  {
    //copy the generated or saved png file, except the IEND chunk, to the output
    ostreambuf_iterator<char> outiter(pConv->GetOutStream()->rdbuf());
    //In Windows the output stream needs to be in binary mode to avoid extra CRs here
    copy(CopyOfInput.begin(), CopyOfInput.begin()+bytesToIEND, outiter);
    origBytesToIEND = bytesToIEND;
    bytesToIEND=0;//to ensure not copied again
  }

  //Convert pOb and write it to a tEXt chunk
  const char* otxt = pConv->IsOption("O", OBConversion::OUTOPTIONS);
  OBConversion conv2;
  conv2.CopyOptions(pConv); //So that can use commandline options in this conversion
  string formatID;
  if(otxt && *otxt)
  {
    formatID = otxt;
    // Format name can have "file" at the end;
    // e.g. "molfile" is written in PNG chunk, but the format is "mol"
    string::size_type pos = formatID.find("file");
    if(pos!=string::npos)
      formatID.erase(pos);
  }
  else //if no param on -xO option, format is input format
    formatID = pConv->GetInFormat()->GetID();
  if(!conv2.SetOutFormat(OBConversion::FindFormat(formatID)))
  {
    obErrorLog.ThrowError("PNG Format","Format not found", obError);
    return false;

  }
  //Write new chunk
  stringstream ss;
  ss.str("");
  const char* pid = pConv->IsOption("y");
  if(pid && strlen(pid)==4)
    ss << pid;
  else
    ss  << "tEXt";
  ss  << formatID << '\0';
  bool ret = conv2.Write(pOb, &ss);
  if(ret)
  {
    unsigned long len = ss.str().size() - 4; //don't count length of tEXt
    Write32(len, ofs);
    ofs << ss.str();

    //ss has type, keyword and data
    uLong crc = crc32(0L, Z_NULL, 0);
    crc       = crc32(crc, (unsigned char*)ss.str().c_str(), ss.str().size());
    Write32(crc, ofs);

  }
  else
    obErrorLog.ThrowError("PNG Format","Failed when converting the molecule", obError);

  if(pConv->IsLast())
  {
    //Write the IEND chunk
    ostreambuf_iterator<char> outiter(pConv->GetOutStream()->rdbuf());
    copy(CopyOfInput.begin()+origBytesToIEND, CopyOfInput.end(), outiter);
    CopyOfInput.clear();

    // If there is an input PNG file, decrement output index to not count it
    if(_hasInputPngFile)
      pConv->SetOutputIndex(pConv->GetOutputIndex()-1);
    // and report as output objects, not "PNG_files"
    pConv->SetOutFormat(formatID.c_str());
  }

  return ret;
}

/*
Reading
PNGFormat extracts chemical information that is embedded in PNG files.
The data can be in chunks of type tEXt or zTXt or, if specified, in chunks of
the type given with the -ay option. If the first letter of the type is 'z', the
data is decompressed.
The keyword in the chunk should be an OpenBabel Format ID, optionally with "file"
added, e.g. cml, InChI, molfile.
There can be multiple molecules in each chunk, multiple chunks with
chemical info and multiple png files can be read together.
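
The chunk walk described above can be illustrated with a small standalone
sketch. It is not the PNGFormat code and does not use OpenBabel: the names
readBE32, listChunkTypes and exampleChunkTypes are hypothetical, and the sketch
only lists chunk types without checking CRCs or decoding any text.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read a 4-byte number stored most significant byte first.
// Sets ok to false if the stream runs out of data.
static std::uint32_t readBE32(std::istream& in, bool& ok)
{
  std::uint32_t val = 0;
  for (int i = 0; i < 4; ++i)
  {
    char ch;
    if (!in.get(ch)) { ok = false; return 0; }
    val = (val << 8) | static_cast<unsigned char>(ch);
  }
  return val;
}

// Return the chunk types that follow the 8-byte PNG signature, up to and
// including IEND. Returns an empty list if the signature does not match.
static std::vector<std::string> listChunkTypes(std::istream& in)
{
  static const unsigned char sig[8] = {137, 80, 78, 71, 13, 10, 26, 10};
  std::vector<std::string> types;
  char buf[8];
  if (!in.read(buf, 8))
    return types;
  for (int i = 0; i < 8; ++i)
    if (static_cast<unsigned char>(buf[i]) != sig[i])
      return types;

  for (;;)
  {
    bool ok = true;
    std::uint32_t len = readBE32(in, ok);
    char type[4];
    if (!ok || !in.read(type, 4))
      break;
    types.push_back(std::string(type, 4));
    if (types.back() == "IEND")
      break;
    in.ignore(static_cast<std::streamsize>(len) + 4); // skip data + CRC
  }
  return types;
}

// Usage on an in-memory stream: signature, one tEXt chunk ("smi", null, "C")
// and IEND. The CRC bytes are zero because this walker never checks them.
static std::vector<std::string> exampleChunkTypes()
{
  std::string png("\x89PNG\r\n\x1a\n", 8);
  png += std::string("\0\0\0\5tEXtsmi\0C\0\0\0\0", 17);
  png += std::string("\0\0\0\0IEND\0\0\0\0", 12);
  assert(png.size() == 37);
  std::istringstream in(png);
  return listChunkTypes(in);
}
```

A real reader would go on to split each text chunk at the first null byte into
keyword and data, as ReadMolecule does.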

Writing
This embeds chemical information into an existing PNG file.
A PNG file should be the first input file, followed by one or more chemical
files in any format. Each can contain multiple molecules. Each molecule is output
in a separate chunk in the format specified by the -xO option. With no parameter
on -xO, the input format is used. The chunk ID is normally tEXt but can be
specified with the -xy option.
For example
  obabel OrigImg.png Firstmol.smi Secondmol.mol2 -O OutImg.png -xO "cml" -xy "chEm"
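
The layout of each chunk written this way (4-byte length, 4-byte type, keyword,
null separator, text, 4-byte CRC) can be sketched in isolation. This is an
illustration under stated assumptions, not the code used by WriteMolecule: the
names crc32Bytes, appendBE32 and makeTextChunk are hypothetical, and the bitwise
CRC-32 is a dependency-free stand-in for zlib's crc32().

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Bitwise CRC-32 with the reflected polynomial 0xEDB88320, the checksum used
// by PNG. Stand-in for zlib's crc32() so that the sketch needs no library.
static std::uint32_t crc32Bytes(const std::string& bytes)
{
  std::uint32_t crc = 0xFFFFFFFFu;
  for (unsigned char b : bytes)
  {
    crc ^= b;
    for (int k = 0; k < 8; ++k)
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
  }
  return crc ^ 0xFFFFFFFFu;
}

// Append val as 4 bytes, most significant byte first.
static void appendBE32(std::string& out, std::uint32_t val)
{
  for (int shift = 24; shift >= 0; shift -= 8)
    out.push_back(static_cast<char>((val >> shift) & 0xFFu));
}

// Build one complete chunk from its type, keyword and text.
static std::string makeTextChunk(const std::string& type,
                                 const std::string& keyword,
                                 const std::string& text)
{
  assert(type.size() == 4); // PNG chunk types are exactly four bytes
  std::string body = type + keyword;
  body.push_back('\0');     // separator between keyword and text
  body += text;

  std::string chunk;
  // The length field counts the data only, not the 4-byte type
  appendBE32(chunk, static_cast<std::uint32_t>(body.size() - 4));
  chunk += body;
  // The CRC covers the type and the data, but not the length field
  appendBE32(chunk, crc32Bytes(body));
  return chunk;
}
```

For instance, makeTextChunk("tEXt", "smi", "CCO") gives a 19-byte chunk whose
length field is 7. To embed it, such a chunk has to be placed before the IEND
chunk of the file, which is why PNGFormat records bytesToIEND when reading.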

It should be possible to embed into png files using the API.
The following is simplified and UNTESTED:

OBConversion conv;
conv.SetInAndOutFormats("png","png");
ifstream ifs("img.png", ios::binary);           //PNG is a binary format
ofstream ofs("img_with_chem.png", ios::binary);
OBMol mol;
conv.Read(&mol, &ifs); //Reads the embedded molecule
//...manipulate mol
//Note that the content of the PNG file is stored in PNGFormat, so do
//not input from another PNG file until this one is written.

//Set the format of the embedded molecule on output
conv.AddOption("O",OBConversion::OUTOPTIONS,"smi");
conv.Write(&mol, &ofs);

*/
} //namespace OpenBabel