Developer Documentation

wxPython 2.8 vs 3.0

SOFA may have to support both wxPython 2.8 and 3.0 depending on platform and OS version. The current code base works on both. If 2.8 has to be supported on some platforms we can manage although it would be best to be able to target just 3.0.

Windows - 3.0 works fine and is baked into binary

OSX - 2.8 works fine but can change to 3.0 easily enough (homebrew installs 3 rather than 2.8 so, in some ways, easier). If changing to 3.0 need to rebake wxPython etc into the binary using pyinstaller.

Ubuntu - 2.8 works fine on all versions before Xenial, which needs 3.0. May be best to specify 3.0 in the control file for the debian package. As long as that also runs OK on Trusty we are done and can make just the one version of the binary. But it all depends on whether the versions of wxPython 3.0 available on Trusty have the latest bug fix (https://bugs.launchpad.net/ubuntu/+source/wxpython3.0/+bug/1388847) backported. The fix enables wxHTML2 to work without having to set the LD_PRELOAD environment variable, e.g. LD_PRELOAD=/usr/lib/i386-linux-gnu/libwx_gtk2u_webview-3.0.so.0.2.0 in my case (the path differs between i386 and x86_64). While waiting, the following works on some systems:

export LD_PRELOAD='/usr/lib/i386-linux-gnu/libwx_gtk2u_webview-3.0.so.0.2.0' && sofastats

Action - wait until the bug fix is released. Current version of wxPython3.0 is 3.0.2.0+dfsg-1build1 - we need wxpython3.0 - 3.0.2.0+dfsg-2. Then try to install using a new SOFA debian package I create requiring 3.0. Software installation problem solved when 16.04 upgraded May 22nd (probably from changes introduced earlier).

Note - don't try to run the 2.8 version with the LD_PRELOAD setting in place (the workaround for the wxHTML2 bug that has just been fixed). It breaks RadioBoxes.
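
A minimal sketch of applying the LD_PRELOAD workaround from a Python launcher instead of the shell. It assumes the library path quoted above (which will differ on other machines) and that the sofastats command is on the PATH. The helper names preload_env and launch_sofa are hypothetical. LD_PRELOAD is only set for wxPython 3 or later because of the RadioBox breakage under 2.8.

```python
import os
import subprocess

# Path taken from the notes above; an assumption for any other machine.
WEBVIEW_LIB = "/usr/lib/i386-linux-gnu/libwx_gtk2u_webview-3.0.so.0.2.0"


def preload_env(base_env, wx_major, lib_path=WEBVIEW_LIB):
    """Return a copy of base_env, adding LD_PRELOAD only for wxPython 3+."""
    env = dict(base_env)
    # Skip the workaround under 2.8 and when the library is not installed.
    if wx_major >= 3 and os.path.exists(lib_path):
        env["LD_PRELOAD"] = lib_path
    return env


def launch_sofa(wx_major):
    """Start SOFA with the workaround applied where appropriate."""
    return subprocess.call(["sofastats"], env=preload_env(os.environ, wx_major))
```

The major version could be taken from wx.VERSION[0] on the machine in question.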

Weird Locale Issues

If a report is taken from one machine to another, it might stop working if the css files referenced in the report are no longer in the same location e.g. the locale changed and now they are in u"/home/jonas/Dokumente/sofastats/css/default.css" rather than u"/home/jonas/Documents/sofastats/css/default.css". Or perhaps the files have simply been moved.
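
A minimal sketch for diagnosing this, assuming the css paths appear in the report as quoted strings ending in .css (the exact way SOFA references them is not shown here). The function name missing_css is hypothetical.

```python
import os
import re

# Assumption: css paths appear in the report as quoted strings ending in .css
CSS_PATH_RE = re.compile(r'["\']([^"\']+\.css)["\']')


def missing_css(report_html, exists=os.path.exists):
    """Return css paths referenced in the report that are not on this machine."""
    paths = sorted(set(CSS_PATH_RE.findall(report_html)))
    return [path for path in paths if not exists(path)]
```

Running it over the text of a report that has stopped displaying properly lists the css paths that need to be restored or edited.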

Setting up Own Python Environment

When I developed the SOFA Windows version I manually installed Python 2.7 and a list of libraries I needed e.g. numpy, wxPython (GUI) etc. But I knew that my users would not necessarily have Python installed, or any of those libraries. Or worse, they might have conflicting versions. So I made a special executable called sofastats.exe. It has Python 2.7 baked in as well as all the modules imported into launch_win.py e.g.

import cgi
import codecs
etc

When users run sofastats.exe it imports import2run.py which, in turn, imports start.py. start.py is the script which actually opens SOFA.

But you don't actually need sofastats.exe to run SOFA (or your own starter script) if you manually install all the libraries required. Here is a list I have of what I originally used:

NB All Python 2.7

Python 2.7 http://www.python.org/download/releases/2.7.3/
PyLab - no installation required - is there if matplotlib and numpy installed
Matplotlib http://sourceforge.net/projects/matplotlib/files/matplotlib/ NB take mpl-data from latest version and put in 3 SOFA Dev
numpy http://sourceforge.net/projects/numpy/files/NumPy/
wxPython http://www.wxpython.org/download.php
MySQLdb http://www.codegood.com/archives/129
pySQLite http://code.google.com/p/pysqlite/downloads/list
wxmpl http://agni.phys.iit.edu/~kmcivor/wxmpl/ - probably need setuptools (http://pypi.python.org/pypi/setuptools#files). Unzip, then in the console go to the wxmpl folder containing setup.py and run C:\Python27\python.exe setup.py install (see http://agni.phys.iit.edu/~kmcivor/wxmpl/documentation/README.txt)
pywin32 http://sourceforge.net/projects/pywin32/files/pywin32/Build%20217/
comtypes http://sourceforge.net/projects/comtypes/files/comtypes/
pgdb No separate installation - part of psycopg below
psycopg http://www.stickpeople.com/projects/python/win-psycopg/ (ignore PyGreSQL - no 2.7 version and fading fast)

You can ignore the rest unless you are trying to run the export output plug-in:

wkhtmltopdf http://code.google.com/p/wkhtmltopdf/downloads/list - HTML → PDF
Pdftk http://www.pdflabs.com/docs/install-pdftk/ - Malformed → Standard PDF
pyPdf http://pybrary.net/pyPdf/ - Count PDF pages ready for making PNGs
GhostScript http://downloads.ghostscript.com/public/binaries/ - PDF → PNG (use GS directly because the version of ImageMagick inside the exe can't find GS inside the exe, and ImageMagick can't do the conversion by itself: "ImageMagick cannot handle PostScript and PDF files itself and by its own. For this it uses a third party software called Ghostscript as a 'delegate'." http://stackoverflow.com/questions/6591011/imagemagick-errors-convert-pdf-to-images)
ImageMagick http://www.imagemagick.org/script/binary-releases.php/ - PNG → trimmed PNG of correct dpi
PythonMagick http://www.lfd.uci.edu/~gohlke/pythonlibs/ - PNG → trimmed PNG of correct dpi (Python wrapper)

And we can talk further if you want to work with MS Access and MS SQL Server re: DAO.

Re: the dao file, it is needed because we can't dynamically get it from the frozen executable (or some similar issue), but it works if we generate it, save it as a module, and import it. Start by installing pywin32. Then go to site-packages\win32com\Pythonwin.exe and run the COM Makepy Utility for DAO 3.6. Then go to the win32com\gen_py folder, copy the content of one of the py files, and save it as the module to import e.g. dao36_from_gen_py_after_makepy.py.

So you are welcome to install all the libraries and then run start.py. I can recommend eclipse with the PyDev plug-in as a good debugging environment.
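
Before running start.py it may help to check which libraries are still missing. A minimal sketch follows; the import names in REQUIRED are my assumptions about how the packages listed above are imported, and missing_modules is a hypothetical helper.

```python
import importlib

# Import names are assumptions based on the packages listed above.
REQUIRED = ["numpy", "matplotlib", "wx", "MySQLdb", "pysqlite2", "wxmpl",
            "win32com", "comtypes", "psycopg2"]


def missing_modules(names=REQUIRED):
    """Return the names that cannot be imported in this Python environment."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

An empty list from missing_modules() suggests the environment is ready, although it says nothing about whether the installed versions are compatible.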

Running Your Own Script

On Windows, SOFA comes with its own binary executable which has Python, numpy, matplotlib etc baked in. That way it can't clash with anything else installed on the user's computer and it is guaranteed to work. The bridging script between the fixed binary and the flexible scripts is called import2run.py. Currently it imports and then runs the start.py module. You could get it to do something else such as run your own script (keep it in the same folder as all the other SOFA scripts such as output.py, start.py etc).
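
A minimal sketch of what a modified bridging script might look like. It assumes that importing a module is enough to run it, which is not confirmed here; the real import2run.py may do more. The function name run is hypothetical, as is the module name my_script in the comment.

```python
import importlib


def run(module_name="start"):
    """Import the named module from the SOFA scripts folder.

    Assumption: the module does its work at import time.
    """
    return importlib.import_module(module_name)

# To run your own script instead of SOFA, something like:
#     run("my_script")   # my_script.py sits next to start.py, output.py etc
```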

Scripting

Overall

When you run a report:

config settings are read from the GUI e.g. variable 1 is age, variable 2 is height, chart type is scatterplot etc.
a script called script.py is created in the local _internal folder based on these config settings
the script is run, which creates an HTML output file called sofa_use_only_report.htm

The HTML in the file is used by SOFA in as many as 3 ways:

GUI display (always)
When adding to an existing html report (optional)
When turning most recent output displayed in GUI into a PDF (optional)

The following two settings are used whenever there are linked images referred to in the HTML e.g. png charts:

add_to_report = True
report_name = u"/home/g/Documents/sofastats/reports/default_report.htm"

They are redundant in some scripts but I am happy to leave them there in those instances in case I ever want to add linked pngs at some future point. Anyway, here is how they are used when there are images to be made and linked:

From charting_pylab.py:

  """
  ...
  If adding to report, save image to a subfolder in reports named after the
      report. Return a relative image source. Make subfolder if not present.
      Use image name guaranteed not to collide. Count items in subfolder and
      use index as part of name.
  If not adding to report, save image to internal folder, and return absolute
      image source.  Remember to alternate sets of names so always the
      freshest image showing in html (without having to reload etc).
  """

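The logic described in that docstring can be sketched roughly as follows. This is not the code from charting_pylab.py: the function name image_destination, the "_images" subfolder suffix, and the file names are all illustrative assumptions, and the alternating of names in the internal folder is not modelled.

```python
import os


def image_destination(add_to_report, report_name, internal_dir):
    """Return (save_path, html_src) for a new chart image.

    Hypothetical helper; the subfolder and file naming are illustrative only.
    """
    if add_to_report:
        report_dir, report_file = os.path.split(report_name)
        subfolder = os.path.splitext(report_file)[0] + "_images"
        img_dir = os.path.join(report_dir, subfolder)
        if not os.path.isdir(img_dir):
            os.makedirs(img_dir)  # make subfolder if not present
        # Use the count of existing items as the index in the name.
        img_name = "%03d.png" % len(os.listdir(img_dir))
        # Relative src keeps working if report and subfolder move together.
        return os.path.join(img_dir, img_name), subfolder + "/" + img_name
    # Not adding to a report: save in the internal folder, absolute src.
    save_path = os.path.abspath(os.path.join(internal_dir, "chart.png"))
    return save_path, save_path
```
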
So what should you do? Treat script.py as the starting point for making your own script. Set add_to_report to True and report_name to whatever you want the output file to be named. And edit the following as required so it uses the report_name instead of sofa_use_only_report.htm:

fil = codecs.open(u"/home/g/Documents/sofastats/reports/sofa_use_only_report.htm", "w", "utf-8")
css_fils=[u"/home/g/Documents/sofastats/css/default.css"]
fil.write(output.get_html_hdr("Report(s)", css_fils, has_dojo=False, new_js_n_charts=None, default_if_prob=True))
...
fil.write(anova_output) # or whatever the output is
...
fil.write(output.get_html_ftr())
fil.close()

Basically you are free to use Python to grab the text html from the script and put the content into a file. E.g.

with codecs.open(report_name, "w", encoding="utf-8") as f:  # "w" because we are writing
    f.write(output_made_by_script) # get that output however you like

For an example of how to get output:

anova_output = stats_output.anova_output(samples, F, p, dics, sswn, dfwn,
          mean_squ_wn, ssbn, dfbn, mean_squ_bn, label_a, label_b, label_avg,
          add_to_report, report_name,
          css_fil=u"/home/g/Documents/sofastats/css/default.css",
          css_idx=0, dp=dp, level=mg.OUTPUT_RESULTS_ONLY,
          page_break_after=False)

Appending to a Report

Appending to an existing report in SOFA is complicated by the fact that the report has to be a complete, functioning piece of HTML at every point, as does the snippet you're trying to add. Which is why headers are stripped off etc. And if the report has multiple styles in it e.g. default/grey spirals/lucid spirals/pebbles etc the header has to cover all possibilities and keep the styles distinct. The existing code functions quite well but it could do with a refactoring one day to make it more coherent.
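
To illustrate the basic idea only, here is a minimal sketch that strips everything outside the body of a snippet and inserts the rest just before the closing body tag of the report. It is not SOFA's actual code, the function names are hypothetical, and it ignores the harder problem of merging headers for multiple styles.

```python
import re

BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)


def body_only(snippet_html):
    """Return the content between the body tags, or the input if there are none."""
    match = BODY_RE.search(snippet_html)
    return match.group(1) if match else snippet_html


def append_to_report(report_html, snippet_html):
    """Insert the snippet content so the report stays one complete document."""
    content = body_only(snippet_html)
    idx = report_html.lower().rfind("</body>")
    if idx == -1:
        return report_html + content  # no closing tag found; just append
    return report_html[:idx] + content + report_html[idx:]
```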

Exporting Images/PDFs

The two main functions you may need in export_output are:

export2imgs() and possibly export2pdf() depending on what you are trying to generate.

If you are interested, here is some background information on what is going on behind the scenes:

There are two types of charts in SOFA: actual PNGs generated by matplotlib, and dynamic Javascript/SVG images made using the DOJO library that can only be viewed in a web browser. Exporting the PNGs doesn't involve any work as the PNGs are already made. But exporting the dynamic output as images is quite a trick.

Only a fully-featured web rendering engine can correctly handle Javascript & SVG to turn the DOJO charts into something you can see. The wkhtmltopdf library is used to turn the HTML into a PDF. There is a bit of work getting the correct HTML out so that it only includes the image you want, and you will see from the internal documentation of the export_output script that there were lots and lots of small technical issues to resolve. E.g.

  PDFs made by wkhtmltopdf might be systematically malformed from a strict
      point of view (ghostscript and Adobe might complain) so running it
      through pdftk will fix it.

I am leaving nearly all of that detail out of this description to simplify it.
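
As a rough illustration of the HTML to PDF step and the pdftk repair, here is a minimal sketch using the basic command-line forms of both tools. The function names and file paths are hypothetical, and export_output itself handles many more details than this.

```python
import subprocess


def html_to_pdf_cmd(html_path, pdf_path):
    """Command to render an HTML file to PDF with wkhtmltopdf."""
    return ["wkhtmltopdf", html_path, pdf_path]


def repair_pdf_cmd(raw_pdf_path, fixed_pdf_path):
    """Command to pass a PDF through pdftk, which rewrites it cleanly."""
    return ["pdftk", raw_pdf_path, "output", fixed_pdf_path]


def html_to_clean_pdf(html_path, raw_pdf_path, fixed_pdf_path):
    """Run both steps; raises CalledProcessError if either tool fails."""
    subprocess.check_call(html_to_pdf_cmd(html_path, raw_pdf_path))
    subprocess.check_call(repair_pdf_cmd(raw_pdf_path, fixed_pdf_path))
```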

Anyway, we make a large PDF with lots of wasted whitespace around it to make sure we don't leave anything out. Then we want to auto-crop it. This is a very expensive operation when the resolution is high. So I did it in three steps.

- Make a temporary low-resolution image and autocrop that. Then get its dimensions.
- Then make a high-res image and crop it to the dimensions gathered plus a small extra margin to account for “rounding errors” with the low resolution dimension results.
- Then autocrop the pre-cropped result. Even though that is an expensive operation it only has to handle the last little bit of auto-cropping and finding the edges. So it is fast enough.
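
The three steps can be sketched on a toy image, here a list of rows where 0 is white and 1 is ink. This is only an illustration of the idea: all function names are hypothetical, the low-resolution image is simulated by downsampling rather than rendered separately, and the real code works on image files rather than lists.

```python
def bbox(img):
    """Bounding box (left, top, right, bottom) of inked pixels; right/bottom exclusive.

    Assumes the image contains at least one inked pixel.
    """
    rows = [y for y, row in enumerate(img) if any(row)]
    cols = [x for x in range(len(img[0])) if any(row[x] for row in img)]
    return cols[0], rows[0], cols[-1] + 1, rows[-1] + 1


def crop(img, box):
    left, top, right, bottom = box
    return [row[left:right] for row in img[top:bottom]]


def downsample(img, factor):
    """A low-res pixel is inked if any high-res pixel in its block is inked."""
    h, w = len(img), len(img[0])
    return [[int(any(img[y][x]
                     for y in range(Y, min(Y + factor, h))
                     for x in range(X, min(X + factor, w))))
             for X in range(0, w, factor)]
            for Y in range(0, h, factor)]


def three_step_crop(high_res, factor, margin):
    # Step 1: autocrop a cheap low-res version to get approximate dimensions.
    left, top, right, bottom = bbox(downsample(high_res, factor))
    # Step 2: crop the high-res image to those dimensions plus a safety margin.
    h, w = len(high_res), len(high_res[0])
    rough = (max(left * factor - margin, 0), max(top * factor - margin, 0),
             min(right * factor + margin, w), min(bottom * factor + margin, h))
    pre_cropped = crop(high_res, rough)
    # Step 3: the expensive autocrop now only has a small area to search.
    return crop(pre_cropped, bbox(pre_cropped))
```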

OK - now we have a PDF of the desired resolution and it is cropped to the right size.

Now we have to convert the PDF to a PNG. The pythonmagick library is used for this step.

VDT Files

A vdt file is simply the following:

A text file with a vdts extension containing four Python variables:

var_labels={"my_var_name": "my_var_label", ...}

var_notes={"my_var_name": "text about my var", ...}

var_types={"my_var_name": "my_var_type_str"}
# available variable types are defined in my_globals.py (around line 378 at present) as per:
# VAR_TYPE_CAT = _("Nominal (names only)") i.e. "Nominal (names only)"
# VAR_TYPE_ORD = _("Ordinal (rank only)") i.e. "Ordinal (rank only)"
# VAR_TYPE_QUANT = _("Quantity (is an amount)") i.e. "Quantity (is an amount)"

val_dics={"my_var_name": my_var_val_dic, ...}
# where my_var_val_dic is a dictionary mapping values to value labels e.g.
 u"agegroup": {1.0: u"< 20",
               2.0: u"20-29",
               3.0: u"30-39",
               4.0: u"40-64",
               5.0: u"65+"},

Writing to a file in Python is very easy:

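import codecs
# f.write() needs text, so each dictionary is written as a Python assignment.
with codecs.open('my_file_path_here', 'w', encoding='utf-8') as f:
    f.write(u"var_labels=%r\n" % (var_labels_dic,))
    f.write(u"var_notes=%r\n" % (var_notes_dic,))
    f.write(u"var_types=%r\n" % (var_types_dic,))
    f.write(u"val_dics=%r\n" % (val_dics_dic,))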
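
One way to check that a generated file is well-formed is to build the text, execute it as Python, and confirm the four variables come back unchanged. The sketch below assumes the file is plain Python assignments as described above; how SOFA itself reads the file is not covered here, and vdt_text and parse_vdt are hypothetical helper names. Only execute files you trust.

```python
import pprint

VDT_NAMES = ("var_labels", "var_notes", "var_types", "val_dics")


def vdt_text(var_labels, var_notes, var_types, val_dics):
    """Return the four dictionaries as Python assignment statements."""
    values = (var_labels, var_notes, var_types, val_dics)
    lines = ["%s=%s" % (name, pprint.pformat(value))
             for name, value in zip(VDT_NAMES, values)]
    return "\n\n".join(lines) + "\n"


def parse_vdt(text):
    """Execute vdt text and return the four variables as a dictionary."""
    namespace = {}
    exec(text, namespace)  # assumption: the text is trusted Python source
    return dict((name, namespace[name]) for name in VDT_NAMES)
```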
