  Import Export Module EpiInfo SPSS
Added by mariusgf2, last edited by mariusgf2 on May 16, 2007

INF 5750 - DHIS 2 - EXPORT GROUP REPORT

Export to EpiInfo and SPSS Group

autumn 2006

geoffrer — Geoffrey Rekier
oyvinrot — Øyvind Rotnes
mariusgf — Marius G. Fredriksen

Group assignment

It is desirable to get data from DHIS 2, via a file export, into various systems for analysis. The central theme of these projects is to find out which data formats these applications can import, whether and how the DHIS 2 data can be formatted accordingly, and to write code that takes care of the actual export. This project requires that the students understand the DHIS 2 import/export solution and the DHIS 2 web solution. It is expected that the students contribute to an overall design of the import/export system and coordinate common aspects of their solutions.
Possible deliveries:

  • A report on the import data formats of the individual applications
  • A report on the DHIS 2 data model and how this can be related to the format above.
  • A working implementation of an exporter for the applications
  • A web interface for doing the export

Given information

SPSS and EpiInfo are tools for data analysis. EpiInfo is tailored to epidemiologists, whereas SPSS is a general statistical package. Many public health analysts use these programs to get an overview of the health situation in their area.

Description of the project work

According to the assignment, all members of the export group were to coordinate common aspects of the different export solutions (IDSP, SPSS and EpiInfo). We therefore held our first group meetings in plenum, with everyone from the export group attending, in the cafeteria. At the first meeting we divided the export team into three groups, each responsible for investigating one of the three applications in question (IDSP, SPSS and EpiInfo) and its preferred export formats. The groups were originally divided as follows:

  • IDSP: stianast, okhustad
  • SPSS: oyvinrot, mortkard, ridaa
  • EpiInfo: geoffrer, mariusgf

After some persuasion it was decided that stianast (Stian Aleksander Strandli) was to be the group leader for the export team, and that we would gradually coordinate our efforts as we dug into the material, through the Monday meetings, emails and work in the lab on Tuesdays.

We started by checking out the DHIS 2 source code from the source code repository and got the application up and running, so that we could get acquainted with the existing code and the structure of the application.
Since the DHIS 2 source code basically lacks any form of documentation besides the code itself, we spent the first weeks setting up, trying out and experimenting with the application, specifically the export-related functionality and code.

When we felt we had a basic understanding of the overall code structure, we started exploring EpiInfo and a trial version of SPSS.

On Knut's recommendation we also sent emails to Mr. Yeshambel Wudu and to some SPSS users, to get feedback on preferred export solutions from actual users of the applications.

SPSS supports a wide range of different formats. EpiInfo also supports many formats, but most of them are old ones such as Excel 97. From these findings we concluded that the most sensible format would be CSV. We contacted EpiInfo support by email and they confirmed the idea. We also contacted SPSS, who gave us access to their knowledge base. Exploring the knowledge base, it seemed clear that most other developers with similar tasks in mind had chosen a CSV format.

At this point we received a cryptic reply from Mr. Yeshambel Wudu that explained something about exporting from and importing to EpiInfo. It did not really help us, so we replied by asking him to detail the specifics of the task at hand.

We then went back to the DHIS 2 source code only to find that a CSV export already existed. It therefore made sense to try and adapt the existing CSV solution to be suitable for SPSS and EpiInfo.

There was a problem importing the exported CSV file into EpiInfo: the comma could not be used as a delimiter, because the comma was also used as the decimal separator in numbers (e.g. 1,2).
In SPSS you can choose the delimiter before importing, so we decided to change the delimiter to a tab character, which was acceptable to both applications. Since the semicolon that terminated each line was read by EpiInfo as part of the last value, we needed to remove it. The last thing we needed to adapt was the header line of the CSV file, which had to describe what the values on the following lines represent.
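The three adaptations described above (tab delimiter, no trailing semicolon, a descriptive header line) can be sketched as follows. This is a minimal illustration and not the DHIS 2 CSV exporter itself: the class TabDelimitedSketch, its methods and the sample values are made up for the example.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of DHIS 2: formats rows as tab-delimited text. */
class TabDelimitedSketch {

    /** Joins the values with tabs; no trailing semicolon or delimiter is added. */
    static String formatLine(List<String> values) {
        return String.join("\t", values);
    }

    /** Builds the whole export: one header line, then one line per data row. */
    static String formatExport(List<String> header, List<List<String>> rows) {
        StringBuilder out = new StringBuilder();
        out.append(formatLine(header)).append('\n');
        for (List<String> row : rows) {
            out.append(formatLine(row)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample data invented for the illustration.
        List<String> header = Arrays.asList("organisationUnitName", "periodName", "value");
        List<List<String>> rows = Arrays.asList(
            Arrays.asList("Example Clinic", "2006-10", "1,2"));
        System.out.print(formatExport(header, rows));
    }
}
```

Because the tab is the only delimiter, a value such as 1,2 stays in one column even when the comma is the decimal separator.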

At this stage we discovered that Hans (hannsto) was implementing a general CSV exporter, which is generally a better idea. After a discussion with him we decided to try to adapt to his model by creating an OGNL expression designed to produce a compatible format.

As part of assignment 4 we set up our milestones for the remainder of the project, aimed at the CSV export described above. Shortly afterwards, however, we received replies from Mr. Wudu and others specifying dbf (dBASE) as the wanted export format from DHIS 2 to EpiInfo and SPSS.

Our enquiry for specifics related to preferred export formats was basically as follows:

  1. Which file format is preferred for export from DHIS 2 - Dbf or CSV?
  2. Would you like to choose some specific data or export all the data?
  3. Regarding the hierarchical organisation tree - is the whole tree required, or is it sufficient to include the organisation that has the data (the last child in the tree)?

The replies we received were as follows:

  1. Dbase(.dbf)
  2. All the data
  3. The whole tree

These replies made us change our strategy and find out the specifics of the dBASE/.dbf file format (hereafter referred to as dbf). This also meant that we had to write our own exporter, since the common denominators between dbf and the other formats were too few. Both our applications (EpiInfo and SPSS) support the dbf format, and we therefore decided to coordinate these efforts.

Specific goals

Upon receiving these last replies our milestones were:

Milestone / goal (date):

  • Small changes in the GUI for it to use the new Java class (31.10)
  • Create a first "prototype" of a dbf exporter (07.11)
  • Finish the exporter (21.11)
  • Finish the GUI (28.11)

The group members and division of labour:

Previously we had worked mainly as a unit in the lab on Tuesdays, following up the decisions made by those present at the Monday sessions. We now decided to divide the remaining tasks.
In order to get things done in time, and taking our individual capabilities into account, we divided the responsibilities for finishing a dbf file exporter and reaching our new goals in the following manner:

  • Service implementation of dbf exporter — geoffrer
  • Implementation of dbf exporter and testing — oyvinrot
  • GUI / written documentation — mariusgf

This meant coordinating our efforts in a manner more true to FOSS development - interacting through the code repository, with and without luck at times. We continued our Monday meetings for weekly updates, and used the lab on Tuesdays for coordinating progress and exchanging experiences. Communicating by email and instant messaging (GAIM) to keep each other up to date at odd hours, share our findings and avoid conflicts has been a great benefit.

Our task and how it fits within HISP / DHIS

Our task of exporting DHIS data to a specific file format, dbf in this case, fits into the District Health Information System v. 2 in the sense that it implements support for yet another export format. This is part of expanding the application's features so that the data can be analysed in other, independent applications that store and handle data in different ways.

This will benefit the users of DHIS who wish to analyse the data provided by DHIS in a program of their own choice, EpiInfo or SPSS in this case. We hope that in the long run this will benefit the people who are the source of the data used by the HISP project.

What was achieved

Although our assignment evolved throughout the development process, we have in the end achieved all the goals that we set out to meet in our initial milestone definition for assignment 4, and as defined by the assignment text.

We have investigated the different import options of the two applications (EpiInfo and SPSS) and decided to support the format most in demand among the users of these programs, namely the dbf file format.

We have analyzed the DHIS 2 data model and how it relates to implementing an export option for this format.

We have developed a working implementation of an exporter and a related web interface for doing the export.

Detailed documentation

Details on the dBase (.dbf) file format

dBase was the first widely used database management system (DBMS) for microcomputers. It was available for a range of operating systems, and under DOS it was one of the best-selling software titles for a number of years. dBase's underlying file format, the .dbf file, is widely used in many applications that need a simple format for storing structured data.

A dBase file consists of a header record and data records. The header record defines the structure of the file and contains any other information related to the file. The header record starts at file position zero. Data records follow the header, in consecutive bytes, and contain the actual text of the fields.

The first data record in the dBase file starts at the position indicated in bytes 8 to 9 of the header record. The header also contains the date the file was last updated, the number of records in the file, and the data type and length (in bytes) of each field. The length of a record, in bytes, is the sum of the defined lengths of all fields plus one byte for the deletion flag that starts each record. Integers in table files are stored with the least significant byte first. The complete dBase structure can be seen at http://www.whitetown.com/dbf-format/dbf.php.
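To make the byte layout concrete, the following sketch builds and reads back the fixed 32-byte part of a dBASE III style header using little-endian integers. It is an illustration under stated assumptions, not the code of our exporter: the class DbfHeaderSketch and its method names are hypothetical, and the field descriptors that follow the fixed part are left out.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustrative only: the fixed 32-byte part of a dBASE III style header. */
class DbfHeaderSketch {

    /** Builds the fixed header part so the example has something to parse. */
    static byte[] buildFixedHeader(int year, int month, int day,
                                   int recordCount, int headerLength, int recordLength) {
        ByteBuffer buf = ByteBuffer.allocate(32).order(ByteOrder.LITTLE_ENDIAN);
        buf.put((byte) 0x03);               // byte 0: file type (dBASE III, no memo)
        buf.put((byte) (year - 1900));      // bytes 1-3: last update as YY MM DD
        buf.put((byte) month);
        buf.put((byte) day);
        buf.putInt(recordCount);            // bytes 4-7: number of records
        buf.putShort((short) headerLength); // bytes 8-9: position of first data record
        buf.putShort((short) recordLength); // bytes 10-11: length of one record
        return buf.array();                 // bytes 12-31 stay zero in this sketch
    }

    /** Reads the record count, least significant byte first. */
    static int recordCount(byte[] header) {
        return ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN).getInt(4);
    }

    /** Reads the position of the first data record (bytes 8 to 9). */
    static int firstRecordOffset(byte[] header) {
        return ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN).getShort(8) & 0xFFFF;
    }

    /** Reads the length of one data record (bytes 10 to 11). */
    static int recordLength(byte[] header) {
        return ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN).getShort(10) & 0xFFFF;
    }

    public static void main(String[] args) {
        byte[] header = buildFixedHeader(2006, 11, 21, 258, 97, 51);
        System.out.println("records: " + recordCount(header));
        System.out.println("first record at byte: " + firstRecordOffset(header));
        System.out.println("record length: " + recordLength(header));
    }
}
```

The numbers in main are arbitrary. In a real file the header length and record length follow from the field definitions.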

Amongst the greatest challenges related to creating a functional exporter for this format was getting the header fields correct. According to the dbf file format we had to define the different fields by type and size, to describe the data contained in the remainder of the exported document.

By doing some extensive "googling" and looking at similar solutions on the Internet, we figured out a way to implement this header structure according to the dbf requirements. We had to adapt the relevant code to suit our needs.

While taking this approach on the service layer, we reused the existing GUI to keep a consistent look-and-feel for the users of the application. We modified the GUI to suit our needs and hooked this up with our own service implementation of the exporter for EpiInfo and SPSS.

How the code works

User's perspective

From a user's perspective, the dbf file exporter is selected from the export menu, which appears in the bottom left (below CSV export) when data export is selected from the main menu at the top. From here the GUI should be intuitive to those already familiar with DHIS 2's existing CSV export. The user provides a name for the export file and selects the export button, and a file is exported to the destination of the user's choice (if the filename does not have a ".dbf" suffix, one is appended to ensure the correct output). The resulting dbf file can then be imported into EpiInfo, SPSS or other programs that read the dbf file format.
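The suffix handling mentioned above can be sketched in a few lines. The class DbfFileNameSketch and the method ensureDbfSuffix are hypothetical names used for illustration, and the case-insensitive comparison is an assumption rather than a description of what the action class actually does.

```java
import java.util.Locale;

/** Hypothetical helper: makes sure an export file name ends with ".dbf". */
class DbfFileNameSketch {

    static String ensureDbfSuffix(String fileName) {
        String trimmed = fileName.trim();
        // Assumption: "REPORT.DBF" is accepted as it is, so the check ignores case.
        if (trimmed.toLowerCase(Locale.ROOT).endsWith(".dbf")) {
            return trimmed;
        }
        return trimmed + ".dbf";
    }

    public static void main(String[] args) {
        System.out.println(ensureDbfSuffix("datavalues"));
        System.out.println(ensureDbfSuffix("indicators.dbf"));
    }
}
```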

Here is a short presentation of how to export a file with the data values and a file with the indicators and then import them in EpiInfo and SPSS:
http://heim.ifi.uio.no/~geoffrer/presentation.wmv

Programmer's perspective

Our exporter module consists of five classes: DbaseDataExporter, DbaseIndicatorExporter, DbaseFileWriter, DbaseFileHeader and DbfFileException.
Each of them is explained below in the context in which it is used in our module. Readers are also advised to familiarize themselves with the basics of the dBase (.dbf) file format.

DbaseFileHeader

While creating a dBase data file you will have to deal with two aspects:

  1. define the fields
  2. populate the data.

This class represents the header of the dBase file and is used as an object holding the file header. The header defines a descriptor for each field. The class has the attributes needed to hold the values that a dBase file header requires according to the dBase file format.

The variables that are to be constant and not changed by developers are initialized as static final with the required value. The class provides the method void addColumn( String inFieldName, char inFieldType, int inFieldLength, int inDecimalCount ) for defining each field.

A field consists of a field name, a field data type, a field length and a number of decimals. These attributes define the format of the column in the table. Each column's field name becomes part of the heading of the file.

The data types in the header can be one of the types in the table below:

XBase Type    XBase Symbol    Java Type
Character     C               java.lang.String
Numeric       N               java.lang.Double
Double        F               java.lang.Double
Logical       L               java.lang.Boolean
Date          D               java.util.Date

The dBase format also requires that the header contains the number of records in the file. The method void setNumrecords(int number) sets the number of records. The class also has methods for getting the information stored in the header.
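As a rough picture of what such a header object has to keep track of, here is a minimal stand-in. It borrows the addColumn signature quoted above, but everything else is an assumption: the class HeaderSketch, its getters and its validation rules are hypothetical and are not the project's DbaseFileHeader. The length calculations follow the general dbf layout (32 bytes of fixed header, 32 bytes per field descriptor, one terminator byte; one deletion-flag byte per record).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a dbf header object; not the project's DbaseFileHeader. */
class HeaderSketch {

    /** One field descriptor: name, xBase type symbol, length and decimal count. */
    static final class Column {
        final String name;
        final char type;
        final int length;
        final int decimals;

        Column(String name, char type, int length, int decimals) {
            this.name = name;
            this.type = type;
            this.length = length;
            this.decimals = decimals;
        }
    }

    private final List<Column> columns = new ArrayList<>();
    private int numRecords;

    /** Defines one field; only the five type symbols from the table are accepted. */
    void addColumn(String inFieldName, char inFieldType, int inFieldLength, int inDecimalCount) {
        if ("CNFLD".indexOf(inFieldType) < 0) {
            throw new IllegalArgumentException("Unsupported xBase type: " + inFieldType);
        }
        if (inFieldLength < 1 || inFieldLength > 254) {
            throw new IllegalArgumentException("Field length out of range: " + inFieldLength);
        }
        columns.add(new Column(inFieldName, inFieldType, inFieldLength, inDecimalCount));
    }

    void setNumRecords(int number) { numRecords = number; }

    int getNumRecords() { return numRecords; }

    int getNumFields() { return columns.size(); }

    /** 32 bytes fixed part + 32 bytes per field descriptor + 1 terminator byte. */
    int getHeaderLength() { return 32 + 32 * columns.size() + 1; }

    /** 1 byte deletion flag + the sum of all field lengths. */
    int getRecordLength() {
        int total = 1;
        for (Column column : columns) {
            total += column.length;
        }
        return total;
    }

    public static void main(String[] args) {
        HeaderSketch header = new HeaderSketch();
        header.addColumn("orgUnit", 'C', 40, 0);
        header.addColumn("value", 'N', 10, 2);
        header.setNumRecords(3);
        System.out.println(header.getHeaderLength() + " / " + header.getRecordLength());
    }
}
```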

DbaseFileWriter

A DbaseFileWriter is used for creating a dBase file. Create a DbaseFileWriter object by calling its constructor with the DbaseFileHeader (explained above) and a channel (WritableByteChannel) to the file we want to write to.

This class has the method void write( Object[] record ), which takes an array of objects and writes it to the file as one record. The values in the array must match the fields defined in the dBase file header, in number and in type. The number of records written must also be the same as the number of records set in the dBase file header.
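To show what matching the fields means at the byte level, the sketch below formats one record with a character field and a numeric field into fixed-width text, the way dbf records are commonly laid out. The class RecordSketch and its methods are hypothetical and much simpler than DbaseFileWriter. Padding with spaces, right-justifying numbers and the leading space as the "not deleted" flag are general dbf conventions, not details taken from our code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical sketch of fixed-width record formatting; not the project's DbaseFileWriter. */
class RecordSketch {

    /** Character field: left-justified, padded with spaces, truncated if too long. */
    static String formatCharacter(String value, int length) {
        String s = value == null ? "" : value;
        if (s.length() > length) {
            s = s.substring(0, length);
        }
        StringBuilder sb = new StringBuilder(s);
        while (sb.length() < length) {
            sb.append(' ');
        }
        return sb.toString();
    }

    /** Numeric field: right-justified text with a fixed number of decimals. */
    static String formatNumeric(double value, int length, int decimals) {
        // Locale.ROOT forces a decimal point regardless of the platform locale.
        String s = String.format(Locale.ROOT, "%." + decimals + "f", value);
        if (s.length() > length) {
            throw new IllegalArgumentException("Value does not fit in field: " + s);
        }
        StringBuilder sb = new StringBuilder();
        while (sb.length() + s.length() < length) {
            sb.append(' ');
        }
        return sb.append(s).toString();
    }

    /** One record: a space as the "not deleted" flag, then the fields back to back. */
    static byte[] formatRecord(String name, int nameLength,
                               double value, int valueLength, int decimals) {
        String record = " " + formatCharacter(name, nameLength)
                + formatNumeric(value, valueLength, decimals);
        return record.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] record = formatRecord("Example Clinic", 40, 1.2, 10, 2);
        System.out.println(record.length + " bytes");
    }
}
```

Every record has the same length, so a reader can find record number n from the header length and the record length alone.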

DbaseDataExporter

This class collects all the data values stored in the database and writes them to a file. The work is done by the method void export( Collection<Specification> specifications, Writer writer ).
The writer will typically point to a file. The collection of Specifications is a list of constraints on a type of object. It defines the kind of data to be put in the export file.
First the export method calls the method DbaseFileHeader writeHeader(), which returns an object of type DbaseFileHeader.

The header is defined by a comma-separated string, String header. The string will typically look like "organisationUnitCode,organisationUnitName,periodTypeName,startDate". This string is used for the field names and data types in the dBase file header. The header is then used to create a DbaseFileWriter. After this, every data value in the database is turned into an array of objects and passed to the DbaseFileWriter, one array for every line in the file.
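One practical detail when turning such a string into dbf field names is that the dbf format limits a field name to 10 characters, so long names like those above have to be shortened. The sketch below shows one possible way to split the string and keep the shortened names unique. It is an assumption made for illustration: the class FieldNameSketch is hypothetical, and this is not necessarily how our exporter derives its field names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: derives dbf field names from a comma-separated header string. */
class FieldNameSketch {

    /** A dbf field name can hold at most 10 characters. */
    static final int MAX_NAME_LENGTH = 10;

    static List<String> toFieldNames(String header) {
        List<String> names = new ArrayList<>();
        Set<String> used = new HashSet<>();
        for (String raw : header.split(",")) {
            String base = raw.trim();
            String name = base.length() > MAX_NAME_LENGTH
                    ? base.substring(0, MAX_NAME_LENGTH) : base;
            int counter = 1;
            // If the shortened name is taken, replace its tail with a counter.
            while (!used.add(name)) {
                String suffix = String.valueOf(counter++);
                int keep = Math.min(base.length(), MAX_NAME_LENGTH - suffix.length());
                name = base.substring(0, keep) + suffix;
            }
            names.add(name);
        }
        return names;
    }

    public static void main(String[] args) {
        String header = "organisationUnitCode,organisationUnitName,periodTypeName,startDate";
        System.out.println(toFieldNames(header));
        // prints [organisati, organisat1, periodType, startDate]
    }
}
```

Without the uniqueness step, organisationUnitCode and organisationUnitName would both be cut down to the same ten characters.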

DbaseIndicatorExporter

This class collects all the indicator values stored in the database and writes them to a file. The work is done by the method void export( Collection<Specification> specifications, Writer writer ). The writer will typically point to a file. The collection of Specifications is a list of constraints on a type of object. It defines the kind of data to put in the export file.

First the export method calls the method DbaseFileHeader writeHeader(), which writes the header. The header is defined by a comma-separated string, String header. The string will typically look like "organisationUnitCode,organisationUnitName,indicatorName,periodTypeName,periodName,value". This string is used for the field names and data types in the dBase file header. The header is then used to create a DbaseFileWriter. After this, every indicator value in the database is turned into an array of objects and passed to the DbaseFileWriter, one array for every line in the file.

DbfFileException

An exception class, thrown when an error relating to the dBase file occurs.

Structure

The dbf export package covers both data and indicator export. How the applied frameworks fit together is best explored by reading the configuration files (beans.xml, pom.xml, xwork.xml ..) and then working through the source code, which is thoroughly documented with javadoc. The following diagram gives a quick overview of the overall structure of the entities involved.

DBFExport_big.gif

Related files:

dhis-service-import-export-default (branches)

Package org.hisp.dhis.service.importexport.dbf
See the detailed information above regarding the classes.
Resources: beans.xml
bean : org.hisp.dhis.service.importexport.dbf.DBFExport beans.xml
bean : org.hisp.dhis.service.importexport.dbf.DBFIndicatorExport beans.xml
Tests: DbaseExportTest - JUnit tests

dhis-web-importexport (branches)

Package org.hisp.dhis.importexport.action
Webwork actions (defined in xwork.xml)
DBFExportFormAction.java
DBFExportAction.java
DBFExportIndicatorsAction.java
DownloadAction.java

Package org.hisp.dhis.importexport
i18n modules - translations
resources: beans.xml, xwork.xml, velocity.properties - (persistent bean, ww actions config, and velocity settings)
bean: org.hisp.dhis.importexport.action.DBFExportAction beans.xml
bean: org.hisp.dhis.importexport.action.DBFExportIndicatorsAction beans.xml
dhis-web-import-export
dbfExport.vm (velocity)
dhis-web-import-export/javascripts
importexport.js - javascript/ajax (functions: dbfExportIndicators(), dbfExport())

Regarding the details of our Java code, we would like to point to our javadoc as the primary source for programmers who want a quick overview of the classes, methods and parameters involved, for instance in connection with future modifications of our code. The javadoc is available online at http://heim.ifi.uio.no/~geoffrer/inf5750/Javadoc/index.html and is attached to this written documentation as a separate document, javadoc_dhis_dbf_exporter.rar, to elaborate on the usage of the existing code.

Final words

We would like to thank our group teachers and lecturers for making an interesting curriculum for this course. All group members have acquired new skills that we feel are relevant to our future as software developers, and we are happy to have contributed to the HISP / DHIS project.
