  Translation Statistics
Added by Øystein Skadsem, last edited by margrsto on Feb 26, 2007  (view change)

The translation statistics plugin is a tool for finding missing translation strings in the DHIS 2 project.

Overview

Introduction

The translation statistics plugin is a tool for finding missing translation strings in the DHIS 2 project. It is written as a Maven 2 plugin, so it can be included in the build process. What it does is fairly straightforward:

  1. Finds all the translation files in the project (relative to the current path)
  2. Creates a data structure containing namespaces, locales and translations
  3. Creates coverage reports for each locale in a module, and lists missing translation strings for each locale.
  4. Creates a site HTML report (if run via the site-plugin)

Here is a link to a use case diagram that illustrates the functions of the plug-in: dhis-i18n-Use-Case-diagram.png

Note that, at the time of writing, site has not been properly configured in DHIS 2, so the plugin must be run manually. This is not a major drawback, since a natural way of working is to run the plugin for each module during development.

Design

We chose to build the plugin by separating it into three parts, with each group member taking one part. The first part gathers the files containing translations, the second analyzes these files, and the third reports the results. The plugin is defined as a Maven report plugin (see http://docs.codehaus.org/display/MAVENUSER/Write+your+own+report+plugin for a quick introduction to report plugins, while http://maven.apache.org/guides/plugin/guide-java-plugin-development.html gives a good introduction to Maven plugin development).

The main plugin class is GenerateI18nReport. Depending on how the plugin is invoked, one of two methods is called: execute() (if run manually) or executeReport() (if run by site). Both methods follow more or less the same work pattern:

  • Create a FileProvider object
  • Create a list of the files containing translations by calling FileProvider.getFilenames()
  • Create a TranslationStatisticsEngine object
  • Create a TranslationStatistics object by calling analyze(...) with that list
  • Create a ReportGenerator object
  • Call ReportGenerator.generateReports() or ReportGenerator.generateWebReports(), depending on how the plugin was invoked

Thus the list of files is created, analyzed and finally reported. See the UML class and sequence diagrams for more detail on the design and internals of the plug-in.
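The work pattern can be sketched roughly as follows. This is only an illustration: the interfaces are simplified stand-ins, and their signatures (in particular passing the statistics object to the report methods) are assumptions, since the text does not give the real ones. PipelineSketch, run() and demo() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins for the plug-in types; the signatures are assumptions.
final class PipelineSketch {
    interface FileProvider { List<String> getFilenames(); }
    interface TranslationStatistics { }
    interface TranslationStatisticsEngine {
        TranslationStatistics analyze(List<String> filenames);
    }
    interface ReportGenerator {
        void generateReports(TranslationStatistics statistics);
        void generateWebReports(TranslationStatistics statistics);
    }

    // Shared work pattern of execute() and executeReport():
    // collect the files, analyze them, then report. Only the last call differs.
    static void run(FileProvider provider, TranslationStatisticsEngine engine,
                    ReportGenerator reports, boolean calledBySite) {
        List<String> filenames = provider.getFilenames();
        TranslationStatistics statistics = engine.analyze(filenames);
        if (calledBySite) {
            reports.generateWebReports(statistics);
        } else {
            reports.generateReports(statistics);
        }
    }

    // Runs the pattern with stub implementations and records the call order.
    static List<String> demo(boolean calledBySite) {
        final List<String> trace = new ArrayList<>();
        FileProvider provider = () -> {
            trace.add("getFilenames");
            return Arrays.asList("module_i18n_en_GB.properties");
        };
        TranslationStatisticsEngine engine = files -> {
            trace.add("analyze");
            return new TranslationStatistics() { };
        };
        ReportGenerator reports = new ReportGenerator() {
            public void generateReports(TranslationStatistics s) { trace.add("generateReports"); }
            public void generateWebReports(TranslationStatistics s) { trace.add("generateWebReports"); }
        };
        run(provider, engine, reports, calledBySite);
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(demo(false));
        System.out.println(demo(true));
    }
}
```

Keeping the three collaborators behind interfaces is what lets both entry points share everything except the final reporting call.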

Parameter Configuration

The plug-in also has two parameters, ignoreStr and matchedStr. The first is a Java regular expression that matches the names of the files you want to ignore; the second is a Java regular expression that matches the names of the files you want to find. Both parameters fall back to a default value if the user does not provide one. The defaults are specific to the DHIS 2 file naming convention, as follows:

  • matchedStr = ".*?i18n.*?"
  • ignoreStr = ".*?module\\."

The user can configure the parameter values by editing the Maven 2 project file, pom.xml. Parameters are set in a configuration element inside the plugin definition, where each child element is named after the parameter it sets, as follows:

<build>
  <plugins>
    <plugin>
      <groupId>org.hisp.dhis</groupId>
      <artifactId>dhis-i18n-reportgenerator</artifactId>
      <configuration>
        <matchedStr>.*?i18n.*?</matchedStr>
        <ignoreStr>.*?module\\.</ignoreStr>
      </configuration>
    </plugin>
  </plugins>
</build>

The following guide gives a good introduction to Maven plug-in parameter configuration: http://maven.apache.org/guides/mini/guide-configuring-plugins.html
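To get a feel for how such expressions behave, here is a minimal sketch that applies a matchedStr and an ignoreStr to a file name. It assumes that the whole file name must match the expression (as with String.matches()) and that a file is kept when it matches matchedStr but not ignoreStr; the text does not state exactly how the plug-in combines them. NameFilterSketch and accept() are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the plug-in.
final class NameFilterSketch {
    // The default expressions, written as Java string literals.
    static final String DEFAULT_MATCHED = ".*?i18n.*?";
    static final String DEFAULT_IGNORED = ".*?module\\.";

    // Assumption: keep a file when its whole name matches matchedStr
    // and does not match ignoreStr.
    static boolean accept(String fileName, String matchedStr, String ignoreStr) {
        return Pattern.matches(matchedStr, fileName)
            && !Pattern.matches(ignoreStr, fileName);
    }

    public static void main(String[] args) {
        String name = "module_i18n_en_GB.properties";
        // A translation file is accepted with the defaults.
        System.out.println(accept(name, DEFAULT_MATCHED, DEFAULT_IGNORED));
        // A file without "i18n" in its name is not.
        System.out.println(accept("pom.xml", DEFAULT_MATCHED, DEFAULT_IGNORED));
        // A custom ignoreStr can skip, say, all en_GB files.
        System.out.println(accept(name, DEFAULT_MATCHED, ".*?_en_GB.*?"));
    }
}
```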

Configuration for running with Site

Finally, for site to pick up the plugin and produce a web report, the following needs to be added to the root pom.xml of the project:

<reporting>
  <plugins>
     <plugin>
       <groupId>org.hisp.dhis</groupId>
       <artifactId>dhis-i18n-reportgenerator</artifactId>
     </plugin>
  </plugins>
</reporting>

At the moment site is not set up for the project, so these lines have not been added, as they would serve no purpose. Running site with just these lines added will still generate the report, though; no further setup is needed if all that is wanted is a simple HTML report of the current translation coverage.

Group members and task assignments

The plugin was developed as a student project in INF5750 in the autumn of 2006. The group members were Øystein Skadsem, Naser B. Seid and Harald Holone. The initial design and planning was a joint activity, and the rest of the work was divided amongst us. Øystein looked at the Maven plugin integration and report generation. Naser developed the functionality for identifying translation files, as well as bringing the other parts of our work together. Harald created the data structure and logic for the statistics.

Terminology

Languages are referred to as locales, to fit in with the terminology used in the J2SE API. A namespace is defined as the full path (relative to the project root) to a specific translation file, not including the locale part of the file name. For example, /an/example/path/module_i18n_en_GB.properties is assigned the namespace /an/example/path/module_i18n. This means that multiple locales can exist within one namespace, and there can be translation files for more than one namespace in the same directory. It might be a good idea to use the pom.xml file of the project to assign more familiar names to the namespaces, but we have not pursued this. Further, there are plans to reorganize the translation files in DHIS 2. As far as we understand, this entails renaming the translation files to include the module name in the file name. This change does not affect the terminology or data structure used, but it might change how we construct the namespace values.
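The namespace rule can be illustrated with a short sketch. It assumes that the locale, when present, is a suffix of the form _ll_CC directly before .properties, and that a file without such a suffix is labelled "default"; neither detail is spelled out in the text. NamespaceSketch and its methods are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits a translation file path into namespace and locale.
final class NamespaceSketch {
    // Assumption: an optional "_ll_CC" locale suffix directly before ".properties".
    private static final Pattern FILE =
            Pattern.compile("(.*?)(?:_([a-z]{2}_[A-Z]{2}))?\\.properties");

    // The path without the locale part and the extension.
    static String namespaceOf(String path) {
        return match(path).group(1);
    }

    // The locale part, or "default" when the file name carries no locale.
    static String localeOf(String path) {
        String locale = match(path).group(2);
        return locale != null ? locale : "default";
    }

    private static Matcher match(String path) {
        Matcher m = FILE.matcher(path);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a properties file: " + path);
        }
        return m;
    }

    public static void main(String[] args) {
        String path = "/an/example/path/module_i18n_en_GB.properties";
        System.out.println(namespaceOf(path)); // /an/example/path/module_i18n
        System.out.println(localeOf(path));    // en_GB
    }
}
```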

The term translation string is used loosely, and refers to the key in the key/value pair representing one translated word or phrase.

Interfaces

Getting filenames

The file name collection part consists of one file filtering class, one interface and its implementation. The interface is kept as simple as possible. It provides one public method, getFilenames(). This method traverses the file system and returns a list of the full paths of all files containing translations in the project.

Creating statistics

The statistics part of the translation statistics plugin consists of two interfaces. First, TranslationStatisticsEngine defines the method analyze(...), which is used to start the analysis of a set of translation files.

A convenience method, getTranslationStatistics(), is offered for getting to the produced translation statistics. The other interface is TranslationStatistics, which defines a set of methods for getting to the results of the analysis. The methods getLocales() and getNamespaces() return all locales and all namespaces found in the translation files, respectively. Further, getTranslationPercentage(...) returns the translation percentage for a given locale in a given namespace, and getMissingTranslations(...) returns the keys that have not been translated for a locale in a namespace. TranslationStatistics also defines addNameSpaceLocaleTranslations(...), which is used for adding a map of translations for a locale in a namespace.

The Reports

The report part of the plugin has the interface ReportGenerator, which has exactly two methods, generateReports() and generateWebReports(). Which one is called depends on how the plugin is used.

generateReports() is called by the execute() method in GenerateI18nReport when the user runs the plugin from the command line, like this:

> mvn org.hisp.dhis:dhis-i18n-reportgenerator:i18n-report

generateWebReports() is called by the executeReport() method, which in turn is called by the site plugin whenever the command mvn site is issued.

Implementation

Getting filenames

The implementation of the file name search is fairly straightforward, but it has the added feature that you can provide Java regular expression strings for file names that should be ignored. You can, for example, avoid looking for translations in deprecated parts of the project, or skip translation files that you already know are correct.

There are two default implementation classes: one implements the FileProvider interface, and the other is a local implementation of the Java FileFilter interface for filtering files.

The DefaultFileProvider class makes use of the I18nFileFilter class, which implements the Java FileFilter interface. This class has one public method, boolean accept(File file), and a constructor that takes two arguments, I18nFileFilter(String acceptstr, String ignorstr). These two regular expression strings are used to decide whether to accept a given file when the accept method is called. The two strings are passed to the DefaultFileProvider class by the main class, GenerateI18nReport, when it creates the provider.

The DefaultFileProvider has a helper method, protected List<String> visetAllDirs(File dir), to help traverse the file system. It is called by the public method getFilenames(), starting with dir set to the current root directory. The method traverses the tree recursively, calling File.listFiles(FileFilter) with the I18nFileFilter for each directory, and collects in an ArrayList the full path of every entry that accept has accepted and that is not a directory. It then returns the list to getFilenames(), which returns it to the calling object.
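A minimal sketch of such a recursive traversal is shown below. It is not the plug-in's code: FileWalkSketch, filter(), collect() and demo() are hypothetical names, and the sketch assumes that the filter accepts every directory (so the walk can descend) and tests regular files by name only.

```java
import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the filter and traversal described in the text.
final class FileWalkSketch {
    // Assumption: directories always pass; files pass when their name
    // matches matchedStr and does not match ignoreStr.
    static FileFilter filter(final String matchedStr, final String ignoreStr) {
        return new FileFilter() {
            @Override
            public boolean accept(File file) {
                if (file.isDirectory()) {
                    return true;
                }
                String name = file.getName();
                return name.matches(matchedStr) && !name.matches(ignoreStr);
            }
        };
    }

    // Recursively collects the full paths of all accepted non-directory entries.
    static List<String> collect(File dir, FileFilter filter) {
        List<String> result = new ArrayList<>();
        File[] children = dir.listFiles(filter);
        if (children == null) {
            return result; // dir is not a directory, or it could not be read
        }
        for (File child : children) {
            if (child.isDirectory()) {
                result.addAll(collect(child, filter));
            } else {
                result.add(child.getAbsolutePath());
            }
        }
        return result;
    }

    // Builds a small temporary tree and walks it with the default expressions.
    static List<String> demo() {
        try {
            File root = Files.createTempDirectory("i18n-walk").toFile();
            File sub = new File(root, "sub");
            sub.mkdir();
            new File(sub, "module_i18n_en_GB.properties").createNewFile();
            new File(root, "pom.xml").createNewFile();
            return collect(root, filter(".*?i18n.*?", ".*?module\\."));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The null check after listFiles() matters in practice, because that method returns null rather than an empty array when the path is not a readable directory.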

Creating statistics

Two default implementations are provided with the plug-in, DefaultTranslationStatisticsEngine and DefaultTranslationStatistics. The default engine implementation works with translations in property files, like the ones used in DHIS. All the filenames provided by the call to analyze(...) are dissected to find the namespace and locale for each file. Each property file is then read by standard J2SE means, and the key/value pairs are added to a data structure of namespaces, locales and translations.
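Reading one property file into a map of key/value pairs might look like the sketch below. It uses java.util.Properties, which is one standard way to do this; whether the plug-in reads the files exactly this way is an assumption. PropertiesReadSketch, read() and readText() are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper: loads translations from a property file stream.
final class PropertiesReadSketch {
    // Reads all key/value pairs from the stream into a sorted map.
    static Map<String, String> read(InputStream in) {
        Properties props = new Properties();
        try {
            props.load(in); // Properties.load(InputStream) expects ISO 8859-1
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> translations = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            translations.put(key, props.getProperty(key));
        }
        return translations;
    }

    // Convenience wrapper so the example runs without a file on disk.
    static Map<String, String> readText(String text) {
        return read(new ByteArrayInputStream(text.getBytes(StandardCharsets.ISO_8859_1)));
    }

    public static void main(String[] args) {
        System.out.println(readText("name = Navn\nshortname = Kortnavn\n"));
    }
}
```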

The default data structure implementation is a map of namespaces, each of which holds a map of locales, each of which in turn holds a map of key/value pairs for the translations. No effort has been made to optimize the speed or size of the data structure, nor have rigorous tests of speed or size been performed. When the plugin is run manually, the typical execution time for one module seems to be 2-3 seconds.
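The nested map and the two statistics derived from it can be sketched as follows. The method names follow the interface description, but the parameter lists are assumptions, and so is the definition of coverage: here a key counts as missing for a locale when any other locale in the same namespace defines it. StatisticsSketch, mapOf() and sample() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical stand-in for the default statistics implementation.
final class StatisticsSketch {
    // namespace -> locale -> (key -> translation)
    private final Map<String, Map<String, Map<String, String>>> data = new TreeMap<>();

    void addNameSpaceLocaleTranslations(String namespace, String locale,
                                        Map<String, String> translations) {
        data.computeIfAbsent(namespace, n -> new TreeMap<>())
            .computeIfAbsent(locale, l -> new TreeMap<>())
            .putAll(translations);
    }

    // Assumption: the reference set is the union of keys over all locales.
    private Set<String> allKeys(String namespace) {
        Set<String> keys = new TreeSet<>();
        Map<String, Map<String, String>> locales = data.get(namespace);
        if (locales != null) {
            for (Map<String, String> translations : locales.values()) {
                keys.addAll(translations.keySet());
            }
        }
        return keys;
    }

    // Keys of the namespace that the given locale does not define, sorted.
    List<String> getMissingTranslations(String namespace, String locale) {
        Set<String> missing = allKeys(namespace);
        Map<String, Map<String, String>> locales = data.get(namespace);
        if (locales != null && locales.containsKey(locale)) {
            missing.removeAll(locales.get(locale).keySet());
        }
        return new ArrayList<>(missing);
    }

    // Share of the namespace's keys that the locale defines, rounded down.
    int getTranslationPercentage(String namespace, String locale) {
        int total = allKeys(namespace).size();
        if (total == 0) {
            return 100;
        }
        int missing = getMissingTranslations(namespace, locale).size();
        return 100 * (total - missing) / total;
    }

    private static Map<String, String> mapOf(String... keys) {
        Map<String, String> map = new TreeMap<>();
        for (String key : keys) {
            map.put(key, "translated " + key);
        }
        return map;
    }

    // Five keys in en_GB, three of them also present in vi_VN.
    static StatisticsSketch sample() {
        StatisticsSketch stats = new StatisticsSketch();
        stats.addNameSpaceLocaleTranslations("ns", "en_GB", mapOf("cancel", "code", "edit", "name", "save"));
        stats.addNameSpaceLocaleTranslations("ns", "vi_VN", mapOf("cancel", "edit", "save"));
        return stats;
    }

    public static void main(String[] args) {
        StatisticsSketch stats = sample();
        System.out.println(stats.getTranslationPercentage("ns", "vi_VN")); // 60
        System.out.println(stats.getMissingTranslations("ns", "vi_VN"));   // [code, name]
    }
}
```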

The Reports

The plugin currently generates reports in two different ways, as outlined above. Common to both is how they go through the provided statistics. The four methods provided by the TranslationStatistics API, getNamespaces(), getLocales(), getTranslationPercentage(namespace, locale) and getMissingTranslations(namespace, locale), are all that is needed. The report methods simply go through all namespaces, and for each of these go through all locales, listing first the translation percentage and then the untranslated strings for the locale, if any. The difference between the two is in how they present this to the user, which is outlined below.

Note that we have emphasised usability from a developer's point of view, meaning little extra flash, just organized plain text. The only people using this plugin will be those who translate for the project and want to quickly find out where translations are needed, and then fix it or complain about it. As the examples below show, it is now easy to copy and paste the directory where translations are missing, and then paste the missing strings into the correct locale file.

Note that everything is relative to the current directory: if you want to check how well dhis-web-birtviewer is translated (and nothing else), simply go to trunk/dhis-2/dhis-web/dhis-web-birtviewer/ and generate the reports from there.
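The loop shared by both report methods can be sketched like this. The Stats interface is a cut-down stand-in for TranslationStatistics with assumed signatures (the method for untranslated keys is named getMissingTranslations here, as in the interface description), and sample() returns canned values purely for illustration. ReportLoopSketch, render() and the exact line texts are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the namespace/locale loop used by the reports.
final class ReportLoopSketch {
    // Cut-down stand-in for TranslationStatistics; signatures are assumptions.
    interface Stats {
        List<String> getNamespaces();
        List<String> getLocales();
        int getTranslationPercentage(String namespace, String locale);
        List<String> getMissingTranslations(String namespace, String locale);
    }

    // For each namespace and locale: the percentage first, then missing keys.
    static List<String> render(Stats stats) {
        List<String> lines = new ArrayList<>();
        for (String namespace : stats.getNamespaces()) {
            lines.add("Module: " + namespace);
            for (String locale : stats.getLocales()) {
                lines.add(locale + ": " + stats.getTranslationPercentage(namespace, locale) + "%");
                List<String> missing = stats.getMissingTranslations(namespace, locale);
                if (!missing.isEmpty()) {
                    lines.add("Strings lacking translation:");
                    lines.addAll(missing);
                }
            }
        }
        return lines;
    }

    // Stub with canned values, standing in for real analysis results.
    static Stats sample() {
        return new Stats() {
            public List<String> getNamespaces() { return Arrays.asList("dhis-web-portal"); }
            public List<String> getLocales() { return Arrays.asList("vi_VN", "en_GB"); }
            public int getTranslationPercentage(String namespace, String locale) {
                return locale.equals("vi_VN") ? 60 : 100;
            }
            public List<String> getMissingTranslations(String namespace, String locale) {
                return locale.equals("vi_VN")
                        ? Arrays.asList("code", "name")
                        : Collections.<String>emptyList();
            }
        };
    }

    public static void main(String[] args) {
        for (String line : render(sample())) {
            System.out.println("I18n-report: " + line);
        }
    }
}
```

Returning the lines instead of printing them directly keeps the loop independent of the output channel, which is the only part where the two report methods differ.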

The way namespaces are constructed, they tend to be long, and not very pretty when displayed in a report. The method getShortName(...) in DefaultReportGenerator provides a simple way to get a short form of the namespace which is more suitable for presentation.

Maven-log report

The generateReports() method mentioned above uses the TranslationStatistics object provided by the TranslationStatisticsEngine to produce a summary of translations for the different modules in the project, and prints it to the Maven log. A small sample of the output is provided here:

[INFO] I18n-report: -------------------------------
[INFO] I18n-report: Module:
[INFO] I18n-report: dhis-web-birtviewer
[INFO] I18n-report: (Path: /ifi/fenris/h34/oysts/fag/inf/inf5750/hisp/scm/trunk/dhis-2/dhis-web/dhis-web-birtviewer/src/main/resources/org/hisp/dhis/bv/ )
[INFO] I18n-report:
[INFO] I18n-report: vi_VN: 100%
[INFO] I18n-report: no_NO: 100%
[INFO] I18n-report: en_GB: 100%
[INFO] I18n-report: default: 100%
[INFO] I18n-report: -------------------------------
[INFO] I18n-report: -------------------------------
[INFO] I18n-report: Module:
[INFO] I18n-report: dhis-web-portal
[INFO] I18n-report: (Path: /ifi/fenris/h34/oysts/fag/inf/inf5750/hisp/scm/trunk/dhis-2/dhis-web/dhis-web-portal/src/main/resources/org/hisp/dhis/wp/ )
[INFO] I18n-report:
[INFO] I18n-report: vi_VN: 60%
[INFO] I18n-report: Strings lacking translation: 
[INFO] I18n-report: code
[INFO] I18n-report: alternativename
[INFO] I18n-report: shortname
[INFO] I18n-report: name
[INFO] I18n-report:
[INFO] I18n-report: no_NO: 100%
[INFO] I18n-report: en_GB: 100%
[INFO] I18n-report: -------------------------------

etc.
Each line written to the log is prefixed with "I18n-report:" to make it simple to pick out just the info relevant to the translations with grep or similar utilities. For example,

> mvn org.hisp.dhis:dhis-i18n-reportgenerator:i18n-report |grep I18n-report:

should produce a lot less output, giving quicker access to the wanted info.

The HTML-report and the Sink API

The implementation of the ReportGenerator API, DefaultReportGenerator, uses the Doxia Sink API to generate a web page. Doxia itself is rather thin on documentation, but it is fairly easy to understand by reading the code, and it lets you create the pages for site with simple method calls like:

sink.body();
sink.text("some text in the body of the html-page");
sink.body_();

This simply defines the body of the generated HTML page, with the given text in it. The text can of course be formatted with other sink methods, like:

sink.paragraph() - sink.paragraph_() – (creates a block, like the <p></p> HTML tags do)
sink.lineBreak() – (the <br /> HTML tag)
sink.bold() - sink.bold_() – (all sink.text() between these will be bold)
sink.italic() - sink.italic_() – (all sink.text() between these will be italic)
sink.list() - sink.list_() – (for creating an unnumbered list)
sink.listItem() – (for adding items to this list)

These are the methods we have used in our plugin. Other things you can do with Sink can be found here:
http://svn.apache.org/viewvc/maven/doxia/tags/doxia-1.0-alpha-8/doxia-sink-api/src/main/java/org/apache/maven/doxia/sink/Sink.java?view=markup
(This is a link to the Sink API source in Subversion. The methods have no comments, but the names should give decent hints as to what they do.)

Unfortunately the generated file does not validate as XHTML 1.0 Transitional when checked with http://validator.w3.org/, but it looks the same in all browsers tested (Mozilla Firefox, Internet Explorer, Opera, Lynx, w3m, Links and Konqueror), so this should not pose any problems. An example of how the report looks can be seen here: http://folk.uio.no/oysts/inf5750/i18n-report.html

User Documentation

The above documentation should be more than enough to explain how the plugin works as well as how to use it, but here's the short version for the people who don't care how it works, only how it can help them with translations:

The first time you want to use the plugin, you will need to compile it. Assuming you have the hisp-source in ~/scm/, simply go to ~/scm/tools/dhis-i18n-report/ and run

> mvn install

This will install the plugin in your local Maven repository.

To quickly check the translations of a single module, go to the root of the module you are interested in and type:

> mvn org.hisp.dhis:dhis-i18n-reportgenerator:i18n-report |grep I18n-report:

This will provide you with a nice clean list of the translation-status for the module.

For generating a web-report, go to the root of the project (e.g. scm/trunk/dhis-2/), make sure the pom.xml contains the necessary lines provided above and type in:

> mvn site

After a short while a web site for the entire project will be generated in target/site/, and pointing your browser to target/site/i18n-report.html will give you the web report of the translation status.

Further work

  • Identify different translations for the same key
  • Identify different keys for the same translation string
