Developing mission-critical software with Python: a success story

With input from client experts and industry partners, Integrated Informatics Inc. developed the Geodetics Toolkit, a commercial product. With this toolkit, their users can load, analyze, map, audit, and generate reports for seismic and well surveys. The tools in the toolbox take text and binary data formatted according to both custom and industry standards (such as UKOOA, SEG-Y, and SEGP1) and load it into a schema of feature classes and tables.

Below, Integrated Informatics Inc. discusses reasons for choosing Python as the language for developing and delivering their commercial product. They discuss how Python is used to their advantage in the software development environment and show examples of how Python is much more than just a scripting language. Finally, they discuss the use of open source libraries in their product.

Swimming in a Sea of Surveys

Seismic and well survey data are integral parts of any oil and gas company's database and are used in a variety of ways, such as visualization, resource discovery, and inventory. The oil and gas industry has a long and rich history of computer use and digital data storage. Over this history many different standards for survey storage have arisen, and companies have chosen the standard that suits them best or even developed their own. The result is a proliferation of survey files that appear similar but often differ in very subtle ways.

Since survey information has existed for a long time, it is not surprising that there are many tools available for processing the data. The problem is that such tools are not typically part of a larger software solution. They are built for viewing and doing some processing, but they often do not work well with the rest of the enterprise database. This is not ideal, as the full benefit of these data can only be realized when they are overlaid, integrated, and analyzed together. Given that most oil and gas companies employ the Esri stack as their main GIS software, it is compelling to introduce capabilities for loading well and seismic data from a variety of formats directly into ArcGIS.

Development in a Corporate Environment - Why Python?

For the development of the Geodetics Tools we had the option of using an ArcObjects-based approach, but we chose Python for two primary reasons: 1) Speed of Development and 2) Ease of Deployment.

Speed of Development

Python and the Geoprocessing Framework are tightly integrated, and Python is the recommended language for implementing script tools. Choosing Python means that as a developer you can spend your time coding functionality rather than coding a user interface. This in turn means you can deliver products to your clients faster than you might otherwise expect, because a core part of the project (the user interface) is already taken care of. For example, when you create a tool through the geoprocessing framework you get a tool that looks and acts like the core tools delivered with ArcGIS. This means default validation, user interface controls, and a documentation style that allow your custom tools to blend in seamlessly with other parts of the application, like ModelBuilder.

It is worth noting that development time for the budding programmer is also decreased due to the nature of Python itself. Guido van Rossum, the author of Python, created the language to be easy and intuitive, open source, understandable as plain English and suitable to everyday tasks. To you, the GIS Analyst turned developer or dabbler, this means that Python is easy to learn and easy to read. You'll spend less time learning and more time creating solutions and improving workflow.

The Geodetics Toolbox currently contains over 40 tools. An enormous amount of time was saved because we didn't have to program UIs for each of the tools. The time saved allowed us to invest heavily in testing the tools, building a test harness, improving performance, and introducing innovations and new functionality.

Ease of Deployment

One of the beautiful things about a Python solution is the ease of deployment. There are no DLLs to register, no complicated installations to run, and no COM dependencies to worry about. With the Geodetics Toolbox, we are able to simply zip the solution in our office and unzip it in an accessible location on the client's network. With the code in place, the client need only add the toolbox to ArcGIS Desktop to have access to the functionality.

In many large corporations software installation is the domain of the IT department; pushing out software to the employees of the company is their job and often their headache. As a solution provider, the easier you can make installation, the more appreciative IT will be and the quicker your clients can access new functionality. For Integrated Informatics, the use of Python means that we are ensuring both the easiest possible installation process and the quickest possible request turnaround time for our clients, and that translates into happy clients.

Development in a Corporate Environment - Automated Testing

The Python code for the Geodetics Toolbox contains well over 5000 lines of code (not including open source packages). As with all commercial products, our customers send us enhancement requests on a regular basis, which we try to fulfill as quickly as possible. Requests range from new parameters to whole new toolsets. It is therefore important for us to know when changes impact existing functionality and whether any changes have caused tools to fail. With over 40 tools, each of which has up to dozens of parameters, it is simply not effective to test each tool by hand. To achieve the desired turnaround times on requests at the expected level of quality, automated testing is an absolute must.

Python is, as they say, "batteries included", so for testing we only needed to look at Python itself for at least part of the solution. Python has an excellent standard module called unittest, which is part of the core Python install. For each tool we have a separate test script to test the many different permutations of parameters and data inputs. With individual test suites for each tool we can be very precise about what we are testing and efficient with time during the business day.
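A per-tool test script of this kind might look like the minimal sketch below. The function parse_shotpoint_line and its comma-separated record layout are hypothetical stand-ins for a real tool's logic and are not part of the Geodetics Toolbox; only the unittest calls come from the standard library.

```python
import unittest


def parse_shotpoint_line(line):
    """Hypothetical stand-in for a tool: split 'name,x,y' into typed values."""
    name, x, y = line.split(',')
    return name.strip(), float(x), float(y)


class ParseShotpointLineTest(unittest.TestCase):
    """One TestCase class per tool; one method per input permutation."""

    def test_valid_line(self):
        self.assertEqual(parse_shotpoint_line('SP101, 500.5, 6000.25'),
                         ('SP101', 500.5, 6000.25))

    def test_missing_field_raises(self):
        # Unpacking two values into three names raises ValueError
        with self.assertRaises(ValueError):
            parse_shotpoint_line('SP101, 500.5')

    def test_non_numeric_coordinate_raises(self):
        # float() rejects text that is not a number
        with self.assertRaises(ValueError):
            parse_shotpoint_line('SP101, east, 6000.25')
```

Saved as a module, a script like this can be run on its own with "python -m unittest" followed by the module name, which keeps each tool's tests independent of the others.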

Test suites for each tool are a good step, but it is critical that these test scripts be run in an automated fashion on a regular basis, not just when you remember to run them. In vogue of late is the notion of 'continuous integration': the concept that after each change is checked in to the code base, a trigger initiates all the tests in the test suite. This can be an excellent option for certain types of code base and certain types of tests. However, for tools that do heavy processing, such high-frequency testing is not always practical. More important is the idea that the tests be triggered on a regular basis. This can be nightly, weekly, or even monthly depending on the frequency with which the code base is updated. With an automated run of your test suite you always have a finger on the pulse of your code, so that if and when a bug is introduced to the code base you can know quickly that there is a problem and correct it.

Code Re-Use: Classes in Python

One of the basic tenets in programming is to avoid code duplication. There are a couple of ways that you can avoid code duplication and make your code base more efficient and effective. First, you can use functions to contain pieces of code that you use over and over again. Functions can do a lot and are the first step on the way to reducing code duplication. For large code bases, however, functions are limited in what they can do (though they do have their place), and in many cases it is appropriate to start creating classes.

Though Python is thought of as a scripting language in the context of Esri's software (for example, note the use of 'scripting' in the name 'arcgisscripting'), it is actually a fully object-oriented (OO) programming language. As such, programmers have the ability to create classes and full class hierarchies using inheritance. In large code bases, well-written OO-style code can help conceptually abstract complex ideas, reduce code duplication, and isolate code into small, concise chunks, making it easier to change, manage, and test.

The very simple example below is intended to be a gentle introduction to Python classes and a small class hierarchy.

from os.path import basename

class AbstractReport(object):
    """
    Base report class
    """
    _table_fields = None

    def __init__(self, file_path, records):
        """
        initializes the class
        """
        self._file_path = file_path
        self._records = records

    def calc_coords(self):
        """
        calculates coordinates to be written to report table
        """
        raise NotImplementedError

    def write_table(self):
        """
        writes the report table using the calculated coordinates
        """

        coords = self.calc_coords()
        print ('writes a table using fields %s '
               '\nand calculated coords %s to file %s' %
               (self._table_fields, coords, basename(self._file_path)))


class OrthoCheckReport(AbstractReport):
    """
    Orthogonal Check Report Class
    """
    _table_fields = ['FLD_A', 'FLD_B', 'FLD_C']

    def calc_coords(self):
        """
        calculates coordinates to be written to report table
        """
        print ('special Orthogonal Check report calculations using records %s' %
               self._records)
        return ['ortho', 'check', 'results']


class QAQCReport(AbstractReport):
    """
    QAQC Report class
    """
    _table_fields = ['FLD_X', 'FLD_Y', 'FLD_Z']

    def calc_coords(self):
        """
        calculates coordinates to be written to report table
        """
        print ('special QAQC report calculations using records %s' %
               self._records)
        return ['qaqc', 'report', 'results']


if __name__ == '__main__':
    input_file = r'c:\test\seismic_file.txt'
    records = ['reca', 'recb', 'recc']

    ocr = OrthoCheckReport(input_file, records)
    qqr = QAQCReport(input_file, records)

    ocr.write_table()
    qqr.write_table()

When run, this code prints:

special Orthogonal Check report calculations using records ['reca', 'recb', 'recc']
writes a table using fields ['FLD_A', 'FLD_B', 'FLD_C']
and calculated coords ['ortho', 'check', 'results'] to file seismic_file.txt

special QAQC report calculations using records ['reca', 'recb', 'recc']
writes a table using fields ['FLD_X', 'FLD_Y', 'FLD_Z']
and calculated coords ['qaqc', 'report', 'results'] to file seismic_file.txt

In the hierarchy above the code for the write_table method is only present in one of the classes (the AbstractReport class), but instances of the other classes (OrthoCheckReport or QAQCReport) can still call that method. This is because both OrthoCheckReport and QAQCReport are "subclasses" of the AbstractReport "base class" and "inherit" from it. A subclass that inherits from a base class has access to all the methods and properties of the base class. This means that regardless of which class is created above, calls to write_table will go through the same method.

The calc_coords method demonstrates what happens when code needs to be different in the subclasses from the base class. Each of the subclasses has a different way of calculating the coordinates for the table and therefore has unique code. To achieve this, the subclasses 'override' the calc_coords method from the base class. As demonstrated above, 'overriding' in Python is as simple as adding a method with the same name to your subclasses. This means that even though the write_table method has the exact same code for all of the classes, when it calls calc_coords it will follow a unique path through each subclass. By doing this, unnecessary logic (like extra "if" statements) is eliminated from the code, making it more streamlined and much easier to read.

To make a class inherit from another class simply include the name of the desired base class in the class declaration:

class OrthoCheckReport(AbstractReport):

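When you do this, keep in mind that a subclass without its own initialization (__init__) uses the __init__ of the base class, so it must be created with the same arguments as the base class. If a subclass needs different arguments, you will need to write an __init__ for that subclass as well and call the base class's __init__ from it. Check the Python documentation and help for examples.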
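As a minimal sketch of the case where a subclass needs its own __init__, the hypothetical ToleranceReport class below takes one extra argument (tolerance is an invented parameter for illustration) and hands the shared arguments to the base class. The base class is a trimmed copy of AbstractReport so that the snippet runs on its own.

```python
class AbstractReport(object):
    """Trimmed copy of the base class from the example above."""

    def __init__(self, file_path, records):
        self._file_path = file_path
        self._records = records


class ToleranceReport(AbstractReport):
    """Hypothetical subclass that needs one extra argument."""

    def __init__(self, file_path, records, tolerance):
        # Let the base class store file_path and records first
        super(ToleranceReport, self).__init__(file_path, records)
        # Then keep the value that only this subclass knows about
        self._tolerance = tolerance


report = ToleranceReport(r'c:\test\seismic_file.txt', ['reca', 'recb'], 0.5)
```

Calling the base class's __init__ through super keeps the shared setup in one place, so a later change to AbstractReport does not have to be repeated in every subclass.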

Use and Delivery of Open Source Packages

Why Open Source?

Python has become one of the most popular open source programming languages. As such, the users of Python have created literally thousands of open source packages, many of which are directly applicable to the kind of things you want to do in your applications. In the Geodetics Toolbox we use many open source packages to achieve our clients' goals.

As an example, consider a common client request: create a PDF report from analysis performed in ArcGIS. As it turns out, there is an easy-to-use, cross-platform open source package available called the ReportLab Toolkit, which has the ability to create PDF documents. This package contains comprehensive and robust PDF manipulation capabilities as well as excellent documentation and a tutorial to help people get started. Using this package we were able to write reports and data to PDF documents with relative ease in a very short total development time. So next time you get a request, ask yourself the question "has someone else already done this?" and search the internet before diving directly into development.

The All Important License

When you find a package that does exactly what you need, the first thing to do is to read the license. Open source licenses come in a variety of different forms, and many are written to prevent software from being "closed source". It is extremely important to read the license very closely and make sure that you are using the package correctly. In the case of the ReportLab Toolkit, the license is a form of the Berkeley Software Distribution license (commonly referred to as the "BSD" license). This license is very permissive and allows the software to be used and distributed in other proprietary software given a few minor conditions. Other licenses are not nearly as permissive and are designed to ensure that software which uses other open source software is open source as well (for example, the GPL). Take the time to familiarize yourself with the most common licenses so you know which open source packages you can use and how.

A useful table of licenses can be found here: http://en.wikipedia.org/wiki/Comparison_of_free_software_licenses. Another very useful site is www.opensource.org which contains information on all open source licenses.

Delivering Open Source Packages

For the Geodetics Toolbox we incorporated the ReportLab Toolkit as a subpackage of our package, which means we actually delivered its code with our code. While it may seem simpler to just reference open source packages in your code, it is important that you actually incorporate the package. This allows you to control the version of the package being used and ensures that the package is available on the client's machine. Requesting that the client install the package themselves is a hassle for the client and should be avoided. Again, when you deliver an open source package with your code it is imperative that you read the license and fulfill the obligations in that license.
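One common way to make a delivered copy of a package importable is sketched below. It is an assumption for illustration, not necessarily how the Geodetics Toolbox bundles ReportLab: the folder layout and the pdflib_stub package are hypothetical, and the snippet builds them in a temporary folder only so that it can run on its own.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway delivery folder (all names are hypothetical):
#   <root>/lib/pdflib_stub/__init__.py   <- stand-in for the bundled package
root = tempfile.mkdtemp()
lib_dir = os.path.join(root, 'lib')
os.makedirs(os.path.join(lib_dir, 'pdflib_stub'))
with open(os.path.join(lib_dir, 'pdflib_stub', '__init__.py'), 'w') as handle:
    handle.write("__version__ = '1.0'\n")

# Put the bundled folder first on the search path so the delivered copy
# is found before any other copy installed on the machine.
sys.path.insert(0, lib_dir)
importlib.invalidate_caches()
pdflib_stub = importlib.import_module('pdflib_stub')
bundled_version = pdflib_stub.__version__
```

In a delivered toolbox the bundled folder would more likely be located relative to the script itself (for example from os.path.dirname(__file__)) rather than created at run time, so that the unzipped solution works wherever it is placed on the client's network.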

Conclusion

To deliver the functionality in the Geodetics Toolbox, we chose Python as the development language. Python and the Geoprocessing Framework enabled us to deliver the functionality quickly and have it look and feel exactly like the rest of the ArcGIS product. Using modules that are part of any Python install, we created a suite of unit tests to ensure the quality of our product over many code deliveries and new functionality requests. We leveraged Python's capabilities as an object-oriented programming language to reduce duplication in our code base, making the code easier to maintain. Finally, because Python has such a large open source community, we were able to find open source packages that shortened our development time and helped us meet client needs.

About Integrated Informatics

Integrated Informatics Inc. is a leading consultancy for Geographic Information System implementation and development. Founded in 2002, Integrated provides spatial data management and automated mapping solutions to clients throughout North America with offices in Calgary, Alberta, St. John's, Newfoundland and Houston, Texas.

Integrated has longstanding relationships with its clients who comprise the major and super-major independent and integrated Energy companies, Provincial and State Governments, and Engineering and Environmental consultancies. We have a proven track record of developing and implementing strategies, systems, and technologies that support business goals and deliver corporate value.

Our strength comes from our people. Our team is composed of experienced professionals in the fields of Geographic Information Systems, Spatial Data Management, Project Data Management, and Application Development, as well as discipline-specific experts. We have experts on staff in the disciplines of pipeline engineering, environmental analysis, and geoscience analysis. Our work environment promotes ideas, innovation, and implementation through professional development, continued training, industry involvement, and internally funded research and development.

Integrated is a Silver Tier International Esri business partner, is an active participant in the beta program for holistic testing for Esri, and sits on the advisory board for the GIS Program at the Southern Alberta Institute of Technology (SAIT). Visit us at www.integrated-informatics.com or contact us at gis@integrated-informatics.com for more information about our unique solutions.