Dr. Peter Schachte
School of Computing and Information Systems
The University of Melbourne
Doug McDonell Building
Parkville Campus, University of Melbourne
I regularly teach the following subjects:
|COMP90048 / COMP30020||Declarative Programming|
|COMP90041||Programming and Software Development|
|UNIB10002||Logic: Language and Information|
I am currently supervising the following research postgraduate students:
I am always looking for bright students interested in undertaking research study in an area related to programming languages.
Effective software vulnerability detection for web
Peter Stuckey, Harald Søndergaard, and Peter Schachte of the
University of Melbourne and Cristina Cifuentes and Andrew Santosa
of Oracle Corporation.
ARC Linkage Project LP140100437, 2014-2017.
This project will design and implement new and better methods to find vulnerabilities in software services delivered over the web or through the cloud, as well as methods for proving the absence of certain types of vulnerability. So-called injection attacks are pervasive and generally considered the most important security threat on today's Internet. The programming languages used for software services tend to use strings as a universal data structure, which unfortunately makes it hard to separate trusted code from untrusted user- provided data. This project will develop novel program analysis tools and string constraint solvers, and employ these tools to support sophisticated automated reasoning about string manipulating software.
Analysing Computer Arithmetic to Improve Software
Peter Stuckey, Harald Søndergaard, and Peter Schachte,
University of Melbourne, and Michael Codish, Ben-Gurion University.
ARC Discovery Project DP140102194, 2014-2016.
Most computer programs deal with integers. Automated tools designed to verify the correct behaviour of software usually assume the software deals with idealised mathematical integers, since this simplifies reasoning significantly. In reality, most programs work with integer number representations that approximate the ideal. This compromises the soundness of many verification tools. In this project we will design sound reasoning tools that are aware of the true nature of computer integer arithmetic.
Automatic software verification: Harnessing
Peter Stuckey, Harald Søndergaard, and Peter Schachte,
University of Melbourne, and Andy King, University of Kent.
ARC Discovery Project DP110102579, 2011-2013.
Software reliability is critically important but exceedingly difficult to achieve. The methods used today for automated verification of desirable properties of software have severe limitations in both scope and scale. The aim of this project is to broaden the set of program properties that can be established, as well as the programming language constructs to which they can be applied. By combining our expertise in program analysis and constraint solving technology we intend to develop better software verification methods that successfully verify more complex programs and properties than previously possible.
My publication list shows my past research. Also see Michael Ley's DBLP Bibliography Server for an index of my publications.
Aside from ongoing grant projects listed above, my current research projects, most highly speculative, include:
I am currently designing an implementing a new programming language called wybe (pronounced "WEE-buh") that combines the most important aspects of declarative and imperative programming in a single language. The objective is a language that is simple and easy to learn, yet robust and efficient to be suitable for large scale software development.
It is too early in the development of the language to give much information about it, but there are slides from a brief wybe introduction talk. There are innumerable small to large research projects to be undertaken in the development of this language.
In the area of abstract interpretation, I am interested in:
Incremental abstract interpretation
Abstract interpretation is a global static analysis technique; in general, this means that all of an applications source code must be analyzed at once. It would be impractical for a compiler to process every source file every time the smallest change is made to one of them, so it is important to find a way to avoid as much reanalysis as possible.
One approach has been suggested. Another promising approach involves using condensing analysis domains domains which do not lose precision when a predicate is analyzed independently of how it will be called, and that information is combined with information about an invocation of that predicate to determine the state after that invocation. Incrementally reanalyzing a program becomes easy with such a domain, because the goal independent analysis of a predicate cannot change unless the its code, or the analysis of a predicate it calls, has changed, and the goal dependent analysis of a predicate need not be repeated unless the goal independent analysis of a predicate it calls, or the goal dependent analysis of a predicate that calls it, has changed. By performing the analysis in these two phases, we can easily minimize the analysis we must repeat.
Currently, the only known non-trivial condensing analysis domain is the Pos domain for groundness analysis. Clearly other practical condensing domains must be found before an analysis methodology can be built around condensation! Ideally, a practical method of finding condensing domains should be found. A recent paper shows how to build condensing domains, but it is not clear that this method is practical.
Boolean functions (propositional logic formulae), or subclasses of them, are a useful abstraction for representing information about dependencies among boolean-valued variables. They certainly are effective for groundness analysis. And since most of the time in program analysis is spent manipulating abstract values, being able to efficiently manipulate boolean functions is very important.
GER is a hybrid representation for Boolean functions that separates out variables that are definitely true (and maybe those that are definitely false), equivalence classes of equivalent variables (and perhaps classes of exclusive variables), from other information. The idea is to use the most efficient representation for each class of information. Implementation has shown this representation to be significantly more efficient than the standard ROBDD representation. However, opportunities remain for further improvement, including improving the algorithms used and factoring out more information that can be represented more efficiently.
Improved ROBDD algorithms
Even using the GER representation, performance of ROBDD operations is the dominant factor determining analysis performance. We have shown that specialized ROBDD operations can significantly improve groundness analysis performance. More such optimizations can be made.
Controlled Impurity in Logic Programming
Practical purely declarative programming languages must provide an interface to existing code written in imperative languages. Until it is practical to prove the purity of foreign code, this creates the possibility that impurity will leak from the foreign code into the (supposedly) declarative program.
Of course, it is possible to write pure code using impure facilities (ultimately, all declarative code is implemented that way). However, most pure languages take the attitude that the programmer must construct her pure primitives wholly outside the pure language, using the foreign interface only to interface to pure primitives. Often, though, it would be more convenient to implement parts of the pure facility in the declarative language, using impure primitives through the foreign interface. Generally, the only alternative is to understand the implementation of the declarative language well enough to predict how it will handle impure primitives that are permitted to cross the foreign interface. This is highly undesirable, as the implementation details of the language may change, or the program may need to be ported to another implementation of the same language. If you lie to the compiler, it will eventually have its revenge.
Mercury solves this problem by requiring users to indicate which primitives made available through its foreign interface are pure and which are not. A predicate may invoke impure operations provided it is declared impure, or it is promised to be pure. The Mercury compiler is not permitted to perform certain kinds of optimizations on impure parts of a predicate, assuring the user that, although it uses impure primitives, her predicate will behave as intended.
New Mode System
Mercury uses a mode system that requires the user to specify each allowable call pattern of argument instantiations and the corresponding result pattern. Additionally, the user must specify the determinism of each mode. A major shortcoming of this approach is that the user of a predicate will not be able to use any mode that its author did not anticipate. Another shortcoming of the current approach is that it reorders code minimally to meet the declared determinism. Perhaps a more efficient implementation of the predicate could be created with more aggressive reordering. And in any case, the current mode system discourages predicates with multiple modes.
An alternative approach would be to specify the instantiation dependencies of predicates, instead of what Mercury calls modes. One might declare the mode of append/3 by specifying that the backbone of the third argument is bound iff the backbones of both of the other arguments are bound, and likewise the elements of the lists. This says all that is needed for mode analysis of a call to append/3.
I would argue that this style of mode system could be less verbose than the current system, and at least as natural. One thing this would not provide, though, is a list of the modes for each predicate that the compiler should generate code for. Ideally, code for different modes of a predicate would be generated on demand.
Sequence quantification is a refinement of bounded quantification, which has been well studied. The improvement over bounded quantification lies in giving users the ability to define new ranges to quantify over. By carefully constructing the mechanism for users to define new sequences, we are able to provide much of the power of looping constructs in conventional languages such as C or PASCAL.
Sequence Quantification can be implemented with a few very simple higher order predicates and a few user-extensible predicates defining the kinds of sequences that can be quantified over. However, this implementation gives poor performance. To get high performance, specialized versions of these predicates need to be created for each invocation. Doing this sort of specialization without accidentally slowing the code down in some cases is rather difficult. Doing it with acceptable (or even just finite) compilation times can be tricky.
A declarative approach to programming, and to software design, has much to contribute to software engineering. However, the software engineering community appears uninterested in declarative languages. Conversely, the declarative language community is largely uninterested in software engineering.
Declarative Software Design Methodology
A software design methodology oriented towards declarative implementation is needed. Ideally it would build on an existing methodology, preferably based on UML due to its popularity. Probably many of the existing UML diagrams would be equally applicable to declarative development. Even class diagrams would be applicable, though inheritance may be less desirable.
Some of the freely-distributable software I have written is available for download.
I am available for consulting jobs. See my curriculum vitae for my work experience.