Postal
Dr. Peter Schachte
School of Computing and Information Systems
The University of Melbourne
Victoria 3010
Australia

Physical
Room 8.15, Doug McDonell Building
Parkville Campus, University of Melbourne
Australia
I regularly teach the following subjects:
COMP90048 / COMP30020 | Declarative Programming
COMP90041 | Programming and Software Development
UNIB10002 | Logic: Language and Information
I am currently supervising the following research postgraduate students:
I am always looking for bright students interested in undertaking research study in an area related to programming languages.
Effective software vulnerability detection for web services.
A collaboration between Peter Stuckey, Harald Søndergaard, and
Peter Schachte of the University of Melbourne and Cristina
Cifuentes and Andrew Santosa of Oracle Corporation.
ARC Linkage Project LP140100437, 2014-2017.
This project will design and implement new and better methods to
find vulnerabilities in software services delivered over the web or
through the cloud, as well as methods for proving the absence of
certain types of vulnerability. So-called injection attacks are
pervasive and generally considered the most important security threat
on today's Internet. The programming languages used for software
services tend to use strings as a universal data structure, which
unfortunately makes it hard to separate trusted code from untrusted
user-provided data. This project will develop novel program analysis
tools and string constraint solvers, and employ these tools to support
sophisticated automated reasoning about string-manipulating software.
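To make the threat concrete (an illustrative Python sketch, not the
project's tooling), the classic injection pattern arises when
untrusted data is concatenated into a string that will later be
interpreted as code:

    # Untrusted data flowing into a string that will be parsed as SQL.
    user_input = "'; DROP TABLE users; --"

    # Unsafe: the untrusted string becomes part of the query's syntax.
    query = "SELECT * FROM users WHERE name = '" + user_input + "'"
    assert "DROP TABLE" in query

    # A string analysis must track which fragments of `query` derive
    # from untrusted input; a string constraint solver can then decide
    # whether any input can alter the query's parse, as this one does
    # by closing the quoted literal early.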
Analysing Computer Arithmetic to Improve Software Reliability.
Peter Stuckey, Harald Søndergaard, and Peter Schachte,
University of Melbourne, and Michael Codish, Ben-Gurion University.
ARC Discovery Project DP140102194, 2014-2016.
Most computer programs deal with integers. Automated tools designed
to verify the correct behaviour of software usually assume the
software deals with idealised mathematical integers, since this
simplifies reasoning significantly. In reality, most programs work
with integer number representations that approximate the ideal. This
compromises the soundness of many verification tools. In this
project we will design sound reasoning tools that are aware of the
true nature of computer integer arithmetic.
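The gap is easy to demonstrate. In the following Python sketch
(wrap32 is a hypothetical helper simulating 32-bit two's-complement
arithmetic), a property that holds for mathematical integers fails
for machine integers:

    # wrap32 reduces a mathematical integer to its 32-bit
    # two's-complement value, as a C int would store it.
    def wrap32(x):
        x &= 0xFFFFFFFF
        return x - (1 << 32) if x & 0x80000000 else x

    INT_MAX = 2**31 - 1

    # Over mathematical integers, x + 1 > x always holds.
    assert INT_MAX + 1 > INT_MAX

    # Over 32-bit machine integers, the sum wraps around to INT_MIN,
    # so a verifier assuming ideal integers would wrongly accept code
    # that relies on x + 1 > x.
    assert wrap32(INT_MAX + 1) == -(2**31)
    assert not wrap32(INT_MAX + 1) > INT_MAX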
Automatic software verification: Harnessing constraint technologies.
Peter Stuckey, Harald Søndergaard, and Peter Schachte,
University of Melbourne, and Andy King, University of Kent.
ARC Discovery Project DP110102579, 2011-2013.
Software reliability is critically important but exceedingly
difficult to achieve. The methods used today for automated
verification of desirable properties of software have severe
limitations in both scope and scale. The aim of this project is to
broaden the set of program properties that can be established, as well
as the programming language constructs to which they can be
applied. By combining our expertise in program analysis and constraint
solving technology we intend to develop better software verification
methods that successfully verify more complex programs and properties
than previously possible.
My publication list shows my past research. Also see Michael Ley's DBLP Bibliography Server for an index of my publications.
Aside from the ongoing grant projects listed above, my current research projects, most of them highly speculative, include:
Wybe
I am currently designing and implementing a new programming
language called wybe (pronounced "WEE-buh") that
combines the most important aspects of declarative and
imperative programming in a single language.
The objective is a language that is simple and easy to learn,
yet robust and efficient enough to be suitable for large-scale
software development.
It is too early in the development of the language to give much information about it, but there are slides from a brief wybe introduction talk. There are innumerable small to large research projects to be undertaken in the development of this language.
In the area of abstract interpretation, I am interested in:
Incremental abstract interpretation
Abstract interpretation is a global static analysis
technique; in general, this means that all of an application's
source code must be analyzed at once. It would be impractical
for a compiler to process every source file every time the
smallest change is made to one of them, so it is important to
find a way to avoid as much reanalysis as possible.
One approach has been suggested. Another promising approach involves condensing analysis domains: domains which lose no precision when a predicate is analyzed independently of how it will be called, with that goal-independent information later combined with information about a particular invocation to determine the state after that invocation. Incremental reanalysis becomes easy with such a domain: the goal-independent analysis of a predicate cannot change unless its code, or the analysis of a predicate it calls, has changed; and the goal-dependent analysis of a predicate need not be repeated unless the goal-independent analysis of a predicate it calls, or the goal-dependent analysis of a predicate that calls it, has changed. By performing the analysis in these two phases, we can minimize the analysis that must be repeated.
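As an illustrative sketch (mine, not from the literature), a
condensing domain like Pos can be modelled in Python by representing
a Boolean function as its set of satisfying assignments. For
append(X, Y, Z), the goal-independent groundness description is
Z <-> (X & Y); conjoining it with a description of a particular call
gives the goal-dependent answer:

    from itertools import product

    VARS = ("X", "Y", "Z")

    def models(formula):
        """The Boolean function denoted by `formula`, represented as
        its set of satisfying assignments over VARS."""
        return {vals for vals in product([False, True], repeat=len(VARS))
                if formula(dict(zip(VARS, vals)))}

    # Goal-independent analysis of append(X, Y, Z):
    # Z is ground iff both X and Y are ground.
    append_gi = models(lambda a: a["Z"] == (a["X"] and a["Y"]))

    # Description of one call site: X and Y are known ground.
    call = models(lambda a: a["X"] and a["Y"])

    # Conjunction is just intersection of model sets. Because Pos is
    # condensing, this equals what direct goal-dependent analysis of
    # the call would have computed: Z is ground on exit.
    answer = append_gi & call
    assert all(dict(zip(VARS, vals))["Z"] for vals in answer)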
Condensing domains
Currently, the only known non-trivial condensing analysis domain is
the Pos domain for groundness analysis. Clearly
other practical condensing domains must be found before an
analysis methodology can be built around condensation!
Ideally, there would be a systematic method for finding
condensing domains.
A recent paper
shows how to build condensing domains, but it is not clear
that the method is practical.
Boolean functions (propositional logic formulae), or subclasses of them, are a useful abstraction for representing dependencies among Boolean-valued variables; they are certainly effective for groundness analysis. And since most of the time in program analysis is spent manipulating abstract values, being able to manipulate Boolean functions efficiently is very important.
GER
GER is a hybrid representation for Boolean functions that
separates variables that are definitely true (and perhaps
those that are definitely false) and equivalence classes of
equivalent variables (and perhaps classes of mutually exclusive
variables) from the remaining information. The idea is to use the
most efficient representation for each class of information.
Implementation
has shown this representation to be significantly more
efficient than the standard ROBDD representation. However,
opportunities remain for further improvement, including
improving the algorithms used and factoring out more
information that can be represented more efficiently.
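A minimal sketch of the idea (a hypothetical structure, not the
published implementation): keep definitely-true variables in a set,
variable equivalences in a union-find, and reserve the expensive
general representation for the residue:

    class GER:
        """Hypothetical GER-style hybrid Boolean function:
        G: set of definitely-true variables,
        E: equivalence classes of variables (union-find),
        R: residual function; here just a placeholder standing in
           for the general (e.g. ROBDD) component."""

        def __init__(self):
            self.true_vars = set()    # G component
            self.parent = {}          # E component: union-find forest
            self.residual = []        # R component: everything else

        def find(self, v):
            self.parent.setdefault(v, v)
            while self.parent[v] != v:
                self.parent[v] = self.parent[self.parent[v]]  # path halving
                v = self.parent[v]
            return v

        def assert_true(self, v):
            # Cheap: a set insertion; no general-function operation.
            self.true_vars.add(self.find(v))

        def assert_equiv(self, u, v):
            # Cheap: a union-find merge.
            ru, rv = self.find(u), self.find(v)
            if ru != rv:
                self.parent[ru] = rv
                if ru in self.true_vars:      # keep G closed under E
                    self.true_vars.discard(ru)
                    self.true_vars.add(rv)

    f = GER()
    f.assert_equiv("X", "Z")
    f.assert_true("X")
    assert f.find("Z") in f.true_vars   # Z is now known definitely true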
Improved ROBDD algorithms
Even using the GER representation, performance of ROBDD
operations is the dominant factor determining analysis
performance. We have shown
that specialized ROBDD operations can significantly improve
groundness analysis performance. More such optimizations can
be made.
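For flavour, here is a hypothetical minimal ROBDD sketch: hash-consed
nodes, a memoized generic conjunction, and one specialized operation
of the kind that can outperform the generic algorithm:

    TRUE, FALSE = "1", "0"           # terminal nodes
    _nodes = {}                      # hash-consing table

    def node(var, lo, hi):
        if lo == hi:                 # redundant test: collapse
            return lo
        return _nodes.setdefault((var, lo, hi), (var, lo, hi))

    def apply_and(f, g, memo=None):
        """Generic memoized apply, specialized to conjunction."""
        memo = {} if memo is None else memo
        if f == FALSE or g == FALSE: return FALSE
        if f == TRUE: return g
        if g == TRUE: return f
        if (f, g) in memo: return memo[(f, g)]
        v = min(f[0], g[0])          # smaller variables nearer the root
        flo, fhi = (f[1], f[2]) if f[0] == v else (f, f)
        glo, ghi = (g[1], g[2]) if g[0] == v else (g, g)
        r = node(v, apply_and(flo, glo, memo), apply_and(fhi, ghi, memo))
        memo[(f, g)] = r
        return r

    def restrict_true(f, var):
        """Specialized operation: f with `var` fixed to true. A single
        downward pass with no pairing of nodes (a real implementation
        would also memoize shared subgraphs), so it is cheaper than
        apply_and(f, <BDD for var>)."""
        if f in (TRUE, FALSE): return f
        v, lo, hi = f
        if v == var: return hi
        if v > var: return f         # var cannot occur below here
        return node(v, restrict_true(lo, var), restrict_true(hi, var))

    # x AND y, then restrict x to true: yields the BDD for y.
    x = node("x", FALSE, TRUE)
    y = node("y", FALSE, TRUE)
    assert restrict_true(apply_and(x, y), "x") == y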
Controlled Impurity in Logic Programming
Practical purely declarative programming languages must
provide an interface to existing code written in imperative
languages. Until it is practical to prove the purity of
foreign code, this creates the possibility that impurity
will leak from the foreign code into the (supposedly)
declarative program.
Of course, it is possible to write pure code using impure facilities (ultimately, all declarative code is implemented that way). However, most pure languages take the attitude that the programmer must construct her pure primitives wholly outside the pure language, using the foreign interface only to connect to pure primitives. Often, though, it would be more convenient to implement parts of the pure facility in the declarative language, using impure primitives through the foreign interface. Generally, the only alternative is to understand the implementation of the declarative language well enough to predict how it will handle impure primitives that are permitted to cross the foreign interface. This is highly undesirable, as the implementation details of the language may change, or the program may need to be ported to another implementation of the same language. If you lie to the compiler, it will eventually have its revenge.
Mercury solves this problem by requiring users to indicate which primitives made available through its foreign interface are pure and which are not. A predicate may invoke impure operations provided it is declared impure, or it is promised to be pure. The Mercury compiler is not permitted to perform certain kinds of optimizations on impure parts of a predicate, assuring the user that, although it uses impure primitives, her predicate will behave as intended.
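A language-neutral analogy (Python, not Mercury): a function can
present a pure interface while being implemented with an impure
primitive, provided the impurity is never observable, which is
exactly what a Mercury purity promise asserts on the programmer's
behalf:

    # An impure primitive: mutable shared state.
    _cache = {}

    def fib(n):
        """Pure interface: the result depends only on the argument,
        even though computing it mutates hidden state."""
        if n in _cache:
            return _cache[n]
        result = n if n < 2 else fib(n - 1) + fib(n - 2)
        _cache[n] = result            # hidden impurity, never observable
        return result

    assert fib(10) == 55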
New Mode System
Mercury uses a
mode system that requires the user to specify each allowable
call pattern of argument instantiations and the
corresponding result pattern. Additionally, the user must
specify the determinism of each mode. A major shortcoming
of this approach is that the user of a predicate will not be
able to use any mode that its author did not anticipate.
Another shortcoming of the current approach is that it
reorders code minimally to meet the declared determinism.
Perhaps a more efficient implementation of the predicate
could be created with more aggressive reordering. And in
any case, the current mode system discourages predicates
with multiple modes.
An alternative approach would be to specify the instantiation dependencies of predicates, instead of what Mercury calls modes. One might declare the mode of append/3 by specifying that the backbone of the third argument is bound iff the backbones of both of the other arguments are bound, and likewise the elements of the lists. This says all that is needed for mode analysis of a call to append/3.
I would argue that this style of mode system could be less verbose than the current system, and at least as natural. One thing this would not provide, though, is a list of the modes for each predicate that the compiler should generate code for. Ideally, code for different modes of a predicate would be generated on demand.
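A hypothetical sketch of how such declarations could drive mode
analysis: represent the dependency for append/3 as the constraint
Z <-> (X & Y), and compute which arguments must be bound after a call
given those bound before it:

    from itertools import product

    def bound_after(dependency, args, bound_before):
        """Variables guaranteed bound after the call, given a
        dependency constraint (a predicate over boundness assignments)
        and the set of variables bound before the call."""
        consistent = []
        for values in product([False, True], repeat=len(args)):
            m = dict(zip(args, values))
            if dependency(m) and all(m[v] for v in bound_before):
                consistent.append(m)
        return {v for v in args if all(m[v] for m in consistent)}

    # append/3: the third argument is bound iff the first two are.
    dep = lambda a: a["Z"] == (a["X"] and a["Y"])

    # One declaration covers the modes the author never anticipated.
    assert bound_after(dep, ("X", "Y", "Z"), {"X", "Y"}) == {"X", "Y", "Z"}
    assert bound_after(dep, ("X", "Y", "Z"), {"Z"}) == {"X", "Y", "Z"}
    assert bound_after(dep, ("X", "Y", "Z"), {"X"}) == {"X"}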
Sequence Quantification
Sequence quantification is a refinement of bounded
quantification, which has been well studied. The
improvement over bounded quantification lies in giving users
the ability to define new ranges to quantify over. By
carefully constructing the mechanism for users to define new
sequences, we are able to provide much of the power of
looping constructs in conventional languages such as C or
Pascal.
Program Specialization
Sequence Quantification can be
implemented with a few very simple higher order predicates
and a few user-extensible predicates defining the kinds of
sequences that can be quantified over. However, this
implementation gives poor performance. To get high
performance, specialized versions of these predicates need
to be created for each invocation. Doing this sort of
specialization without accidentally slowing the code down in
some cases is rather difficult. Doing it with acceptable
(or even just finite) compilation times can be tricky.
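A hypothetical Python rendering of the two implementation strategies:
a generic quantifier built from higher-order, user-extensible
sequence definitions, and the specialized loop a compiler would want
to generate for one particular invocation:

    # A sequence is an initial state plus a step function producing
    # successive elements; forall quantifies a test over it.
    def int_range(lo, hi):
        """User-defined sequence: the integers lo..hi inclusive."""
        def next_elem(state):
            return (state, state + 1) if state <= hi else None
        return lo, next_elem

    def forall(seq, test):
        state, next_elem = seq
        while (step := next_elem(state)) is not None:
            elem, state = step
            if not test(elem):
                return False
        return True

    # Generic call: every indirection (closures, tuple packing) costs.
    assert forall(int_range(1, 10), lambda i: i * i >= i)

    # What a specializer would aim to produce for this one invocation:
    # the sequence definition and the test inlined into a plain loop.
    def forall_specialized():
        for i in range(1, 11):
            if not (i * i >= i):
                return False
        return True

    assert forall_specialized()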
A declarative approach to programming, and to software design, has much to contribute to software engineering. However, the software engineering community appears uninterested in declarative languages. Conversely, the declarative language community is largely uninterested in software engineering.
Declarative Software Design Methodology
A software design methodology oriented towards declarative
implementation is needed. Ideally it would build on an
existing methodology, preferably based on UML due to its
popularity. Probably many of the existing UML diagrams
would be equally applicable to declarative development.
Even class diagrams would be applicable, though inheritance
may be less desirable.
Some of the freely-distributable software I have written is available for download.
I am available for consulting jobs. See my curriculum vitae for my work experience.