Requirements for the Parser Framework Library

Please note: this whole section of the website is a first, rough draft, so please expect it to be rough :-)

Oh, and feedback would be greatly appreciated, from anyone who cares to give it :-D

Contact

If you wish to contact me about the D&K Parser Framework Library, or about this site, you can reach me at sgbest@users.sourceforge.net.

Credits

This site is hosted by SourceForge.net.

Introduction

This part of the website contains information on the requirements for the Dragon and Kangaroo Parser Framework Library (D&K PFL). These are not requirements in the sense of what the PFL will require, but the requirements that it's expected to meet. This information may be of interest to those wishing to learn more about this project, and more about the PFL generally.

As this project is still in its early stages, these requirements are likely to change. In particular, they are likely to become more detailed and refined as time passes.

What the D&K PFL is For

In order to determine what the requirements for a library are, it's usually a good idea to consider what that library will be for. The purpose will primarily determine the requirements.

The D&K PFL is for use by parser developers, particularly in the development of parser libraries and tools. It is also for use, though more indirectly, by developers who need to make use of parsers which use this framework. It is not intended to provide a quick and easy way to define parsers, but is instead designed to be an open, extensible, versatile framework for parser development generally.

This framework is primarily aimed at the compile-time generation of efficient parsers. Usually, parsers are needed for grammars which are known prior to any actual parser code being written, and which will not change dynamically at runtime. Sometimes, runtime parser generation is required, but this is not the main concern of this library. This library is for the usual situation when grammars are static.

The main use of this framework is expected to be in the development of more specific, though perhaps still general, parser libraries. For example, an XML library could be written using this framework, and then used as just an XML library. That, in turn, could then be used as the basis for an HTML library, perhaps making further use of this framework. In the end, such libraries would be used by other developers without having to even know about the D&K PFL.

Parser development tools along the lines of YACC could also be used with this framework. A parser generator could translate a grammar description, in whatever grammar description language it is written, into C++ code that itself uses this library.

It is hoped that the D&K PFL will be useful for many projects. Indeed, the parsing of data for the purposes of further processing is a very common requirement!

General Requirements

The general requirements for the PFL are set out in the sections below.

There may be other general requirements in addition to these.

Organisational Requirements

The library should, of course, be properly organised. It should have a good, clear structure, which should be reflected in the organisation and contents of its headers.

Firstly, grammars and parsers should be distinct. A grammar may need to be used by several different parsers for several different purposes. For example, one application may need a document parsed into an abstract syntax tree (AST), while another may need to parse documents with the same grammar but without needing ASTs. There may be different error handling requirements, and different kinds of parser actions may be required. It would seem, then, that grammars and parsers should not be unified in this library.
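
To illustrate the kind of separation meant here, the following is a minimal sketch only, and not the PFL's actual design. All of the names in it (ListGrammar, walk, recognise, parse_to_items) are hypothetical. It shows one grammar description being used by two different parsers: one that merely checks the input, and one that collects the parsed items as a stand-in for building an AST.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical grammar description: a non-empty list of unsigned integers
// separated by a delimiter character. It contains no parsing logic.
struct ListGrammar {
    char delimiter;
};

// Shared walk over the input according to the grammar.
// on_item is called with the text of each item that is matched.
template <typename OnItem>
bool walk(const ListGrammar& g, const std::string& in, OnItem on_item) {
    std::size_t i = 0;
    for (;;) {
        const std::size_t start = i;
        while (i < in.size() && std::isdigit(static_cast<unsigned char>(in[i]))) ++i;
        if (i == start) return false;            // an item needs at least one digit
        on_item(in.substr(start, i - start));
        if (i == in.size()) return true;         // clean end of input
        if (in[i] != g.delimiter) return false;  // unexpected character
        ++i;                                     // consume the delimiter
    }
}

// Parser 1: only reports whether the input conforms to the grammar.
bool recognise(const ListGrammar& g, const std::string& in) {
    return walk(g, in, [](const std::string&) {});
}

// Parser 2: same grammar, but collects the items (a stand-in for an AST).
bool parse_to_items(const ListGrammar& g, const std::string& in,
                    std::vector<std::string>& items) {
    items.clear();
    return walk(g, in, [&items](const std::string& s) { items.push_back(s); });
}
```

The grammar object knows nothing about what either parser does with a match, which is the point of keeping the two distinct.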

Secondly, users of the framework should not need to write the actual parsers for grammars. Parsers should be fundamentally defined by grammars. The focus of parser development should be on describing grammars in a form that the framework uses. Parsers, then, should usually be automatically generated for grammars by the framework itself.
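
One way this might look in C++ is sketched below, purely as an assumption about how such a framework could work. The names Lit, Seq, Alt, Parser and accepts are hypothetical and are not part of the PFL. The user only writes a grammar as a type, and the compiler selects the matching parser code from template specialisations.

```cpp
#include <cstddef>
#include <string>

// Hypothetical grammar vocabulary: types only, no parsing code.
template <char C> struct Lit {};                    // a literal character
template <typename A, typename B> struct Seq {};    // A followed by B
template <typename A, typename B> struct Alt {};    // A, or else B

// Framework side: a parser is chosen for each grammar type at compile time.
template <typename G> struct Parser;

template <char C>
struct Parser<Lit<C>> {
    static bool match(const std::string& in, std::size_t& pos) {
        if (pos < in.size() && in[pos] == C) { ++pos; return true; }
        return false;
    }
};

template <typename A, typename B>
struct Parser<Seq<A, B>> {
    static bool match(const std::string& in, std::size_t& pos) {
        const std::size_t saved = pos;
        if (Parser<A>::match(in, pos) && Parser<B>::match(in, pos)) return true;
        pos = saved;  // backtrack on failure
        return false;
    }
};

template <typename A, typename B>
struct Parser<Alt<A, B>> {
    static bool match(const std::string& in, std::size_t& pos) {
        // A failed match leaves pos unchanged, so B starts from the same place.
        return Parser<A>::match(in, pos) || Parser<B>::match(in, pos);
    }
};

// Accepts only if the whole input matches grammar G.
template <typename G>
bool accepts(const std::string& in) {
    std::size_t pos = 0;
    return Parser<G>::match(in, pos) && pos == in.size();
}

// A grammar written by a user: 'a' followed by either 'b' or 'c'.
using AbOrAc = Seq<Lit<'a'>, Alt<Lit<'b'>, Lit<'c'>>>;
```

Here accepts<AbOrAc>("ab") involves no hand-written parser for AbOrAc at all. A real framework would need far more than this, of course, but the division of labour is the same.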

Similarly, users of parsers developed with the framework shouldn't need to worry about the details of the grammars. The parsers should be readily available and easy to use. The focus should, of course, be on the use of parsers, and not on how those parsers are implemented.

There should also be a distinction between grammars and lexicons, between syntactic constructs and lexical tokens. This should be reflected by a distinction between parsers and tokenisers.

As the C++ standard library's input streams already act as tokenisers, it makes sense to use them as such. More generally, it seems like a good idea to model tokenisers on input streams. The framework, then, should use streams and stream-like objects as tokenisers.
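
As a small reminder of what the standard streams already do, the sketch below uses only the standard library. The function names tokenise_stream and tokenise_text are made up for this example. Formatted extraction with operator>> skips leading whitespace and reads one whitespace-delimited token at a time, which is tokenising in its simplest form.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads whitespace-separated tokens from any input stream.
std::vector<std::string> tokenise_stream(std::istream& in) {
    std::vector<std::string> tokens;
    std::string token;
    while (in >> token) {  // operator>> skips whitespace, then reads one token
        tokens.push_back(token);
    }
    return tokens;
}

// Convenience wrapper (hypothetical) for tokenising text held in a string.
std::vector<std::string> tokenise_text(const std::string& text) {
    std::istringstream in(text);
    return tokenise_stream(in);
}
```

A tokeniser for a real lexicon would need to recognise its own token classes rather than just whitespace-delimited words, but it could present the same extraction-style interface.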

Grammars may be more complex than purely context-free grammars. They may have grammatical semantics, such as identifiers having to be declared before being used. The framework needs to support such grammatical semantics, and to keep them distinct from the purely syntactic aspects of grammars. They form a kind of higher level, and should be presented as such.
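
The declare-before-use example can be sketched as follows. This is an illustration under assumptions: the toy language of "decl NAME" and "use NAME" statements, and the names Kind, Statement, parse_statements and declared_before_use, are all invented here. The syntactic step and the grammatical-semantics step are deliberately kept as two separate functions.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical toy language: a sequence of "decl NAME" and "use NAME".
enum class Kind { Declare, Use };

struct Statement {
    Kind kind;
    std::string name;
};

// Syntactic level: checks the shape of the input and nothing more.
bool parse_statements(const std::string& program, std::vector<Statement>& out) {
    std::istringstream in(program);
    std::string keyword, name;
    out.clear();
    while (in >> keyword) {
        if (!(in >> name)) return false;  // keyword without a name
        if (keyword == "decl") {
            out.push_back(Statement{Kind::Declare, name});
        } else if (keyword == "use") {
            out.push_back(Statement{Kind::Use, name});
        } else {
            return false;                 // unknown keyword
        }
    }
    return true;
}

// Grammatical-semantics level: every name must be declared before it is used.
bool declared_before_use(const std::vector<Statement>& program) {
    std::set<std::string> declared;
    for (const Statement& s : program) {
        if (s.kind == Kind::Declare) {
            declared.insert(s.name);
        } else if (declared.count(s.name) == 0) {
            return false;
        }
    }
    return true;
}
```

The input "use y decl y" is syntactically fine, so parse_statements succeeds, but it fails the higher-level check in declared_before_use.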

As parsers may be required to do various different things with whatever they are parsing, there are also semantics associated with parsing to consider. These semantics should be distinct from the grammatical semantics of grammars. This, again, is a kind of higher level, and should be reflected as such in the design of the parsers.

Syntax and semantic errors that occur during parsing may also need to be dealt with by parsers. The framework should support this, providing appropriate facilities. This should be kept distinct from the usual parser semantics, which apply when parsing is successful.
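
A rough sketch of how error handling might be kept apart from the usual parser actions is given below. The Hooks structure and the parse_sum function are hypothetical names for this example only, and the tiny grammar (single digits separated by '+') is just an assumption chosen to keep it short.

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical hooks: one for successful matches, a separate one for errors.
struct Hooks {
    std::function<void(int)> on_number;                             // parser action
    std::function<void(std::size_t, const std::string&)> on_error;  // error handler
};

// Parses single digits separated by '+', for example "1+2+3".
// Digits are expected at even positions and '+' at odd positions.
bool parse_sum(const std::string& in, const Hooks& hooks) {
    for (std::size_t i = 0; i < in.size(); ++i) {
        const bool expect_digit = (i % 2 == 0);
        const char c = in[i];
        if (expect_digit && std::isdigit(static_cast<unsigned char>(c))) {
            hooks.on_number(c - '0');  // success path only
        } else if (!expect_digit && c == '+') {
            continue;
        } else {
            hooks.on_error(i, expect_digit ? "expected a digit" : "expected '+'");
            return false;
        }
    }
    if (in.size() % 2 == 0) {  // empty input, or input ending in '+'
        hooks.on_error(in.size(), "expected a digit");
        return false;
    }
    return true;
}
```

The same parse_sum could be given an action that sums the digits and an error handler that logs, or an error handler that throws, without either hook knowing about the other.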

Generally, each aspect of grammar and parser development should be kept distinct, even though they will inevitably be combined and used in various different ways together. Such clarity is, of course, important for maintainability, but is also of fundamental importance for extensibility and flexibility.

Documentary Requirements

Libraries aren't much good if they're not properly documented! Documentation is not only needed for use, but also for maintainability. It needs to be clear, unambiguous, well organised, properly presented, and so on. Documentation for users needs to be appropriate for users, and documentation for maintainers needs to be appropriate for maintenance and future development. And, of course, such documentation needs to be complete!

For users, there will need to be reference documentation at the very least. A tutorial guide would also be useful. The reference documentation would also be useful for maintainers and those pursuing further development of the framework, but would probably not be sufficient for them. Detailed documentation of the actual design and implementation of the library will also be needed.

The source code, especially the headers, should be properly formatted and commented. Of course, the source code itself will form much of its own documentation, but conservative use of appropriate comments would help significantly.

In general, documentation for this framework should be written in a suitably generic form, so that it can be converted into various forms for distribution, storage and accessibility. The documentation itself also needs to be maintainable, along with the library, so that it can be kept up to date.