First commit

This commit is contained in:
Theodotos Andreou 2018-01-14 13:10:16 +00:00
commit c6e2478c40
13918 changed files with 2303184 additions and 0 deletions

View file

@ -0,0 +1,80 @@
Introduction
============
This project is a PHP 5.2 to PHP 7.1 parser **written in PHP itself**.
What is this for?
-----------------
A parser is useful for [static analysis][0], manipulation of code and basically any other
application dealing with code programmatically. A parser constructs an [Abstract Syntax Tree][1]
(AST) of the code and thus allows dealing with it in an abstract and robust way.
There are other ways of processing source code. One that PHP supports natively is using the
token stream generated by [`token_get_all`][2]. The token stream is much more low level than
the AST and thus has different applications: It allows to also analyze the exact formatting of
a file. On the other hand the token stream is much harder to deal with for more complex analysis.
For example an AST abstracts away the fact that in PHP variables can be written as `$foo`, but also
as `$$bar`, `${'foobar'}` or even `${!${''}=barfoo()}`. You don't have to worry about recognizing
all the different syntaxes from a stream of tokens.
Another question is: Why would I want to have a PHP parser *written in PHP*? Well, PHP might not be
a language especially suited for fast parsing, but processing the AST is much easier in PHP than it
would be in other, faster languages like C. Furthermore the people most probably wanting to do
programmatic PHP code analysis are incidentally PHP developers, not C developers.
What can it parse?
------------------
The parser supports parsing PHP 5.2-5.6 and PHP 7.
As the parser is based on the tokens returned by `token_get_all` (which is only able to lex the PHP
version it runs on), additionally a wrapper for emulating tokens from newer versions is provided.
This allows to parse PHP 7.1 source code running on PHP 5.5, for example. This emulation is somewhat
hacky and not perfect, but it should work well on any sane code.
What output does it produce?
----------------------------
The parser produces an [Abstract Syntax Tree][1] (AST) also known as a node tree. How this looks like
can best be seen in an example. The program `<?php echo 'Hi', 'World';` will give you a node tree
roughly looking like this:
```
array(
0: Stmt_Echo(
exprs: array(
0: Scalar_String(
value: Hi
)
1: Scalar_String(
value: World
)
)
)
)
```
This matches the structure of the code: An echo statement, which takes two strings as expressions,
with the values `Hi` and `World!`.
You can also see that the AST does not contain any whitespace information (but most comments are saved).
So using it for formatting analysis is not possible.
What else can it do?
--------------------
Apart from the parser itself this package also bundles support for some other, related features:
* Support for pretty printing, which is the act of converting an AST into PHP code. Please note
that "pretty printing" does not imply that the output is especially pretty. It's just how it's
called ;)
* Support for serializing and unserializing the node tree to XML
* Support for dumping the node tree in a human readable form (see the section above for an
example of how the output looks like)
* Infrastructure for traversing and changing the AST (node traverser and node visitors)
* A node visitor for resolving namespaced names
[0]: http://en.wikipedia.org/wiki/Static_program_analysis
[1]: http://en.wikipedia.org/wiki/Abstract_syntax_tree
[2]: http://php.net/token_get_all

View file

@ -0,0 +1,438 @@
Usage of basic components
=========================
This document explains how to use the parser, the pretty printer and the node traverser.
Bootstrapping
-------------
To bootstrap the library, include the autoloader generated by composer:
```php
require 'path/to/vendor/autoload.php';
```
Additionally you may want to set the `xdebug.max_nesting_level` ini option to a higher value:
```php
ini_set('xdebug.max_nesting_level', 3000);
```
This ensures that there will be no errors when traversing highly nested node trees. However, it is
preferable to disable XDebug completely, as it can easily make this library more than five times
slower.
Parsing
-------
In order to parse code, you first have to create a parser instance:
```php
use PhpParser\ParserFactory;
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
```
The factory accepts a kind argument, that determines how different PHP versions are treated:
Kind | Behavior
-----|---------
`ParserFactory::PREFER_PHP7` | Try to parse code as PHP 7. If this fails, try to parse it as PHP 5.
`ParserFactory::PREFER_PHP5` | Try to parse code as PHP 5. If this fails, try to parse it as PHP 7.
`ParserFactory::ONLY_PHP7` | Parse code as PHP 7.
`ParserFactory::ONLY_PHP5` | Parse code as PHP 5.
Unless you have strong reason to use something else, `PREFER_PHP7` is a reasonable default.
The `create()` method optionally accepts a `Lexer` instance as the second argument. Some use cases
that require customized lexers are discussed in the [lexer documentation](component/Lexer.markdown).
Subsequently you can pass PHP code (including the opening `<?php` tag) to the `parse` method in order to
create a syntax tree. If a syntax error is encountered, an `PhpParser\Error` exception will be thrown:
```php
use PhpParser\Error;
use PhpParser\ParserFactory;
$code = '<?php // some code';
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
try {
$stmts = $parser->parse($code);
// $stmts is an array of statement nodes
} catch (Error $e) {
echo 'Parse Error: ', $e->getMessage();
}
```
A parser instance can be reused to parse multiple files.
Node tree
---------
If you use the above code with `$code = "<?php echo 'Hi ', hi\\getTarget();"` the parser will
generate a node tree looking like this:
```
array(
0: Stmt_Echo(
exprs: array(
0: Scalar_String(
value: Hi
)
1: Expr_FuncCall(
name: Name(
parts: array(
0: hi
1: getTarget
)
)
args: array(
)
)
)
)
)
```
Thus `$stmts` will contain an array with only one node, with this node being an instance of
`PhpParser\Node\Stmt\Echo_`.
As PHP is a large language there are approximately 140 different nodes. In order to make work
with them easier they are grouped into three categories:
* `PhpParser\Node\Stmt`s are statement nodes, i.e. language constructs that do not return
a value and can not occur in an expression. For example a class definition is a statement.
It doesn't return a value and you can't write something like `func(class A {});`.
* `PhpParser\Node\Expr`s are expression nodes, i.e. language constructs that return a value
and thus can occur in other expressions. Examples of expressions are `$var`
(`PhpParser\Node\Expr\Variable`) and `func()` (`PhpParser\Node\Expr\FuncCall`).
* `PhpParser\Node\Scalar`s are nodes representing scalar values, like `'string'`
(`PhpParser\Node\Scalar\String_`), `0` (`PhpParser\Node\Scalar\LNumber`) or magic constants
like `__FILE__` (`PhpParser\Node\Scalar\MagicConst\File`). All `PhpParser\Node\Scalar`s extend
`PhpParser\Node\Expr`, as scalars are expressions, too.
* There are some nodes not in either of these groups, for example names (`PhpParser\Node\Name`)
and call arguments (`PhpParser\Node\Arg`).
Some node class names have a trailing `_`. This is used whenever the class name would otherwise clash
with a PHP keyword.
Every node has a (possibly zero) number of subnodes. You can access subnodes by writing
`$node->subNodeName`. The `Stmt\Echo_` node has only one subnode `exprs`. So in order to access it
in the above example you would write `$stmts[0]->exprs`. If you wanted to access the name of the function
call, you would write `$stmts[0]->exprs[1]->name`.
All nodes also define a `getType()` method that returns the node type. The type is the class name
without the `PhpParser\Node\` prefix and `\` replaced with `_`. It also does not contain a trailing
`_` for reserved-keyword class names.
It is possible to associate custom metadata with a node using the `setAttribute()` method. This data
can then be retrieved using `hasAttribute()`, `getAttribute()` and `getAttributes()`.
By default the lexer adds the `startLine`, `endLine` and `comments` attributes. `comments` is an array
of `PhpParser\Comment[\Doc]` instances.
The start line can also be accessed using `getLine()`/`setLine()` (instead of `getAttribute('startLine')`).
The last doc comment from the `comments` attribute can be obtained using `getDocComment()`.
Pretty printer
--------------
The pretty printer component compiles the AST back to PHP code. As the parser does not retain formatting
information the formatting is done using a specified scheme. Currently there is only one scheme available,
namely `PhpParser\PrettyPrinter\Standard`.
```php
use PhpParser\Error;
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;
$code = "<?php echo 'Hi ', hi\\getTarget();";
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$prettyPrinter = new PrettyPrinter\Standard;
try {
// parse
$stmts = $parser->parse($code);
// change
$stmts[0] // the echo statement
->exprs // sub expressions
[0] // the first of them (the string node)
->value // it's value, i.e. 'Hi '
= 'Hello '; // change to 'Hello '
// pretty print
$code = $prettyPrinter->prettyPrint($stmts);
echo $code;
} catch (Error $e) {
echo 'Parse Error: ', $e->getMessage();
}
```
The above code will output:
<?php echo 'Hello ', hi\getTarget();
As you can see the source code was first parsed using `PhpParser\Parser->parse()`, then changed and then
again converted to code using `PhpParser\PrettyPrinter\Standard->prettyPrint()`.
The `prettyPrint()` method pretty prints a statements array. It is also possible to pretty print only a
single expression using `prettyPrintExpr()`.
The `prettyPrintFile()` method can be used to print an entire file. This will include the opening `<?php` tag
and handle inline HTML as the first/last statement more gracefully.
Node traversation
-----------------
The above pretty printing example used the fact that the source code was known and thus it was easy to
write code that accesses a certain part of a node tree and changes it. Normally this is not the case.
Usually you want to change / analyze code in a generic way, where you don't know how the node tree is
going to look like.
For this purpose the parser provides a component for traversing and visiting the node tree. The basic
structure of a program using this `PhpParser\NodeTraverser` looks like this:
```php
use PhpParser\NodeTraverser;
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$traverser = new NodeTraverser;
$prettyPrinter = new PrettyPrinter\Standard;
// add your visitor
$traverser->addVisitor(new MyNodeVisitor);
try {
$code = file_get_contents($fileName);
// parse
$stmts = $parser->parse($code);
// traverse
$stmts = $traverser->traverse($stmts);
// pretty print
$code = $prettyPrinter->prettyPrintFile($stmts);
echo $code;
} catch (PhpParser\Error $e) {
echo 'Parse Error: ', $e->getMessage();
}
```
The corresponding node visitor might look like this:
```php
use PhpParser\Node;
use PhpParser\NodeVisitorAbstract;
class MyNodeVisitor extends NodeVisitorAbstract
{
public function leaveNode(Node $node) {
if ($node instanceof Node\Scalar\String_) {
$node->value = 'foo';
}
}
}
```
The above node visitor would change all string literals in the program to `'foo'`.
All visitors must implement the `PhpParser\NodeVisitor` interface, which defines the following four
methods:
```php
public function beforeTraverse(array $nodes);
public function enterNode(\PhpParser\Node $node);
public function leaveNode(\PhpParser\Node $node);
public function afterTraverse(array $nodes);
```
The `beforeTraverse()` method is called once before the traversal begins and is passed the nodes the
traverser was called with. This method can be used for resetting values before traversation or
preparing the tree for traversal.
The `afterTraverse()` method is similar to the `beforeTraverse()` method, with the only difference that
it is called once after the traversal.
The `enterNode()` and `leaveNode()` methods are called on every node, the former when it is entered,
i.e. before its subnodes are traversed, the latter when it is left.
All four methods can either return the changed node or not return at all (i.e. `null`) in which
case the current node is not changed.
The `enterNode()` method can additionally return the value `NodeTraverser::DONT_TRAVERSE_CHILDREN`,
which instructs the traverser to skip all children of the current node.
The `leaveNode()` method can additionally return the value `NodeTraverser::REMOVE_NODE`, in which
case the current node will be removed from the parent array. Furthermore it is possible to return
an array of nodes, which will be merged into the parent array at the offset of the current node.
I.e. if in `array(A, B, C)` the node `B` should be replaced with `array(X, Y, Z)` the result will
be `array(A, X, Y, Z, C)`.
Instead of manually implementing the `NodeVisitor` interface you can also extend the `NodeVisitorAbstract`
class, which will define empty default implementations for all the above methods.
The NameResolver node visitor
-----------------------------
One visitor is already bundled with the package: `PhpParser\NodeVisitor\NameResolver`. This visitor
helps you work with namespaced code by trying to resolve most names to fully qualified ones.
For example, consider the following code:
use A as B;
new B\C();
In order to know that `B\C` really is `A\C` you would need to track aliases and namespaces yourself.
The `NameResolver` takes care of that and resolves names as far as possible.
After running it most names will be fully qualified. The only names that will stay unqualified are
unqualified function and constant names. These are resolved at runtime and thus the visitor can't
know which function they are referring to. In most cases this is a non-issue as the global functions
are meant.
Also the `NameResolver` adds a `namespacedName` subnode to class, function and constant declarations
that contains the namespaced name instead of only the shortname that is available via `name`.
Example: Converting namespaced code to pseudo namespaces
--------------------------------------------------------
A small example to understand the concept: We want to convert namespaced code to pseudo namespaces
so it works on 5.2, i.e. names like `A\\B` should be converted to `A_B`. Note that such conversions
are fairly complicated if you take PHP's dynamic features into account, so our conversion will
assume that no dynamic features are used.
We start off with the following base code:
```php
use PhpParser\ParserFactory;
use PhpParser\PrettyPrinter;
use PhpParser\NodeTraverser;
use PhpParser\NodeVisitor\NameResolver;
$inDir = '/some/path';
$outDir = '/some/other/path';
$parser = (new ParserFactory)->create(ParserFactory::PREFER_PHP7);
$traverser = new NodeTraverser;
$prettyPrinter = new PrettyPrinter\Standard;
$traverser->addVisitor(new NameResolver); // we will need resolved names
$traverser->addVisitor(new NamespaceConverter); // our own node visitor
// iterate over all .php files in the directory
$files = new \RecursiveIteratorIterator(new \RecursiveDirectoryIterator($inDir));
$files = new \RegexIterator($files, '/\.php$/');
foreach ($files as $file) {
try {
// read the file that should be converted
$code = file_get_contents($file);
// parse
$stmts = $parser->parse($code);
// traverse
$stmts = $traverser->traverse($stmts);
// pretty print
$code = $prettyPrinter->prettyPrintFile($stmts);
// write the converted file to the target directory
file_put_contents(
substr_replace($file->getPathname(), $outDir, 0, strlen($inDir)),
$code
);
} catch (PhpParser\Error $e) {
echo 'Parse Error: ', $e->getMessage();
}
}
```
Now lets start with the main code, the `NodeVisitor\NamespaceConverter`. One thing it needs to do
is convert `A\\B` style names to `A_B` style ones.
```php
use PhpParser\Node;
class NamespaceConverter extends \PhpParser\NodeVisitorAbstract
{
public function leaveNode(Node $node) {
if ($node instanceof Node\Name) {
return new Node\Name($node->toString('_'));
}
}
}
```
The above code profits from the fact that the `NameResolver` already resolved all names as far as
possible, so we don't need to do that. We only need to create a string with the name parts separated
by underscores instead of backslashes. This is what `$node->toString('_')` does. (If you want to
create a name with backslashes either write `$node->toString()` or `(string) $node`.) Then we create
a new name from the string and return it. Returning a new node replaces the old node.
Another thing we need to do is change the class/function/const declarations. Currently they contain
only the shortname (i.e. the last part of the name), but they need to contain the complete name including
the namespace prefix:
```php
use PhpParser\Node;
use PhpParser\Node\Stmt;
class NodeVisitor_NamespaceConverter extends \PhpParser\NodeVisitorAbstract
{
public function leaveNode(Node $node) {
if ($node instanceof Node\Name) {
return new Node\Name($node->toString('_'));
} elseif ($node instanceof Stmt\Class_
|| $node instanceof Stmt\Interface_
|| $node instanceof Stmt\Function_) {
$node->name = $node->namespacedName->toString('_');
} elseif ($node instanceof Stmt\Const_) {
foreach ($node->consts as $const) {
$const->name = $const->namespacedName->toString('_');
}
}
}
}
```
There is not much more to it than converting the namespaced name to string with `_` as separator.
The last thing we need to do is remove the `namespace` and `use` statements:
```php
use PhpParser\Node;
use PhpParser\Node\Stmt;
class NodeVisitor_NamespaceConverter extends \PhpParser\NodeVisitorAbstract
{
public function leaveNode(Node $node) {
if ($node instanceof Node\Name) {
return new Node\Name($node->toString('_'));
} elseif ($node instanceof Stmt\Class_
|| $node instanceof Stmt\Interface_
|| $node instanceof Stmt\Function_) {
$node->name = $node->namespacedName->toString('_');
} elseif ($node instanceof Stmt\Const_) {
foreach ($node->consts as $const) {
$const->name = $const->namespacedName->toString('_');
}
} elseif ($node instanceof Stmt\Namespace_) {
// returning an array merges is into the parent array
return $node->stmts;
} elseif ($node instanceof Stmt\Use_) {
// returning false removed the node altogether
return false;
}
}
}
```
That's all.

View file

@ -0,0 +1,330 @@
Other node tree representations
===============================
It is possible to convert the AST into several textual representations, which serve different uses.
Simple serialization
--------------------
It is possible to serialize the node tree using `serialize()` and also unserialize it using
`unserialize()`. The output is not human readable and not easily processable from anything
but PHP, but it is compact and generates quickly. The main application thus is in caching.
Human readable dumping
----------------------
Furthermore it is possible to dump nodes into a human readable format using the `dump` method of
`PhpParser\NodeDumper`. This can be used for debugging.
```php
$code = <<<'CODE'
<?php
function printLine($msg) {
echo $msg, "\n";
}
printLine('Hello World!!!');
CODE;
$parser = (new PhpParser\ParserFactory)->create(PhpParser\ParserFactory::PREFER_PHP7);
$nodeDumper = new PhpParser\NodeDumper;
try {
$stmts = $parser->parse($code);
echo $nodeDumper->dump($stmts), "\n";
} catch (PhpParser\Error $e) {
echo 'Parse Error: ', $e->getMessage();
}
```
The above script will have an output looking roughly like this:
```
array(
0: Stmt_Function(
byRef: false
params: array(
0: Param(
name: msg
default: null
type: null
byRef: false
)
)
stmts: array(
0: Stmt_Echo(
exprs: array(
0: Expr_Variable(
name: msg
)
1: Scalar_String(
value:
)
)
)
)
name: printLine
)
1: Expr_FuncCall(
name: Name(
parts: array(
0: printLine
)
)
args: array(
0: Arg(
value: Scalar_String(
value: Hello World!!!
)
byRef: false
)
)
)
)
```
JSON encoding
-------------
Nodes (and comments) implement the `JsonSerializable` interface. As such, it is possible to JSON
encode the AST directly using `json_encode()`:
```php
$code = <<<'CODE'
<?php
function printLine($msg) {
echo $msg, "\n";
}
printLine('Hello World!!!');
CODE;
$parser = (new PhpParser\ParserFactory)->create(PhpParser\ParserFactory::PREFER_PHP7);
$nodeDumper = new PhpParser\NodeDumper;
try {
$stmts = $parser->parse($code);
echo json_encode($stmts, JSON_PRETTY_PRINT), "\n";
} catch (PhpParser\Error $e) {
echo 'Parse Error: ', $e->getMessage();
}
```
This will result in the following output (which includes attributes):
```json
[
{
"nodeType": "Stmt_Function",
"byRef": false,
"name": "printLine",
"params": [
{
"nodeType": "Param",
"type": null,
"byRef": false,
"variadic": false,
"name": "msg",
"default": null,
"attributes": {
"startLine": 3,
"endLine": 3
}
}
],
"returnType": null,
"stmts": [
{
"nodeType": "Stmt_Echo",
"exprs": [
{
"nodeType": "Expr_Variable",
"name": "msg",
"attributes": {
"startLine": 4,
"endLine": 4
}
},
{
"nodeType": "Scalar_String",
"value": "\n",
"attributes": {
"startLine": 4,
"endLine": 4,
"kind": 2
}
}
],
"attributes": {
"startLine": 4,
"endLine": 4
}
}
],
"attributes": {
"startLine": 3,
"endLine": 5
}
},
{
"nodeType": "Expr_FuncCall",
"name": {
"nodeType": "Name",
"parts": [
"printLine"
],
"attributes": {
"startLine": 7,
"endLine": 7
}
},
"args": [
{
"nodeType": "Arg",
"value": {
"nodeType": "Scalar_String",
"value": "Hello World!!!",
"attributes": {
"startLine": 7,
"endLine": 7,
"kind": 1
}
},
"byRef": false,
"unpack": false,
"attributes": {
"startLine": 7,
"endLine": 7
}
}
],
"attributes": {
"startLine": 7,
"endLine": 7
}
}
]
```
There is currently no mechanism to convert JSON back into a node tree. Furthermore, not all ASTs
can be JSON encoded. In particular, JSON only supports UTF-8 strings.
Serialization to XML
--------------------
It is also possible to serialize the node tree to XML using `PhpParser\Serializer\XML->serialize()`
and to unserialize it using `PhpParser\Unserializer\XML->unserialize()`. This is useful for
interfacing with other languages and applications or for doing transformation using XSLT.
```php
<?php
$code = <<<'CODE'
<?php
function printLine($msg) {
echo $msg, "\n";
}
printLine('Hello World!!!');
CODE;
$parser = (new PhpParser\ParserFactory)->create(PhpParser\ParserFactory::PREFER_PHP7);
$serializer = new PhpParser\Serializer\XML;
try {
$stmts = $parser->parse($code);
echo $serializer->serialize($stmts);
} catch (PhpParser\Error $e) {
echo 'Parse Error: ', $e->getMessage();
}
```
Produces:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<AST xmlns:node="http://nikic.github.com/PHPParser/XML/node" xmlns:subNode="http://nikic.github.com/PHPParser/XML/subNode" xmlns:scalar="http://nikic.github.com/PHPParser/XML/scalar">
<scalar:array>
<node:Stmt_Function line="2">
<subNode:byRef>
<scalar:false/>
</subNode:byRef>
<subNode:params>
<scalar:array>
<node:Param line="2">
<subNode:name>
<scalar:string>msg</scalar:string>
</subNode:name>
<subNode:default>
<scalar:null/>
</subNode:default>
<subNode:type>
<scalar:null/>
</subNode:type>
<subNode:byRef>
<scalar:false/>
</subNode:byRef>
</node:Param>
</scalar:array>
</subNode:params>
<subNode:stmts>
<scalar:array>
<node:Stmt_Echo line="3">
<subNode:exprs>
<scalar:array>
<node:Expr_Variable line="3">
<subNode:name>
<scalar:string>msg</scalar:string>
</subNode:name>
</node:Expr_Variable>
<node:Scalar_String line="3">
<subNode:value>
<scalar:string>
</scalar:string>
</subNode:value>
</node:Scalar_String>
</scalar:array>
</subNode:exprs>
</node:Stmt_Echo>
</scalar:array>
</subNode:stmts>
<subNode:name>
<scalar:string>printLine</scalar:string>
</subNode:name>
</node:Stmt_Function>
<node:Expr_FuncCall line="6">
<subNode:name>
<node:Name line="6">
<subNode:parts>
<scalar:array>
<scalar:string>printLine</scalar:string>
</scalar:array>
</subNode:parts>
</node:Name>
</subNode:name>
<subNode:args>
<scalar:array>
<node:Arg line="6">
<subNode:value>
<node:Scalar_String line="6">
<subNode:value>
<scalar:string>Hello World!!!</scalar:string>
</subNode:value>
</node:Scalar_String>
</subNode:value>
<subNode:byRef>
<scalar:false/>
</subNode:byRef>
</node:Arg>
</scalar:array>
</subNode:args>
</node:Expr_FuncCall>
</scalar:array>
</AST>
```

View file

@ -0,0 +1,84 @@
Code generation
===============
It is also possible to generate code using the parser, by first creating an Abstract Syntax Tree and then using the
pretty printer to convert it to PHP code. To simplify code generation, the project comes with builders which allow
creating node trees using a fluid interface, instead of instantiating all nodes manually. Builders are available for
the following syntactic elements:
* namespaces and use statements
* classes, interfaces and traits
* methods, functions and parameters
* properties
Here is an example:
```php
use PhpParser\BuilderFactory;
use PhpParser\PrettyPrinter;
use PhpParser\Node;
$factory = new BuilderFactory;
$node = $factory->namespace('Name\Space')
->addStmt($factory->use('Some\Other\Thingy')->as('SomeOtherClass'))
->addStmt($factory->class('SomeClass')
->extend('SomeOtherClass')
->implement('A\Few', '\Interfaces')
->makeAbstract() // ->makeFinal()
->addStmt($factory->method('someMethod')
->makePublic()
->makeAbstract() // ->makeFinal()
->setReturnType('bool')
->addParam($factory->param('someParam')->setTypeHint('SomeClass'))
->setDocComment('/**
* This method does something.
*
* @param SomeClass And takes a parameter
*/')
)
->addStmt($factory->method('anotherMethod')
->makeProtected() // ->makePublic() [default], ->makePrivate()
->addParam($factory->param('someParam')->setDefault('test'))
// it is possible to add manually created nodes
->addStmt(new Node\Expr\Print_(new Node\Expr\Variable('someParam')))
)
// properties will be correctly reordered above the methods
->addStmt($factory->property('someProperty')->makeProtected())
->addStmt($factory->property('anotherProperty')->makePrivate()->setDefault(array(1, 2, 3)))
)
->getNode()
;
$stmts = array($node);
$prettyPrinter = new PrettyPrinter\Standard();
echo $prettyPrinter->prettyPrintFile($stmts);
```
This will produce the following output with the standard pretty printer:
```php
<?php
namespace Name\Space;
use Some\Other\Thingy as SomeClass;
abstract class SomeClass extends SomeOtherClass implements A\Few, \Interfaces
{
protected $someProperty;
private $anotherProperty = array(1, 2, 3);
/**
* This method does something.
*
* @param SomeClass And takes a parameter
*/
public abstract function someMethod(SomeClass $someParam) : bool;
protected function anotherMethod($someParam = 'test')
{
print $someParam;
}
}
```

View file

@ -0,0 +1,75 @@
Error handling
==============
Errors during parsing or analysis are represented using the `PhpParser\Error` exception class. In addition to an error
message, an error can also store additional information about the location the error occurred at.
How much location information is available depends on the origin of the error and how many lexer attributes have been
enabled. At a minimum the start line of the error is usually available.
Column information
------------------
In order to receive information about not only the line, but also the column span an error occurred at, the file
position attributes in the lexer need to be enabled:
```php
$lexer = new PhpParser\Lexer(array(
'usedAttributes' => array('comments', 'startLine', 'endLine', 'startFilePos', 'endFilePos'),
));
$parser = (new PhpParser\ParserFactory)->create(PhpParser\ParserFactory::PREFER_PHP7, $lexer);
try {
$stmts = $parser->parse($code);
// ...
} catch (PhpParser\Error $e) {
// ...
}
```
Before using column information its availability needs to be checked with `$e->hasColumnInfo()`, as the precise
location of an error cannot always be determined. The methods for retrieving column information also have to be passed
the source code of the parsed file. An example for printing an error:
```php
if ($e->hasColumnInfo()) {
echo $e->getRawMessage() . ' from ' . $e->getStartLine() . ':' . $e->getStartColumn($code)
. ' to ' . $e->getEndLine() . ':' . $e->getEndColumn($code);
// or:
echo $e->getMessageWithColumnInfo();
} else {
echo $e->getMessage();
}
```
Both line numbers and column numbers are 1-based. EOF errors will be located at the position one past the end of the
file.
Error recovery
--------------
The error behavior of the parser (and other components) is controlled by an `ErrorHandler`. Whenever an error is
encountered, `ErrorHandler::handleError()` is invoked. The default error handling strategy is `ErrorHandler\Throwing`,
which will immediately throw when an error is encountered.
To instead collect all encountered errors into an array, while trying to continue parsing the rest of the source code,
an instance of `ErrorHandler\Collecting` can be passed to the `Parser::parse()` method. A usage example:
```php
$parser = (new PhpParser\ParserFactory)->create(PhpParser\ParserFactory::ONLY_PHP7);
$errorHandler = new PhpParser\ErrorHandler\Collecting;
$stmts = $parser->parse($code, $errorHandler);
if ($errorHandler->hasErrors()) {
foreach ($errorHandler->getErrors() as $error) {
// $error is an ordinary PhpParser\Error
}
}
if (null !== $stmts) {
// $stmts is a best-effort partial AST
}
```
The `NameResolver` visitor also accepts an `ErrorHandler` as a constructor argument.

View file

@ -0,0 +1,152 @@
Lexer component documentation
=============================
The lexer is responsible for providing tokens to the parser. The project comes with two lexers: `PhpParser\Lexer` and
`PhpParser\Lexer\Emulative`. The latter is an extension of the former, which adds the ability to emulate tokens of
newer PHP versions and thus allows parsing of new code on older versions.
This documentation discusses options available for the default lexers and explains how lexers can be extended.
Lexer options
-------------
The two default lexers accept an `$options` array in the constructor. Currently only the `'usedAttributes'` option is
supported, which allows you to specify which attributes will be added to the AST nodes. The attributes can then be
accessed using `$node->getAttribute()`, `$node->setAttribute()`, `$node->hasAttribute()` and `$node->getAttributes()`
methods. A sample options array:
```php
$lexer = new PhpParser\Lexer(array(
'usedAttributes' => array(
'comments', 'startLine', 'endLine'
)
));
```
The attributes used in this example match the default behavior of the lexer. The following attributes are supported:
* `comments`: Array of `PhpParser\Comment` or `PhpParser\Comment\Doc` instances, representing all comments that occurred
between the previous non-discarded token and the current one. Use of this attribute is required for the
`$node->getDocComment()` method to work. The attribute is also needed if you wish the pretty printer to retain
comments present in the original code.
* `startLine`: Line in which the node starts. This attribute is required for the `$node->getLine()` to work. It is also
required if syntax errors should contain line number information.
* `endLine`: Line in which the node ends.
* `startTokenPos`: Offset into the token array of the first token in the node.
* `endTokenPos`: Offset into the token array of the last token in the node.
* `startFilePos`: Offset into the code string of the first character that is part of the node.
* `endFilePos`: Offset into the code string of the last character that is part of the node.
### Using token positions
The token offset information is useful if you wish to examine the exact formatting used for a node. For example the AST
does not distinguish whether a property was declared using `public` or using `var`, but you can retrieve this
information based on the token position:
```php
function isDeclaredUsingVar(array $tokens, PhpParser\Node\Stmt\Property $prop) {
$i = $prop->getAttribute('startTokenPos');
return $tokens[$i][0] === T_VAR;
}
```
In order to make use of this function, you will have to provide the tokens from the lexer to your node visitor using
code similar to the following:
```php
class MyNodeVisitor extends PhpParser\NodeVisitorAbstract {
private $tokens;
public function setTokens(array $tokens) {
$this->tokens = $tokens;
}
public function leaveNode(PhpParser\Node $node) {
if ($node instanceof PhpParser\Node\Stmt\Property) {
var_dump(isDeclaredUsingVar($this->tokens, $node));
}
}
}
$lexer = new PhpParser\Lexer(array(
'usedAttributes' => array(
'comments', 'startLine', 'endLine', 'startTokenPos', 'endTokenPos'
)
));
$parser = (new PhpParser\ParserFactory)->create(PhpParser\ParserFactory::PREFER_PHP7, $lexer);
$visitor = new MyNodeVisitor();
$traverser = new PhpParser\NodeTraverser();
$traverser->addVisitor($visitor);
try {
$stmts = $parser->parse($code);
$visitor->setTokens($lexer->getTokens());
$stmts = $traverser->traverse($stmts);
} catch (PhpParser\Error $e) {
echo 'Parse Error: ', $e->getMessage();
}
```
The same approach can also be used to perform specific modifications in the code, without changing the formatting in
other places (which is the case when using the pretty printer).
Lexer extension
---------------
A lexer has to define the following public interface:
void startLexing(string $code, ErrorHandler $errorHandler = null);
array getTokens();
string handleHaltCompiler();
int getNextToken(string &$value = null, array &$startAttributes = null, array &$endAttributes = null);
The `startLexing()` method is invoked with the source code that is to be lexed (including the opening tag) whenever the
`parse()` method of the parser is called. It can be used to reset state or preprocess the source code or tokens. The
passes `ErrorHandler` should be used to report lexing errors.
The `getTokens()` method returns the current token array, in the usual `token_get_all()` format. This method is not
used by the parser (which uses `getNextToken()`), but is useful in combination with the token position attributes.
The `handleHaltCompiler()` method is called whenever a `T_HALT_COMPILER` token is encountered. It has to return the
remaining string after the construct (not including `();`).
The `getNextToken()` method returns the ID of the next token (as defined by the `Parser::T_*` constants). If no more
tokens are available it must return `0`, which is the ID of the `EOF` token. Furthermore the string content of the
token should be written into the by-reference `$value` parameter (which will then be available as `$n` in the parser).
### Attribute handling
The other two by-ref variables `$startAttributes` and `$endAttributes` define which attributes will eventually be
assigned to the generated nodes: The parser will take the `$startAttributes` from the first token which is part of the
node and the `$endAttributes` from the last token that is part of the node.
E.g. if the tokens `T_FUNCTION T_STRING ... '{' ... '}'` constitute a node, then the `$startAttributes` from the
`T_FUNCTION` token will be taken and the `$endAttributes` from the `'}'` token.
An application of custom attributes is storing the exact original formatting of literals: While the parser does retain
some information about the formatting of integers (like decimal vs. hexadecimal) or strings (like used quote type), it
does not preserve the exact original formatting (e.g. leading zeros for integers or escape sequences in strings). This
can be remedied by storing the original value in an attribute:
```php
use PhpParser\Lexer;
use PhpParser\Parser\Tokens;
class KeepOriginalValueLexer extends Lexer // or Lexer\Emulative
{
public function getNextToken(&$value = null, &$startAttributes = null, &$endAttributes = null) {
$tokenId = parent::getNextToken($value, $startAttributes, $endAttributes);
if ($tokenId == Tokens::T_CONSTANT_ENCAPSED_STRING // non-interpolated string
|| $tokenId == Tokens::T_ENCAPSED_AND_WHITESPACE // interpolated string
|| $tokenId == Tokens::T_LNUMBER // integer
|| $tokenId == Tokens::T_DNUMBER // floating point number
) {
// could also use $startAttributes, doesn't really matter here
$endAttributes['originalValue'] = $value;
}
return $tokenId;
}
}
```