Archive for June, 2008

Back references

Back references let you reuse a previously captured sub-expression within the regular expression itself. They are useful in situations such as matching HTML tags, where you want to match the closing tag but the name of the opening tag is not known in advance.

The syntax for a back reference is a backslash followed by the number of the capturing sub-expression: `\1`, `\2` and so on. The maximum number of back references allowed is 99.


<?php
$pattern = '!<(.*?)>.*?</\1>!';
$string = 'some text <tag> text </tag> some text';
preg_match($pattern, $string, $matches);
?>
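
To see what the back reference actually captures, here is a minimal sketch that runs the same pattern against two strings and inspects the result. The second string and the variable names `$matched`, `$mismatched` and `$unused` are assumptions added for illustration; they are not part of the original example.

```php
<?php
// Pattern from the example above: group 1 captures the tag name,
// and \1 requires the closing tag to repeat that exact name.
$pattern = '!<(.*?)>.*?</\1>!';

// Matching pair: <tag> ... </tag>
$matched = preg_match($pattern, 'some text <tag> text </tag> some text', $matches);
echo $matches[0], "\n"; // <tag> text </tag>
echo $matches[1], "\n"; // tag

// Hypothetical mismatched pair: <b> is closed by </i>, so \1 can never match.
$mismatched = preg_match($pattern, '<b> text </i>', $unused);
var_dump($mismatched); // int(0)
```

`$matches[0]` holds the whole match and `$matches[1]` holds the text that `\1` had to repeat.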

Back references must refer to capturing sub-expressions; they cannot be used with non-capturing sub-expressions. The following will not work and raises a compilation error

$pattern = '!<(?:.*?)>.*?</\1>!';

because you are referencing a sub-expression which does not exist, as it was never captured. A related case is a back reference to a capturing sub-expression that took no part in the match because another alternative was used, as in the following

$pattern = '!(a|(bc))\2!';

Here `\2` refers to the inner sub-expression `(bc)`. The pattern fails whenever it starts by matching `a`, since the second sub-expression is then unset, but it matches a string such as `bcbc`.
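
The two failure cases can be checked with a short sketch. It refers to the inner `(bc)` sub-expression by its number, `\2`; the test strings and variable names are assumptions chosen for illustration.

```php
<?php
// \2 refers to the inner group (bc), which is only set when the
// second alternative is the one that matched.
$pattern = '!(a|(bc))\2!';

$afterA  = preg_match($pattern, 'aa');   // group 2 is unset, so \2 cannot match
$afterBc = preg_match($pattern, 'bcbc'); // group 2 holds "bc", so \2 matches "bc"

var_dump($afterA);  // int(0)
var_dump($afterBc); // int(1)

// Referencing a group that does not exist at all is a compile-time error:
// preg_match() emits a warning (suppressed here with @) and returns false.
$invalid = @preg_match('!<(?:.*?)>.*?</\1>!', '<tag> text </tag>');
var_dump($invalid); // bool(false)
```

Note the difference between the two: a reference to an unset group simply fails to match, whereas a reference to a non-existent group stops the pattern from compiling.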


Non-capturing Parentheses

Sometimes we add sub-expressions as part of a larger expression but we don't need the match data for those sub-expressions. This is where non-capturing parentheses (also known as grouping-only parentheses) come in useful. Non-capturing parentheses prevent the sub-expression match from being stored in the match array.

The syntax for non-capturing parentheses is: (?:…)


<?php

$string = '<tr><td>table data 1</td><td>table data 2</td></tr>';

$pattern = '/<tr><td>(.*?)<\/td><td>(.*?)<\/td><\/tr>/';

preg_match($pattern, $string, $match);

print_r($match);

?>

The above code snippet will print:

Array
(
[0] => <tr><td>table data 1</td><td>table data 2</td></tr>
[1] => table data 1
[2] => table data 2
)

Assuming that we only need the data in the second <td>, we can make the sub-expression in the first <td> non-capturing as follows.


$pattern = '/<tr><td>(?:.*?)<\/td><td>(.*?)<\/td><\/tr>/';

and this will print:

Array
(
[0] => <tr><td>table data 1</td><td>table data 2</td></tr>
[1] => table data 2
)

The second method is generally said to be more efficient and faster than the first (capturing everything). It should also use less memory, which is logical, since the unused sub-match never has to be stored.
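
Non-capturing parentheses are also handy when the parentheses are only there to group alternatives. The following is a small sketch of that case; the input string and the URL pattern are assumptions made up for illustration.

```php
<?php
// Hypothetical input, for illustration only.
$string = 'Download from ftp://example.org/file.zip today';

// (?:https?|ftp) groups the alternatives without capturing them,
// so the host and path end up at index 1 rather than index 2.
$pattern = '!(?:https?|ftp)://(\S+)!';

preg_match($pattern, $string, $match);

print_r($match);
```

This prints an array with two elements: the full URL at index 0 and `example.org/file.zip` at index 1.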


Named Capture

Named capturing means capturing part of an expression into a named location, i.e. the match array will contain an element whose key is the name specified in the named capture and whose value is the matched text.

The syntax for named capturing is: (?P<name>…)

Look at the following code snippet and it will become clear.


<?php

$string = "<head><title>my title</title></head>";

$pattern = "/<title>(?P<page_title>.*?)<\/title>/";

preg_match($pattern, $string, $match);

echo $match['page_title'];

?>

What are the benefits of named capturing?

  1. It is easier to access the captured data than having to work out the array index, especially in larger, more complex regular expressions which contain many sub-expressions.
  2. You don't have to modify existing code, i.e. if another capturing sub-expression is added before the named capture, you can still access the value of the named capture using the same key. If named capturing were not used, the array index of every match after it would change to $i+1.
  3. The code is easier to read.
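
The second benefit can be sketched as follows. The extra `(head)` sub-expression and the variable names are assumptions added for illustration; the named capture itself is the one from the example above.

```php
<?php
$string = "<head><title>my title</title></head>";

// Original pattern: the title is both $matchBefore[1] and $matchBefore['page_title'].
$before = "/<title>(?P<page_title>.*?)<\/title>/";
preg_match($before, $string, $matchBefore);

// A new capturing sub-expression, (head), is added in front of the named capture.
$after = "/<(head)><title>(?P<page_title>.*?)<\/title>/";
preg_match($after, $string, $matchAfter);

echo $matchBefore[1], "\n";           // my title
echo $matchAfter[1], "\n";            // head (the numeric index has shifted)
echo $matchAfter['page_title'], "\n"; // my title (the named key is unchanged)
```

Code that reads `$match['page_title']` keeps working after the change, while code that reads `$match[1]` would now get `head`.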


Locating the matching brace in vi

vi has excellent syntax highlighting, and it also highlights the matching brace or bracket when you type the corresponding closing one. But sometimes the matching brace is outside the visible part of the file, for example when writing a relatively long method. In vi you can easily jump to the matching brace by placing the cursor over the opening or closing brace (in command mode) and typing `%` (shift+5).


Indenting source code in vi

To enable automatic indenting of source code, add the following line to your .vimrc file:


set autoindent

Indenting the whole file

Ever copied and pasted source code in vi and dreaded the idea of having to go through each line with the tab key? Well, you don't have to, because it can be done for you automatically with a few keystrokes.

Press 'Esc' to go into command mode,

then type `gg` (the g key twice) to place the cursor at the start of the file,

then type `=G` (the equals key followed by an uppercase G). This is where the magic happens.

That's all; you will see the code being indented right before your eyes.

You can also indent only part of the file: place the cursor on the line where you want the indenting to start and type `=G`, which indents from that line to the end of the file.

Indenting a single line

To indent a single line, place the cursor on the line you want to indent and press `CTRL+T`. You must be in insert mode for this to work.

Copy and paste float-right problem

When you copy and paste source code with the autoindent feature enabled, you will find that each line which vi indents drifts further to the right than the previous line.

To fix this, go into command mode and type the following:


:set noautoindent

This will temporarily disable the autoindent feature. You can then copy and paste your code (paste it in insert mode, or else you might notice that the first letter or two are missing) and enable the autoindent feature again by typing


:set autoindent
