LAL: Linear Arrangement Library 21.07.01
A library focused on algorithms on linear arrangements of graphs.
A reader for a single treebank file.
#include <treebank_reader.hpp>
Public Member Functions
treebank_error init (const std::string &file, const std::string &identifier="") noexcept
    Initialises the treebank reader.
bool end () const noexcept
    Returns whether all trees in the file have been consumed (true when there are no more trees to process).
void next_tree () noexcept
    Retrieves the next tree in the file.
size_t get_num_trees () const noexcept
    Returns the number of trees processed so far.
const std::string & get_identifier () const noexcept
    Returns the identifier of the treebank.
const std::string & get_treebank_filename () const noexcept
    Returns the name of the treebank file.
graphs::rooted_tree get_tree () const noexcept
    Returns the current tree.
head_vector get_head_vector () const noexcept
    Returns the current head vector.
bool is_open () const noexcept
    Can the treebank be read?
void set_normalise (bool v) noexcept
    Should trees be normalised?
void set_calculate_size_subtrees (bool v) noexcept
    Should the size of the subtrees be calculated?
void set_calculate_tree_type (bool v) noexcept
    Should the tree be classified into types?
void set_identifier (const std::string &id) noexcept
    Sets this treebank's identifier string.
Private Attributes
std::string m_treebank_identifier = "none"
    Identifier for the treebank.
std::string m_treebank_file = "none"
    Treebank's file name (with the full path).
std::ifstream m_treebank
    Stream used to read the treebank file.
size_t m_num_trees = 0
    Number of trees processed so far.
std::string m_file_line
    Current line.
head_vector m_current_head_vector
    Current head vector.
bool m_normalise_tree = true
    Normalise the current tree.
bool m_calculate_size_subtrees = true
    Calculate the size of the subtrees of the generated rooted tree.
bool m_calculate_tree_type = true
    Classify the generated tree into types.
bool m_no_more_trees = false
    Have all trees in the file been consumed?
A reader for a single treebank file.
This class, whose objects will be referred to as "readers", offers a simple interface for iterating over the trees in a single treebank file, henceforth referred to as the treebank. Each tree is formatted as a list of non-negative integers, one per node of the tree: the number at position i (positions start at 1) is the parent of node i, and the number 0 marks the root. For example, if number 4 is at position 9, then node 9 has parent node 4; likewise, if number 0 is at position 1, then node 1 is the root of the tree. A complete example of such a tree's representation is the following:
0 3 4 1 6 3
which should be interpreted as:
(a) predecessor:       0 3 4 1 6 3
(b) node of the tree:  1 2 3 4 5 6
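The mapping from positions to parents can be sketched in C++ as follows. The helpers parse_head_vector and head_vector_to_edges are hypothetical names introduced only for illustration (they are not part of LAL); the sketch assumes a whitespace-separated line and uses 1-based node labels, as in the interpretation above.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: split one treebank line into its whitespace-separated numbers.
std::vector<std::size_t> parse_head_vector(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::size_t> heads;
    std::size_t h;
    while (in >> h) heads.push_back(h);
    return heads;
}

// Hypothetical helper: turn a head vector into (parent, child) edges.
// Entry i of the vector (0-based) holds the parent of node i + 1;
// the node whose entry is 0 is stored in `root` and produces no edge.
std::vector<std::pair<std::size_t, std::size_t>>
head_vector_to_edges(const std::vector<std::size_t>& heads, std::size_t& root) {
    std::vector<std::pair<std::size_t, std::size_t>> edges;
    root = 0;  // stays 0 if no entry is 0
    for (std::size_t i = 0; i < heads.size(); ++i) {
        const std::size_t node = i + 1;
        if (heads[i] == 0) root = node;
        else edges.emplace_back(heads[i], node);
    }
    return edges;
}
```

For the line 0 3 4 1 6 3, root is set to 1 and the edges are 3 -> 2, 4 -> 3, 1 -> 4, 6 -> 5 and 3 -> 6.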
Note that lines like these are not valid:
(1) 0 2 2 2 2 2
(2) 2 0 0
Line (1) is not valid because of a self-reference in the second position (node 2 is its own parent), and line (2) is not valid because it contains two 0s (i.e., two roots).
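The two rules illustrated above, together with the requirements that every parent be an existing node and that the parent links contain no cycle, can be checked as sketched below. is_valid_head_vector is a hypothetical helper written for illustration; it is not the validation routine LAL runs, and the exact set of checks LAL performs is not stated on this page.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of LAL): returns true if `heads` encodes a
// rooted tree, where heads[i] is the parent of node i + 1 and 0 marks the root.
bool is_valid_head_vector(const std::vector<std::size_t>& heads) {
    const std::size_t n = heads.size();
    if (n == 0) return false;

    std::size_t num_roots = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t h = heads[i];
        if (h > n) return false;        // parent must be an existing node
        if (h == i + 1) return false;   // self-reference, as in line (1)
        if (h == 0) ++num_roots;
    }
    if (num_roots != 1) return false;   // no root, or several roots as in line (2)

    // Walk up from every node towards the root. In a tree no walk needs more
    // than n - 1 steps, so exceeding n steps means the walk is stuck in a cycle.
    for (std::size_t start = 1; start <= n; ++start) {
        std::size_t u = start;
        std::size_t steps = 0;
        while (heads[u - 1] != 0) {
            u = heads[u - 1];
            if (++steps > n) return false;
        }
    }
    return true;
}
```

Both invalid lines above are rejected, as is 0 3 2, where nodes 2 and 3 are each other's parent. The upward walks cost O(n^2) in the worst case, which keeps the sketch short; a linear-time version would mark nodes already known to reach the root.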
To use this class, a reader must first be initialised with the treebank file and, optionally, a self-descriptive string, i.e., something that identifies the treebank (e.g., an ISO code of a language). Once initialised, the first tree can be retrieved with get_tree. The remaining trees can be iterated over by calling next_tree, which may only be called while end returns false.
If an object of this class was returned by the class treebank_collection_reader, then the methods get_treebank_filename and get_identifier might prove useful for debugging, since they return, respectively, the full name (path included) of the treebank file and its identifier string.
An example of usage of this class is given in the following piece of code.
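To illustrate only the calling protocol described above (init, then get the current tree, then next_tree while end returns false), the following is a hypothetical, self-contained stand-in written against a std::istream. The class mini_treebank_reader is not LAL's treebank_reader: it returns plain head vectors instead of graphs::rooted_tree, performs no validation, and borrows the member names m_num_trees and m_no_more_trees from the private attributes listed above purely for readability. LAL's own example remains the reference for the exact call order its reader expects.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified reader (NOT LAL's implementation): init loads the
// first tree, next_tree advances to the following one, and end() becomes true
// once the input holds no more trees.
class mini_treebank_reader {
public:
    void init(std::istream& in) {
        m_in = &in;
        m_num_trees = 0;
        advance();
    }
    bool end() const { return m_no_more_trees; }
    void next_tree() {
        assert(!end() && "next_tree called after the last tree");
        advance();
    }
    const std::vector<std::size_t>& get_head_vector() const { return m_current; }
    std::size_t get_num_trees() const { return m_num_trees; }

private:
    // Read the next non-empty line as a head vector; flag the end of input otherwise.
    void advance() {
        std::string line;
        while (std::getline(*m_in, line)) {
            std::istringstream ss(line);
            std::vector<std::size_t> heads;
            std::size_t h;
            while (ss >> h) heads.push_back(h);
            if (!heads.empty()) {
                m_current = std::move(heads);
                ++m_num_trees;
                m_no_more_trees = false;
                return;
            }
        }
        m_current.clear();
        m_no_more_trees = true;
    }

    std::istream* m_in = nullptr;
    std::vector<std::size_t> m_current;
    std::size_t m_num_trees = 0;
    bool m_no_more_trees = false;
};
```

A typical loop calls init once, then alternates between reading get_head_vector() and calling next_tree() until end() returns true; at that point get_num_trees() equals the total number of trees, matching the behaviour documented for the real get_num_trees.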
size_t get_num_trees () const noexcept  [inline]
Returns the number of trees processed so far.
Once end returns true, this method returns the exact number of trees in the treebank.
treebank_error init (const std::string &file, const std::string &identifier="") noexcept
Initialises the treebank reader.
Parameters
    file        Treebank file.
    identifier  Identifier string for the treebank.
bool is_open () const noexcept  [inline]
Can the treebank be read?
If init returned an error other than lal::io::treebank_error_type::no_error, this returns false.
void next_tree () noexcept
Retrieves the next tree in the file.
void set_calculate_size_subtrees (bool v) noexcept  [inline]
Should the size of the subtrees be calculated?
Parameters
    v  Boolean value: true to calculate the sizes of the subtrees.
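As an illustration of what "the size of the subtrees" means for a head vector, the sketch below computes, for every node u, the number of nodes in the subtree rooted at u. subtree_sizes is a hypothetical helper, not the routine LAL runs when this option is enabled; it assumes a valid head vector (exactly one 0, no cycles) with 1-based node labels.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not LAL's code): sizes[u] = number of nodes in the
// subtree rooted at node u; index 0 is unused. Assumes a valid head vector.
std::vector<std::size_t> subtree_sizes(const std::vector<std::size_t>& heads) {
    const std::size_t n = heads.size();
    if (n == 0) return std::vector<std::size_t>(1, 0);

    // Build children lists and find the root.
    std::vector<std::vector<std::size_t>> children(n + 1);
    std::size_t root = 0;
    for (std::size_t v = 1; v <= n; ++v) {
        if (heads[v - 1] == 0) root = v;
        else children[heads[v - 1]].push_back(v);
    }

    // Iterative DFS: every node is appended after its parent in `order`.
    std::vector<std::size_t> order;
    order.reserve(n);
    std::vector<std::size_t> stack{root};
    while (!stack.empty()) {
        const std::size_t u = stack.back();
        stack.pop_back();
        order.push_back(u);
        for (std::size_t c : children[u]) stack.push_back(c);
    }

    // In reverse order each node is finished before its parent, so its size
    // is final when it is added to the parent's size.
    std::vector<std::size_t> sizes(n + 1, 1);
    sizes[0] = 0;
    for (auto it = order.rbegin(); it != order.rend(); ++it) {
        const std::size_t u = *it;
        if (heads[u - 1] != 0) sizes[heads[u - 1]] += sizes[u];
    }
    return sizes;
}
```

For the example line 0 3 4 1 6 3, the sizes of nodes 1 to 6 are 6, 1, 4, 5, 1 and 2. The iterative traversal avoids deep recursion on long path-like trees.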
void set_calculate_tree_type (bool v) noexcept  [inline]
Should the tree be classified into types?
See lal::graphs::tree_type for details on the classification.
Parameters
    v  Boolean value: true to classify the tree into types.
void set_identifier (const std::string &id) noexcept  [inline]
Set this treebank's identifier string.
This method overwrites the contents of m_treebank_identifier. It is most useful when the identifier string needs to be changed after the treebank reader has been initialised.
Parameters
    id  Identifier string.
void set_normalise (bool v) noexcept  [inline]
Should trees be normalised?
Parameters
    v  Boolean value: true to normalise the trees.