LAL: Linear Arrangement Library 21.07.01
A library focused on algorithms on linear arrangements of graphs.
lal::io::treebank_reader Class Reference

A reader for a single treebank file.

#include <treebank_reader.hpp>

Public Member Functions

treebank_error init (const std::string &file, const std::string &identifier="") noexcept
 Initialises the treebank reader.
 
bool end () const noexcept
 Returns whether all trees in the file have been processed.
 
void next_tree () noexcept
 Retrieves the next tree in the file.
 
size_t get_num_trees () const noexcept
 Returns the number of trees processed so far.
 
const std::string & get_identifier () const noexcept
 Returns the identifier of the treebank.
 
const std::string & get_treebank_filename () const noexcept
 Returns the name of the treebank file.
 
graphs::rooted_tree get_tree () const noexcept
 Returns the current tree.
 
head_vector get_head_vector () const noexcept
 Returns the current head vector.
 
bool is_open () const noexcept
 Can the treebank be read?
 
void set_normalise (bool v) noexcept
 Should trees be normalised?
 
void set_calculate_size_subtrees (bool v) noexcept
 Should the size of the subtrees be calculated?
 
void set_calculate_tree_type (bool v) noexcept
 Should the tree be classified into types?
 
void set_identifier (const std::string &id) noexcept
 Set this treebank's identifier string.
 

Private Attributes

std::string m_treebank_identifier = "none"
 Identifier for the treebank.
 
std::string m_treebank_file = "none"
 Treebank's file name (with the full path).
 
std::ifstream m_treebank
 Handler for main file reading.
 
size_t m_num_trees = 0
 Number of trees processed so far.
 
std::string m_file_line
 Current line.
 
head_vector m_current_head_vector
 Current head vector.
 
bool m_normalise_tree = true
 Normalise the current tree.
 
bool m_calculate_size_subtrees = true
 Calculate the size of the subtrees of the generated rooted tree.
 
bool m_calculate_tree_type = true
 Calculate the type of tree of the generated tree.
 
bool m_no_more_trees = false
 Have all trees in the file been consumed?
 

Detailed Description

A reader for a single treebank file.

This class, whose objects are referred to as "readers", offers a simple interface for iterating over the trees in a single treebank file, henceforth referred to as the treebank. Each tree is formatted as a list of non-negative integers, one per node of the tree. The number at position i (positions start at 1) is the parent of node i, and the number 0 marks the root. For example, number 4 at position 9 means that node 9 has parent node 4, and number 0 at position 1 means that node 1 is the root of the tree. A complete example of such a tree's representation is the following

  0 3 4 1 6 3

which should be interpreted as

(a) predecessor:       0 3 4 1 6 3
(b) node of the tree:  1 2 3 4 5 6
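
The correspondence between positions and parents shown above can be sketched in a few lines of C++. The helper edges_from_line below is hypothetical and is not part of LAL; it only assumes that a line is a whitespace-separated list of numbers with 1-based positions, as described above.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not part of LAL: turns one treebank line into a list
// of (parent, child) pairs. Positions are 1-based; a value of 0 marks the
// root and therefore produces no edge.
std::vector<std::pair<std::size_t, std::size_t>>
edges_from_line(const std::string& line)
{
    std::vector<std::pair<std::size_t, std::size_t>> edges;
    std::istringstream in(line);
    std::size_t parent = 0;
    std::size_t position = 1;
    while (in >> parent) {
        if (parent != 0) {
            // the node at 'position' hangs from node 'parent'
            edges.emplace_back(parent, position);
        }
        ++position;
    }
    return edges;
}
```

Calling edges_from_line("0 3 4 1 6 3") yields the five pairs (3,2), (4,3), (1,4), (6,5) and (3,6); node 1 is the root and contributes no pair.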

Note that lines like these are not valid:

(1) 0 2 2 2 2 2
(2) 2 0 0

Line (1) is not valid because of a self-reference in the second position (node 2 is listed as its own parent), and line (2) is not valid because it contains two '0' (i.e., two roots).
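
As a rough illustration of these two conditions, the following sketch checks a line before it is handed to a reader. The names head_line_check and check_head_line are hypothetical and not part of LAL, and the out-of-range test is an extra assumption beyond the two cases named above. The sketch does not detect longer cycles (e.g., two nodes that are each other's parent), so passing it does not guarantee that the line encodes a tree.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type, not part of LAL.
enum class head_line_check {
    ok, no_root, several_roots, self_reference, out_of_range
};

// Hypothetical helper, not part of LAL: checks the conditions named in the
// text (exactly one root, no node that is its own parent) and, as an extra
// assumption, that every parent refers to an existing position.
head_line_check check_head_line(const std::string& line)
{
    std::vector<std::size_t> heads;
    std::istringstream in(line);
    for (std::size_t h = 0; in >> h; ) {
        heads.push_back(h);
    }

    std::size_t roots = 0;
    for (std::size_t i = 0; i < heads.size(); ++i) {
        const std::size_t node = i + 1; // positions are 1-based
        if (heads[i] == 0) {
            ++roots;
        }
        else if (heads[i] == node) {
            return head_line_check::self_reference;
        }
        else if (heads[i] > heads.size()) {
            return head_line_check::out_of_range;
        }
    }
    if (roots == 0) { return head_line_check::no_root; }
    if (roots > 1) { return head_line_check::several_roots; }
    return head_line_check::ok;
}
```

With this sketch, line (1) is reported as self_reference, line (2) as several_roots, and the valid example "0 3 4 1 6 3" as ok.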

To use this class, first initialise it with the treebank file and, optionally, a self-descriptive string, i.e., something that identifies the treebank (e.g., an ISO code of a language). Once initialised, the first tree can be retrieved with get_tree. The remaining trees are reached by calling next_tree, which can only be called as long as end returns false.

If an object of this class was returned by the class treebank_collection_reader, then methods get_treebank_filename and get_identifier might prove useful for debugging since they return, respectively, the full name (path included) of the treebank and an identifier string.

An example of usage of this class is given in the following piece of code.

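lal::io::treebank_reader tbread;
// it is advisable to check for errors
const auto err = tbread.init(main_file);
if (tbread.is_open()) {
    while (not tbread.end()) {
        const lal::graphs::rooted_tree t = tbread.get_tree();
        // process tree 't'
        // ....
        tbread.next_tree();
    }
}
// otherwise, 'err' holds the type of the error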

Member Function Documentation

◆ get_num_trees()

size_t lal::io::treebank_reader::get_num_trees ( ) const
inline noexcept

Returns the number of trees processed so far.

When method end returns 'true', this method returns the exact number of trees in the treebank.

◆ init()

treebank_error lal::io::treebank_reader::init ( const std::string & file,
const std::string & identifier = "" )
noexcept

Initialises the treebank reader.

Parameters
file        Treebank file.
identifier  Identifier string for the treebank.
Returns
The type of the error, if any. The list of errors that this method can return is:
Postcondition
The number of trees processed, m_num_trees, is always set to 0.

◆ is_open()

bool lal::io::treebank_reader::is_open ( ) const
inline noexcept

Can the treebank be read?

If the init method returned an error different from lal::io::treebank_error_type::no_error then this returns false.

Returns
Whether the treebank is readable or not.

◆ next_tree()

void lal::io::treebank_reader::next_tree ( )
noexcept

Retrieves the next tree in the file.

Postcondition
Increments the number of trees processed.

◆ set_calculate_size_subtrees()

void lal::io::treebank_reader::set_calculate_size_subtrees ( bool v)
inline noexcept

Should the size of the subtrees be calculated?

Parameters
v  Boolean value.

◆ set_calculate_tree_type()

void lal::io::treebank_reader::set_calculate_tree_type ( bool v)
inline noexcept

Should the tree be classified into types?

See lal::graphs::tree_type for details on the classification.

Parameters
v  Boolean value.

◆ set_identifier()

void lal::io::treebank_reader::set_identifier ( const std::string & id)
inline noexcept

Set this treebank's identifier string.

This method overwrites the contents of m_treebank_identifier. It is most useful when the identifier string has to be changed after the treebank reader has been initialised.

Parameters
id  Identifier string.

◆ set_normalise()

void lal::io::treebank_reader::set_normalise ( bool v)
inline noexcept

Should trees be normalised?

Parameters
v  Boolean value.
