LAL: Linear Arrangement Library 21.07.01
A library focused on algorithms on linear arrangements of graphs.
|
Input/Output functions and processing of treebanks. More...
Classes | |
class | _process_treebank_base |
The processor base class. More... | |
class | report_treebank_collection |
Report on a treebank collection. More... | |
class | report_treebank_file |
Report on a treebank file. More... | |
class | treebank_collection_processor |
Automatic processing of treebank collections. More... | |
class | treebank_collection_reader |
A reader for a collection of treebanks. More... | |
class | treebank_error |
Treebank error report class. More... | |
class | treebank_processor |
Automatic processing of treebank files. More... | |
class | treebank_reader |
A reader for a single treebank file. More... | |
Functions | |
std::vector< report_treebank_file > | check_correctness_treebank (const std::string &treebank_filename) noexcept |
Checks the correctness of a treebank collection. | |
std::vector< report_treebank_collection > | check_correctness_treebank_collection (const std::string &main_file_name, size_t n_threads=1) noexcept |
Checks the correctness of a treebank collection. | |
std::optional< graphs::undirected_graph > | read_edge_list_undirected_graph (const std::string &filename, bool norm=true, bool check_norm=true) noexcept |
Reads a undirected graph in edge list format. | |
std::optional< graphs::directed_graph > | read_edge_list_directed_graph (const std::string &filename, bool norm=true, bool check_norm=true) noexcept |
Reads a directed graph in edge list format. | |
std::optional< graphs::free_tree > | read_edge_list_free_tree (const std::string &filename, bool norm=true, bool check_norm=true) noexcept |
Reads a free tree in edge list format. | |
std::optional< graphs::rooted_tree > | read_edge_list_rooted_tree (const std::string &filename, bool norm=true, bool check_norm=true) noexcept |
Reads a rooted tree in edge list format. | |
template<class G , std::enable_if_t< std::is_base_of_v< graphs::graph, G >, bool > = true> | |
std::optional< G > | read_edge_list (const std::string &filename, bool norm=true, bool check_norm=true) noexcept |
Reads a graph in edge list format. | |
std::optional< graphs::free_tree > | read_head_vector_free_tree (const std::string &filename, bool norm=true, bool check_norm=true) noexcept |
Reads a free tree in head vector format. | |
std::optional< graphs::rooted_tree > | read_head_vector_rooted_tree (const std::string &filename, bool norm=true, bool check_norm=true) noexcept |
Reads a rooted tree in head vector format. | |
template<class T , std::enable_if_t< std::is_base_of_v< graphs::free_tree, T >||std::is_base_of_v< graphs::rooted_tree, T >, bool > = true> | |
std::optional< T > | read_head_vector (const std::string &filename, bool norm=true, bool check_norm=true) noexcept |
Reads a rooted tree in head vector format. | |
treebank_error | process_treebank_collection (const std::string &treebank_collection_main_file, const std::string &output_directory, std::size_t num_threads=1) noexcept |
Automatically process a treebank collection. | |
treebank_error | process_treebank (const std::string &treebank_file, const std::string &output_file) noexcept |
Automatically process a treebank. | |
Variables | |
static constexpr std::size_t | __treebank_feature_size |
The total number of features available. | |
Input/Output functions and processing of treebanks.
This namespace contains the functions for input/output operations.
The most basic operations read a graph from a file. The formats supported for reading are:
Users can also process collection of trees (called treebank files), and collections of treebank files (obviously, a treebank collection). One can process automatically a treebank file (see lal::io::treebank_processor) or a treebank collection (see lal::io::treebank_collection_processor), and iterate through the trees of a treebank file (see lal::io::treebank_reader) and through the treebank files within a treebank collection (see lal::io::treebank_collection_reader).
A treebank file is a file containing only head vectors of trees. Such head vectors may not be formatted correctly (from the standpoint of this library), and so users of the library are also provided with function to check the correctess of such files (see lal::io::check_correctness_treebank and lal::io::check_correctness_treebank_collection). Users are encouraged to read the documentation of these classes to know more about treebank files and collection of treebanks and how to process them with the library.
|
strong |
Possible errors that can arise while processing a collection.
There are several reasons why a treebank collection or a single treebank file could not be processed. Because of this, certain methods return one of these values instead of a plain 'false' value.
Enumerator | |
---|---|
no_error | No error occurred. Returned by: |
no_features | No features at all were given to the processor. Returned by: |
treebank_file_does_not_exist | A treebank was not found in disk. Returned by: |
treebank_file_could_not_be_opened | A treebank file could not be opened. Returned by: |
output_file_could_not_be_opened | Output file could not be opened. Returned by: |
malformed_treebank_file | The treebank file contains errors that should be fixed. In this case, method lal::io::check_correctness_treebank should be run in order to obtain a report on the errors. Returned by: |
main_file_does_not_exist | Main file does not exist. Returned by: |
main_file_could_not_be_opened | Main file could not be opened. Returned by: |
output_directory_does_not_exist | Output directory could not be found. Returned by: |
output_join_file_could_not_be_opened | The file containing the result of processing a treebank collection could not be opened. Returned by: |
treebank_result_file_could_not_be_opened | The resulting file of processing a treebank could not be opened. Returned by: |
some_treebank_file_failed | Processing one or more of the treebanks failed. Returned by: |
malformed_treebank_collection | The treebank collection contains errors that should be fixed. In this case, method lal::io::check_correctness_treebank_collection should be run in order to obtain a report on the errors. Returned by: |
|
strong |
The features that can be computed in automatic processing of treebanks.
Classes lal::io::treebank_collection_processor and lal::io::treebank_processor are designed to process treebanks in an automatic fashion, meaning that they process the trees and calculate a fixed set of features, the results of which are stored in files. In this enumeration users will find a complete list of all the features that can be calculated using those two classes.
Enumerator | |
---|---|
num_nodes | Number of nodes of the tree. |
second_moment_degree | Second moment of degree \(\langle k^2 \rangle\). See lal::properties::moment_degree for details. |
second_moment_degree_in | Second moment of in-degree \(\langle k_{in}^2 \rangle\). See lal::properties::moment_degree_in for details. |
second_moment_degree_out | Second moment of out-degree \(\langle k_{out}^2 \rangle\). See lal::properties::moment_degree_out for details. |
third_moment_degree | Third moment of degree \(\langle k^3 \rangle\). See lal::properties::moment_degree for details. |
third_moment_degree_in | Third moment of in-degree \(\langle k_{in}^3 \rangle\). See lal::properties::moment_degree_in for details. |
third_moment_degree_out | Third moment of out-degree \(\langle k_{out}^3 \rangle\). See lal::properties::moment_degree_out for details. |
num_pairs_independent_edges | Size of the set \(Q(T)\) of this tree \(T\). See lal::properties::num_pairs_independent_edges for details. |
head_initial | Headedness of the tree. In case the tree has only one vertex, outputs a NaN value. See lal::linarr::head_initial for details. |
hubiness | Hubiness of the tree. In case a tree has 3 or less vertices the output is a 'NaN' value. See lal::properties::hubiness for details. |
mean_hierarchical_distance | Mean hierarchical distance of the tree. In case a tree has 1 vertex the output is a 'NaN' value. See lal::properties::mean_hierarchical_distance for details. |
tree_centre | Centre of the tree. This feature spans two columns, one for each possible central vertex. Each column contains an index: the first is always strictly less than the number of vertices, and the second is only valid when its value is strictly less than the number of vertices. See lal::properties::tree_centre for details on the definition of centre of a tree. |
tree_centroid | Centroid of the tree. This feature spans two columns, one for each possible centroidal vertex. Each column contains an index: the first is always strictly less than the number of vertices, and the second is only valid when its value is strictly less than the number of vertices. See lal::properties::tree_centroid for details on the definition of centroid of a tree. |
tree_diameter | Diameter of the tree. See lal::properties::tree_diameter for details. |
num_crossings | Number of edge crossings \(C\). See lal::linarr::num_crossings for details. |
predicted_num_crossings | Prediction of the number of crossings \(C\). See lal::linarr::predicted_num_crossings for details. |
exp_num_crossings | First moment of expectation of \(C\), \(E[C]\). See lal::properties::exp_num_crossings for details. |
var_num_crossings | Variance of \(C\), \(V[C]\). See lal::properties::var_num_crossings for details. |
z_score_num_crossings | z-score of \(C\), \(\frac{C - E[C]}{\sqrt{V[C]}}\). See lal::properties::var_num_crossings_tree for details on how the variance of \(C\), \(V[C]\), is calculated. |
sum_edge_lengths | Sum of length of edges \(D\). See lal::linarr::sum_edge_lengths for details. |
exp_sum_edge_lengths | Expectation of \(D\), \(E[D]\). See lal::properties::exp_sum_edge_lengths for details. |
exp_sum_edge_lengths_projective | Expectation of \(D\) constrained to projective arrangements, \(E_{\mathrm{pr}}[D]\). See lal::properties::exp_sum_edge_lengths_projective for details. |
exp_sum_edge_lengths_planar | Expectation of \(D\) constrained to planar arrangements, \(E_{\mathrm{pl}}[D]\). See lal::properties::exp_sum_edge_lengths_planar for details. |
var_sum_edge_lengths | Variance of \(D\), \(V[D]\). See lal::properties::var_sum_edge_lengths for details. |
z_score_sum_edge_lengths | z-score of \(D\), \(\frac{D - E[D]}{\sqrt{V[D]}}\). See lal::properties::var_sum_edge_lengths for details on how the variance of \(D\), \(V[D]\), is calculated. |
min_sum_edge_lengths | Unconstrained minimum sum of length of edges. See lal::linarr::algorithms_Dmin::Unconstrained_YS, or lal::linarr::algorithms_Dmin::Unconstrained_FC for details. |
min_sum_edge_lengths_planar | Minimum sum of length of edges under the planary constraint. See lal::linarr::min_sum_edge_lengths for details. |
min_sum_edge_lengths_projective | Minimum sum of length of edges under the planary constraint. See lal::linarr::min_sum_edge_lengths_projective for details. |
mean_dependency_distance | Mean dependency distance of the tree. In case a tree has 1 vertex the output is a 'NaN' value. See lal::linarr::mean_dependency_distance for details. |
flux_max_weight | Maximum flux weight. In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details. |
flux_mean_weight | Mean flux weight. This is the sum of weights averaged by the number of fluxes (the number of vertices of the tree minus 1). In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details on the definition of weight. |
flux_min_weight | Minimum flux weight. In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details. |
flux_max_left_span | Maximum left span. In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details. |
flux_mean_left_span | Mean left span. This is the sum of left spans averaged by the number of fluxes (the number of vertices of the tree minus 1). In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details on the definition of left span. |
flux_min_left_span | Minimum left span. In case the tree has 1 vertex, the output is a 'nan' value. See linarr::dependency_flux for details. |
flux_max_right_span | Maximum right span. In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details. |
flux_mean_right_span | Mean right span. This is the sum of right spans averaged by the number of fluxes (the number of vertices of the tree minus 1). In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details on the definition of right span. |
flux_min_right_span | Minimum right span. In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details. |
flux_max_size | Maximum flux size. In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details. |
flux_mean_size | Mean flux size. This is the sum of flux sizes averaged by the number of fluxes (the number of vertices of the tree minus 1). In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details on the definition of flux size. |
flux_min_size | Minimum flux size. In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details. |
flux_max_RL_ratio | Maximum R/L ratio. In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details. |
flux_mean_RL_ratio | Mean R/L ratio. This is the sum of R/L ratios averaged by the number of fluxes (the number of vertices of the tree minus 1). In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details on the definition of R/L ratio. |
flux_min_RL_ratio | Minimum R/L ratio. In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details. |
flux_max_WS_ratio | Maximum W/S ratio. In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details. |
flux_mean_WS_ratio | Mean W/S ratio. This is the sum of W/S ratios averaged by the number of fluxes (the number of vertices of the tree minus 1). In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details on the definition of W/S ratio. |
flux_min_WS_ratio | Minimum W/S ratio. In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details. |
tree_type | The type of tree. This feature spans as many columns as types of trees are available in this library. Each column will contain either a 0 or a 1 depending on whether or not the tree can be classified into that type of tree. See lal::graphs::tree_type for a complete list of tree types. |
syntactic_dependency_structure_class | The type of syntactic dependency structure. This feature spans as many columns as types of syntactic dependency structure are available in this library. Each column will contain either a 0 or a 1 depending on whether or not the tree can be classified into that syntactic dependency structure. See lal::linarr::syntactic_dependency_structure for a complete list of types. |
|
noexcept |
Checks the correctness of a treebank collection.
treebank_filename | Name of the treebank file. |
|
noexcept |
Checks the correctness of a treebank collection.
main_file_name | Name of the main file. |
n_threads | Number of threads to use. |
|
inlinenoexcept |
Automatically process a treebank.
This function is an utility to process easily a single treebank file. This function uses the class lal::io::treebank_processor in order to process such a file. The default values of the processor are used, i.e., all features available in lal::io::treebank_feature are computed.
treebank_file | The treebank file name. |
output_file | The output file name. |
|
inlinenoexcept |
Automatically process a treebank collection.
This function is an utility to process easily a collection of treebank files. This function uses the class lal::io::treebank_collection_processor in order to process such a collection, with all its options set to their default value. The default options are:
treebank_collection_main_file | The main file of the treebank collection. |
output_directory | The output . |
num_threads | The number of threads. |
|
noexcept |
Reads a graph in edge list format.
This format consists of a list of all the graph's edges. Each edge is described as a pair of indices of the nodes at each end of the edge. Nodes are labelled with indices starting at 0. The resulting number of nodes of the graph will be the maximum index in the file plus 1.
This function calls lal::io::read_edge_list_undirected_graph, lal::io::read_edge_list_directed_graph, lal::io::read_edge_list_free_tree or lal::io::read_edge_list_rooted_tree depending on the type of the template parameter G.
filename | Name of the file. |
norm | Should the graph be normalised? See lal::graphs::graph::is_normalised() |
check_norm | If the graph is not to be normalised check whether or not the graph read is normalised. |
G | A graph type. A class that derives from lal::graphs::graph. |
|
noexcept |
Reads a directed graph in edge list format.
This format consists of a list of all the graph's edges. Each edge is described as a pair of indices of the nodes at each end of the edge. Nodes are labelled with indices starting at 0. The resulting number of nodes of the graph will be the maximum index in the file plus 1.
filename | Name of the file. |
norm | Should the graph be normalised? See lal::graphs::graph::is_normalised() |
check_norm | If the graph is not to be normalised check whether or not the graph read is normalised. |
|
noexcept |
Reads a free tree in edge list format.
This format consists of a list of all the graph's edges. Each edge is described as a pair of indices of the nodes at each end of the edge. Nodes are labelled with indices starting at 0. The resulting number of nodes of the graph will be the maximum index in the file plus 1.
filename | Name of the file. |
norm | Should the graph be normalised? See lal::graphs::graph::is_normalised() |
check_norm | If the graph is not to be normalised check whether or not the graph read is normalised. |
|
noexcept |
Reads a rooted tree in edge list format.
This format consists of a list of all the graph's edges. Each edge is described as a pair of indices of the nodes at each end of the edge. Nodes are labelled with indices starting at 0. The resulting number of nodes of the graph will be the maximum index in the file plus 1.
filename | Name of the file. |
norm | Should the graph be normalised? See lal::graphs::graph::is_normalised() |
check_norm | If the graph is not to be normalised check whether or not the graph read is normalised. |
|
noexcept |
Reads a undirected graph in edge list format.
This format consists of a list of all the graph's edges. Each edge is described as a pair of indices of the nodes at each end of the edge. Nodes are labelled with indices starting at 0. The resulting number of nodes of the graph will be the maximum index in the file plus 1.
filename | Name of the file. |
norm | Should the graph be normalised? See lal::graphs::graph::is_normalised() |
check_norm | If the graph is not to be normalised check whether or not the graph read is normalised. |
|
noexcept |
Reads a rooted tree in head vector format.
A head vector of an n-vertex tree is a list of non-negative integer numbers. The number at position i denotes the parent node of the vertex at said position. Value '0' denotes the root. In this case, the vertex corresponding to the value '0' is not labelled as a root.
Each tree is formatted as a list of whole, positive numbers (including zero), each representing a node of the tree. The number 0 denotes the root of the tree, and a number at a certain position indicates its parent node. For example, when number 4 is at position 9 it means that node 9 has parent node 4. Therefore, if number 0 is at position 1 it means that node 1 is the root of the tree. A complete example of such a tree's representation is the following
0 3 4 1 6 3
which should be interpreted as
(a) predecessor: 0 3 4 1 6 3 (b) node of the tree: 1 2 3 4 5 6
Note that lines like these are not valid:
(1) 0 2 2 2 2 2 (2) 2 0 0
Line (1) is not valid due to a self-reference in the second position, and (2) not being valid due to containing two '0' (i.e., two roots).
The current contents of the graph will be cleared and replaced by the contents of the file.
filename | Name of the file. |
norm | Should the graph be normalised? See lal::graphs::graph::is_normalised() |
check_norm | If the graph is not to be normalised check whether or not the graph read is normalised. |
|
noexcept |
Reads a free tree in head vector format.
A head vector of an n-vertex tree is a list of non-negative integer numbers. The number at position i denotes the parent node of the vertex at said position. Value '0' denotes the root. In this case, the vertex corresponding to the value '0' is not labelled as a root. See method lal::io::read_head_vector for further details.
The current contents of the graph will be cleared and replaced by the contents of the file.
filename | Name of the file. |
norm | Should the tree be normalised? See lal::graphs::free_tree::is_normalised() |
check_norm | If the graph is not to be normalised check whether or not the graph read is normalised. |
|
noexcept |
Reads a rooted tree in head vector format.
A head vector of an n-vertex tree is a list of non-negative integer numbers. The number at position i denotes the parent node of the vertex at said position. Value '0' denotes the root. See method lal::io::read_head_vector for further details.
The current contents of the graph will be cleared and replaced by the contents of the file.
filename | Name of the file. |
norm | Should the graph be normalised? See lal::graphs::graph::is_normalised() |
check_norm | If the graph is not to be normalised check whether or not the graph read is normalised. |
|
staticconstexpr |
The total number of features available.
This number equals the total amount of values within the enumeration lal::io::treebank_feature, except the last one (which should never be used).