LAL: Linear Arrangement Library 24.10.00
A library focused on algorithms on linear arrangements of graphs.
|
Input/Output functions and processing of treebanks. More...
Classes | |
class | _treebank_processor_base |
The processor base class. More... | |
class | head_vector_error |
Head vector error report class. More... | |
class | treebank_collection_processor |
Automatic processing of treebank collections. More... | |
class | treebank_collection_reader |
A reader for a collection of treebanks. More... | |
class | treebank_collection_report |
Report on a treebank collection. More... | |
struct | treebank_collection_report_location |
An auxiliary struct in replacement of std::tuple. More... | |
class | treebank_file_error |
Treebank file error report class. More... | |
class | treebank_file_report |
Report on a treebank file. More... | |
class | treebank_processor |
Automatic processing of treebank files. More... | |
class | treebank_reader |
A reader for a single treebank file. More... | |
Functions | |
std::vector< head_vector_error > | check_correctness_head_vector (const head_vector &head_vector) noexcept |
Checks the correctness of a head vector. | |
std::vector< head_vector_error > | check_correctness_head_vector (const std::string &head_vector_str) noexcept |
Checks the correctness of a head vector. | |
treebank_file_report | check_correctness_treebank (const std::string &treebank_filename) noexcept |
Checks the correctness of a treebank collection. | |
treebank_collection_report | check_correctness_treebank_collection (const std::string &main_file_name, const std::size_t n_threads=1) noexcept |
Checks the correctness of a treebank collection. | |
std::optional< graphs::undirected_graph > | read_edge_list_undirected_graph (const std::string &filename, const bool norm=true, const bool check_norm=true) noexcept |
Reads a undirected graph in edge list format. | |
std::optional< graphs::directed_graph > | read_edge_list_directed_graph (const std::string &filename, const bool norm=true, const bool check_norm=true) noexcept |
Reads a directed graph in edge list format. | |
std::optional< graphs::free_tree > | read_edge_list_free_tree (const std::string &filename, const bool norm=true, const bool check_norm=true) noexcept |
Reads a free tree in edge list format. | |
std::optional< graphs::rooted_tree > | read_edge_list_rooted_tree (const std::string &filename, const bool norm=true, const bool check_norm=true) noexcept |
Reads a rooted tree in edge list format. | |
template<class graph_t , std::enable_if_t< std::is_base_of_v< graphs::graph, graph_t >, bool > = true> | |
std::optional< graph_t > | read_edge_list (const std::string &filename, const bool norm=true, const bool check_norm=true) noexcept |
Reads a graph in edge list format. | |
std::optional< graphs::free_tree > | read_head_vector_free_tree (const std::string &filename, const bool norm=true, const bool check_norm=true) noexcept |
Reads a free tree in head vector format. | |
std::optional< graphs::rooted_tree > | read_head_vector_rooted_tree (const std::string &filename, const bool norm=true, const bool check_norm=true) noexcept |
Reads a rooted tree in head vector format. | |
template<class tree_t , std::enable_if_t< std::is_base_of_v< graphs::tree, tree_t >, bool > = true> | |
std::optional< tree_t > | read_head_vector (const std::string &filename, const bool norm=true, const bool check_norm=true) noexcept |
Reads a rooted tree in head vector format. | |
treebank_file_error | process_treebank_collection (const std::string &treebank_collection_main_file, const std::string &output_directory, const std::size_t num_threads=1) noexcept |
Automatically process a treebank collection. | |
constexpr std::string_view | treebank_feature_string (const io::treebank_feature_type &tf) noexcept |
Converts a lal::io::treebank_feature_type value into a string. | |
constexpr std::size_t | treebank_feature_to_index (const io::treebank_feature_type &tf) noexcept |
Returns the index of the input treebank feature. | |
constexpr io::treebank_feature_type | index_to_treebank_feature (const std::size_t idx) noexcept |
Returns the treebank feature corresponding to the input index. | |
constexpr std::string_view | treebank_feature_index_to_string (const std::size_t idx) noexcept |
Returns the treebank feature corresponding to the index as a string. | |
treebank_file_error | process_treebank (const std::string &treebank_file, const std::string &output_file) noexcept |
Automatically process a treebank. | |
Variables | |
constexpr std::size_t | __treebank_feature_size |
The total number of features available. | |
Input/Output functions and processing of treebanks.
This namespace contains the functions for input/output operations.
The most basic operations read a graph from a file. The formats supported for reading are:
Users can also process collection of trees (called treebank files), and collections of treebank files (obviously, a treebank collection). One can process automatically a treebank file (see lal::io::treebank_processor) or a treebank collection (see lal::io::treebank_collection_processor), and iterate through the trees of a treebank file (see lal::io::treebank_reader) and through the treebank files within a treebank collection (see lal::io::treebank_collection_reader).
A treebank file is a file containing only head vectors of trees. Such head vectors may not be formatted correctly (from the standpoint of this library), and so users of the library are also provided with function to check the correctess of such files (see lal::io::check_correctness_treebank and lal::io::check_correctness_treebank_collection). Users are encouraged to read the documentation of these classes to know more about treebank files and collection of treebanks and how to process them with the library.
|
strong |
The different types of errors that can be present in a head vector.
|
strong |
The features that can be computed in automatic processing of treebanks.
Classes lal::io::treebank_collection_processor and lal::io::treebank_processor are designed to process treebanks in an automatic fashion, meaning that they process the trees and calculate a fixed set of features, the results of which are stored in files. In this enumeration users will find a complete list of all the features that can be calculated using those two classes.
Enumerator | |
---|---|
num_nodes | Number of nodes of the tree. |
second_moment_degree | Second moment of degree \(\langle k^2 \rangle\). See lal::properties::moment_degree for details. |
second_moment_degree_out | Second moment of out-degree \(\langle k_{out}^2 \rangle\). See lal::properties::moment_out_degree for details. |
third_moment_degree | Third moment of degree \(\langle k^3 \rangle\). See lal::properties::moment_degree for details. |
third_moment_degree_out | Third moment of out-degree \(\langle k_{out}^3 \rangle\). See lal::properties::moment_out_degree for details. |
sum_squared_degrees | Sum of squared degrees. See lal::properties::sum_powers_degrees for details. |
sum_squared_out_degrees | Sum of squared out-degrees. See lal::properties::sum_powers_out_degrees for details. |
sum_cubed_degrees | Sum of cube degrees. See lal::properties::sum_powers_degrees for details. |
sum_cubed_out_degrees | Sum of cube out-degrees. See lal::properties::sum_powers_out_degrees for details. |
num_pairs_independent_edges | Size of the set \(Q(T)\) of this tree \(T\). See lal::properties::num_pairs_independent_edges for details. |
head_initial | Headedness of the tree. In case the tree has only one vertex, outputs a NaN value. See lal::linarr::head_initial for details. |
hubiness | Hubiness of the tree. In case a tree has 3 or less vertices the output is a 'NaN' value. See lal::properties::hubiness for details. |
sum_hierarchical_distances | Sum of hierarchical distances of the tree. See lal::properties::sum_hierarchical_distances for details. |
mean_hierarchical_distance | Mean hierarchical distance of the tree. In case a tree has 1 vertex the output is a 'NaN' value. See lal::properties::mean_hierarchical_distance for details. |
tree_centre | Centre of the tree. This feature spans two columns, one for each possible central vertex. Each column contains an index: the first is always strictly less than the number of vertices, and the second is only valid when its value is strictly less than the number of vertices. See lal::properties::tree_centre for details on the definition of centre of a tree. |
tree_centroid | Centroid of the tree. This feature spans two columns, one for each possible centroidal vertex. Each column contains an index: the first is always strictly less than the number of vertices, and the second is only valid when its value is strictly less than the number of vertices. See lal::properties::tree_centroid for details on the definition of centroid of a tree. |
tree_diameter | Diameter of the tree. See lal::properties::tree_diameter for details. |
tree_caterpillar_distance | Caterpillar distance of the tree. See lal::properties::tree_caterpillar_distance for details. |
num_crossings | Number of edge crossings \(C\). See lal::linarr::num_crossings for details. |
predicted_num_crossings | Prediction of the number of crossings \(C\). See lal::linarr::predicted_num_crossings for details. |
exp_num_crossings | First moment of expectation of \(C\), \(E[C]\). See lal::properties::exp_num_crossings for details. |
var_num_crossings | Variance of \(C\), \(V[C]\). See lal::properties::var_num_crossings for details. |
z_score_num_crossings | z-score of \(C\), \(\frac{C - E[C]}{\sqrt{V[C]}}\). See lal::properties::var_num_crossings_tree for details on how the variance of \(C\), \(V[C]\), is calculated. |
sum_edge_lengths | Sum of length of edges \(D\). See lal::linarr::sum_edge_lengths for details. |
exp_sum_edge_lengths | Expectation of \(D\), \(E[D]\). See lal::properties::exp_sum_edge_lengths for details. |
exp_sum_edge_lengths_bipartite | Expectation of \(D\) constrained to bipartite arrangements, \(E_{\mathrm{bip}}[D]\). See lal::properties::exp_sum_edge_lengths_bipartite for details. |
exp_sum_edge_lengths_projective | Expectation of \(D\) constrained to projective arrangements, \(E_{\mathrm{pr}}[D]\). See lal::properties::exp_sum_edge_lengths_projective for details. |
exp_sum_edge_lengths_planar | Expectation of \(D\) constrained to planar arrangements, \(E_{\mathrm{pl}}[D]\). See lal::properties::exp_sum_edge_lengths_planar for details. |
var_sum_edge_lengths | Variance of \(D\), \(V[D]\). See lal::properties::var_sum_edge_lengths for details. |
z_score_sum_edge_lengths | z-score of \(D\), \(\frac{D - E[D]}{\sqrt{V[D]}}\). See lal::properties::var_sum_edge_lengths for details on how the variance of \(D\), \(V[D]\), is calculated. |
min_sum_edge_lengths | Unconstrained minimum sum of length of edges. See lal::linarr::algorithms_Dmin for details. The algorithm used is lal::linarr::algorithms_Dmin::Shiloach. |
min_sum_edge_lengths_bipartite | Minimum sum of length of edges over bipartite arrangements. See lal::linarr::min_sum_edge_lengths_bipartite for details. |
min_sum_edge_lengths_planar | Minimum sum of length of edges under the planarity constraint. See lal::linarr::min_sum_edge_lengths_planar for details. |
min_sum_edge_lengths_projective | Minimum sum of length of edges under the planarity constraint. See lal::linarr::min_sum_edge_lengths_projective for details. |
max_sum_edge_lengths | Maximum sum of length of edges over all arrangements. See lal::linarr::max_sum_edge_lengths_all for details. |
max_sum_edge_lengths_1_thistle | Maximum sum of length of edges over arrangements with 1 thistle vertex. See lal::linarr::max_sum_edge_lengths_1_eq_thistle for details. |
max_sum_edge_lengths_bipartite | Maximum sum of length of edges among bipartite arrangements. See lal::linarr::max_sum_edge_lengths_bipartite for details. |
max_sum_edge_lengths_planar | Maximum sum of length of edges under the planarity constraint. See lal::linarr::max_sum_edge_lengths_planar for details. |
max_sum_edge_lengths_projective | Maximum sum of length of edges under the planarity constraint. See lal::linarr::max_sum_edge_lengths_projective for details. |
mean_dependency_distance | Mean dependency distance of the tree. In case a tree has 1 vertex the output is a 'NaN' value. See lal::linarr::mean_dependency_distance for details. |
flux_max_weight | Maximum flux weight. In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details. |
flux_mean_weight | Mean flux weight. This is the sum of weights averaged by the number of fluxes (the number of vertices of the tree minus 1). In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details on the definition of weight. |
flux_min_weight | Minimum flux weight. In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details. |
flux_max_left_span | Maximum left span. In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details. |
flux_mean_left_span | Mean left span. This is the sum of left spans averaged by the number of fluxes (the number of vertices of the tree minus 1). In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details on the definition of left span. |
flux_min_left_span | Minimum left span. In case the tree has 1 vertex, the output is a 'nan' value. See linarr::dependency_flux for details. |
flux_max_right_span | Maximum right span. In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details. |
flux_mean_right_span | Mean right span. This is the sum of right spans averaged by the number of fluxes (the number of vertices of the tree minus 1). In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details on the definition of right span. |
flux_min_right_span | Minimum right span. In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details. |
flux_max_size | Maximum flux size. In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details. |
flux_mean_size | Mean flux size. This is the sum of flux sizes averaged by the number of fluxes (the number of vertices of the tree minus 1). In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details on the definition of flux size. |
flux_min_size | Minimum flux size. In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details. |
flux_max_RL_ratio | Maximum R/L ratio. In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details. |
flux_mean_RL_ratio | Mean R/L ratio. This is the sum of R/L ratios averaged by the number of fluxes (the number of vertices of the tree minus 1). In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details on the definition of R/L ratio. |
flux_min_RL_ratio | Minimum R/L ratio. In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details. |
flux_max_WS_ratio | Maximum W/S ratio. In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details. |
flux_mean_WS_ratio | Mean W/S ratio. This is the sum of W/S ratios averaged by the number of fluxes (the number of vertices of the tree minus 1). In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details on the definition of W/S ratio. |
flux_min_WS_ratio | Minimum W/S ratio. In case the tree has 1 vertex, the output is a 'nan' value. See lal::linarr::dependency_flux for details. |
tree_type | The type of tree. This feature spans as many columns as types of trees are available in this library. Each column will contain either a 0 or a 1 depending on whether or not the tree can be classified into that type of tree. See lal::graphs::tree_type for a complete list of tree types. |
syntactic_dependency_tree_class | The type of syntactic dependency structure. This feature spans as many columns as types of syntactic dependency structure are available in this library. Each column will contain either a 0 or a 1 depending on whether or not the tree can be classified into that syntactic dependency structure. See lal::linarr::syntactic_dependency_tree_type for a complete list of types. |
|
strong |
Possible errors that can arise while processing a collection.
There are several reasons why a treebank collection or a single treebank file could not be processed. Because of this, certain methods return one of these values instead of a plain 'false' value.
Enumerator | |
---|---|
no_error | No error occurred. Returned by: |
no_features | No features at all were given to the processor. Returned by: |
treebank_file_does_not_exist | A treebank was not found in disk. Returned by: |
treebank_file_could_not_be_opened | A treebank file could not be opened. Returned by: |
output_file_could_not_be_opened | Output file could not be opened. Returned by: |
malformed_treebank_file | The treebank file contains errors that should be fixed. In this case, method lal::io::check_correctness_treebank should be run in order to obtain a report on the errors. Returned by: |
main_file_does_not_exist | Main file does not exist. Returned by: |
main_file_could_not_be_opened | Main file could not be opened. Returned by: |
output_directory_could_not_be_created | Output directory could not be created. If a directory output does not exist, the library will attempt to create it. If this fails then this error is returned. Returned by: |
output_join_file_could_not_be_opened | The file containing the result of processing a treebank collection could not be opened. Returned by: |
treebank_result_file_could_not_be_opened | The resulting file of processing a treebank could not be opened. Returned by: |
some_treebank_file_failed | Processing one or more of the treebanks failed. Returned by: |
malformed_treebank_collection | The treebank collection contains errors that should be fixed. In this case, method lal::io::check_correctness_treebank_collection should be run in order to obtain a report on the errors. Returned by: |
|
nodiscardnoexcept |
Checks the correctness of a head vector.
head_vector | A head vector. |
|
nodiscardnoexcept |
Checks the correctness of a head vector.
The head vector is given in a string.
head_vector_str | A string containing a head vector. |
|
nodiscardnoexcept |
Checks the correctness of a treebank collection.
treebank_filename | Name of the treebank file. |
|
nodiscardnoexcept |
Checks the correctness of a treebank collection.
main_file_name | Name of the main file. |
n_threads | Number of threads to use. |
|
inlinenodiscardnoexcept |
Automatically process a treebank.
This function is an utility to process easily a single treebank file. This function uses the class lal::io::treebank_processor in order to process such a file. The default values of the processor are used, i.e., all features available in lal::io::treebank_feature_type are computed.
treebank_file | The treebank file name. |
output_file | The output file name. |
|
inlinenodiscardnoexcept |
Automatically process a treebank collection.
This function is an utility to process easily a collection of treebank files. This function uses the class lal::io::treebank_collection_processor in order to process such a collection, with all its options set to their default value. The default options are:
treebank_collection_main_file | The main file of the treebank collection. |
output_directory | The output . |
num_threads | The number of threads. |
|
inlinenodiscardnoexcept |
Reads a graph in edge list format.
See File with edge list format for further details on the format.
This function calls lal::io::read_edge_list_undirected_graph, lal::io::read_edge_list_directed_graph, lal::io::read_edge_list_free_tree or lal::io::read_edge_list_rooted_tree depending on the type of the template parameter G.
filename | Name of the file. |
norm | Should the graph be normalized? See lal::graphs::graph::is_normalized() |
check_norm | If the graph is not to be normalized check whether or not the graph read is normalized. |
graph_t | A graph type. A class that derives from lal::graphs::graph. |
|
nodiscardnoexcept |
Reads a directed graph in edge list format.
See File with edge list format for further details on the format.
filename | Name of the file. |
norm | Should the graph be normalized? See lal::graphs::graph::is_normalized() |
check_norm | If the graph is not to be normalized check whether or not the graph read is normalized. |
|
nodiscardnoexcept |
Reads a free tree in edge list format.
See File with edge list format for further details on the format.
filename | Name of the file. |
norm | Should the graph be normalized? See lal::graphs::graph::is_normalized() |
check_norm | If the graph is not to be normalized check whether or not the graph read is normalized. |
|
nodiscardnoexcept |
Reads a rooted tree in edge list format.
See File with edge list format for further details on the format.
filename | Name of the file. |
norm | Should the graph be normalized? See lal::graphs::graph::is_normalized() |
check_norm | If the graph is not to be normalized check whether or not the graph read is normalized. |
|
nodiscardnoexcept |
Reads a undirected graph in edge list format.
See File with edge list format for further details on the format.
filename | Name of the file. |
norm | Should the graph be normalized? See lal::graphs::graph::is_normalized() |
check_norm | If the graph is not to be normalized check whether or not the graph read is normalized. |
|
inlinenodiscardnoexcept |
Reads a rooted tree in head vector format.
See Head vector for details on what a head vector is.
The current contents of the graph will be cleared and replaced by the contents of the file.
filename | Name of the file. |
norm | Should the graph be normalized? See lal::graphs::graph::is_normalized() |
check_norm | If the graph is not to be normalized check whether or not the graph read is normalized. |
|
nodiscardnoexcept |
Reads a free tree in head vector format.
See Head vector for details on what a head vector is.
The current contents of the graph will be cleared and replaced by the contents of the file.
filename | Name of the file. |
norm | Should the tree be normalized? See lal::graphs::free_tree::is_normalized() |
check_norm | If the graph is not to be normalized check whether or not the graph read is normalized. |
|
nodiscardnoexcept |
Reads a rooted tree in head vector format.
See Head vector for details on what a head vector is.
The current contents of the graph will be cleared and replaced by the contents of the file.
filename | Name of the file. |
norm | Should the graph be normalized? See lal::graphs::graph::is_normalized() |
check_norm | If the graph is not to be normalized check whether or not the graph read is normalized. |
|
inlineconstexprnoexcept |
Converts a lal::io::treebank_feature_type value into a string.
tf | A treebank feature. |
|
inlineconstexpr |
The total number of features available.
This number equals the total amount of values within the enumeration lal::io::treebank_feature_type, except the last one (which should never be used).