If you are wondering where the data of this site comes from, please visit https://api.github.com/users/pcanal/events. GitMemory does not store any data, but only uses NGINX to cache data for a period of time. The idea behind GitMemory is simply to give users a better reading experience.

PullRequestReviewEvent

Pull request review comment root-project/web

[Docs] Rewrite 'Trees' section

 toc: true toc_sticky: true --- -ROOT provides the {% include ref class="TTree" %} and the {% include ref class="TNtuple" %} class to store large quantities of same-class objects.<br>-A tree is a typical data container used for example by all LHC (Large Hadron Collider) experiments.<br>-Trees are optimized to reduce disk space and enhance access speed.+## Introducing `TTree` -A tree consists of a list of independent columns, called branches. The {% include ref class="TBranch" %} class represents a branch. A branch can contain all kind of data, such as objects or arrays in addition to all the simple types.+As introduced in → [Storing columnar data in a ROOT file and reading it back]({{ '/manual/root_files/#storing-columnar-data-in-a-root-file-and-reading-it-back' | relative_url }}),+ROOT can handle large columnar datasets.+In the aforementioned section, we made use of {% include ref class="RDataFrame" namespace="ROOT" %} to write and+read back a simple dataset.+RDataFrame traditionally relies on {% include ref class="TTree" %} for columnar data storage, used for example+by all LHC (Large Hadron Collider) experiments.+Trees are optimized for reduced disk space and selecting, high-throughput columnar access with reduced memory usage. -A {% include ref class="TNtuple" %} is a {% include ref class="TTree" %}, which is limited to contain only floating-point numbers.+In addition to the documentation in this manual, we recommend to take a look at the TTree tutorials:  {% include tutorials name="Tree" url="tree" %}  > **RNTuple** >-> [RNTuple](https://root.cern/doc/master/md_tree_ntuple_v7_doc_README.html){:target="_blank"} (for N-tuple and nested tuple) is the experimental evolution of {% include ref class="TTree" %} columnar data storage. 
`RNTuple` introduces new interfaces that are more robust.+> [RNTuple](https://root.cern/doc/master/md_tree_ntuple_v7_doc_README.html){:target="_blank"} is the experimental evolution of {% include ref class="TTree" %} columnar data storage. {% include ref class="RNTuple" namespace="ROOT::Experimental" %} introduces robust interfaces, a high-performance storage layout, and an asynchronous, thread-safe scheduling. -## Tree classes--ROOT provides numerous classes for trees and branches, of which the following are among the most used:--- [TTree](https://root.cern/doc/master/classTTree.html){:target="_blank"}: Represents a columnar data set. Any C++ type can be stored in its columns.--- [TNtuple](https://root.cern/doc/master/classTNtuple.html){:target="_blank"}: A simple `TTree` restricted to a list of float variables only.--- [TBranch](https://root.cern/doc/master/classTBranch.html){:target="_blank"}: Organizes columns, i.e. branches, of a `TTree`.--- [TChain](https://root.cern/doc/master/classTChain.html){:target="_blank"}: A list of ROOT files containing `TTree` objects.---## Working with trees--ROOT offers many possibilities to work with trees, for example:--- [Creating a tree](#creating-a-tree)-- [Creating a tree from a folder structure](#creating-a-tree-from-a-folder-structure)-- [Filling a tree](#filling-a-tree)-- [Writing a tree](#writing-a-tree)-- [Printing the summary of a tree](#printing-the-summary-of-a-tree)-- [Showing an entry of a tree](#showing-an-entry-of-a-tree)-- [Scanning trees](#scanning-trees)--### Creating a tree--- Use the {% include ref class="TTree" %} constructor to create a tree.--_**Example**_--{% highlight C++ %}-   TTree t("MyTree","Example Tree");-{% endhighlight %}--It creates a tree with the title `Example Tree`.--_**Example: A simple tree**_--The following script builds a {% include ref class="TTree" %} from an ASCII file containing-statistics about the staff at CERN. 
Both, `staff.C` and `staff.dat` are in available in-`$ROOTSYS/tutorials/tree`.--The following script declares a structure called `staff_t`. It opens the ASCII file, creates-a ROOT file and a `TTree`. Then it creates one branch with the-[TTree::Branch()](https://root.cern/doc/master/classTTree.html#ab47499eeb7793160b20fa950f4de716a){:target="_blank"}-method.<br/>The first parameter of the `Branch()` method is the branch name. <br/>The second-parameter is the address from which the first leaf is to be read. In this example, it is-the address of the structure staff. Once the branch is defined,-the script reads the data from the ASCII file into the `staff_t`-structure and fills the tree. The ASCII file is closed, and the ROOT file is written to-disk saving the tree. Trees and histograms are created in the current directory, which is-the ROOT file in our example. Hence an `f->Write()` saves the tree.--{% highlight C++ %}-{-// Create the structure to hold the variables for the branch.-   struct staff_t {-   Int_t cat;-   Int_t division;-   Int_t flag;-   Int_t age;-   Int_t service;-   Int_t children;-   Int_t grade;-   Int_t step;-   Int_t nation;-   Int_t hrweek;-   Int_t cost;-   };-   staff_t staff;--// Open the ASCII file.-   FILE *fp = fopen("staff.dat","r");-   char line[81];--// Create a new ROOT file.-   TFile *f = new TFile("staff.root","RECREATE");--// Create a TTree.-   TTree *tree = new TTree("T","Staff data from ASCII file");--// Create one branch with all information from the structure.-   tree->Branch("staff",&staff.cat,"cat/I:division:flag:age:service:-   children:grade:step:nation:hrweek:cost");--// Fill the tree from the values in ASCII file.-   while (fgets(&line,80,fp)) {-      sscanf(&line[0],"%d%d%d%d",&staff.cat,&staff.division,-      &staff.flag,&staff.age);-      sscanf(&line[13],"%d%d%d%d",&staff.service,&staff.children,-      &staff.grade,&staff.step);-      sscanf(&line[24],"%d%d%d",&staff.nation,&staff.hrweek,-      &staff.cost);-      
tree->Fill();-   }--// Check what the tree looks like.-   tree->Print();-   fclose(fp);-   f->Write();-}-{% endhighlight %}--<p><a name="example-building-a-tree-from-an-ascii-file"></a></p>-_**Example: Building a tree from an ASCII file**_---The tutorial {% include tutorial name="cernbuild" %} provides an example how to build a {% include ref class="TTree" %} from an ASCII file.-The input file is `cernstaff.dat` that contains statistics about the staff at CERN.--The `cernbuild.C` ROOT macro creates a root file (`cernstaff.root`) and prints the tree `T` and its branches with [TTree::Print()](https://root.cern/doc/master/classTTree.html#a7a0006d38d5066b533e040aa16f97094){:target="_blank"}.--{% highlight C++ %}-root [0] .x cernbuild.C-******************************************************************************-*Tree    :T         : CERN 1988 staff data                                   *-*Entries :     3354 : Total =          176339 bytes  File  Size =      15005 *-*        :          : Tree compression factor =   2.74                       *-******************************************************************************-*Br    0 :Category  : Category/I                                             *-*Entries :     3354 : Total  Size=      14073 bytes  One basket in memory    *-*Baskets :        0 : Basket Size=      32000 bytes  Compression=   1.00     *-*............................................................................*-*Br    1 :Flag      : Flag/i                                                 *-*Entries :     3354 : Total  Size=      14049 bytes  One basket in memory    *-*Baskets :        0 : Basket Size=      32000 bytes  Compression=   1.00     *-*............................................................................*-*Br    2 :Age       : Age/I                                                  *-*Entries :     3354 : Total  Size=      14043 bytes  One basket in memory    *-*Baskets :        0 : Basket Size=      32000 bytes  Compression=   1.00 
    *-*............................................................................*-*Br    3 :Service   : Service/I                                              *-*Entries :     3354 : Total  Size=      14067 bytes  One basket in memory    *-*Baskets :        0 : Basket Size=      32000 bytes  Compression=   1.00     *-*............................................................................*-...-...-...-{% endhighlight %}--### Creating a tree from a folder structure--You can build a folder structure and create a tree with branches for each of the sub-folders.--_**Example**_--`TTree folder_tree("MyFolderTree","/MyFolder");`--`MyFolder` is the top folder. `/` indicates the {% include ref class="TTree" %}  constructor that a folder is being used.-You can fill the tree by placing the data into the folder structure and then calling the [TTree::Fill()](https://root.cern/doc/master/classTTree.html#a00e0c422f5e4f6ebcdeef57ff23e9067){:target="_blank"} method.---### Filling a tree+> **RDataFrame**+>+> To access TTree data, please use {% include ref class="RDataFrame" namespace="ROOT" %}.+> `TTree` provides interfaces for low-level, expert usage. 
-- Use the [TTree:Fill()](https://root.cern/doc/master/classTTree.html#a00e0c422f5e4f6ebcdeef57ff23e9067){:target="_blank"} method to fill a {% include ref class="TTree" %} instance.+### The tree and its data -A loop on all defined branches (see → [Branches](#branches)) is executed.+A `TTree` behaves like an array of a data structure that resides on storage - except for one entry (or row, in database language).+That entry is accessible in memory: you can load any tree entry, ideally sequentially.+You can provide your own storage for the values of the columns of the current entry, in the form of variables.+In this case you have to tell the `TTree` about the addresses of these variables; either by calling [`TTree::SetBranchAddress()`](https://root.cern/doc/master/classTTree.html#a39b867210e4a77ef44917fd5e7898a1d), or by passing the variable when creating the branch for writing.+When "filling" (writing) the `TTree`, it will read the values out of these variables;+when reading back a `TTree` entry, it will write the values it read from storage into your variables. -### Writing a tree+### Branches and leaves -The data of a tree are saved in a ROOT file (see → [ROOT files]({{ '/manual/root_files' | relative_url }})).+A tree consists of a list of independent columns, called branches. A branch can contain values of any fundamental type, C++ objects known to ROOT's type system, or collections of those.+When reading a tree, you can select which subset of branches should be read.+This allows you to optimize read throughput for a given analysis, and is one of the main motivations for storing data in columnar format. -- Use the [TTree::Write()](https://root.cern/doc/master/classTTree.html#af6f2d9ae4048ad85fcae5d2afa05100f){:target="_blank"} method to write the tree into a ROOT file.+Branches are represented by {% include ref class="TBranch" %} and its derived classes. 
-The `TTree::Write()` method is needed to write the ROOT file header.+While `TBranch` represents structure, objects inheriting from {% include ref class="TLeaf" %} give access to the actual data.+Originally, any columnar data was accessible through a `TLeaf`; these days, some of the `TBranch`-derived classes provide data access themselves, such as {% include ref class="TBranchElement" %}. -When writing a {% include ref class="TTree" %} to a ROOT file and if the ROOT file size reaches the value stored in the [TTree::GetMaxTreeSize()](https://root.cern/doc/master/classTTree.html#aca38baf017a203ddb3119a9ab7283cd9){:target="_blank"}, the current ROOT file is closed and a new ROOT file is created. If the original ROOT file is named `myfile.root`, the subsequent ROOT files are named `myfile_1.root`, `myfile_2.root`, etc.+### Baskets, clusters and the tree header -**Autosave**+Every branch or leaf stores the data for its entries in buffers of a size that can be specified during branch creation (default: 32000 bytes).+Once the buffer is full, it gets compressed; the compressed buffer is called a _basket_.+These baskets are written into the ROOT file.+Branches with more data per tree entry will fill more baskets than branches with less data per tree entry.+Conversely, baskets can hold many tree entries if their branch stores only a few bytes per tree entry.+This means that generally, all baskets - also of different branches - will contain data of different tree entry ranges. -`Autosave` gives you the option to save all branch buffers every n byte. It is recommended to use `Autosave` for large acquisitions. 
If the acquisition fails to complete, you can recover the ROOT file and all the contents since the last `Autosave`.+To allow more efficient pre-fetching and better chunking of tree data stored in ROOT files, TTree groups baskets into _clusters_.+A cluster contains all the data of a given entry range.+Trees will close baskets that are not yet full when reaching the tree entry at a cluster boundary. -- Use the [TTree::SetAutosave()](https://root.cern/doc/master/classTTree.html#a76259576b0094536ad084cde665c13a8){:target="_blank"} method to set the number of bytes between `Autosave`.+TTree finds the baskets for a given entry for a given branch by means of a _header_ stored in the file.+This header also contains other auxiliary metadata.+When reading a `TTree` object, only this header is actually deserialized, until the tree's entries are loaded.+Multiple updates of these headers can often be found in files (`treename;1`, `treename;2` etc., called cycles, see → [I/O]({{ '/manual/root_io' | relative_url }})).+Only the last one (also accessible as `treename`) knows about all written baskets. -You can also use [TTree::SetAutosave()](https://root.cern/doc/master/classTTree.html#a76259576b0094536ad084cde665c13a8){:target="_blank"} in the acquisition loop every n entry. 
-### Printing the summary of a tree+### `TNtuple`, the high-performance spread-sheet -- Use the [TTree::Print(Option_t * option = "")](https://root.cern/doc/master/classTTree.html#a7a0006d38d5066b533e040aa16f97094){:target="_blank"} method to print a summary of the tree contents.--- `option = "all"`: Friend trees are also printed.-- `option = "toponly"`:  Only the top level branches are printed.-- `option = "clusters"`: Information about the cluster of baskets is printed.+For convenience, ROOT also provides the {% include ref class="TNtuple" %} class which is a tree whose branches contain only numbers of type `float`, one per tree entry.+It derives from {% include ref class="TTree" %} and is constructed with a list of column names separated by `:`.  _**Example**_  {% highlight C++ %}-root[] TFile f("cernstaff.root")-root[] T->TTree::Print()--******************************************************************************-*Tree    :T         : CERN 1988 staff data                                   *-*Entries :     3354 : Total =          175531 bytes  File  Size =      47246 *-*        :          : Tree compression factor =   3.69                       *-******************************************************************************-*Br    0 :Category  : Category/I                                             *-*Entries :     3354 : Total  Size=      13985 bytes  File Size  =       4919 *-*Baskets :        1 : Basket Size=      32000 bytes  Compression=   2.74     *-*............................................................................*-*Br    1 :Flag      : Flag/i                                                 *-*Entries :     3354 : Total  Size=      13965 bytes  File Size  =       2165 *-*Baskets :        1 : Basket Size=      32000 bytes  Compression=   6.23     *-*............................................................................*-*Br    2 :Age       : Age/I                                                  *-*Entries :     3354 : Total  Size=      
13960 bytes  File Size  =       3489 *-*Baskets :        1 : Basket Size=      32000 bytes  Compression=   3.86     *-*............................................................................*-*Br    3 :Service   : Service/I                                              *-*Entries :     3354 : Total  Size=      13980 bytes  File Size  =       2214 *-...-...-...+// Create an n-tuple with the columns `Potential`, `Current`, `Temperature`, `Pressure`,+// each holding one `float` per tree entry.+TNtuple ntp("ntp","Example N-Tuple","Potential:Current:Temperature:Pressure"); {% endhighlight %} -### Showing an entry of a tree--- Use the [TTree::Show()](https://root.cern/doc/master/classTTree.html#a10e5e7424059bc7d17502331b41b0c16){:target="_blank"} method to access one entry of a tree.--_**Example**_ -Showing an entry from the `cernstaff.root` file (see → [Building a tree from an ASCII file](#example-building-a-tree-from-an-ascii-file)).+## Writing a tree -{% highlight C++ %}-root[] TFile f("cernstaff.root")-root[] T->Show(42)--======> EVENT:42-Category = 301-Flag = 13-Age = 56-Service = 31-Children = 0-Grade = 9-Step = 8-Hrweek = 40-Cost = 8645-Division = EP-Nation = CH-{% endhighlight %}--### Scanning trees--- Use the [TTree::Scan()](https://root.cern/doc/master/classTTree.html#af8a886acab51b16d8ddbf65667c035e4){:target="_blank"} method to display all values of the list of leaves.+When writing a `TTree`, you first want to create a `TFile`+(see → [ROOT files]({{ '/manual/root_files' | relative_url }})).+Then construct the `TTree` to be stored in the file; we will later add branches to the tree.  
_**Example**_ -Scanning the `cernstaff.root` file (see → [Building a tree from an ASCII file](#example-building-a-tree-from-an-ascii-file)).- {% highlight C++ %}-   root[] TFile f("cernstaff.root")-   root[] T->Scan("Cost:Age:Children")--   ************************************************-   *    Row *    Cost *       Age *    Children   *-   ************************************************-   *     0 *    11975 *        58 *             0 *-   *     1 *    10228 *        63 *             0 *-   *     2 *    10730 *        56 *             2 *-   *     3 *     9311 *        61 *             0 *-   *     4 *     9966 *        52 *             2 *-   *     5 *     7599 *        60 *             0 *-   *     6 *     9868 *        53 *             1 *-   *     7 *     8012 *        60 *             1 *-   *     8 *     8813 *        51 *             0 *-   *     9 *     7850 *        56 *             1 *-   *    10 *     7599 *        51 *             0 *-   *    11 *     9315 *        54 *             2 *-   *    12 *     7599 *        54 *             0 *-   *    13 *     7892 *        46 *             0 *-   *    14 *     7850 *        54 *             1 *-   *    15 *     7599 *        57 *             0 *-   *    16 *     8137 *        55 *             0 *-   *    17 *     7850 *        55 *             1 *-   *    18 *     7294 *        57 *             1 *-   *    19 *     8101 *        51 *             2 *-   *    20 *     5720 *        54 *             0 *-   *    21 *    15832 *        57 *             1 *-   *    22 *    12226 *        63 *             1 *-   *    23 *    13135 *        56 *             0 *-   *    24 *     9617 *        49 *             0 *+std::unique_ptr<TFile> myFile( TFile::Open("file.root", "RECREATE") );+auto tree = std::make_unique<TTree>("tree", "The Tree Title"); {% endhighlight %} -### Indexing trees--- Use [TTree::BuildIndex()](https://root.cern/doc/master/classTTree.html#a3f6b5bb591ff7a5bd0b06eea6c12b998){:target="_blank"} 
method to build an index table using expressions depending on the value in the leaves.--The index is built in the following way:-- A pass on all entries is made like in [TTree::Draw()](https://root.cern/doc/master/classTTree.html#a3f6b5bb591ff7a5bTTree::Draw()d0b06eea6c12b998){:target="_blank"}.-- `var1` = `majorname`-- `var2` = `minorname`-- `sel = 2^31` × _majorname_ + _minorname_-- For each entry in the tree the `sel` expression is evaluated and the result array is sorted into `fIndexValues`.--Once the index is calculated, an entry can be retrieved with-[TTree::GetEntryWithIndex(majornumber, minornumber)](https://root.cern/doc/master/classTTree.html#a3f6b5bb591ff7a5bTTree::Draw()d0b06eea6c12b998){:target="_blank"}.--_**Example**_--{% highlight C++ %}-// To create an index using the leaves "Run" and "Event".-   tree.BuildIndex("Run","Event");--// To read entry corresponding to Run=1234 and Event=56789.-   tree.GetEntryWithIndex(1234,56789);-   {% endhighlight %}--Note that `majorname` and `minorname` can be expressions using original tree variables e.g., `"run-90000"` or `"event +3*xx"`.--In case an expression is specified, the equivalent expression must be computed when calling-[TTree::GetEntryWithIndex(majornumber, minornumber)](https://root.cern/doc/master/classTTree.html#a3f6b5bb591ff7a5bTTree::Draw()d0b06eea6c12b998){:target="_blank"}.-To build an index with only `majorname`, specify `minorname="0"` (default).--Once the index is built, it can be saved with the `TTree` object with `tree.Write()`.--The most convenient place to create the index is at the end of the filling process just before saving the tree header. If a previous index was calculated, it will be redefined by this new call.+{% highlight Python %}+myFile = ROOT.TFile.Open("file.root", "RECREATE")+tree = ROOT.TTree("tree", "The Tree Title")+{% endhighlight %} -Note that this function can also be applied to a {% include ref class="TChain" %}. 
The return value is the number of entries in the Index (< 0 indicates failure).+### Creating branches -## Tree Viewer+There are multiple ways to add branches to a `TTree`; the most commonly used ones are covered here.+More extensive documentation can be found in the [reference manual](https://root.cern.ch/doc/master/classTTree.html#creatingattreetoc). -With the Tree Viewer you can examine a tree in a GUI.+> **Note**+>+> Do *not* use the {% include ref class="TBranch" %} constructor to add a branch to a tree.  > **Note** >-> You can also use the ROOT Object Browser to examine a tree that is saved in a ROOT file. See → [ROOT Object Browser]({{ '/manual/root_files#root-object-browser' | relative_url }}).+> The objects *and* variables used to create branches must not be destroyed until the `TTree` is deleted or `TTree::ResetBranchAddress()` is called.+> If the address of the data to be filled changes with each tree entry, you have to inform the branch about the new address with [TBranch::SetAddress](https://root.cern/doc/master/classTBranch.html#a63e019ffc9c53ba249bd729da6a78657){:target="_blank"} before filling the tree again. -- Use the {% include ref class="TTreeViewer" %} class to open the ROOT file (containing the tree) in the Tree Viewer. -_**Example**_+**1. Branches holding basic types** -Open the Tree Viewer for the `cernstaff.root` file (see → [Building a tree from an ASCII file](#example-building-a-tree-from-an-ascii-file)) that contains the tree `T`.+If you have a variable of type `int`, `float`, `bool`, or any other basic type, you can create a branch (and a leaf) from it.+For fundamental datatypes, the type can be deduced from the variable and the name of the leaf will be set to the name of the branch.+In Python, that type information is not available and the leaf name and data type must be specified as third argument.+Further details are explained in the [reference guide](https://root.cern.ch/doc/master/classTTree.html#addcolumnoffundamentaltypes).  
{% highlight C++ %}-   root[] TFile f("cernstaff.root")-   root[] new TTreeViewer("T")+float var;+tree->Branch("branch0", &var); {% endhighlight %} -{% include figure_image-img="tree_viewer.png"-caption="Tree Viewer."-%}--The left panel contains the list of trees and their branches. The right panel displays the leaves or variables in the tree.+{% highlight Python %}+# Provide a one-element array, so ROOT can read data from this memory. +from array import array+var = array('f', [ 0 ])+tree.Branch("branch0", var, "leafname/F")+{% endhighlight %}+<br/>+**2. Branches holding class type** -### Drawing correlating variables in a scatterplot+You can create a branch holding one of ROOT's classes, or your own type for which you have provided a dictionary (see → [I/O]({{ '/manual/root_io' | relative_url }})). -You can show the correlation between the variables, listed in the {% include ref class="TTreeViewer" %}, by drawing a scatterplot.+_Splitting_ -- Select a variable in the {% include ref class="TTreeViewer" %}  and drag it to the `X:-empty-` entry.-- Select a second variable and drag it to the `Y:-empty-` entry.+If requested, TTree will create (sub-) branches for each member of a class and its base classes.+If such a member is a class itself, that member's type can also be split.+The recursion level of nested splitting is called the "split level"; it can be configured during branch creation. -{% include figure_image-img="variables_for_scatterplot_small.png"-caption="Variables Age and Cost selected for the scatterplot."-%}+If the split level is set to 0, there is no splitting: all data members are stored in the same branch.+Data members can also be configured to be non-split as part of the dictionary; see → [I/O]({{ '/manual/root_io' | relative_url }}).+The default split level of 99 means to split all members at any recursion level. 
-- Click `Scatterplot`.+_Pointers_ -{% include figure_image-img="scatterplot-icon.png"-caption="Scatterplot icon."-%}--The scatterplot is drawn.+While references `X &` are not supported as member types, pointers are.+If the pointer is non-null, ROOT stores the object pointed to (pointee).+If multiple pointers within the same branch point to the same object during one `TBranch::Fill()` operation (as invoked by `TTree::Fill()`), that pointee will only be stored once; upon reading, all pointers will again point to the same object. -{% include figure_jsroot-   file="trees.root" object="CostAge" width="500px" height="350px"-   caption="Scatterplot of the variables Age and Cost."-%}--Note that not each `(x,y) point on a scatterplot represents two values in your N−tuple. In fact, the scatterplot is a grid and each square in-the grid is randomly populated with a density of dots that’s proportional to the number of values in that grid.+For the general case, indices into object collections could be persistified instead of pointers.+This way, the object is only stored once. -## Branches--You can organize columns, this is branches, of a tree with the {% include ref class="TBranch" %} class. A variable on a `TBranch` is called a leaf ({% include ref class="TLeaf" %}). If two variables are independent and it is certain that the variables will not be used together, they should be placed on separate branches.--The branch type differs by what is stored in it. 
A branch can contain the following data:--- an entire object,-- a list of simple variables,-- contents of a folder,-- contents of a {% include ref class="TList" %},-- an array of objects.--If two variables are independent and the variables will not be used together, place them on separate branches.-If the variables are related, such as the coordinates of a point, create one branch with both coordinates on it.--### Adding a branch+_**Example**_ -- Use the following syntax of the [TTree::Branch()](https://root.cern/doc/master/classTTree.html#ab47499eeb7793160b20fa950f4de716a){:target="_blank"} method to add a {% include ref class="TBranch" %} to a tree:+ROOT's class {% include ref class="TNamed" %} has the data members `fName` and `fTitle`.+The following requests the tree to create a branch for each of them.+As `TNamed` derives from `TObject`, branches for `TObject`'s data members will also be created.  {% highlight C++ %}-   auto branch = tree.Branch(branchname, address, leaflist, bufsize)+TNamed var;+const int splitLevel = 99; // "all the way"+// The third argument is the buffer size (32000 bytes by default); the fourth is the split level.+tree->Branch("branch0", &var, 32000, splitLevel); {% endhighlight %}+<br/>+**3. Branches holding `std::vector`, `std::array`, `std::list`, etc.** -`address` is the address of the first item of a structure.-`leaflist` is the concatenation of all the variable names and types separated by a colon character. The variable name and the variable type are separated by a slash (/). The variable type must be one character.-For more information on adding a branch to tree, see → {% include ref class="TTree" %}.+Both top-level branches (those created by a call to `TTree::Branch()`) and branches created by splitting data members can hold collections such as `std::vector`, `std::array`, `std::list`, or `std::map`.+Splitting can traverse through collections:+if a member is a `std::vector<X>`, the tree can split `X` into sub-branches, too. 
-> **Note**->-> Do *not* use the {% include ref class="TBranch" %} constructor to add a branch to a tree.+Such collections can also contain pointers.+For polymorphic pointees, ROOT will not just stream the base, but determine the actual object type.+If the split level is `TTree::kSplitCollectionOfPointers` then the pointees will be written in split mode, possibly adding new branches as new polymorphic derived types are encountered. -### Adding a branch with a folder--- Use the following syntax to add a branch with a folder:+### Filling a tree -{% highlight C++ %}-   tree->Branch("/aFolder")-{% endhighlight %}+Use [TTree:Fill()](https://root.cern/doc/master/classTTree.html#a00e0c422f5e4f6ebcdeef57ff23e9067){:target="_blank"} to add a new entry (or "row") to the tree, and store the current values of the variables that were provided during branch creation. -This creates one branch for each element in the folder. The method returns the total number of branches created.+### Writing the tree header -### Adding a branch with STL collections+Use [TTree::Write()](https://root.cern/doc/master/classTTree.html#af6f2d9ae4048ad85fcae5d2afa05100f){:target="_blank"} to write the tree header into a ROOT file.+Earlier entries' data might already be written as part of `TTree::Fill()`. -A `STLcollection` is a address of a pointer to `std::vector`, `std::list`, `std::deque`, `std::set` or `std::multiset` containing pointers to objects.+If due to the data written during `TTree::Fill()`, the file's size increases beyond [TTree::GetMaxTreeSize()](https://root.cern/doc/master/classTTree.html#aca38baf017a203ddb3119a9ab7283cd9){:target="_blank"}, the current ROOT file is closed and a new ROOT file is created.+For an original ROOT file named `myfile.root`, the subsequent ROOT files are named `myfile_1.root`, `myfile_2.root`, etc. 
-- Use the following syntax of the [TTree::Branch()](https://root.cern/doc/master/classTTree.html#ab47499eeb7793160b20fa950f4de716a){:target="_blank"} method to add a `STLcollection`:+_**Example**_  {% highlight C++ %}-   auto branch = tree.Branch(branchname, STLcollection, buffsize, splitlevel);- {% endhighlight %}+std::unique_ptr<TFile> myFile( TFile::Open("file.root", "RECREATE") );+auto tree = std::make_unique<TTree>("tree", "The Tree Title"); -If the `splitlevel` is a value bigger than 100 [TTree::kSplitCollectionOfPointers](https://root.cern/doc/master/classTTree.html#a6d07819a66bb97bafd460adfad555114ae3b257c9ade74c1a53383d800c0a708c){:target="_blank"}  then the `STLcollection` will be written in split mode.+float var;+tree->Branch("branch0", &var); -If a dynamic structures changes with each entry, you have to redefine the branch address with [TBranch::SetAddress](https://root.cern/doc/master/classTBranch.html#a63e019ffc9c53ba249bd729da6a78657){:target="_blank"}  before filling the branch again.--### Adding a branch with objects--- Use the following syntax of the [TTree::Branch()](https://root.cern/doc/master/classTTree.html#ab47499eeb7793160b20fa950f4de716a){:target="_blank"} method to add objects to a tree:+for (int iEntry = 0; iEntry < 1000; ++iEntry) {+   var = 0.3 * iEntry;+   // Fill the current value of `var` into `branch0`+   tree->Fill();+} -{% highlight C++ %}-   MyClass object;-   auto branch = tree.Branch(branchname, &object, bufsize, splitlevel)+// Now write the header+tree->Write(); {% endhighlight %} -`&object` must be the address of a valid object. 
The object must not be destroyed (this is be deleted)-until the {% include ref class="TTree" %} is deleted or-[TTree::ResetBranchAddress](https://root.cern/doc/master/classTTree.html#a181eb19c03433781fde2fa94443710dc){:target="_blank"}-is called.--The following values are available for the `splitlevel`:--`splitlevel=0`<br>-The object is serialized in the branch buffer.+{% highlight Python %}+from array import array+import ROOT -`splitlevel=1 (default)`<br>-This branch is automatically into sub-branches, with one sub-branch for each-data member or object of the object itself. If the object member is a [TClonesArray](https://root.cern/doc/master/classTClonesArray.html){:target="_blank"}, it is processed as it is with `splitlevel=2`.+myFile = ROOT.TFile( ROOT.TFile.Open("file.root", "RECREATE") )+tree = ROOT.TTree("tree", "The Tree Title") -`splitlevel=2`<br>-This branch is automatically split into sub-branches, with one sub-branch for each-data member or object of the object itself. If the object member is a [TClonesArray](https://root.cern/doc/master/classTClonesArray.html){:target="_blank"} it is processed as a [TObject](https://root.cern/doc/master/classTObject.html){:target="_blank"}, but only for one branch.+# Provide a one-element array, so ROOT can read data from this memory. +var = array('f', [ 0 ])+tree.Branch("branch0", var, "leafname/F"); -### Adding a branch to an existing tree-You can add a branch to an existing tree.--_**Example**_+for iEntry in range(1000):+   var = 0.3 * iEntry+   # Fill the current value of `var` into `branch0`+   tree.Fill() -If one variable in the tree was computed with a certain algorithm, you may want to try another algorithm and compare the results. 
To do this, you can add a new branch, fill it, and save the tree.--{% highlight C++ %}-void tree3AddBranch() {-   TFile f("tree3.root", "update");-   Float_t new_v;-   auto t3 = f->Get<TTree>("t3");-   auto newBranch = t3->Branch("new_v", &new_v, "new_v/F");-   Long64_t nentries = t3->GetEntries();    // Read the number of entries in the t3.-   for (Long64_t i = 0; i < nentries; i++) {-      new_v = gRandom->Gaus(0, 1);-      newBranch->Fill();-   }-   t3->Write("", TObject::kOverwrite);       // Save only the new version of the tree.-}+# Now write the header+tree.Write() {% endhighlight %} -`kOverwrite` in the `Write()` method causes the tree to be overwritten.--## Examples for writing and reading trees+_AutoFlush_ -The following sections are examples of writing and reading trees that range in complexity from a simple tree with-a few variables to a tree with folders and complex event objects.+The tree can flush its data (i.e. its baskets) to file when reaching a given cluster size, thus closing the cluster.+By default this happens approximately every 30MB of compressed data.+The size can be adjusted using using [TTree::SetAutoFlush()](https://root.cern/doc/master/classTTree.html#ad4c7c7d70caf5657104832bcfbd83a9f){:target="_blank"}. -### A tree with a C structure+_AutoSave_ -> Tutorial->->  {% include tutorial name="tree2" %}+The tree can write a header update to file after it has collected a certain data size in baskets (by default, 300MB).+If your program crashes, you can recover the tree and its baskets written before the last autosave. -In this tutorial is shown:+You can adjust the threshold (in bytes or entries) using [TTree::SetAutoSave()](https://root.cern/doc/master/classTTree.html#a76259576b0094536ad084cde665c13a8){:target="_blank"}. 
-- how to build branches from a C structure-- how to make a branch with a fixed length array-- how to make a branch with a variable length array-- how to read selective branches-- how to fill a histogram from a branch-- how to [TTree::Draw](https://root.cern/doc/master/classTTree.html#ac4016b174665a086fe16695aad3356e2){:target="_blank"} to draw a 3D plot -### Adding friends to trees+## Reading a tree -> Tutorial+> **Note** >->  {% include tutorial name="tree3" %}--Adding a branch is often not possible because the tree is a read-only file and you do not have permission to save the modified tree with the new branch. Even if you do have the permission, you risk loosing the original tree with an unsuccessful attempt to save the modification. Since trees are usually large, adding a branch could extend it over the 2 GB limit. In this case, the attempt to write the tree fails, and the original data is may also be corrupted. In addition, adding a branch to a tree enlarges the tree and increases the amount of memory needed to read an entry, and therefore decreases the performance.--For these reasons ROOT offers the concept of friends for trees (and chains) by adding a branch manually with [TTree::AddFriend()](https://root.cern/doc/master/classTTree.html#a011d362261b694ee7dd780bad21f030b){:target="_blank"}.+> Please use {% include ref class="RDataFrame" namespace="ROOT" %} to read trees, unless you need to do low-level I/O! 
-The [TTree::AddFriend()](https://root.cern/doc/master/classTTree.html#a011d362261b694ee7dd780bad21f030b){:target="_blank"} method has two parameters, the first is the tree name and the second is the name of the ROOT file where the friend tree is saved.-[TTree::AddFriend()](https://root.cern/doc/master/classTTree.html#a011d362261b694ee7dd780bad21f030b){:target="_blank"} automatically opens the friend file.+To read a tree, you need to associate your variables with the tree's branches, as when writing.+When loading a tree entry, the tree will set the variables to the branch's value as read from the storage.+That is done by calling [`TTree::SetBranchAddress()`](https://root.cern/doc/master/classTTree.html#a39b867210e4a77ef44917fd5e7898a1d): +_**Example**_ -### Importing an ASCII file into a tree+{% highlight C++ %}+std::unique_ptr<TFile> myFile( TFile::Open("file.root") );+auto tree = myFile->Get<TTree>("TreeName"); -Use [TTree::ReadFile()](https://root.cern/doc/master/classTTree.html#a9c8da1fbc68221b31c21e55bddf72ce7){:target="_blank"} to automatically define the structure of the {% include ref class="TTree" %} and read the data from a formatted ASCII file.+int variable;+tree->SetBranchAddress("branchName", &variable); -_**Example**_+for (int iEntry = 0; tree->LoadTree(iEntry) >= 0; ++iEntry) {+   // Load the data for the given tree entry+   tree->GetEntry(iEntry); -{% highlight C++ %}-{-   gROOT->Reset();-   TFile *f = new TFile("basic2.root","RECREATE");-   TH1F *h1 = new TH1F("h1","x distribution",100,-4,4);-   TTree *T = new TTree("ntuple","data from ascii file");-   Long64_t nlines = T->ReadFile("basic.dat","x:y:z");-   printf(" found %lld pointsn",nlines);-   T->Draw("x","z>2");-   T->Write();+   // Now, `variable` is set to the value of the branch+   // "branchName" in tree entry `iEntry`+   printf("%d\n", variable); } {% endhighlight %} +In Python you can simply use the branch name as an attribute on the tree: -## Using trees for data analysis--The following 
methods are available for data analysis using trees:--- [TTree::Draw()](https://root.cern/doc/master/classTTree.html#ac4016b174665a086fe16695aad3356e2){:target="_blank"}+{% highlight Python %}+myFile = ROOT.TFile.Open("file.root")+myTree = myFile.TreeName+for entry in myTree:+   print(entry.branchName)+{% endhighlight %} -- [TTree::MakeClass()](https://root.cern/doc/master/classTTree.html#ac4ceaf4ae0b87412acf94093043cc2de){:target="_blank"} -- [TTree::MakeSelector()](https://root.cern/doc/master/classTTree.html#abe2c6509820373c42a88f343434cbcb4){:target="_blank"}+### Selecting a subset of branches to be read -### Using TTree:Draw()+You can select or deselect branches from being read by `GetEntry()` by calling [`TTree::SetBranchStatus()`](https://root.cern/doc/master/classTTree.html#aeca53bcd4482de1d883d259df6663920).+It is vividly recommended to only read the branches actually needed:+`TTree` is optimized for exactly this use case, and most analyses will only need a fraction of the available branches. -With the [TTree::Draw()](https://root.cern/doc/master/classTTree.html#ac4016b174665a086fe16695aad3356e2){:target="_blank"} method, you can easily plot a variable (a leaf).+{% highlight C++ %}+// Extract the tree as above. 
-_**Example**_+// Disable everything...+tree->SetBranchStatus("*", false);+// ...but the branch we need+tree->SetBranchStatus("branchName", true); -Open the `cernstaff.root` file (see → [Building a tree from an ASCII file](#example-building-a-tree-from-an-ascii-file)) and lists its content.+// Now proceed as above.+int variable;+tree->SetBranchAddress("branchName", &variable);+for (int iEntry = 0; tree->LoadTree(iEntry) >= 0; ++iEntry) {+   // Load the data for the given tree entry+   tree->GetEntry(iEntry); -{% highlight C++ %}-root [] TFile f("cernstaff.root")-root [] f.ls()-TFile**      cernstaff.root- TFile*      cernstaff.root-  KEY: TTree   T;1   CERN 1988 staff data+   printf("%d\n", variable);+} {% endhighlight %} -The `cernstaff.root` file contains the {% include ref class="TTree" %}  `T`. A pointer is created to the tree. -{% highlight C++ %}-   root [] TTree *MyTree = T-{% endhighlight %}+### Selecting a subset of entries to be read -To show the different `Draw()` options, a canvas with four sub-pads is created.+To process only a selection of tree entries, you can use a {% include ref class="TEntryList" %}.+First you insert the tree entry numbers you want to process into the `TEntryList`. -{% highlight C++ %}-   root [] TCanvas *myCanvas = new TCanvas()-   root [] myCanvas->Divide(2,2)+{% highlight Python %}+entryList = ROOT.TEntryList("entryListName", "Title of the entry list")+for entry in tree:+   if entry.missingET < 100:+      entryList.Enter(tree.GetReadEntry())+myFile = ROOT.TFile.Open("entrylist.root", "RECREATE")+myFile.WriteObject(entrylist) {% endhighlight %} -The first pad with is activated with [TCanvas::cd](https://root.cern/doc/master/classTCanvas.html#ad996aa7bc34186944363b48963de4de5){:target="_blank"}.+You can then re-use the `TEntryList` in subsequent processing of the tree, skipping irrelevant entries. 
-{% highlight C++ %}-   root [] myCanvas->cd(1)+{% highlight Python %}+myFile = ROOT.TFile.Open("entrylist.root")+entrylist = myFile.entryListName+tree.SetEntryList(entrylist)+for entry in tree:+   # all entries will have missingET < 100 {% endhighlight %} -The `Cost` variable is drawn. [TTree::Draw](https://root.cern/doc/master/classTCanvas.html#a2309e37a6471e07f9dad3e5af1fe5561){:target="_blank"}-automatically creates a histogram. The style of the histogram is inherited from the {% include ref class="TTree" %} attributes.--{% highlight C++ %}-   root [] MyTree->Draw("Cost")-{% endhighlight %}+## Appending `TTree`s as a `TChain` -{% include figure_jsroot-   file="trees.root" object="c1" width="500px" height="350px"-   caption="The variable `Cost` drawn in a histogram."-%}+In high energy physics you always want as much data as possible.+But it's not nice to deal with files of multiple terabytes.+ROOT allows to to split data across multiple files, where you can then access the files' tree parts as one large tree.+That's done through {% include ref class="TChain" %}, which inherits from {% include ref class="TTree" %}:+it wants to know the name of the trees in the files (which can be overridden when adding files), and the file names, and will act as if it was a huge, continuous tree: -Next, the second pad is activated and scatter plot is drawn. 
Two dimensions (here `Cost` and `Age`) are separated by a colon ("x:y").<br>-In general, this parameter is a string containing up to three expressions, one for each dimension, separated by a colon (“e1:e2:e3”).+_**Example**_  {% highlight C++ %}-   root [] myCanvas->cd(2)-   root [] MyTree->Draw("Cost:Age")+TChain chain("CommonTreeName");+if (chain.Add("data_*.root") != 12)+   std::cerr << "Expected to find 12 files!\n";+// Use `chain` as if it was a `TTree` {% endhighlight %} -{% include figure_jsroot-   file="trees.root" object="c2" width="500px" height="350px"-   caption="The variable `Cost` and `Age` drawn in a histogram."-%}+{% highlight Python %}+chain = ROOT.TChain("CommonTreeName")+if chain.Add("data_*.root") != 12:+   print("Expected to find 12 files!")+# Use `chain` as if it was a `TTree`+{% endhighlight %} -Next, the third pad is activated and a selection is added. `Cost` versus `Age` for the entries where the nation is equal to `"CH"` is drawn.<br>-You can use any C++ operator. The value of the selection is used as a weight when filling-the histogram. If the expression includes only Boolean operations the result is 0-(histogram is not filled) or 1 (histogram is filled).+## Widening a `TTree` through friends -{% highlight C++ %}-   root [] myCanvas->cd(3)-   root [] MyTree->Draw("Cost:Age","Nation == \"CH\"")-{% endhighlight %}+Trees are usually written just once.+While updating an existing tree is non-trivial, extending it with additional branches, potentially an "improved" version of an original branch, is straightforward.+"Friend trees" are added by calling [TTree::AddFriend()](https://root.cern/doc/master/classTTree.html#a321f2684de145cfcb01cabfce69ea910){:target="_blank"}.+Adding another tree called `T1` as a friend tree will make the branch `X` of `T1` available as both `T1.X` and - if `X` does not exist in the original tree - as `X`. 
-{% include figure_jsroot-   file="trees.root" object="c3" width="500px" height="350px"-   caption="The variable `Cost` and `Age` with a selection drawn in a histogram."-%}+Friend trees are expected to have at least as many entries as the original tree.+The order of the friend tree's entries must preserve the entry order of the original tree. -Next, the fourth pad is activated and the histogram is drawn with the draw option `surf2`.-Refer to the {% include ref class="THistPainter" %} class for possible draw options.+> **Note**+>+> Care must be taken to ensure that the order of entries in the primary tree matches friends' entries. This is especially relevant when processing a tree in parallel to generate a friend tree, as the entries might be written out in an undefined order (misaligned entries).
> Care must be taken to ensure that the order of entries in the primary tree matches friends' entries. This is especially relevant when processing a tree in parallel to generate a friend tree, as the entries might be written out in an undefined order (misaligned entries). Misaligned entries can be mitigated by building an index on the friend tree (`TTree::BuildIndex`).
Axel-Naumann

comment created time in 6 hours

issue opened root-project/root

Double delete during hadd tear down.

As reported in https://root-forum.cern.ch/t/pure-virtual-method-called-on-hadding-tfiles-with-tfriendelements/46836/6 (see there for a reproducer), hadd deletes already-deleted objects at tear down.

In versions of ROOT equal to or newer than v6.22/08 and v6.24/00, the problem appears only in a slow merge (for example hadd -f0 output.root testroot_1.root testroot_2.root).

The resulting stack trace is:

==1241== Process terminating with default action of signal 6 (SIGABRT)
==1241==    at 0x5BB618B: raise (raise.c:51)
==1241==    by 0x5B9592D: abort (abort.c:100)
==1241==    by 0x58C2910: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==1241==    by 0x58CE38B: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==1241==    by 0x58CE3F6: std::terminate() (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==1241==    by 0x58CF154: __cxa_pure_virtual (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28)
==1241==    by 0x551449A: TCollection::RecursiveRemove(TObject*) (TCollection.cxx:579)
==1241==    by 0x4A345D0: TTree::RecursiveRemove(TObject*) (TTree.cxx:7857)
==1241==    by 0x551F16F: TList::RecursiveRemove(TObject*) (TList.cxx:813)
==1241==    by 0x5518879: THashList::RecursiveRemove(TObject*) (THashList.cxx:354)
==1241==    by 0x543C003: TROOT::RecursiveRemove(TObject*) (TROOT.cxx:2455)
==1241==    by 0x496B26E: ROOT::CallRecursiveRemoveIfNeeded(TObject&) (TROOT.h:398)
==1241==    by 0x5512E61: TCollection::~TCollection() (TCollection.cxx:189)
==1241==    by 0x5506D19: TSeqCollection::~TSeqCollection() (TSeqCollection.h:37)
==1241==    by 0x551BE8E: TList::~TList() (TList.cxx:92)
==1241==    by 0x551BEAD: TList::~TList() (TList.cxx:95)
==1241==    by 0x4A1DE02: TTree::~TTree() (TTree.cxx:975)
==1241==    by 0x4A1E2C7: TTree::~TTree() (TTree.cxx:1023)
==1241==    by 0x4A308E9: TTree::Merge(TCollection*, TFileMergeInfo*) (TTree.cxx:6908)
==1241==    by 0x4960BA2: ROOT::merge_TTree(void*, TCollection*, TFileMergeInfo*) (G__Tree.cxx:4209)
==1241==    by 0x4E739D6: TFileMerger::MergeOne(TDirectory*, TList*, int, TFileMergeInfo&, TString&, THashList&, bool&, bool&, TString const&, TDirectory*, TFile*, TKey*, TObject*, TIter&) (TFileMerger.cxx:660)
==1241==    by 0x4E74D94: TFileMerger::MergeRecursive(TDirectory*, TList*, int) (TFileMerger.cxx:878)
==1241==    by 0x4E756AF: TFileMerger::PartialMerge(int) (TFileMerger.cxx:968)
==1241==    by 0x4E72210: TFileMerger::Merge(bool) (TFileMerger.cxx:372)
==1241==    by 0x119013: main::{lambda(TFileMerger&)#1}::operator()(TFileMerger&) const (hadd.cxx:473)
==1241==    by 0x119407: main::{lambda(TFileMerger&, int, int)#2}::operator()(TFileMerger&, int, int) const (hadd.cxx:501)
==1241==    by 0x11B82E: main (hadd.cxx:543)

created time in 9 hours

pull request comment root-project/root

[ROOT5] cint: implement a stat() call cache for G__matchfilename()

I doubt it (but maybe @Axel-Naumann can add more), as we are no longer really supporting v5 and, as you can see from the CI result, we no longer have a functioning test stand ...

veprbl

comment created time in a day

push event root-project/root

Dmitry Kalinkin

commit sha 7231e038b2519eee62446aea7be2becb8671fc57

cint: implement a stat() call cache for G__matchfilename() The G__matchfilename() function implements a file comparison check used specifically for loading/unloading of libraries and source code. On UNIX-like systems the basic filename comparison is supplemented by an additional file match condition based on comparing file attributes returned by the stat() syscall. On a typical load/unload call, G__matchfilename() is iterated over the items of G__srcfile, which produces a number of stat() calls that is quadratic in the number of loaded files. In our specific case we observe occasional poor performance on the AFS network filesystem. The suggested change introduces a cache for the stat() calls that should reduce the number of calls so that it scales linearly.


Dmitry Kalinkin

commit sha 1c1347b551a8ce9a51d6938c15587ccc4be6f4c0

cint: optimize cache lookup by using iterators


push time in a day

PR merged root-project/root

[ROOT5] cint: implement a stat() call cache for G__matchfilename()

The G__matchfilename() function implements a file comparison check used specifically for loading/unloading of libraries and source code. On UNIX-like systems the basic filename comparison is supplemented by an additional file match condition based on comparing file attributes returned by the stat() syscall. On a typical load/unload call, G__matchfilename() is iterated over the items of G__srcfile, which produces a number of stat() calls that is quadratic in the number of loaded files.

In our specific case we observe occasional poor performance on the AFS network filesystem. The suggested change introduces a cache for the stat() calls that should reduce the number of calls so that it scales linearly.

This Pull request:

Changes or fixes:

Fixes downstream issue star-bnl/star-sw#115

Checklist:

  • [x] tested changes locally
  • [ ] updated the docs (if necessary)
+43 -2

12 comments

1 changed file

veprbl

pr closed time in a day

issue comment cms-sw/cmssw

Support for std::array event products

@wddgit Thanks for the detailed information. I am looking into whether there is a good practical solution.

makortel

comment created time in a day

Pull request review comment root-project/root

[ROOT5] cint: implement a stat() call cache for G__matchfilename()

 int G__isfilebusy(int ifn)   return(flag); } +#if !defined(G__WIN32)++#include <errno.h>+#include <time.h>++struct G__StatCacheEntry {+  struct stat info;+  time_t ctime;+  int retcode, _errno;+};++#ifndef R__STAT_CACHING_TIME+#define R__STAT_CACHING_TIME    60+#endif++int G__cachingstat(const char *path, struct stat *buf) {+  static std::map<std::string, struct G__StatCacheEntry> stat_cache;++  if (   stat_cache.count(path)

Would it be possible to use the result of std::map::find rather than searching for the location 3 times (for the case where the path is found)?
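To illustrate the suggestion, here is a minimal sketch of a time-limited cache that searches the map only once when the path is already cached. It is not the code from the PR: `CacheEntry`, `ExpensiveLookup` and `CachedLookup` are hypothetical names, the expensive call merely stands in for stat(), and the sketch uses C++11 `auto`, which the ROOT 5 code base may not allow (the iterator type would then have to be spelled out).

```cpp
#include <cassert>
#include <ctime>
#include <map>
#include <string>

// Hypothetical stand-in for the cached stat() result; not the struct from the PR.
struct CacheEntry {
   int value;         // cached payload
   std::time_t stamp; // time at which the entry was stored
};

// Hypothetical expensive call standing in for stat(); it counts its invocations.
inline int &ExpensiveCallCount()
{
   static int n = 0;
   return n;
}

inline int ExpensiveLookup(const std::string &path)
{
   ++ExpensiveCallCount();
   return static_cast<int>(path.size());
}

// Return the value for `path`, refreshing it once it is `maxAge` seconds old or older.
inline int CachedLookup(const std::string &path, std::time_t now, std::time_t maxAge = 60)
{
   static std::map<std::string, CacheEntry> cache;
   assert(maxAge >= 0);

   auto it = cache.find(path); // the only search when the path is cached
   if (it != cache.end() && now - it->second.stamp < maxAge)
      return it->second.value;

   CacheEntry fresh{ExpensiveLookup(path), now};
   if (it != cache.end())
      it->second = fresh; // reuse the iterator instead of searching again
   else
      cache.emplace(path, fresh);
   return fresh.value;
}
```

Calling `CachedLookup("a.C", 100)` and then `CachedLookup("a.C", 120)` triggers the expensive call once; a later call with `now = 200` refreshes the entry through the iterator that `find` already returned.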

veprbl

comment created time in a day

PullRequestReviewEvent
PullRequestReviewEvent

Pull request review comment root-project/web

[Docs] Rewrite 'Trees' section

 The scatterplot is drawn. Note that not each `(x,y) point on a scatterplot represents two values in your N−tuple. In fact, the scatterplot is a grid and each square in the grid is randomly populated with a density of dots that’s proportional to the number of values in that grid. -### Indexing trees--- Use [TTree::BuildIndex()](https://root.cern/doc/master/classTTree.html#a3f6b5bb591ff7a5bd0b06eea6c12b998){:target="_blank"} method to build an index table using expressions depending on the value in the leaves.--The index is built in the following way:-- A pass on all entries is made like in [TTree::Draw()](https://root.cern/doc/master/classTTree.html#a3f6b5bb591ff7a5bTTree::Draw()d0b06eea6c12b998){:target="_blank"}.-- `var1` = `majorname`-- `var2` = `minorname`-- `sel = 231` × _majorname_ + _minorname_-- For each entry in the tree the `sel` expression is evaluated and the result array is sorted into `fIndexValues`.--Once the index is calculated, an entry can be retrieved with+### Indexing a tree++- Use [TTree::BuildIndex()](https://root.cern/doc/master/classTTree.html#a3f6b5bb591ff7a5bd0b06eea6c12b998){:target="_blank"} to build an index table over expressions that depend on the value in the leaves.+This index is similar to database indexes:+it allows to quickly determine the tree entry number corresponding to the value of an expression.+These expressions should be both equality comparable (that is, not use floating point numbers where precision might cause the index lookup to fail) and unique, to make sure you get the tree entry you expect.+For high-energy physics, a common example could be a combination of run number and event number:+while each one of them might have duplications, their combination is guaranteed to be unique. ++To build an index, define a major and optionally a minor expression, for instance above `Run` and `Event`.

No, they really are expressions (e.g. run%1000, event_part_1 * 1000 + event_part_2).
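As a ROOT-free illustration of what "index over expressions" means, the sketch below evaluates the two example expressions for every entry and maps the resulting (major, minor) value pair to the entry number. It is only a conceptual model, not how `TTree::BuildIndex()` is implemented; `Entry`, `EvalExpressions`, `BuildIndexSketch` and `FindEntry` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical per-entry data; the field names follow the example expressions above.
struct Entry {
   long long run;
   long long event_part_1;
   long long event_part_2;
};

using IndexKey = std::pair<long long, long long>; // (major value, minor value)

// Evaluate the major and minor expressions for one entry.
inline IndexKey EvalExpressions(const Entry &e)
{
   return {e.run % 1000, e.event_part_1 * 1000 + e.event_part_2};
}

// One pass over all entries, mapping each (major, minor) value pair to its entry number.
inline std::map<IndexKey, long long> BuildIndexSketch(const std::vector<Entry> &entries)
{
   std::map<IndexKey, long long> index;
   for (long long i = 0; i < static_cast<long long>(entries.size()); ++i) {
      bool inserted = index.emplace(EvalExpressions(entries[i]), i).second;
      assert(inserted && "expression values should be unique per entry");
      (void)inserted; // silence the unused-variable warning when NDEBUG is defined
   }
   return index;
}

// Return the entry number for the given expression values, or -1 if there is none.
inline long long FindEntry(const std::map<IndexKey, long long> &index, long long major, long long minor)
{
   auto it = index.find({major, minor});
   return it == index.end() ? -1 : it->second;
}
```

The key is built from the values of the expressions, not from the raw branches, which is why the wording "expressions" matters: two entries with different run numbers can share the same major value `run % 1000`.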

Axel-Naumann

comment created time in 4 days

PullRequestReviewEvent
PullRequestReviewEvent

Pull request review comment root-project/root

[ntuple] Overhaul tuning and default settings when writing

 void ROOT::Experimental::RNTupleWriter::CommitCluster()       field.Flush();       field.CommitCluster();    }-   fSink->CommitCluster(fNEntries);+   float nbytes = fSink->CommitCluster(fNEntries);+   // Cap the compression factor at 1000 to prevent overflow of fMinUnzippedClusterSizeEst+   float compressionFactor = std::min(1000.f, static_cast<float>(fUnzippedClusterSize) / nbytes);+   fUnzippedClusterSizeEst =

which is exact within the limits of the size of a single entry.

It cannot be exact: the estimate "does the next entry bring us over the limit" can also be wrong (because of the size variability of collections), and since the cluster "must" finish whatever entry it started, if a collection in the last entry of the cluster is unusually large, you might have to go over the limit, right?

It would be also easy to diagnose

In practice, it is (too often) harder than one would expect. For example, the failure mode might be that some clusters are smaller than they should be (the compression guess was too low), resulting in a pervasive but not huge performance degradation (too many small reads from the file) that might become a "noticeable problem" only under "extreme conditions" [e.g. thousands of HPC nodes reading from a high-latency remote source].
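The overshoot argument can be made concrete with a small simulation. This is a hypothetical model, not the RNTupleWriter logic: `ClusterSizesSketch` closes a cluster when the predicted size of the next entry would exceed the target, but always adds the entry's real size in full because an entry cannot be split across clusters.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of cluster filling; all sizes are in arbitrary units.
// Returns the resulting size of each cluster.
inline std::vector<std::size_t> ClusterSizesSketch(const std::vector<std::size_t> &actualEntrySizes,
                                                   std::size_t estimatedEntrySize, std::size_t targetSize)
{
   assert(targetSize > 0);
   std::vector<std::size_t> clusters;
   std::size_t current = 0;
   for (std::size_t actual : actualEntrySizes) {
      // Prediction: would the next entry bring the cluster over the limit?
      if (current > 0 && current + estimatedEntrySize > targetSize) {
         clusters.push_back(current);
         current = 0;
      }
      // An entry cannot be split across clusters, so its real size is added in full.
      current += actual;
   }
   if (current > 0)
      clusters.push_back(current); // last, possibly undersized, cluster
   return clusters;
}
```

With entry sizes {40, 40, 500, 40}, an estimate of 40 and a target of 100, the second cluster ends up with size 500: the prediction was reasonable for typical entries, but the one entry with an unusually large collection still pushes the cluster far over the target.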

jblomer

comment created time in 5 days

Pull request review comment root-project/root

[ntuple] Overhaul tuning and default settings when writing

 void ROOT::Experimental::RNTupleWriter::CommitCluster()       field.Flush();       field.CommitCluster();    }-   fSink->CommitCluster(fNEntries);+   float nbytes = fSink->CommitCluster(fNEntries);+   // Cap the compression factor at 1000 to prevent overflow of fMinUnzippedClusterSizeEst+   float compressionFactor = std::min(1000.f, static_cast<float>(fUnzippedClusterSize) / nbytes);+   fUnzippedClusterSizeEst =

The problem you describe is not tied to collections,

Yes, but collections have the potential to amplify the consequences of a missed estimate.

jblomer

comment created time in 5 days

PullRequestReviewEvent

pull request comment root-project/root

Added typedefs Float16_t and Double32_t to TDataType

Unfortunately the projectroot.roottest.root.io.double32.roottest_root_io_double32_make failure is a real issue. Surprisingly, it fails to merge some consecutive Double32_t data members that it was able to merge before (so the issue appears in TStreamerInfo::Compile):

-   i= 3, ff2             type= 29, offset= [deleted from log], len=4, method= [deleted from log] [optimized]
-   i= 4, ff4             type= 49, offset= [deleted from log], len=1, method= [deleted from log]
+   i= 3, ff2             type= 29, offset= [deleted from log], len=3, method= [deleted from log]
+   i= 4, ff3             type=  9, offset= [deleted from log], len=1, method= [deleted from log]
+   i= 5, ff4             type= 49, offset= [deleted from log], len=1, method= [deleted from log]
Triple-S

comment created time in 5 days

pull request comment root-project/root

Added typedefs Float16_t and Double32_t to TDataType

@phsft-bot build

Triple-S

comment created time in 5 days

Pull request review comment root-project/root

[ROOT5] cint: implement a stat() call cache for G__matchfilename()

 int G__isfilebusy(int ifn)   return(flag); } +#if !defined(G__WIN32)

Isn't the compilation going to fail on Windows?

veprbl

comment created time in 5 days

PullRequestReviewEvent

pull request comment root-project/root

[ROOT5] cint: implement a stat() call cache for G__matchfilename()

@phsft-bot build

veprbl

comment created time in 5 days

pull request comment root-project/root

[ROOT5] cint: implement a stat() call cache for G__matchfilename()

How does the new cache handle the typical case of a file being edited and then reloaded?

veprbl

comment created time in 5 days

pull request comment root-project/root

Added typedefs Float16_t and Double32_t to TDataType

@phsft-bot build

Triple-S

comment created time in 5 days

issue comment root-project/root

Reading a std::array<Long64_t, 1> from a TFile created on linux does not work on macOS

:) That looks like a random stream :)

ktf

comment created time in 5 days

PullRequestReviewEvent

Pull request review comment root-project/root

Added typedefs Float16_t and Double32_t to TDataType

 void TDataType::AddBuiltins(TCollection* types)       fgBuiltins[kULong_t] = new TDataType("unsigned long");       fgBuiltins[kLong64_t] = new TDataType("long long");       fgBuiltins[kULong64_t] = new TDataType("unsigned long long");-      fgBuiltins[kFloat_t] = new TDataType("float");-      fgBuiltins[kDouble_t] = new TDataType("double");+      fgBuiltins[kFloat_t] = fgBuiltins[kFloat16_t] = new TDataType("float");+      fgBuiltins[kDouble_t] = fgBuiltins[kDouble32_t] = new TDataType("double");

kFloat_t and kFloat16_t (and Double_t and Double32_t) are not interchangeable. See the other uses in the same file.

      fgBuiltins[kFloat_t] = new TDataType("float");
      fgBuiltins[kDouble_t] = new TDataType("double");
      fgBuiltins[kFloat16_t] = new TDataType("Float16_t");
      fgBuiltins[kDouble32_t] = new TDataType("Double32_t");

might work.

Triple-S

comment created time in 5 days

PullRequestReviewEvent

issue comment cms-sw/cmssw

Support for std::array event products

For 1 (and somewhat 2) (TClass returns a typeinfo or getTypeInfo special case std::array), the major question/trade-off is that we need to either:

a. enumerate all std::array instances used (in selection.xml or getTypeInfo) [ Error prone / missing instances ]
b. use the interpreter to produce the type info [ slightly slower, increase memory use, potentially significantly ]
c. auto-generate the dictionary for the std::array (similar to STL collection) [ doesn't support top level std::array, only those nested in a class or struct ]

makortel

comment created time in 5 days