Javier Lopez-Gomez jalopezg-r00t CERN Geneva, Switzerland https://root.cern/ PhD in Computer Science and Technology (University Carlos III of Madrid). Since September 2020, part of @root-project at CERN.

jalopezg-r00t/ARCOS-Git_-_There_be_dragons 1

ARCOS talk: Git - There be dragons! [Jan 2019]

jalopezg-r00t/MPI_mpmc-queue 1

MPI Multiple-Producer-Multiple-Consumer distributed queue implementation

jalopezg-r00t/iotools 0

Tracing read/seek

jalopezg-r00t/root 0

The official repository for ROOT: analyzing, storing and visualizing big data, scientifically

jalopezg-r00t/roottest 0

The ROOT test suite

jalopezg-r00t/SRS-latex-uc3m 0

Support for software engineering tasks (SRS, traceability matrices, etc.) in LaTeX2e

jalopezg-r00t/web 0

root.cern

PullRequestReviewEvent

Pull request review comment root-project/web

[Docs] Rewrite 'Trees' section

toc: true toc_sticky: true --- -ROOT provides the {% include ref class="TTree" %} and the {% include ref class="TNtuple" %} class to store large quantities of same-class objects.<br>-A tree is a typical data container used for example by all LHC (Large Hadron Collider) experiments.<br>-Trees are optimized to reduce disk space and enhance access speed.+## Introducing `TTree` -A tree consists of a list of independent columns, called branches. The {% include ref class="TBranch" %} class represents a branch. A branch can contain all kind of data, such as objects or arrays in addition to all the simple types.+As introduced in → [Storing columnar data in a ROOT file and reading it back]({{ '/manual/root_files/#storing-columnar-data-in-a-root-file-and-reading-it-back' | relative_url }}),+ROOT can handle large columnar datasets.+In the aforementioned section, we made use of {% include ref class="RDataFrame" namespace="ROOT" %} to write and+read back a simple dataset.+RDataFrame traditionally relies on {% include ref class="TTree" %} for columnar data storage, used for example+by all LHC (Large Hadron Collider) experiments.+Trees are optimized for reduced disk space and for selective, high-throughput columnar access with reduced memory usage. -A {% include ref class="TNtuple" %} is a {% include ref class="TTree" %}, which is limited to contain only floating-point numbers.+In addition to the documentation in this manual, we recommend taking a look at the TTree tutorials:  {% include tutorials name="Tree" url="tree" %}  > **RNTuple** >-> [RNTuple](https://root.cern/doc/master/md_tree_ntuple_v7_doc_README.html){:target="_blank"} (for N-tuple and nested tuple) is the experimental evolution of {% include ref class="TTree" %} columnar data storage. 
`RNTuple` introduces new interfaces that are more robust.+> [RNTuple](https://root.cern/doc/master/md_tree_ntuple_v7_doc_README.html){:target="_blank"} is the experimental evolution of {% include ref class="TTree" %} columnar data storage. {% include ref class="RNTuple" namespace="ROOT::Experimental" %} introduces robust interfaces, a high-performance storage layout, and asynchronous, thread-safe scheduling. -## Tree classes--ROOT provides numerous classes for trees and branches, of which the following are among the most used:--- [TTree](https://root.cern/doc/master/classTTree.html){:target="_blank"}: Represents a columnar data set. Any C++ type can be stored in its columns.--- [TNtuple](https://root.cern/doc/master/classTNtuple.html){:target="_blank"}: A simple `TTree` restricted to a list of float variables only.--- [TBranch](https://root.cern/doc/master/classTBranch.html){:target="_blank"}: Organizes columns, i.e. branches, of a `TTree`.--- [TChain](https://root.cern/doc/master/classTChain.html){:target="_blank"}: A list of ROOT files containing `TTree` objects.---## Working with trees--ROOT offers many possibilities to work with trees, for example:--- [Creating a tree](#creating-a-tree)-- [Creating a tree from a folder structure](#creating-a-tree-from-a-folder-structure)-- [Filling a tree](#filling-a-tree)-- [Writing a tree](#writing-a-tree)-- [Printing the summary of a tree](#printing-the-summary-of-a-tree)-- [Showing an entry of a tree](#showing-an-entry-of-a-tree)-- [Scanning trees](#scanning-trees)--### Creating a tree--- Use the {% include ref class="TTree" %} constructor to create a tree.--_**Example**_--{% highlight C++ %}-   TTree t("MyTree","Example Tree");-{% endhighlight %}--It creates a tree with the title `Example Tree`.--_**Example: A simple tree**_--The following script builds a {% include ref class="TTree" %} from an ASCII file containing-statistics about the staff at CERN. 
Both, `staff.C` and `staff.dat` are in available in-`$ROOTSYS/tutorials/tree`.--The following script declares a structure called `staff_t`. It opens the ASCII file, creates-a ROOT file and a `TTree`. Then it creates one branch with the-[TTree::Branch()](https://root.cern/doc/master/classTTree.html#ab47499eeb7793160b20fa950f4de716a){:target="_blank"}-method.<br/>The first parameter of the `Branch()` method is the branch name. <br/>The second-parameter is the address from which the first leaf is to be read. In this example, it is-the address of the structure staff. Once the branch is defined,-the script reads the data from the ASCII file into the `staff_t`-structure and fills the tree. The ASCII file is closed, and the ROOT file is written to-disk saving the tree. Trees and histograms are created in the current directory, which is-the ROOT file in our example. Hence an `f->Write()` saves the tree.--{% highlight C++ %}-{-// Create the structure to hold the variables for the branch.-   struct staff_t {-   Int_t cat;-   Int_t division;-   Int_t flag;-   Int_t age;-   Int_t service;-   Int_t children;-   Int_t grade;-   Int_t step;-   Int_t nation;-   Int_t hrweek;-   Int_t cost;-   };-   staff_t staff;--// Open the ASCII file.-   FILE *fp = fopen("staff.dat","r");-   char line[81];--// Create a new ROOT file.-   TFile *f = new TFile("staff.root","RECREATE");--// Create a TTree.-   TTree *tree = new TTree("T","Staff data from ASCII file");--// Create one branch with all information from the structure.-   tree->Branch("staff",&staff.cat,"cat/I:division:flag:age:service:-   children:grade:step:nation:hrweek:cost");--// Fill the tree from the values in ASCII file.-   while (fgets(&line,80,fp)) {-      sscanf(&line[0],"%d%d%d%d",&staff.cat,&staff.division,-      &staff.flag,&staff.age);-      sscanf(&line[13],"%d%d%d%d",&staff.service,&staff.children,-      &staff.grade,&staff.step);-      sscanf(&line[24],"%d%d%d",&staff.nation,&staff.hrweek,-      &staff.cost);-      
tree->Fill();-   }--// Check what the tree looks like.-   tree->Print();-   fclose(fp);-   f->Write();-}-{% endhighlight %}--<p><a name="example-building-a-tree-from-an-ascii-file"></a></p>-_**Example: Building a tree from an ASCII file**_---The tutorial {% include tutorial name="cernbuild" %} provides an example how to build a {% include ref class="TTree" %} from an ASCII file.-The input file is `cernstaff.dat` that contains statistics about the staff at CERN.--The `cernbuild.C` ROOT macro creates a root file (`cernstaff.root`) and prints the tree `T` and its branches with [TTree::Print()](https://root.cern/doc/master/classTTree.html#a7a0006d38d5066b533e040aa16f97094){:target="_blank"}.--{% highlight C++ %}-root [0] .x cernbuild.C-******************************************************************************-*Tree    :T         : CERN 1988 staff data                                   *-*Entries :     3354 : Total =          176339 bytes  File  Size =      15005 *-*        :          : Tree compression factor =   2.74                       *-******************************************************************************-*Br    0 :Category  : Category/I                                             *-*Entries :     3354 : Total  Size=      14073 bytes  One basket in memory    *-*Baskets :        0 : Basket Size=      32000 bytes  Compression=   1.00     *-*............................................................................*-*Br    1 :Flag      : Flag/i                                                 *-*Entries :     3354 : Total  Size=      14049 bytes  One basket in memory    *-*Baskets :        0 : Basket Size=      32000 bytes  Compression=   1.00     *-*............................................................................*-*Br    2 :Age       : Age/I                                                  *-*Entries :     3354 : Total  Size=      14043 bytes  One basket in memory    *-*Baskets :        0 : Basket Size=      32000 bytes  Compression=   1.00 
    *-*............................................................................*-*Br    3 :Service   : Service/I                                              *-*Entries :     3354 : Total  Size=      14067 bytes  One basket in memory    *-*Baskets :        0 : Basket Size=      32000 bytes  Compression=   1.00     *-*............................................................................*-...-...-...-{% endhighlight %}--### Creating a tree from a folder structure--You can build a folder structure and create a tree with branches for each of the sub-folders.--_**Example**_--`TTree folder_tree("MyFolderTree","/MyFolder");`--`MyFolder` is the top folder. `/` indicates the {% include ref class="TTree" %}  constructor that a folder is being used.-You can fill the tree by placing the data into the folder structure and then calling the [TTree::Fill()](https://root.cern/doc/master/classTTree.html#a00e0c422f5e4f6ebcdeef57ff23e9067){:target="_blank"} method.---### Filling a tree+> **RDataFrame**+>+> To access TTree data, please use {% include ref class="RDataFrame" namespace="ROOT" %}.+> `TTree` provides interfaces for low-level, expert usage. 
-- Use the [TTree:Fill()](https://root.cern/doc/master/classTTree.html#a00e0c422f5e4f6ebcdeef57ff23e9067){:target="_blank"} method to fill a {% include ref class="TTree" %} instance.+### The tree and its data -A loop on all defined branches (see → [Branches](#branches)) is executed.+A `TTree` behaves like an array of a data structure that resides on storage - except for one entry (or row, in database language).+That entry is accessible in memory: you can load any tree entry, ideally sequentially.+You can provide your own storage for the values of the columns of the current entry, in the form of variables.+In this case you have to tell the `TTree` about the addresses of these variables; either by calling [`TTree::SetBranchAddress()`](https://root.cern/doc/master/classTTree.html#a39b867210e4a77ef44917fd5e7898a1d), or by passing the variable when creating the branch for writing.+When "filling" (writing) the `TTree`, it will read the values out of these variables;+when reading back a `TTree` entry, it will write the values it read from storage into your variables. -### Writing a tree+### Branches and leaves -The data of a tree are saved in a ROOT file (see → [ROOT files]({{ '/manual/root_files' | relative_url }})).+A tree consists of a list of independent columns, called branches. A branch can contain values of any fundamental type, C++ objects known to ROOT's type system, or collections of those.+When reading a tree, you can select which subset of branches should be read.+This allows you to optimize read throughput for a given analysis, and is one of the main motivations for storing data in columnar format. -- Use the [TTree::Write()](https://root.cern/doc/master/classTTree.html#af6f2d9ae4048ad85fcae5d2afa05100f){:target="_blank"} method to write the tree into a ROOT file.+Branches are represented by {% include ref class="TBranch" %} and its derived classes. 
-The `TTree::Write()` method is needed to write the ROOT file header.+While `TBranch` represents structure, objects inheriting from {% include ref class="TLeaf" %} give access to the actual data.+Originally, any columnar data was accessible through a `TLeaf`; these days, some of the `TBranch`-derived classes provide data access themselves, such as {% include ref class="TBranchElement" %}. -When writing a {% include ref class="TTree" %} to a ROOT file and if the ROOT file size reaches the value stored in the [TTree::GetMaxTreeSize()](https://root.cern/doc/master/classTTree.html#aca38baf017a203ddb3119a9ab7283cd9){:target="_blank"}, the current ROOT file is closed and a new ROOT file is created. If the original ROOT file is named `myfile.root`, the subsequent ROOT files are named `myfile_1.root`, `myfile_2.root`, etc.+### Baskets, clusters and the tree header -**Autosave**+Every branch or leaf stores the data for its entries in buffers of a size that can be specified during branch creation (default: 32000 bytes).+Once the buffer is full, it gets compressed; the compressed buffer is called a _basket_.+These baskets are written into the ROOT file.+Branches with more data per tree entry will fill more baskets than branches with less data per tree entry.+Conversely, baskets can hold many tree entries if their branch stores only a few bytes per tree entry.+This means that, in general, baskets (including those of different branches) contain data of different tree entry ranges. -`Autosave` gives you the option to save all branch buffers every n byte. It is recommended to use `Autosave` for large acquisitions. 
If the acquisition fails to complete, you can recover the ROOT file and all the contents since the last `Autosave`.+To allow more efficient pre-fetching and better chunking of tree data stored in ROOT files, TTree groups baskets into _clusters_.+A cluster contains all the data of a given entry range.+Trees will close baskets that are not yet full when reaching the tree entry at a cluster boundary. -- Use the [TTree::SetAutosave()](https://root.cern/doc/master/classTTree.html#a76259576b0094536ad084cde665c13a8){:target="_blank"} method to set the number of bytes between `Autosave`.+TTree finds the baskets holding a given entry of a given branch by means of a _header_ stored in the file.+This header also contains other auxiliary metadata.+When reading a `TTree` object, only this header is actually deserialized, until the tree's entries are loaded.+Multiple updates of these headers can often be found in files (`treename;1`, `treename;2`, etc., called cycles, see → [I/O]({{ '/manual/io' | relative_url }})).+Only the last one (also accessible as `treename`) knows about all written baskets. -You can also use [TTree::SetAutosave()](https://root.cern/doc/master/classTTree.html#a76259576b0094536ad084cde665c13a8){:target="_blank"} in the acquisition loop every n entry. 
-### Printing the summary of a tree+### `TNtuple`, the high-performance spread-sheet -- Use the [TTree::Print(Option_t * option = "")](https://root.cern/doc/master/classTTree.html#a7a0006d38d5066b533e040aa16f97094){:target="_blank"} method to print a summary of the tree contents.--- `option = "all"`: Friend trees are also printed.-- `option = "toponly"`:  Only the top level branches are printed.-- `option = "clusters"`: Information about the cluster of baskets is printed.+For convenience, ROOT also provides the {% include ref class="TNtuple" %} class which is a tree whose branches contain only numbers of type `float`, one per tree entry.+It derives from {% include ref class="TTree" %} and is constructed with a list of column names separated by `:`.  _**Example**_  {% highlight C++ %}-root[] TFile f("cernstaff.root")-root[] T->TTree::Print()--******************************************************************************-*Tree    :T         : CERN 1988 staff data                                   *-*Entries :     3354 : Total =          175531 bytes  File  Size =      47246 *-*        :          : Tree compression factor =   3.69                       *-******************************************************************************-*Br    0 :Category  : Category/I                                             *-*Entries :     3354 : Total  Size=      13985 bytes  File Size  =       4919 *-*Baskets :        1 : Basket Size=      32000 bytes  Compression=   2.74     *-*............................................................................*-*Br    1 :Flag      : Flag/i                                                 *-*Entries :     3354 : Total  Size=      13965 bytes  File Size  =       2165 *-*Baskets :        1 : Basket Size=      32000 bytes  Compression=   6.23     *-*............................................................................*-*Br    2 :Age       : Age/I                                                  *-*Entries :     3354 : Total  Size=      
13960 bytes  File Size  =       3489 *-*Baskets :        1 : Basket Size=      32000 bytes  Compression=   3.86     *-*............................................................................*-*Br    3 :Service   : Service/I                                              *-*Entries :     3354 : Total  Size=      13980 bytes  File Size  =       2214 *-...-...-...+// Create an n-tuple with the columns `Potential`, `Current`, `Temperature`, `Pressure`,+// each holding one `float` per tree entry.+TNtuple ntp("ntp","Example N-Tuple","Potential:Current:Temperature:Pressure"); {% endhighlight %} -### Showing an entry of a tree--- Use the [TTree::Show()](https://root.cern/doc/master/classTTree.html#a10e5e7424059bc7d17502331b41b0c16){:target="_blank"} method to access one entry of a tree.--_**Example**_ -Showing an entry from the `cernstaff.root` file (see → [Building a tree from an ASCII file](#example-building-a-tree-from-an-ascii-file)).+## Writing a tree -{% highlight C++ %}-root[] TFile f("cernstaff.root")-root[] T->Show(42)--======> EVENT:42-Category = 301-Flag = 13-Age = 56-Service = 31-Children = 0-Grade = 9-Step = 8-Hrweek = 40-Cost = 8645-Division = EP-Nation = CH-{% endhighlight %}--### Scanning trees--- Use the [TTree::Scan()](https://root.cern/doc/master/classTTree.html#af8a886acab51b16d8ddbf65667c035e4){:target="_blank"} method to display all values of the list of leaves.+When writing a `TTree`, you first need to create a `TFile`+(see → [ROOT files]({{ '/manual/root_files' | relative_url }})).+Then construct the `TTree` to be stored in the file; we will later add branches to the tree.  
_**Example**_ -Scanning the `cernstaff.root` file (see → [Building a tree from an ASCII file](#example-building-a-tree-from-an-ascii-file)).- {% highlight C++ %}-   root[] TFile f("cernstaff.root")-   root[] T->Scan("Cost:Age:Children")--   ************************************************-   *    Row *    Cost *       Age *    Children   *-   ************************************************-   *     0 *    11975 *        58 *             0 *-   *     1 *    10228 *        63 *             0 *-   *     2 *    10730 *        56 *             2 *-   *     3 *     9311 *        61 *             0 *-   *     4 *     9966 *        52 *             2 *-   *     5 *     7599 *        60 *             0 *-   *     6 *     9868 *        53 *             1 *-   *     7 *     8012 *        60 *             1 *-   *     8 *     8813 *        51 *             0 *-   *     9 *     7850 *        56 *             1 *-   *    10 *     7599 *        51 *             0 *-   *    11 *     9315 *        54 *             2 *-   *    12 *     7599 *        54 *             0 *-   *    13 *     7892 *        46 *             0 *-   *    14 *     7850 *        54 *             1 *-   *    15 *     7599 *        57 *             0 *-   *    16 *     8137 *        55 *             0 *-   *    17 *     7850 *        55 *             1 *-   *    18 *     7294 *        57 *             1 *-   *    19 *     8101 *        51 *             2 *-   *    20 *     5720 *        54 *             0 *-   *    21 *    15832 *        57 *             1 *-   *    22 *    12226 *        63 *             1 *-   *    23 *    13135 *        56 *             0 *-   *    24 *     9617 *        49 *             0 *+std::unique_ptr<TFile> myFile( TFile::Open("file.root", "RECREATE") );+auto tree = std::make_unique<TTree>("tree", "The Tree Title"); {% endhighlight %} -### Indexing trees--- Use [TTree::BuildIndex()](https://root.cern/doc/master/classTTree.html#a3f6b5bb591ff7a5bd0b06eea6c12b998){:target="_blank"} 
method to build an index table using expressions depending on the value in the leaves.--The index is built in the following way:-- A pass on all entries is made like in [TTree::Draw()](https://root.cern/doc/master/classTTree.html#a3f6b5bb591ff7a5bTTree::Draw()d0b06eea6c12b998){:target="_blank"}.-- `var1` = `majorname`-- `var2` = `minorname`-- `sel = 2^31` × _majorname_ + _minorname_-- For each entry in the tree the `sel` expression is evaluated and the result array is sorted into `fIndexValues`.--Once the index is calculated, an entry can be retrieved with-[TTree::GetEntryWithIndex(majornumber, minornumber)](https://root.cern/doc/master/classTTree.html#a3f6b5bb591ff7a5bTTree::Draw()d0b06eea6c12b998){:target="_blank"}.--_**Example**_--{% highlight C++ %}-// To create an index using the leaves "Run" and "Event".-   tree.BuildIndex("Run","Event");--// To read entry corresponding to Run=1234 and Event=56789.-   tree.GetEntryWithIndex(1234,56789);-   {% endhighlight %}--Note that `majorname` and `minorname` can be expressions using original tree variables e.g., `"run-90000"` or `"event +3*xx"`.--In case an expression is specified, the equivalent expression must be computed when calling-[TTree::GetEntryWithIndex(majornumber, minornumber)](https://root.cern/doc/master/classTTree.html#a3f6b5bb591ff7a5bTTree::Draw()d0b06eea6c12b998){:target="_blank"}.-To build an index with only `majorname`, specify `minorname="0"` (default).--Once the index is built, it can be saved with the `TTree` object with `tree.Write()`.--The most convenient place to create the index is at the end of the filling process just before saving the tree header. If a previous index was calculated, it will be redefined by this new call.+{% highlight Python %}+myFile = ROOT.TFile.Open("file.root", "RECREATE")+tree = ROOT.TTree("tree", "The Tree Title")+{% endhighlight %} -Note that this function can also be applied to a {% include ref class="TChain" %}. 
The return value is the number of entries in the Index (< 0 indicates failure).+### Creating branches -## Tree Viewer+There are multiple ways to add branches to a `TTree`; the most commonly used ones are covered here.+More extensive documentation can be found in the [reference manual](https://root.cern.ch/doc/master/classTTree.html#creatingattreetoc). -With the Tree Viewer you can examine a tree in a GUI.+> **Note**+>+> Do *not* use the {% include ref class="TBranch" %} constructor to add a branch to a tree.  > **Note** >-> You can also use the ROOT Object Browser to examine a tree that is saved in a ROOT file. See → [ROOT Object Browser]({{ '/manual/root_files#root-object-browser' | relative_url }}).+> The objects *and* variables used to create branches must not be destroyed until the `TTree` is deleted or `TTree::ResetBranchAddress()` is called.+> If the address of the data to be filled changes with each tree entry, you have to inform the branch about the new address with [TBranch::SetAddress](https://root.cern/doc/master/classTBranch.html#a63e019ffc9c53ba249bd729da6a78657){:target="_blank"} before filling the tree again. -- Use the {% include ref class="TTreeViewer" %} class to open the ROOT file (containing the tree) in the Tree Viewer. -_**Example**_+**1. Branches holding basic types** -Open the Tree Viewer for the `cernstaff.root` file (see → [Building a tree from an ASCII file](#example-building-a-tree-from-an-ascii-file)) that contains the tree `T`.+If you have a variable of type `int`, `float`, `bool`, or any other basic type, you can create a branch (and a leaf) from it.+For fundamental datatypes, the type can be deduced from the variable and the name of the leaf will be set to the name of the branch.+In Python, that type information is not available, so the leaf name and data type must be specified as the third argument.+Further details are explained in the [reference guide](https://root.cern.ch/doc/master/classTTree.html#addcolumnoffundamentaltypes).  
{% highlight C++ %}-   root[] TFile f("cernstaff.root")-   root[] new TTreeViewer("T")+float var;+tree->Branch("branch0", &var); {% endhighlight %} -{% include figure_image-img="tree_viewer.png"-caption="Tree Viewer."-%}--The left panel contains the list of trees and their branches. The right panel displays the leaves or variables in the tree.+{% highlight Python %}+# Provide a one-element array, so ROOT can read data from this memory. +from array import array+var = array('f', [ 0 ])+tree.Branch("branch0", var, "leafname/F")+{% endhighlight %}+<br/>+**2. Branches holding class types** -### Drawing correlating variables in a scatterplot+You can create a branch holding one of ROOT's classes, or your own type for which you have provided a dictionary (see → [I/O]({{ '/manual/io' | relative_url }})). -You can show the correlation between the variables, listed in the {% include ref class="TTreeViewer" %}, by drawing a scatterplot.+_Splitting_ -- Select a variable in the {% include ref class="TTreeViewer" %}  and drag it to the `X:-empty-` entry.-- Select a second variable and drag it to the `Y:-empty-` entry.+If requested, TTree will create (sub-) branches for each member of a class and its base classes.+If such a member is a class itself, that member's type can also be split.+The recursion level of nested splitting is called the "split level"; it can be configured during branch creation. 
-- Click `Scatterplot`.+_Pointers_ -{% include figure_image-img="scatterplot-icon.png"-caption="Scatterplot icon."-%}--The scatterplot is drawn.+While references `X &` are not supported as member types, pointers are.+If the pointer is non-null, ROOT stores the object pointed to (pointee).+If multiple pointers within the same branch point to the same object during one `TBranch::Fill()` operation (as invoked by `TTree::Fill()`), that pointee will only be stored once; upon reading, all pointers will again point to the same object. -{% include figure_jsroot-   file="trees.root" object="CostAge" width="500px" height="350px"-   caption="Scatterplot of the variables Age and Cost."-%}--Note that not each `(x,y) point on a scatterplot represents two values in your N−tuple. In fact, the scatterplot is a grid and each square in-the grid is randomly populated with a density of dots that’s proportional to the number of values in that grid.+For the general case, indices into object collections could be persistified instead of pointers.+This way, the object is only stored once. -## Branches--You can organize columns, this is branches, of a tree with the {% include ref class="TBranch" %} class. A variable on a `TBranch` is called a leaf ({% include ref class="TLeaf" %}). If two variables are independent and it is certain that the variables will not be used together, they should be placed on separate branches.--The branch type differs by what is stored in it. 
A branch can contain the following data:--- an entire object,-- a list of simple variables,-- contents of a folder,-- contents of a {% include ref class="TList" %},-- an array of objects.--If two variables are independent and the variables will not be used together, place them on separate branches.-If the variables are related, such as the coordinates of a point, create one branch with both coordinates on it.--### Adding a branch+_**Example**_ -- Use the following syntax of the [TTree::Branch()](https://root.cern/doc/master/classTTree.html#ab47499eeb7793160b20fa950f4de716a){:target="_blank"} method to add a {% include ref class="TBranch" %} to a tree:+ROOT's class {% include ref class="TNamed" %} has the data members `fName` and `fTitle`.+The following asks the tree to create a branch for each of them.+As `TNamed` derives from `TObject`, branches for `TObject`'s data members will also be created.  {% highlight C++ %}-   auto branch = tree.Branch(branchname, address, leaflist, bufsize)+TNamed var;+const int bufSize = 32000;  // buffer size in bytes (the default); the third argument+const int splitLevel = 99;  // "all the way"; the fourth argument+tree->Branch("branch0", &var, bufSize, splitLevel); {% endhighlight %}+<br/>+**3. Branches holding `std::vector`, `std::array`, `std::list`, etc.** -`address` is the address of the first item of a structure.-`leaflist` is the concatenation of all the variable names and types separated by a colon character. The variable name and the variable type are separated by a slash (/). The variable type must be one character.-For more information on adding a branch to tree, see → {% include ref class="TTree" %}.+Both top-level branches (those created by a call to `TTree::Branch()`) and branches created by splitting data members can hold collections such as `std::vector`, `std::array`, `std::list`, or `std::map`.+Splitting can traverse collections:+if a member is a `std::vector<X>`, the tree can split `X` into sub-branches, too. 
-> **Note**->-> Do *not* use the {% include ref class="TBranch" %} constructor to add a branch to a tree.+Such collections can also contain pointers.+For polymorphic pointees, ROOT does not just stream the base class; it determines the actual object type.+If the split level is `TTree::kSplitCollectionOfPointers`, the pointees will be written in split mode, possibly adding new branches as new polymorphic derived types are encountered. -### Adding a branch with a folder--- Use the following syntax to add a branch with a folder:+### Filling a tree -{% highlight C++ %}-   tree->Branch("/aFolder")-{% endhighlight %}+Use [TTree::Fill()](https://root.cern/doc/master/classTTree.html#a00e0c422f5e4f6ebcdeef57ff23e9067){:target="_blank"} to add a new entry (or "row") to the tree, and store the current values of the variables that were provided during branch creation. -This creates one branch for each element in the folder. The method returns the total number of branches created.+### Writing the tree header -### Adding a branch with STL collections+Use [TTree::Write()](https://root.cern/doc/master/classTTree.html#af6f2d9ae4048ad85fcae5d2afa05100f){:target="_blank"} to write the tree header into a ROOT file.+Earlier entries' data might already be written as part of `TTree::Fill()`. -A `STLcollection` is a address of a pointer to `std::vector`, `std::list`, `std::deque`, `std::set` or `std::multiset` containing pointers to objects.+If the data written during `TTree::Fill()` makes the file's size exceed [TTree::GetMaxTreeSize()](https://root.cern/doc/master/classTTree.html#aca38baf017a203ddb3119a9ab7283cd9){:target="_blank"}, the current ROOT file is closed and a new ROOT file is created.+For an original ROOT file named `myfile.root`, the subsequent ROOT files are named `myfile_1.root`, `myfile_2.root`, etc. 
-- Use the following syntax of the [TTree::Branch()](https://root.cern/doc/master/classTTree.html#ab47499eeb7793160b20fa950f4de716a){:target="_blank"} method to add a `STLcollection`:+_**Example**_  {% highlight C++ %}-   auto branch = tree.Branch(branchname, STLcollection, buffsize, splitlevel);- {% endhighlight %}+std::unique_ptr<TFile> myFile( TFile::Open("file.root", "RECREATE") );+auto tree = std::make_unique<TTree>("tree", "The Tree Title"); -If the `splitlevel` is a value bigger than 100 [TTree::kSplitCollectionOfPointers](https://root.cern/doc/master/classTTree.html#a6d07819a66bb97bafd460adfad555114ae3b257c9ade74c1a53383d800c0a708c){:target="_blank"}  then the `STLcollection` will be written in split mode.+float var;+tree->Branch("branch0", &var); -If a dynamic structures changes with each entry, you have to redefine the branch address with [TBranch::SetAddress](https://root.cern/doc/master/classTBranch.html#a63e019ffc9c53ba249bd729da6a78657){:target="_blank"}  before filling the branch again.--### Adding a branch with objects--- Use the following syntax of the [TTree::Branch()](https://root.cern/doc/master/classTTree.html#ab47499eeb7793160b20fa950f4de716a){:target="_blank"} method to add objects to a tree:+for (int iEntry = 0; iEntry < 1000; ++iEntry) {+   var = 0.3 * iEntry;+   // Fill the current value of `var` into `branch0`+   tree->Fill();+} -{% highlight C++ %}-   MyClass object;-   auto branch = tree.Branch(branchname, &object, bufsize, splitlevel)+// Now write the header+tree->Write(); {% endhighlight %} -`&object` must be the address of a valid object. 
The object must not be destroyed (this is be deleted)-until the {% include ref class="TTree" %} is deleted or-[TTree::ResetBranchAddress](https://root.cern/doc/master/classTTree.html#a181eb19c03433781fde2fa94443710dc){:target="_blank"}-is called.--The following values are available for the `splitlevel`:--`splitlevel=0`<br>-The object is serialized in the branch buffer.+{% highlight Python %}+from array import array+import ROOT -`splitlevel=1 (default)`<br>-This branch is automatically into sub-branches, with one sub-branch for each-data member or object of the object itself. If the object member is a [TClonesArray](https://root.cern/doc/master/classTClonesArray.html){:target="_blank"}, it is processed as it is with `splitlevel=2`.+myFile = ROOT.TFile( ROOT.TFile.Open("file.root", "RECREATE") )+tree = ROOT.TTree("tree", "The Tree Title") -`splitlevel=2`<br>-This branch is automatically split into sub-branches, with one sub-branch for each-data member or object of the object itself. If the object member is a [TClonesArray](https://root.cern/doc/master/classTClonesArray.html){:target="_blank"} it is processed as a [TObject](https://root.cern/doc/master/classTObject.html){:target="_blank"}, but only for one branch.+# Provide a one-element array, so ROOT can read data from this memory. +var = array('f', [ 0 ])+tree.Branch("branch0", var, "leafname/F"); -### Adding a branch to an existing tree-You can add a branch to an existing tree.--_**Example**_+for iEntry in range(1000):+   var = 0.3 * iEntry+   # Fill the current value of `var` into `branch0`+   tree.Fill() -If one variable in the tree was computed with a certain algorithm, you may want to try another algorithm and compare the results. 
To do this, you can add a new branch, fill it, and save the tree.--{% highlight C++ %}-void tree3AddBranch() {-   TFile f("tree3.root", "update");-   Float_t new_v;-   auto t3 = f->Get<TTree>("t3");-   auto newBranch = t3->Branch("new_v", &new_v, "new_v/F");-   Long64_t nentries = t3->GetEntries();    // Read the number of entries in the t3.-   for (Long64_t i = 0; i < nentries; i++) {-      new_v = gRandom->Gaus(0, 1);-      newBranch->Fill();-   }-   t3->Write("", TObject::kOverwrite);       // Save only the new version of the tree.-}+# Now write the header+tree.Write() {% endhighlight %} -`kOverwrite` in the `Write()` method causes the tree to be overwritten.--## Examples for writing and reading trees+_AutoFlush_ -The following sections are examples of writing and reading trees that range in complexity from a simple tree with-a few variables to a tree with folders and complex event objects.+The tree can flush its data (i.e. its baskets) to file when reaching a given cluster size, thus closing the cluster.+By default this happens approximately every 30MB of compressed data.+The size can be adjusted using using [TTree::SetAutoFlush()](https://root.cern/doc/master/classTTree.html#ad4c7c7d70caf5657104832bcfbd83a9f){:target="_blank"}. -### A tree with a C structure+_AutoSave_ -> Tutorial->->  {% include tutorial name="tree2" %}+The tree can write a header update to file after it has collected a certain data size in baskets (by default, 300MB).+If your program crashes, you can recover the tree and its baskets written before the last autosave. -In this tutorial is shown:+You can adjust the threshold (in bytes or entries) using [TTree::SetAutoSave()](https://root.cern/doc/master/classTTree.html#a76259576b0094536ad084cde665c13a8){:target="_blank"}. 
-- how to build branches from a C structure-- how to make a branch with a fixed length array-- how to make a branch with a variable length array-- how to read selective branches-- how to fill a histogram from a branch-- how to [TTree::Draw](https://root.cern/doc/master/classTTree.html#ac4016b174665a086fe16695aad3356e2){:target="_blank"} to draw a 3D plot -### Adding friends to trees+## Reading a tree -> Tutorial+> **Note** >->  {% include tutorial name="tree3" %}--Adding a branch is often not possible because the tree is a read-only file and you do not have permission to save the modified tree with the new branch. Even if you do have the permission, you risk loosing the original tree with an unsuccessful attempt to save the modification. Since trees are usually large, adding a branch could extend it over the 2 GB limit. In this case, the attempt to write the tree fails, and the original data is may also be corrupted. In addition, adding a branch to a tree enlarges the tree and increases the amount of memory needed to read an entry, and therefore decreases the performance.--For these reasons ROOT offers the concept of friends for trees (and chains) by adding a branch manually with [TTree::AddFriend()](https://root.cern/doc/master/classTTree.html#a011d362261b694ee7dd780bad21f030b){:target="_blank"}.+> Please use {% include ref class="RDataFrame" namespace="ROOT" %} to read trees, unless you need to do low-level I/O! 
-The [TTree::AddFriend()](https://root.cern/doc/master/classTTree.html#a011d362261b694ee7dd780bad21f030b){:target="_blank"} method has two parameters, the first is the tree name and the second is the name of the ROOT file where the friend tree is saved.-[TTree::AddFriend()](https://root.cern/doc/master/classTTree.html#a011d362261b694ee7dd780bad21f030b){:target="_blank"} automatically opens the friend file.+To read a tree, you need to associate your variables with the tree's branches, as when writing.+When loading a tree entry, the tree will set the variables to the branch's value as read from the storage.+That is done by calling [`TTree::SetBranchAddress()`](https://root.cern/doc/master/classTTree.html#a39b867210e4a77ef44917fd5e7898a1d): +_**Example**_ -### Importing an ASCII file into a tree+{% highlight C++ %}+std::unique_ptr<TFile> myFile( TFile::Open("file.root") );+auto tree = myFile->Get<TTree>("TreeName"); -Use [TTree::ReadFile()](https://root.cern/doc/master/classTTree.html#a9c8da1fbc68221b31c21e55bddf72ce7){:target="_blank"} to automatically define the structure of the {% include ref class="TTree" %} and read the data from a formatted ASCII file.+int variable;+tree->SetBranchAddress("branchName", &variable); -_**Example**_+for (int iEntry = 0; tree->LoadTree(iEntry) >= 0; ++iEntry) {+   // Load the data for the given tree entry+   tree->GetEntry(iEntry); -{% highlight C++ %}-{-   gROOT->Reset();-   TFile *f = new TFile("basic2.root","RECREATE");-   TH1F *h1 = new TH1F("h1","x distribution",100,-4,4);-   TTree *T = new TTree("ntuple","data from ascii file");-   Long64_t nlines = T->ReadFile("basic.dat","x:y:z");-   printf(" found %lld pointsn",nlines);-   T->Draw("x","z>2");-   T->Write();+   // Now, `variable` is set to the value of the branch+   // "branchName" in tree entry `iEntry`+   printf("%d\n", variable); } {% endhighlight %} +In Python you can simply use the branch name as an attribute on the tree: -## Using trees for data analysis--The following 
methods are available for data analysis using trees:--- [TTree::Draw()](https://root.cern/doc/master/classTTree.html#ac4016b174665a086fe16695aad3356e2){:target="_blank"}+{% highlight Python %}+myFile = ROOT.TFile.Open("file.root")+myTree = myFile.TreeName+for entry in myTree:+   print(entry.branchName)+{% endhighlight %} -- [TTree::MakeClass()](https://root.cern/doc/master/classTTree.html#ac4ceaf4ae0b87412acf94093043cc2de){:target="_blank"} -- [TTree::MakeSelector()](https://root.cern/doc/master/classTTree.html#abe2c6509820373c42a88f343434cbcb4){:target="_blank"}+### Selecting a subset of branches to be read -### Using TTree:Draw()+You can select or deselect branches from being read by `GetEntry()` by calling [`TTree::SetBranchStatus()`](https://root.cern/doc/master/classTTree.html#aeca53bcd4482de1d883d259df6663920).+It is vividly recommended to only read the branches actually needed:+`TTree` is optimized for exactly this use case, and most analyses will only need a fraction of the available branches. -With the [TTree::Draw()](https://root.cern/doc/master/classTTree.html#ac4016b174665a086fe16695aad3356e2){:target="_blank"} method, you can easily plot a variable (a leaf).+{% highlight C++ %}+// Extract the tree as above. 
-_**Example**_+// Disable everything...+tree->SetBranchStatus("*", false);+// ...but the branch we need+tree->SetBranchStatus("branchName", true); -Open the `cernstaff.root` file (see → [Building a tree from an ASCII file](#example-building-a-tree-from-an-ascii-file)) and lists its content.+// Now proceed as above.+int variable;+tree->SetBranchAddress("branchName", &variable);+for (int iEntry = 0; tree->LoadTree(iEntry) >= 0; ++iEntry) {+   // Load the data for the given tree entry+   tree->GetEntry(iEntry); -{% highlight C++ %}-root [] TFile f("cernstaff.root")-root [] f.ls()-TFile**      cernstaff.root- TFile*      cernstaff.root-  KEY: TTree   T;1   CERN 1988 staff data+   printf("%d\n", variable);+} {% endhighlight %} -The `cernstaff.root` file contains the {% include ref class="TTree" %}  `T`. A pointer is created to the tree. -{% highlight C++ %}-   root [] TTree *MyTree = T-{% endhighlight %}+### Selecting a subset of entries to be read -To show the different `Draw()` options, a canvas with four sub-pads is created.+To process only a selection of tree entries, you can use a {% include ref class="TEntryList" %}.+First you insert the tree entry numbers you want to process into the `TEntryList`. -{% highlight C++ %}-   root [] TCanvas *myCanvas = new TCanvas()-   root [] myCanvas->Divide(2,2)+{% highlight Python %}+entryList = ROOT.TEntryList("entryListName", "Title of the entry list")+for entry in tree:+   if entry.missingET < 100:+      entryList.Enter(tree.GetReadEntry())+myFile = ROOT.TFile.Open("entrylist.root", "RECREATE")+myFile.WriteObject(entrylist) {% endhighlight %} -The first pad with is activated with [TCanvas::cd](https://root.cern/doc/master/classTCanvas.html#ad996aa7bc34186944363b48963de4de5){:target="_blank"}.+You can then re-use the `TEntryList` in subsequent processing of the tree, skipping irrelevant entries. 
-{% highlight C++ %}-   root [] myCanvas->cd(1)+{% highlight Python %}+myFile = ROOT.TFile.Open("entrylist.root")+entrylist = myFile.entryListName+tree.SetEntryList(entrylist)+for entry in tree:+   # all entries will have missingET < 100 {% endhighlight %} -The `Cost` variable is drawn. [TTree::Draw](https://root.cern/doc/master/classTCanvas.html#a2309e37a6471e07f9dad3e5af1fe5561){:target="_blank"}-automatically creates a histogram. The style of the histogram is inherited from the {% include ref class="TTree" %} attributes.--{% highlight C++ %}-   root [] MyTree->Draw("Cost")-{% endhighlight %}+## Appending `TTree`s as a `TChain` -{% include figure_jsroot-   file="trees.root" object="c1" width="500px" height="350px"-   caption="The variable `Cost` drawn in a histogram."-%}+In high energy physics you always want as much data as possible.+But it's not nice to deal with files of multiple terabytes.+ROOT allows to to split data across multiple files, where you can then access the files' tree parts as one large tree.+That's done through {% include ref class="TChain" %}, which inherits from {% include ref class="TTree" %}:+it wants to know the name of the trees in the files (which can be overridden when adding files), and the file names, and will act as if it was a huge, continuous tree: -Next, the second pad is activated and scatter plot is drawn. 
Two dimensions (here `Cost` and `Age`) are separated by a colon ("x:y").<br>-In general, this parameter is a string containing up to three expressions, one for each dimension, separated by a colon (“e1:e2:e3”).+_**Example**_  {% highlight C++ %}-   root [] myCanvas->cd(2)-   root [] MyTree->Draw("Cost:Age")+TChain chain("CommonTreeName");+if (chain.Add("data_*.root") != 12)+   std::cerr << "Expected to find 12 files!\n";+// Use `chain` as if it was a `TTree` {% endhighlight %} -{% include figure_jsroot-   file="trees.root" object="c2" width="500px" height="350px"-   caption="The variable `Cost` and `Age` drawn in a histogram."-%}+{% highlight Python %}+chain = ROOT.TChain("CommonTreeName")+if chain.Add("data_*.root") != 12:+   print("Expected to find 12 files!")+# Use `chain` as if it was a `TTree`+{% endhighlight %} -Next, the third pad is activated and a selection is added. `Cost` versus `Age` for the entries where the nation is equal to `"CH"` is drawn.<br>-You can use any C++ operator. The value of the selection is used as a weight when filling-the histogram. If the expression includes only Boolean operations the result is 0-(histogram is not filled) or 1 (histogram is filled).+## Widening a `TTree` through friends -{% highlight C++ %}-   root [] myCanvas->cd(3)-   root [] MyTree->Draw("Cost:Age","Nation == \"CH\"")-{% endhighlight %}+Trees are usually written just once.+While updating an existing tree is non-trivial, extending it with additional branches, potentially an "improved" version of an original branch, is straightforward.+"Friend trees" are added by calling [TTree::AddFriend()](https://root.cern/doc/master/classTTree.html#a321f2684de145cfcb01cabfce69ea910){:target="_blank"}.+Adding another tree called `T1` as a friend tree will make the branch `X` of `T1` available as both `T1.X` and - if `X` does not exist in the original tree - as `X`. 
-{% include figure_jsroot-   file="trees.root" object="c3" width="500px" height="350px"-   caption="The variable `Cost` and `Age` with a selection drawn in a histogram."-%}+Friend trees are expected to have at least as many entries as the original tree.+The order of the friend tree's entries must preserve the entry order of the original tree. -Next, the fourth pad is activated and the histogram is drawn with the draw option `surf2`.-Refer to the {% include ref class="THistPainter" %} class for possible draw options.+> **Note**+>+> Care must be taken to ensure that the order of entries in the primary tree matches friends' entries. This is especially relevant when processing a tree in parallel to generate a friend tree, as the entries might be written out in an undefined order (misaligned entries).  {% highlight C++ %}-   root [] myCanvas->cd(4)-   root [] MyTree->Draw("Cost:Age","Nation == \"CH\"","colz")-{% endhighlight %}+void treeWithFriend() {+   std::unique_ptr<TFile> myFile( TFile::Open("file.root") );+   auto tree = myFile->Get<TTree>("TreeName"); -{% include figure_jsroot-   file="trees.root" object="c4" width="500px" height="350px"-   caption="The variable `Cost` and `Age` with a selection and a draw option drawn in a histogram."-%}--The [TTree::Draw()](https://root.cern/doc/master/classTTree.html#ac4016b174665a086fe16695aad3356e2){:target="_blank"} method also accepts {% include ref class="TCut" %} objects. 
A {% include ref class="TCut" %} object is-a specialized string object used for {% include ref class="TTree" %} selections.+   std::unique_ptr<TFile> myFriendFile( TFile::Open("friend.root") );+   auto friendTree = myFriendFile->Get<TTree>("FriendTreeName"); +   tree->AddFriend(friendTree); -### Using TTree::MakeClass()+   int variable;+   tree->SetBranchAddress("branchName", &variable);+   int variableFriend;+   tree->SetBranchAddress("FriendTreeName.friendBranchName", &variableFriend); -- Use the [TTree::MakeClass()](https://root.cern/doc/master/classTTree.html#ac4ceaf4ae0b87412acf94093043cc2de){:target="_blank"}-method, to generate a skeleton class for looping over the entries of a tree.+   // Iteration over `tree` automatically advances its friend trees.+   for (int iEntry = 0; tree->LoadTree(iEntry) >= 0; ++iEntry) {+      // Load the data for the given tree entry+      tree->GetEntry(iEntry); +      printf("%d %d\n", variable, variableFriend);+   }+{% endhighlight %} -### Using TTree::MakeSelector()+# We still need to work on the parts below. No point in reviewing yet.

If this is going to be merged in its current state, please remove this!

Axel-Naumann

comment created time in 2 days
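The hunk above states that once writing pushes a file beyond `TTree::GetMaxTreeSize()`, output continues in `myfile_1.root`, `myfile_2.root`, etc. The sketch below shows only that naming convention. `RolloverFileName` is a hypothetical helper, not a ROOT function, and the fallback for names without a `.root` extension is an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of ROOT): name of the n-th file written after
// the original file exceeded the maximum tree size. n == 0 is the original file.
std::string RolloverFileName(const std::string &original, int n)
{
   assert(n >= 0);
   if (n == 0)
      return original;
   const std::string ext = ".root";
   // Insert "_<n>" before the ".root" extension, if present.
   if (original.size() >= ext.size() &&
       original.compare(original.size() - ext.size(), ext.size(), ext) == 0) {
      return original.substr(0, original.size() - ext.size()) + "_" + std::to_string(n) + ext;
   }
   // Assumption: without a ".root" extension, simply append the suffix.
   return original + "_" + std::to_string(n);
}
```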


Pull request review comment root-project/web

[Docs] Rewrite 'Trees' section

Only the last one (also accessible as `treename`) knows about all written basket

 ### `TNtuple`, the high-performance spread-sheet

-For convenience, ROOT also provides the {% include ref class="TNtuple" %} class which is a {% include ref class="TTree" %} that is limited to contain floating-point numbers only.
+For convenience, ROOT also provides the {% include ref class="TNtuple" %} class which is a tree whose branches contain only numbers of type `float`, one per tree entry.
+It derives from {% include ref class="TTree" %} and is constructued with a `;` separated list of column names.
It derives from {% include ref class="TTree" %} and is constructed with a list of column names separated by `:`.
Axel-Naumann

comment created time in 2 days
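As the suggestion says, the column list is separated by `:`. In ROOT this looks like `TNtuple nt("nt", "title", "x:y:z");`, which creates three `float` columns. The sketch below only illustrates how such a list decomposes into column names. `SplitColumnList` is hypothetical and is not how ROOT parses the list internally.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not ROOT's parser): split a ':'-separated column list,
// as passed to the TNtuple constructor, into individual column names.
std::vector<std::string> SplitColumnList(const std::string &varlist)
{
   std::vector<std::string> names;
   std::istringstream stream(varlist);
   std::string name;
   while (std::getline(stream, name, ':')) {
      assert(!name.empty()); // "x::y" would denote an unnamed column
      names.push_back(name);
   }
   return names;
}
```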


Pull request review comment root-project/web

[Docs] Rewrite 'Trees' section

If the split level is `TTree::kSplitCollectionOfPointers` then the pointees will
 Use [TTree:Fill()](https://root.cern/doc/master/classTTree.html#a00e0c422f5e4f6ebcdeef57ff23e9067){:target="_blank"} to add a new entry (or "row") to the tree, and store the current values of the variables that were provided during branch creation.

-### Writing a tree
+### Writing the tree header
+
+Use [TTree::Write()](https://root.cern/doc/master/classTTree.html#af6f2d9ae4048ad85fcae5d2afa05100f){:target="_blank"} to write the tree header into a ROOT file.
+Earlier entries' data might already be written as part of `TTree::Fill()`.
+
+If due to the data written during `TTree::Fill()`, the file's size increases beyond [TTree::GetMaxTreeSize()](https://root.cern/doc/master/classTTree.html#aca38baf017a203ddb3119a9ab7283cd9){:target="_blank"}, the current ROOT file is closed and a new ROOT file is created.
+For an original ROOT file named `myfile.root`, the subsequent ROOT files are named `myfile_1.root`, `myfile_2.root`, etc.
+
+_**Example**_
+
+{% highlight C++ %}
+std::unique_ptr<TFile> myFile( TFile::Open("file.root", "RECREATE") );
+auto tree = std::make_unique<TTree>("tree", "The Tree Title");
+
+float var;
+tree->Branch("branch0", &var);
+
+for (int iEntry = 0; iEntry < 1000; ++iEntry) {
+   var = 0.3 * iEntry;
+   // Fill the current value of `var` into `branch0`
+   tree->Fill();
+}
+
+// Now write the header
+tree->Write();
+{% endhighlight %}
+
+{% highlight Python %}
+from array import array
+import ROOT
+
+myFile = ROOT.TFile( ROOT.TFile.Open("file.root", "RECREATE") )
+tree = ROOT.TTree("tree", "The Tree Title")
+
+# Provide a one-element array, so ROOT can read data from this memory.
+var = array('f', [ 0 ])
+tree.Branch("branch0", var, "leafname/F");
+
+for iEntry in range(1000):
+   var = 0.3 * iEntry
+   # Fill the current value of `var` into `branch0`
+   tree.Fill()

-The data of a tree are saved in a ROOT file (see → [ROOT files]({{ '/manual/root_files' | relative_url }})).
+# Now write the header
+tree.Write()
+{% endhighlight %}

-- Use the [TTree::Write()](https://root.cern/doc/master/classTTree.html#af6f2d9ae4048ad85fcae5d2afa05100f){:target="_blank"} method to write the tree into a ROOT file.
+_AutoFlush_

-The `TTree::Write()` method is needed to write the ROOT file header.
+The tree can flush its data (i.e. its baskets) to file when reaching a given cluster size, thus closing the cluster.
+By default this happens approximatively every 30MB of compressed data.
By default this happens approximately every 30MB of compressed data.
Axel-Naumann

comment created time in 5 days
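The AutoFlush paragraph says the tree flushes its baskets, and thereby closes a cluster, after roughly 30MB of compressed data. In ROOT this threshold is adjusted with `TTree::SetAutoFlush()`. The sketch below is a deliberately simplified model, not ROOT's implementation. It assumes a plain byte threshold and a known compressed size per entry, and computes the entry index at which each cluster starts. `ClusterStarts` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified model (not ROOT's implementation): accumulate the compressed size
// of each entry and close a cluster as soon as the accumulated size reaches
// `threshold` bytes. Returns the index of the first entry of each cluster.
std::vector<std::size_t> ClusterStarts(const std::vector<std::uint64_t> &compressedBytesPerEntry,
                                       std::uint64_t threshold)
{
   assert(threshold > 0);
   std::vector<std::size_t> starts{0};
   std::uint64_t accumulated = 0;
   for (std::size_t i = 0; i < compressedBytesPerEntry.size(); ++i) {
      accumulated += compressedBytesPerEntry[i];
      if (accumulated >= threshold && i + 1 < compressedBytesPerEntry.size()) {
         starts.push_back(i + 1); // the next entry opens a new cluster
         accumulated = 0;
      }
   }
   return starts;
}
```

With ten entries of 10 bytes each and a 30-byte threshold, clusters start at entries 0, 3, 6 and 9.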

Pull request review comment root-project/web

[Docs] Rewrite 'Trees' section

The scatterplot is drawn. Note that not each `(x,y) point on a scatterplot represents two values in your N−tuple. In fact, the scatterplot is a grid and each square in the grid is randomly populated with a density of dots that’s proportional to the number of values in that grid.

+### Indexing trees
+
+- Use [TTree::BuildIndex()](https://root.cern/doc/master/classTTree.html#a3f6b5bb591ff7a5bd0b06eea6c12b998){:target="_blank"} method to build an index table using expressions depending on the value in the leaves.
+
+The index is built in the following way:
+- A pass on all entries is made like in [TTree::Draw()](https://root.cern/doc/master/classTTree.html#a3f6b5bb591ff7a5bTTree::Draw()d0b06eea6c12b998){:target="_blank"}.
+- `var1` = `majorname`
+- `var2` = `minorname`
+- `sel = 2^31` × _majorname_ + _minorname_
+- For each entry in the tree the `sel` expression is evaluated and the result array is sorted into `fIndexValues`.
+
+Once the index is calculated, an entry can be retrieved with
+[TTree::GetEntryWithIndex(majornumber, minornumber)](https://root.cern/doc/master/classTTree.html#a3f6b5bb591ff7a5bTTree::Draw()d0b06eea6c12b998){:target="_blank"}.
+
+_**Example**_
+
+{% highlight C++ %}
+// To create an index using the leaves "Run" and "Event".
+   tree.BuildIndex("Run","Event");
+
+// To read entry corresponding to Run=1234 and Event=56789.
+   tree.GetEntryWithIndex(1234,56789);
+   {% endhighlight %}
+
+Note that `majorname` and `minorname` can be expressions using original tree variables e.g., `"run-90000"` or `"event +3*xx"`.
+
+In case an expression is specified, the equivalent expression must be computed when calling
+[TTree::GetEntryWithIndex(majornumber, minornumber)](https://root.cern/doc/master/classTTree.html#a3f6b5bb591ff7a5bTTree::Draw()d0b06eea6c12b998){:target="_blank"}.

Broken link?

Axel-Naumann

comment created time in 5 days
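The indexing hunk combines the major and minor values, such as `Run` and `Event`, into a single sortable key and looks up entries by that key. The sketch below assumes the key is 2^31 × major + minor and that minor values are non-negative and below 2^31, so keys of different majors cannot collide. `SimpleIndex` is a hypothetical stand-in for illustration only. It is not ROOT's `TTreeIndex`, and its `Find` merely plays the role of `TTree::GetEntryWithIndex()`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tree index (not ROOT's TTreeIndex): maps
// (major, minor) pairs such as (Run, Event) to entry numbers.
class SimpleIndex {
public:
   // Assumes 0 <= minor < 2^31 so that keys of different majors never collide.
   static std::int64_t Key(std::int64_t major, std::int64_t minor)
   {
      assert(minor >= 0 && minor < (std::int64_t{1} << 31));
      return major * (std::int64_t{1} << 31) + minor;
   }

   // Build the index from the (major, minor) values of entries 0..N-1.
   explicit SimpleIndex(const std::vector<std::pair<std::int64_t, std::int64_t>> &values)
   {
      for (std::int64_t entry = 0; entry < static_cast<std::int64_t>(values.size()); ++entry)
         fKeys.emplace_back(Key(values[entry].first, values[entry].second), entry);
      std::sort(fKeys.begin(), fKeys.end()); // sorted by key, then by entry
   }

   // Return the entry number for (major, minor), or -1 if there is none.
   std::int64_t Find(std::int64_t major, std::int64_t minor) const
   {
      const std::int64_t key = Key(major, minor);
      auto it = std::lower_bound(fKeys.begin(), fKeys.end(), std::make_pair(key, std::int64_t{0}));
      if (it != fKeys.end() && it->first == key)
         return it->second;
      return -1;
   }

private:
   std::vector<std::pair<std::int64_t, std::int64_t>> fKeys; // (key, entry) pairs
};
```

Sorting once and using binary search makes each lookup logarithmic in the number of entries. This is why building the index once, at the end of filling, is cheap compared to repeated lookups.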

Pull request review comment root-project/web

[Docs] Rewrite 'Trees' section

void treeWithFriend() {

 ## Examining a tree

-### Printing the summary of a tree
+Different ways to examine the tree structure and content exist, from text to graphics.
ROOT offers different ways to examine tree structure and its contents, from text to graphics.
Axel-Naumann

comment created time in 5 days

Pull request review comment root-project/web

[Docs] Rewrite 'Trees' section

The recusion level of nested splitting is called the "split level"; it can be co

 If the split level is set to 0, there is no splitting: all data members are stored in the same branch. Data members can also be configured to be non-split as part of the dictionary; see → [I/O]({{ '/manual/io' | relative_url }}).
-The default split level is 1; a split level of 99 means to split all members at any recursion level.
+The default split level of 99 means to split all members at any recursion level.

 _Pointers_

 While references `X &` are not supported as member types, pointers are. If the pointer is non-null, ROOT stores the object pointed to (pointee).
-If multiple pointers point to the same object during one `TTree::Fill()` operation, that pointee will only be stored once; upon reading, all pointers will again point to the same object.
+If multiple pointers within the same branch point to the same object during one `TBranch::Fill()` operation (as invoked by `TTree::Fill()`), that pointee will only be stored once; upon reading, all pointers will again point to the same object.
+
+For the general case, indices into object collections could be persistified instead of pointers.
For the general case, indices into object collections could be persisted instead of pointers.
Axel-Naumann

comment created time in 5 days
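The hunk notes that pointee deduplication only applies within one branch and one `TBranch::Fill()`. For the general case, it suggests storing indices into object collections instead of pointers. The sketch below shows that pattern in plain C++. `Track`, `Vertex`, `Event` and their members are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical event content: tracks are stored once in a collection and
// vertices refer to them by index rather than by pointer.
struct Track {
   float fPt = 0;
};

struct Vertex {
   std::vector<std::size_t> fTrackIndices; // indices into Event::fTracks
};

struct Event {
   std::vector<Track> fTracks;
   std::vector<Vertex> fVertices;

   // Resolve the i-th track reference of a vertex back to the track object.
   const Track &TrackOf(const Vertex &v, std::size_t i) const
   {
      assert(i < v.fTrackIndices.size());
      assert(v.fTrackIndices[i] < fTracks.size());
      return fTracks[v.fTrackIndices[i]];
   }
};
```

Unlike raw pointers, indices do not depend on memory addresses. They therefore remain meaningful after the collections are written and read back, regardless of which branches the objects end up in.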

Pull request review comment root-project/web

[Docs] Rewrite 'Trees' section

when reading back a `TTree` entry, it will write the values it read from storage
 A tree consists of a list of independent columns, called branches. A branch can contain values of any fundamental type, C++ objects known to ROOT's type system, or collections of those. Branches are represented by {% include ref class="TBranch" %} and its derived classes.

-While `TBranch` represent structure, objects inheriting from {% include ref class="TLeaf" %} contain the actual data.
-Originally, any columnar data was contained in a `TLeaf`; these days, some of the `TBranch`-derived classes contain data themselves, such as {% include ref class="TBranchElement" %}.
+While `TBranch` represent structure, objects inheriting from {% include ref class="TLeaf" %} give access to the actual data.
+Originally, any columnar data was accessible through a `TLeaf`; these days, some of the `TBranch`-derived classes provide data access themselves, such as {% include ref class="TBranchElement" %}.
+
+### Baskets, clusters and the tree header
+
+Every branch or leaf stores the data for its entries in buffers of a size that can be specified during branch creation (default: 32000 bytes).
+Once the buffer is full, it gets compressed; the compressed buffer is called _basket_.
+These baskets are written into the ROOT file.
+Branches with more data per tree entry will fill more baskets than branches with less data per tree entry.
+Conversely, baskets can hold many tree entries if their branch stores only a few bytes per tree entry.
+This means that generally, all baskets - also of different branches - will contain data of different tree entry ranges.
+
+To allow more efficient pre-fetching and better chunking of tree data stored in ROOT files, TTree groups baskets into clusters, for a defined range of tree entry indices.
+Trees will close baskets that are not yet full when reaching the tree entry at a cluster boundary.
+
+TTree finds the baskets for a given entry for a given branch by means of a header stored to file.
TTree finds the baskets for a given entry for a given branch by means of a header stored in the file.
Axel-Naumann

comment created time in 5 days
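The basket paragraph explains that each branch fills fixed-size buffers (32000 bytes by default), so branches with different per-entry sizes cover different entry ranges per basket. The back-of-the-envelope sketch below assumes a constant, uncompressed number of bytes per entry and ignores per-basket overhead. Both helpers are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: how many whole entries fit into one branch buffer,
// assuming every entry of the branch occupies `bytesPerEntry` bytes.
std::size_t EntriesPerBuffer(std::size_t bufferSize, std::size_t bytesPerEntry)
{
   assert(bytesPerEntry > 0);
   return bufferSize / bytesPerEntry;
}

// Index of the buffer (basket) that holds `entry` for such a branch.
std::size_t BasketOfEntry(std::size_t entry, std::size_t bufferSize, std::size_t bytesPerEntry)
{
   const std::size_t perBuffer = EntriesPerBuffer(bufferSize, bytesPerEntry);
   assert(perBuffer > 0); // entries larger than the buffer are not modelled here
   return entry / perBuffer;
}
```

With 32000-byte buffers, entry 100 is still in the first basket of a 4-byte `float` branch (8000 entries per basket). It is already in the second basket of a 400-byte branch (80 entries per basket). This mismatch is what cluster boundaries realign.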

Pull request review comment root-project/web

[Docs] Rewrite 'Trees' section

The scatterplot is drawn. Note that not each `(x,y) point on a scatterplot represents two values in your N−tuple. In fact, the scatterplot is a grid and each square in the grid is randomly populated with a density of dots that’s proportional to the number of values in that grid.

+### Indexing trees
+
+- Use [TTree::BuildIndex()](https://root.cern/doc/master/classTTree.html#a3f6b5bb591ff7a5bd0b06eea6c12b998){:target="_blank"} method to build an index table using expressions depending on the value in the leaves.
- Use the [TTree::BuildIndex()](https://root.cern/doc/master/classTTree.html#a3f6b5bb591ff7a5bd0b06eea6c12b998){:target="_blank"} method to build an index table using expressions depending on the value in the leaves.
Axel-Naumann

comment created time in 5 days

Pull request review comment root-project/web

[Docs] Rewrite 'Trees' section

The scatterplot is drawn. Note that not each `(x,y) point on a scatterplot represents two values in your N−tuple. In fact, the scatterplot is a grid and each square in the grid is randomly populated with a density of dots that’s proportional to the number of values in that grid.

+### Indexing trees
+
+- Use [TTree::BuildIndex()](https://root.cern/doc/master/classTTree.html#a3f6b5bb591ff7a5bd0b06eea6c12b998){:target="_blank"} method to build an index table using expressions depending on the value in the leaves.
+
+The index is built in the following way:
+- A pass on all entries is made like in [TTree::Draw()](https://root.cern/doc/master/classTTree.html#a3f6b5bb591ff7a5bTTree::Draw()d0b06eea6c12b998){:target="_blank"}.
+- `var1` = `majorname`
+- `var2` = `minorname`
+- `sel = 2^31` × _majorname_ + _minorname_
+- For each entry in the tree the `sel` expression is evaluated and the result array is sorted into `fIndexValues`.
+
+Once the index is calculated, an entry can be retrieved with
+[TTree::GetEntryWithIndex(majornumber, minornumber)](https://root.cern/doc/master/classTTree.html#a3f6b5bb591ff7a5bTTree::Draw()d0b06eea6c12b998){:target="_blank"}.

Broken link?

Axel-Naumann

comment created time in 5 days

Pull request review comment root-project/web

[Docs] Rewrite 'Trees' section

The scatterplot is drawn. Note that not each `(x,y) point on a scatterplot represents two values in your N−tuple. In fact, the scatterplot is a grid and each square in the grid is randomly populated with a density of dots that’s proportional to the number of values in that grid.

+### Indexing trees
+
+- Use [TTree::BuildIndex()](https://root.cern/doc/master/classTTree.html#a3f6b5bb591ff7a5bd0b06eea6c12b998){:target="_blank"} method to build an index table using expressions depending on the value in the leaves.
+
+The index is built in the following way:
+- A pass on all entries is made like in [TTree::Draw()](https://root.cern/doc/master/classTTree.html#a3f6b5bb591ff7a5bTTree::Draw()d0b06eea6c12b998){:target="_blank"}.
+- `var1` = `majorname`
+- `var2` = `minorname`
+- `sel = 2^31` × _majorname_ + _minorname_
+- For each entry in the tree the `sel` expression is evaluated and the result array is sorted into `fIndexValues`.
+
+Once the index is calculated, an entry can be retrieved with
+[TTree::GetEntryWithIndex(majornumber, minornumber)](https://root.cern/doc/master/classTTree.html#a3f6b5bb591ff7a5bTTree::Draw()d0b06eea6c12b998){:target="_blank"}.
+
+_**Example**_
+
+{% highlight C++ %}
+// To create an index using the leaves "Run" and "Event".
+   tree.BuildIndex("Run","Event");
+
+// To read entry corresponding to Run=1234 and Event=56789.
+   tree.GetEntryWithIndex(1234,56789);
+   {% endhighlight %}
+
+Note that `majorname` and `minorname` can be expressions using original tree variables e.g., `"run-90000"` or `"event +3*xx"`.
+
+In case an expression is specified, the equivalent expression must be computed when calling
+[TTree::GetEntryWithIndex(majornumber, minornumber)](https://root.cern/doc/master/classTTree.html#a3f6b5bb591ff7a5bTTree::Draw()d0b06eea6c12b998){:target="_blank"}.
+To build an index with only `majorname`, specify `minorname="0"` (default).
+
+Once the index is built, it can be saved with the `TTree` object with `tree.Write()`.
+
+The most convenient place to create the index is at the end of the filling process just before saving the tree header. If a previous index was calculated, it will be redefined by this new call.
The most convenient place to create the index is at the end of the filling process just before saving the tree header. If a previous index was available, it will be overridden by this new call.
Axel-Naumann

comment created time in 5 days


Pull request review comment root-project/root

[ntuple] Overhaul tuning and default settings when writing

void ROOT::Experimental::RNTupleWriter::CommitCluster()
       field.Flush();
       field.CommitCluster();
    }
-   fSink->CommitCluster(fNEntries);
+   float nbytes = fSink->CommitCluster(fNEntries);
   const float nbytes = fSink->CommitCluster(fNEntries);
jblomer

comment created time in 6 days

Pull request review comment root-project/root

[ntuple] Overhaul tuning and default settings when writing

public:
    }

    void* GetBuffer() const { return fBuffer; }
-   /// Return a pointer after the last element that has space for nElements new elements. If there is not enough capacity,
-   /// return nullptr
-   void* TryGrow(ClusterSize_t::ValueType nElements) {
-      auto offset = GetSize();
-      auto nbyte = nElements * fElementSize;
-      if (offset + nbyte > fCapacity) {
-        return nullptr;
-      }
+   /// Called during writing: returns a pointer after the last element and increases the element counter
+   /// in anticpation of the caller filling nElements in the page. It is the responsibility of the caller
   /// in anticipation of the caller filling nElements in the page. It is the responsibility of the caller
jblomer

comment created time in 6 days
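The new doc comment describes a "grow first, fill afterwards" contract. The call returns a pointer past the last element and bumps the element counter before the caller writes the elements, and the caller is responsible for ensuring capacity. The sketch below illustrates those semantics, assuming a fixed-capacity byte buffer. `PageSketch` is hypothetical and is not ROOT's `RPage`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical fixed-capacity page (not ROOT's RPage) illustrating the
// "grow first, fill afterwards" contract described in the doc comment.
class PageSketch {
public:
   PageSketch(std::size_t elementSize, std::size_t maxElements)
      : fElementSize(elementSize), fBuffer(elementSize * maxElements) {}

   // Returns a pointer after the last element and increases the element
   // counter in anticipation of the caller filling nElements. The caller must
   // ensure the page has room; only a debug assert guards against overflow.
   void *GrowUnchecked(std::size_t nElements)
   {
      assert((fNElements + nElements) * fElementSize <= fBuffer.size());
      void *dst = fBuffer.data() + fNElements * fElementSize;
      fNElements += nElements;
      return dst;
   }

   std::size_t GetNElements() const { return fNElements; }
   const unsigned char *GetBuffer() const { return fBuffer.data(); }

private:
   std::size_t fElementSize;
   std::vector<unsigned char> fBuffer;
   std::size_t fNElements = 0;
};
```

Skipping the capacity check on the hot path is only safe if callers size their requests against the remaining room. The quoted `AppendV()` does this by falling back to element-wise appends when the page would overflow.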

Pull request review comment root-project/root

[ntuple] Overhaul tuning and default settings when writing

void ROOT::Experimental::RNTupleWriter::CommitCluster()
       field.Flush();
       field.CommitCluster();
    }
-   fSink->CommitCluster(fNEntries);
+   float nbytes = fSink->CommitCluster(fNEntries);
+   // Cap the compression factor at 1000 to prevent overflow of fMinUnzippedClusterSizeEst
+   float compressionFactor = std::min(1000.f, static_cast<float>(fUnzippedClusterSize) / nbytes);
+   fUnzippedClusterSizeEst =

Using the avg. compression factor (instead of last cluster only) should probably be more accurate.

jblomer

comment created time in 6 days
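To illustrate the difference this comment points at, here is a sketch that keeps running totals over all committed clusters and derives a size-weighted average factor (total uncompressed bytes over total compressed bytes), next to the last-cluster-only factor used in the diff. Both apply the same cap of 1000. The class and method names are hypothetical and do not come from `RNTupleWriter`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical tracker; names are illustrative, not taken from RNTupleWriter.
class CompressionFactorTracker {
   std::uint64_t fTotalUnzipped = 0;
   std::uint64_t fTotalZipped = 0;

public:
   // Record one committed cluster: its uncompressed size and its size on storage, in bytes.
   void AddCluster(std::uint64_t unzippedBytes, std::uint64_t zippedBytes)
   {
      fTotalUnzipped += unzippedBytes;
      fTotalZipped += zippedBytes;
   }

   // Factor of a single (the last) cluster, as in the diff, capped at 1000.
   static float LastClusterFactor(std::uint64_t unzippedBytes, std::uint64_t zippedBytes)
   {
      return std::min(1000.f, static_cast<float>(unzippedBytes) / zippedBytes);
   }

   // Size-weighted average over all clusters committed so far, with the same cap.
   float AverageFactor() const
   {
      if (fTotalZipped == 0)
         return 1.f; // nothing committed yet: assume no compression
      return std::min(1000.f, static_cast<float>(fTotalUnzipped) / fTotalZipped);
   }
};
```

A size-weighted average reacts less to a single unusually (in)compressible cluster than the last-cluster factor does; an unweighted mean of per-cluster factors would be another option.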

Pull request review commentroot-project/root

[ntuple] Overhaul tuning and default settings when writing

 public:
    void Connect(DescriptorId_t fieldId, RPageStorage *pageStorage);
    void Append(const RColumnElementBase &element) {
-      void *dst = fHeadPage.TryGrow(1);
-      if (dst == nullptr) {
-         Flush();
-         dst = fHeadPage.TryGrow(1);
-         R__ASSERT(dst != nullptr);
+      void *dst = fHeadPage[fHeadPageIdx].GrowUnchecked(1);
+
+      if (fHeadPage[fHeadPageIdx].GetNElements() == fApproxNElementsPerPage / 2) {
+         FlushShadowHeadPage();
       }
+
       element.WriteTo(dst, 1);
       fNElements++;
+
+      if (fHeadPage[fHeadPageIdx].GetNElements() == fApproxNElementsPerPage)
+         SwapHeadPages();
    }
    void AppendV(const RColumnElementBase &elemArray, std::size_t count) {
-      void *dst = fHeadPage.TryGrow(count);
-      if (dst == nullptr) {
+      // We might not have enough space in the current page. In this case, fall back to one by one filling.
+      if (fHeadPage[fHeadPageIdx].GetNElements() + count > fApproxNElementsPerPage) {
+         // TODO(jblomer): use (fewer) calls to AppendV to write the data page-by-page
          for (unsigned i = 0; i < count; ++i) {
             Append(RColumnElementBase(elemArray, i));
          }
          return;
       }
+
+      void *dst = fHeadPage[fHeadPageIdx].GrowUnchecked(count);
+
+      // The check for flushing the shadow page is more complicated than for the Append() case
+      // because we don't necessarily fill up to exactly fApproxNElementsPerPage / 2 elements;
+      // we might instead jump over the 50% fill level
+      if ((fHeadPage[fHeadPageIdx].GetNElements() <= fApproxNElementsPerPage / 2) &&
+          (fHeadPage[fHeadPageIdx].GetNElements() + count > fApproxNElementsPerPage / 2))
+      {
+         FlushShadowHeadPage();
+      }
+
       elemArray.WriteTo(dst, count);
       fNElements += count;
+
+      // Note that by the very first check, we cannot have filled more than fApproxNElementsPerPage elements
+      if (fHeadPage[fHeadPageIdx].GetNElements() == fApproxNElementsPerPage)
+         SwapHeadPages();

Pages are also conditionally swapped in Append(); consider moving this check into the function and perhaps renaming it, e.g. to SwapHeadPagesIfNeeded().

Maybe we could also do the same for FlushShadowHeadPage()?

jblomer

comment created time in 6 days
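One way to act on this suggestion is to give both the swap and the shadow-page flush their own `...IfNeeded()` helpers, called from a single append path. The sketch below models only element counts (no real pages or page sink); `SketchColumn` and the helper names are hypothetical. As an assumption of the sketch, the flush condition is written in terms of the fill level before the append, so that for a single element it reduces to the `== fApproxNElementsPerPage / 2` check in `Append()`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified column that only tracks the element counts of its two head pages.
class SketchColumn {
   std::size_t fNElementsInPage[2] = {0, 0};
   int fHeadPageIdx = 0;
   std::size_t fApproxNElementsPerPage;
   int fNShadowFlushes = 0;
   int fNSwaps = 0;

   // Commit (here: just reset) the other head page once the current one reaches 50%.
   void FlushShadowHeadPageIfNeeded(std::size_t nBefore, std::size_t count)
   {
      const std::size_t half = fApproxNElementsPerPage / 2;
      if (nBefore < half && nBefore + count >= half) {
         fNElementsInPage[1 - fHeadPageIdx] = 0;
         ++fNShadowFlushes;
      }
   }

   // Rotate the head pages once the current one reaches the target size.
   void SwapHeadPagesIfNeeded()
   {
      if (fNElementsInPage[fHeadPageIdx] == fApproxNElementsPerPage) {
         fHeadPageIdx = 1 - fHeadPageIdx;
         ++fNSwaps;
      }
   }

public:
   explicit SketchColumn(std::size_t approxNElementsPerPage) : fApproxNElementsPerPage(approxNElementsPerPage) {}

   // Single code path for Append() (count == 1) and AppendV(). The real AppendV() falls
   // back to element-wise filling when the page would overflow; that part is omitted here.
   void AppendV(std::size_t count)
   {
      const std::size_t nBefore = fNElementsInPage[fHeadPageIdx];
      assert(nBefore + count <= fApproxNElementsPerPage);
      FlushShadowHeadPageIfNeeded(nBefore, count);
      fNElementsInPage[fHeadPageIdx] = nBefore + count;
      SwapHeadPagesIfNeeded();
   }

   std::size_t GetNElementsInHeadPage() const { return fNElementsInPage[fHeadPageIdx]; }
   int GetNShadowFlushes() const { return fNShadowFlushes; }
   int GetNSwaps() const { return fNSwaps; }
};
```

With the checks in helpers, `Append()` and `AppendV()` would no longer need to duplicate the 50% and 100% fill-level logic.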

Pull request review commentroot-project/root

[ntuple] Overhaul tuning and default settings when writing

 ROOT::Experimental::Detail::RPageSinkDaos::CommitSealedPageImpl(
    result.fBytesOnStorage = sealedPage.fSize;
    fCounters->fNPageCommitted.Inc();
    fCounters->fSzWritePayload.Add(sealedPage.fSize);
+   fNBytesCurrentCluster += sealedPage.fSize;
    return result;
 }
 
-// TODO(jalopezg): the current byte range arithmetic makes little sense for the
-// object store. We might find out, however, that there are native ways to group
-// clusters in DAOS.
-ROOT::Experimental::RClusterDescriptor::RLocator
+std::uint64_t
 ROOT::Experimental::Detail::RPageSinkDaos::CommitClusterImpl(ROOT::Experimental::NTupleSize_t /* nEntries */)
 {
-   return {};
+   auto result = fNBytesCurrentCluster;
+   fNBytesCurrentCluster = 0;
+   return result;

Maybe use std::exchange()?

jblomer

comment created time in 6 days
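For reference, `std::exchange` (C++14, `<utility>`) assigns a new value and returns the old one, which collapses the read-reset-return sequence above into a single statement. A sketch with a hypothetical stand-in for the DAOS page sink, modelling only the byte counter:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for RPageSinkDaos; only the per-cluster byte counter is modelled.
class SketchPageSink {
   std::uint64_t fNBytesCurrentCluster = 0;

public:
   void CommitSealedPage(std::uint64_t sealedPageSize) { fNBytesCurrentCluster += sealedPageSize; }

   // Returns the bytes written for the current cluster and resets the counter in one step.
   std::uint64_t CommitCluster() { return std::exchange(fNBytesCurrentCluster, 0); }
};
```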

Pull request review commentroot-project/root

[ntuple] Overhaul tuning and default settings when writing

 ROOT::Experimental::RNTupleWriter::RNTupleWriter(
 #endif
    fSink->Create(*fModel.get());
    fMetrics.ObserveMetrics(fSink->GetMetrics());
+
+   const auto &writeOpts = fSink->GetWriteOptions();
+   fMaxUnzippedClusterSize = writeOpts.GetMaxUnzippedClusterSize();
+   // First estimate is a factor 2 compression if compression is used at all
+   int scale = writeOpts.GetCompression() ? 2 : 1;
   const int scale = writeOpts.GetCompression() ? 2 : 1;
jblomer

comment created time in 6 days

Pull request review commentroot-project/root

[ntuple] Overhaul tuning and default settings when writing

 void ROOT::Experimental::RNTupleWriter::CommitCluster()
       field.Flush();
       field.CommitCluster();
    }
-   fSink->CommitCluster(fNEntries);
+   float nbytes = fSink->CommitCluster(fNEntries);
+   // Cap the compression factor at 1000 to prevent overflow of fMinUnzippedClusterSizeEst
+   float compressionFactor = std::min(1000.f, static_cast<float>(fUnzippedClusterSize) / nbytes);
   const float compressionFactor = std::min(1000.f, static_cast<float>(fUnzippedClusterSize) / nbytes);
jblomer

comment created time in 6 days

Pull request review commentroot-project/root

[ntuple] Overhaul tuning and default settings when writing

 public:
    void Connect(DescriptorId_t fieldId, RPageStorage *pageStorage);
    void Append(const RColumnElementBase &element) {
-      void *dst = fHeadPage.TryGrow(1);
-      if (dst == nullptr) {
-         Flush();
-         dst = fHeadPage.TryGrow(1);
-         R__ASSERT(dst != nullptr);
+      void *dst = fHeadPage[fHeadPageIdx].TryGrow(1);
+      R__ASSERT(dst != nullptr);
+      if (fHeadPage[fHeadPageIdx].GetNElements() == fApproxNElementsPerPage / 2) {

Given that this function appends just one element, I think it's okay to leave ==. Changing it to >= shouldn't cause any side effect (although it is probably semantically incorrect), since the first call to FlushShadowHeadPage() calls RPage::Reset(0) and subsequent calls return early in line 90.

jblomer

comment created time in 6 days
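The argument above relies on the flush being idempotent: once the shadow page has been committed and reset, further calls do nothing. A hypothetical model (not ROOT code; `SketchShadowPage` and `CountShadowCommits()` are illustrative names) that fills a head page one element at a time and counts how often the shadow page is really committed under the `==` and `>=` checks:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical shadow page: tracks only its element count and the number of real commits.
struct SketchShadowPage {
   std::size_t fNElements = 0;
   int fNCommits = 0;

   // Committing an empty page is a no-op, mirroring the early return described in the review.
   void Flush()
   {
      if (fNElements == 0)
         return;
      ++fNCommits;
      fNElements = 0; // analogous to resetting the page
   }
};

// Fill a head page one element at a time and flush the shadow page at the 50% mark,
// using either the exact check (==) or the relaxed one (>=).
int CountShadowCommits(std::size_t approxNElementsPerPage, bool relaxedCheck)
{
   SketchShadowPage shadow;
   shadow.fNElements = approxNElementsPerPage; // the previous head page is full
   const std::size_t half = approxNElementsPerPage / 2;
   for (std::size_t n = 1; n <= approxNElementsPerPage; ++n) {
      const bool trigger = relaxedCheck ? (n >= half) : (n == half);
      if (trigger)
         shadow.Flush();
   }
   return shadow.fNCommits;
}
```

In this model both variants commit the shadow page exactly once; the `>=` variant only adds redundant calls after the 50% mark.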

Pull request review commentroot-project/root

[ntuple] Overhaul tuning and default settings when writing

 protected:
       auto newItemField = fSubFields[0]->Clone(fSubFields[0]->GetName());
       return std::make_unique<RField<ROOT::VecOps::RVec<ItemT>>>(newName, std::move(newItemField));
    }
-   void AppendImpl(const Detail::RFieldValue& value) final {
+   std::size_t AppendImpl(const Detail::RFieldValue& value) final {
       auto typedValue = value.Get<ContainerT>();
+      auto nbytes = 0;
       auto count = typedValue->size();
       for (unsigned i = 0; i < count; ++i) {
          auto itemValue = fSubFields[0]->CaptureValue(&typedValue->data()[i]);
-         fSubFields[0]->Append(itemValue);
+         nbytes += fSubFields[0]->Append(itemValue);
       }
       Detail::RColumnElement<ClusterSize_t, EColumnType::kIndex> elemIndex(&fNWritten);
       fNWritten += count;
       fColumns[0]->Append(elemIndex);
+      return nbytes + sizeof(elemIndex);

I think the use of sizeof(elemIndex) (also in line 1472) is okay here because elements written to pages are in a machine-dependent format anyway (?).

jblomer

comment created time in 6 days

Pull request review commentroot-project/root

[ntuple] Overhaul tuning and default settings when writing

+/// \file RNTupleOptions.cxx
+/// \ingroup NTuple ROOT7
+/// \author Jakob Blomer <jblomer@cern.ch>
+/// \date 2021-07-28
+/// \warning This is part of the ROOT 7 prototype! It will change without notice. It might trigger earthquakes. Feedback
+/// is welcome!
+
+/*************************************************************************
+ * Copyright (C) 1995-2021, Rene Brun and Fons Rademakers.               *
+ * All rights reserved.                                                  *
+ *                                                                       *
+ * For the licensing terms see $ROOTSYS/LICENSE.                         *
+ * For the list of contributors see $ROOTSYS/README/CREDITS.             *
+ *************************************************************************/
+
+#include <ROOT/RError.hxx>
+#include <ROOT/RNTupleOptions.hxx>
+
+#include <utility>
+
+namespace {
+
+void EnsureValidTunables(std::size_t zippedClusterSize, std::size_t unzippedClusterSize, std::size_t unzippedPageSize)
+{
+   using RException = ROOT::Experimental::RException;
+   if (zippedClusterSize == 0) {
+      throw RException(R__FAIL("invalid target cluster size: 0"));
+   }
+   if (zippedClusterSize > unzippedClusterSize) {
+      throw RException(R__FAIL("compressed target cluster size must not be larger than "
+                               "maximum uncompressed cluster size"));
+   }
+   if (unzippedPageSize > unzippedClusterSize) {
+      throw RException(R__FAIL("compressed target page size must not be larger than "
      throw RException(R__FAIL("target page size must not be larger than "
jblomer

comment created time in 6 days
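The corrected wording matters because the page-size tunable refers to uncompressed data. A standalone approximation of the validation in the diff, with `std::invalid_argument` substituted for ROOT's `RException`/`R__FAIL` so that it compiles without ROOT; `EnsureValidTunablesSketch()` and `AreValidTunables()` are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Approximation of the tunables check in RNTupleOptions.cxx; std::invalid_argument stands in
// for ROOT's RException so that this sketch compiles on its own.
void EnsureValidTunablesSketch(std::size_t zippedClusterSize, std::size_t unzippedClusterSize,
                               std::size_t unzippedPageSize)
{
   if (zippedClusterSize == 0)
      throw std::invalid_argument("invalid target cluster size: 0");
   if (zippedClusterSize > unzippedClusterSize)
      throw std::invalid_argument("compressed target cluster size must not be larger than "
                                  "maximum uncompressed cluster size");
   // The page size is an uncompressed size, hence no "compressed" in this message.
   if (unzippedPageSize > unzippedClusterSize)
      throw std::invalid_argument("target page size must not be larger than "
                                  "maximum uncompressed cluster size");
   if (unzippedPageSize == 0)
      throw std::invalid_argument("invalid target page size: 0");
}

// Convenience wrapper: true if the given tunables are accepted.
bool AreValidTunables(std::size_t zipped, std::size_t unzipped, std::size_t page)
{
   try {
      EnsureValidTunablesSketch(zipped, unzipped, page);
      return true;
   } catch (const std::invalid_argument &) {
      return false;
   }
}
```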

Pull request review commentroot-project/root

[ntuple] Overhaul tuning and default settings when writing

+/// \file RNTupleOptions.cxx
+/// \ingroup NTuple ROOT7
+/// \author Jakob Blomer <jblomer@cern.ch>
+/// \date 2021-07-28
+/// \warning This is part of the ROOT 7 prototype! It will change without notice. It might trigger earthquakes. Feedback
+/// is welcome!
+
+/*************************************************************************
+ * Copyright (C) 1995-2021, Rene Brun and Fons Rademakers.               *
+ * All rights reserved.                                                  *
+ *                                                                       *
+ * For the licensing terms see $ROOTSYS/LICENSE.                         *
+ * For the list of contributors see $ROOTSYS/README/CREDITS.             *
+ *************************************************************************/
+
+#include <ROOT/RError.hxx>
+#include <ROOT/RNTupleOptions.hxx>
+
+#include <utility>
+
+namespace {
+
+void EnsureValidTunables(std::size_t zippedClusterSize, std::size_t unzippedClusterSize, std::size_t unzippedPageSize)
+{
+   using RException = ROOT::Experimental::RException;
+   if (zippedClusterSize == 0) {
+      throw RException(R__FAIL("invalid target cluster size: 0"));
+   }
+   if (zippedClusterSize > unzippedClusterSize) {
+      throw RException(R__FAIL("compressed target cluster size must not be larger than "
+                               "maximum uncompressed cluster size"));
+   }
+   if (unzippedPageSize > unzippedClusterSize) {
+      throw RException(R__FAIL("compressed target page size must not be larger than "
+                               "maximum uncompressed cluster size"));
+   }
+   if (unzippedPageSize == 0) {
+      throw RException(R__FAIL("invalid target page size: 0"));
+   }
+}
+
+} // anonymous namespace
+
+std::unique_ptr<ROOT::Experimental::RNTupleWriteOptions>
+ROOT::Experimental::RNTupleWriteOptions::Clone() const
+{
+   return std::make_unique<RNTupleWriteOptions>(*this);
+}

Why did we move this inlineable definition from the header to the TU?

jblomer

comment created time in 6 days
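For comparison, a member function defined inside the class body is implicitly `inline`, so a one-line `Clone()` can stay in the header without violating the one-definition rule. A sketch with a hypothetical `SketchWriteOptions` class; making `Clone()` virtual is an assumption, on the premise that option classes for other backends derive from it:

```cpp
#include <cassert>
#include <memory>

// Hypothetical options class, not ROOT's RNTupleWriteOptions.
class SketchWriteOptions {
   int fCompression = 0;

public:
   SketchWriteOptions() = default;
   SketchWriteOptions(const SketchWriteOptions &) = default;
   virtual ~SketchWriteOptions() = default;

   // Defined in the class body, hence implicitly inline: fine to keep in a header.
   virtual std::unique_ptr<SketchWriteOptions> Clone() const
   {
      return std::make_unique<SketchWriteOptions>(*this);
   }

   int GetCompression() const { return fCompression; }
   void SetCompression(int compression) { fCompression = compression; }
};
```

One common reason to move such a definition out of line anyway is that, under the Itanium C++ ABI, the first non-pure virtual member function that is not inline acts as the key function that pins the vtable to a single translation unit; whether that was the motivation here is not stated in the diff.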

Pull request review commentroot-project/root

[ntuple] Overhaul tuning and default settings when writing

 private:
     * the offset column with index 0 and the character value column with index 1.
     */
    std::uint32_t fIndex;
-   RPageSink *fPageSink;
-   RPageSource *fPageSource;
+   RPageSink *fPageSink = nullptr;
+   RPageSource *fPageSource = nullptr;
    RPageStorage::ColumnHandle_t fHandleSink;
    RPageStorage::ColumnHandle_t fHandleSource;
-   /// Open page into which new elements are being written
-   RPage fHeadPage;
+   /// A set of open pages into which new elements are being written. The pages are used
+   /// in rotation. They are 50% bigger than the target size given by the write options.
+   /// The current page is filled until the target size, but it is only committed once the other
+   /// head page is filled at least 50%. If a flush occurs earlier, a slightly oversize, single
+   /// page will be committed.
+   RPage fHeadPage[2];

I guess fHeadPage[fHeadPageIdx] is used for writing while fCurrentPage is only used for reading. Maybe we should rename those to be more descriptive, e.g. fWritePage and fReadPage?

jblomer

comment created time in 6 days

Pull request review commentroot-project/root

[ntuple] Overhaul tuning and default settings when writing

 All page sink classes need to support the common options.
 class RNTupleWriteOptions {
    int fCompression{RCompressionSetting::EDefaults::kUseAnalysis};
    ENTupleContainerFormat fContainerFormat{ENTupleContainerFormat::kTFile};
-   NTupleSize_t fNEntriesPerCluster = 64000;
-   NTupleSize_t fNElementsPerPage = 10000;
+   /// Approximation of the target compressed cluster size
+   std::size_t fApproxZippedClusterSize = 50 * 1000 * 1000;
+   /// Memory limit for committing a cluster: with very high compression ratio, we need a limit
+   /// on how large the I/O buffer can grow during writing.
+   std::size_t fMaxUnzippedClusterSize = 512 * 1024 * 1024;
+   /// Should be just large enough so that the compression ratio does not benefit much more from larger pages.
+   /// Unless the cluster is too small to contain a sufficiently large page, pages are
+   /// fApproxUnzippedPageSize in size and tail pages (the last page in a cluster) is between
+   /// fApproxUnzippedPageSize/2 and fApproxUnzippedPageSize * 2 in size.

Wasn't the upper limit 1.5 * fApproxUnzippedPageSize (according to tuning.md)?

jblomer

comment created time in 6 days
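Reading the `fHeadPage` comment from the earlier diff literally (a full head page is committed only once the other head page is at least 50% full, and an earlier flush commits both as one slightly oversized page), the tail page size can be worked out directly. A hypothetical helper that does so, counting sizes in elements with `T = approxNElementsPerPage`; the function and parameter names are illustrative only:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the two-head-page scheme: number of elements in the tail page
// committed when a cluster is flushed.
//   approxNElementsPerPage: target page size T, in elements
//   nElementsInHeadPage:    fill level of the current head page at flush time (0 <= n < T)
//   clusterHasPreviousPage: whether a full page was written before the current head page
std::size_t TailPageNElements(std::size_t approxNElementsPerPage, std::size_t nElementsInHeadPage,
                              bool clusterHasPreviousPage)
{
   assert(nElementsInHeadPage < approxNElementsPerPage);
   const std::size_t half = approxNElementsPerPage / 2;
   // Below 50% the previous (shadow) page is still pending, so both go out as one
   // oversized page of T + n < 1.5 * T elements.
   if (clusterHasPreviousPage && nElementsInHeadPage < half)
      return approxNElementsPerPage + nElementsInHeadPage;
   // Otherwise the shadow page was already committed and the tail is the head page alone.
   return nElementsInHeadPage;
}
```

Under this reading the tail page holds between T/2 and just under 1.5 T elements whenever the cluster already contains a full page, consistent with the 1.5 bound the comment recalls; only a cluster too small for a full page ends with a smaller tail.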