Daniel Lemire (lemire)
Université du Québec (TELUQ), Montreal, Canada
http://lemire.me/en/
Daniel Lemire is a computer science professor. His research is focused on software performance and indexing.

google/highwayhash 1066

Fast strong hash functions: SipHash/HighwayHash

FastFilter/xorfilter 429

Go library implementing xor filters

FastFilter/xor_singleheader 149

Header-only Xor Filter library

FastFilter/fastfilter_java 144

Fast Approximate Membership Filters (Java)

FastFilter/fastfilter_cpp 116

Fast Approximate Membership Filters (C++)

eddelbuettel/rcppsimdjson 75

Rcpp Bindings for the 'simdjson' Header Library

geofflangdale/simdcsv 74

A fast SIMD parser for CSV files

FastFilter/FilterPassword 25

Experiments in C++: Xor filters vs. Bloom filters

lemire/backward_multiplication 10

Multiplying... backward?

lemire/BitSliceIndex 10

Experiments on bit-slice indexing

started lemire/cbitset

started 10 minutes ago

PR opened lemire/fast_float

remove 1.1MB (85%) of binary size by not including iostream

Picking up on the discussion started in #23 about large binaries: the problem is definitely real. This is what I'm seeing with the size harness I described in that discussion:

[jpmag@pc] 3039$ for i in * ; do echo "$i/`cat $i/bm/float/*fast_float_d*dat`" ; done
linux-x86_64-clangxx11.0-Debug/c4core-bm-readfloat-fast_float_d: {compile: 0.343s, file_size: 1447312B}
linux-x86_64-clangxx11.0-Release/c4core-bm-readfloat-fast_float_d: {compile: 0.353s, file_size: 1391776B}
linux-x86_64-gxx10.2-Debug/c4core-bm-readfloat-fast_float_d: {compile: 0.318s, file_size: 1451928B}
linux-x86_64-gxx10.2-Release/c4core-bm-readfloat-fast_float_d: {compile: 0.203s, file_size: 1391768B}
linux-x86-clangxx11.0-Debug/c4core-bm-readfloat-fast_float_d: {compile: 0.334s, file_size: 1462908B}
linux-x86-clangxx11.0-Release/c4core-bm-readfloat-fast_float_d: {compile: 0.328s, file_size: 1392740B}
linux-x86-gxx10.2-Debug/c4core-bm-readfloat-fast_float_d: {compile: 0.203s, file_size: 1457560B}
linux-x86-gxx10.2-Release/c4core-bm-readfloat-fast_float_d: {compile: 0.191s, file_size: 1392732B}

[jpmag@pc] 3040$ for i in * ; do echo "$i/`cat $i/bm/float/*fast_float_f*dat`" ; done
linux-x86_64-clangxx11.0-Debug/c4core-bm-readfloat-fast_float_f: {compile: 0.349s, file_size: 1447312B}
linux-x86_64-clangxx11.0-Release/c4core-bm-readfloat-fast_float_f: {compile: 0.361s, file_size: 1391776B}
linux-x86_64-gxx10.2-Debug/c4core-bm-readfloat-fast_float_f: {compile: 0.334s, file_size: 1451904B}
linux-x86_64-gxx10.2-Release/c4core-bm-readfloat-fast_float_f: {compile: 0.234s, file_size: 1391768B}
linux-x86-clangxx11.0-Debug/c4core-bm-readfloat-fast_float_f: {compile: 0.412s, file_size: 1462892B}
linux-x86-clangxx11.0-Release/c4core-bm-readfloat-fast_float_f: {compile: 0.365s, file_size: 1392740B}
linux-x86-gxx10.2-Debug/c4core-bm-readfloat-fast_float_f: {compile: 0.225s, file_size: 1457524B}
linux-x86-gxx10.2-Release/c4core-bm-readfloat-fast_float_f: {compile: 0.194s, file_size: 1392732B}

Notice that this happens with both g++ and clang++, for both x86 and x86_64, and for both Debug and Release builds. Notice also that the baseline executable, which consists of a while(fgets()) { fputs() } loop, stays between roughly 15KB and 21KB:

[jpmag@pc] 3041$ for i in * ; do echo "$i/`cat $i/bm/float/*base*dat`" ; done
linux-x86_64-clangxx11.0-Debug/c4core-bm-readfloat-baseline: {compile: 0.173s, file_size: 20096B}
linux-x86_64-clangxx11.0-Release/c4core-bm-readfloat-baseline: {compile: 0.156s, file_size: 16800B}
linux-x86_64-gxx10.2-Debug/c4core-bm-readfloat-baseline: {compile: 0.150s, file_size: 21072B}
linux-x86_64-gxx10.2-Release/c4core-bm-readfloat-baseline: {compile: 0.122s, file_size: 16800B}
linux-x86-clangxx11.0-Debug/c4core-bm-readfloat-baseline: {compile: 0.196s, file_size: 19988B}
linux-x86-clangxx11.0-Release/c4core-bm-readfloat-baseline: {compile: 0.164s, file_size: 15640B}
linux-x86-gxx10.2-Debug/c4core-bm-readfloat-baseline: {compile: 0.124s, file_size: 19804B}
linux-x86-gxx10.2-Release/c4core-bm-readfloat-baseline: {compile: 0.127s, file_size: 15676B}
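The baseline's source is not shown here. Judging only from the description above, it is presumably something like the following minimal sketch; the function name `copy_lines` and the 256-byte buffer are assumptions. It uses nothing beyond `<cstdio>`, which is consistent with its small binary size.

```cpp
#include <cstdio>

// Hypothetical sketch of the baseline harness program: copy input to output
// line by line with fgets/fputs and parse nothing, so only a few <cstdio>
// functions end up in the binary. The 256-byte buffer is an arbitrary
// assumption; longer lines are simply copied over several fgets/fputs rounds.
// Returns the number of successful fgets calls.
long copy_lines(std::FILE *in, std::FILE *out) {
  char buf[256];
  long chunks = 0;
  while (std::fgets(buf, sizeof(buf), in) != nullptr) {
    std::fputs(buf, out);
    ++chunks;
  }
  return chunks;
}
```

A `main` would simply call `copy_lines(stdin, stdout)`.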

You are right that the fast_float code itself is small. But it has an #include <iostream>, and that alone is usually enough to bloat binaries. It brings in a mountain of code (30K lines and 713K characters after preprocessing), together with exceptions, new()s, delete()s, etc.:

[jpmag@pc] 3051$ echo "#include <iostream>" | g++ -E -x c++ - | wc -lc
  29998  712929

Let's look at the sizes for iostream:

[jpmag@pc] 3052$ for i in * ; do echo "$i/`cat $i/bm/float/*iostream_f*dat`" ; done
linux-x86_64-clangxx11.0-Debug/c4core-bm-readfloat-iostream_f: {compile: 0.343s, file_size: 1357672B}
linux-x86_64-clangxx11.0-Release/c4core-bm-readfloat-iostream_f: {compile: 0.328s, file_size: 1345232B}
linux-x86_64-gxx10.2-Debug/c4core-bm-readfloat-iostream_f: {compile: 0.307s, file_size: 1362576B}
linux-x86_64-gxx10.2-Release/c4core-bm-readfloat-iostream_f: {compile: 0.226s, file_size: 1345272B}
linux-x86-clangxx11.0-Debug/c4core-bm-readfloat-iostream_f: {compile: 0.424s, file_size: 1356560B}
linux-x86-clangxx11.0-Release/c4core-bm-readfloat-iostream_f: {compile: 0.355s, file_size: 1343316B}
linux-x86-gxx10.2-Debug/c4core-bm-readfloat-iostream_f: {compile: 0.299s, file_size: 1356252B}
linux-x86-gxx10.2-Release/c4core-bm-readfloat-iostream_f: {compile: 0.189s, file_size: 1347436B}

[jpmag@pc] 3053$ for i in * ; do echo "$i/`cat $i/bm/float/*iostream_d*dat`" ; done
linux-x86_64-clangxx11.0-Debug/c4core-bm-readfloat-iostream_d: {compile: 0.346s, file_size: 1357672B}
linux-x86_64-clangxx11.0-Release/c4core-bm-readfloat-iostream_d: {compile: 0.368s, file_size: 1345232B}
linux-x86_64-gxx10.2-Debug/c4core-bm-readfloat-iostream_d: {compile: 0.333s, file_size: 1362576B}
linux-x86_64-gxx10.2-Release/c4core-bm-readfloat-iostream_d: {compile: 0.220s, file_size: 1345272B}
linux-x86-clangxx11.0-Debug/c4core-bm-readfloat-iostream_d: {compile: 0.324s, file_size: 1356560B}
linux-x86-clangxx11.0-Release/c4core-bm-readfloat-iostream_d: {compile: 0.331s, file_size: 1343316B}
linux-x86-gxx10.2-Debug/c4core-bm-readfloat-iostream_d: {compile: 0.202s, file_size: 1356252B}
linux-x86-gxx10.2-Release/c4core-bm-readfloat-iostream_d: {compile: 0.208s, file_size: 1347436B}

Don't these sizes look suspiciously similar to fast_float above? Let's check:

[jpmag@pc] 3054$ bloaty -d segments,sections,symbols linux-x86_64-gxx10.2-Debug/bm/float/c4core-bm-readfloat-fast_float_d
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  47.9%   678Ki  67.7%   678Ki    LOAD #3 [RX]
   100.0%   678Ki 100.0%   678Ki    .text
      70.1%   475Ki  70.1%   475Ki    [1476 Others]
       5.2%  35.1Ki   5.2%  35.1Ki    std::num_get<>::_M_extract_int<>()
       3.5%  23.8Ki   3.5%  23.8Ki    [section .text]
       2.9%  20.0Ki   2.9%  20.0Ki    std::__cxx11::money_get<>::_M_extract<>()
       2.5%  16.9Ki   2.5%  16.9Ki    std::money_get<>::_M_extract<>()
       2.3%  15.6Ki   2.3%  15.6Ki    d_print_comp_inner
       1.3%  8.68Ki   1.3%  8.68Ki    std::__cxx11::money_put<>::_M_insert<>()
       1.1%  7.50Ki   1.1%  7.50Ki    std::num_get<>::_M_extract_float()
       1.1%  7.16Ki   1.1%  7.16Ki    std::money_put<>::_M_insert<>()
       1.0%  7.05Ki   1.0%  7.05Ki    std::__moneypunct_cache<>::_M_cache()
       1.0%  7.04Ki   1.0%  7.04Ki    _ZNKSt7__cxx118time_getIcSt19istreambuf_iteratorIcSt11char_traitsIcEEE21_M_extract_via_formatES4_S4_RSt8ios_baseRSt12_Ios_IostateP2tmPKc.localalias
       0.9%  5.99Ki   0.9%  5.99Ki    std::__facet_shims::__moneypunct_fill_cache<>()
       0.9%  5.87Ki   0.9%  5.87Ki    std::num_get<>::do_get()
       0.8%  5.68Ki   0.8%  5.68Ki    std::basic_fstream<>::basic_fstream()
       0.8%  5.55Ki   0.8%  5.55Ki    std::__cxx11::time_get<>::get()
       0.8%  5.45Ki   0.8%  5.45Ki    _ZNKSt7__cxx118time_getIwSt19istreambuf_iteratorIwSt11char_traitsIwEEE21_M_extract_via_formatES4_S4_RSt8ios_baseRSt12_Ios_IostateP2tmPKw.localalias
       0.8%  5.37Ki   0.8%  5.37Ki    std::__cxx11::moneypunct<>::_M_initialize_moneypunct()
       0.8%  5.37Ki   0.8%  5.37Ki    std::moneypunct<>::_M_initialize_moneypunct()
       0.8%  5.17Ki   0.8%  5.17Ki    std::time_get<>::get()
       0.7%  4.90Ki   0.7%  4.90Ki    std::num_put<>::_M_insert_int<>()
       0.7%  4.87Ki   0.7%  4.87Ki    _ZNKSt8time_getIcSt19istreambuf_iteratorIcSt11char_traitsIcEEE21_M_extract_via_formatES3_S3_RSt8ios_baseRSt12_Ios_IostateP2tmPKc.localalias
     0.0%      48   0.0%      48    .plt
     0.0%      32   0.0%      32    .plt.got
     0.0%      27   0.0%      27    .init
     0.0%      13   0.0%      13    .fini
     0.0%       8   0.0%       8    [LOAD #3 [RX]]
  29.9%   424Ki   0.0%       0    [Unmapped]
    58.7%   249Ki   NAN%       0    .strtab
      85.2%   212Ki   NAN%       0    [1867 Others]
       1.8%  4.38Ki   NAN%       0    std::__cxx11::basic_string<>::replace()
       1.3%  3.35Ki   NAN%       0    std::__cxx11::basic_string<>::basic_string()
       1.2%  2.99Ki   NAN%       0    std::use_facet<>()
       1.1%  2.64Ki   NAN%       0    std::has_facet<>()
       0.9%  2.30Ki   NAN%       0    std::num_get<>::do_get()
       0.9%  2.24Ki   NAN%       0    std::num_get<>::get()
       0.9%  2.21Ki   NAN%       0    [section .strtab]
       0.8%  1.94Ki   NAN%       0    std::__cxx11::basic_string<>::insert()
       0.6%  1.44Ki   NAN%       0    std::num_get<>::_M_extract_int<>()
       0.6%  1.38Ki   NAN%       0    std::__cxx11::basic_string<>::_M_construct<>()
       0.6%  1.38Ki   NAN%       0    std::operator<< <>()
       0.5%  1.36Ki   NAN%       0    std::num_put<>::do_put()
       0.5%  1.32Ki   NAN%       0    std::num_put<>::put()
       0.5%  1.27Ki   NAN%       0    std::operator>><>()
       0.5%  1.27Ki   NAN%       0    std::__cxx11::moneypunct<>::moneypunct()
       0.4%  1.12Ki   NAN%       0    std::basic_string<>::basic_string()
       0.4%  1.08Ki   NAN%       0    std::moneypunct<>::moneypunct()
       0.4%  1.07Ki   NAN%       0    std::__cxx11::moneypunct_byname<>::moneypunct_byname()
       0.4%  1.05Ki   NAN%       0    std::time_put_byname<>::time_put_byname()
       0.4%  1.05Ki   NAN%       0    std::__facet_shims::__moneypunct_fill_cache<>()
    28.6%   121Ki   NAN%       0    .symtab
      86.4%   104Ki   NAN%       0    [1872 Others]
       3.1%  3.75Ki   NAN%       0    [section .symtab]
       1.1%  1.29Ki   NAN%       0    (anonymous namespace)::get_global()::global
       1.0%  1.22Ki   NAN%       0    std::__cxx11::basic_string<>::basic_string()
       0.9%  1.03Ki   NAN%       0    std::__cxx11::basic_string<>::replace()
       0.9%  1.03Ki   NAN%       0    std::use_facet<>()
       0.8%     960   NAN%       0    std::has_facet<>()
       0.5%     624   NAN%       0    std::basic_string<>::basic_string()
       0.5%     624   NAN%       0    std::string::string()
       0.5%     576   NAN%       0    std::__cxx11::moneypunct<>::moneypunct()
       0.5%     576   NAN%       0    std::__facet_shims::(anonymous namespace)::moneypunct_shim<>
       0.5%     576   NAN%       0    std::__facet_shims::(anonymous namespace)::moneypunct_shim<>::~moneypunct_shim()
       0.5%     576   NAN%       0    std::moneypunct<>::moneypunct()
       0.4%     528   NAN%       0    std::__cxx11::basic_string<>::insert()
       0.4%     528   NAN%       0    std::num_get<>::do_get()
       0.4%     528   NAN%       0    std::num_get<>::get()
       0.4%     528   NAN%       0    std::operator<< <>()
       0.4%     480   NAN%       0    std::operator>><>()
       0.3%     408   NAN%       0    std::basic_istream<>::operator>>()
       0.3%     408   NAN%       0    std::basic_ostream<>::operator<<()
       0.3%     408   NAN%       0    std::istream::operator>>()
     6.6%  28.1Ki   NAN%       0    .debug_info

... (output continues)

Many of these entries are suspiciously related to streams and strings. So let's see what happens if we remove the stream includes:

modified   include/fast_float/decimal_to_binary.h
@@ -10,7 +10,6 @@
 #include <cstdio>
 #include <cstdlib>
 #include <cstring>
-#include <iostream>
 
 namespace fast_float {
 
modified   include/fast_float/float_common.h
@@ -363,8 +363,8 @@ constexpr int binary_format<float>::smallest_power_of_ten() {
 } // namespace fast_float
 
 // for convenience:
-#include <ostream>
-inline std::ostream &operator<<(std::ostream &out, const fast_float::decimal &d) {
+template<class OStream>
+inline OStream& operator<<(OStream &out, const fast_float::decimal &d) {
   out << "0.";
   for (size_t i = 0; i < d.num_digits; i++) {
     out << int32_t(d.digits[i]);

... and, as I expected, the result is now:

[jpmag@pc] 3055$ for i in * ; do echo "$i/`cat $i/bm/float/*fast_float_f*dat`" ; done                                               
linux-x86_64-clangxx11.0-Debug/c4core-bm-readfloat-fast_float_f: {compile: 0.164s, file_size: 203176B}
linux-x86_64-clangxx11.0-Release/c4core-bm-readfloat-fast_float_f: {compile: 0.141s, file_size: 149080B}
linux-x86_64-gxx10.2-Debug/c4core-bm-readfloat-fast_float_f: {compile: 0.103s, file_size: 211976B}
linux-x86_64-gxx10.2-Release/c4core-bm-readfloat-fast_float_f: {compile: 0.092s, file_size: 34488B}
linux-x86-clangxx11.0-Debug/c4core-bm-readfloat-fast_float_f: {compile: 0.128s, file_size: 219004B}
linux-x86-clangxx11.0-Release/c4core-bm-readfloat-fast_float_f: {compile: 0.131s, file_size: 150748B}
linux-x86-gxx10.2-Debug/c4core-bm-readfloat-fast_float_f: {compile: 0.225s, file_size: 1457524B}
linux-x86-gxx10.2-Release/c4core-bm-readfloat-fast_float_f: {compile: 0.093s, file_size: 37492B}

[jpmag@pc] 3056$ for i in * ; do echo "$i/`cat $i/bm/float/*fast_float_d*dat`" ; done 
linux-x86_64-clangxx11.0-Debug/c4core-bm-readfloat-fast_float_d: {compile: 0.171s, file_size: 203184B}
linux-x86_64-clangxx11.0-Release/c4core-bm-readfloat-fast_float_d: {compile: 0.129s, file_size: 149080B}
linux-x86_64-gxx10.2-Debug/c4core-bm-readfloat-fast_float_d: {compile: 0.103s, file_size: 212008B}
linux-x86_64-gxx10.2-Release/c4core-bm-readfloat-fast_float_d: {compile: 0.080s, file_size: 34488B}
linux-x86-clangxx11.0-Debug/c4core-bm-readfloat-fast_float_d: {compile: 0.143s, file_size: 219020B}
linux-x86-clangxx11.0-Release/c4core-bm-readfloat-fast_float_d: {compile: 0.133s, file_size: 150752B}
linux-x86-gxx10.2-Debug/c4core-bm-readfloat-fast_float_d: {compile: 0.203s, file_size: 1457560B}
linux-x86-gxx10.2-Release/c4core-bm-readfloat-fast_float_d: {compile: 0.094s, file_size: 37492B}

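So the size went down from about 1.4MB to about 0.2MB in the Debug builds, and to about 35KB in the g++ Release builds (the x86 g++ Debug figures are unchanged in the output above). The new clang sizes of roughly 150KB to 220KB are still high, but we can look at that later. Let's take a look at the new binary: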

[jpmag@pc] 3057$ bloaty -d segments,sections,symbols linux-x86_64-gxx10.2-Debug/bm/float/c4core-bm-readfloat-fast_float_d
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  39.6%  81.9Ki  63.4%  81.9Ki    LOAD #3 [RX]
    99.8%  81.8Ki  99.8%  81.8Ki    .text
      36.6%  30.0Ki  36.6%  30.0Ki    [216 Others]
      19.0%  15.6Ki  19.0%  15.6Ki    d_print_comp_inner
       5.9%  4.81Ki   5.9%  4.81Ki    fast_float::parse_long_mantissa<>()
       5.5%  4.52Ki   5.5%  4.52Ki    fast_float::from_chars<>()
       3.1%  2.55Ki   3.1%  2.55Ki    d_type
       2.8%  2.31Ki   2.8%  2.31Ki    d_print_mod
       2.6%  2.11Ki   2.6%  2.11Ki    execute_cfa_program
       2.4%  1.93Ki   2.4%  1.93Ki    search_object
       2.2%  1.80Ki   2.2%  1.80Ki    execute_stack_op
       2.1%  1.74Ki   2.1%  1.74Ki    d_expression_1
       2.0%  1.63Ki   2.0%  1.63Ki    d_name
       2.0%  1.60Ki   2.0%  1.60Ki    uw_frame_state_for
       1.8%  1.50Ki   1.8%  1.50Ki    [section .text]
       1.7%  1.43Ki   1.7%  1.43Ki    __gxx_personality_v0
       1.7%  1.42Ki   1.7%  1.42Ki    _Unwind_IteratePhdrCallback
       1.7%  1.40Ki   1.7%  1.40Ki    d_demangle_callback.constprop.0
       1.7%  1.38Ki   1.7%  1.38Ki    d_special_name
       1.5%  1.26Ki   1.5%  1.26Ki    d_unqualified_name
       1.3%  1.06Ki   1.3%  1.06Ki    uw_update_context_1
       1.1%     959   1.1%     959    _Unwind_RaiseException
       1.1%     944   1.1%     944    d_maybe_print_fold_expression
     0.1%      48   0.1%      48    .plt
     0.0%      27   0.0%      27    .init
     0.0%      24   0.0%      24    .plt.got
     0.0%      16   0.0%      16    [LOAD #3 [RX]]
     0.0%      13   0.0%      13    .fini
  36.7%  76.0Ki   0.0%       0    [Unmapped]
    36.2%  27.5Ki   NAN%       0    .debug_info
    14.4%  11.0Ki   NAN%       0    .strtab
      72.1%  7.92Ki   NAN%       0    [262 Others]
       6.1%     691   NAN%       0    [section .strtab]
       1.8%     207   NAN%       0    (anonymous namespace)::get_global()::global
       1.2%     140   NAN%       0    fast_float::(anonymous namespace)::number_of_digits_decimal_left_shift()::number_of_digits_decimal_left_shift_table_powers_of_5
       1.2%     139   NAN%       0    __cxxabiv1::__class_type_info::__do_upcast()
       1.2%     138   NAN%       0    __gnu_cxx::__concurrence_unlock_error::~__concurrence_unlock_error()
       1.2%     135   NAN%       0    __gnu_cxx::__concurrence_unlock_error
       1.2%     132   NAN%       0    __gnu_cxx::__concurrence_lock_error::~__concurrence_lock_error()
       1.1%     129   NAN%       0    __gnu_cxx::__concurrence_lock_error
       1.1%     128   NAN%       0    __cxxabiv1::__si_class_type_info::__do_dyncast()
       1.1%     128   NAN%       0    fast_float::(anonymous namespace)::number_of_digits_decimal_left_shift()::number_of_digits_decimal_left_shift_table
       1.1%     126   NAN%       0    __cxxabiv1::__si_class_type_info::~__si_class_type_info()
       1.1%     123   NAN%       0    __cxxabiv1::__foreign_exception::~__foreign_exception()
       1.1%     123   NAN%       0    __cxxabiv1::__si_class_type_info
       1.1%     122   NAN%       0    __libc_csu_init
       1.1%     120   NAN%       0    __cxxabiv1::__foreign_exception
       1.0%     117   NAN%       0    __cxxabiv1::__class_type_info::~__class_type_info()
       1.0%     114   NAN%       0    __cxxabiv1::__class_type_info
       1.0%     111   NAN%       0    __cxxabiv1::__forced_unwind::~__forced_unwind()
       1.0%     108   NAN%       0    __cxxabiv1::__forced_unwind
       1.0%     107   NAN%       0    __cxxabiv1::__class_type_info::__do_dyncast()
    13.2%  10.0Ki   NAN%       0    .symtab
      64.9%  6.49Ki   NAN%       0    [266 Others]
      17.8%  1.78Ki   NAN%       0    [section .symtab]
       3.5%     360   NAN%       0    (anonymous namespace)::get_global()::global
       1.2%     120   NAN%       0    __gnu_cxx::__verbose_terminate_handler()
       1.2%     120   NAN%       0    __libc_csu_init
       0.9%      96   NAN%       0    stdout@@GLIBC_2.2.5
       0.7%      72   NAN%       0    __cxxabiv1::__class_type_info
       0.7%      72   NAN%       0    __cxxabiv1::__class_type_info::~__class_type_info()
       0.7%      72   NAN%       0    __cxxabiv1::__forced_unwind
       0.7%      72   NAN%       0    __cxxabiv1::__forced_unwind::~__forced_unwind()
       0.7%      72   NAN%       0    __cxxabiv1::__foreign_exception
       0.7%      72   NAN%       0    __cxxabiv1::__foreign_exception::~__foreign_exception()
       0.7%      72   NAN%       0    __cxxabiv1::__si_class_type_info
       0.7%      72   NAN%       0    __cxxabiv1::__si_class_type_info::~__si_class_type_info()
       0.7%      72   NAN%       0    __gnu_cxx::__concurrence_lock_error
       0.7%      72   NAN%       0    __gnu_cxx::__concurrence_lock_error::~__concurrence_lock_error()
       0.7%      72   NAN%       0    __gnu_cxx::__concurrence_unlock_error
       0.7%      72   NAN%       0    __gnu_cxx::__concurrence_unlock_error::~__concurrence_unlock_error()
       0.7%      72   NAN%       0    std::bad_exception
       0.7%      72   NAN%       0    std::bad_exception::~bad_exception()
       0.7%      72   NAN%       0    std::exception
    12.5%  9.51Ki   NAN%       0    .debug_str

So that was it: streams were the culprit.
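The patch works because a template `operator<<` only requires the stream argument to support `<<` for a `const char*` and an `int32_t`, so the header no longer has to name `std::ostream`, and the cost of `<ostream>` is paid only by code that actually prints to one. A minimal sketch of the same technique follows; `Decimal` (modeling just `num_digits` and `digits`) and `StringSink` are hypothetical stand-ins, not fast_float types.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for fast_float::decimal, modeling only the two
// members the printer in the diff reads.
struct Decimal {
  uint32_t num_digits = 0;
  uint8_t digits[32] = {};
};

// Same shape as the patched printer: any type that supports operator<< for
// const char* and int32_t works, so this header needs no <ostream> include.
template <class OStream>
OStream &operator<<(OStream &out, const Decimal &d) {
  out << "0.";
  for (size_t i = 0; i < d.num_digits; i++) {
    out << int32_t(d.digits[i]);
  }
  return out;
}

// Hypothetical minimal sink that is not a std::ostream at all.
struct StringSink {
  std::string text;
  StringSink &operator<<(const char *s) {
    text += s;
    return *this;
  }
  StringSink &operator<<(int32_t v) {
    text += std::to_string(v);
    return *this;
  }
};
```

With digits {1, 2, 5} and `num_digits == 3`, `sink << dec` leaves "0.125" in `sink.text`. A `std::ostringstream` would also work unchanged with this operator, but only a translation unit that includes `<sstream>` itself would pay for it.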

This is actually not a surprise; I've seen it before. Unfortunately, for most people this will likely come as a surprise, even if they have a faint idea of the cost of streams. These creatures have no place in code that is intended to be lean and fast. They are the exact opposite of that, and to paraphrase the famous "Go To Statement Considered Harmful": "streams considered evil". The headers are heavy, the binaries are heavy, and the code is slow. They certainly do not follow C++'s mantra of not paying for what you don't use. Streams are the modern equivalent of slavery. They are widely used and they seem an integral part of daily life, but with many people you run the risk of being taken for a lunatic if you point out how evil they are. As with slavery, the status quo is very strong.

I will now stop the rant, collect myself and press the submit button.

+2 -3

0 comments

2 changed files

PR created an hour ago

started Tessil/array-hash

started 3 hours ago

Pull request review comment simdjson/simdjson

Adding an example corresponding to issue 1316

 int main(void) { } ``` +The following is a similar example where one first accesses

I don't see it here, but we should probably include a relevant snippet of twitter.json as well. Also, some grammar suggestions:

The following is a similar example where one wants to get the id of the first tweet. To do this, we use ["statuses"].at(0)["id"]. To break that down:

  1. Get the list of tweets (the "statuses" key of the document) using ["statuses"].
  2. Get the first tweet using .at(0).
  3. Get the id of the tweet using ["id"].
lemire

comment created 5 hours ago

Pull request review comment simdjson/simdjson

Adding an example corresponding to issue 1316

 int main(void) { } ```
+The following is a similar example where one first accesses
+an array matching the "statuses" key. The result is expected to be
+an array. One select the first value in the array. The result
+is expected to be an object. One then selects the vallue corresponding
+to the key "id". We expec the value to be a non-negative integer.
+
+```C++
+#include "simdjson.h"
+
+int main(void) {
+  simdjson::dom::parser parser;
+  simdjson::dom::element tweets;

(And skip the line above.)

lemire

comment created 5 hours ago

Pull request review comment simdjson/simdjson

Adding an example corresponding to issue 1316

 int main(void) { } ```
+The following is a similar example where one first accesses
+an array matching the "statuses" key. The result is expected to be
+an array. One select the first value in the array. The result
+is expected to be an object. One then selects the vallue corresponding
+to the key "id". We expec the value to be a non-negative integer.
+
+```C++
+#include "simdjson.h"
+
+int main(void) {
+  simdjson::dom::parser parser;
+  simdjson::dom::element tweets;

I think you can actually just do auto tweets = parser.load("twitter.json"); and skip these two lines. That uses chaining to make it more compact, and possibly easier to read.

lemire

comment created 5 hours ago

push event RRZE-HPC/OSACA

Julian

commit sha 818b516289c249dcba1a0a8a218eaa2fa619c1b6

Update README.rst

pushed 15 hours ago

issue opened simdjson/simdjson

How to parse Complex Nested JSON File

When parsing a complex nested JSON file, it is difficult to access or search for the value of an object.

Here is the JSON file I'm working on:

{
  "statuses": [
    {
      "metadata": {
        "result_type": "recent",
        "iso_language_code": "ja"
      },
      "created_at": "Sun Aug 31 00:29:15 +0000 2014",
      "id": 505874924095815700,
      "id_str": "505874924095815681",
      "text": "@aym0566x \n\n名前:前田あゆみ\n第一印象:なんか怖っ!\n今の印象:とりあえずキモい。噛み合わない\n好きなところ:ぶすでキモいとこ😋✨✨\n思い出:んーーー、ありすぎ😊❤️\nLINE交換できる?:あぁ……ごめん✋\nトプ画をみて:照れますがな😘✨\n一言:お前は一生もんのダチ💖",
      "source": "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>",
      "truncated": false,
      "in_reply_to_status_id": null,
      "in_reply_to_status_id_str": null,
      "in_reply_to_user_id": 866260188,
      "in_reply_to_user_id_str": "866260188",
      "in_reply_to_screen_name": "aym0566x",
      "user": {
        "id": 1186275104,
        "id_str": "1186275104",
        "name": "AYUMI",
        "screen_name": "ayuu0123",
        "location": "",
        "description": "元野球部マネージャー❤︎…最高の夏をありがとう…❤︎",
        "url": null,
        "entities": {
          "description": {
            "urls": []
          }
        }
      }
    }
  ]
}

Using your sample code, I was able to parse the whole JSON file, but when searching for an object and its value I wasn't able to find it: it just reports no such object/object value. Please help me with the code: how should I do this, and what should I change?

Here is the code:

#include "simdjson.h"
int main(void) {
  simdjson::dom::parser parser;
  simdjson::dom::element tweets = parser.load("twitter.json");
  std::cout << "Value of ID " << tweets["statuses"]["id"] << std::endl;
}

Here is the output which I'm getting

~/simdjson-master$ c++ strt.cpp simdjson.cpp
~/simdjson-master$ ./a.out 
terminate called after throwing an instance of 'simdjson::simdjson_error'
  what():  The JSON element does not have the requested type.
Aborted (core dumped)

This is the output I'm expecting

~/simdjson-master$ c++ strt.cpp simdjson.cpp
~/simdjson-master$ ./a.out 
Value of ID 505874924095815700

created 16 hours ago

push event RRZE-HPC/OSACA

JanLJL

commit sha d7e5e12961e95ccb9d2671f7ae735b4a31a6f730

version bump

pushed 16 hours ago

created tag RRZE-HPC/OSACA

tag v0.3.13

Open Source Architecture Code Analyzer

created 16 hours ago

fork hpuliv/externalsortinginjava

External-Memory Sorting in Java

forked 19 hours ago

pull request comment RoaringBitmap/roaring

Remove msgp

Coverage Status

Coverage increased (+4.0%) to 86.821% when pulling 8418a163b18f5e0a4028096e0c64ebfb1277bd47 on AskAlexSharov:remove_msgp into 3700649954e2c3d54aa57cd1b27ef30a2958da2f on RoaringBitmap:master.

AskAlexSharov

comment created a day ago

started lemire/javaewah

started a day ago

issue opened imneme/pcg-cpp

Have you considered designing a PCG without the constraint that it has to be "challenging" to predict?

Firstly, you should know that I am not "trolling" you, and that I'm writing this because I appreciate your work designing PRNGs (many of your blog posts and similar writing have been valuable and enjoyable to me). I understand that trivial predictability seems to be your pet peeve. However, as far as I understand, "challenging" predictability comes at some cost in either statistical quality or speed, and you yourself acknowledge that:

Algorithmic-complexity attacks aren't a major risk for our algorithms most of the time

So why not give users who don't need "challenging" predictability the option of using a PRNG that is tailored purely for excellent statistical quality, speed and small state size?

Furthermore, and I hope writing this is a good idea, I believe that your reasoning for making "challenging" predictability a goal at all is flawed, and that your claims regarding the predictability of PCGs are deceptive.

The problem is that if someone needs to worry about algorithmic-complexity attacks, PCG is not the right choice for them anyway; something like Randen seems to be the proper choice. The reason is that, as you yourself acknowledge, the PCG family is not cryptographically strong, i.e. its "challenging" predictability cannot be relied on. In other words, trivial versus challenging predictability isn't a useful distinction. I'm pretty sure you already know that, unsurprisingly, a relatively efficient attack has already been described. Another thing worth noting is that, in your reasoning about the threat model, you assume an attacker will only go after the easiest target, while in the real world determined and resourceful attackers with specific targets do exist. Such an attacker may even, for example, build custom hardware to help break the PRNG at hand efficiently.

To reiterate my point, one either worries about attacks on the PRNG, in which case they can use Randen or a CSPRNG; or they don't and they need good statistical quality and probably also speed, in which case they may use a PCG.

I hope you can appreciate my honesty, that you are not offended, and that your desire to make a trivially predictable PCG increases.
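To make the trade-off under discussion concrete, here is a sketch of the PCG32 (XSH-RR) generator from the PCG reference C code, transcribed to C++. The state advances as a plain 64-bit LCG, and each output is a permuted (xorshifted, then randomly rotated) function of the old state; that output permutation is what PCG relies on both for statistical quality and for making the state harder to recover. The `Pcg32` struct and its constructor are illustrative names; the constants follow the reference implementation.

```cpp
#include <cstdint>

// Sketch of PCG32 (XSH-RR): 64-bit LCG state, 32-bit permuted output.
struct Pcg32 {
  uint64_t state = 0;
  uint64_t inc = 1;  // stream selector; must be odd

  // Seeding procedure mirrors pcg32_srandom_r from the reference code.
  Pcg32(uint64_t seed, uint64_t stream) {
    inc = (stream << 1u) | 1u;
    state = 0;
    next();
    state += seed;
    next();
  }

  uint32_t next() {
    uint64_t old = state;
    // LCG step with the reference multiplier.
    state = old * 6364136223846793005ULL + inc;
    // Output permutation: xorshift the high bits down to 32 bits...
    uint32_t xorshifted = uint32_t(((old >> 18u) ^ old) >> 27u);
    // ...then rotate right by an amount taken from the top 5 state bits.
    uint32_t rot = uint32_t(old >> 59u);
    return (xorshifted >> rot) | (xorshifted << ((32u - rot) & 31u));
  }
};
```

A variant of the kind the issue asks about would keep the cheap state transition and swap in a different output function; this sketch does not attempt that.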

created a day ago

MemberEvent

started eonpatapon/gnome-shell-extension-caffeine

started a day ago

issue opened TkTech/pysimdjson

as_buffer() support for Object values.

We now support as_buffer() (#59), which drastically improves performance when loading arrays from JSON into numpy.

We should also support as_buffer() for Objects, which would retrieve the values as an array. This is to support calling it like this:

https://github.com/riddell-stan/pystan-next/blob/1e027a8bded88d51c6b957a841b859938c0ac86d/stan/fit.py#L93

For @riddell-stan's use case, this should use less memory and be roughly 4x faster (since we also avoid creating the iterator for values()).

created a day ago

push event RRZE-HPC/OSACA

JanLJL

commit sha 6bc6349c25d52a0946d2a0058c42dd13fe2f9ecb

fetch version from __init__ file and write uarch in upper case

pushed 2 days ago

push event RRZE-HPC/OSACA

JanLJL

commit sha f69b5f88f0b3ad38ca8cc609500530fdb4e1716a

removed false entries

pushed 2 days ago

push event RRZE-HPC/OSACA

JanLJL

commit sha 596a323dfbebf570488061f51413d4d8be505f7c

bugfixes

pushed 2 days ago

pull request comment lemire/fast_float

re #33: 32bit version

Phew. This last part was really hard. I usually find iterating on a Travis setup very time-consuming, and in this case it was compounded by the fact that the s390x and ppc64le images are a bit out of the norm.

But finally it seems this is ready.

Thanks for granting access. Of course, any changes from me will still go through the PR rule. As for the fast_float org, it indeed makes sense if the repo gains traction.

biojppm

comment created 2 days ago

started lemire/testingRNG

started 2 days ago

issue opened lemire/testingRNG

Double-free in RNG_test

Hi, I was playing around with this and couldn't reproduce the results. A minor issue was that runtests.sh was missing the #!/bin/bash line at the beginning, so non-standard shells choke on it. The bigger issue was that tests often aborted with a double free while still being reported as successful. Example:

> ./runtests.sh                            [11:50:31]
Testing 512GB  of data per run
Note: running the tests longer could expose new failures.
# RUNNING testmitchellmoore Outputting result to  testmitchellmoore.log
Failure!
# RUNNING testmersennetwister Outputting result to  testmersennetwister.log
free(): double free detected in tcache 2
./runtests.sh: line 11: 10915 Broken pipe             ./$t
     10916 Aborted                 | ./RNG_test stdin64 -tlmax $MEM > $filelog
Success!
# RUNNING testxorshift-k4 Outputting result to  testxorshift-k4.log
Failure!
# RUNNING testxorshift-k5 Outputting result to  testxorshift-k5.log
Failure!
# RUNNING testwidynski Outputting result to  testwidynski.log
free(): double free detected in tcache 2
./runtests.sh: line 11: 10935 Broken pipe             ./$t
     10936 Aborted                 | ./RNG_test stdin64 -tlmax $MEM > $filelog
Success!
# RUNNING testaesctr Outputting result to  testaesctr.log
free(): double free detected in tcache 2
./runtests.sh: line 11: 10941 Broken pipe             ./$t
     10942 Aborted                 | ./RNG_test stdin64 -tlmax $MEM > $filelog
Success!
# RUNNING testaesdragontamer Outputting result to  testaesdragontamer.log
free(): double free detected in tcache 2
./runtests.sh: line 11: 10947 Broken pipe             ./$t
     10948 Aborted                 | ./RNG_test stdin64 -tlmax $MEM > $filelog
Success!
# RUNNING testv8xorshift128plus -H Outputting result to  testv8xorshift128plus-H.log
free(): double free detected in tcache 2
./runtests.sh: line 11: 10953 Broken pipe             ./$t
     10954 Aborted                 | ./RNG_test stdin64 -tlmax $MEM > $filelog
Success!
# RUNNING testv8xorshift128plus Outputting result to  testv8xorshift128plus.log
Failure!
# RUNNING testxorshift128plus -H Outputting result to  testxorshift128plus-H.log
free(): double free detected in tcache 2
./runtests.sh: line 11: 10965 Broken pipe             ./$t
     10966 Aborted                 | ./RNG_test stdin64 -tlmax $MEM > $filelog
Success!
# RUNNING testxorshift128plus Outputting result to  testxorshift128plus.log
...

I tracked the double free down as far as this backtrace:

Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f996e51d537 in __GI_abort () at abort.c:79
#2  0x00007f996e5766c8 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f996e684e31 "%s\n")
    at ../sysdeps/posix/libc_fatal.c:155
#3  0x00007f996e57d9ba in malloc_printerr (str=str@entry=0x7f996e687150 "free(): double free detected in tcache 2")
    at malloc.c:5347
#4  0x00007f996e57efb5 in _int_free (av=0x7f996e6b6b80 <main_arena>, p=0x56554fb6cad0, have_lock=0) at malloc.c:4201
#5  0x000056554dd8bd76 in std::vector<PractRand::TestResult, std::allocator<PractRand::TestResult> >::~vector() ()
#6  0x000056554dd76595 in show_checkpoint(TestManager*, int, unsigned long long, double, bool, double, bool) [clone .cold] ()
#7  0x000056554dd7c609 in main ()

Unfortunately, looking at show_checkpoint() didn't reveal anything obviously wrong, and I don't have time to keep digging into this problem.

Thank you for your work!

Platform: Debian 11 Linux, x86_64, gcc
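The report does not identify the cause, and the sketch below is not a diagnosis of PractRand's TestResult. It only illustrates one common way a double free surfaces inside a std::vector destructor: an element type that owns a raw pointer but keeps the compiler-generated copy operations. `Result` and `snapshot` are hypothetical names; the safe version lets std::string manage the memory so copies are deep.

```cpp
#include <string>
#include <vector>

// Hypothetical result record, NOT PractRand's TestResult. Its members manage
// their own memory (std::string), so copies are deep and each vector's
// destructor frees only memory it owns (the "rule of zero").
struct Result {
  std::string name;
  double pvalue;
};

// The hazard this avoids, described rather than executed because running it
// is undefined behavior:
//   struct BadResult { char *name; ~BadResult() { delete[] name; } };
// The implicitly generated copy constructor copies the raw pointer, so after
//   std::vector<BadResult> b = a;
// destroying both a and b calls delete[] twice on each shared buffer.

std::vector<Result> snapshot(const std::vector<Result> &current) {
  return current; // deep copy: safe to destroy independently of `current`
}
```

Building the harness with -fsanitize=address (supported by GCC and Clang) would make AddressSanitizer report where the memory was first freed as well as the second free, which is likely a quicker route than reading show_checkpoint() by hand.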

created time in 2 days

started MananTank/radioactive-state

started time in 2 days

PR opened RoaringBitmap/roaring

Remove msgp

For: https://github.com/RoaringBitmap/roaring/issues/188 https://github.com/RoaringBitmap/roaring/issues/165

Also removed the conserz field from roaringArray, because it looks like it was used only by msgp.

+0 -3606

0 comments

20 changed files

pr created time in 3 days

Pull request review comment lemire/fast_float

Introduces fast path for integers.

from_chars_result from_chars(const char *first, const char *last,
   }
   answer.ec = std::errc(); // be optimistic
   answer.ptr = pns.lastmatch;
-
+  // Special fast path for integers.
+  if((pns.exponent == 0) && !pns.too_many_digits) {

@lemire, maybe it would make sense to use "unlikely" here?
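A minimal sketch of what the suggested hint could look like, assuming GCC or Clang (`__builtin_expect`; C++20 code could use the `[[unlikely]]` attribute instead). The `parsed_number` struct, the `try_integer_fast_path` function and the `SKETCH_UNLIKELY` macro are hypothetical stand-ins, not fast_float's code. The extra `mantissa <= 2^53` bound is an assumption added here so that the integer-to-double conversion is exact on its own.

```cpp
#include <cstdint>

// Branch hint: __builtin_expect exists in GCC and Clang; elsewhere the macro
// is a no-op.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define SKETCH_UNLIKELY(x) (x)
#endif

// Hypothetical stand-in for the parsed number (pns in the diff), assuming the
// decimal value is mantissa * 10^exponent.
struct parsed_number {
  std::int64_t exponent;
  std::uint64_t mantissa;
  bool negative;
  bool too_many_digits;
};

// Returns true and writes `value` when the integer fast path applies.
// Every integer up to 2^53 is exactly representable as a double, so the
// conversion below needs no rounding logic.
inline bool try_integer_fast_path(const parsed_number &pns, double &value) {
  constexpr std::uint64_t max_exact = std::uint64_t(1) << 53;
  if (SKETCH_UNLIKELY(pns.exponent == 0 && !pns.too_many_digits &&
                      pns.mantissa <= max_exact)) {
    value = static_cast<double>(pns.mantissa);
    if (pns.negative) { value = -value; }
    return true;
  }
  return false; // caller falls back to the general conversion
}
```

Whether the hint helps depends on the input mix: if most numbers being parsed are integers, marking this branch unlikely would move the common case out of line, so it is worth benchmarking both ways.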

lemire

comment created time in 3 days

push event simdjson/simdjson

Paul Dreik

commit sha 68a80045186e14eb33ba0401b55716f57480d4b1

don't memcpy after failed alloc (#1315)

Should fix https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=27675

push time in 3 days

PR merged simdjson/simdjson

don't memcpy after failed alloc

If the allocator fails and returns nullptr, the memcpy should not happen.

See ossfuzz https://oss-fuzz.com/testcase-detail/6612227272212480

I don't know what the error strategy for padded_string is; I guess it should be kept noexcept, so we can't throw here. Setting the size to zero and the data pointer to nullptr seems reasonable, but that looks exactly like a default-constructed, empty padded_string. Should the two cases be kept separate? As of now, if the allocation fails, the padded string will be empty and the program will proceed as if the input were empty, which may be surprising.

One way to keep the failed-alloc case apart from the zero-size case would be to let a zero-size string point to a static buffer. That would make it always padded, even when the size is zero. A member function could be added to query whether the padded_string object is empty or failed: it would check for zero size and then disambiguate on the data pointer being null or pointing to the static buffer.
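A minimal sketch of the static-buffer idea, assuming a simplified class: `padded_buffer`, `is_empty()`, `alloc_failed()` and the 64-byte padding width are hypothetical, not simdjson's padded_string API. The constructor stays noexcept, skips the memcpy when allocation fails (or when len + padding would overflow), and uses the data pointer to tell the two zero-size states apart.

```cpp
#include <cstddef>
#include <cstring>
#include <limits>
#include <new>

// Hypothetical sketch, not simdjson's padded_string.
class padded_buffer {
public:
  static constexpr std::size_t padding = 64; // assumed padding width

  padded_buffer(const char *src, std::size_t len) noexcept {
    if (len == 0) { return; } // genuinely empty: data_ stays on the static buffer
    char *p = nullptr;
    if (len <= std::numeric_limits<std::size_t>::max() - padding) {
      p = new (std::nothrow) char[len + padding];
    }
    if (p == nullptr) { // overflow or failed allocation: skip the memcpy
      data_ = nullptr;
      return;
    }
    std::memcpy(p, src, len);
    std::memset(p + len, 0, padding); // zero the padding bytes
    data_ = p;
    size_ = len;
  }
  ~padded_buffer() { if (owns_memory()) { delete[] data_; } }
  padded_buffer(const padded_buffer &) = delete;            // copying omitted
  padded_buffer &operator=(const padded_buffer &) = delete; // for brevity

  std::size_t size() const noexcept { return size_; }
  const char *data() const noexcept { return data_; }
  // Both states have size 0; the data pointer tells them apart.
  bool is_empty() const noexcept { return size_ == 0 && data_ == empty_buffer(); }
  bool alloc_failed() const noexcept { return size_ == 0 && data_ == nullptr; }

private:
  // Static, zero-filled buffer so even an empty object is "padded".
  static const char *empty_buffer() noexcept {
    static const char buf[padding] = {};
    return buf;
  }
  bool owns_memory() const noexcept {
    return data_ != nullptr && data_ != empty_buffer();
  }
  const char *data_ = empty_buffer();
  std::size_t size_ = 0;
};
```

Copying is deleted to keep the sketch short; a real type would also need move operations that leave the moved-from object pointing at the static buffer.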

+5 -0

2 comments

1 changed file

pauldreik

pr closed time in 3 days

pull request comment RoaringBitmap/roaring

Move stream objects and pools into "internal" package

Coverage Status

Coverage remained the same at 82.852% when pulling 5033e34f1adf655ef127ae8be36ce08006515ac6 on AskAlexSharov:pool_to_internal into 3700649954e2c3d54aa57cd1b27ef30a2958da2f on RoaringBitmap:master.

AskAlexSharov

comment created time in 3 days
