BLAKE3 team BLAKE3-team The BLAKE3 cryptographic hash function

official implementations of the BLAKE3 cryptographic hash function

The BLAKE3 paper: specifications, analysis, and design rationale

startedBLAKE3-team/BLAKE3

started time in 10 hours

startedBLAKE3-team/BLAKE3

started time in a day

startedBLAKE3-team/BLAKE3

started time in 2 days

startedBLAKE3-team/BLAKE3

started time in 3 days

fork clayne/BLAKE3

official implementations of the BLAKE3 cryptographic hash function

fork in 4 days

startedBLAKE3-team/BLAKE3

started time in 4 days

startedBLAKE3-team/BLAKE3-specs

started time in 4 days

startedBLAKE3-team/BLAKE3

started time in 4 days

issue closedBLAKE3-team/BLAKE3

When a small message (one that fits in a single block) is hashed with BLAKE3 and a 512-bit output is taken, the whole final internal state (the v[0..16] vector) is easily recovered. Therefore both the initial state and the final state are fully known.

Doesn't that simplify a preimage or collision attack on unkeyed BLAKE3-512? Is there enough safety margin?
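The recovery described in this issue follows from the last step of the BLAKE3 compression function, which (as described in the BLAKE3 paper) produces 16 output words as out[i] = v[i] ^ v[i+8] and out[i+8] = v[i+8] ^ h[i] for i in 0..8, where h is the input chaining value (the IV constants in unkeyed mode). The sketch below only illustrates that step and its inverse. It does not run the compression rounds, the state values are arbitrary, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Output step of the compression function as described in the BLAKE3 paper:
 * the low half is v[i] ^ v[i+8], the high half is v[i+8] ^ h[i]. */
static void finalize_output(const uint32_t v[16], const uint32_t h[8],
                            uint32_t out[16]) {
  for (int i = 0; i < 8; i++) {
    out[i] = v[i] ^ v[i + 8];
    out[i + 8] = v[i + 8] ^ h[i];
  }
}

/* Hypothetical helper: undo the step when all 16 output words and h are
 * known, which is the situation the issue describes. */
static void recover_state(const uint32_t out[16], const uint32_t h[8],
                          uint32_t v[16]) {
  assert(out != NULL && h != NULL && v != NULL);
  for (int i = 0; i < 8; i++) {
    v[i + 8] = out[i + 8] ^ h[i]; /* high half first: only h is needed */
    v[i] = out[i] ^ v[i + 8];     /* then the low half */
  }
}

/* Round-trip check on an arbitrary state; returns 1 if recovered exactly. */
static int recovery_roundtrip_ok(void) {
  uint32_t v[16], h[8], out[16], back[16];
  for (uint32_t i = 0; i < 16; i++) v[i] = 0x9E3779B9u * (i + 1);
  for (uint32_t i = 0; i < 8; i++) h[i] = 0x85EBCA6Bu * (i + 1);
  finalize_output(v, h, out);
  recover_state(out, h, back);
  return memcmp(v, back, sizeof v) == 0;
}
```

Knowing the final state this way says nothing by itself about inverting the rounds that produced it; that question is what the issue asks about.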

closed time in 5 days

funny-falcon

issue commentBLAKE3-team/BLAKE3

Looks like it is already discussed in specs, iirc.

funny-falcon

comment created time in 5 days

issue openedBLAKE3-team/BLAKE3

When a small message (one that fits in a single block) is hashed with BLAKE3 and a 512-bit output is taken, the whole final internal state (the v[0..16] vector) is easily recovered. Therefore both the initial state and the final state are fully known.

Doesn't that simplify a preimage or collision attack on unkeyed BLAKE3? Is there enough safety margin?

created time in 5 days

startedBLAKE3-team/BLAKE3

started time in 5 days

startedBLAKE3-team/BLAKE3

started time in 5 days

startedBLAKE3-team/BLAKE3

started time in 6 days

release BLAKE3-team/BLAKE3

released time in 6 days

created tagBLAKE3-team/BLAKE3

official implementations of the BLAKE3 cryptographic hash function

created time in 6 days

push eventBLAKE3-team/BLAKE3

commit sha 7d0de7be14789924a4cfdb0e844eef3540237b2b

version 0.3.5

Changes since 0.3.4:
- The digest dependency is now v0.9 and the crypto-mac dependency is now v0.8.
- Intel CET is supported in the assembly implementations.
- b3sum error output includes filepaths again.

push time in 6 days

official implementations of the BLAKE3 cryptographic hash function

fork in 7 days

startedBLAKE3-team/BLAKE3

started time in 7 days

startedBLAKE3-team/BLAKE3

started time in 7 days

startedBLAKE3-team/BLAKE3

started time in 8 days

startedBLAKE3-team/BLAKE3

started time in 9 days

startedBLAKE3-team/BLAKE3

started time in 9 days

startedBLAKE3-team/BLAKE3

started time in 10 days

startedBLAKE3-team/BLAKE3

started time in 11 days

startedBLAKE3-team/BLAKE3

started time in 11 days

startedBLAKE3-team/BLAKE3

started time in 12 days

startedBLAKE3-team/BLAKE3

started time in 12 days

startedBLAKE3-team/BLAKE3-specs

started time in 13 days

startedBLAKE3-team/BLAKE3

started time in 13 days

startedBLAKE3-team/BLAKE3

started time in 13 days

startedBLAKE3-team/BLAKE3

started time in 15 days

startedBLAKE3-team/BLAKE3

started time in 15 days

startedBLAKE3-team/BLAKE3

started time in 16 days

startedBLAKE3-team/BLAKE3

started time in 16 days

startedBLAKE3-team/BLAKE3

started time in 16 days

pull request commentBLAKE3-team/BLAKE3

It looks like we might be at a place where the entire hash function could be implemented with const fn? Here's some similar work: https://www.reddit.com/r/rust/comments/hi11op/announcing_constsha1_a_sha1_implementation_for/

inanna-malick

comment created time in 17 days

push eventBLAKE3-team/BLAKE3

commit sha 2f6f56f3477321c9e4742f1c863e45f2cfcbfb5e

stop being a jerk and add the context string to test_vectors.json

push time in 17 days

startedBLAKE3-team/BLAKE3

started time in 17 days

issue closedBLAKE3-team/BLAKE3

I ran b3sum v0.3.4 (64-bit Windows release) on a 143249768448-byte file, on a laptop with 16 GB of RAM. b3sum allocated over 13 GB of RAM while running, and my RAM usage was at 99% (I'm guessing it allocated as much RAM as it possibly could without swapping?). Is that intended behavior?

edit: also i didn't pipe it in, i gave the filename as an argument to b3sum

b3sum: b3sum v0.3.4 64bit windows release from https://github.com/BLAKE3-team/BLAKE3/releases/download/0.3.4/b3sum_windows_x64_bin.exe
OS: win10 x64 version 1909
CPU: Intel Core i7-8565U, 4 cores, 4 threads (hyperthreading disabled)
RAM: 16 GB total, ~13GB available (2x8GB LPDDR3-2133)

closed time in 17 days

divinity76

issue commentBLAKE3-team/BLAKE3

Ok cool, closing as expected behavior. Thanks for the report though, @divinity76. This will be a useful reference for people who see the same thing in the future.

divinity76

comment created time in 17 days

issue commentBLAKE3-team/BLAKE3

This is just what Windows looks like with memory-mapped files. Mapped pages will factor into the working set size, even though they're not "real" memory allocated by the program:

The working set of a process is the set of pages in the virtual address space of the process that are currently resident in physical memory.

So this is not a memory leak or anything, it's just the OS doing as it's expected to. The same happens in the Unices.
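To make the point about mapped pages concrete, here is a minimal POSIX sketch in C that reads a file through a read-only mapping. It is not how b3sum itself is written (b3sum is a Rust program), and the function name is hypothetical. The point is only that no buffer is allocated by the program: the pages that become resident are file pages owned by the kernel's cache, which is why they inflate the reported working set.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Sum the bytes of a file through a read-only mapping. Nothing is
 * malloc'ed; the kernel pages the file in on access. Returns 0 on
 * success, -1 on error. */
static int sum_mapped_file(const char *path, uint64_t *sum) {
  assert(path != NULL && sum != NULL);
  int fd = open(path, O_RDONLY);
  if (fd < 0) {
    perror("open");
    return -1;
  }
  struct stat st;
  if (fstat(fd, &st) != 0) {
    perror("fstat");
    close(fd);
    return -1;
  }
  *sum = 0;
  if (st.st_size == 0) { /* mmap rejects a zero length */
    close(fd);
    return 0;
  }
  size_t len = (size_t)st.st_size;
  const unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
  close(fd); /* the mapping stays valid after the descriptor is closed */
  if (p == MAP_FAILED) {
    perror("mmap");
    return -1;
  }
  for (size_t i = 0; i < len; i++) {
    *sum += p[i]; /* touching a page makes it resident */
  }
  munmap((void *)p, len);
  return 0;
}
```

Once the mapping is released, the kernel is free to reclaim those pages, which matches the report that utilization drops again when b3sum exits.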

divinity76

comment created time in 17 days

pull request commentBLAKE3-team/BLAKE3

The intrinsics code will work for 32-bit x86, whereas the assembly is x86_64-specific. It's also conceivable that future architectures will not suit the choices made in the assembly, and that the intrinsics code, built with compilers tuned for those architectures, will perform better. So it's somewhat useful to keep that code around.

k0001

comment created time in 17 days

issue commentBLAKE3-team/BLAKE3

Something like

#if defined(__GNUC__)
#define BLAKE3_ASSUME(cond) do { if(!(cond)) __builtin_unreachable(); } while(0)
#else
#define BLAKE3_ASSUME(cond)
#endif

...

size_t num_cvs = blake3_compress_subtree_wide(...);
BLAKE3_ASSUME(num_cvs <= MAX_SIMD_DEGREE_OR_2);

I've played around a bit with this issue, and what seems to confuse GCC is the recursion in blake3_compress_subtree_wide. Turning the recursion into a loop could conceivably unconfuse GCC, but I don't particularly feel like that's a worthy change.

xnox

comment created time in 17 days

pull request commentBLAKE3-team/BLAKE3

Maybe I'm misunderstanding what you're saying, but the test program does not iterate through AVX etc. Say, for example, you have an ARM chip with NEON and SVE2 extensions: https://gcc.godbolt.org/z/J8i6MV

The program will iterate only through the combination of features NONE, NEON|NONE, SVE2|NONE, SVE|NEON|NONE, and ignore all other values (here NONE = 0 = portable code).

I misunderstood how it works. And that website is a really nice way to show C code!

xnox

comment created time in 17 days

pull request commentBLAKE3-team/BLAKE3

This is admittedly confusing, but the Makefile in here is really only for internal testing, not for public consumption. I should probably put a big comment at the top or something. It's mentioned briefly in the README here. The way I see it, we don't really provide any build system, and the docs are supposed to tell you how to compile the files together (using whatever system you're already using).

Ok. Even adding a suffix to the file might make that obvious. I.e. Makefile.example will cause people to have to type make -f Makefile.example and just a plain make will stop working.

Separately, if the intrinsics implementations are not recommended and one should use the assembly ones, why are the intrinsics ones provided at all? And why is @k0001 using them instead of the assembly?

k0001

comment created time in 17 days

startedBLAKE3-team/BLAKE3

started time in 17 days

pull request commentBLAKE3-team/BLAKE3

This is admittedly confusing, but the Makefile in here is really only for internal testing, not for public consumption. I should probably put a big comment at the top or something. It's mentioned briefly in the README here. The way I see it, we don't really provide any build system, and the docs are supposed to tell you how to compile the files together (using whatever system you're already using).

k0001

comment created time in 18 days

issue commentBLAKE3-team/BLAKE3

It should be possible to turn this into a compile-time assert, no? Without any runtime impact? Or to rewrite the code such that the compiler doesn't get triggered by it?
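For reference, a C11 compile-time assert looks like the sketch below. The macro names follow the ones that appear in this issue's compiler output, but the values here are assumptions for a portable-only build rather than a copy of the real headers, and the accessor function is hypothetical. Note the limit of the technique: a static assert can only pin relationships between constants, so it cannot express a bound on a runtime value such as num_cvs.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for a portable-only build; the real headers define these. */
#define BLAKE3_OUT_LEN 32
#define MAX_SIMD_DEGREE 1
#define MAX_SIMD_DEGREE_OR_2 (MAX_SIMD_DEGREE > 2 ? MAX_SIMD_DEGREE : 2)

/* Same size expression as the array named in the warning. */
#define OUT_ARRAY_LEN (MAX_SIMD_DEGREE_OR_2 * BLAKE3_OUT_LEN / 2)

/* Evaluated by the compiler, so there is no runtime cost. Two child
 * chaining values compress into one parent output, so the array must
 * hold at least one output. */
static_assert(OUT_ARRAY_LEN >= BLAKE3_OUT_LEN,
              "out_array must hold at least one chaining value");

/* Hypothetical accessor so the computed size can be inspected. */
static size_t out_array_len(void) { return OUT_ARRAY_LEN; }
```

With MAX_SIMD_DEGREE set to 1, the size works out to 2 * 32 / 2 = 32 bytes, which is the uint8_t[32] reported in the warning.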

xnox

comment created time in 18 days

pull request commentBLAKE3-team/BLAKE3

Looking at this again, I'm mildly against it. A "good citizen" caller will still need to supply file-specific compiler flags to build the intrinsics implementations correctly under MSVC. Getting rid of that step on Unix might be convenient for callers who are certain they'll never need Windows support, but it introduces some inconsistency, and it makes it more likely that callers who only test on GCC end up doing the wrong thing.

Similarly, it sounds like one of the intended benefits of this change is that callers could build the intrinsics implementations for all targets (again assuming MSVC is not a target), side-stepping the issue of selecting assembly for 64-bit vs intrinsics for 32-bit. But this is also something we probably don't want to encourage. The assembly implementations perform better (and more consistently), and we want them to be used as widely as possible.

The above does not change anything about picking assembly vs intrinsics; one still has to do that, as both implementations provide the same symbols. It would allow collapsing to just two make targets, one for intrinsics and one for assembly.

Also, the current Makefile is very x86-specific, and attempts to compile x86 code on ARM, Power, Z....

MSVC does not use Makefiles, and the most natural thing to provide there would be Visual Studio project files.

Given that there are a number of options to build these files for, maybe a Makefile is not enough, and a configuration tool should be added? I.e. something like a meson config file, which would allow one to configure, build and test this C code with various options.

Note that meson can generate Visual Studio & Xcode projects too, so meson project files might be nice to provide, as they would work correctly on any platform, including using special configure flags for certain files.

k0001

comment created time in 18 days

pull request commentBLAKE3-team/BLAKE3

Looking at this again, I'm mildly against it. A "good citizen" caller will still need to supply file-specific compiler flags to build the intrinsics implementations correctly under MSVC. Getting rid of that step on Unix might be convenient for callers who are certain they'll never need Windows support, but it introduces some inconsistency, and it makes it more likely that callers who only test on GCC end up doing the wrong thing.

Similarly, it sounds like one of the intended benefits of this change is that callers could build the intrinsics implementations for all targets (again assuming MSVC is not a target), side-stepping the issue of selecting assembly for 64-bit vs intrinsics for 32-bit. But this is also something we probably don't want to encourage. The assembly implementations perform better (and more consistently), and we want them to be used as widely as possible.

k0001

comment created time in 18 days

issue commentBLAKE3-team/BLAKE3

This is related to #55 I think.

If adding that assert silences the compiler warning, it might be nice to add it, even if it technically adds some runtime cost?

xnox

comment created time in 18 days

issue commentBLAKE3-team/BLAKE3

Hmm that is very suspicious. The blake3 library crate itself does very little allocation. There shouldn't be anything besides whatever memmap does, and whatever the global rayon thread pool does. The b3sum binary does some bookkeeping on top of that, like parsing command line args, but that should also be pretty trivial. I wonder if memory mapping works differently on Windows than on Unix, in some surprising way that could cause this?

Is there any way you can debug what part of the program is consuming memory? I don't know what the usual approach is for Windows tooling, but a memory profile of this would of course be super informative.

One thing I'm curious to try is hashing the same file from stdin (using <). I'll be pretty surprised if the problem persists in that mode. (Though if it doesn't persist, it would still be ambiguous whether the culprit is Rayon or Memmap or something else.)

Another thing I'd be curious to try is keeping memory mapping on, but only using one thread, like this: b3sum --num-threads=1.

divinity76

comment created time in 18 days

startedBLAKE3-team/BLAKE3

started time in 18 days

issue openedBLAKE3-team/BLAKE3

Reddit user /u/gzk11059 gave some very helpful pointers. Based on that thread, here's a first start on running sanitizers against the C implementation.

cd c/blake3_c_rust_bindings

I can confirm that if I add some undefined behavior like this:

int signed_overflow = 2147483647;
signed_overflow++;

and then run the command above, I get a failure like this:

test test::test_compare_reference_impl ... ../blake3.c:11:18: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../blake3.c:11:18 in

created time in 18 days

pull request commentBLAKE3-team/BLAKE3

Maybe I'm misunderstanding what you're saying, but the test program does not iterate through AVX etc. Say, for example, you have an ARM chip with NEON and SVE2 extensions: https://gcc.godbolt.org/z/J8i6MV

The program will iterate only through the combination of features NONE, NEON|NONE, SVE2|NONE, SVE|NEON|NONE, and ignore all other values (here NONE = 0 = portable code).

xnox

comment created time in 19 days

pull request commentBLAKE3-team/BLAKE3

I would prefer feature bits not to overlap, even across separate architectures, but that's your call.

That's ok, but then can there be a delimiter for the minimum features, such that one iterates implementations from min to max for a given arch? E.g. 7 is ARM portable and 8 is ARM NEON, so the C test program does not iterate through all the AVX stuff (and end up exercising the portable implementation many times).

xnox

comment created time in 19 days

pull request commentBLAKE3-team/BLAKE3

I'm pushing my work-in-progress code. It fixes a few things, but I think it's still incomplete. I have not finished testing that the v6, v7, and v8 ARM builds and tests work correctly across Rust & C.

xnox

comment created time in 19 days

pull request commentBLAKE3-team/BLAKE3

I would prefer feature bits not to overlap, even across separate architectures, but that's your call.

c/blake3_c_rust_bindings/src/lib.rs needs a definition of neon_detected.

xnox

comment created time in 19 days

pull request commentBLAKE3-team/BLAKE3

On the C side, this seems to work OK on Linux/AArch64. I made the following change to cycle through reference and NEON code when testing:

diff --git a/c/blake3_impl.h b/c/blake3_impl.h
index 98272d4..223fffd 100644
--- a/c/blake3_impl.h
+++ b/c/blake3_impl.h
@@ -73,8 +73,8 @@ enum cpu_feature {
AVX512F = 1 << 5,
AVX512VL = 1 << 6,
#endif
-#if defined(IS_ARMHF)
-  NEON = 1 << 0,
+#if defined(IS_ARM)
+  NEON = 1 << 7,
#endif

The define change is correct. The bump from 0 to 7 is not: none of the other optimizations are available on ARM, so NEON is the first level of optimisation there.

On ARM, I expect 1<<1 be defined as SVE, 1<<2 as SVE2, etc.

xnox

comment created time in 19 days

pull request commentBLAKE3-team/BLAKE3

On the C side, this seems to work OK on Linux. I made the following change to cycle through reference and NEON code when testing:

diff --git a/c/blake3_impl.h b/c/blake3_impl.h
index 98272d4..223fffd 100644
--- a/c/blake3_impl.h
+++ b/c/blake3_impl.h
@@ -73,8 +73,8 @@ enum cpu_feature {
AVX512F = 1 << 5,
AVX512VL = 1 << 6,
#endif
-#if defined(IS_ARMHF)
-  NEON = 1 << 0,
+#if defined(IS_ARM)
+  NEON = 1 << 7,
#endif
/* ... */
UNDEFINED = 1 << 30

I can't comment on the Rust side.

xnox

comment created time in 19 days

issue closedBLAKE3-team/BLAKE3

Intel Control-flow Enforcement Technology (CET):

https://software.intel.com/en-us/articles/intel-sdm

Using a CET-enabled toolchain (e.g. Ubuntu Focal or Groovy), CET is enabled for the intrinsics implementation, but not for the assembly.

$ make test
$ readelf -a blake3 | grep IBT
Properties: x86 feature: IBT, SHSTK
$ make clean
$ make test_asm
$ readelf -a blake3 | grep IBT
# no output, thus no CET support

I suspect that assembly files must include cet.h header and have endbranch declaration similar to e.g. other asm implementations in other software.

endbranch declarations needed & the gnu.property: https://github.com/openssl/openssl/pull/12272/files

The gnu.property can also be added by including a cet header: https://github.com/gpg/libgcrypt/commit/4c88c2bd2a418435506325cd53246acaaa52750c

closed time in 19 days

xnox

pull request commentBLAKE3-team/BLAKE3

Thank you @hjl-tools for your input! Much appreciated!

sneves

comment created time in 19 days

push eventBLAKE3-team/BLAKE3

commit sha a3ec6c1ccfe613cca886f6bff5feb0ec9c3710d9

enable CET on asm

commit sha f2005678f84a8222be69c54c3d5457c6c40e87d2

Merge pull request #96 from BLAKE3-team/cet

Assembly: enable CET

push time in 19 days

PR merged BLAKE3-team/BLAKE3

With our current dispatcher, both on C and Rust, having endbr64 at the top of the functions is rather pointless since these will never be at the end of an indirect call. But since this is a glorified nop, might as well.

+25 -1

3 changed files

sneves

pr closed time in 19 days

pull request commentBLAKE3-team/BLAKE3

Most of the CET fixes have been backported to LLVM 10.x, except for <cet.h>. You can do

#if defined(__ELF__) && defined(__CET__) && __has_include(<cet.h>)
# include <cet.h>
#else
...
#endif

Goes in as suggested.

sneves

comment created time in 19 days

push eventBLAKE3-team/BLAKE3

commit sha a3ec6c1ccfe613cca886f6bff5feb0ec9c3710d9

enable CET on asm

push time in 19 days

pull request commentBLAKE3-team/BLAKE3

The reason I did not just include the cet.h header is that it appears to be GCC-specific; building with Clang would fail, even though it supports -fcf-protection=full.

LLVM doesn't have working CET support before LLVM 11, which also includes <cet.h>:

https://bugs.llvm.org/show_bug.cgi?id=45484

Most of the CET fixes have been backported to LLVM 10.x, except for <cet.h>. You can do

#if defined(__ELF__) && defined(__CET__) && __has_include(<cet.h>)
# include <cet.h>
#else
...
#endif
sneves

comment created time in 19 days

push eventBLAKE3-team/BLAKE3

commit sha 881ffe398faef8c9b9dfb47738cca5e551d24b40

put endbr64 behind a macro

push time in 19 days

pull request commentBLAKE3-team/BLAKE3

The reason I did not just include the cet.h header is that it appears to be GCC-specific; building with Clang would fail, even though it supports -fcf-protection=full.

sneves

comment created time in 19 days

pull request commentBLAKE3-team/BLAKE3

Since the assembly code is in .S files, you should do

#if defined(__ELF__) && defined(__CET__)
# include <cet.h>
#else
# define _CET_ENDBR
#endif

and use _CET_ENDBR everywhere ENDBR is needed. Since -fcf-protection needs binutils with CET support, it should work.

sneves

comment created time in 19 days

pull request commentBLAKE3-team/BLAKE3

Maybe @hjl-tools could review this too. This adds CET support to BLAKE3 asm.

sneves

comment created time in 19 days

delete branch BLAKE3-team/BLAKE3

delete branch : shrink_buf

delete time in 20 days

push eventBLAKE3-team/BLAKE3

commit sha c908847c3ff484f562226a4787bcb99778ca04c1

shrink a stack array that's twice as big as it needs to be

It looks like I originally made this mistake when I was copying code from the baokeshed prototype (a274a9b0faa444dd842a0584483eae6e97dbf21e), and then it got replicated into the C implementation later.

push time in 20 days

create branch BLAKE3-team/BLAKE3

created branch time in 20 days

pull request commentBLAKE3-team/BLAKE3

I have no idea what this means. Can I leave it to you to hit the merge button whenever you think it's ready? :)

I find this presentation the best explanation about the CET technology https://www.linuxplumbersconf.org/event/2/contributions/147/attachments/72/83/CET-LPC-2018.pdf

sneves

comment created time in 20 days

pull request commentBLAKE3-team/BLAKE3

Looks good to me! The test C binary is marked as CET enabled.

sneves

comment created time in 20 days

issue commentBLAKE3-team/BLAKE3

IS_ARM is defined for ARM v7 with hardware floating point, or for 64-bit ARM v8, because those are the targets for which we may have an optimized NEON code path.

Thus ARM v6 and lower with software floating point do not get the IS_ARM definition, and only have access to the portable implementation.

I'm not sure if I should rename IS_ARM to something better, like IS_SIMD_ARM.

xnox

comment created time in 20 days

issue commentBLAKE3-team/BLAKE3

According to the error messages out_array is uint8_t[32]. But your PR #91 makes MAX_SIMD_DEGREE unconditionally 4 on ARM, so shouldn't MAX_SIMD_DEGREE_OR_2 be 4 instead of 2, and that array be uint8_t[64]?

Either way, this warning happens when MAX_SIMD_DEGREE is 1 or 2, and I believe it's a false positive. When MAX_SIMD_DEGREE is that low, the function is never called:

while (num_cvs > 2) {
  num_cvs =
      compress_parents_parallel(cv_array, num_cvs, key, flags, out_array);
  memcpy(cv_array, out_array, num_cvs * BLAKE3_OUT_LEN);
}
xnox

comment created time in 20 days

pull request commentBLAKE3-team/BLAKE3

I have no idea what this means. Can I leave it to you to hit the merge button whenever you think it's ready? :)

sneves

comment created time in 20 days

PR opened BLAKE3-team/BLAKE3

With our current dispatcher, both on C and Rust, having endbr64 at the top of the functions is rather pointless since these will never be at the end of an indirect call. But since this is a glorified nop, might as well.

+64 -0

0 comment

3 changed files

pr created time in 20 days

create branch BLAKE3-team/BLAKE3

created branch time in 20 days

startedBLAKE3-team/BLAKE3

started time in 20 days

startedBLAKE3-team/BLAKE3

started time in 20 days

issue openedBLAKE3-team/BLAKE3

Intel Control-flow Enforcement Technology (CET):

https://software.intel.com/en-us/articles/intel-sdm

Using a CET-enabled toolchain (e.g. Ubuntu Focal or Groovy), CET is enabled for the intrinsics implementation, but not for the assembly.

$ make test
$ readelf -a blake3 | grep IBT
Properties: x86 feature: IBT, SHSTK
$ make clean
$ make test_asm
$ readelf -a blake3 | grep IBT
# no output, thus no CET support

I suspect that assembly files must include cet.h header and have endbranch declaration similar to e.g. other asm implementations in other software.

endbranch declarations needed & the gnu.property: https://github.com/openssl/openssl/pull/12272/files

The gnu.property can also be added by including a cet header: https://github.com/gpg/libgcrypt/commit/4c88c2bd2a418435506325cd53246acaaa52750c

created time in 21 days

Pull request review commentBLAKE3-team/BLAKE3

TARGETS=
ASM_TARGETS=
EXTRAFLAGS=
+ifeq (ifunc,$(BLAKE3_DISPATCH))

This build variant must be added to the Github actions test cases.

xnox

comment created time in 21 days

issue openedBLAKE3-team/BLAKE3

Compiling for a 32-bit ARM v6 target, which has no NEON support, emits warnings:

arm-linux-gnueabi-gcc -O3 -Wall -Wextra -std=c11 -pedantic -DBLAKE3_TESTING -fsanitize=address,undefined  blake3.c blake3_dispatch.c blake3_portable.c main.c blake3_neon.o -o blake3
In file included from /usr/arm-linux-gnueabi/include/string.h:495,
from blake3.c:3:
In function ‘memcpy’,
inlined from ‘compress_parents_parallel’ at blake3.c:236:5,
inlined from ‘compress_subtree_to_parent_node’ at blake3.c:348:9,
inlined from ‘blake3_hasher_update.part.0’ at blake3.c:527:7:
/usr/arm-linux-gnueabi/include/bits/string_fortified.h:34:10: warning: ‘__builtin_memcpy’ forming offset [33, 64] is out of the bounds [0, 32] of object ‘out_array’ with type ‘uint8_t[32]’ {aka ‘unsigned char[32]’} [-Warray-bounds]
34 |   return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
|          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
blake3.c: In function ‘blake3_hasher_update.part.0’:
blake3.c:345:11: note: ‘out_array’ declared here
345 |   uint8_t out_array[MAX_SIMD_DEGREE_OR_2 * BLAKE3_OUT_LEN / 2];
|           ^~~~~~~~~
In file included from /usr/arm-linux-gnueabi/include/string.h:495,
from blake3.c:3:
In function ‘memcpy’,
inlined from ‘compress_subtree_to_parent_node’ at blake3.c:349:5,
inlined from ‘blake3_hasher_update.part.0’ at blake3.c:527:7:
/usr/arm-linux-gnueabi/include/bits/string_fortified.h:34:10: warning: ‘__builtin___memcpy_chk’ forming offset [33, 64] is out of the bounds [0, 32] of object ‘out_array’ with type ‘uint8_t[32]’ {aka ‘unsigned char[32]’} [-Warray-bounds]
34 |   return __builtin___memcpy_chk (__dest, __src, __len, __bos0 (__dest));
|          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
blake3.c: In function ‘blake3_hasher_update.part.0’:
blake3.c:345:11: note: ‘out_array’ declared here
345 |   uint8_t out_array[MAX_SIMD_DEGREE_OR_2 * BLAKE3_OUT_LEN / 2];
|           ^~~~~~~~~

Is out_array mis-sized in such circumstances?

created time in 21 days

Pull request review commentBLAKE3-team/BLAKE3-specs

\subsection{Modes}\label{sec:modes}
the first stage, and the \flag{DERIVE_KEY_MATERIAL} flag set for every compression.
+Developers adapting BLAKE3 to ASN.1-based message formats should use
+the Algorithm Identifier blake3 with OID identifier
+1.3.6.1.4.1.1722.12.3.8 for all modes and 256-bit default output size.
+

@oconnor663 based on discussion here, and the https://github.com/BLAKE3-team/BLAKE3/issues/68 I've updated this patch.

I moved the notice to the Modes section, and specified that the same OID applies to all modes, for the default output size of 256 bits. This matches the description in the previous sections, which define the default output size.

xnox

comment created time in 21 days

issue commentBLAKE3-team/BLAKE3

I'm unfamiliar with how OIDs work in general, so I might be misunderstanding things. It looks like 1.3.6.1.4.1.1722.12.2.3 is registered as blake3 and 1.3.6.1.4.1.1722.12.2.3.8 (one extra number on the end) is registered to blake3-256. I guess I was hoping that only the former would exist? But if that clashes with how other hashes are defined, I suppose it's really not a big deal. Also am I seeing correctly that no OIDs are defined for any hash other than BLAKE2 and BLAKE3? Maybe this system is less widely used than I assumed from my initial impression?

Re 256: BLAKE3 produces arbitrary-length output, but in keyed_hash mode it requires a 256-bit key. Hence 3.8 kind of makes sense as a catch-all for all three modes of BLAKE3, no?

dlegaultbbry

comment created time in 21 days

issue commentBLAKE3-team/BLAKE3

It seems there is a mishmash of OIDs and we can do whatever we like, e.g.:

• SHA1 http://oid-info.com/get/1.3.14.3.2.26
• SHA256 http://oid-info.com/get/2.16.840.1.101.3.4.2.1
• SHA3 http://oid-info.com/get/2.16.840.1.101.3.4.2.9

So yeah, we can drop 1.3.6.1.4.1.1722.12.2.3.8 and just have 1.3.6.1.4.1.1722.12.2.3 as "blake3" hash algorithm.

But I now wonder whether we should allocate an OID for every mode of BLAKE3, in case more modes are added later (unlikely). Because 1.3.6.1.4.1.1722.12.2.3 means hash(input) mode, right? What are the OIDs for the keyed_hash and derive_key modes? Just use the same ones? Or have something like 3.1, 3.2, 3.3 for them? Or move keyed_hash under the HMACs, somewhere under http://oid-info.com/get/1.3.6.1.4.1.1722.12.3?

dlegaultbbry

comment created time in 21 days

issue commentBLAKE3-team/BLAKE3

I'm unfamiliar with how OIDs work in general, so I might be misunderstanding things. It looks like 1.3.6.1.4.1.1722.12.2.3 is registered as blake3 and 1.3.6.1.4.1.1722.12.2.3.8 (one extra number on the end) is registered to blake3-256. I guess I was hoping that only the former would exist? But if that clashes with how other hashes are defined, I suppose it's really not a big deal. Also am I seeing correctly that no OIDs are defined for any hash other than BLAKE2 and BLAKE3? Maybe this system is less widely used than I assumed from my initial impression?

There are OIDs for all hash algos under the sun. You are looking at only those registered by a single private company with 1722. All OIDs are scoped by company IDs.
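The scoping mentioned here can be read directly off the dotted form: OIDs that start with 1.3.6.1.4.1 sit under the IANA private enterprise arc, and the next number identifies the registering organization. A small C sketch of that check follows. The function name is hypothetical, and the parsing is deliberately loose (it leans on strtoul rather than validating every character).

```c
#include <assert.h>
#include <stdlib.h>

/* Return the enterprise number of a dotted OID under the private
 * enterprise arc 1.3.6.1.4.1, or -1 if the OID is not under that arc. */
static long oid_enterprise_number(const char *oid) {
  static const unsigned long prefix[] = {1, 3, 6, 1, 4, 1};
  assert(oid != NULL);
  const char *p = oid;
  char *end;
  for (size_t i = 0; i < sizeof prefix / sizeof prefix[0]; i++) {
    unsigned long arc = strtoul(p, &end, 10);
    /* Each prefix arc must parse, match, and be followed by a dot. */
    if (end == p || arc != prefix[i] || *end != '.') return -1;
    p = end + 1;
  }
  unsigned long enterprise = strtoul(p, &end, 10);
  if (end == p) return -1; /* nothing after the prefix */
  return (long)enterprise;
}
```

Applied to the OIDs in this thread, the BLAKE3 identifier 1.3.6.1.4.1.1722.12.2.3 yields 1722, while the SHA256 identifier 2.16.840.1.101.3.4.2.1 is rejected because it lives under a different arc entirely.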

dlegaultbbry

comment created time in 21 days

issue commentBLAKE3-team/BLAKE3

I'm unfamiliar with how OIDs work in general, so I might be misunderstanding things. It looks like 1.3.6.1.4.1.1722.12.2.3 is registered as blake3 and 1.3.6.1.4.1.1722.12.2.3.8 (one extra number on the end) is registered to blake3-256. I guess I was hoping that only the former would exist? But if that clashes with how other hashes are defined, I suppose it's really not a big deal. Also am I seeing correctly that no OIDs are defined for any hash other than BLAKE2 and BLAKE3? Maybe this system is less widely used than I assumed from my initial impression?

dlegaultbbry

comment created time in 21 days

push eventBLAKE3-team/BLAKE3

commit sha e0f193ddc9c0400262dae4118b0660ae8d70586e

put the file name in b3sum error output

This was previously there, but got dropped in c5c07bb337d0af7522666d05308aaf24eef3709c.

push time in 22 days

pull request commentBLAKE3-team/BLAKE3-specs

The same OID can be used for both keyed and unkeyed hashing since in the latter case the key simply has zero length.

That doesn't sound right to me. The key is essentially the first 8 of 16 initial state words. In the unkeyed (default) mode, it's set to a constant. But its length never changes; it's always 8 words / 32 bytes / 256 bits.

I am re-reading the modes section. It sounds like, unlike BLAKE2, the modes are different and have different flags, so I guess there should be a different OID for each mode? E.g. 3.3.8 for HMAC (where the first 3 is MacAlgs), and something else for key derivation mode.

I think at the very least I should drop that sentence, and ensure that 2.3.8 refers to just the blake3 hash mode.

xnox

comment created time in 22 days

issue commentBLAKE3-team/BLAKE3

Apologies for not chiming in on this sooner. There isn't a lot of established practice on this yet, but I've argued for avoiding labels like blake3-256, in favor of just calling it blake3 as much as possible. I wrote out a longer version of my reasoning here: BLAKE3-team/BLAKE3-specs#3. Here's a shorter version:

• "BLAKE3-128" is not domain-separated from "BLAKE3-256". The former is just a truncated version of the latter. This is in contrast to BLAKE2, where shorter output lengths are domain-separated.
• "BLAKE3-512" offers no additional security over "BLAKE3-256". This is in contrast to BLAKE2b-512, which does offer more security than BLAKE2b-256. (Note that BLAKE3 is derived from BLAKE2s, and there is no such thing as BLAKE2s-512.)

The vast majority of applications using BLAKE3 should stick to the default output length. I think truncated BLAKE3 should be about as common as truncated MD5. Extended outputs are useful in niche applications, like deriving long or oddly-sized keys for uncommon algorithms, but again not something most applications should think about using. (And because different output lengths aren't domain separated, there's no real sense in which different lengths represent different hash functions.)

As far as I can tell, sets of hash functions with multiple labeled sizes are only as common as they are because of old government standards. BLAKE3 tries as much as possible to remove unnecessary choices. We've avoided concerning the caller with details like the word size or the parallelism degree, and it would be nice not to concern them with output size either.

Cool, so it's blake3. But that doesn't change the OID numbers, right? Because it's just 1.3.6.1.4.1.1722.12.2.3.8, no? That's the only ID registered so far, which I guess should be referred to as just BLAKE3. Or do you dislike that OID number too somehow? I think we do need to specify a child number of .3, and .8 is just as good as any other number.

I guess I should update https://github.com/BLAKE3-team/BLAKE3-specs/pull/4/files to not mention -256, and just call it blake3 there too.

dlegaultbbry

comment created time in 22 days

startedBLAKE3-team/BLAKE3

started time in 22 days

issue commentBLAKE3-team/BLAKE3

Are you sure b3sum actually allocated that space?

looks like it (altho, given what you said, i guess it's just the OS allocating it on behalf of b3sum for memory mapping?) this is the system before starting b3sum:

and this is after b3sum has started:

Does it all go away when b3sum exits, or does utilization remain high?

yes it goes away, this is after b3sum has exited:

divinity76

comment created time in 22 days

issue commentBLAKE3-team/BLAKE3

Apologies for not chiming in on this sooner. There isn't a lot of established practice on this yet, but I've argued for avoiding labels like blake3-256, in favor of just calling it blake3 as much as possible. I wrote out a longer version of my reasoning here: https://github.com/BLAKE3-team/BLAKE3-specs/issues/3. Here's a shorter version:

• "BLAKE3-128" is not domain-separated from "BLAKE3-256". The former is just a truncated version of the latter. This is in contrast to BLAKE2, where shorter output lengths were domain-separated.
• "BLAKE3-512" offers no additional security over "BLAKE3-256". This is in contrast to BLAKE2b-512, which does offer more security than BLAKE2b-256. (Note that BLAKE3 is derived from BLAKE2s, and there is no such thing as BLAKE2s-512.)

The vast majority of applications using BLAKE3 should stick to the default output length. I think truncated BLAKE3 should be about as common as truncated MD5. Extended outputs are useful in niche applications, like deriving very long or oddly-sized keys for uncommon algorithms, but again not something most applications should think about using. (And because different output lengths aren't domain separated, there's no real sense in which different lengths represent different algorithms.)

As far as I can tell, hash algorithm sets with multiple labeled sizes are only as common as they are because of old government standards. BLAKE3 tries as much as possible to remove unnecessary choices. We've avoided concerning the caller with details like the word size or the parallelism degree, and it would be nice not to concern them with output size either.

dlegaultbbry

comment created time in 22 days

pull request commentBLAKE3-team/BLAKE3-specs

The same OID can be used for both keyed and unkeyed hashing since in the latter case the key simply has zero length.

That doesn't sound right to me. The key is essentially the first 8 of 16 initial state words. In the unkeyed (default) mode, it's set to a constant. But its length never changes; it's always 8 words / 32 bytes / 256 bits.
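The layout this comment refers to can be sketched in C as follows. This is a simplified illustration based on the state layout in the BLAKE3 paper, covering only the first block of a chunk, where the chaining value is either the key words or the IV. The helper name and its NULL convention are hypothetical, not part of any BLAKE3 API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* BLAKE3 IV (the same eight words as the SHA-256 initial hash values). */
static const uint32_t IV[8] = {0x6A09E667u, 0xBB67AE85u, 0x3C6EF372u,
                               0xA54FF53Au, 0x510E527Fu, 0x9B05688Cu,
                               0x1F83D9ABu, 0x5BE0CD19u};

#define KEYED_HASH (1u << 4) /* flag value from the BLAKE3 paper */

/* Hypothetical helper: build the 16-word state for one compression.
 * key_words == NULL selects the default (unkeyed) mode. */
static void init_state(uint32_t v[16], const uint32_t *key_words,
                       uint64_t counter, uint32_t block_len, uint32_t flags) {
  assert(v != NULL);
  /* Words 0..7: always eight words, either the key or the constant IV. */
  for (int i = 0; i < 8; i++) {
    v[i] = key_words ? key_words[i] : IV[i];
  }
  /* Words 8..11: the first four IV words. */
  for (int i = 0; i < 4; i++) {
    v[8 + i] = IV[i];
  }
  v[12] = (uint32_t)counter;         /* counter, low 32 bits */
  v[13] = (uint32_t)(counter >> 32); /* counter, high 32 bits */
  v[14] = block_len;
  v[15] = key_words ? (flags | KEYED_HASH) : flags;
}
```

In both modes the first eight words are fully populated, so there is no sense in which the unkeyed mode has a zero-length key. What distinguishes the modes is the content of those words and the flag word.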

xnox

comment created time in 22 days

issue commentBLAKE3-team/BLAKE3

Are you sure b3sum actually allocated that space? Does it all go away when b3sum exits, or does utilization remain high? What b3sum is supposed to be doing is memory mapping the file, so the kernel should be using as much memory as possible to cache file pages. On Linux, I think that shows up as a lot of memory "in use" but still "available", which can be a source of confusion. It can look like the machine is low on RAM, when in fact the kernel is able to reclaim those available pages whenever it needs to. However, I don't know how these things get reported on Windows, so I don't know the right questions to ask here.

divinity76

comment created time in 22 days
