marjakh/ecma262 0

Status, process, and documents for ECMA-262

marjakh/jquery 0

jQuery JavaScript Library

marjakh/node 0

Node.js JavaScript runtime :sparkles::turtle::rocket::sparkles:

marjakh/proposal-promise-any 0

ECMAScript proposal: Promise.any

marjakh/SillyLittleCompiler 0

A silly little hobby project for learning more about compilers.

marjakh/test262 0

Official ECMAScript Conformance Test Suite

marjakh/v8.dev 0

The source code of v8.dev, the official website of the V8 project.

delete branch marjakh/test262

delete branch : remove-duplicate-promise-all-tests

delete time in 8 days

issue opened tc39/test262

Missing tests for Promise combinators

I started looking into which tests exist for Promise.all/allSettled/any/race, and discovered a bunch of missing tests (e.g., a test exists for Promise.allSettled but the corresponding test for Promise.any doesn't).

See the (incomplete) table here: https://docs.google.com/document/d/1lrDTH6UtVj4a_SZcXKBSgknL-wOtDXC5mLQFVcjmgSE/edit?usp=sharing

created time in 8 days
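The issue compares which tests exist for each combinator side by side. As a rough illustration of that kind of comparison (a sketch, not the test262 harness and not a proposed test), the snippet below runs the same handful of property checks against all four combinators, so a check that exists for `Promise.allSettled` but not for `Promise.any` would stand out. It assumes a runtime that implements `Promise.any` (ES2021, e.g. a recent Node.js); `COMBINATORS` and `checkCombinator` are hypothetical names.

```javascript
// Sketch only: the same property checks applied to all four Promise combinators.
// Assumes a runtime with Promise.any (ES2021), e.g. a recent Node.js.
const COMBINATORS = ['all', 'allSettled', 'any', 'race'];

function checkCombinator(name) {
  const fn = Promise[name];
  if (typeof fn !== 'function') return [`Promise.${name} is not a function`];
  const problems = [];

  // Built-in function properties.
  if (fn.length !== 1) problems.push(`length is ${fn.length}, expected 1`);
  if (fn.name !== name) problems.push(`name is "${fn.name}", expected "${name}"`);

  // Standard attributes for built-in methods: writable, non-enumerable, configurable.
  const desc = Object.getOwnPropertyDescriptor(Promise, name);
  if (!desc.writable || desc.enumerable || !desc.configurable) {
    problems.push('unexpected property attributes');
  }

  // The combinators are not constructors.
  try {
    new fn([]);
    problems.push('can be called with new');
  } catch (e) {
    if (!(e instanceof TypeError)) problems.push('new threw a non-TypeError');
  }

  // A non-constructor `this` makes NewPromiseCapability throw synchronously.
  try {
    fn.call(undefined, []);
    problems.push('accepted undefined as this');
  } catch (e) {
    if (!(e instanceof TypeError)) problems.push('bad this threw a non-TypeError');
  }
  return problems;
}

const report = Object.fromEntries(COMBINATORS.map((n) => [n, checkCombinator(n)]));
console.log(report);
```

On a conforming engine every entry in `report` is an empty array; a real gap analysis would compare test files, not runtime behavior.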

PR opened tc39/test262

Remove duplicate Promise.all tests
  1. Promise/all/S25.4.4.1_A6.1_T1.js is the same as Promise/all/S25.4.4.1_A2.1_T1.js

  2. Promise/all/S25.4.4.1_A6.1_T2.js is covered by Promise/all/S25.4.4.1_A2.3_T1.js and Promise/all/S25.4.4.1_A2.3_T2.js

+0 -39

0 comments

2 changed files

pr created time in 9 days
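Exact duplicates like the first case in this PR can be partly found mechanically. A naive sketch, assuming test files are compared as strings after stripping comments (including test262's `/*--- ... ---*/` frontmatter) and collapsing whitespace; `normalizeTestSource` and `findDuplicateGroups` are hypothetical helpers, not part of test262's tooling, and the file names in any usage are made up.

```javascript
// Naive duplicate-test finder (hypothetical helpers, not test262 tooling).
// Caveat: these regexes also match "//" and "/*" inside string or regex
// literals, so results are candidates for manual review, not proof.
function normalizeTestSource(src) {
  return src
    .replace(/\/\*[\s\S]*?\*\//g, '') // block comments, incl. /*--- ... ---*/ frontmatter
    .replace(/\/\/[^\n]*/g, '')       // line comments, e.g. copyright headers
    .replace(/\s+/g, ' ')             // collapse all whitespace
    .trim();
}

// files: an object mapping file name -> source text.
// Returns groups (arrays of names) whose normalized sources are identical.
function findDuplicateGroups(files) {
  const byBody = new Map();
  for (const [name, src] of Object.entries(files)) {
    const key = normalizeTestSource(src);
    if (!byBody.has(key)) byBody.set(key, []);
    byBody.get(key).push(name);
  }
  return [...byBody.values()].filter((names) => names.length > 1);
}
```

The second case in the PR (one test covered by two others) is a coverage judgment and would not show up in this kind of comparison.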

create branch marjakh/test262

branch : remove-duplicate-promise-all-tests

created branch time in 9 days

fork marjakh/test262

Official ECMAScript Conformance Test Suite

fork in 9 days

Pull request review comment v8/v8.dev

Add a blog post: Understanding the ECMAScript spec, part 4

+---
+title: 'Understanding the ECMAScript spec, part 4'
+author: '[Marja Hölttä](https://twitter.com/marjakh), speculative specification spectator'
+avatars:
+  - marja-holtta
+date: 2020-05-08
+tags:
+  - ECMAScript
+description: 'Tutorial on reading the ECMAScript specification'
+tweet: ''
+---
+
+## Previous episodes
+
+In [part 1](/blog/understanding-ecmascript-part-1), we read through a simple method — `Object.prototype.hasOwnProperty` — and the **abstract operations** it invokes. We familiarized ourselves with the shorthands `?` and `!` related to error handling. We encountered **language types**, **specification types**, **internal slots**, and **internal methods**.
+
+In [part 2](/blog/understanding-ecmascript-part-2), we examined a simple grammar production and how its runtime semantics are defined. In [the extra content](/blog/extra/understanding-ecmascript-part-2-extra), we also followed a long grammar production chain from `AssignmentExpression` to `MemberExpression`.
+
+In [part 3](/blog/understanding-ecmascript-part-3), we familiarized ourselves with the lexical grammar, the syntactic grammar, and the shorthands used for defining the syntactic grammar.
+
+## Meanwhile in other parts of the Web
+
+[Jason Orendorff](https://github.com/jorendorff) from Mozilla published [a great in-depth analysis of JS syntactic quirks](https://github.com/mozilla-spidermonkey/jsparagus/blob/master/js-quirks.md#readme). Even though the implementation details differ, every JS engine faces the same problems with these quirks.
+
+## Cover grammars
+
+In this episode, we take a deeper look into *cover grammars*. They are a way to specify the grammar for syntactic constructs which look ambiguous at first.
+
+Again, we'll skip the subscripts for `[In, Yield, Await]` for brevity, as they aren't important for this blog post. See [part 3](/blog/understanding-ecmascript-part-3) for an explanation of their meaning and usage.
+
+## Finite lookaheads
+
+Typically, parsers decide which production to use based on a finite lookahead (a fixed amount of following tokens).
+
+In some cases, the next token determines the production to use unambiguously. [For example](https://tc39.es/ecma262/#prod-UpdateExpression):
+
+```grammar
+UpdateExpression :
+  LeftHandSideExpression
+  LeftHandSideExpression ++
+  LeftHandSideExpression --
+  ++ UnaryExpression
+  -- UnaryExpression
+```
+
+If we're parsing an `UpdateExpression` and the next token is `++` or `--`, we know the production to use right away. If the next token is neither, it's still not too bad: we can parse a `LeftHandSideExpression` starting from the position we're at, and figure out what to do after we've parsed it. If the token following the `LeftHandSideExpression` is `++`, the production to use is `UpdateExpression : LeftHandSideExpression ++`. The case for `--` is similar. And if the token following the `LeftHandSideExpression` is neither `++` nor `--`, we use the production `UpdateExpression : LeftHandSideExpression`.
+
+### Arrow function parameter list or a parenthesized expression?
+
+Distinguishing arrow function parameter lists from parenthesized expressions is more complicated.
+
+For example:
+
+```javascript
+let x = (a,
+```
+
+Is this the start of an arrow function, like this?
+
+```javascript
+let x = (a, b) => { return a + b };
+```
+
+Or maybe it's a parenthesized expression, like this?
+
+```javascript
+let x = (a, 3);
+```
+
+The parenthesized whatever-it-is can be arbitrarily long - we cannot know what it is based on a finite amount of tokens.
+
+Let's imagine for a moment that we had the following straightforward productions:
+
+```grammar
+AssignmentExpression :
+  ...
+  ArrowFunction
+  ParenthesizedExpression
+
+ArrowFunction :
+  ArrowParameterList => ConciseBody
+```
+
+Now we can't choose the production to use with a finite lookahead. If we had to parse a `AssignmentExpression` and the next token was `(`, how would we decide what to parse next? We could either parse an `ArrowParameterList` or a `ParenthesizedExpression`, but our guess could go wrong.
+
+### The very permissive new symbol: `CPEAAPL`
+
+The spec solves this problem by introducing the symbol `CoverParenthesizedExpressionAndArrowParameterList` (`CPEAAPL` for short). `CPEAAPL` is a symbol that is actually an `ParenthesizedExpression` or an `ArrowParameterList` behind the scenes, but we don't yet know which one.
+
+The [productions](https://tc39.es/ecma262/#prod-CoverParenthesizedExpressionAndArrowParameterList) for `CPEAAPL` are very permissive, allowing all constructs that can occur in `ParenthesizedExpression`s and in `ArrowParameterList`s:
+
+```grammar
+CPEAAPL :
+  ( Expression )
+  ( Expression , )
+  ( )
+  ( ... BindingIdentifier )
+  ( ... BindingPattern )
+  ( Expression , ... BindingIdentifier )
+  ( Expression , ... BindingPattern )
+```
+
+For example, the following expressions are valid `CPEAAPL`s:
+
+```javascript
+// Valid ParenthesizedExpression and ArrowParameterList:
+(a, b)
+(a, b = 1)
+
+// Valid ParenthesizedExpression:
+(1, 2, 3)
+(function foo() { })
+
+// Valid ArrowParameterList:
+()
+(a, b,)
+(a, ...b)
+(a = 1, ...b)
+
+// Not valid either, but still a CPEAAPL:
+(1, ...b)
+(1, )
+```
+
+Trailing comma and the `...` can occur only in `ArrowParameterList`. Some constructs, like `b = 1` can occur in both, but they have different meanings: Inside `ParenthesizedExpression` it's an assignment, inside `ArrowParameterList` it's a parameter with a default value. Numbers and other `PrimaryExpressions` which are not valid parameter names (or parameter destructuring patterns) can only occur in `ParenthesizedExpression`. But they all can occur inside a `CPEAAPL`.
+
+### Using `CPEAAPL` in productions
+
+Now we can use the very permissive `CPEAAPL` in [`AssignmentExpression` productions](https://tc39.es/ecma262/#prod-AssignmentExpression):
+
+```grammar
+AssignmentExpression :
+  ConditionalExpression

Moved the "note" explaining it up:

(ConditionalExpression leads to PrimaryExpression via a long production chain.)

marjakh

comment created time in 9 days
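The claim that no finite lookahead separates the two readings can be checked directly: however many names the prefix `(a0, a1, ..., an` contains, both completions parse. A minimal sketch, assuming an engine where `new Function` is available (e.g. Node.js; a Content Security Policy may block it in browsers). `new Function` only compiles the source here; the generated functions are never called. `ambiguousPrefix` and `parses` are hypothetical helper names.

```javascript
// Builds "(a0, a1, ..., an": n + 1 names and no closing parenthesis yet.
function ambiguousPrefix(n) {
  const names = Array.from({ length: n + 1 }, (_, i) => `a${i}`);
  return `(${names.join(', ')}`;
}

// True if `source` is a syntactically valid function body.
function parses(source) {
  try {
    new Function(source); // compiles the body; the function is never called
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

for (const n of [1, 10, 100]) {
  const prefix = ambiguousPrefix(n);
  const asArrowFunction = parses(`let x = ${prefix}) => 0;`);
  const asParenthesized = parses(`let x = ${prefix});`);
  console.log(n, asArrowFunction, asParenthesized);
}
```

This prints `1 true true`, `10 true true` and `100 true true`. Replacing a name with a number literal, as in `(1, 2) => 0`, makes the arrow form fail to parse: that restriction is only applied once it's known which construct the parenthesized part was.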

push event v8/v8.dev

Marja Hölttä

commit sha 802ad86f6288187baa22f5adf4d780e11f844f63

review

push time in 9 days

Pull request review comment v8/v8.dev

Add a blog post: Understanding the ECMAScript spec, part 4

In [part 3](/blog/understanding-ecmascript-part-3), we familiarized ourselves wi
 ## Cover grammars
 
-In this episode, we take a deeper look into *cover grammars*. They are a way to specify grammar rules for syntactic constructs which look ambiguous at first.
+In this episode, we take a deeper look into *cover grammars*. They are a way to specify the grammar for syntactic constructs which look ambiguous at first.
 
 Again, we'll skip the subscripts for `[In, Yield, Await]` for brevity, as they aren't important for this blog post. See [part 3](/blog/understanding-ecmascript-part-3) for an explanation of their meaning and usage.
 
-## Parenthesized expression or an arrow parameter list?
+## Finite lookaheads
 
-Typically, parsers decide which grammar production to follow based on finite lookahead (a fixed amount of following tokens).
+Typically, parsers decide which production to use based on a finite lookahead (a fixed amount of following tokens).
+
+In some cases, the next token determines the production to use unambiguously. [For example](https://tc39.es/ecma262/#prod-UpdateExpression):
+
+```grammar
+UpdateExpression :
+  LeftHandSideExpression
+  LeftHandSideExpression ++
+  LeftHandSideExpression --
+  ++ UnaryExpression
+  -- UnaryExpression
+```
+
+If we're parsing an `UpdateExpression` and the next token is `++` or `--`, we know the production to use right away. If the next token is neither, it's still not too bad: we can parse a `LeftHandSideExpression` starting from the position we're at, and figure out what to do after we've parsed it. If the token following the `LeftHandSideExpression` is `++`, the production to use is `UpdateExpression : LeftHandSideExpression ++`. The case for `--` is similar. And if the token following the `LeftHandSideExpression` is neither `++` nor `--`, we use the production `UpdateExpression : LeftHandSideExpression`.

I was deliberately leaving it out: afaics it would be inaccurate to say it's LR(1); it's rather "LR(1) with some reasonable extensions", and I don't want to go into the details there.

marjakh

comment created time in 13 days
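The `UpdateExpression` decision in the diff can be sketched as a toy recursive-descent parser. Assumptions: the input is an array of already-split string tokens, `LeftHandSideExpression` is reduced to an identifier followed by optional `.name` accesses, the operand of a prefix `++`/`--` is parsed as another `UpdateExpression` instead of a full `UnaryExpression`, and no early errors are checked (so it accepts `++ ++ a`, which real engines reject). The function names are hypothetical.

```javascript
const IDENTIFIER = /^[A-Za-z_$][\w$]*$/;

// Simplified LeftHandSideExpression: identifier ( "." identifier )*
function parseLeftHandSideExpression(tokens, pos) {
  if (!IDENTIFIER.test(tokens[pos] ?? '')) {
    throw new SyntaxError(`Expected identifier at token ${pos}`);
  }
  let node = { type: 'Identifier', name: tokens[pos] };
  pos += 1;
  while (tokens[pos] === '.') {
    if (!IDENTIFIER.test(tokens[pos + 1] ?? '')) {
      throw new SyntaxError(`Expected property name at token ${pos + 1}`);
    }
    node = { type: 'Member', object: node, property: tokens[pos + 1] };
    pos += 2;
  }
  return { node, pos };
}

function parseUpdateExpression(tokens, pos = 0) {
  const next = tokens[pos];
  // `++ UnaryExpression` / `-- UnaryExpression`: the first token decides.
  if (next === '++' || next === '--') {
    const operand = parseUpdateExpression(tokens, pos + 1); // simplified UnaryExpression
    return { node: { type: 'Prefix', op: next, operand: operand.node }, pos: operand.pos };
  }
  // All remaining productions start with LeftHandSideExpression:
  // parse it first, then let the following token pick the production.
  const lhs = parseLeftHandSideExpression(tokens, pos);
  const following = tokens[lhs.pos];
  if (following === '++' || following === '--') {
    return { node: { type: 'Postfix', op: following, operand: lhs.node }, pos: lhs.pos + 1 };
  }
  return lhs; // UpdateExpression : LeftHandSideExpression
}
```

For example, `parseUpdateExpression(['a', '.', 'b', '++'])` yields a `Postfix` node whose operand is the member access `a.b`; the parser never needs to look more than one token past the `LeftHandSideExpression`.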

push event v8/v8.dev

Marja Hölttä

commit sha bbb71f1e25673c443e140f391f7f235486307348

fix lint

push time in 14 days

pull request comment v8/v8.dev

Add a blog post: Understanding the ECMAScript spec, part 4

I uploaded a new version, PTAL.

marjakh

comment created time in 14 days

Pull request review comment v8/v8.dev

Add a blog post: Understanding the ECMAScript spec, part 4

+## Parenthesized expression or an arrow parameter list?
+
+Typically, parsers decide which grammar production to follow based on finite lookahead.

Added lookahead examples.

marjakh

comment created time in 14 days

Pull request review comment v8/v8.dev

Add a blog post: Understanding the ECMAScript spec, part 4

+## Cover grammars
+
+In this episode, we take a deeper look into *cover grammars*. They are a way to specify grammar  rules for syntactic constructs where we don't know what we're looking at until we've seen the complete construct.
+
+Again, we'll skip the subscripts for `[In, Yield, Await]` for brevity, as they aren't important for this blog post. See [part 3](/blog/understanding-ecmascript-part-3) for an explanation of their meaning and usage.
+
+## Parenthesized expression or an arrow parameter list?
+
+Typically, parsers decide which grammar production to follow based on finite lookahead.
+
+For example:
+```javascript
+let x = (a,
+```
+
+Is this the start of an arrow function, like this?
+
+```javascript
+let x = (a, b) => { return a + b };
+```
+
+Or maybe it's a parenthesized expression:
+
+```javascript
+let x = (a, 3);
+```
+
+The parenthesized whatever-it-is can be arbitrarily long - we cannot know what it is based on a finite amount of tokens.
+
+If the productions were written like this:
+
+```grammar
+AssignmentExpression :
+...
+ConditionalExpression (eventually leading to PrimaryExpression)
+ArrowFunction
+
+PrimaryExpression :
+...
+ParenthesizedExpression
+
+ArrowFunction :
+ArrowParameterList => ConciseBody
+```
+
+We couldn't choose the correct production with limited lookahead. Imagine we had to parse a `AssignmentExpression` and the next token is `(`. How would we decide what to parse next? We could either parse an `ParenthesizedExpression` or an `ArrowParameterList`, but our guess could go wrong.
+
+### The very permissive new symbol: CPEAAPL
+
+We'd like to specify the grammar in such a way that it's possible to parse JavaScript according to it with limited lookahead.
+
+The spec solves this problem by introducing the symbol `CoverParenthesizedExpressionAndArrowParameterList` (`CPEAAPL` for short). `CPEAAPL` is a symbol that is actually an `ParenthesizedExpression` or an `ArrowParameterList` behind the scenes, but we don't yet know which one.
+
+The [productions](https://tc39.es/ecma262/#prod-CoverParenthesizedExpressionAndArrowParameterList) for `CPEAAPL` are very permissive, allowing all constructs that can occur in `ParenthesizedExpression`s and in `ArrowParameterList`s:
+
+```grammar
+CPEAAPL :
+( Expression )
+( Expression , )
+( )
+(... BindingIdentifier )
+(... BindingPattern )
+( Expression , ... BindingIdentifier )
+( Expression , ... BindingPattern )
+```
+
+For example, the following expressions are valid `CPEAAPL`s:
+
+```javascript
+// Valid ParenthesizedExpression and ArrowParameterList:
+(a, b)
+(a, b = 1)
+
+// Valid ParenthesizedExpression:
+(1, 2, 3)
+(function foo() { })
+
+// Valid ArrowParameterList:
+()
+(a, b,)
+(a, ...b)
+(a = 1, ...b)
+
+// Not valid either, but still a CPEAAPL:
+(1, ...b)
+(1, )
+```
+
+Trailing comma and the `...` can occur only in `ArrowParameterList`. Some constructs, like `b = 1` can occur in both, but they have different meanings: Inside `ParenthesizedExpression` it's an assignment, inside `ArrowParameterList` it's a parameter with a default value.
+
+### Using CPEAAPL in grammar rules
+
+Now we can use the very permissive `CPEAAPL` in grammar productions:
+
+```grammar
+AssignmentExpression :
+ConditionalExpression (eventually leading to PrimaryExpression)
+ArrowFunction
+...
+
+ArrowFunction :
+ArrowParameters => ConciseBody
+
+ArrowParameters :
+BindingIdentifier
+CPEAAPL
+
+PrimaryExpression :
+...
+CPEAAPL
+
+```
+
+Imagine we're again in the situation that we need to parse an `AssignmentExpression` and the next token is `(`. Now we can just decide to parse a `CPEAAPL` and figure out later what it actually is. It doesn't matter whether we're parsing an `ArrowFunction` or a `ParenthesizedExpression`, the next symbol to parse is `CPEAAPL` in any case!
+
+After we've parsed the `CPEAAPL`, we can decide whether the original `AssignmentExpression` is an `ArrowFunction` or a `ParenthesizedExpression` based on the token following the `CPEAAPL`.

I think this is resolved based on the offline discussion.

marjakh

comment created time in 14 days
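The cover-then-refine strategy can be sketched in a few dozen lines: parse a permissive parenthesized construct without committing, look at the token after `)`, and only then apply the restrictions of the chosen interpretation (in the spirit of the Early Errors and Supplemental Syntax rules quoted in the later draft in this feed). This is a toy under heavy assumptions: input is an array of string tokens starting at `(`, parameters are plain identifiers with an optional `= default`, destructuring patterns and reserved words are not handled, and expression elements are not validated beyond the checks shown. `parseCover`, `refine` and `classify` are hypothetical names, not V8 or spec internals.

```javascript
const isIdentifier = (t) => typeof t === 'string' && /^[A-Za-z_$][\w$]*$/.test(t);

// Step 1: accept anything CPEAAPL-shaped without deciding what it is.
function parseCover(tokens) {
  if (tokens[0] !== '(') throw new SyntaxError('Expected (');
  const elements = []; // each element is an array of tokens
  let current = [];
  let depth = 0;
  let trailingComma = false;
  let i = 1;
  for (; i < tokens.length; i++) {
    const t = tokens[i];
    if (depth === 0 && t === ')') break;
    if (depth === 0 && t === ',') {
      if (current.length === 0) throw new SyntaxError('Unexpected ,');
      elements.push(current);
      current = [];
      trailingComma = true;
      continue;
    }
    if (t === '(' || t === '[' || t === '{') depth++;
    if (t === ')' || t === ']' || t === '}') depth--;
    current.push(t);
    trailingComma = false;
  }
  if (tokens[i] !== ')') throw new SyntaxError('Unterminated (');
  if (current.length > 0) elements.push(current);
  return { elements, trailingComma, next: tokens[i + 1] };
}

// Step 2: the token after ")" picks the interpretation; then restrict it.
function refine({ elements, trailingComma, next }) {
  const restIndex = elements.findIndex((e) => e[0] === '...');
  if (next === '=>') {
    // Roughly ArrowFormalParameters : ( UniqueFormalParameters )
    const ok = elements.every((e, idx) =>
      e[0] === '...'
        ? idx === elements.length - 1 && e.length === 2 && isIdentifier(e[1])
        : isIdentifier(e[0]) && (e.length === 1 || (e[1] === '=' && e.length > 2)));
    if (!ok || (restIndex !== -1 && trailingComma)) {
      throw new SyntaxError('CPEAAPL does not cover ArrowFormalParameters');
    }
    const names = elements.map((e) => (e[0] === '...' ? e[1] : e[0]));
    if (new Set(names).size !== names.length) {
      throw new SyntaxError('Duplicate parameter name');
    }
    return { kind: 'ArrowFormalParameters', params: names };
  }
  // Roughly ParenthesizedExpression : ( Expression ):
  // non-empty, no trailing comma, no rest element.
  if (elements.length === 0 || trailingComma || restIndex !== -1) {
    throw new SyntaxError('CPEAAPL does not cover a ParenthesizedExpression');
  }
  return { kind: 'ParenthesizedExpression', expressions: elements.length };
}

// Hypothetical convenience wrapper: tokens separated by single spaces.
const classify = (source) => refine(parseCover(source.split(' '))).kind;
```

With this sketch, `classify('( a , b ) => a + b')` gives `'ArrowFormalParameters'`, `classify('( a , 3 ) ;')` gives `'ParenthesizedExpression'`, and `classify('( 1 , ... b ) => 0')` throws a `SyntaxError`, matching the "not valid either, but still a CPEAAPL" examples.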

push event v8/v8.dev

Marja Hölttä

commit sha cbeb1b8a4654fa984390127bee72cad07a2b376f

more

push time in 14 days

Pull request review comment v8/v8.dev

Add a blog post: Understanding the ECMAScript spec, part 4

+Typically, parsers decide which grammar production to follow based on finite lookahead.

... if I wanted an example of a finite positive lookahead, UnaryExpression would be it. But I'm not sure if explaining that brings anything at this point. A more interesting concept would be the "we don't yet know which production to take, but we know the next symbol, so let's parse the next symbol and decide after that" pattern, which occurs e.g. for CallExpression, Expression, etc. Not sure if adding that would be beneficial either.

marjakh

comment created time in 19 days
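The "parse the next symbol and decide after that" pattern from this comment is also what handles left-recursive productions such as `Expression : Expression , AssignmentExpression` in a recursive-descent parser: both alternatives start with an `AssignmentExpression`, so the parser parses one and lets the following `,` decide whether to continue. A minimal sketch, assuming `AssignmentExpression` is reduced to a single identifier or integer token; the function names are hypothetical.

```javascript
// Simplified AssignmentExpression: a single identifier or integer token.
function parseAssignmentExpression(tokens, pos) {
  const t = tokens[pos];
  if (t === undefined || !/^([A-Za-z_$][\w$]*|\d+)$/.test(t)) {
    throw new SyntaxError(`Unexpected token at ${pos}`);
  }
  return { node: t, pos: pos + 1 };
}

// Expression : AssignmentExpression | Expression , AssignmentExpression
// The left recursion becomes a loop driven by the next token.
function parseExpression(tokens, pos = 0) {
  let { node, pos: p } = parseAssignmentExpression(tokens, pos);
  while (tokens[p] === ',') {
    const right = parseAssignmentExpression(tokens, p + 1);
    node = { type: 'Comma', left: node, right: right.node }; // left-nested
    p = right.pos;
  }
  return { node, pos: p };
}
```

`parseExpression(['1', ',', '2', ',', '3'])` produces `((1, 2), 3)` as nested `Comma` nodes, while `parseExpression(['1', ','])` throws, which mirrors why `(1, )` is not a valid `ParenthesizedExpression`.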

Pull request review comment v8/v8.dev

Add a blog post: Understanding the ECMAScript spec, part 4

+Typically, parsers decide which grammar production to follow based on finite lookahead.

You mean the negative lookahead? I wouldn't want to complicate this blog post with negative lookaheads, as they're not related to cover grammars.

marjakh

comment created time in 19 days

push event v8/v8.dev

Marja Hölttä

commit sha 751c40e87d9a0bbdb4a3046cd25902dfb1cfb778

review rreverser

push time in 19 days

Pull request review comment v8/v8.dev

Add a blog post: Understanding the ECMAScript spec, part 4

+If the productions were written like this:
+
+```grammar
+AssignmentExpression :
+...
+ConditionalExpression (eventually leading to PrimaryExpression)

Added spaces. I also took the "comment inside grammar" out of the grammar element, so now we have 2 grammar elements here, but that's probably the best we can do.

marjakh

comment created time in 19 days

push event v8/v8.dev

Marja Hölttä

commit sha c4b59b4d94b106f5851025ef6e2b2068f8a5deb1

more

push time in 19 days

Pull request review comment v8/v8.dev

Add a blog post: Understanding the ECMAScript spec, part 4

+## Cover grammars
+
+In this episode, we take a deeper look into *cover grammars*. They are a way to specify grammar  rules for syntactic constructs where we don't know what we're looking at until we've seen the complete construct.

How about this: "In this episode, we take a deeper look into cover grammars. They are a way to specify grammar rules for syntactic constructs which look ambiguous based on a finite lookahead."

marjakh

comment created time in 19 days

Pull request review comment v8/v8.dev

Add a blog post: Understanding the ECMAScript spec, part 4

+---+title: 'Understanding the ECMAScript spec, part 4'+author: '[Marja Hölttä](https://twitter.com/marjakh), speculative specification spectator'+avatars:+  - marja-holtta+date: 2020-05-06+tags:+  - ECMAScript+description: 'Tutorial on reading the ECMAScript specification'+tweet: ''+---++## Previous episodes++In [part 1](/blog/understanding-ecmascript-part-1), we read through a simple method — `Object.prototype.hasOwnProperty` — and the **abstract operations** it invokes. We familiarized ourselves with the shorthands `?` and `!` related to error handling. We encountered **language types**, **specification types**, **internal slots**, and **internal methods**.++In [part 2](/blog/understanding-ecmascript-part-2), we examined a simple grammar production and how its runtime semantics are defined. In [the extra content](/blog/extra/understanding-ecmascript-part-2-extra), we also followed a long grammar production chain from `AssignmentExpression` to `MemberExpression`.++In [part 3](/blog/understanding-ecmascript-part-3), we familiarized ourselves with the lexical grammar, the syntactic grammar, and the shorthands used for defining the syntactic grammar.++## Meanwhile in other parts of the Web++[Jason Orendorff](https://github.com/jorendorff) from Mozilla published [a great in-depth analysis of JS syntactic quirks](https://github.com/mozilla-spidermonkey/jsparagus/blob/master/js-quirks.md#readme). Even though the implementation details differ, every JS engine faces the same problems with these quirks.++## Cover grammars++In this episode, we take a deeper look into *cover grammars*. They are a way to specify grammar  rules for syntactic constructs where we don't know what we're looking at until we've seen the complete construct.++Again, we'll skip the subscripts for `[In, Yield, Await]` for brevity, as they aren't important for this blog post. 
See [part 3](/blog/understanding-ecmascript-part-3) for an explanation of their meaning and usage.++## Parenthesized expression or an arrow parameter list?++Typically, parsers decide which grammar production to follow based on finite lookahead.++For example:+```javascript+let x = (a,+```++Is this the start of an arrow function, like this?++```javascript+let x = (a, b) => { return a + b };+```++Or maybe it's a parenthesized expression:++```javascript+let x = (a, 3);+```++The parenthesized whatever-it-is can be arbitrarily long - we cannot know what it is based on a finite amount of tokens.++If the productions were written like this:++```grammar+AssignmentExpression :+...+ConditionalExpression (eventually leading to PrimaryExpression)+ArrowFunction++PrimaryExpression :+...+ParenthesizedExpression++ArrowFunction :+ArrowParameterList => ConciseBody+```++We couldn't choose the correct production with limited lookahead. Imagine we had to parse a `AssignmentExpression` and the next token is `(`. How would we decide what to parse next? We could either parse an `ParenthesizedExpression` or an `ArrowParameterList`, but our guess could go wrong.++### The very permissive new symbol: CPEAAPL++We'd like to specify the grammar in such a way that it's possible to parse JavaScript according to it with limited lookahead.++The spec solves this problem by introducing the symbol `CoverParenthesizedExpressionAndArrowParameterList` (`CPEAAPL` for short). `CPEAAPL` is a symbol that is actually an `ParenthesizedExpression` or an `ArrowParameterList` behind the scenes, but we don't yet know which one.++The [productions](https://tc39.es/ecma262/#prod-CoverParenthesizedExpressionAndArrowParameterList) for `CPEAAPL` are very permissive, allowing all constructs that can occur in `ParenthesizedExpression`s and in `ArrowParameterList`s:++```grammar+CPEAAPL :+( Expression )+( Expression , )+( )+(... BindingIdentifier )+(... BindingPattern )+( Expression , ... BindingIdentifier )+( Expression , ... 
BindingPattern )+```++For example, the following expressions are valid `CPEAAPL`s:++```javascript+// Valid ParenthesizedExpression and ArrowParameterList:+(a, b)+(a, b = 1)++// Valid ParenthesizedExpression:+(1, 2, 3)+(function foo() { })++// Valid ArrowParameterList:+()+(a, b,)+(a, ...b)+(a = 1, ...b)++// Not valid either, but still a CPEAAPL:+(1, ...b)+(1, )+```++Trailing comma and the `...` can occur only in `ArrowParameterList`. Some constructs, like `b = 1` can occur in both, but they have different meanings: Inside `ParenthesizedExpression` it's an assignment, inside `ArrowParameterList` it's a parameter with a default value.++### Using CPEAAPL in grammar rules++Now we can use the very permissive `CPEAAPL` in grammar productions:++```grammar+AssignmentExpression :+ConditionalExpression (eventually leading to PrimaryExpression)+ArrowFunction+...++ArrowFunction :+ArrowParameters => ConciseBody++ArrowParameters :+BindingIdentifier+CPEAAPL++PrimaryExpression :+...+CPEAAPL++```++Imagine we're again in the situation that we need to parse an `AssignmentExpression` and the next token is `(`. Now we can just decide to parse a `CPEAAPL` and figure out later what it actually is. It doesn't matter whether we're parsing an `ArrowFunction` or a `ParenthesizedExpression`, the next symbol to parse is `CPEAAPL` in any case!++After we've parsed the `CPEAAPL`, we can decide whether the original `AssignmentExpression` is an `ArrowFunction` or a `ParenthesizedExpression` based on the token following the `CPEAAPL`.++```javascript+let x = (a, b) => { return a + b; };+//      ^^^^^^+//     CPEAAPL+//             ^^+//             The token following the CPEAAPL++let x = (a, 3);+//      ^^^^^^+//     CPEAAPL+//            ^+//            The token following the CPEAAPL+```++### Restricting CPEAAPLs++As we saw before, the grammar productions for `CPEAAPL` are very permissive and allow constructs (such as `(1, ...a)`) which are never valid. 
Once we know whether we were parsing an `ArrowFunction` or a `ParenthesizedExpression`, we need to disallow the corresponding illegal constructs.++The spec does this by adding the following restrictions:++:::ecmascript-algorithm+> [Static Semantics: Early Errors](https://tc39.es/ecma262/#sec-grouping-operator-static-semantics-early-errors)+>+> `PrimaryExpression : CPEAAPL`+>+> It is a Syntax Error if `CPEAAPL` is not _covering_ a `ParenthesizedExpression`.++:::ecmascript-algorithm+> [Supplemental Syntax](https://tc39.es/ecma262/#sec-primary-expression)+>+> When processing an instance of the production+>+> `PrimaryExpression : CPEAAPL`+>+> the interpretation of the `CPEAAPL` is refined using the following grammar:+>+> `ParenthesizedExpression : ( Expression )`++This means: if we try to use a `CPEAAPL` as a `PrimaryExpression`, it must actually be a `ParenthesizedExpression`, and this is its only valid production.++`Expression` can never be empty, so `( )` is not a valid `ParenthesizedExpression`. Comma-separated lists like `(1, 2, 3)` are created by [the comma operator](https://tc39.es/ecma262/#sec-comma-operator):++```grammar+Expression :+AssignmentExpression+Expression , AssignmentExpression+```++Similarly, if we try to use a `CPEAAPL` as an `ArrowParameters`, the following restrictions apply:++:::ecmascript-algorithm+> [Static Semantics: Early Errors](https://tc39.es/ecma262/#sec-arrow-function-definitions-static-semantics-early-errors)+>+> `ArrowParameters : CPEAAPL`+>+> It is a Syntax Error if `CPEAAPL` is not covering an `ArrowFormalParameters`.++:::ecmascript-algorithm+> [Supplemental Syntax](https://tc39.es/ecma262/#sec-arrow-function-definitions)+>+> When the production+>+> `ArrowParameters : CPEAAPL`+>+> is recognized the following grammar is used to refine the interpretation of `CPEAAPL`:+>+> `ArrowFormalParameters :`+> `( UniqueFormalParameters )`++### Other CPEAAPL restrictions++There are also additional rules related to `CPEAAPL`s. 
For example:++:::ecmascript-algorithm+> [Static Semantics: Early Errors](https://tc39.es/ecma262/#sec-delete-operator-static-semantics-early-errors)+>+> `UnaryExpression: delete UnaryExpression`+>+> - It is a Syntax Error if the `UnaryExpression` is contained in strict mode code and the derived `UnaryExpression` is `PrimaryExpression : IdentifierReference`.+> - It is a Syntax Error if the derived `UnaryExpression` is+> `PrimaryExpression : CPEAAPL`+> and `CPEAAPL` ultimately derives a phrase that, if used in place of `UnaryExpression`, would produce a Syntax Error according to these rules. This rule is recursively applied.++The first rule forbids `delete IdentifierReference` (for example, `delete foo`) in strict mode. The second rule forbids `CPEAAPL`s that would ultimately trigger the first rule, such as `delete (foo)`, `delete ((foo))`, and so on.++### Other cover grammars++In addition to `CPEAAPL`, the spec uses cover grammars for other ambiguous-looking constructs.++`ObjectLiteral` is used as a cover grammar for `ObjectAssignmentPattern`, and it can also be recognized as part of a `CPEAAPL`, for example inside arrow function parameter lists. 
This means `ObjectLiteral` allows constructs which cannot occur inside actual object literals.++```grammar+ObjectLiteral :+...+{ PropertyDefinitionList }++PropertyDefinition :+...+CoverInitializedName++CoverInitializedName :+IdentifierReference Initializer++Initializer :+= AssignmentExpression+```++```javascript+let o = { a = 1 }; // syntax error++// Arrow function with a destructuring parameter with a default value:+let f = ({ a = 1 }) => { return a; };+f({}); // returns 1+f({ a: 6 }); // returns 6+```+Async arrow functions also look ambiguous with limited lookahead:++```javascript+let x = async(a,+```++Is this a call to a function called `async` or an async arrow function?++```javascript+let x1 = async(a, b);+let x2 = async();+function async() { }++let x3 = async(a, b) => {};+let x4 = async() => {};+```++For this purpose, the grammar defines a cover grammar symbol `CoverCallExpressionAndAsyncArrowHead`, which works similarly to `CPEAAPL`.++## Summary++In this episode, we looked into how the spec defines the grammar in such a way that implementing a finite lookahead parser based on it is straightforward.

Modified:

"In this episode, we looked into how the spec defines a concise grammar for cases where we cannot identify the current syntactic construct based on finite lookahead.

In particular, we looked into how the spec uses cover grammars to first parse ambiguous-looking constructs permissively and then restrict them with static semantic rules."
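The two-step idea in this summary (parse permissively, decide later, then restrict) can be sketched on a toy language. This is a hypothetical illustration, not modeled on any real engine's parser. Assumptions: parenthesized lists contain only identifiers, integers, an optional `...rest` element, and no default values. The names `tokenize`, `parseCover`, `refineAsArrowParameters`, `refineAsParenthesizedExpression` and `parseParenthesizedOrArrow` are made up for this sketch.

```javascript
// Toy cover-grammar parser (hypothetical): parse "( ... )" permissively first,
// then look at the token after ")" to decide what it was, and only then
// apply the restrictions of the chosen interpretation.

const isOperand = (t) => typeof t === "string" && /^[\w$]+$/.test(t);
const isIdentifier = (t) => typeof t === "string" && /^[A-Za-z_$][\w$]*$/.test(t);

// Tiny tokenizer: identifiers, integers and the punctuators ( ) , ... =>
function tokenize(src) {
  const re = /\s*(=>|\.\.\.|[(),]|[A-Za-z_$][\w$]*|\d+)\s*/y;
  const text = src.trim();
  const tokens = [];
  while (re.lastIndex < text.length) {
    const start = re.lastIndex;
    const m = re.exec(text);
    if (m === null) throw new SyntaxError(`Unexpected input at offset ${start}`);
    tokens.push(m[1]);
  }
  return tokens;
}

// Step 1: the permissive "cover" production. It accepts everything that can
// appear in either form (in this toy language), plus things valid in neither.
function parseCover(tokens, pos) {
  if (tokens[pos] !== "(") throw new SyntaxError("Expected (");
  pos++;
  const cover = { items: [], rest: null, trailingComma: false };
  while (tokens[pos] !== ")") {
    if (tokens[pos] === "...") {
      if (!isOperand(tokens[pos + 1])) throw new SyntaxError("Expected a name after ...");
      cover.rest = tokens[pos + 1];
      pos += 2;
      break; // "... x" can only be the last element
    }
    if (!isOperand(tokens[pos])) throw new SyntaxError(`Unexpected token ${tokens[pos]}`);
    cover.items.push(tokens[pos]);
    pos++;
    if (tokens[pos] === ",") {
      pos++;
      if (tokens[pos] === ")") cover.trailingComma = true;
    } else if (tokens[pos] !== ")") {
      throw new SyntaxError(`Unexpected token ${tokens[pos]}`);
    }
  }
  if (tokens[pos] !== ")") throw new SyntaxError("Expected )");
  return { cover, pos: pos + 1 };
}

// Step 2a: refine as arrow parameters. Every element must be an identifier,
// and (like UniqueFormalParameters in the spec) names must not repeat.
function refineAsArrowParameters(cover) {
  const names = cover.rest === null ? [...cover.items] : [...cover.items, cover.rest];
  for (const name of names) {
    if (!isIdentifier(name)) throw new SyntaxError(`Invalid parameter ${name}`);
  }
  if (new Set(names).size !== names.length) throw new SyntaxError("Duplicate parameter name");
  return { type: "ArrowParameters", params: cover.items, rest: cover.rest };
}

// Step 2b: refine as a parenthesized (comma) expression: it must be
// non-empty and cannot have a trailing comma or a rest element.
function refineAsParenthesizedExpression(cover) {
  if (cover.items.length === 0) throw new SyntaxError("Empty parentheses");
  if (cover.trailingComma) throw new SyntaxError("Trailing comma in expression");
  if (cover.rest !== null) throw new SyntaxError("Rest element in expression");
  return { type: "ParenthesizedExpression", elements: cover.items };
}

// Only the token after ")" decides which refinement applies. Tokens after
// that one (e.g. a simple arrow body like `a`) are not parsed by this toy.
function parseParenthesizedOrArrow(src) {
  const tokens = tokenize(src);
  const { cover, pos } = parseCover(tokens, 0);
  return tokens[pos] === "=>"
    ? refineAsArrowParameters(cover)
    : refineAsParenthesizedExpression(cover);
}

console.log(parseParenthesizedOrArrow("(a, b) => a").type); // ArrowParameters
console.log(parseParenthesizedOrArrow("(a, 3)").type);      // ParenthesizedExpression
```

The cover node keeps the union of what both forms need: the elements, a trailing-comma flag, and an optional rest element. This mirrors how the `CPEAAPL` productions allow the union of both syntaxes. Invalid inputs such as `(1, ...b)` are accepted by step 1 and rejected by step 2, whichever way the parser goes.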

marjakh

comment created time in 19 days

Pull request review commentv8/v8.dev

Add a blog post: Understanding the ECMAScript spec, part 4

+---+title: 'Understanding the ECMAScript spec, part 4'+author: '[Marja Hölttä](https://twitter.com/marjakh), speculative specification spectator'+avatars:+  - marja-holtta+date: 2020-05-06+tags:+  - ECMAScript+description: 'Tutorial on reading the ECMAScript specification'+tweet: ''+---++## Previous episodes++In [part 1](/blog/understanding-ecmascript-part-1), we read through a simple method — `Object.prototype.hasOwnProperty` — and the **abstract operations** it invokes. We familiarized ourselves with the shorthands `?` and `!` related to error handling. We encountered **language types**, **specification types**, **internal slots**, and **internal methods**.++In [part 2](/blog/understanding-ecmascript-part-2), we examined a simple grammar production and how its runtime semantics are defined. In [the extra content](/blog/extra/understanding-ecmascript-part-2-extra), we also followed a long grammar production chain from `AssignmentExpression` to `MemberExpression`.++In [part 3](/blog/understanding-ecmascript-part-3), we familiarized ourselves with the lexical grammar, the syntactic grammar, and the shorthands used for defining the syntactic grammar.++## Meanwhile in other parts of the Web++[Jason Orendorff](https://github.com/jorendorff) from Mozilla published [a great in-depth analysis of JS syntactic quirks](https://github.com/mozilla-spidermonkey/jsparagus/blob/master/js-quirks.md#readme). Even though the implementation details differ, every JS engine faces the same problems with these quirks.++## Cover grammars++In this episode, we take a deeper look into *cover grammars*. They are a way to specify grammar  rules for syntactic constructs where we don't know what we're looking at until we've seen the complete construct.++Again, we'll skip the subscripts for `[In, Yield, Await]` for brevity, as they aren't important for this blog post. 
See [part 3](/blog/understanding-ecmascript-part-3) for an explanation of their meaning and usage.++## Parenthesized expression or an arrow parameter list?++Typically, parsers decide which grammar production to follow based on finite lookahead.++For example:+```javascript+let x = (a,+```++Is this the start of an arrow function, like this?++```javascript+let x = (a, b) => { return a + b };+```++Or maybe it's a parenthesized expression:++```javascript+let x = (a, 3);+```++The parenthesized whatever-it-is can be arbitrarily long, so we cannot know what it is based on a finite number of tokens.++If the productions were written like this:++```grammar+AssignmentExpression :+...+ConditionalExpression (eventually leading to PrimaryExpression)+ArrowFunction++PrimaryExpression :+...+ParenthesizedExpression++ArrowFunction :+ArrowParameterList => ConciseBody+```++We couldn't choose the correct production with limited lookahead. Imagine we had to parse an `AssignmentExpression` and the next token is `(`. How would we decide what to parse next? We could parse either a `ParenthesizedExpression` or an `ArrowParameterList`, but our guess could go wrong.++### The very permissive new symbol: CPEAAPL++We'd like to specify the grammar in such a way that it's possible to parse JavaScript according to it with limited lookahead.++The spec solves this problem by introducing the symbol `CoverParenthesizedExpressionAndArrowParameterList` (`CPEAAPL` for short). `CPEAAPL` is a symbol that is actually a `ParenthesizedExpression` or an `ArrowParameterList` behind the scenes, but we don't yet know which one.++The [productions](https://tc39.es/ecma262/#prod-CoverParenthesizedExpressionAndArrowParameterList) for `CPEAAPL` are very permissive, allowing all constructs that can occur in `ParenthesizedExpression`s and in `ArrowParameterList`s:++```grammar+CPEAAPL :+( Expression )+( Expression , )+( )+(... BindingIdentifier )+(... BindingPattern )+( Expression , ... BindingIdentifier )+( Expression , ... 
BindingPattern )+```++For example, the following expressions are valid `CPEAAPL`s:++```javascript+// Valid ParenthesizedExpression and ArrowParameterList:+(a, b)+(a, b = 1)++// Valid ParenthesizedExpression:+(1, 2, 3)+(function foo() { })++// Valid ArrowParameterList:+()+(a, b,)+(a, ...b)+(a = 1, ...b)++// Not valid either, but still a CPEAAPL:+(1, ...b)+(1, )+```++Trailing comma and the `...` can occur only in `ArrowParameterList`. Some constructs, like `b = 1` can occur in both, but they have different meanings: Inside `ParenthesizedExpression` it's an assignment, inside `ArrowParameterList` it's a parameter with a default value.++### Using CPEAAPL in grammar rules++Now we can use the very permissive `CPEAAPL` in grammar productions:++```grammar+AssignmentExpression :+ConditionalExpression (eventually leading to PrimaryExpression)+ArrowFunction+...++ArrowFunction :+ArrowParameters => ConciseBody++ArrowParameters :+BindingIdentifier+CPEAAPL++PrimaryExpression :+...+CPEAAPL++```++Imagine we're again in the situation that we need to parse an `AssignmentExpression` and the next token is `(`. Now we can just decide to parse a `CPEAAPL` and figure out later what it actually is. It doesn't matter whether we're parsing an `ArrowFunction` or a `ParenthesizedExpression`, the next symbol to parse is `CPEAAPL` in any case!++After we've parsed the `CPEAAPL`, we can decide whether the original `AssignmentExpression` is an `ArrowFunction` or a `ParenthesizedExpression` based on the token following the `CPEAAPL`.

I tried to clarify this by making the section "Using CPEAAPL" only talk about how to disambiguate during parsing, and the section "Restricting CPEAAPLs" only talk about the covering. PTAL
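The `CPEAAPL` examples in the draft can be checked against a real engine. This is a small sketch, assuming a modern engine such as a recent Node.js. `isEarlySyntaxError` is a hypothetical helper. `new Function` compiles its argument as a function body without calling it, so early errors show up as a thrown `SyntaxError`, and unresolved names like `a` and `b` do no harm. Each candidate is tried once on its own and once followed by `=>`, which is exactly the token that disambiguates it.

```javascript
// Hypothetical helper: true if `source` is rejected with an early SyntaxError
// when compiled as a function body. The compiled function is never called.
function isEarlySyntaxError(source) {
  try {
    new Function(source);
    return false;
  } catch (e) {
    if (e instanceof SyntaxError) return true;
    throw e;
  }
}

// The CPEAAPL examples from the draft, with their classification.
const candidates = [
  "(a, b)",     // valid as both
  "(a, b = 1)", // valid as both (assignment vs. default value)
  "(1, 2, 3)",  // valid only as a ParenthesizedExpression
  "()",         // valid only as an ArrowParameterList
  "(a, b,)",    // valid only as an ArrowParameterList
  "(a, ...b)",  // valid only as an ArrowParameterList
  "(1, ...b)",  // valid as neither
];

for (const c of candidates) {
  const asExpression = isEarlySyntaxError(`${c};`) ? "SyntaxError" : "ok";
  const asArrowParams = isEarlySyntaxError(`${c} => 0;`) ? "SyntaxError" : "ok";
  console.log(`${c.padEnd(12)} expression: ${asExpression.padEnd(11)} arrow params: ${asArrowParams}`);
}
```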

marjakh

comment created time in 19 days

push eventv8/v8.dev

Marja Hölttä

commit sha f0ebedae8297d974d8320abb98478fdd1e763ebd

moar

view details

push time in 19 days

push eventv8/v8.dev

Marja Hölttä

commit sha a50e79ce09b1fef12afa639a0cd612912b748af3

review

view details

push time in 19 days

Pull request review commentv8/v8.dev

Add a blog post: Understanding the ECMAScript spec, part 4


Ah, right, I might see what you mean. This sentence is confusing: "Once we know whether we were parsing an ArrowFunction or ParenthesizedExpression, we need to disallow the corresponding illegal construct"

-> From this, one might think that the restrictions are applied right away, immediately after disambiguating. Let's see how I can fix this.
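The `CoverInitializedName` example from the draft shows the timing this comment is about. `{ a = 1 }` is accepted by the permissive grammar and only becomes an error once it is known to be a real object literal. The same late decision applies to `async(...)`. This sketch assumes a modern engine (for example a recent Node.js). `compiles` is a hypothetical helper, and `new Function` only compiles the source without running it.

```javascript
// Hypothetical helper: true if `source` compiles as a function body,
// false if the engine reports an early SyntaxError. Nothing is executed.
function compiles(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// `{ a = 1 }` is first parsed permissively (CoverInitializedName)...
console.log(compiles("let o = { a = 1 };"));         // false: it stays an object literal
console.log(compiles("let f = ({ a = 1 }) => a;"));  // true: a destructuring parameter
console.log(compiles("let a; ({ a = 1 } = {});"));   // true: an ObjectAssignmentPattern

// ...and the same kind of late decision for CoverCallExpressionAndAsyncArrowHead.
console.log(compiles("let x = async(a, b);"));       // true: a call to a function named async
console.log(compiles("let x = async(a, b) => {};")); // true: an async arrow function
console.log(compiles("let x = async(1, b) => {};")); // false: 1 is not a valid parameter

// At run time, the destructuring default from the draft behaves as described.
const f = ({ a = 1 }) => a;
console.log(f({}), f({ a: 6 })); // 1 6
```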

marjakh

comment created time in 19 days

Pull request review commentv8/v8.dev

Add a blog post: Understanding the ECMAScript spec, part 4

+---+title: 'Understanding the ECMAScript spec, part 4'+author: '[Marja Hölttä](https://twitter.com/marjakh), speculative specification spectator'+avatars:+  - marja-holtta+date: 2020-05-06+tags:+  - ECMAScript+description: 'Tutorial on reading the ECMAScript specification'+tweet: ''+---++## Previous episodes++In [part 1](/blog/understanding-ecmascript-part-1), we read through a simple method — `Object.prototype.hasOwnProperty` — and the **abstract operations** it invokes. We familiarized ourselves with the shorthands `?` and `!` related to error handling. We encountered **language types**, **specification types**, **internal slots**, and **internal methods**.++In [part 2](/blog/understanding-ecmascript-part-2), we examined a simple grammar production and how its runtime semantics are defined. In [the extra content](/blog/extra/understanding-ecmascript-part-2-extra), we also followed a long grammar production chain from `AssignmentExpression` to `MemberExpression`.++In [part 3](/blog/understanding-ecmascript-part-3), we familiarized ourselves with the lexical grammar, the syntactic grammar, and the shorthands used for defining the syntactic grammar.++## Meanwhile in other parts of the Web++[Jason Orendorff](https://github.com/jorendorff) from Mozilla published [a great in-depth analysis of JS syntactic quirks](https://github.com/mozilla-spidermonkey/jsparagus/blob/master/js-quirks.md#readme). Even though the implementation details differ, every JS engine faces the same problems with these quirks.++## Cover grammars++In this episode, we take a deeper look into *cover grammars*. They are a way to specify grammar  rules for syntactic constructs where we don't know what we're looking at until we've seen the complete construct.++Again, we'll skip the subscripts for `[In, Yield, Await]` for brevity, as they aren't important for this blog post. 
See [part 3](/blog/understanding-ecmascript-part-3) for an explanation of their meaning and usage.++## Parenthesized expression or an arrow parameter list?++Typically, parsers decide which grammar production to follow based on finite lookahead.++For example:+```javascript+let x = (a,+```++Is this the start of an arrow function, like this?++```javascript+let x = (a, b) => { return a + b };+```++Or maybe it's a parenthesized expression:++```javascript+let x = (a, 3);+```++The parenthesized whatever-it-is can be arbitrarily long, so we cannot know what it is based on a finite number of tokens.++If the productions were written like this:++```grammar+AssignmentExpression :+...+ConditionalExpression (eventually leading to PrimaryExpression)+ArrowFunction++PrimaryExpression :+...+ParenthesizedExpression++ArrowFunction :+ArrowParameterList => ConciseBody+```++We couldn't choose the correct production with limited lookahead. Imagine we had to parse an `AssignmentExpression` and the next token is `(`. How would we decide what to parse next? We could parse either a `ParenthesizedExpression` or an `ArrowParameterList`, but our guess could go wrong.++### The very permissive new symbol: CPEAAPL++We'd like to specify the grammar in such a way that it's possible to parse JavaScript according to it with limited lookahead.++The spec solves this problem by introducing the symbol `CoverParenthesizedExpressionAndArrowParameterList` (`CPEAAPL` for short). `CPEAAPL` is a symbol that is actually a `ParenthesizedExpression` or an `ArrowParameterList` behind the scenes, but we don't yet know which one.++The [productions](https://tc39.es/ecma262/#prod-CoverParenthesizedExpressionAndArrowParameterList) for `CPEAAPL` are very permissive, allowing all constructs that can occur in `ParenthesizedExpression`s and in `ArrowParameterList`s:++```grammar+CPEAAPL :+( Expression )+( Expression , )+( )+(... BindingIdentifier )+(... BindingPattern )+( Expression , ... BindingIdentifier )+( Expression , ... 
BindingPattern )+```++For example, the following expressions are valid `CPEAAPL`s:++```javascript+// Valid ParenthesizedExpression and ArrowParameterList:+(a, b)+(a, b = 1)++// Valid ParenthesizedExpression:+(1, 2, 3)+(function foo() { })++// Valid ArrowParameterList:+()+(a, b,)+(a, ...b)+(a = 1, ...b)++// Not valid either, but still a CPEAAPL:+(1, ...b)+(1, )+```++Trailing comma and the `...` can occur only in `ArrowParameterList`. Some constructs, like `b = 1` can occur in both, but they have different meanings: Inside `ParenthesizedExpression` it's an assignment, inside `ArrowParameterList` it's a parameter with a default value.++### Using CPEAAPL in grammar rules++Now we can use the very permissive `CPEAAPL` in grammar productions:++```grammar+AssignmentExpression :+ConditionalExpression (eventually leading to PrimaryExpression)+ArrowFunction+...++ArrowFunction :+ArrowParameters => ConciseBody++ArrowParameters :+BindingIdentifier+CPEAAPL++PrimaryExpression :+...+CPEAAPL++```++Imagine we're again in the situation that we need to parse an `AssignmentExpression` and the next token is `(`. Now we can just decide to parse a `CPEAAPL` and figure out later what it actually is. It doesn't matter whether we're parsing an `ArrowFunction` or a `ParenthesizedExpression`, the next symbol to parse is `CPEAAPL` in any case!++After we've parsed the `CPEAAPL`, we can decide whether the original `AssignmentExpression` is an `ArrowFunction` or a `ParenthesizedExpression` based on the token following the `CPEAAPL`.++```javascript+let x = (a, b) => { return a + b; };+//      ^^^^^^+//     CPEAAPL+//             ^^+//             The token following the CPEAAPL++let x = (a, 3);+//      ^^^^^^+//     CPEAAPL+//            ^+//            The token following the CPEAAPL+```++### Restricting CPEAAPLs++As we saw before, the grammar productions for `CPEAAPL` are very permissive and allow constructs (such as `(1, ...a)`) which are never valid. 
Once we know whether we were parsing an `ArrowFunction` or a `ParenthesizedExpression`, we need to disallow the corresponding illegal constructs.++The spec does this by adding the following restrictions:++:::ecmascript-algorithm+> [Static Semantics: Early Errors](https://tc39.es/ecma262/#sec-grouping-operator-static-semantics-early-errors)+>+> `PrimaryExpression : CPEAAPL`+>+> It is a Syntax Error if `CPEAAPL` is not _covering_ a `ParenthesizedExpression`.++:::ecmascript-algorithm+> [Supplemental Syntax](https://tc39.es/ecma262/#sec-primary-expression)+>+> When processing an instance of the production+>+> `PrimaryExpression : CPEAAPL`+>+> the interpretation of the `CPEAAPL` is refined using the following grammar:+>+> `ParenthesizedExpression : ( Expression )`++This means: if we try to use a `CPEAAPL` as a `PrimaryExpression`, it must actually be a `ParenthesizedExpression`, and this is its only valid production.++`Expression` can never be empty, so `( )` is not a valid `ParenthesizedExpression`. Comma-separated lists like `(1, 2, 3)` are created by [the comma operator](https://tc39.es/ecma262/#sec-comma-operator):++```grammar+Expression :+AssignmentExpression+Expression , AssignmentExpression+```++Similarly, if we try to use a `CPEAAPL` as an `ArrowParameters`, the following restrictions apply:++:::ecmascript-algorithm+> [Static Semantics: Early Errors](https://tc39.es/ecma262/#sec-arrow-function-definitions-static-semantics-early-errors)+>+> `ArrowParameters : CPEAAPL`+>+> It is a Syntax Error if `CPEAAPL` is not covering an `ArrowFormalParameters`.++:::ecmascript-algorithm+> [Supplemental Syntax](https://tc39.es/ecma262/#sec-arrow-function-definitions)+>+> When the production+>+> `ArrowParameters : CPEAAPL`+>+> is recognized the following grammar is used to refine the interpretation of `CPEAAPL`:+>+> `ArrowFormalParameters :`+> `( UniqueFormalParameters )`++### Other CPEAAPL restrictions++There are also additional rules related to `CPEAAPL`s. 
For example:++:::ecmascript-algorithm+> [Static Semantics: Early Errors](https://tc39.es/ecma262/#sec-delete-operator-static-semantics-early-errors)+>+> `UnaryExpression: delete UnaryExpression`+>+> - It is a Syntax Error if the `UnaryExpression` is contained in strict mode code and the derived `UnaryExpression` is `PrimaryExpression : IdentifierReference`.+> - It is a Syntax Error if the derived `UnaryExpression` is+> `PrimaryExpression : CPEAAPL`+> and `CPEAAPL` ultimately derives a phrase that, if used in place of `UnaryExpression`, would produce a Syntax Error according to these rules. This rule is recursively applied.++The first rule forbids `delete IdentifierReference` (for example, `delete foo`) in strict mode. The second rule forbids `CPEAAPL`s that would ultimately trigger the first rule, such as `delete (foo)`, `delete ((foo))`, and so on.

Removed it

marjakh

comment created time in 19 days
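The two `delete` rules quoted in this thread can be observed in an engine. This minimal sketch assumes a modern engine that implements these early errors (for example a recent Node.js). `isEarlyError` is a hypothetical helper that compiles a function body but never runs it.

```javascript
// Hypothetical helper: true if compiling `body` as a function body throws an
// early SyntaxError. The function is never called.
function isEarlyError(body) {
  try {
    new Function(body);
    return false;
  } catch (e) {
    if (e instanceof SyntaxError) return true;
    throw e;
  }
}

// Plain and parenthesized identifiers are rejected in strict mode only;
// property references are fine in both modes.
for (const operand of ["foo", "(foo)", "((foo))", "foo.bar", "(foo.bar)"]) {
  const strict = isEarlyError(`"use strict"; delete ${operand};`);
  const sloppy = isEarlyError(`delete ${operand};`);
  console.log(
    `delete ${operand}`.padEnd(18),
    "strict:", strict ? "SyntaxError" : "ok",
    "| sloppy:", sloppy ? "SyntaxError" : "ok",
  );
}
```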

issue closedtc39/proposal-promise-any

Inaccurate text: we don't always reject with an array of rejection reasons

Text:

The any function returns a promise that is fulfilled by the first given promise to be fulfilled, or rejected with an array of rejection reasons if all of the given promises are rejected. It resolves all elements of the passed iterable to promises as it runs this algorithm.

This is inaccurate: when an error is thrown while iterating the Promises, we don't reject with an "array of rejection reasons" (meaning AggregateError?); we reject with the individual error.

These "more fundamental failures" occur when:

  • GetIterator throws (Promise.any step 3)
  • Get throws (PerformPromiseAny step 6)
  • IteratorStep throws (PerformPromiseAny step 8.a-c)
  • IteratorValue throws (PerformPromiseAny steps 8.e-g)
  • Call(promiseResolve) throws (PerformPromiseAny step 8.i)
  • Invoke(nextPromise, "then") throws (PerformPromiseAny step 8.r)

The corresponding text in Promise.all says:

"The all function returns a new promise which is fulfilled with an array of fulfillment values for the passed promises, or rejects with the reason of the first passed promise that rejects. "

This is also not fully accurate, but we can interpret it permissively to mean that the fundamental failures are equivalent to the rejection of the corresponding Promise which we were iterating when the failure occurred.

However, this kind of permissive interpretation is harder with the text of Promise.any which says explicitly that it rejects with an array of errors.
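Here is a small sketch of the mismatch. It assumes an engine that ships `Promise.any` and `AggregateError` (for example Node.js 15 or later). `throwingIterable` and `rejectionReason` are made up for illustration. The iterator's `next()` throws, so IteratorStep fails inside PerformPromiseAny. The returned promise then rejects with that individual error rather than with an `AggregateError`.

```javascript
// An iterable whose iterator throws on the first next() call, so IteratorStep
// fails inside PerformPromiseAny before any promise has been looked at.
const throwingIterable = {
  [Symbol.iterator]() {
    return {
      next() {
        throw new TypeError("iteration failed");
      },
    };
  },
};

// Hypothetical helper: resolves to the rejection reason of `promise`
// (or to undefined if it fulfills).
async function rejectionReason(promise) {
  try {
    await promise;
    return undefined;
  } catch (e) {
    return e;
  }
}

(async () => {
  // All given promises reject: the result rejects with an AggregateError.
  const allRejected = await rejectionReason(Promise.any([Promise.reject(new Error("x"))]));
  console.log(allRejected instanceof AggregateError); // true

  // Iteration itself fails: the result rejects with that individual error.
  const iterationFailure = await rejectionReason(Promise.any(throwingIterable));
  console.log(iterationFailure instanceof AggregateError); // false
  console.log(iterationFailure.message); // iteration failed
})();
```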

closed time in 19 days

marjakh

issue commenttc39/proposal-promise-any

Inaccurate text: we don't always reject with an array of rejection reasons

Yeah, this is subtle (subtler than other cases where the error cases don't follow what the spec text says), but I don't think we should modify the text either.

marjakh

comment created time in 19 days

issue commenttc39/ecma262

Promise.all: slightly inaccurate text regarding the rejection reason

Closing this, as it probably only makes sense to discuss in terms of Promise.any.

marjakh

comment created time in 20 days

issue closedtc39/ecma262

Promise.all: slightly inaccurate text regarding the rejection reason

Text in https://tc39.es/ecma262/#sec-promise.all :

The all function returns a new promise which is fulfilled with an array of fulfillment values for the passed promises, or rejects with the reason of the first passed promise that rejects.

However, if an error is thrown while we iterate the Promises, we reject the overall promise with that particular error. That error is not "the reason of the first passed promise that rejects", though it can be permissively interpreted as equivalent to a rejection of the Promise we were currently iterating (if there is at least one).

The iteration failures occur if:

  • GetIterator throws (Promise.all steps 3-4)
  • IteratorStep throws (PerformPromiseAll steps 8.a - c)
  • Call(resultCapability.[[Resolve]]) throws (PerformPromiseAll step 8.iii.2)
  • IteratorValue throws (PerformPromiseAll steps 8.e - g)
  • Call(promiseResolve) throws (PerformPromiseAll step 8.i)
  • Invoke(nextPromise, "then") throws (PerformPromiseAll step 8.r)

(Similar inaccuracies occur in Promise.race and Promise.any.)
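The IteratorStep case from the list can be reproduced with a generator. This sketch assumes a spec-conforming engine, and the generator name is made up. The first passed promise is already rejected when Promise.all sees it. Even so, the overall promise rejects with the error thrown by the next iteration step, not with "the reason of the first passed promise that rejects". That promise's rejection handler only runs later, as a job, after the overall promise has already been rejected.

```javascript
// A generator that yields an already-rejected promise and then throws on the
// next iteration step (an IteratorStep failure in PerformPromiseAll).
function* rejectedPromiseThenIterationFailure() {
  yield Promise.reject(new Error("passed promise rejected"));
  throw new Error("iteration failed");
}

Promise.all(rejectedPromiseThenIterationFailure()).catch((e) => {
  // The iteration error wins: Promise.all rejects synchronously during
  // iteration, before the reaction job for the passed promise gets to run.
  console.log(e.message); // iteration failed
});
```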

closed time in 20 days

marjakh

issue commenttc39/ecma262

Promise.all: slightly inaccurate text regarding the rejection reason

Right. This came up with "Promise.any", where the text suggests that when it rejects, it always rejects with an AggregateError. However, that's not true. In the context of Promise combinators, this is more subtle than just an implicit "or throws": it's almost like the text saying that a function returns a String when it actually returns false in some error cases.

But I guess we can't do much here without being inconsistent with other parts of the spec.

marjakh

comment created time in 20 days

push eventv8/v8.dev

Marja Hölttä

commit sha 0cea04679e9f4babfcb675f9bbf6a46c230085ea

review

view details

push time in 20 days

Pull request review commentv8/v8.dev

Add a blog post: Understanding the ECMAScript spec, part 4

+---
+title: 'Understanding the ECMAScript spec, part 4'
+author: '[Marja Hölttä](https://twitter.com/marjakh), speculative specification spectator'
+avatars:
+  - marja-holtta
+date: 2020-05-06
+tags:
+  - ECMAScript
+description: 'Tutorial on reading the ECMAScript specification'
+tweet: ''
+---
+
+## Previous episodes
+
+In [part 1](/blog/understanding-ecmascript-part-1), we read through a simple method — `Object.prototype.hasOwnProperty` — and the **abstract operations** it invokes. We familiarized ourselves with the shorthands `?` and `!` related to error handling. We encountered **language types**, **specification types**, **internal slots**, and **internal methods**.
+
+In [part 2](/blog/understanding-ecmascript-part-2), we examined a simple grammar production and how its runtime semantics are defined. In [the extra content](/blog/extra/understanding-ecmascript-part-2-extra), we also followed a long grammar production chain from `AssignmentExpression` to `MemberExpression`.
+
+In [part 3](/blog/understanding-ecmascript-part-3), we familiarized ourselves with the lexical grammar, the syntactic grammar, and the shorthands used for defining the syntactic grammar.
+
+## Meanwhile in other parts of the Web
+
+[Jason Orendorff](https://github.com/jorendorff) from Mozilla published [a great in-depth analysis of JS syntactic quirks](https://github.com/mozilla-spidermonkey/jsparagus/blob/master/js-quirks.md#readme). Even though the implementation details differ, every JS engine faces the same problems with these quirks.
+
+## Cover grammars
+
+In this episode, we take a deeper look into *cover grammars*. They are a way to specify grammar  rules for syntactic constructs where we don't know what we're looking at until we've seen the complete construct.
+
+Again, we'll skip the subscripts for `[In, Yield, Await]` for brevity, as they aren't important for this blog post. See [part 3](/blog/understanding-ecmascript-part-3) for an explanation of their meaning and usage.
+
+## Parenthesized expression or an arrow parameter list?
+
+Typically, parsers decide which grammar production to follow based on finite lookahead.
+
+For example:
+```javascript
+let x = (a,
+```
+
+Is this the start of an arrow function, like this?
+
+```javascript
+let x = (a, b) => { return a + b };
+```
+
+Or maybe it's a parenthesized expression:
+
+```javascript
+let x = (a, 3);
+```
+
+The parenthesized whatever-it-is can be arbitrarily long - we cannot know what it is based on a finite amount of tokens.
+
+If the productions were written like this:
+
+```grammar
+AssignmentExpression :
+...
+ConditionalExpression (eventually leading to PrimaryExpression)
+ArrowFunction
+
+PrimaryExpression :
+...
+ParenthesizedExpression
+
+ArrowFunction :
+ArrowParameterList => ConciseBody
+```
+
+We couldn't choose the correct production with limited lookahead. Imagine we had to parse a `AssignmentExpression` and the next token is `(`. How would we decide what to parse next? We could either parse an `ParenthesizedExpression` or an `ArrowParameterList`, but our guess could go wrong.
+
+### The very permissive new symbol: CPEAAPL
+
+We'd like to specify the grammar in such a way that it's possible to parse JavaScript according to it with limited lookahead.
+
+The spec solves this problem by introducing the symbol `CoverParenthesizedExpressionAndArrowParameterList` (`CPEAAPL` for short). `CPEAAPL` is a symbol that is actually an `ParenthesizedExpression` or an `ArrowParameterList` behind the scenes, but we don't yet know which one.
+
+The [productions](https://tc39.es/ecma262/#prod-CoverParenthesizedExpressionAndArrowParameterList) for `CPEAAPL` are very permissive, allowing all constructs that can occur in `ParenthesizedExpression`s and in `ArrowParameterList`s:
+
+```grammar
+CPEAAPL :
+( Expression )
+( Expression , )
+( )
+(... BindingIdentifier )
+(... BindingPattern )
+( Expression , ... BindingIdentifier )
+( Expression , ... BindingPattern )
+```
+
+For example, the following expressions are valid `CPEAAPL`s:
+
+```javascript
+// Valid ParenthesizedExpression and ArrowParameterList:
+(a, b)
+(a, b = 1)
+
+// Valid ParenthesizedExpression:
+(1, 2, 3)
+(function foo() { })
+
+// Valid ArrowParameterList:
+()
+(a, b,)
+(a, ...b)
+(a = 1, ...b)
+
+// Not valid either, but still a CPEAAPL:
+(1, ...b)
+(1, )
+```
+
+Trailing comma and the `...` can occur only in `ArrowParameterList`. Some constructs, like `b = 1` can occur in both, but they have different meanings: Inside `ParenthesizedExpression` it's an assignment, inside `ArrowParameterList` it's a parameter with a default value.
+
+### Using CPEAAPL in grammar rules
+
+Now we can use the very permissive `CPEAAPL` in grammar productions:
+
+```grammar
+AssignmentExpression :
+ConditionalExpression (eventually leading to PrimaryExpression)
+ArrowFunction
+...
+
+ArrowFunction :
+ArrowParameters => ConciseBody
+
+ArrowParameters :
+BindingIdentifier
+CPEAAPL
+
+PrimaryExpression :
+...
+CPEAAPL
+
+```
+
+Imagine we're again in the situation that we need to parse an `AssignmentExpression` and the next token is `(`. Now we can just decide to parse a `CPEAAPL` and figure out later what it actually is. It doesn't matter whether we're parsing an `ArrowFunction` or a `ParenthesizedExpression`, the next symbol to parse is `CPEAAPL` in any case!
+
+After we've parsed the `CPEAAPL`, we can decide whether the original `AssignmentExpression` is an `ArrowFunction` or a `ParenthesizedExpression` based on the token following the `CPEAAPL`.

I never said it was :)

I'll need to clarify this; I'll write my thoughts here.

There are two things:

  1. the existence of CPEAAPL enables us to go ahead with parsing and pick the production to follow after having parsed the CPEAAPL

  2. after we have the AST (having the AST requires having decided whether we have an ArrowFunction or ParenthesizedExpression), we apply the "covering" rules like you said.

-> How to make this clearer in the text?

2 is the "cover semantics" but 1 is a needed step for constructing the AST in the first place.
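A toy sketch of step 1 and the production choice, assuming pre-tokenized input and only identifiers, integers, and a final `...name` rest element. All helper names are hypothetical; this is not how V8 or the spec implements it. `parseCover` parses the parenthesized part permissively without knowing what it is; `refine` picks the interpretation from the token after `)` and only then rejects items that are invalid for it.

```javascript
// Toy cover-grammar illustration (hypothetical helpers, not engine code).
// Input is pre-tokenized, e.g. ["(", "a", ",", "b", ")", "=>"].
const IDENT = /^[A-Za-z_$][\w$]*$/;
const NUMBER = /^\d+$/;

// Step 1: parse "( ... )" permissively into a cover node. We don't know yet
// whether it's a ParenthesizedExpression or an ArrowParameterList.
function parseCover(tokens, pos) {
  if (tokens[pos] !== "(") throw new SyntaxError("expected (");
  pos++;
  const items = [];
  let trailingComma = false;
  while (tokens[pos] !== ")") {
    const tok = tokens[pos];
    if (tok === "...") {
      const name = tokens[pos + 1];
      if (!IDENT.test(name ?? "")) throw new SyntaxError("expected rest name");
      items.push({ kind: "rest", name });
      pos += 2;
      // As in the CPEAAPL productions, a rest element must come last.
      if (tokens[pos] !== ")") throw new SyntaxError("rest must be last");
      trailingComma = false;
      break;
    }
    if (IDENT.test(tok ?? "")) items.push({ kind: "identifier", value: tok });
    else if (NUMBER.test(tok ?? "")) items.push({ kind: "number", value: tok });
    else throw new SyntaxError(`unexpected token ${tok}`);
    pos++;
    if (tokens[pos] === ",") {
      trailingComma = true;
      pos++;
    } else if (tokens[pos] === ")") {
      trailingComma = false;
    } else {
      throw new SyntaxError("expected , or )");
    }
  }
  return { cover: { items, trailingComma }, next: pos + 1 };
}

// Step 2: once the token after ")" is known, reinterpret the cover node.
function refine(cover, nextToken) {
  const { items, trailingComma } = cover;
  if (nextToken === "=>") {
    // ArrowParameterList: every item must be a binding, not e.g. a number.
    const bad = items.find((item) => item.kind === "number");
    if (bad) throw new SyntaxError(`${bad.value} is not a valid parameter`);
    return {
      type: "ArrowParameterList",
      params: items.map((item) => (item.kind === "rest" ? "..." + item.name : item.value)),
    };
  }
  // ParenthesizedExpression: no "()", no trailing comma, no rest element.
  if (items.length === 0 || trailingComma || items.some((item) => item.kind === "rest")) {
    throw new SyntaxError("not a valid ParenthesizedExpression");
  }
  return { type: "ParenthesizedExpression", expressions: items.map((item) => item.value) };
}

function parseParenthesized(tokens) {
  const { cover, next } = parseCover(tokens, 0);
  return refine(cover, tokens[next]);
}
```

The validity checks are deliberately deferred to `refine`: `parseCover` never needs more than one token of lookahead, and an error like a number used as a parameter is only reported once we know we're in an `ArrowParameterList`.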

marjakh

comment created time in 20 days

issue comment tc39/proposal-promise-any

Inaccurate text: we don't always reject with an array of rejection reasons

See also https://github.com/tc39/ecma262/issues/1983 for Promise.all

marjakh

comment created time in 20 days

issue opened tc39/ecma262

Promise.all: slightly inaccurate text regarding the rejection reason

Text in https://tc39.es/ecma262/#sec-promise.all :

The all function returns a new promise which is fulfilled with an array of fulfillment values for the passed promises, or rejects with the reason of the first passed promise that rejects.

However, if an error is thrown while we iterate the Promises, we reject the overall promise with that particular error. The error is not "the reason of the first passed promise that rejects". That said, the error can be permissively interpreted as equivalent to a rejection of the Promise we were iterating at the time (if there is a non-zero number of them).

The iteration failures occur if:

  • GetIterator throws (Promise.all steps 3-4)
  • IteratorStep throws (PerformPromiseAll steps 8.a - c)
  • Call(resultCapability.[[Resolve]]) throws (PerformPromiseAll step 8.iii.2)
  • IteratorValue throws (PerformPromiseAll steps 8.e - g)
  • Call(promiseResolve) throws (PerformPromiseAll step 8.i)
  • Invoke(nextPromise, "then") throws (PerformPromiseAll step 8.r)
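A small sketch of some of the failures listed above, assuming an ES2015+ engine: an iterator whose `next()` throws (IteratorStep), a promise whose own `then` property throws (Invoke(nextPromise, "then")), and a non-iterable argument (GetIterator). In each case the overall promise rejects with that error itself rather than with a passed promise's rejection reason. The helper names are made up for the demo.

```javascript
// Resolves to the rejection reason; fails if the promise fulfills instead.
function rejectionReason(promise) {
  return promise.then(
    () => { throw new Error("expected a rejection"); },
    (reason) => reason,
  );
}

// IteratorStep throws: the iterator's next() fails on the first call.
function throwingIterable(error) {
  return {
    [Symbol.iterator]() {
      return {
        next() {
          throw error;
        },
      };
    },
  };
}

// Invoke(nextPromise, "then") throws: a real promise with a throwing own "then".
// Promise.resolve(p) returns p itself, so its own "then" is the one invoked.
function promiseWithThrowingThen(error) {
  const p = Promise.resolve(1);
  p.then = () => {
    throw error;
  };
  return p;
}

const iterationError = new Error("next() failed");
const thenError = new Error("then() failed");

rejectionReason(Promise.all(throwingIterable(iterationError)))
  .then((reason) => console.log(reason === iterationError)); // true
rejectionReason(Promise.all([promiseWithThrowingThen(thenError)]))
  .then((reason) => console.log(reason === thenError)); // true
rejectionReason(Promise.all(42)) // GetIterator throws: 42 is not iterable
  .then((reason) => console.log(reason instanceof TypeError)); // true
```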

(Similar inaccuracies occur in Promise.race and Promise.any.)

created time in 20 days

issue opened tc39/proposal-promise-any

Inaccurate text: we don't always reject with an array of rejection reasons

Text:

The any function returns a promise that is fulfilled by the first given promise to be fulfilled, or rejected with an array of rejection reasons if all of the given promises are rejected. It resolves all elements of the passed iterable to promises as it runs this algorithm.

This is inaccurate; when something goes wrong in iterating the Promises, we don't reject with an "array of rejection reasons" (meaning AggregateError?), we reject with the individual error that was thrown.

These "more fundamental failures" occur when:

  • GetIterator throws (Promise.any step 3)
  • Get throws (PerformPromiseAny step 6)
  • IteratorStep throws (PerformPromiseAny step 8.a-c)
  • IteratorValue throws (PerformPromiseAny steps 8.e-g)
  • Call(promiseResolve) throws (PerformPromiseAny step 8.i)
  • Invoke(nextPromise, "then") throws (PerformPromiseAny step 8.r)
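A sketch contrasting the two kinds of rejection, assuming an engine with `Promise.any` and `AggregateError` (ES2021); the helper and variable names are illustrative. When every input rejects, the reason is an `AggregateError`; when iteration itself fails, the reason is the thrown error, with no "array of rejection reasons" involved.

```javascript
// Resolves to the rejection reason; fails if the promise fulfills instead.
function rejectionReason(promise) {
  return promise.then(
    () => { throw new Error("expected a rejection"); },
    (reason) => reason,
  );
}

// Case 1: every input rejects. The reason is an AggregateError whose
// .errors holds the individual rejection reasons.
const allRejected = rejectionReason(
  Promise.any([Promise.reject(1), Promise.reject(2)]),
);

// Case 2: iteration itself fails (the iterator's next() throws).
// The reason is that exact error, not an AggregateError.
const nextError = new Error("next() failed");
const brokenIterable = {
  [Symbol.iterator]() {
    return {
      next() {
        throw nextError;
      },
    };
  },
};
const iterationFailed = rejectionReason(Promise.any(brokenIterable));

allRejected.then((reason) => {
  console.log(reason instanceof AggregateError, reason.errors.join(",")); // true 1,2
});
iterationFailed.then((reason) => {
  console.log(reason === nextError, reason instanceof AggregateError); // true false
});
```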

The corresponding text in Promise.all says:

"The all function returns a new promise which is fulfilled with an array of fulfillment values for the passed promises, or rejects with the reason of the first passed promise that rejects. "

This is also not fully accurate, but we can interpret it permissively to mean that the fundamental failures are equivalent to a rejection of the corresponding Promise which we were iterating when the failure occurred.

However, this kind of permissive interpretation is harder with the text of Promise.any which says explicitly that it rejects with an array of errors.

created time in 20 days

PR opened v8/v8.dev

Add a blog post: Understanding the ECMAScript spec, part 4

This is not super polished yet, but sending this out for early comments.

+280 -0

0 comment

1 changed file

pr created time in 21 days

push event v8/v8.dev

Marja Hölttä

commit sha 31cac678009af97345bb5bbefbfc5ce27d284dad

more

push time in 21 days

push event v8/v8.dev

Marja Hölttä

commit sha 25b56e98fe731a9035bb52def9492501841ab356

more

push time in 21 days

create branch v8/v8.dev

branch : ecma-part-4

created branch time in 21 days

push event marjakh/proposal-promise-any

Marja Hölttä

commit sha c94c7ebe33dcb0d5a3709bf6d8ee4d74a983a264

fix assert

push time in a month

push event marjakh/proposal-promise-any

Marja Hölttä

commit sha f6df10352fa9f7d03125a1bc0db4a5fbe82c876f

mention [[AggregateErrors]] in Properties of AE Instances

push time in a month

push event marjakh/proposal-promise-any

Marja Hölttä

commit sha 615b61c1764e5104db141bde46e7db8626e7837f

Added an assert

push time in a month

Pull request review comment tc39/test262

Add Promise/*/resolve-not-callable-close.js

+// Copyright (C) 2020 the V8 project authors. All rights reserved.
+// This code is governed by the BSD license found in the LICENSE file.
+
+/*---
+description: >
+    Explicit iterator closing if Promise.resolve is not callable
+esid: sec-promise.all
+info: |
+    5. Let result be PerformPromiseAll(iteratorRecord, C, promiseCapability).
+    6. If result is an abrupt completion,
+        a. If iteratorRecord.[[Done]] is false, let result be
+           IteratorClose(iterator, result).
+        b. IfAbruptRejectPromise(result, promiseCapability).
+
+    [...]
+
+    Runtime Semantics: PerformPromiseAll
+
+    [...]
+    5. Let promiseResolve be ? Get(constructor, "resolve").
+    6. If ! IsCallable(promiseResolve) is false, throw a TypeError exception.
+    [...]
+
+flags: [async]
+features: [Symbol.iterator, computed-property-names, arrow-function]
+---*/
+
+let returnCount = 0;
+const iter = { 
+  [Symbol.iterator]: function() {
+    return {
+      return: function() {
+        ++returnCount;
+      }
+    };
+  }
+}
+
+Promise.resolve = "certainly not callable";
+
+Promise.all(iter).then(() => {
+  $DONE('The promise should be rejected, but was resolved');
+}, (reason) => {
+  assert(reason instanceof TypeError);
+}).then($DONE, $DONE);
+
+assert.sameValue(returnCount, 1);

Exactly, like @shvaikalesh says. This is covering the case where "resolve" is not callable, while existing tests cover the case that Get(constructor, "resolve") throws.

As an additional data point, we had a situation in V8 where the existing test262 tests were passing but this test failed B-) so it's surely not a duplicate. For more info: https://bugs.chromium.org/p/v8/issues/detail?id=10452
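A sketch of the two cases being distinguished, using `Promise` subclasses so the global `Promise.resolve` is not modified (class names are hypothetical). It only checks the rejection reason, not whether the iterator's `return` is called, so it does not depend on exactly where `resolve` is looked up relative to `GetIterator`.

```javascript
// Resolves to the rejection reason; fails if the promise fulfills instead.
function rejectionReason(promise) {
  return promise.then(
    () => { throw new Error("expected a rejection"); },
    (reason) => reason,
  );
}

// Case 1: Get(constructor, "resolve") throws.
class ResolveGetterThrows extends Promise {}
const getterError = new Error("getting resolve failed");
Object.defineProperty(ResolveGetterThrows, "resolve", {
  get() {
    throw getterError;
  },
});

// Case 2: "resolve" exists, but IsCallable(promiseResolve) is false.
class ResolveNotCallable extends Promise {}
Object.defineProperty(ResolveNotCallable, "resolve", {
  value: "certainly not callable",
});

// Promise.all uses its this value (here the subclass) as the constructor.
rejectionReason(ResolveGetterThrows.all([1, 2]))
  .then((reason) => console.log(reason === getterError)); // true
rejectionReason(ResolveNotCallable.all([1, 2]))
  .then((reason) => console.log(reason instanceof TypeError)); // true
```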

marjakh

comment created time in a month

PR opened tc39/test262

Add Promise/*/resolve-not-callable-close.js
+182 -0

0 comment

4 changed files

pr created time in a month

push event marjakh/test262

Marja Hölttä

commit sha 4948516b380867c0ebe87d669a4d86cb4bdd5f7e

fix

push time in a month

push event marjakh/test262

Marja Hölttä

commit sha a3c004952596b02eaf447d6feef5031cf69d9f5f

fix

push time in a month

push event marjakh/test262

Marja Hölttä

commit sha 3b853d60fb5b2494f3e35eef8e8efedceba52056

fix

push time in a month

create branch marjakh/test262

branch : promise-all-iterator-close

created branch time in a month

fork marjakh/test262

Official ECMAScript Conformance Test Suite

fork in a month

Pull request review comment tc39/proposal-promise-any

Fix typo-like mistake

 <h1>AggregateError ( _errors_, _message_ )</h1>
         <p>When the *AggregateError* function is called with arguments _errors_ and _message_, the following steps are taken:</p>
         <emu-alg>
           1. If NewTarget is *undefined*, let _newTarget_ be the active function object, else let _newTarget_ be NewTarget.
-          1. Let _O_ be ? OrdinaryCreateFromConstructor(_newTarget_, `"%AggregateError.prototype%"`, « [[ErrorData]], [[AggregateErrors]] »).
+          1. Let _O_ be ? OrdinaryCreateFromConstructor(_newTarget_, `"%AggregateErrorPrototype%"`, « [[ErrorData]], [[AggregateErrors]] »).

Ah, cool, thanks for the info :)

marjakh

comment created time in a month

push event marjakh/proposal-promise-any

Marja Hölttä

commit sha 6b6bcf72155d85d7eea9b7dc5f4ed9d7d1b8fab1

Fix typo-like mistake

push time in a month

PR opened tc39/proposal-promise-any

Fix typo-like mistake
+1 -1

0 comment

1 changed file

pr created time in a month

Pull request review comment tc39/proposal-promise-any

Unify AggregateError ctor with Error ctor

 <h1>AggregateError ( _errors_, _message_ )</h1>
         <emu-alg>
           1. If NewTarget is *undefined*, let _newTarget_ be the active function object, else let _newTarget_ be NewTarget.
           1. Let _O_ be ? OrdinaryCreateFromConstructor(_newTarget_, `"%AggregateError.prototype%"`, « [[ErrorData]], [[AggregateErrors]] »).
-          1. Let _errorsList_ be ? IterableToList(_errors_).
-          1. Set _O_.[[AggregateErrors]] to _errorsList_.

Thanks for the clarification of your preferences. Right, maybe it's not too surprising that super() is not the first line (it doesn't have to be, but it's conventional that it is).

I would be happy with option 3 as well; let's wait to hear what @syg says. Maybe that's the "lesser evil", as all of the options have downsides.

marjakh

comment created time in a month

Pull request review comment tc39/proposal-promise-any

Unify AggregateError ctor with Error ctor

 <h1>AggregateError ( _errors_, _message_ )</h1>
         <emu-alg>
           1. If NewTarget is *undefined*, let _newTarget_ be the active function object, else let _newTarget_ be NewTarget.
           1. Let _O_ be ? OrdinaryCreateFromConstructor(_newTarget_, `"%AggregateError.prototype%"`, « [[ErrorData]], [[AggregateErrors]] »).
-          1. Let _errorsList_ be ? IterableToList(_errors_).
-          1. Set _O_.[[AggregateErrors]] to _errorsList_.

That version of course makes "super(message)" possible to reuse, but it's surprising that the super call is not the first line.

To summarize:

Option 1 (current):

constructor(errors, message) {
  super(undefined);
  const errorsList = IterableToList(errors);
  this.#AggregateErrors = errorsList;
  this.message = ToString(message);
}

Downside: can't reuse super(message)

Option 2 (this PR):

constructor(errors, message) {
  super(message);
  const errorsList = IterableToList(errors);
  this.#AggregateErrors = errorsList;
}

Downside: arguments not processed in order

Option 3:

constructor(errors, message) {
  const errorsList = IterableToList(errors);
  super(message);
  this.#AggregateErrors = errorsList;
}

Downside: super() not the first line

Option 4:

constructor(message, errors) {
  super(message);
  const errorsList = IterableToList(errors);
  this.#AggregateErrors = errorsList;
}

Downside: argument order not ergonomic

If I understand correctly, @ljharb would prefer: 1 > 3 > 4 > 2

syg@ and I prefer: 2 > {1, 3} > 4 (I'm undecided which one is more surprising, 3 or 1, and idk what syg@ thinks.)
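To make the ordering differences concrete, here is a userland approximation of options 1 to 3 (option 4 only swaps the parameters, so it's left out). `IterableToList` is approximated by `Array.from`, `ToString` by `String`, and a private `#errors` field stands in for `[[AggregateErrors]]`; the class and helper names are hypothetical. `order()` records which argument gets processed first. Option 3 is legal JavaScript because nothing touches `this` before `super()`.

```javascript
// Userland approximation (hypothetical names): Array.from ~ IterableToList,
// String ~ ToString, #errors ~ [[AggregateErrors]].

// Option 1: super(undefined), then errors, then message.
class Option1 extends Error {
  #errors;
  constructor(errors, message) {
    super(undefined);
    this.#errors = Array.from(errors);
    this.message = String(message);
  }
}

// Option 2: super(message) first, then errors.
class Option2 extends Error {
  #errors;
  constructor(errors, message) {
    super(message);
    this.#errors = Array.from(errors);
  }
}

// Option 3: errors first, then super(message).
class Option3 extends Error {
  #errors;
  constructor(errors, message) {
    const errorsList = Array.from(errors);
    super(message);
    this.#errors = errorsList;
  }
}

// Constructs Ctor with arguments that record when they are processed.
function order(Ctor) {
  const log = [];
  const errors = {
    *[Symbol.iterator]() {
      log.push("errors");
      yield new Error("inner");
    },
  };
  const message = {
    toString() {
      log.push("message");
      return "outer";
    },
  };
  new Ctor(errors, message);
  return log;
}

for (const Ctor of [Option1, Option2, Option3]) {
  console.log(Ctor.name, order(Ctor).join(" -> "));
}
// Option1 errors -> message
// Option2 message -> errors
// Option3 errors -> message
```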

marjakh

comment created time in a month

Pull request review comment tc39/proposal-promise-any

Unify AggregateError ctor with Error ctor

 <h1>AggregateError ( _errors_, _message_ )</h1>
         <emu-alg>
           1. If NewTarget is *undefined*, let _newTarget_ be the active function object, else let _newTarget_ be NewTarget.
           1. Let _O_ be ? OrdinaryCreateFromConstructor(_newTarget_, `"%AggregateError.prototype%"`, « [[ErrorData]], [[AggregateErrors]] »).
-          1. Let _errorsList_ be ? IterableToList(_errors_).
-          1. Set _O_.[[AggregateErrors]] to _errorsList_.

In that case, we can reduce this patch to "2. Unify handling of the message parameter with Error ctor". @marjakh WDYT?

My thinking is similar to syg@'s (the comment below yours). I especially find this pseudocode from #14 compelling:

class AggregateError extends Error {
  errors;
  constructor(errors, message) {
    super(message);
    this.errors = Array.from(errors);
  }
}

Also, from an implementer's point of view: we already have a function that's essentially the "super(message);", and it's weird that we cannot use it. It would be intuitive to implement this feature by reusing the Error construction code.

marjakh

comment created time in a month

PR opened tc39/proposal-promise-any

Unify AggregateError ctor with Error ctor
  1. The order of setting fields: 1) create object 2) toString(message) & set "message" 3) IterableToList(errors) & set "errors".
    • This order is more natural (first "superclass fields", then "subclass fields")
    • This enables engines to reuse the code for creating Error objects
  2. Unify handling of the message parameter with Error ctor

Having the exact same spec text makes it clear that these ctors do the exact same thing w/ the message parameter.
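A quick way to check which order a given engine's built-in `AggregateError` uses, assuming ES2021 `AggregateError` support. As far as I know, the finalized ES2021 text processes `message` before `errors`, i.e. the order proposed here; the function and object names are made up for the demo.

```javascript
// Records the order in which the built-in AggregateError constructor
// processes its arguments (requires ES2021 AggregateError support).
function builtinOrder() {
  const log = [];
  const errors = {
    *[Symbol.iterator]() {
      log.push("errors");
      yield new Error("inner");
    },
  };
  const message = {
    toString() {
      log.push("message");
      return "outer";
    },
  };
  const aggregate = new AggregateError(errors, message);
  return { log, message: aggregate.message, errorCount: aggregate.errors.length };
}

const result = builtinOrder();
console.log(result.log.join(" -> ")); // message -> errors (per the ES2021 text)
console.log(result.message, result.errorCount); // outer 1
```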

+4 -3

0 comment

1 changed file

pr created time in a month

push event marjakh/proposal-promise-any

Marja Hölttä

commit sha 97c8b88baec073633b3b64692e2170b5bf53043d

Unify AggregateError ctor with Error ctor
1) The order of setting fields: 1) create object 2) toString(message) & set "message" 3) IterableToList(errors) & set "errors".
- This order is more natural (first "superclass fields", then "subclass fields")
- This enables engines to reuse the code for creating Error objects
2) Unify handling of the message parameter with Error ctor
Having the exact same spec text makes it clear that these ctors do the exact same thing w/ the message parameter.

push time in a month

push event marjakh/proposal-promise-any

Marja Hölttä

commit sha 793f8e6b05d23bfc0fa9429c6984c134c4c5288c

Remove redundant line from AggregateError.prototype.errors get

push time in a month

issue opened kahole/vscode-magit

Pressing enter inside a modified chunk doesn't open the edited file in the right location

Steps to repro:

  1. Edit a file (e.g., add a line)
  2. Open magit status
  3. Expand the changes with Tab
  4. Move the cursor, e.g., to an added line
  5. Press Enter

Expected: it opens the edited file, in the right location, where the change is. (This is how the original magit works.)

Actual: it opens the edited file, but at the top (I need to search for the changed location manually).

created time in 2 months

PR opened v8/v8.dev

Language fix
+1 -1

0 comment

1 changed file

pr created time in 2 months

create branch v8/v8.dev

branch : marjakh-patch-3

created branch time in 2 months

PR opened v8/v8.dev

Fix FormalParameters parameters
+5 -5

0 comment

1 changed file

pr created time in 2 months

create branch v8/v8.dev

branch : marjakh-patch-2

created branch time in 2 months

PR closed v8/v8.dev

Fix instructions cla: yes

depot_tools has to be before the system path.

+6 -1

1 comment

1 changed file

marjakh

pr closed time in 2 months

pull request comment v8/v8.dev

Fix instructions

Apparently the depot_tools change was not intentional and it's getting fixed on their side -> closing this

marjakh

comment created time in 2 months

push event v8/v8.dev

Marja Hölttä

commit sha c3a2ac5369d6191ca67ad8e685e251822569d17f

lint fixes

push time in 2 months

pull request comment v8/v8.dev

Add missing figcaption to a blog post

Yes, this was accidental! Thanks for fixing

RReverser

comment created time in 2 months

PR opened v8/v8.dev

Fix instructions

depot_tools has to be before the system path.

+6 -1

0 comment

1 changed file

pr created time in 2 months

create branch v8/v8.dev

branch : depot-tools-instructions-fix

created branch time in 2 months

Pull request review comment v8/v8.dev

Add blog post: Understanding ecmascript part 3

+---
+title: 'Understanding the ECMAScript spec, part 3'
+author: '[Marja Hölttä](https://twitter.com/marjakh), speculative specification spectator'
+avatars:
+  - marja-holtta
+date: 2020-03-02 13:33:37
+tags:
+  - ECMAScript
+description: 'Tutorial on reading the ECMAScript specification'
+tweet: ''
+---
+
+... where we dive deep in the syntax!
+
+## Previous episodes
+
+In [part 1](/blog/understanding-ecmascript-part-1), we read through a simple method — `Object.prototype.hasOwnProperty` — and **abstract operations** it invokes. We familiarized ourselves with the shorthands `?` and `!` related to error handling. We encountered **language types**, **specification types**, **internal slots**, and **internal methods**.
+
+In [part 2](/blog/understanding-ecmascript-part-2), we examined a simple grammar production and how its runtime semantics are defined. In [the extra content](/blog/extra/understanding-ecmascript-part-2-extra), we also followed a long grammar production chain from `AssignmentExpression` to `MemberExpression`. In this episode, we'll go deeper in the definition of the ECMAScript language and its syntax.
+
+If you're not familiar with [context-free grammars](https://en.wikipedia.org/wiki/Context-free_grammar), now it's a good time to check out the basics, since the spec uses context-free grammars to define the language.
+
+## ECMAScript grammars
+
+The ECMAScript spec defines four grammars:
+
+The [lexical grammar](https://tc39.es/ecma262/#sec-ecmascript-language-lexical-grammar) describes how [Unicode code points](https://en.wikipedia.org/wiki/Unicode#Architecture_and_terminology) are translated into a sequence of **input elements** (tokens, line terminators, comments, white space).
+
+The [syntactic grammar](https://tc39.es/ecma262/#sec-syntactic-grammar) defines how syntactically correct programs are composed of tokens.
+
+The [RegExp grammar](https://tc39.es/ecma262/#sec-patterns) describes how Unicode code points are translated into regular expressions.
+
+The [numeric string grammar](https://tc39.es/ecma262/#sec-tonumber-applied-to-the-string-type) describes how Strings are translated into numeric values.
+
+Each grammar is defined as a context-free grammar, consisting of a set of production rules.
+
+The grammars use slighlty different notation: the syntactic grammar uses `LeftHandSideSymbol :` whereas the lexical grammar and the RegExp grammar use `LeftHandSideSymbol ::` and the numeric string grammar uses `LeftHandSideSymbol :::`.
+
+Next we'll look into the lexical grammar and the syntactic grammar in more detail.
+
+## Lexical grammar
+
+The spec defines ECMAScript source text as a sequence of Unicode code points. For example, variable names are not limited to ASCII characters but can also include other Unicode characters, such as emojis. The spec doesn't talk about the actual encoding (for example, UTF-8 or UTF-16). It assumes that the source code has already been converted into a sequence of Unicode code points according to the encoding it was in.
+
+It's not possible to tokenize ECMAScript source code in advance, which makes defining the lexical grammar slightly more complicated.
+
+For example, we cannot determine whether `/` is the division operator or the start of a RegExp without looking at the larger context it occurs:
+
+```js
+const x = 10 / 5;
+```
+
+Here `/` is a DivPunctuator.
+
+```js
+const r = /foo/;
+```
+
+Here the first `/` is the start of a RegularExpressionLiteral.
+
+Templates introduce a similar ambiguity &mdash; the interpretation of <code>}`</code> depends on the context it occurs:
+
+```js
+const what1 = 'temp';
+const what2 = 'late';
+const t = `I am a ${ what1 + what2 }`;
+```
+
+Here <code>\`I am a ${</code> is TemplateHead and <code>}\`</code> is TemplateTail.
+
+```js
+if (0 == 1) {
+}`not very useful`;
+```
+
+Here `}` is RightBracePunctuator and <code>\`</code> is the start of a NoSubstitutionTemplate.
+
+Even though the interpretation of `/` and <code>}`</code> depends on their "context" &mdash; their position in the syntactic structure of the code &mdash; the grammars we'll describe next are still context-free.
+
+The lexical grammar uses several goal symbols to distinguish between the contexts where some input elements are permitted and some are not. For example, the goal symbol `InputElementDiv` is used in contexts where `/` is a division and `/=` is a division-assignment. The `InputElementDiv` productions list the possible tokens which can be produced in this context:
+
+> [`InputElementDiv ::`](https://tc39.es/ecma262/#prod-InputElementDiv)
+> `WhiteSpace`
+> `LineTerminator`
+> `Comment`
+> `CommonToken`
+> `DivPunctuator`
+> `RightBracePunctuator`
+
+In this context, encountering `/` will produce the `DivPunctuator` input element. Producing a `RegularExpressionLiteral` is not an option here.
+
+On the other hand, `InputElementRegExp` is the goal symbol for the contexts where `/` is the beginning of a RegExp:
+
+> [`InputElementRegExp ::`](https://tc39.es/ecma262/#prod-InputElementRegExp)
+> `WhiteSpace`
+> `LineTerminator`
+> `Comment`
+> `CommonToken`
+> `RightBracePunctuator`
+> `RegularExpressionLiteral`
+
+As we see from the productions, it's possible that this produces the `RegularExpressionLiteral` input element, but producing `DivPunctuator` is not possible.
+
+Similarly, there is another goal symbol, `InputElementRegExpOrTemplateTail`, for contexts where `TemplateMiddle` and `TemplateTail` are permitted, in addition to `RegularExpressionLiteral`. And finally, `InputElementTemplateTail` is the goal symbol for contexts where only `TemplateMiddle` and `TemplateTail` are permitted but `RegularExpressionLiteral` is not permitted.
+
+In implementations, the syntactic grammar analyzer ("parser") may call the lexical grammar analyzer ("tokenizer" or "lexer"), passing the goal symbol as a parameter and asking for the next input element suitable for that goal symbol.
+
+## Syntactic grammar
+
+We looked into the lexical grammar, which defines how we construct tokens from Unicode code points. The syntactic grammar builds on it: It defines how syntactically correct programs are composed of tokens.
+
+### Example: Allowing legacy identifiers
+
+Introducing a new keyword to the grammar is a possibly breaking change &mdash; what if existing code already uses the keyword as an identifier?
+
+For example, before `await` was a keyword, someone might have written the following code:
+
+```js
+function old() {
+  var await;
+}
+```
+
+The ECMAScript grammar carefully added the `await` keyword in such a way that this code will continue to work. Inside async functions, `await` is a keyword, so this doesn't work:
+
+```js
+async function modern() {
+  var await; // Syntax error
+}
+```
+
+Allowing `yield` as an identifier in non-generators and disallowing it in generators works similarly.
+
+Understanding how `await` is allowed as an identifier requires understanding ECMAScript-specific syntactic grammar notation. Let's dive right in!
+
+### Productions and shorthands
+
+Let's look at how the productions for `VariableStatement` are defined. At the first glance, the grammar can look a bit scary:
+
+> [<code>VariableStatement<sub>[Yield, Await]</sub> :</code>](https://tc39.es/ecma262/#prod-VariableStatement)
+> <code>var VariableDeclarationList<sub>[+In, ?Yield, ?Await]</sub>;</code>
+
+What do the subscripts (`[Yield, Await]`) and prefixes (`+` in `+In` and `?` in `?Async`) mean?
+
+The notation is explained in section [Grammar Notation](https://tc39.es/ecma262/#sec-grammar-notation).
+
+The subscripts are a shorthand for expressing a set of productions, for a set of left-hand side symbols, all at once. The left-hand side symbol has two parameters, so the "real" left-hand side symbols we're defining are `VariableStatement`, `VariableStatement_Yield`, `VariableStatement_Await` and `VariableStatement_Yield_Await`.
+
+Note that here the plain `VariableStatement` means "`VariableStatement` without `_Await` and `_Yield`". It should not be confused with <code>VariableStatement<sub>[Yield, Await]</sub></code>.
+
+On the right-hand side of the production, we see the shorthand `+In`, meaning "use the version with `_In`", and `?Await`, meaning "use the version with `_Await` if and only if the left-hand side symbol has `_Await`" (similarly with `?Yield`).
+
+(The third shorthand, `~Foo`, meaning "use the version without `_Foo`", is not used in this production.)
+
+With this information, we can expand the productions like this:
+
+> `VariableStatement` :
+> `var VariableDeclarationList_In;`
+>
+> `VariableStatement_Yield` :
+> `var VariableDeclarationList_In_Yield;`
+>
+> `VariableStatement_Await` :
+> `var VariableDeclarationList_In_Await;`
+>
+> `VariableStatement_Yield_Await` :
+> `var VariableDeclarationList_In_Yield_Await;`
+
+Ultimately, we'll need to find out two things:
+1. Where is it decided whether we're in the case with `_Await` or without `_Await`?
+1. Where does it make a difference &mdash; where do the productions for `Something_Await` and `Something` (without `_Await`) diverge?
+
+### `_Await` or no `_Await`?
+
+Let's tackle question 1 first. It's somewhat easy to guess that non-async functions and async functions differ in whether we pick the parameter `_Await` for the function body or not. Reading the productions for async function declarations, we find this:
+
+> [`AsyncFunctionBody :`](https://tc39.es/ecma262/#prod-AsyncFunctionBody)
+> <code>FunctionBody<sub>[~Yield, +Await]</sub></code>
+
+Note that `AsyncFunctionBody` has no parameters &mdash; they get added to the `FunctionBody` on the right-hand side.
+
+If we expand this production, we get:
+
+> `AsyncFunctionBody :`
+> `FunctionBody_Await`
+
+In other words, async functions have `FunctionBody_Await`, meaning a function body where `await` is treated as a keyword.
+
+On the other hand, if we're inside a non-async function, the relevant production is:
+
+> [<code>FunctionDeclaration<sub>[Yield, Await, Default]</sub>](https://tc39.es/ecma262/#prod-FunctionDeclaration) :</code>
+> <code>function BindingIdentifier<sub>[?Yield, ?Await]</sub> ( FormalParameters<sub>[~Yield, ~Await]</sub> ) { FunctionBody<sub>[~Yield, ~Await]</sub> }</code>
+
+(`FunctionDeclaration` has another production, but it's not relevant for our code example.)
+
+To avoid combinatorial expansion, let's ignore the `Default` parameter which is not used in this particular production.
+
+The expanded form of the production is:
+
+> `FunctionDeclaration :`
+> `function BindingIdentifier ( FormalParameters ) { FunctionBody }`
+
+> `FunctionDeclaration_Yield :`
+> `function BindingIdentifier_Yield ( FormalParameters_Yield ) { FunctionBody }`
+
+> `FunctionDeclaration_Await :`
+> `function BindingIdentifier_Await ( FormalParameters_Await ) { FunctionBody }`
+
+> `FunctionDeclaration_Yield_Await :`
+> `function BindingIdentifier_Yield_Await ( FormalParameters_Yield_Await ) { FunctionBody }`
+
+In this production we always get `FunctionBody` (without `_Yield` and without `_Await`), since the `FunctionBody` in the non-expanded production is parameterized with `[~Yield, ~Await]`.
+
+Function name and formal parameters are treated differently: they get the parameters `_Await` and `_Yield` if the left-hand side symbol has them.
+
+To summarize: Async functions have a `FunctionBody_Await` and non-async functions have a `FunctionBody` (without `_Await`). Since we're talking about non-generator functions, both our async example function and our non-async example function are parameterized without `_Yield`.
+
+Maybe it's hard to remember which one is `FunctionBody` and which `FunctionBody_Await`. Is `FunctionBody_Await` for a function where `await` is an identifier, or for a function where `await` is a keyword?
+
+You can think of the `_Await` parameter meaning "`await` is a keyword". This approach is also future proof. Imagine a new keyword, `blob` being added, but only inside "blobby" functions. Non-blobby non-async non-generators would still have `FunctionBody` (without `_Await`, `_Yield` or `_Blob`), exactly like they have now. Blobby functions would have a `FunctionBody_Blob`, async blobby functions would have `FunctionBody_Await_Blob` and so on.

Added: We'd still need to add the Blob subscript to the rules, but the expanded forms of FunctionBody for already existing functions stay the same.
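A toy expander for the parameter shorthand, to illustrate why existing expansions stay the same when a new parameter is added: `+P` always adds `_P`, `?P` adds it only if the left-hand side has `P`, and `~P` never adds it. The object representation and function names are hypothetical, not spec tooling.

```javascript
// Toy expander for parameterized productions (hypothetical representation).
// A right-hand-side nonterminal is { name, args }, where each arg is
// "+P" (always add _P), "?P" (add _P iff the LHS has P) or "~P" (never add _P).

// All subsets of the LHS parameters, in the order the blog post lists them:
// [], [Yield], [Await], [Yield, Await] for params ["Yield", "Await"].
function subsets(params) {
  return params.reduce((acc, p) => acc.concat(acc.map((s) => [...s, p])), [[]]);
}

// e.g. expandedName("VariableStatement", ["Yield"]) === "VariableStatement_Yield"
function expandedName(name, params) {
  return [name, ...params].join("_");
}

// Which parameters a RHS nonterminal gets, given the expanded LHS's parameters.
function resolveArgs(args, lhsParams) {
  const result = [];
  for (const arg of args) {
    const op = arg[0];
    const param = arg.slice(1);
    if (op === "+" || (op === "?" && lhsParams.includes(param))) {
      result.push(param);
    }
    // "~P", and "?P" when the LHS lacks P, add nothing.
  }
  return result;
}

function expand({ lhs, params, rhs }) {
  return subsets(params).map((lhsParams) => {
    const right = rhs.map((item) =>
      typeof item === "string"
        ? item
        : expandedName(item.name, resolveArgs(item.args, lhsParams)),
    );
    return `${expandedName(lhs, lhsParams)} : ${right.join(" ")}`;
  });
}

// VariableStatement[Yield, Await] : var VariableDeclarationList[+In, ?Yield, ?Await] ;
const variableStatement = {
  lhs: "VariableStatement",
  params: ["Yield", "Await"],
  rhs: ["var", { name: "VariableDeclarationList", args: ["+In", "?Yield", "?Await"] }, ";"],
};

for (const line of expand(variableStatement)) console.log(line);
// VariableStatement : var VariableDeclarationList_In ;
// VariableStatement_Yield : var VariableDeclarationList_In_Yield ;
// VariableStatement_Await : var VariableDeclarationList_In_Await ;
// VariableStatement_Yield_Await : var VariableDeclarationList_In_Yield_Await ;
```

If a hypothetical `~Blob` argument is added to an existing `FunctionBody` reference, `resolveArgs` ignores it, so the expansion is still plain `FunctionBody`; only references that use `+Blob` or `?Blob` get new `_Blob` variants.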

marjakh

comment created time in 2 months

push event v8/v8.dev

Marja Hölttä

commit sha d75d0546da4230eb3ca29da0adb841a41cec0663

minor

push time in 2 months

Pull request review comment v8/v8.dev

Add blog post: Understanding ecmascript part 3

+---+title: 'Understanding the ECMAScript spec, part 3'+author: '[Marja Hölttä](https://twitter.com/marjakh), speculative specification spectator'+avatars:+  - marja-holtta+date: 2020-03-02 13:33:37+tags:+  - ECMAScript+description: 'Tutorial on reading the ECMAScript specification'+tweet: ''+---++... where we dive deep in the syntax!++## Previous episodes++In [part 1](/blog/understanding-ecmascript-part-1), we read through a simple method — `Object.prototype.hasOwnProperty` — and **abstract operations** it invokes. We familiarized ourselves with the shorthands `?` and `!` related to error handling. We encountered **language types**, **specification types**, **internal slots**, and **internal methods**.++In [part 2](/blog/understanding-ecmascript-part-2), we examined a simple grammar production and how its runtime semantics are defined. In [the extra content](/blog/extra/understanding-ecmascript-part-2-extra), we also followed a long grammar production chain from `AssignmentExpression` to `MemberExpression`. 
In this episode, we'll go deeper into the definition of the ECMAScript language and its syntax.++If you're not familiar with [context-free grammars](https://en.wikipedia.org/wiki/Context-free_grammar), now it's a good time to check out the basics, since the spec uses context-free grammars to define the language.++## ECMAScript grammars++The ECMAScript spec defines four grammars:++The [lexical grammar](https://tc39.es/ecma262/#sec-ecmascript-language-lexical-grammar) describes how [Unicode code points](https://en.wikipedia.org/wiki/Unicode#Architecture_and_terminology) are translated into a sequence of **input elements** (tokens, line terminators, comments, white space).++The [syntactic grammar](https://tc39.es/ecma262/#sec-syntactic-grammar) defines how syntactically correct programs are composed of tokens.++The [RegExp grammar](https://tc39.es/ecma262/#sec-patterns) describes how Unicode code points are translated into regular expressions.++The [numeric string grammar](https://tc39.es/ecma262/#sec-tonumber-applied-to-the-string-type) describes how Strings are translated into numeric values.++Each grammar is defined as a context-free grammar, consisting of a set of production rules.++The grammars use slightly different notation: the syntactic grammar uses `LeftHandSideSymbol :` whereas the lexical grammar and the RegExp grammar use `LeftHandSideSymbol ::` and the numeric string grammar uses `LeftHandSideSymbol :::`.++Next we'll look into the lexical grammar and the syntactic grammar in more detail.++## Lexical grammar++The spec defines ECMAScript source text as a sequence of Unicode code points. For example, variable names are not limited to ASCII characters but can also include other Unicode letters, such as `π` or `ö`. The spec doesn't talk about the actual encoding (for example, UTF-8 or UTF-16). 
It assumes that the source code has already been converted into a sequence of Unicode code points according to the encoding it was in.++It's not possible to tokenize ECMAScript source code in advance, which makes defining the lexical grammar slightly more complicated.++For example, we cannot determine whether `/` is the division operator or the start of a RegExp without looking at the larger context in which it occurs:++```js+const x = 10 / 5;+```++Here `/` is a DivPunctuator.++```js+const r = /foo/;+```++Here the first `/` is the start of a RegularExpressionLiteral.++Templates introduce a similar ambiguity &mdash; the interpretation of <code>}`</code> depends on the context in which it occurs:++```js+const what1 = 'temp';+const what2 = 'late';+const t = `I am a ${ what1 + what2 }`;+```++Here <code>\`I am a ${</code> is TemplateHead and <code>}\`</code> is TemplateTail.++```js+if (0 == 1) {+}`not very useful`;+```++Here `}` is RightBracePunctuator and <code>\`</code> is the start of a NoSubstitutionTemplate.++Even though the interpretation of `/` and <code>}`</code> depends on their "context" &mdash; their position in the syntactic structure of the code &mdash; the grammars we'll describe next are still context-free.++The lexical grammar uses several goal symbols to distinguish between the contexts where some input elements are permitted and some are not. For example, the goal symbol `InputElementDiv` is used in contexts where `/` is a division and `/=` is a division-assignment. The `InputElementDiv` productions list the possible tokens which can be produced in this context:++> [`InputElementDiv ::`](https://tc39.es/ecma262/#prod-InputElementDiv)+> `WhiteSpace`+> `LineTerminator`+> `Comment`+> `CommonToken`+> `DivPunctuator`+> `RightBracePunctuator`++In this context, encountering `/` will produce the `DivPunctuator` input element. 
Producing a `RegularExpressionLiteral` is not an option here.++On the other hand, `InputElementRegExp` is the goal symbol for the contexts where `/` is the beginning of a RegExp:++> [`InputElementRegExp ::`](https://tc39.es/ecma262/#prod-InputElementRegExp)+> `WhiteSpace`+> `LineTerminator`+> `Comment`+> `CommonToken`+> `RightBracePunctuator`+> `RegularExpressionLiteral`++As we see from the productions, it's possible that this produces the `RegularExpressionLiteral` input element, but producing `DivPunctuator` is not possible.++Similarly, there is another goal symbol, `InputElementRegExpOrTemplateTail`, for contexts where `TemplateMiddle` and `TemplateTail` are permitted, in addition to `RegularExpressionLiteral`. And finally, `InputElementTemplateTail` is the goal symbol for contexts where only `TemplateMiddle` and `TemplateTail` are permitted but `RegularExpressionLiteral` is not permitted.++In implementations, the syntactic grammar analyzer ("parser") may call the lexical grammar analyzer ("tokenizer" or "lexer"), passing the goal symbol as a parameter and asking for the next input element suitable for that goal symbol.++## Syntactic grammar++We looked into the lexical grammar, which defines how we construct tokens from Unicode code points. The syntactic grammar builds on it: It defines how syntactically correct programs are composed of tokens.++### Example: Allowing legacy identifiers++Introducing a new keyword to the grammar is a possibly breaking change &mdash; what if existing code already uses the keyword as an identifier?++For example, before `await` was a keyword, someone might have written the following code:++```js+function old() {+  var await;+}+```++The ECMAScript grammar carefully added the `await` keyword in such a way that this code will continue to work. 
Inside async functions, `await` is a keyword, so this doesn't work:++```js+async function modern() {+  var await; // Syntax error+}+```++Allowing `yield` as an identifier in non-generators and disallowing it in generators works similarly.++Understanding how `await` is allowed as an identifier requires understanding ECMAScript-specific syntactic grammar notation. Let's dive right in!++### Productions and shorthands++Let's look at how the productions for `VariableStatement` are defined. At first glance, the grammar can look a bit scary:++> [<code>VariableStatement<sub>[Yield, Await]</sub> :</code>](https://tc39.es/ecma262/#prod-VariableStatement)+> <code>var VariableDeclarationList<sub>[+In, ?Yield, ?Await]</sub>;</code>++What do the subscripts (`[Yield, Await]`) and prefixes (`+` in `+In` and `?` in `?Await`) mean?++The notation is explained in the section [Grammar Notation](https://tc39.es/ecma262/#sec-grammar-notation).++The subscripts are a shorthand for expressing a set of productions, for a set of left-hand side symbols, all at once. The left-hand side symbol has two parameters, so the "real" left-hand side symbols we're defining are `VariableStatement`, `VariableStatement_Yield`, `VariableStatement_Await` and `VariableStatement_Yield_Await`.++Note that here the plain `VariableStatement` means "`VariableStatement` without `_Await` and `_Yield`". 
It should not be confused with <code>VariableStatement<sub>[Yield, Await]</sub></code>.++On the right-hand side of the production, we see the shorthand `+In`, meaning "use the version with `_In`", and `?Await`, meaning "use the version with `_Await` if and only if the left-hand side symbol has `_Await`" (similarly with `?Yield`).++(The third shorthand, `~Foo`, meaning "use the version without `_Foo`", is not used in this production.)++With this information, we can expand the productions like this:++> `VariableStatement` :+> `var VariableDeclarationList_In;`+>+> `VariableStatement_Yield` :+> `var VariableDeclarationList_In_Yield;`+>+> `VariableStatement_Await` :+> `var VariableDeclarationList_In_Await;`+>+> `VariableStatement_Yield_Await` :+> `var VariableDeclarationList_In_Yield_Await;`++Ultimately, we'll need to find out two things:+1. Where is it decided whether we're in the case with `_Await` or without `_Await`?+1. Where does it make a difference &mdash; where do the productions for `Something_Await` and `Something` (without `_Await`) diverge?++### `_Await` or no `_Await`?++Let's tackle question 1 first. It's somewhat easy to guess that non-async functions and async functions differ in whether we pick the parameter `_Await` for the function body or not. 
Reading the productions for async function declarations, we find this:++> [`AsyncFunctionBody :`](https://tc39.es/ecma262/#prod-AsyncFunctionBody)+> <code>FunctionBody<sub>[~Yield, +Await]</sub></code>++Note that `AsyncFunctionBody` has no parameters &mdash; they get added to the `FunctionBody` on the right-hand side.++If we expand this production, we get:++> `AsyncFunctionBody :`+> `FunctionBody_Await`++In other words, async functions have `FunctionBody_Await`, meaning a function body where `await` is treated as a keyword.++On the other hand, if we're inside a non-async function, the relevant production is:++> [<code>FunctionDeclaration<sub>[Yield, Await, Default]</sub> :</code>](https://tc39.es/ecma262/#prod-FunctionDeclaration)+> <code>function BindingIdentifier<sub>[?Yield, ?Await]</sub> ( FormalParameters<sub>[~Yield, ~Await]</sub> ) { FunctionBody<sub>[~Yield, ~Await]</sub> }</code>++(`FunctionDeclaration` has another production, but it's not relevant for our code example.)++To avoid combinatorial expansion, let's ignore the `Default` parameter, which is not used in this particular production.++The expanded form of the production is:++> `FunctionDeclaration :`+> `function BindingIdentifier ( FormalParameters ) { FunctionBody }`++> `FunctionDeclaration_Yield :`+> `function BindingIdentifier_Yield ( FormalParameters ) { FunctionBody }`++> `FunctionDeclaration_Await :`+> `function BindingIdentifier_Await ( FormalParameters ) { FunctionBody }`++> `FunctionDeclaration_Yield_Await :`+> `function BindingIdentifier_Yield_Await ( FormalParameters ) { FunctionBody }`++In this production we always get `FunctionBody` and `FormalParameters` (without `_Yield` and without `_Await`), since both are parameterized with `[~Yield, ~Await]` in the non-expanded production.++The function name is treated differently: `BindingIdentifier` gets the parameters `_Await` and `_Yield` if the left-hand side symbol has them.++To summarize: Async functions have a 
`FunctionBody_Await` and non-async functions have a `FunctionBody` (without `_Await`). Since we're talking about non-generator functions, both our async example function and our non-async example function are parameterized without `_Yield`.++Maybe it's hard to remember which one is `FunctionBody` and which one is `FunctionBody_Await`. Is `FunctionBody_Await` for a function where `await` is an identifier, or for a function where `await` is a keyword?++You can think of the `_Await` parameter meaning "`await` is a keyword". This approach is also future-proof. Imagine a new keyword, `blob`, being added, but only inside "blobby" functions. Non-blobby non-async non-generators would still have `FunctionBody` (without `_Await`, `_Yield` or `_Blob`), exactly like they have now. Blobby functions would have a `FunctionBody_Blob`, async blobby functions would have `FunctionBody_Await_Blob` and so on.

Right, I meant that the expanded forms won't change.

marjakh

comment created time in 2 months
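The `+`, `~` and `?` shorthands are mechanical enough to sketch in a few lines of JavaScript. The helper names (`expand`, `nt`) and the `{ name, args }` shape are made up for this illustration; the sketch handles a single right-hand side and ignores `opt` and lookahead restrictions.

```javascript
// Hypothetical helper: expands one parameterized production into plain ones
// using the Grammar Notation shorthands:
//   +P  use the version with _P
//   ~P  use the version without _P
//   ?P  use the version with _P iff the left-hand side has _P
// Terminals are plain strings; nonterminals are { name, args }.

// All subsets of `params`, in the order the text lists them:
// [], [Yield], [Await], [Yield, Await].
function subsets(params) {
  let result = [[]];
  for (const p of params) {
    result = result.concat(result.map((s) => [...s, p]));
  }
  return result;
}

function withSuffixes(name, params) {
  return [name, ...params].join('_');
}

function expand(lhsName, lhsParams, rhs) {
  return subsets(lhsParams).map((active) => {
    const body = rhs.map((sym) => {
      if (typeof sym === 'string') return sym; // terminal, e.g. `var`
      const used = sym.args
        .filter((arg) => arg[0] === '+' || (arg[0] === '?' && active.includes(arg.slice(1))))
        .map((arg) => arg.slice(1)); // `~P` and an unmatched `?P` drop out
      return withSuffixes(sym.name, used);
    });
    return `${withSuffixes(lhsName, active)} : ${body.join(' ')}`;
  });
}

const nt = (name, ...args) => ({ name, args });

console.log(expand('VariableStatement', ['Yield', 'Await'], [
  'var', nt('VariableDeclarationList', '+In', '?Yield', '?Await'), ';',
]).join('\n'));

// The Default parameter is left out, as in the text.
console.log(expand('FunctionDeclaration', ['Yield', 'Await'], [
  'function', nt('BindingIdentifier', '?Yield', '?Await'),
  '(', nt('FormalParameters', '~Yield', '~Await'), ')',
  '{', nt('FunctionBody', '~Yield', '~Await'), '}',
]).join('\n'));
```

The first call prints the four `VariableStatement` variants from the draft (with a space before `;`). The second shows `FormalParameters` and `FunctionBody` staying unsuffixed in every variant while `BindingIdentifier` follows the left-hand side, which is what the `[~Yield, ~Await]` and `[?Yield, ?Await]` annotations say.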

push eventv8/v8.dev

Marja Hölttä

commit sha bd4d4d69d6aaa130bfbe75ce2166b25e86042dc4

Update src/blog/understanding-ecmascript-part-3.md Co-Authored-By: Shu-yu Guo <shu@rfrn.org>

push time in 2 months

push eventv8/v8.dev

Marja Hölttä

commit sha 805bf06273d89498034dcaa12f4a13fdce6804c1

Update src/blog/understanding-ecmascript-part-3.md Co-Authored-By: Shu-yu Guo <shu@rfrn.org>

push time in 2 months

push eventv8/v8.dev

Marja Hölttä

commit sha ced8a58c0564b62933eb0f23226c0d1e2df08e4a

Update src/blog/understanding-ecmascript-part-3.md Co-Authored-By: Shu-yu Guo <shu@rfrn.org>

push time in 2 months

push eventv8/v8.dev

Marja Hölttä

commit sha 6638e0d7dc85a4df9575a71692c8366cce840c54

Update src/blog/understanding-ecmascript-part-3.md Co-Authored-By: Shu-yu Guo <shu@rfrn.org>

push time in 2 months

push eventv8/v8.dev

Marja Hölttä

commit sha 360e1b08a4e9d0c8f3c1dad59d70b656b4a36966

Update src/blog/understanding-ecmascript-part-3.md Co-Authored-By: Shu-yu Guo <shu@rfrn.org>

push time in 2 months

push eventv8/v8.dev

Marja Hölttä

commit sha 5263278a2a663a2b7124c8b521132da58437f0e7

Update src/blog/understanding-ecmascript-part-3.md Co-Authored-By: Shu-yu Guo <shu@rfrn.org>

push time in 2 months

push eventv8/v8.dev

Marja Hölttä

commit sha bbe22cdb05c9d0d1add9ca1cd9122f1783f4d83a

Update src/blog/understanding-ecmascript-part-3.md Co-Authored-By: Shu-yu Guo <shu@rfrn.org>

push time in 2 months

push eventv8/v8.dev

Marja Hölttä

commit sha 21b3cfd396aec3edb706aed736cefaacbb39173d

Update src/blog/understanding-ecmascript-part-3.md Co-Authored-By: Shu-yu Guo <shu@rfrn.org>

push time in 2 months

push eventv8/v8.dev

Marja Hölttä

commit sha 95b424cb034d365bde42c9fd9504a89a569bd0e9

Update src/blog/understanding-ecmascript-part-3.md Co-Authored-By: Shu-yu Guo <shu@rfrn.org>

push time in 2 months

push eventv8/v8.dev

Marja Hölttä

commit sha 0b3e5b20b52a07e73d27a9d5266026ced69d12a3

review

push time in 2 months

push eventv8/v8.dev

Marja Hölttä

commit sha 708e3d70e01d06b63cc494eee785f30cb2607cbe

review

push time in 2 months

Pull request review commentv8/v8.dev

Add blog post: Understanding ecmascript part 3

+---+title: 'Understanding the ECMAScript spec, part 3'+author: '[Marja Hölttä](https://twitter.com/marjakh), speculative specification spectator'+avatars:+  - marja-holtta+date: 2020-03-02 13:33:37+tags:+  - ECMAScript+description: 'Tutorial on reading the ECMAScript specification'+tweet: ''+---++... where we dive deep in the syntax!++## Previous episodes++In [part 2](/blog/understanding-ecmascript-part-2), we examined a simple grammar production and how its runtime semantics are defined. In [the extra content](/blog/extra/understanding-ecmascript-part-2-extra), we also followed a long grammar production chain from `AssignmentExpression` to `MemberExpression`. In this episode, we'll go deeper in the definition of the ECMAScript (or JavaScript) language and its syntax.++If you're not familiar with [context-free grammars](https://en.wikipedia.org/wiki/Context-free_grammar), now it's a good idea to check out the basics, since the spec uses context-free grammars to define the language.++## ECMAScript grammars++ECMAScript source text is a sequence of Unicode code points. Each Unicode code point is an integral value between `U+0000` and `U+10FFFF`. The actual encoding (for example, UTF-8 or UTF-16) is not important &mdash; we assume that the source code has already been converted into a sequence of Unicode code points according to the encoding it was in.++The spec contains several grammars which we'll briefly describe next.++### Lexical grammar++The [lexical grammar](https://tc39.es/ecma262/#sec-ecmascript-language-lexical-grammar) describes how Unicode code points are translated into a sequence of **input elements** (tokens, line terminators, comments, white space).++There are several cases where the next token cannot be identified purely by looking at the Unicode code point stream, but we need to know where we are in the syntactic grammar. A classic example is `/`. 
To know whether it's a division or the start of the RegExp, we need to know which one is allowed in the syntactic context we're currently in.++For example:+```javascript+const x = 10 / 5;+//           ^ this is a DivPunctuator++const r = /foo/;+//        ^ this is the start of a RegularExpressionLiteral+```++A similar thing happens with templates &mdash; the interpretation of <code>}`</code> depends on the context we're in:++```javascript+const what1 = 'temp';+const what2 = 'late';+const t = `I am a ${ what1 + what2 }`;+// `I am a ${ is TemplateHead+// }` is TemplateTail

Took these comments out of the code blocks; hopefully they're easier to read now.

marjakh

comment created time in 2 months
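To make the goal-symbol idea from this draft concrete, here is a toy JavaScript lexer that is told which goal symbol to use. `nextToken` and its return shape are hypothetical; this is not the spec's algorithm or V8's scanner, and it only handles spaces, `/`, `/=`, simple regex literals without escapes or character classes, and runs of word characters.

```javascript
// Toy sketch: the caller (the parser) passes the goal symbol, and only input
// elements permitted by that goal can be produced.
function nextToken(src, pos, goal) {
  while (src[pos] === ' ') pos++; // skip WhiteSpace
  if (pos >= src.length) return { type: 'EOF', value: '', end: pos };

  if (src[pos] === '/') {
    if (goal === 'InputElementDiv') {
      // DivPunctuator: `/` or `/=`. A RegularExpressionLiteral is not an option here.
      const value = src[pos + 1] === '=' ? '/=' : '/';
      return { type: 'DivPunctuator', value, end: pos + value.length };
    }
    if (goal === 'InputElementRegExp') {
      // RegularExpressionLiteral: body up to the next `/`, then flags.
      // DivPunctuator is not an option here.
      let end = src.indexOf('/', pos + 1);
      if (end === -1) throw new SyntaxError('Unterminated regular expression');
      end++;
      while (end < src.length && /[a-z]/.test(src[end])) end++;
      return { type: 'RegularExpressionLiteral', value: src.slice(pos, end), end };
    }
    throw new Error(`Goal symbol not modeled in this sketch: ${goal}`);
  }

  // Everything else: a run of word characters, or a single character.
  const word = /^\w+/.exec(src.slice(pos));
  const value = word ? word[0] : src[pos];
  return { type: 'Other', value, end: pos + value.length };
}

// Same characters, different goal symbols:
console.log(nextToken('/foo/g', 0, 'InputElementDiv').type);    // DivPunctuator
console.log(nextToken('/foo/g', 0, 'InputElementRegExp').type); // RegularExpressionLiteral

// After `10` the parser expects an operator, so it asks for InputElementDiv;
// after `=` it expects an expression, so it asks for InputElementRegExp.
console.log(nextToken('10 / 5', 2, 'InputElementDiv').value);        // /
console.log(nextToken('r = /foo/g', 3, 'InputElementRegExp').value); // /foo/g
```

The lexer never guesses on its own: the same characters `/foo/g` become different input elements depending on the goal the caller passes in, roughly as the draft describes the parser passing the goal symbol to the tokenizer.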

Pull request review commentv8/v8.dev

Add blog post: Understanding ecmascript part 3

+---+title: 'Understanding the ECMAScript spec, part 3'+author: '[Marja Hölttä](https://twitter.com/marjakh), speculative specification spectator'+avatars:+  - marja-holtta+date: 2020-03-02 13:33:37+tags:+  - ECMAScript+description: 'Tutorial on reading the ECMAScript specification'+tweet: ''+---++... where we dive deep in the syntax!++## Previous episodes++In [part 2](/blog/understanding-ecmascript-part-2), we examined a simple grammar production and how its runtime semantics are defined. In [the extra content](/blog/extra/understanding-ecmascript-part-2-extra), we also followed a long grammar production chain from `AssignmentExpression` to `MemberExpression`. In this episode, we'll go deeper in the definition of the ECMAScript (or JavaScript) language and its syntax.++If you're not familiar with [context-free grammars](https://en.wikipedia.org/wiki/Context-free_grammar), now it's a good idea to check out the basics, since the spec uses context-free grammars to define the language.++## ECMAScript grammars++ECMAScript source text is a sequence of Unicode code points. Each Unicode code point is an integral value between `U+0000` and `U+10FFFF`. The actual encoding (for example, UTF-8 or UTF-16) is not important &mdash; we assume that the source code has already been converted into a sequence of Unicode code points according to the encoding it was in.++The spec contains several grammars which we'll briefly describe next.++### Lexical grammar++The [lexical grammar](https://tc39.es/ecma262/#sec-ecmascript-language-lexical-grammar) describes how Unicode code points are translated into a sequence of **input elements** (tokens, line terminators, comments, white space).++There are several cases where the next token cannot be identified purely by looking at the Unicode code point stream, but we need to know where we are in the syntactic grammar. A classic example is `/`. 
To know whether it's a division or the start of the RegExp, we need to know which one is allowed in the syntactic context we're currently in.++For example:+```javascript+const x = 10 / 5;+//           ^ this is a DivPunctuator++const r = /foo/;+//        ^ this is the start of a RegularExpressionLiteral+```++A similar thing happens with templates &mdash; the interpretation of <code>}`</code> depends on the context we're in:++```javascript+const what1 = 'temp';+const what2 = 'late';+const t = `I am a ${ what1 + what2 }`;+// `I am a ${ is TemplateHead+// }` is TemplateTail++if (0 == 1) {+}`not very useful`;+// } is RightBracePunctuator+// ` is the start of a NoSubstitutionTemplate++```++The lexical grammar uses several goal symbols to distinguish between the contexts where some input elements are permitted and some are not. For example, the goal symbol `InputElementDiv` is used in contexts where `/` is a division and `/=` is a division-assignment. The `InputElementDiv` productions list the possible tokens which can be produced in this context:++> [`InputElementDiv ::`](https://tc39.es/ecma262/#prod-InputElementDiv)+> `WhiteSpace`+> `LineTerminator`+> `Comment`+> `CommonToken`+> `DivPunctuator`+> `RightBracePunctuator`++In this context, encountering `/` will produce the `DivPunctuator` input element. 
Producing a `RegularExpressionLiteral` is not an option here.++On the other hand, `InputElementRegExp` is the goal symbol for the contexts where `/` is the beginning of a RegExp:++> [`InputElementRegExp ::`](https://tc39.es/ecma262/#prod-InputElementRegExp)+> `WhiteSpace`+> `LineTerminator`+> `Comment`+> `CommonToken`+> `RightBracePunctuator`+> `RegularExpressionLiteral`++As we see from the productions, it's possible that this produces the `RegularExpressionLiteral` input element, but producing `DivPunctuator` is not possible.++Similarly, there is another goal symbol, `InputElementRegExpOrTemplateTail`, for contexts where `TemplateMiddle` and `TemplateTail` are permitted, in addition to `RegularExpressionLiteral`. And finally, `InputElementTemplateTail` is the goal symbol for contexts where only `TemplateMiddle` and `TemplateTail` are permitted but `RegularExpressionLiteral` is not permitted.++We can imagine the syntactic grammar analyzer ("parser") calling the lexical grammar analyzer ("tokenizer" or "lexer"), passing the goal symbol as a parameter and asking for the next input element suitable for that goal symbol.++### Other grammars++The [RegExp grammar](https://tc39.es/ecma262/#sec-patterns) describes how Unicode code points are translated into regular expressions.++We can imagine the parser asking the tokenizer for the next token in a context where RegExps are allowed. If the tokenizer returns `RegularExpressionLiteral`, we branch into the RegExp grammar for converting the string of the `RegularExpressionLiteral` into a RegExp pattern.++The [numeric string grammar](https://tc39.es/ecma262/#sec-tonumber-applied-to-the-string-type) describes how Strings are translated into numeric values.++The [syntactic grammar](https://tc39.es/ecma262/#sec-syntactic-grammar) describes how syntactically correct programs are composed of tokens.++The notation used for different grammars differs slightly. 
For example, the syntactic grammar uses `Symbol :` whereas the lexical grammar and the RegExp grammar use `Symbol ::` and the numeric string grammar uses `Symbol :::`.++For the rest of this episode, we'll focus on the syntactic grammar.++## Example: Allowing legacy identifiers++In some contexts, `await` and `yield` are allowed identifiers. Finding out when exactly they are allowed can be a bit involved, so let's dive right in!++Let's have a closer look at allowing `await` as an identifier (`yield` works similarly).++For example, this code works:++```javascript+function my_non_async_function() {+  var await;+  console.log(await);+}+```++However, if we're inside an async function, `await` is treated as a keyword. So this code doesn't work:++```javascript+async function my_async_function() {+  var await; // Syntax error+}+```++### Productions and shorthands++Let's look at how the productions for `VariableStatement` are defined. At first glance, the grammar can look a bit scary:++> [<code>VariableStatement<sub>[Yield, Await]</sub> :</code>](https://tc39.es/ecma262/#prod-VariableStatement)+> <code>var VariableDeclarationList<sub>[+In, ?Yield, ?Await]</sub>;</code>++What do the subscripts (`[Yield, Await]`) and prefixes (`+` in `+In` and `?` in `?Await`) mean?++The notation is explained in the section [Grammar Notation](https://tc39.es/ecma262/#sec-grammar-notation).++The subscripts are a shorthand for expressing a set of productions, for a set of left-hand side symbols, all at once. 
The left-hand side symbol has two parameters, so the "real" left-hand side symbols we're defining are `VariableStatement`, `VariableStatement_Yield`, `VariableStatement_Await` and `VariableStatement_Yield_Await`.++Note that here the plain `VariableStatement` means "`VariableStatement` without `_Await` and without `_Yield`" and should not be confused with <code>VariableStatement<sub>[Yield, Await]</sub></code>.++On the right-hand side of the production, we see the shorthand `+In`, meaning "use the version with `_In`", and `?Await`, meaning "use the version with `_Await` if and only if the left-hand side symbol has `_Await`" (similarly with `?Yield`).++(The third shorthand, `~Foo`, meaning "use the version without `_Foo`", is not used in this production.)++With this information, we can expand the productions like this:++> `VariableStatement` :+> `var VariableDeclarationList_In;`+>+> `VariableStatement_Yield` :+> `var VariableDeclarationList_In_Yield;`+>+> `VariableStatement_Await` :+> `var VariableDeclarationList_In_Await;`+>+> `VariableStatement_Yield_Await` :+> `var VariableDeclarationList_In_Yield_Await;`++Ultimately, we'll need to find out two things:+1. Where is it decided whether we're in the case with `_Await` or without `_Await`?+1. Where does it make a difference &mdash; where do the productions for `Something_Await` and `Something` (without `_Await`) diverge?++### `_Await` or no `_Await`?++Let's tackle question 1 first. It's somewhat easy to guess that non-async functions and async functions differ in whether we pick the parameter `_Await` for the function body or not. 
Reading the productions for async function declarations, we find this:++> [`AsyncFunctionBody :`](https://tc39.es/ecma262/#prod-AsyncFunctionBody)+> <code>FunctionBody<sub>[~Yield, +Await]</sub></code>++Note that `AsyncFunctionBody` has no parameters &mdash; they get added to the `FunctionBody` on the right-hand side.++If we expand this production, we get:++> `AsyncFunctionBody :`+> `FunctionBody_Await`++So `FunctionBody_Await` is used for async functions: it means a function body where `await` is treated as a keyword.++On the other hand, if we're inside a non-async function, the relevant production is:++> [<code>FunctionDeclaration<sub>[Yield, Await, Default]</sub> :</code>](https://tc39.es/ecma262/#prod-FunctionDeclaration)+> <code>function BindingIdentifier<sub>[?Yield, ?Await]</sub> ( FormalParameters<sub>[~Yield, ~Await]</sub> ) { FunctionBody<sub>[~Yield, ~Await]</sub> }</code>++(`FunctionDeclaration` has another production, but it's not relevant for our code example.)++To avoid combinatorial expansion, let's ignore the `Default` parameter, which is not used in this particular production.++The expanded form of the production is:++> `FunctionDeclaration :`+> `function BindingIdentifier ( FormalParameters ) { FunctionBody }`++> `FunctionDeclaration_Yield :`+> `function BindingIdentifier_Yield ( FormalParameters ) { FunctionBody }`++> `FunctionDeclaration_Await :`+> `function BindingIdentifier_Await ( FormalParameters ) { FunctionBody }`++> `FunctionDeclaration_Yield_Await :`+> `function BindingIdentifier_Yield_Await ( FormalParameters ) { FunctionBody }`++In this production we always get `FunctionBody` and `FormalParameters` (without `_Yield` and without `_Await`), since both are parameterized with `[~Yield, ~Await]` in the non-expanded production.++The function name is treated differently: `BindingIdentifier` gets the parameters `_Await` and `_Yield` if the left-hand side symbol has them.++To summarize: Async functions have a 
`FunctionBody_Await` and non-async functions have a `FunctionBody` (without `_Await`). You can think of the `_Await` parameter meaning "`await` is a keyword".++Since we're talking about non-generator functions, both our async example function and our non-async example function are parameterized without `_Yield`.++### Disallowing `await` as an identifier++Next, we need to find out how `await` is disallowed as an identifier if we're inside a `FunctionBody_Await`.++We can follow the productions further to see that the `_Await` parameter gets carried unchanged from `FunctionBody` all the way to the `VariableStatement` production we were previously looking at.++Thus, inside an async function, we'll have a `VariableStatement_Await` and inside a non-async function, we'll have a `VariableStatement`.++We can follow the productions further and keep track of the parameters. We already saw the productions for `VariableStatement`:++> [<code>VariableStatement<sub>[Yield, Await]</sub> :</code>](https://tc39.es/ecma262/#prod-VariableStatement)+> <code>var VariableDeclarationList<sub>[+In, ?Yield, ?Await]</sub>;</code>++All productions for `VariableDeclarationList` just carry the parameters on as is:++> [<code>VariableDeclarationList<sub>[In, Yield, Await]</sub> :</code>](https://tc39.es/ecma262/#prod-VariableDeclarationList)+> <code>VariableDeclaration<sub>[?In, ?Yield, ?Await]</sub></code>++(Here we show only the production relevant to our example.)++> [<code>VariableDeclaration<sub>[In, Yield, Await]</sub> :</code>](https://tc39.es/ecma262/#prod-VariableDeclaration)+> <code>BindingIdentifier<sub>[?Yield, ?Await]</sub> Initializer<sub>[?In, ?Yield, ?Await]</sub> opt</code>++The `opt` shorthand means that the right-hand side symbol is optional; there are in fact two productions, one with the optional symbol, and one without.++In the simple case relevant to our example, `VariableStatement` consists of the keyword `var`, followed by a single `BindingIdentifier` without an 
initializer, and ending with a semicolon.++To disallow or allow `await` as a `BindingIdentifier`, we hope to end up with something like this:++> `BindingIdentifier_Await :`+> `Identifier`+> `yield`+>+> `BindingIdentifier :`+> `Identifier`+> `yield`+> `await`++This would disallow `await` as an identifier inside async functions and allow it as an identifier inside non-async functions.++But the spec doesn't define it like this; instead, we find this production:++> [<code>BindingIdentifier<sub>[Yield, Await]</sub> :</code>](https://tc39.es/ecma262/#prod-BindingIdentifier)+> `Identifier`+> `yield`+> `await`++Expanded, this means the following productions:++> `BindingIdentifier_Await :`+> `Identifier`+> `yield`+> `await`+>+> `BindingIdentifier :`+> `Identifier`+> `yield`+> `await`++(We're omitting the productions for `BindingIdentifier_Yield` and `BindingIdentifier_Yield_Await`, which are not needed in our example.)++This looks like `await` and `yield` would always be allowed as identifiers. What's up with that? Is the whole blog post for nothing?++### Static semantics to the rescue++It turns out **static semantics** are needed for forbidding `await` as an identifier inside async functions.++Static semantics describe static rules &mdash; that is, rules that can be checked before the program is run.++In this case, the [static semantics for BindingIdentifier](https://tc39.es/ecma262/#sec-identifiers-static-semantics-early-errors) define the following syntax-directed rule:++>`BindingIdentifier : await`+>+> It's a Syntax Error if this production has an <code><sub>[Await]</sub></code> parameter.++Effectively, this forbids the `BindingIdentifier_Await : await` production.++The spec explains that this production exists, but is turned into a Syntax Error by the static semantics, because leaving it out would interfere with automatic semicolon insertion. 
If the production were missing, automatic semicolon insertion might kick in and insert a semicolon into a program that is syntactically incorrect only because it uses `await` or `yield` as an identifier, changing the meaning of the program.

Rewrote this part.

marjakh

comment created time in 2 months
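The effect of this early error can be observed by asking an engine to parse, without running, a few snippets. A minimal sketch, assuming an environment where `new Function` is allowed (for example Node.js; pages with a strict Content Security Policy block it). `isSyntaxError` is a hypothetical helper name, and the `yield` results rely on `new Function` parsing its body as non-strict code, where `yield` is still allowed as an identifier outside generators.

```javascript
// Hypothetical helper: parses (does not run) `source` as the body of a
// sloppy-mode function and reports whether the engine raised a SyntaxError.
// Early errors such as the BindingIdentifier rule are reported at this point.
function isSyntaxError(source) {
  try {
    new Function(source);
    return false;
  } catch (e) {
    if (e instanceof SyntaxError) return true;
    throw e; // anything else (e.g. an EvalError from a CSP) is unexpected here
  }
}

// `await` as a BindingIdentifier inside a non-async function: allowed.
console.log(isSyntaxError('function f() { var await; }'));       // false
// Inside an async function this is BindingIdentifier_Await : await,
// which the static semantics turn into a Syntax Error.
console.log(isSyntaxError('async function f() { var await; }')); // true

// `yield` works similarly for generators (in sloppy-mode code).
console.log(isSyntaxError('function f() { var yield; }'));       // false
console.log(isSyntaxError('function* g() { var yield; }'));      // true
```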

push eventv8/v8.dev

Marja Hölttä

commit sha ad7f9f5fee3ee32488239dcefcd8ea8d6b7b5f51

review

Marja Hölttä

commit sha f19e43e7c2a2b2b137497c3a19e2ab5e937d112b

minor

push time in 2 months

push eventv8/v8.dev

Marja Hölttä

commit sha 5896d20bda6cafe4d71d1d2c3c1a9fe521316d84

review

push time in 2 months

push eventv8/v8.dev

Marja Hölttä

commit sha 30f1d84e6512d2f88dd9d4012fce320d43ab864a

review

push time in 2 months

pull request commenttc39/ecma262

Editorial: Add subscripts to BindingIdentifier static semantics.

Done!

marjakh

comment created time in 2 months

pull request commenttc39/ecma262

Editorial: Add subscripts to BindingIdentifier static semantics.

I work for Google and I was under the impression we're covered somehow. Isn't that the case? This is my second contribution and I don't remember what I did the first time, whether I registered for something or not. It was a while back. I can try to search my e-mail for history, if needed.

marjakh

comment created time in 2 months

push eventv8/v8.dev

Marja Hölttä

commit sha a7438b30caa8d82d3f479e2a052548b54993f218

review

push time in 2 months

push eventv8/v8.dev

Marja Hölttä

commit sha c7520522ed41dcba149f375627389970a4a997d2

review

Marja Hölttä

commit sha 747312779395521d996aeec77b579928798f0213

review

push time in 2 months

Pull request review commentv8/v8.dev

Add blog post: Understanding ecmascript part 3

+---+title: 'Understanding the ECMAScript spec, part 3'+author: '[Marja Hölttä](https://twitter.com/marjakh), speculative specification spectator'+avatars:+  - marja-holtta+date: 2020-03-02 13:33:37+tags:+  - ECMAScript+description: 'Tutorial on reading the ECMAScript specification'+tweet: ''+---++... where we dive deep in the syntax!++## Previous episodes++In [part 2](/blog/understanding-ecmascript-part-2), we examined a simple grammar production and how its runtime semantics are defined. In [the extra content](/blog/extra/understanding-ecmascript-part-2-extra), we also followed a long grammar production chain from `AssignmentExpression` to `MemberExpression`. In this episode, we'll go deeper in the definition of the ECMAScript (or JavaScript) language and its syntax.++If you're not familiar with [context-free grammars](https://en.wikipedia.org/wiki/Context-free_grammar), now it's a good idea to check out the basics, since the spec uses context-free grammars to define the language.++## ECMAScript grammars++ECMAScript source text is a sequence of Unicode code points. Each Unicode code point is an integral value between `U+0000` and `U+10FFFF`. The actual encoding (for example, UTF-8 or UTF-16) is not important &mdash; we assume that the source code has already been converted into a sequence of Unicode code points according to the encoding it was in.++The spec contains several grammars which we'll briefly describe next.++### Lexical grammar++The [lexical grammar](https://tc39.es/ecma262/#sec-ecmascript-language-lexical-grammar) describes how Unicode code points are translated into a sequence of **input elements** (tokens, line terminators, comments, white space).++There are several cases where the next token cannot be identified purely by looking at the Unicode code point stream, but we need to know where we are in the syntactic grammar. A classic example is `/`. 
To know whether it's a division or the start of the RegExp, we need to know which one is allowed in the syntactic context we're currently in.++For example:+```javascript+const x = 10 / 5;+//           ^ this is a DivPunctuator++const r = /foo/;+//        ^ this is the start of a RegularExpressionLiteral+```++A similar thing happens with templates &mdash; the interpretation of <code>}`</code> depends on the context we're in:++```javascript+const what1 = 'temp';+const what2 = 'late';+const t = `I am a ${ what1 + what2 }`;+// `I am a ${ is TemplateHead+// }` is TemplateTail++if (0 == 1) {+}`not very useful`;+// } is RightBracePunctuator+// ` is the start of a NoSubstitutionTemplate++```++The lexical grammar uses several goal symbols to distinguish between the contexts where some input elements are permitted and some are not. For example, the goal symbol `InputElementDiv` is used in contexts where `/` is a division and `/=` is a division-assignment. The `InputElementDiv` productions list the possible tokens which can be produced in this context:++> [`InputElementDiv ::`](https://tc39.es/ecma262/#prod-InputElementDiv)+> `WhiteSpace`+> `LineTerminator`+> `Comment`+> `CommonToken`+> `DivPunctuator`+> `RightBracePunctuator`++In this context, encountering `/` will produce the `DivPunctuator` input element. 
Producing a `RegularExpressionLiteral` is not an option here.++On the other hand, `InputElementRegExp` is the goal symbol for the contexts where `/` is the beginning of a RegExp:++> [`InputElementRegExp ::`](https://tc39.es/ecma262/#prod-InputElementRegExp)+> `WhiteSpace`+> `LineTerminator`+> `Comment`+> `CommonToken`+> `RightBracePunctuator`+> `RegularExpressionLiteral`++As we see from the productions, it's possible that this produces the `RegularExpressionLiteral` input element, but producing `DivPunctuator` is not possible.++Similarly, there is another goal symbol, `InputElementRegExpOrTemplateTail`, for contexts where `TemplateMiddle` and `TemplateTail` are permitted, in addition to `RegularExpressionLiteral`. And finally, `InputElementTemplateTail` is the goal symbol for contexts where only `TemplateMiddle` and `TemplateTail` are permitted but `RegularExpressionLiteral` is not permitted.++We can imagine the syntactic grammar analyzer ("parser") calling the lexical grammar analyzer ("tokenizer" or "lexer"), passing the goal symbol as a parameter and asking for the next input element suitable for that goal symbol.++### Other grammars++The [RegExp grammar](https://tc39.es/ecma262/#sec-patterns) describes how Unicode code points are translated into regular expressions.++We can imagine the parser asking the tokenizer for the next token in a context where RegExps are allowed. If the tokenizer returns `RegularExpressionLiteral`, we branch into the RegExp grammar for converting the string of the `RegularExpressionLiteral` into a RegExp pattern.++The [numeric string grammar](https://tc39.es/ecma262/#sec-tonumber-applied-to-the-string-type) describes how Strings are translated into numeric values.++The [syntactic grammar](https://tc39.es/ecma262/#sec-syntactic-grammar) describes how syntactically correct programs are composed of tokens.++The notation used for different grammars differs slightly. 
For example, the syntactic grammar uses `Symbol :` whereas the lexical grammar and the RegExp grammar use `Symbol ::` and the numeric string grammar uses `Symbol :::`.++For the rest of this episode, we'll focus on the syntactic grammar.++## Example: Allowing legacy identifiers++In some contexts, `await` and `yield` are allowed identifiers. Finding out when exactly they are allowed can be a bit involved, so let's dive right in!

Added explanation about the history here.

Example: Allowing legacy identifiers

Introducing new keywords to the grammar is potentially a breaking change: what if existing code already uses the keywords as identifiers?

For example, before await was a keyword, someone might have written the following code:

    function old() {
      var await;
    }

The ECMAScript grammar carefully added the await keyword in such a way that this code continues to work. Inside async functions, await is a keyword, so this doesn't work:

    async function modern() {
      var await; // Syntax error
    }

Allowing yield as an identifier in non-generators and disallowing it in generators works similarly; let's focus on the await case for the rest of the post.
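You can observe this behavior in an engine such as Node.js or a browser console by checking whether a snippet parses. The `parses` helper below is a hypothetical convenience built on the standard `Function` constructor, which parses its argument as an ordinary, non-strict function body. The helper does not cover module code, where `await` is reserved regardless.

```javascript
// Hypothetical helper: returns true if `body` parses as an ordinary
// (non-strict, non-module) function body. `new Function` throws a SyntaxError
// at construction time if parsing fails; the created function is never called.
function parses(body) {
  try {
    new Function(body);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

console.log(parses('function old() { var await; }'));          // true
console.log(parses('async function modern() { var await; }')); // false
console.log(parses('function oldGen() { var yield; }'));       // true
console.log(parses('function* modernGen() { var yield; }'));   // false
```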

Understanding how await is allowed as an identifier requires understanding ECMAScript-specific syntactic grammar notation. Let's dive right in!

marjakh

comment created time in 2 months

Pull request review comment v8/v8.dev

Add blog post: Understanding ecmascript part 3

+---+title: 'Understanding the ECMAScript spec, part 3'+author: '[Marja Hölttä](https://twitter.com/marjakh), speculative specification spectator'+avatars:+  - marja-holtta+date: 2020-02-11 13:33:37+tags:+  - ECMAScript+description: 'Tutorial on reading the ECMAScript specification'+tweet: ''+---++... where we dive deep in the syntax!++## Previous episodes++In [part 2](/blog/understanding-ecmascript-part-2), we examined a simple grammar production and how its runtime semantics are defined.  In [the extra content](/blog/extra/understanding-ecmascript-part-2-extra), we also followed a long grammar production chain from `AssignmentExpression` to `MemberExpression`. In this episode, we'll go deeper in the definition of the ECMAScript (or JavaScript) language and its syntax.++If you're not familiar with [context-free grammars](https://en.wikipedia.org/wiki/Context-free_grammar), now it's a good idea to check out the basics, since the spec uses context-free grammars to define the language.

Offline discussion: I use the word "context", e.g., to say that "/" is division in some contexts and the start of a RegExp in others. This might be confusing because "context" is an overloaded word. I tried to think of a better word but couldn't find one, so I'll probably add a note that "context" here doesn't mean the same as "context" in the "context-free grammar" sense.

marjakh

comment created time in 2 months

Pull request review comment v8/v8.dev

Add blog post: Understanding ecmascript part 3

+---+title: 'Understanding the ECMAScript spec, part 3'+author: '[Marja Hölttä](https://twitter.com/marjakh), speculative specification spectator'+avatars:+  - marja-holtta+date: 2020-02-11 13:33:37+tags:+  - ECMAScript+description: 'Tutorial on reading the ECMAScript specification'+tweet: ''+---++... where we dive deep in the syntax!++## Previous episodes++In [part 2](/blog/understanding-ecmascript-part-2), we examined a simple grammar production and how its runtime semantics are defined.  In [the extra content](/blog/extra/understanding-ecmascript-part-2-extra), we also followed a long grammar production chain from `AssignmentExpression` to `MemberExpression`. In this episode, we'll go deeper in the definition of the ECMAScript (or JavaScript) language and its syntax.++If you're not familiar with [context-free grammars](https://en.wikipedia.org/wiki/Context-free_grammar), now it's a good idea to check out the basics, since the spec uses context-free grammars to define the language.++## ECMAScript grammars++ECMAScript source text is a sequence of Unicode code points. Each Unicode code point is an integral value between `U+0000` and `U+10FFFF`. The actual encoding (for example, UTF-8 or UTF-16) is not important &mdash; we assume that the source code has already been converted into a sequence of Unicode code points according to the encoding it was in.++The spec contains several grammars which we'll briefly describe next.++### Lexical grammar++The [lexical grammar](https://tc39.es/ecma262/#sec-ecmascript-language-lexical-grammar) describes how Unicode code points are translated into a sequence of **input elements** (tokens, line terminators, comments, white space).++There are several cases where the next token cannot be identified purely by looking at the Unicode code point stream, but we need to know where we are in the syntactic grammar. A classic example is `/`. 
To know whether it's a division or the start of the RegExp, we need to know which one is allowed in the syntactic context we're currently in.++For example:+```javascript+const x = 10 / 5;+//           ^ this is a DivPunctuator++const r = /foo/;+//        ^ this is the start of a RegularExpressionLiteral+```++A similar thing happens with templates &mdash; the interpretation of <code>{`</code> depends on the context we're in:++```javascript+const what1 = 'temp';+const what2 = 'late';+const t = `I am a ${ what1 + what2 }`;+// `I am a ${ is TemplateHead+// }` is TemplateTail++if (0 == 1) {+}`not very useful`;+// } is RightBracePunctuator+// ` is the start of a NoSubstitutionTemplate++```++The lexical grammar uses several goal symbols to distinguish between the contexts where some input elements are permitted and some are not. For example, the goal symbol `InputElementDiv` is used in contexts where `/` is a division and `/=` is a division-assignment. The `InputElementDiv` productions list the possible tokens which can be produced in this context:++> [`InputElementDiv ::`](https://tc39.es/ecma262/#prod-InputElementDiv)+> `WhiteSpace`+> `LineTerminator`+> `Comment`+> `CommonToken`+> `DivPunctuator`+> `RightBracePunctuator`++In this context, encountering `/` will produce the `DivPunctuator` input element. 
Producing a `RegularExpressionLiteral` is not an option here.++On the other hand, `InputElementRegExp` is the goal symbol for the contexts where `/` is the beginning of a RegExp:++> [`InputElementRegExp ::`](https://tc39.es/ecma262/#prod-InputElementRegExp)+> `WhiteSpace`+> `LineTerminator`+> `Comment`+> `CommonToken`+> `RightBracePunctuator`+> `RegularExpressionLiteral`++As we see from the productions, it's possible that this produces the `RegularExpressionLiteral` input element, but producing `DivPunctuator` is not possible.++Similarly, there is another goal symbol, `InputElementRegExpOrTemplateTail`, for contexts where `TemplateMiddle` and `TemplateTail` are permitted, in addition to `RegularExpressionLiteral`. And finally, `InputElementTemplateTail` is the goal symbol for contexts where only `TemplateMiddle` and `TemplateTail` are permitted but `RegularExpressionLiteral` is not permitted.++We can imagine the syntactic grammar analyzer ("parser") calling the lexical grammar analyzer ("tokenizer" or "lexer"), passing the goal symbol as a parameter and asking for the next input element suitable for that goal symbol.++### Other grammars

I moved them into the introduction chapter at the top. The long explanations of the lexical and syntactic grammars come after that.

marjakh

comment created time in 3 months

Pull request review comment v8/v8.dev

Add blog post: Understanding ecmascript part 3

+---+title: 'Understanding the ECMAScript spec, part 3'+author: '[Marja Hölttä](https://twitter.com/marjakh), speculative specification spectator'+avatars:+  - marja-holtta+date: 2020-02-11 13:33:37+tags:+  - ECMAScript+description: 'Tutorial on reading the ECMAScript specification'+tweet: ''+---++... where we dive deep in the syntax!++## Previous episodes++In [part 2](/blog/understanding-ecmascript-part-2), we examined a simple grammar production and how its runtime semantics are defined.  In [the extra content](/blog/extra/understanding-ecmascript-part-2-extra), we also followed a long grammar production chain from `AssignmentExpression` to `MemberExpression`. In this episode, we'll go deeper in the definition of the ECMAScript (or JavaScript) language and its syntax.++If you're not familiar with [context-free grammars](https://en.wikipedia.org/wiki/Context-free_grammar), now it's a good idea to check out the basics, since the spec uses context-free grammars to define the language.++## ECMAScript grammars++ECMAScript source text is a sequence of Unicode code points. Each Unicode code point is an integral value between `U+0000` and `U+10FFFF`. The actual encoding (for example, UTF-8 or UTF-16) is not important &mdash; we assume that the source code has already been converted into a sequence of Unicode code points according to the encoding it was in.++The spec contains several grammars which we'll briefly describe next.++### Lexical grammar

Added:

Each grammar is defined as a context-free grammar, consisting of a set of production rules.


(The suggestion "For the lexical grammar, these production rules are called goal symbols." is not accurate: a goal symbol is not a production. Each context-free grammar has one "main" symbol, such as "Program", which is the goal symbol.)

marjakh

comment created time in 3 months

Pull request review comment v8/v8.dev

Add blog post: Understanding ecmascript part 3

+---+title: 'Understanding the ECMAScript spec, part 3'+author: '[Marja Hölttä](https://twitter.com/marjakh), speculative specification spectator'+avatars:+  - marja-holtta+date: 2020-02-11 13:33:37+tags:+  - ECMAScript+description: 'Tutorial on reading the ECMAScript specification'+tweet: ''+---++... where we dive deep in the syntax!++## Previous episodes++In [part 2](/blog/understanding-ecmascript-part-2), we examined a simple grammar production and how its runtime semantics are defined.  In [the extra content](/blog/extra/understanding-ecmascript-part-2-extra), we also followed a long grammar production chain from `AssignmentExpression` to `MemberExpression`. In this episode, we'll go deeper in the definition of the ECMAScript (or JavaScript) language and its syntax.++If you're not familiar with [context-free grammars](https://en.wikipedia.org/wiki/Context-free_grammar), now it's a good idea to check out the basics, since the spec uses context-free grammars to define the language.++## ECMAScript grammars++ECMAScript source text is a sequence of Unicode code points. Each Unicode code point is an integral value between `U+0000` and `U+10FFFF`. The actual encoding (for example, UTF-8 or UTF-16) is not important &mdash; we assume that the source code has already been converted into a sequence of Unicode code points according to the encoding it was in.++The spec contains several grammars which we'll briefly describe next.++### Lexical grammar++The [lexical grammar](https://tc39.es/ecma262/#sec-ecmascript-language-lexical-grammar) describes how Unicode code points are translated into a sequence of **input elements** (tokens, line terminators, comments, white space).++There are several cases where the next token cannot be identified purely by looking at the Unicode code point stream, but we need to know where we are in the syntactic grammar. A classic example is `/`. 
To know whether it's a division or the start of the RegExp, we need to know which one is allowed in the syntactic context we're currently in.

Restructured. New text:

It's not possible to tokenize ECMAScript source code in advance, which makes defining the lexical grammar slightly more complicated.

For example, we cannot determine whether / is the division operator or the start of a RegExp without looking at the larger context in which it occurs:
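Here is a small runnable illustration of that point. The character sequence `/b/g` below is tokenized either as two division operators or as one RegExp literal, depending only on what precedes it. The variable names are arbitrary.

```javascript
const a = 20;
const b = 2;
const g = 5;

// After an identifier, `/` can only be division: this is (a / b) / g.
const quotient = a /b/g;

// After `[`, an expression is expected, so `/` starts a RegExp literal.
const regexp = [/b/g][0];

console.log(quotient);                    // 2
console.log(regexp instanceof RegExp);    // true
console.log(regexp.source, regexp.flags); // b g
```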

marjakh

comment created time in 3 months

push event v8/v8.dev

Marja Hölttä

commit sha 1ab65d12636e180b59957bf254e7a1c01ec5a26d

review

push time in 3 months