
DmitrySoshnikov/babel-plugin-transform-modern-regexp 83

Babel plugin for modern RegExp features in JavaScript

DmitrySoshnikov/es-laboratory 53

ECMAScript experiments

DmitrySoshnikov/hdl-js 53

Hardware description language (HDL) parser, and Hardware simulator.

DmitrySoshnikov/eva-source 12

Source code for "Essentials of Interpretation" class

DmitrySoshnikov/at-regexp-machine 8

Automata Theory. Building a RegExp machine

DmitrySoshnikov/coding-interview-university 8

A complete computer science study plan to become a software engineer.

DmitrySoshnikov/javascript-algorithms 8

Algorithms and data structures implemented in JavaScript with explanations and links to further readings

DmitrySoshnikov/es6-computed-properties 2

ES6 Computed properties compiled to ES5/ES3

issue comment DmitrySoshnikov/regexp-tree

certain meta characters shouldn't be allowed in char ranges

Is your reference another parser in the AST explorer, or JS itself? And if it's JS, how do you know the internal structure? My parents think I should learn more about this topic, and if I’m going to help out, I should probably figure out how to make a good pull request.

On Monday, January 25, 2021 3:00 PM, Dmitry Soshnikov wrote (Re: [DmitrySoshnikov/regexp-tree] certain meta characters shouldn't be allowed in char ranges (#219)):

Thanks for the report, yeah, the /[\w-z]/.test('-') should actually be parsed as a char class containing \w, - and z.
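As a rough check of the reply above against the native JavaScript engine (not against regexp-tree itself), the sketch below tests which characters `/[\w-z]/` accepts in legacy mode, and whether the same source compiles with the `u` flag. The helper name `compiles` is made up for this illustration.

```javascript
// Legacy (non-unicode) mode: `\w-z` cannot form a range, so the class
// is read as three members: \w, a literal '-', and 'z'.
const legacy = /[\w-z]/;
const legacyResults = ['-', 'z', 'a', '!'].map((ch) => legacy.test(ch));

// Hypothetical helper: reports whether a pattern source compiles
// with the given flags instead of throwing a SyntaxError.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err; // anything else is unexpected
  }
}

// Unicode mode: a class escape such as \w is not allowed as a range
// endpoint, so the same source is rejected.
const legacyCompiles = compiles('[\\w-z]', '');
const unicodeCompiles = compiles('[\\w-z]', 'u');

console.log(legacyResults, legacyCompiles, unicodeCompiles);
```

So whether "should throw" or "should parse as three members" is right depends on whether the `u` flag is set.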


andruo11

comment created time in 2 days

started DmitrySoshnikov/regexp-tree

started time in 2 days

started DmitrySoshnikov/syntax

started time in 3 days

fork disco0/syntax

Syntactic analysis toolkit, language-agnostic parser generator.

fork in 4 days

issue comment DmitrySoshnikov/regexp-tree

certain meta characters shouldn't be allowed in char ranges

...and ditto for Unicode properties, like /[\p{P}-z]/u

andruo11

comment created time in 5 days

issue opened DmitrySoshnikov/regexp-tree

meta characters shouldn't be allowed in char ranges

When I parse the regex /[\w-z]/ it should throw an error, but instead parses as a regular character range with \w at the beginning. https://astexplorer.net/#/gist/124dd2c7d464e3cf68b532bf8dacae7f/01a86f60e792112367c9395cf1094b5806dcaab1 If I figure out how to fix it I'll let you know!

created time in 5 days

issue comment DmitrySoshnikov/regexp-tree

Parser: (re)allow duplicate group names (e.g. move the check out of the parser)

Thanks for the work, and also for the pointer to the proposal; I commented there as well.

hg42

comment created time in 5 days

issue comment DmitrySoshnikov/regexp-tree

Escaped hyphen in character class

I think I figured it out! Insert at line 377: if (s === 'u_class' && yytext.slice(1, 2) === '-') return 'ESC_CHAR'

andruo11

comment created time in 5 days

issue comment DmitrySoshnikov/regexp-tree

Escaped hyphen in character class

Although, Regexr throws the same error in Unicode mode: https://regexr.com/5kn5o

andruo11

comment created time in 6 days

issue comment DmitrySoshnikov/regexp-tree

Escaped hyphen in character class

I'm kind of a GitHub newbie and don't know how to make a PR, but it looks like the character class on that line just needs a dash at the end.

andruo11

comment created time in 6 days

issue comment DmitrySoshnikov/regexp-tree

Escaped hyphen in character class

I can't quite decode how to fix it, but the problem's on line 375 of regexp-tree/src/parser/generated/regexp-tree.js

andruo11

comment created time in 6 days

issue opened DmitrySoshnikov/regexp-tree

Escaped hyphen in character class

When the unicode flag is set, escaped dashes in a character class result in an "invalid Unicode sequence" error. See the snippet for an example, /a[a\-z]/u: https://astexplorer.net/#/gist/4ea2b52f0e546af6fb14f9b2f5671c1c/49dafda5429858220f62387740fd4226cdc3dde0
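For comparison, here is a small sketch of how the native JavaScript engine treats an escaped hyphen inside a character class in unicode mode. It assumes the pattern in the report is `/a[a\-z]/u`, with the hyphen escaped, and it says nothing about regexp-tree's own lexer.

```javascript
// Assumed pattern from the report: 'a' followed by one of 'a', '-' or 'z'.
// In unicode mode `\-` is a valid escape inside a class and stands for a
// literal hyphen, so this compiles without a SyntaxError.
const escapedHyphen = /a[a\-z]/u;

// Because the hyphen is escaped, the class is NOT the range a-z.
const matchesHyphen = escapedHyphen.test('a-'); // literal '-' is a member
const matchesZ = escapedHyphen.test('az');      // 'z' is a member
const matchesB = escapedHyphen.test('ab');      // 'b' would only match a range

console.log(matchesHyphen, matchesZ, matchesB);
```

If the native engine accepts the pattern, a parser that rejects it in unicode mode is stricter than the language requires.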

created time in 6 days

issue comment DmitrySoshnikov/regexp-tree

Parser: (re)allow duplicate group names (e.g. move the check out of the parser)

Yes, --loose-mode looks good, and so does the way specific options are added. I assume they would merge with the defaults?

hg42

comment created time in 7 days

issue comment DmitrySoshnikov/regexp-tree

Parser: (re)allow duplicate group names (e.g. move the check out of the parser)

thanks for answering...

Well, "Perl" would be a big claim... Perl regexps offer quite a lot more (I never got on par with all the useful features).

I think providing some options for such simple cases would be good. Something like option sets would be another nice feature, so one could gather some options and set them all at once. Some sets could be part of the distribution. I understand that having too many (especially complicated) options could create a maintenance nightmare.

hg42

comment created time in 7 days

started DmitrySoshnikov/regexp-tree

started time in 10 days

started DmitrySoshnikov/regexp-tree

started time in 10 days

issue comment DmitrySoshnikov/regexp-tree

Broken optimize with multiple optional whitespace `\s`

On further investigation, this appears to impact a lot of character classes:

/\w?\w?/ for example, as well as \r, \n, \v, etc.

Cherry

comment created time in 10 days

started DmitrySoshnikov/regexp-tree

started time in 10 days

started DmitrySoshnikov/hdl-js

started time in 13 days

started DmitrySoshnikov/regexp-tree

started time in 13 days

issue opened DmitrySoshnikov/regexp-tree

Optimization breaks "match all" in multiline regexps

This regexp is optimized like this:

- /lorem(?:.|\n)*?ipsum/m
+ /lorem[.\n]*?ipsum/m

This breaks the regex:

  • (.|\n) means "any character, including a newline character"
  • [.\n] means "a period character or a newline character"

See https://github.com/sindresorhus/eslint-plugin-unicorn/issues/895.
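The difference between the two forms can be checked directly in the JavaScript engine. This is only a sketch with a made-up sample string; it compares the two regexps from the diff and does not call regexp-tree's optimizer.

```javascript
// The regexp before and after the reported optimization.
const beforeOptimize = /lorem(?:.|\n)*?ipsum/m;
const afterOptimize = /lorem[.\n]*?ipsum/m;

// Sample input (an assumption for this example): ordinary text
// spanning a line break between "lorem" and "ipsum".
const sample = 'lorem dolor\nsit amet ipsum';

// `(?:.|\n)` accepts any character, so the lazy match can reach "ipsum".
const beforeMatches = beforeOptimize.test(sample);

// Inside a class, `.` is a literal period, so `[.\n]` fails on the
// first space after "lorem".
const afterMatches = afterOptimize.test(sample);

// The optimized form still matches when only periods and newlines
// sit between the two words.
const afterMatchesDots = afterOptimize.test('lorem.\n.ipsum');

console.log(beforeMatches, afterMatches, afterMatchesDots);
```

The two regexps agree only on inputs made of periods and newlines, which is why the rewrite is not a safe optimization.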

created time in 15 days

started DmitrySoshnikov/syntax

started time in 20 days

fork sachinsonu007/mips-parser

MIPS Assembly parser in JavaScript

fork in 20 days

started DmitrySoshnikov/scheme-on-coffee

started time in 22 days

started DmitrySoshnikov/es-laboratory

started time in 22 days

started DmitrySoshnikov/hdl-js

started time in 22 days

started DmitrySoshnikov/at-regexp-machine

started time in 22 days

started DmitrySoshnikov/eva-source

started time in 22 days

started DmitrySoshnikov/lex-js

started time in 22 days

started DmitrySoshnikov/letter-rdp-source

started time in 22 days
