Alexander Shtuchkin ashtuchkin Dropbox San Francisco http://shtuchkin.com

ashtuchkin/iconv-lite 2425

Convert character encodings in pure javascript.

ashtuchkin/ec2-fleet 187

A distributed load test framework using Amazon EC2 instances.

ashtuchkin/node-millenium 120

Node.js 1 million HTTP Comet connections test

ashtuchkin/errTo 13

Simple error handling helper for Node.js/CoffeeScript

ashtuchkin/colorgame 7

Simple multiplayer game to learn & test Node.js/Socket.IO

ashtuchkin/node-detective 2

Find all calls to require() no matter how crazily nested using a proper walk of the AST

ashtuchkin/pongout 2

Pong/Breakout hybrid made on HTML5 Gaming Hackathon by 2Niversity

ashtuchkin/Deep-Learning-Papers-Reading-Roadmap 1

Deep Learning papers reading roadmap for anyone who is eager to learn this amazing tech!

ashtuchkin/DeepLearningBook 1

MIT Deep Learning Book in PDF format

issue comment ashtuchkin/iconv-lite

Support using Uint8Array as the "bytes" data type (instead of Buffer)

For the internal codec I wanted to drop the hex and base64 encodings; utf-16 is implemented, so the only thing left is utf-8. We should probably add custom functions to the backend to use perf-optimized algorithms from Buffer and TextEncoder/TextDecoder. Feel free to give it a shot; I haven't had free time this week, sorry.

ashtuchkin

comment created time in 3 days
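The comment above suggests routing utf-8 through the platform's optimized paths (Buffer in Node, TextEncoder/TextDecoder elsewhere). A minimal sketch of what such a backend function could look like; the `utf8Backend` name and its shape are illustrative assumptions, not iconv-lite's actual API:

```javascript
// Hypothetical backend-level utf-8 codec. Prefers Node's native Buffer
// fast path and falls back to the WHATWG TextEncoder/TextDecoder APIs,
// which work on plain Uint8Array in browsers.
const utf8Backend = {
    encode(str) {
        // Buffer.from(str, "utf8") uses an optimized native encoder in Node.
        return typeof Buffer !== "undefined"
            ? Buffer.from(str, "utf8")
            : new TextEncoder().encode(str);
    },
    decode(bytes) {
        // Buffer#toString("utf8") is the fast path when bytes is a Buffer;
        // TextDecoder handles any Uint8Array.
        return typeof Buffer !== "undefined" && Buffer.isBuffer(bytes)
            ? bytes.toString("utf8")
            : new TextDecoder("utf-8").decode(bytes);
    },
};
```

Both branches produce a Uint8Array-compatible result on encode, so callers don't need to know which path was taken.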

issue comment ashtuchkin/vive-diy-position-sensor

Question about OpenVR

This project is much more low-level. It does give you current 3d coordinates with some precision, but it's nowhere near the stability of the original Vive trackers. You'll also need to get these coordinates to your computer somehow (currently the coordinates are calculated on the device itself).

emo10001

comment created time in 3 days
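The comment above notes that coordinates are calculated on the device, so a host computer has to parse whatever the device streams out. Purely as an illustration (the real firmware's output format is an assumption here), a parser for newline-delimited "x y z" coordinate lines might look like:

```javascript
// Hypothetical: parse one "x y z" line streamed from the sensor over a
// serial link into a coordinate object. Returns null on malformed input.
function parseCoordLine(line) {
    const parts = line.trim().split(/\s+/).map(Number);
    if (parts.length !== 3 || parts.some(Number.isNaN)) return null;
    const [x, y, z] = parts;
    return { x, y, z };
}
```

A host-side reader would feed each received line through this and drop the nulls.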

pull request comment ashtuchkin/iconv-lite

convert dbcs codec and some tests

Awesome thank you!

gyzerok

comment created time in 11 days

push event ashtuchkin/iconv-lite

Fedor Nezhivoi

commit sha 5d99a923f2bb9352abf80f8aeb850d924a8a1e38

Convert dbcs codec and tests (#256)

view details

push time in 11 days

PR merged ashtuchkin/iconv-lite

Reviewers
convert dbcs codec and some tests

Here is the migration for dbcs. Again, I've tried to convert as much test code as I can. However, I still feel it's better for you to step in where iconv is used.

PS: it looked scarier than it actually was 😄

+673 -628

0 comments

8 changed files

gyzerok

pr closed time in 11 days

Pull request review comment ashtuchkin/iconv-lite

convert dbcs codec and some tests

+{+    "bytes": [

nit: maybe we can use a hex string here? Arrays of bytes don't look great when JSON-formatted.

gyzerok

comment created time in 13 days
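The nit above proposes storing fixture bytes as a hex string rather than a JSON array of numbers. A small sketch of the two conversion helpers such a fixture format would need; the helper names are illustrative:

```javascript
// Hypothetical fixture helpers: "8140" <-> Uint8Array [0x81, 0x40].
function bytesToHex(bytes) {
    // Each byte becomes exactly two lowercase hex digits.
    return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

function hexToBytes(hex) {
    const bytes = new Uint8Array(hex.length / 2);
    for (let i = 0; i < bytes.length; i++) {
        bytes[i] = parseInt(hex.substr(i * 2, 2), 16);
    }
    return bytes;
}
```

With these, `"bytes": "8140"` in a fixture stays compact and diff-friendly compared to `"bytes": [129, 64]`.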

Pull request review comment ashtuchkin/iconv-lite

convert dbcs codec and some tests

 require("../sbcs-test"); require("../turkish-test"); require("../utf16-test"); require("../utils-test");+require("../shiftjis-test");

Yep I think it's a great idea. Left a couple of nits and we're ready to merge.

gyzerok

comment created time in 13 days

Pull request review comment ashtuchkin/iconv-lite

convert dbcs codec and some tests

+"use strict";

nit: maybe merge this file with gen-gbk-test? They look pretty similar. We could name the merged file something like gen-gbk-big5-fixtures.js.

gyzerok

comment created time in 13 days

issue comment ashtuchkin/iconv-lite

Any posible to encode Arabic Text

@FazilMuhammed was it helpful?

FazilMuhammed

comment created time in 14 days

Pull request review comment ashtuchkin/iconv-lite

convert dbcs codec and some tests

[truncated diff excerpt, lib/dbcs-codec.js: the prototype-based DBCSCodec constructor and its helper methods (_getDecodeTrieNode, _addDecodeChunk, _getEncodeBucket, _setEncodeChar, _setEncodeSequence, _fillEncodeTable) are converted to an ES class with `encoder`/`decoder` getters, and DBCSEncoder becomes a class taking a `backend` argument, allocating output via `backend.allocBytes` instead of `Buffer.alloc`. The excerpt is cut off mid-hunk and the review comment itself was not captured.]
          leadSurrogate = -1;             } -            if (dbcsCode === UNASSIGNED && this.gb18030) {-                // Use GB18030 algorithm to find character(s) to write.-                const idx = findIdx(this.gb18030.uChars, uCode);-                if (idx !== -1) {-                    dbcsCode = this.gb18030.gbChars[idx] + (uCode - this.gb18030.uChars[idx]);-                    newBuf[j++] = 0x81 + Math.floor(dbcsCode / 12600);-                    dbcsCode = dbcsCode % 12600;-                    newBuf[j++] = 0x30 + Math.floor(dbcsCode / 1260);-                    dbcsCode = dbcsCode % 1260;-                    newBuf[j++] = 0x81 + Math.floor(dbcsCode / 10);-                    dbcsCode = dbcsCode % 10;-                    newBuf[j++] = 0x30 + dbcsCode;+            // 2. Convert uCode character.+            let dbcsCode = UNASSIGNED;+            if (seqObj !== undefined && uCode !== UNASSIGNED) {+                // We are in the middle of the sequence+                let resCode = seqObj[uCode];+                if (typeof resCode === "object") {+                    // Sequence continues.+                    seqObj = resCode;                     continue;+                } else if (typeof resCode == "number") {+                    // Sequence finished. Write it.+                    dbcsCode = resCode;+                } else if (resCode === undefined) {+                    // Current character is not part of the sequence.++                    // Try default character for this sequence+                    resCode = seqObj[DEF_CHAR];+                    if (resCode !== undefined) {+                        dbcsCode = resCode; // Found. Write it.+                        nextChar = uCode; // Current character will be written too in the next iteration.+                    } else {+                        // TODO: What if we have no default? 
(resCode == undefined)+                        // Then, we should write first char of the sequence as-is and try the rest recursively.+                        // Didn't do it for now because no encoding has this situation yet.+                        // Currently, just skip the sequence and write current char.+                    }+                }+                seqObj = undefined;+            } else if (uCode >= 0) {+                // Regular character+                const subtable = this.encodeTable[uCode >> 8];+                if (subtable !== undefined) dbcsCode = subtable[uCode & 0xff];++                if (dbcsCode <= SEQ_START) {+                    // Sequence start+                    seqObj = this.encodeTableSeq[SEQ_START - dbcsCode];+                    continue;+                }++                if (dbcsCode === UNASSIGNED && this.gb18030) {+                    // Use GB18030 algorithm to find character(s) to write.+                    const idx = findIdx(this.gb18030.uChars, uCode);+                    if (idx !== -1) {+                        dbcsCode = this.gb18030.gbChars[idx] + (uCode - this.gb18030.uChars[idx]);+                        bytes[bytePos++] = 0x81 + Math.floor(dbcsCode / 12600);+                        dbcsCode = dbcsCode % 12600;+                        bytes[bytePos++] = 0x30 + Math.floor(dbcsCode / 1260);+                        dbcsCode = dbcsCode % 1260;+                        bytes[bytePos++] = 0x81 + Math.floor(dbcsCode / 10);+                        dbcsCode = dbcsCode % 10;+                        bytes[bytePos++] = 0x30 + dbcsCode;+                        continue;+                    }                 }             }-        } -        // 3. Write dbcsCode character.-        if (dbcsCode === UNASSIGNED) {-            dbcsCode = this.defaultCharSingleByte;-        }+            // 3. 
Write dbcsCode character.+            if (dbcsCode === UNASSIGNED) {+                dbcsCode = this.defaultCharSingleByte;+            } -        if (dbcsCode < 0x100) {-            newBuf[j++] = dbcsCode;-        } else if (dbcsCode < 0x10000) {-            newBuf[j++] = dbcsCode >> 8; // high byte-            newBuf[j++] = dbcsCode & 0xff; // low byte-        } else if (dbcsCode < 0x1000000) {-            newBuf[j++] = dbcsCode >> 16;-            newBuf[j++] = (dbcsCode >> 8) & 0xff;-            newBuf[j++] = dbcsCode & 0xff;-        } else {-            newBuf[j++] = dbcsCode >>> 24;-            newBuf[j++] = (dbcsCode >>> 16) & 0xff;-            newBuf[j++] = (dbcsCode >>> 8) & 0xff;-            newBuf[j++] = dbcsCode & 0xff;+            if (dbcsCode < 0x100) {+                bytes[bytePos++] = dbcsCode;+            } else if (dbcsCode < 0x10000) {+                bytes[bytePos++] = dbcsCode >> 8; // high byte+                bytes[bytePos++] = dbcsCode & 0xff; // low byte+            } else if (dbcsCode < 0x1000000) {+                bytes[bytePos++] = dbcsCode >> 16;+                bytes[bytePos++] = (dbcsCode >> 8) & 0xff;+                bytes[bytePos++] = dbcsCode & 0xff;+            } else {+                bytes[bytePos++] = dbcsCode >>> 24;+                bytes[bytePos++] = (dbcsCode >>> 16) & 0xff;+                bytes[bytePos++] = (dbcsCode >>> 8) & 0xff;+                bytes[bytePos++] = dbcsCode & 0xff;+            }         }-    } -    this.seqObj = seqObj;-    this.leadSurrogate = leadSurrogate;-    return newBuf.slice(0, j);-};--DBCSEncoder.prototype.end = function () {-    if (this.leadSurrogate === -1 && this.seqObj === undefined) {-        return undefined; // All clean. 
Most often case.+        this.seqObj = seqObj;+        this.leadSurrogate = leadSurrogate;+        return this.backend.bytesToResult(bytes, bytePos);     } -    const newBuf = Buffer.alloc(10);-    let j = 0;+    end() {+        if (this.leadSurrogate === -1 && this.seqObj === undefined) {+            return undefined; // All clean. Most often case.+        } -    if (this.seqObj) {-        // We're in the sequence.-        const dbcsCode = this.seqObj[DEF_CHAR];-        if (dbcsCode !== undefined) {-            // Write beginning of the sequence.-            if (dbcsCode < 0x100) {-                newBuf[j++] = dbcsCode;+        const bytes = this.backend.allocBytes(10);+        let bytePos = 0;++        if (this.seqObj) {+            // We're in the sequence.+            const dbcsCode = this.seqObj[DEF_CHAR];+            if (dbcsCode !== undefined) {+                // Write beginning of the sequence.+                if (dbcsCode < 0x100) {+                    bytes[bytePos++] = dbcsCode;+                } else {+                    bytes[bytePos++] = dbcsCode >> 8; // high byte+                    bytes[bytePos++] = dbcsCode & 0xff; // low byte+                }             } else {-                newBuf[j++] = dbcsCode >> 8; // high byte-                newBuf[j++] = dbcsCode & 0xff; // low byte+                // See todo above.             
}-        } else {-            // See todo above.+            this.seqObj = undefined;         }-        this.seqObj = undefined;-    } -    if (this.leadSurrogate !== -1) {-        // Incomplete surrogate pair - only lead surrogate found.-        newBuf[j++] = this.defaultCharSingleByte;-        this.leadSurrogate = -1;-    }+        if (this.leadSurrogate !== -1) {+            // Incomplete surrogate pair - only lead surrogate found.+            bytes[bytePos++] = this.defaultCharSingleByte;+            this.leadSurrogate = -1;+        } -    return newBuf.slice(0, j);-};+        return this.backend.bytesToResult(bytes, bytePos);+    } -// Export for testing-DBCSEncoder.prototype.findIdx = findIdx;+    // Export for testing+    findIdx(table, val) {+        return findIdx(table, val);+    }+}  // == Decoder ================================================================== -function DBCSDecoder(options, codec) {-    // Decoder state-    this.nodeIdx = 0;-    this.prevBytes = [];+class DBCSDecoder {+    constructor(options, codec, backend) {+        this.backend = backend; -    // Static data-    this.decodeTables = codec.decodeTables;-    this.decodeTableSeq = codec.decodeTableSeq;-    this.defaultCharUnicode = codec.defaultCharUnicode;-    this.gb18030 = codec.gb18030;-}+        // Decoder state+        this.nodeIdx = 0;+        this.prevBytes = []; -DBCSDecoder.prototype.write = function (buf) {-    const newBuf = Buffer.alloc(buf.length * 2),-        prevBytes = this.prevBytes,-        prevOffset = this.prevBytes.length;--    let nodeIdx = this.nodeIdx,-        seqStart = -this.prevBytes.length, // idx of the start of current parsed sequence.-        j = 0;--    for (let i = 0; i < buf.length; i++) {-        const curByte = i >= 0 ? 
buf[i] : prevBytes[i + prevOffset];--        // TODO: Check curByte is number 0 <= < 256--        // Lookup in current trie node.-        let uCode = this.decodeTables[nodeIdx][curByte];--        if (uCode >= 0) {-            // Normal character, just use it.-        } else if (uCode === UNASSIGNED) {-            // Unknown char.-            // TODO: Callback with seq.-            uCode = this.defaultCharUnicode.charCodeAt(0);-            i = seqStart; // Skip one byte ('i' will be incremented by the for loop) and try to parse again.-        } else if (uCode === GB18030_CODE) {-            const b1 = i >= 3 ? buf[i - 3] : prevBytes[i - 3 + prevOffset];-            const b2 = i >= 2 ? buf[i - 2] : prevBytes[i - 2 + prevOffset];-            const b3 = i >= 1 ? buf[i - 1] : prevBytes[i - 1 + prevOffset];-            const ptr =-                (b1 - 0x81) * 12600 + (b2 - 0x30) * 1260 + (b3 - 0x81) * 10 + (curByte - 0x30);-            const idx = findIdx(this.gb18030.gbChars, ptr);-            uCode = this.gb18030.uChars[idx] + ptr - this.gb18030.gbChars[idx];-        } else if (uCode <= NODE_START) {-            // Go to next trie node.-            nodeIdx = NODE_START - uCode;-            continue;-        } else if (uCode <= SEQ_START) {-            // Output a sequence of chars.-            const seq = this.decodeTableSeq[SEQ_START - uCode];-            for (let k = 0; k < seq.length - 1; k++) {-                uCode = seq[k];-                newBuf[j++] = uCode & 0xff;-                newBuf[j++] = uCode >> 8;-            }-            uCode = seq[seq.length - 1];-        } else-            throw new Error(-                `iconv-lite internal error: invalid decoding table value ${uCode} at ${nodeIdx}/${curByte}`-            );+        // Static data+        this.decodeTables = codec.decodeTables;+        this.decodeTableSeq = codec.decodeTableSeq;+        this.defaultCharUnicode = codec.defaultCharUnicode;+        this.gb18030 = codec.gb18030;+    } -        // 
Write the character to buffer, handling higher planes using surrogate pair.-        if (uCode >= 0x10000) {-            uCode -= 0x10000;-            const uCodeLead = 0xd800 | (uCode >> 10);-            newBuf[j++] = uCodeLead & 0xff;-            newBuf[j++] = uCodeLead >> 8;+    write(buf) {+        const chars = this.backend.allocRawChars(buf.length),+            prevBytes = this.prevBytes,+            prevOffset = this.prevBytes.length;++        let nodeIdx = this.nodeIdx,+            seqStart = -this.prevBytes.length, // idx of the start of current parsed sequence.+            charPos = 0;++        for (let i = 0; i < buf.length; i++) {+            const curByte = i >= 0 ? buf[i] : prevBytes[i + prevOffset];++            // TODO: Check curByte is number 0 <= < 256++            // Lookup in current trie node.+            let uCode = this.decodeTables[nodeIdx][curByte];++            if (uCode >= 0) {+                // Normal character, just use it.+            } else if (uCode === UNASSIGNED) {+                // Unknown char.+                // TODO: Callback with seq.+                uCode = this.defaultCharUnicode.charCodeAt(0);+                i = seqStart; // Skip one byte ('i' will be incremented by the for loop) and try to parse again.+            } else if (uCode === GB18030_CODE) {+                const b1 = i >= 3 ? buf[i - 3] : prevBytes[i - 3 + prevOffset];+                const b2 = i >= 2 ? buf[i - 2] : prevBytes[i - 2 + prevOffset];+                const b3 = i >= 1 ? 
buf[i - 1] : prevBytes[i - 1 + prevOffset];+                const ptr =+                    (b1 - 0x81) * 12600 + (b2 - 0x30) * 1260 + (b3 - 0x81) * 10 + (curByte - 0x30);+                const idx = findIdx(this.gb18030.gbChars, ptr);+                uCode = this.gb18030.uChars[idx] + ptr - this.gb18030.gbChars[idx];+            } else if (uCode <= NODE_START) {+                // Go to next trie node.+                nodeIdx = NODE_START - uCode;+                continue;+            } else if (uCode <= SEQ_START) {+                // Output a sequence of chars.+                const seq = this.decodeTableSeq[SEQ_START - uCode];+                for (let k = 0; k < seq.length - 1; k++) {+                    uCode = seq[k];+                    chars[charPos++] = uCode;+                }+                uCode = seq[seq.length - 1];+            } else+                throw new Error(+                    `iconv-lite internal error: invalid decoding table value ${uCode} at ${nodeIdx}/${curByte}`+                );++            // Write the character to buffer, handling higher planes using surrogate pair.+            if (uCode >= 0x10000) {+                uCode -= 0x10000;+                const uCodeLead = 0xd800 | (uCode >> 10);+                chars[charPos++] = uCodeLead;++                uCode = 0xdc00 | (uCode & 0x3ff);+            }+            chars[charPos++] = uCode;
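The backend pattern introduced by the diff above, as a minimal sketch. Here `uint8Backend` and `SingleByteEncoder` are illustrative stand-ins, not library code; only the `allocBytes`/`bytesToResult` method names come from the diff:

```javascript
// Codecs allocate storage through a backend object instead of Buffer directly,
// so Node.js can supply a Buffer-based backend and browsers a Uint8Array one.
const uint8Backend = {
    allocBytes(len, fill) {
        return new Uint8Array(len).fill(fill || 0);
    },
    bytesToResult(bytes, len) {
        return bytes.subarray(0, len);
    },
};

class SingleByteEncoder {
    constructor(backend) {
        this.backend = backend;
    }
    write(str) {
        const bytes = this.backend.allocBytes(str.length);
        let bytePos = 0;
        for (let i = 0; i < str.length; i++) {
            const code = str.charCodeAt(i);
            bytes[bytePos++] = code < 0x80 ? code : 0x3f; // '?' for unmapped chars
        }
        return this.backend.bytesToResult(bytes, bytePos);
    }
}
```

With this split, an encoder like the DBCS one above only ever touches `allocBytes`/`bytesToResult`, never Buffer itself.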

This is what I'd hoped for: all decoders used to do this byte arithmetic themselves; now they just write 16-bit codes directly.

gyzerok

comment created time in 15 days
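The 16-bit shortcut praised in this comment, as a before/after sketch. Both helpers are illustrative stand-ins, not library code; the assumption (per the diff) is that `allocRawChars` returns a Uint16Array:

```javascript
// Before: decoders split each code unit into two bytes of a UCS-2 byte buffer.
function writeCharOld(newBuf, j, uCode) {
    newBuf[j] = uCode & 0xff; // low byte
    newBuf[j + 1] = uCode >> 8; // high byte
    return j + 2;
}

// After: decoders write the 16-bit code unit directly into a raw char array.
function writeCharNew(chars, charPos, uCode) {
    chars[charPos] = uCode; // a Uint16Array element takes the code unit whole
    return charPos + 1;
}

// The two forms carry the same information: a UCS-2 byte pair vs one element.
const bytes = new Uint8Array(2);
writeCharOld(bytes, 0, 0x4e2d); // '中'
const chars = new Uint16Array(1);
writeCharNew(chars, 0, 0x4e2d);
```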

Pull request review comment ashtuchkin/iconv-lite

convert dbcs codec and some tests

 require("../sbcs-test");
 require("../turkish-test");
 require("../utf16-test");
 require("../utils-test");
+require("../shiftjis-test");

I'm wondering if we can add test/big5-test.js here too. For gbk-test.js, I assume you skipped it because it reads a file; maybe we can inline it, or convert it to a JSON file with a single string so that it can be require()-d?

gyzerok

comment created time in 15 days

Pull request review comment ashtuchkin/iconv-lite

convert dbcs codec and some tests

 var fs = require("fs"),
     assert = require("assert"),
-    Buffer = require("safer-buffer").Buffer,
-    iconv = require("../");
+    utils = require("./utils"),
+    iconv = utils.requireIconv();

 var testString = "中国abc", //unicode contains GBK-code and ascii
-    testStringGBKBuffer = Buffer.from([0xd6, 0xd0, 0xb9, 0xfa, 0x61, 0x62, 0x63]);
+    testStringGBKBuffer = utils.bytes("d6 d0 b9 fa 61 62 63");

 describe("GBK tests", function () {
     it("GBK correctly encoded/decoded", function () {
         assert.strictEqual(
-            iconv.encode(testString, "GBK").toString("binary"),
-            testStringGBKBuffer.toString("binary")
+            utils.hex(iconv.encode(testString, "GBK")),
+            utils.hex(testStringGBKBuffer)
         );
         assert.strictEqual(iconv.decode(testStringGBKBuffer, "GBK"), testString);
     });

     it("GB2312 correctly encoded/decoded", function () {
         assert.strictEqual(
-            iconv.encode(testString, "GB2312").toString("binary"),
-            testStringGBKBuffer.toString("binary")
+            utils.hex(iconv.encode(testString, "GB2312")),
+            utils.hex(testStringGBKBuffer)

yep you're right.

gyzerok

comment created time in 15 days

pull request comment ashtuchkin/iconv-lite

convert sbcs codec and some tests

Utf32 or dbcs would be next, I guess. Both will require some diving in, so I'd understand if you don't want to spend more time on this than you planned.

On Sat, Jul 18, 2020, 00:00 Fedor Nezhivoi notifications@github.com wrote:

@ashtuchkin https://github.com/ashtuchkin this is great news! I was missing prettier in the workflow very much :)

Which codec do you think would be better for me to take? Probably in my case better == simplier to migrate.


gyzerok

comment created time in 17 days

push event ashtuchkin/iconv-lite

Alexander Shtuchkin

commit sha a1bd8f7c95a854d75637a2845a9de37c441ff77d

Removed dependency on 'iconv' from sbcs-test.js and added it to web suite

To do that I've added a generation step and store the data in test/tables/ folder.

view details

push time in 18 days

push event ashtuchkin/iconv-lite

Alexander Shtuchkin

commit sha 2be15b038afa435aae0542ff05b73ac60e3bb52a

Added more eslint rules and fixed errors

view details

push time in 19 days

push event ashtuchkin/iconv-lite

Alexander Shtuchkin

commit sha f49c5843ec8687fa14e3e7b788a5b9eee837e93f

(minor) Rename utils.bytesFrom() to bytes() and make it accept hex strings.

view details

push time in 19 days

push event ashtuchkin/iconv-lite

Alexander Shtuchkin

commit sha 9bb9d83b5fd51800be309eb334db6b0be4a4d877

Add strict mode everywhere.

view details

push time in 19 days

push event ashtuchkin/iconv-lite

Alexander Shtuchkin

commit sha 67f91b98283e72a801a734186eb8b3e8e47609c6

(minor) remove __dirname from require()-s

view details

push time in 19 days

push event ashtuchkin/iconv-lite

Alexander Shtuchkin

commit sha 5da0746f46b3fa92d94eeedaedf2a7e53fb49ff6

Apply ESLint to tests

view details

Alexander Shtuchkin

commit sha c16052d9a5974330f24d86bf2efca435749671f6

Apply prettier to tests

view details

push time in 19 days

push event ashtuchkin/iconv-lite

Alexander Shtuchkin

commit sha 0d01f158da7d1b319ac359216dd95985df638877

Move generation-specific deps to a separate package.json

view details

push time in 19 days

pull request comment ashtuchkin/iconv-lite

convert sbcs codec and some tests

Thank you! Note I've added eslint/prettier integration on the latest master, so be sure to rebase if you decide to convert something else.

gyzerok

comment created time in 19 days

delete branch ashtuchkin/iconv-lite

delete branch : linters

delete time in 19 days

push event ashtuchkin/iconv-lite

Alexander Shtuchkin

commit sha 141a8dd042c5bc7aa1d92a2105b11e232cfe0fd2

Added ESLint and Prettier.

view details

push time in 19 days

push event ashtuchkin/iconv-lite

Fedor Nezhivoi

commit sha 228af9c51bee45f4221177f276d6cbd4c0dbbfd4

Convert sbcs codec and some tests to use backend (#255)

view details

Alexander Shtuchkin

commit sha 141a8dd042c5bc7aa1d92a2105b11e232cfe0fd2

Added ESLint and Prettier.

view details

push time in 19 days

push event ashtuchkin/iconv-lite

Fedor Nezhivoi

commit sha 228af9c51bee45f4221177f276d6cbd4c0dbbfd4

Convert sbcs codec and some tests to use backend (#255)

view details

push time in 19 days

PR merged ashtuchkin/iconv-lite

convert sbcs codec and some tests

Here is the update for SBCS.

My only problem is that I'm not sure how to convert sbcs-test.js: it uses iconv, and I'm not sure what to do with some usages of Buffer there. Maybe it'd be better for you to take a look at this suite?

Otherwise I think everything is there.

+211 -154

3 comments

5 changed files

gyzerok

pr closed time in 19 days

push event ashtuchkin/iconv-lite

Alexander Shtuchkin

commit sha b96bb11686befd9f3451cab589e735fd87c7e8ea

Added ESLint and Prettier.

view details

push time in 19 days

push event ashtuchkin/iconv-lite

Alexander Shtuchkin

commit sha d22aeee0218a709fc25815f717078b3ea225701d

Added ESLint and Prettier.

view details

push time in 19 days

create branch ashtuchkin/iconv-lite

branch : linters

created branch time in 19 days

issue comment ashtuchkin/iconv-lite

Any possible way to encode Arabic text?

Looks like someone is reporting success using a similar method: https://github.com/song940/node-escpos/issues/136#issuecomment-401030947

The encoding number will be different (see the linked PDF file for the list), but the overall approach looks feasible.


FazilMuhammed

comment created time in 20 days

issue comment ashtuchkin/iconv-lite

Any possible way to encode Arabic text?

You probably need to switch the encoding of the printer itself. Have you seen https://stackoverflow.com/a/61836924/325300 ? Try sending these bytes and then the encoded string. Use "cp864" as the encoding for iconv-lite.
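The recipe above, as a sketch: prepend an ESC/POS "select character code table" command (ESC t n) to the cp864-encoded bytes. The code page number n is printer-specific and must be looked up in the printer's manual; the value 37 below is a placeholder assumption, and the encoded bytes are stand-ins for the real output of iconv.encode(text, "cp864"):

```javascript
// Build the payload the printer expects: ESC t n, then the encoded text.
// `codePageNumber` varies by printer model; 37 here is NOT a verified value.
function buildPrintPayload(encodedText, codePageNumber) {
    const selectCodePage = Buffer.from([0x1b, 0x74, codePageNumber]); // ESC t n
    return Buffer.concat([selectCodePage, encodedText]);
}

// Stand-in bytes for illustration; replace with iconv.encode(text, "cp864").
const encoded = Buffer.from([0xc7, 0xe1]);
const payload = buildPrintPayload(encoded, 37);
```

The payload would then be sent to the device in place of the raw string.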

On Thu, Jul 16, 2020, 06:46 FazilMuhammed notifications@github.com wrote:

@ashtuchkin https://github.com/ashtuchkin Thanks for your response

I'm a React Native developer. I need to print Arabic text on a payment slip. With the help of the library react-native-bluetooth-serial-next (https://github.com/nuttawutmalee/react-native-bluetooth-serial-next) I can print English text easily, but I cannot print Arabic text properly. This is my Bluetooth thermal printer: https://www.amazon.com/Portable-Thermal-Printer-Wireless-Bluetooth/dp/B075VKXBJW. The Arabic text is static, generated by my program. When the connection between the Bluetooth device and my phone succeeds, I send my text and messages to this function.

I cannot send the message to the printer device directly; it only works after encoding.

First of all, I import the library into my project:

    var iconv = require("iconv-lite");

    writePackets = async (id, message, packetSize = 64) => {
        let toWrite;
        try {
            const device = BluetoothSerial.device(id);

            toWrite = iconv.encode(message, "win1251");

            const writePromises = [];
            const packetCount = Math.ceil(toWrite.length / packetSize);

            for (var i = 0; i < packetCount; i++) {
                const packet = new Buffer(packetSize);
                packet.fill(" ");
                toWrite.copy(packet, 0, i * packetSize, (i + 1) * packetSize);
                writePromises.push(device.write(packet));
            }

            await Promise.all(writePromises).then(() =>
                Toast.showShortBottom("Writed packets")
            );
        } catch (e) {
            Toast.showShortBottom(e.message);
        }
    };

English text already prints fine, but I cannot print other languages, mainly Arabic. If you could please help me ;)


FazilMuhammed

comment created time in 20 days

pull request comment ashtuchkin/iconv-lite

convert sbcs codec and some tests

Looks like master is green now. Could you rebase? I'll merge tomorrow.

gyzerok

comment created time in 21 days

Pull request review comment ashtuchkin/iconv-lite

convert sbcs codec and some tests

[Diff context: lib/sbcs-codec.js. SBCSCodec becomes a class: the decode table is now a Uint16Array of char codes built from codecOptions.chars, the encode table is allocated with iconv.backend.allocBytes(65536, iconv.defaultCharSingleByte.charCodeAt(0)), and SBCSEncoder/SBCSDecoder become classes whose write() methods fill backend-allocated storage (bytes[i] = this.encodeBuf[str.charCodeAt(i)]) instead of Buffers.]

love how concise that is

gyzerok

comment created time in 21 days
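The conciseness praised in the review above comes from precomputing a 65536-entry lookup table once, so encoding is a single array index per character. A reduced sketch of the same idea (simplified to an ASCII-only table; not the library's actual code):

```javascript
// Build a 64K lookup table once: Unicode code unit -> encoded byte,
// with unmapped characters falling back to '?'. The real codec fills the
// table from `codecOptions.chars`; here it is identity for ASCII only.
const encodeBuf = new Uint8Array(65536).fill("?".charCodeAt(0));
for (let i = 0; i < 128; i++) encodeBuf[i] = i;

function sbcsEncode(str) {
    const bytes = new Uint8Array(str.length);
    for (let i = 0; i < str.length; i++) bytes[i] = encodeBuf[str.charCodeAt(i)];
    return bytes;
}
```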

issue comment ashtuchkin/iconv-lite

Any possible way to encode Arabic text?

Ok, could you answer each of these questions please:

  1. What operating system do you use? Can you see this string in Arabic -> اللغة العربية?
  2. Where are you getting your Arabic text from? Is it generated by your program or by some external resource (like a website or a text file)? If you can share it, please copy-paste a short sample of it here or attach a file.
  3. Do you know what encoding is used in that source? Utf8? Win1256? Something else?
  4. Where exactly are you trying to print the Arabic text? Is it a website, a console, something else? Please describe.
  5. How are you trying to print the Arabic text there? Please copy-paste the code you use.
  6. What have you tried already? Did you get to the point where you see Arabic characters or not?
FazilMuhammed

comment created time in 21 days

push event ashtuchkin/iconv-lite

Alexander Shtuchkin

commit sha 9aa082fa7f7fb459b579fb44f859d416a1159019

Implement UTF-16LE encoding, update tests, adjust codec interface

Three major reasons for reimplementing UTF-16 and not use native codec:
1. We want to remove StringDecoder & Buffer references due to #235.
2. StringDecoder is inconsistent with handling surrogates on Node v6-9
3. NPM module string_decoder gives strange results when processing chunks - it sometimes prepends '\u0000', likely due to a bug.

Performance was and is a major concern here. Decoder shouldn't be affected because it uses backend methods directly. Encoder is affected due to introducing character-level loop. It's still very fast (~450Mb/s), so I'm not too worried. If needed, we can make it about 4x faster in Node.js by introducing a dedicated backend method. Browser speeds will be the same.

view details

push time in 21 days

push eventashtuchkin/iconv-lite

Alexander Shtuchkin

commit sha 84ee65055124659c9f26a906eadd575e71f01b8a

Implement UTF-16LE encoding, update tests, adjust codec interface Three major reasons for reimplementing UTF-16 and not use native codec: 1. We want to remove StringDecoder & Buffer references due to #235. 2. StringDecoder is inconsistent with handling surrogates on Node v6-9 3. NPM module string_decoder gives strange results when processing chunks - it sometimes prepends '\u0000', likely due to a bug. Performance was and is a major concern here. Decoder shouldn't be affected because it uses backend methods directly. Encoder is affected due to introducing character-level loop. It's still very fast (~450Mb/s), so I'm not too worried. If needed, we can make it about 4x faster in Node.js by introducing a dedicated backend method. Browser speeds will be the same.

view details

push time in 21 days

issue commentashtuchkin/iconv-lite

Any posible to encode Arabic Text

There are several Arabic encodings supported, what exactly do you need?

FazilMuhammed

comment created time in 21 days

push eventashtuchkin/iconv-lite

Alexander Shtuchkin

commit sha d634166c3a792d9a735669852496e43ea761b4bf

Implement UTF-16LE encoding, update tests, adjust codec interface Two major reasons for reimplementing UTF-16 and not use native codec: 1. We want to remove StringDecoder & Buffer references due to #235. 2. Npm-based StringDecoder gives strange results when processing chunks - it sometimes prepends '\u0000', likely due to a bug. Performance was and is a major concern here. Decoder shouldn't be affected because it uses backend methods directly. Encoder is affected due to introducing character-level loop. It's still very fast (~450Mb/s), so I'm not too worried. If needed, we can make it about 4x faster in Node.js by introducing a dedicated backend method. Browser speeds will be the same.

view details

push time in 21 days

issue openedashtuchkin/iconv-lite

Mechanism to add encodings from external npm packages

Some encodings are very rare (e.g. utf7 and iso-2022-jp #60), so it's always a hard decision to include them in iconv-lite, as they add space and memory requirements. To make this choice simpler, it would be nice to add an extension mechanism that would make it easy to add them as separate npm packages.

Something like this:

const iconv = require("iconv-lite");
iconv.addEncoding(require("encoding-iso-2022-jp"));

What we'll need:

  • [ ] Solidify & publish codec interface.
  • [ ] Solidify & publish backend interface.
  • [ ] Add addEncoding() function. It will by default check that the new package does not override any existing encodings/aliases. This behavior can be disabled by passing {replace: true} as a second arg. Function should be idempotent.
  • [ ] Extract utf7 as an example.
  • [ ] Create a test harness that would help authors make sure their codecs work on the whole range of supported environments.

Question: would it be better to use immutable-like interface iconv = iconv.withEncoding(require('...'));? It would help with testability, but overall I think it's not worth it, as users will have to either do it in every file that needs iconv, or pass it around in some kind of a global.
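
To make the discussion concrete, a minimal sketch of what addEncoding() could do (the package shape — a plain { name: definition } map — and all names here are assumptions, not a published interface):

```javascript
// Hypothetical sketch, not the actual iconv-lite API.
function addEncoding(iconv, pkg, opts) {
    const replace = opts && opts.replace;
    for (const name of Object.keys(pkg)) {
        const existing = iconv.encodings[name];
        // Idempotent: re-adding the exact same definition is a no-op.
        if (existing !== undefined && existing !== pkg[name] && !replace)
            throw new Error("Encoding '" + name + "' is already registered");
        iconv.encodings[name] = pkg[name];
    }
}

// Usage would then be close to the snippet above:
//   addEncoding(iconv, require("encoding-iso-2022-jp"));
//   addEncoding(iconv, require("encoding-utf7"), { replace: true });
```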

created time in 22 days

issue commentashtuchkin/iconv-lite

Support using Uint8Array as the "bytes" data type (instead of Buffer)

It's been a long time coming :)

Alexander Shtuchkin

On Tue, Jul 14, 2020 at 5:47 PM Fedor Nezhivoi notifications@github.com wrote:

Awesome! Love the ES6 vibe 😄

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ashtuchkin/iconv-lite/issues/235#issuecomment-658430269, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEZKHMZGCKFXXJCMCU3S4DR3TG7BANCNFSM4NPAMD5Q .

ashtuchkin

comment created time in 22 days

push eventashtuchkin/iconv-lite

Alexander Shtuchkin

commit sha e5678496c0637df5591a1b097add9c6770854d3e

Introduce the concept of backends * Add two backends: node & web * Convert core lib files to use the backends (and not use Buffer) * Convert utf16 codec as an example * Add testing for both node side and webpack * Bump Node.js minimal supported version to 4.5.0 and modernize some existing code. This will allow us to get rid of safer-buffer, our only dependency.

view details

push time in 22 days

delete branch ashtuchkin/safer-buffer

delete branch : patch-1

delete time in 22 days

push eventashtuchkin/iconv-lite

Alexander Shtuchkin

commit sha 50c4f14b91096148b120dd8936449c102fc645bf

Introduce the concept of backends * Add two backends: node & web * Convert core lib files to use the backends (and not use Buffer) * Convert utf16 codec as an example * Add testing for both node side and webpack * Bump Node.js minimal supported version to 4.5.0 and modernize some existing code. This will allow us to get rid of safer-buffer, our only dependency.

view details

push time in 22 days

push eventashtuchkin/iconv-lite

Alexander Shtuchkin

commit sha c7956deb31aaa4ff2c200f0cddd484c5fa7b1bf6

Introduce the concept of backends * Add two backends: node & web * Convert core lib files to use the backends (and not use Buffer) * Convert utf16 codec as an example * Add testing for both node side and webpack * Bump Node.js minimal supported version to 4.5.0 and modernize some existing code. This will allow us to get rid of safer-buffer, our only dependency.

view details

push time in 22 days

push eventashtuchkin/iconv-lite

Alexander Shtuchkin

commit sha abe46baf8fdff94bb380fbf8482eef1aae1b0e1a

Introduce the concept of backends * Add two backends: node & web * Convert core lib files to use the backends (and not use Buffer) * Convert utf16 codec as an example * Add testing for both node side and webpack * Bump Node.js minimal supported version to 4.5.0 and modernize some existing code. This will allow us to get rid of safer-buffer, our only dependency.

view details

push time in 22 days

issue commentashtuchkin/iconv-lite

Support using ArrayBuffer as the "bytes" data type (instead of Buffer)

@gyzerok I've created "backends" concept, see 3ea654675d696338fe9ea56be13122f49a8aac07. This will allow you to help with converting other codecs.

ashtuchkin

comment created time in 22 days

push eventashtuchkin/iconv-lite

Alexander Shtuchkin

commit sha 3ea654675d696338fe9ea56be13122f49a8aac07

Introduce the concept of backends * Add two backends: node & web * Convert core lib files to use the backends (and not use Buffer) * Convert utf16 codec as an example * Add testing for both node side and webpack * Bump Node.js minimal supported version to 4.5.0 and modernize some existing code. This will allow us to get rid of safer-buffer, our only dependency.

view details

push time in 22 days

push eventashtuchkin/iconv-lite

Alexander Shtuchkin

commit sha 9627ecf3dd35d72f0a764fcc31f083ffbcb044b1

Fix webpack-test

view details

push time in 23 days

PR opened ChALkeR/safer-buffer

Adjust initial Node 5.x version to support new Buffer APIs

See https://github.com/nodejs/node/blob/master/doc/changelogs/CHANGELOG_V5.md#5.10.0

+1 -1

0 comment

1 changed file

pr created time in 23 days

push eventashtuchkin/safer-buffer

Alexander Shtuchkin

commit sha 1b123eef5890b05605edd9395965ce1b2244ae46

Adjust initial Node 5.x version to support new Buffer APIs See https://github.com/nodejs/node/blob/master/doc/changelogs/CHANGELOG_V5.md#5.10.0

view details

push time in 23 days

fork ashtuchkin/safer-buffer

Modern Buffer API polyfill without footguns

fork in 23 days

issue commentashtuchkin/iconv-lite

Support using ArrayBuffer as the "bytes" data type (instead of Buffer)

Sounds good. I'll create the foundation.

Alexander Shtuchkin

On Sat, Jul 11, 2020 at 9:16 PM Fedor Nezhivoi notifications@github.com wrote:

Overall your plan sounds very solid. I think we can go with it and adjust along the way if need be.

Ideally I would chunk it into smaller pieces which could be done independently. At least for me it makes it easier to move forward when I can complete something within 2-4 hours timeframe when I have some free time.

Would it be possible for you to draft this smaller pieces? At least some initial ones. For example I can look at the implementation of the BufferAPI you've proposed in the browser.

First, let's clearly define a success criteria for this project.

Your criteria sounds exactly right, let's go with it 👍

Second, to make webpack/browserify not include Buffer shim, we need to be careful about dependencies and dependency injections.

Honestly I don't know how well browser entry point is supported by the tooling. In any case consumers will be able to import the version they need directly.

We also need a "concat" operation on the Buffer/Uint8Array type in several places

Hm, I've thought you removed Buffer.concat usage all together.

The only hard part here would be streaming utf8, but we can use StringDecoder implementation like you proposed.

Actually I completely forgot to mention that TextDecoder supports streaming, so technically we might not even need StringDecoder.

Also I've discovered that Node has support for TextDecoder https://nodejs.org/api/util.html#util_class_util_textdecoder and TextEncoder https://nodejs.org/api/util.html#util_class_util_textencoder APIs from version 8.3.

This in practice means that we get utf8 encode/decode for free in both environments. Of course good for you to double-check since I am no pro in encoding business :) Is

Also TextDecoder in both environments does seem to support a lot of other encodings. I am wondering if it can be used to our benefit. I would at least hypothesize that it is implemented in C and thus can give performance improvements over JavaScript implementation. Anyway this a bit off-topic to the current goal.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ashtuchkin/iconv-lite/issues/235#issuecomment-657158328, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEZKHMTPWQCPKM2HDKEYDLR3EFGFANCNFSM4NPAMD5Q .

ashtuchkin

comment created time in 24 days

issue commentmicrosoft/vscode

File with Chinese chars appears as changed right after opening

TextDecoder is definitely not stateless afaik. It assumes decode() is being called on successive chunks.
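
That statefulness is easy to demonstrate with the WHATWG API (global in browsers and modern Node.js):

```javascript
const dec = new TextDecoder('utf-8');

// '中' (U+4E2D) is e4 b8 ad in UTF-8; split it across two chunks.
const part1 = dec.decode(new Uint8Array([0xe4, 0xb8]), { stream: true });
const part2 = dec.decode(new Uint8Array([0xad])); // final call flushes

// part1 is '': the incomplete bytes were buffered inside the decoder
// and only emitted once the final chunk completed the sequence.
console.log(JSON.stringify(part1), part1 + part2);
```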

mkvoya

comment created time in 24 days

issue commentmicrosoft/vscode

File with Chinese chars appears as changed right after opening

Is this utf8 decoding? The problem could be that a utf byte sequence that corresponds to a single character is split into 2 chunks, so naive decoding of chunks doesn't work. iconv-lite uses StringDecoder class to handle situations like this.
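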

On Sat, Jul 11, 2020, 11:29 Benjamin Pasero notifications@github.com wrote:

Thanks, I can reproduce with the following steps:

[image: image] https://user-images.githubusercontent.com/900690/87227509-cde67080-c39b-11ea-82cb-6ef1a952d321.png

This is a regression from d3c4b4b https://github.com/microsoft/vscode/commit/d3c4b4b4a0683af01748942a004871ae99ef69a5 where I am no longer decoding via iconv-lite when the encoding is UTF-8.

@gyzerok https://github.com/gyzerok @ashtuchkin https://github.com/ashtuchkin I might need your help here. Is there anything iconv-lite does for UTF-8 that would explain it? My naive assumption would be that for UTF-8, iconv-lite is not decoding anything and simply returns the buffer that was given, but stepping through the code with a debugger, I am seeing a few lines where I am not sure if they could be an explanation. For example:

// Returns all complete UTF-8 characters in a Buffer. If the Buffer ended on a
// partial character, the character's bytes are buffered until the required
// number of bytes are available.
function utf8Text(buf, i) {
    var total = utf8CheckIncomplete(this, buf, i);
    if (!this.lastNeed) return buf.toString('utf8', i);
    this.lastTotal = total;
    var end = buf.length - (total - this.lastNeed);
    buf.copy(this.lastChar, 0, end);
    return buf.toString('utf8', i, end);
}

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/microsoft/vscode/issues/102202#issuecomment-657081151, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEZKHNLBBKICXQFNUONOIDR3CAMNANCNFSM4OXHIWWA .

mkvoya

comment created time in 25 days

issue commentashtuchkin/iconv-lite

Support using ArrayBuffer as the "bytes" data type (instead of Buffer)

Thank you for the kind words! And I'm happy you'd like to move forward with this, let's do it :)

The plan overall looks reasonable. Here's a couple of thoughts I had for this migration:

First, let's clearly define the success criteria for this project. For me it's the following:

  1. Make iconv-lite work in the browser without Buffer shims (use Uint8Array as the 'binary data' type).
  2. Make all tests work in the browser and add a Travis CI integration for it.
  3. No regression wrt Node.js environments.

Is this what you're also thinking? Did I miss anything?

Second, to make webpack/browserify not include Buffer shim, we need to be careful about dependencies and dependency injections. I see it something like this:

// lib/node-index.js   // node.js entrypoint; set as "main" field in package.json
var nodeBufferAPI = require("./node-api")
module.exports = require("./index-common.js")(nodeBufferAPI)

// lib/web-index.js   // browser entrypoint; set as "browser" field in package.json
var webBufferAPI = require("./web-api")
module.exports = require("./index-common.js")(webBufferAPI)

// lib/index-common.js 
module.exports = function(bufferAPI) {
    // initialize iconv-lite like it's happening today
    return iconv;
}

Third, what's this bufferAPI? In the code we have the following cases:

  1. decode input (bytes) processing - this already supports both Buffer and Uint8Array after our recent commits.
  2. decode output (string) creation - currently this mostly works by allocating a Buffer/Uint8Array, filling it, then slice() it and decode as UCS2 (specifically buf.toString('ucs2')). So we need alloc_bytes and decode_ucs2 operations.
  3. encode inputs (strings) processing - this is currently handled mostly by converting strings to buffers in UCS2 encoding and then working on them (specifically Buffer.from(str, 'ucs2')). To support that we need a encode_ucs2 operation that would convert a string to a Buffer/Uint8Array. We will then need to make sure the codecs are not using any Buffer-specific stuff on this binary data.
  4. encode output (bytes) creation - here we just need alloc_bytes operation that would return Buffer/Uint8Array of corresponding size. The codecs will then fill it byte-by-byte, .slice() it at the end and return. We'll need to make sure there's no Buffer-specific operations.

We also need a "concat" operation on the Buffer/Uint8Array type in several places.

So this leaves us with the following interface (I think this is the simplest we can do):

// Pseudocode
Bytes = Buffer | Uint8Array   

interface BufferAPI {
    alloc_bytes(size: int): Bytes
    concat_bytes(bufs: Bytes[]): Bytes
    encode_ucs2(str: string): Bytes
    decode_ucs2(b: Bytes): string
}
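
For the browser side, this interface can be sketched on top of Uint8Array roughly like below (illustrative only, not the shipped backend; 'ucs2' here means little-endian UTF-16 code units, matching Buffer's 'ucs2'):

```javascript
const webBufferAPI = {
    alloc_bytes: (size) => new Uint8Array(size),

    concat_bytes(bufs) {
        let total = 0;
        for (const b of bufs) total += b.length;
        const out = new Uint8Array(total);
        let offset = 0;
        for (const b of bufs) { out.set(b, offset); offset += b.length; }
        return out;
    },

    // string -> little-endian UTF-16 bytes
    encode_ucs2(str) {
        const bytes = new Uint8Array(str.length * 2);
        for (let i = 0; i < str.length; i++) {
            const c = str.charCodeAt(i);
            bytes[i * 2] = c & 0xff;
            bytes[i * 2 + 1] = c >> 8;
        }
        return bytes;
    },

    // little-endian UTF-16 bytes -> string
    decode_ucs2(bytes) {
        let str = '';
        for (let i = 0; i + 1 < bytes.length; i += 2)
            str += String.fromCharCode(bytes[i] | (bytes[i + 1] << 8));
        return str;
    },
};
```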

Now this will probably be enough for iconv-lite codecs, but we also need to think about internal codecs that use Buffer encodings and StringDecoder. They are:

  • utf8 - this needs to be very fast. We can either reimplement it (see CESU codec for similar code), or can introduce encode_utf8 and decode_utf8 to the BufferAPI above. In the browser they can be implemented with TextEncoder/TextDecoder.
  • utf16 - this is handled by encode_ucs2 and decode_ucs2.
  • binary - this is just alias to ASCII
  • base64 and hex - these I think we could remove, given #231. Unfortunately 'utf7' requires 'base64', so we'll have to create a js-only implementation or use an external module. Alternatively, we can add encode_base64 and decode_base64 to BufferAPI, which is a bit sad. For browsers we can use atob and btoa there.
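
If we do keep base64 in the BufferAPI, the browser side could be sketched with atob/btoa like this (assuming those globals are available; they also exist in Node.js 16+):

```javascript
// Bytes -> base64 string, via a latin1 intermediate string for btoa.
function encode_base64(bytes) {
    let bin = '';
    for (let i = 0; i < bytes.length; i++) bin += String.fromCharCode(bytes[i]);
    return btoa(bin);
}

// Base64 string -> bytes.
function decode_base64(str) {
    const bin = atob(str);
    const bytes = new Uint8Array(bin.length);
    for (let i = 0; i < bin.length; i++) bytes[i] = bin.charCodeAt(i);
    return bytes;
}
```

For large inputs this character-by-character string building is slow; a production version would process the data in chunks.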

The only hard part here would be streaming utf8, but we can use StringDecoder implementation like you proposed.

As for the tests, I think it would make sense to use the same BufferAPI interface to create corresponding data, convert and check it. A lot of tests use 'hex' encoding to compare bytes, so it might make sense to create test-only helpers to convert between hex and bytes.

One other direction I'm thinking is that (probably after this project), we can make "string" type also switchable to something else (e.g. see #242), plus make string processing more efficient by using Uint16Array-s. This would require more changes to BufferAPI (add RawChars = Uint16Array, alloc_raw_chars(), concat_raw_chars() and make encode_ucs2/decode_ucs2 work with RawChars instead of Bytes).

Does it make any sense? What do you think?

ashtuchkin

comment created time in a month

issue commentmicrosoft/iconv-lite-umd

Tests from iconv-lite are not passing

No objections from me, I think that might be the best we can do until I support UMD natively.

Alexander Shtuchkin

On Wed, Jul 8, 2020 at 11:25 AM Benjamin Pasero notifications@github.com wrote:

@ashtuchkin https://github.com/ashtuchkin if there are no objections, I wonder if we should simply copy the tests folder from iconv-lite over into this repo to be able to run them? Let me know if you have a better idea...

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/microsoft/iconv-lite-umd/issues/7#issuecomment-655588576, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEZKHPDLWD6DGA6UI2PMOTR2SFVDANCNFSM4OUHVAQA .

bpasero

comment created time in a month

create barnchashtuchkin/iconv-lite

branch : types

created branch time in a month

created tagashtuchkin/iconv-lite

tagv0.6.2

Convert character encodings in pure javascript.

created time in a month

push eventashtuchkin/iconv-lite

Alexander Shtuchkin

commit sha efbad0a92edf1b09c111278abb104d935c6c0482

Release 0.6.2: Actually support Uint8Array decoding

view details

push time in a month

issue closedashtuchkin/iconv-lite

Decoding cp862 produces reversed string

Hi,

I'm decoding text file encoded with cp862/ibm862 (hebrew codepage) and encoding it back to utf8. For some reason the utf8 strings are reversed. I can fix it using esrever library with its reverse method but it feels not right :/

Any idea why this might be happening ?

Thank you

closed time in a month

igorthy

issue commentashtuchkin/iconv-lite

Decoding cp862 produces reversed string

Ok, let's check what we should expect.

// read source file in binary mode; it's rather big: 356 kb.
var buf = fs.readFileSync("pricelist.txt");  

// Extract one line as a string in binary encoding; let's take the 5th one as an example, to avoid headers, etc.
var lines = buf.toString("binary").split("\r\n");
var line = lines[4];
console.log(line);  // we see some numbers separated by lots of spaces, then weird symbols. 

// Let's get rid of the spaces and extract columns
var columns = line.split(/ +/).filter(Boolean);
console.log(columns);  // => ['0', '2', '7', '0', '0', '0.00', '0.00', '€', 'šƒ’']   // first 7 columns are numbers, then some space-separated characters.

// Try to decode the characters manually using https://en.wikipedia.org/wiki/Code_page_862
console.log(Buffer.from(columns.slice(7).join(' '), "binary"));
// => <Buffer 8f 81 80 20 9a 85 83 85 81 92>

// Note, we have a 3 character word, then a space, then a 6 character word.
var decodedChars = [
 /* 8f */ 'ן',
 /* 81 */ 'ב',
 /* 80 */ 'א',
 /* 20 */ ' ',
 /* 9a */ 'ת',
 /* 85 */ 'ו',
 /* 83 */ 'ד',
 /* 85 */ 'ו',
 /* 81 */ 'ב',
 /* 92 */ 'ע'
];

// Now try to join them into a single string:
var decodedStr = decodedChars.join('');
console.log(decodedStr); // => 'ןבא תודובע'   // Note this is printed in reverse by the browser (try to copy-paste it somewhere and move text cursor using left and right arrows)

// Decoded string above is printed in reverse in the browser and console, although the order of characters is still the same:
console.log(decodedStr[0]);  // => 'ן'
console.log(decodedStr[1]);  // => 'ב'
...

Okay, so what is iconv-lite producing?

var buf = fs.readFileSync("pricelist.txt");
var lines = iconv.decode(buf, "cp862").split('\r\n');
console.log(lines[4].replace(/ +/g, " ")) // => ' 0 2 7 0 0 0.00 0.00 ןבא תודובע'

Looks exactly the same as what we have manually decoded above.

So, ultimately the reversion happens when you output the strings either to the console or to the browser. See https://en.wikipedia.org/wiki/Bidirectional_text for more info.

Let me know if you have any more questions!

igorthy

comment created time in a month

issue commentashtuchkin/iconv-lite

Decoding cp862 produces reversed string

Nice, thank you! Could you also post the code you're using, what it prints and what you expect it to print? Or, ideally, an assert that fails.

To be clear, there's very little chance the reversal happens inside iconv-lite - there's just no code inside to do it. It's very likely the reversal happens either before or after.

igorthy

comment created time in a month

issue commentashtuchkin/iconv-lite

Decoding cp862 produces reversed string

Hard to say; the codec definitely doesn't do any reversals. Care to post a code example? Maybe the strings are shown in reverse in your console/editor/IDE due to an RTL character? I don't know much about it though.

igorthy

comment created time in a month

pull request commentashtuchkin/iconv-lite

add some EBCDIC encodings

Thanks for the research @RovoMe! Any specific action items you would like to add here, or is it mostly additional info?

I always try to generate the encodings directly from authoritative sources, e.g. see in https://github.com/ashtuchkin/iconv-lite/blob/master/generation/gen-dbcs.js we download corresponding tables from unicode.org or encoding.spec.whatwg.org.

To support EBCDIC, ideally I'd want something like gen-ebcdic.js that downloads the tables from unicode.org and transforms them to iconv-lite format. Java sources don't work great for that purpose, unfortunately.
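
The unicode.org mapping files are line-oriented ('0xXX<tab>0xYYYY<tab># name'), so the parsing half of a hypothetical gen-ebcdic.js is small. The sample lines below use real CP500-style values (0x40 is space, 0xC1 is 'A' in EBCDIC); the download URL and output format are left out and would follow gen-dbcs.js:

```javascript
// Parses a unicode.org-style mapping table into a byte -> code point map.
function parseMappingTable(text) {
    const table = {};
    for (const line of text.split('\n')) {
        const m = /^0x([0-9A-Fa-f]{2})\s+0x([0-9A-Fa-f]{4})/.exec(line);
        if (m) table[parseInt(m[1], 16)] = parseInt(m[2], 16);
    }
    return table;
}

const sample =
    '0x40\t0x0020\t# SPACE\n' +
    '0xC1\t0x0041\t# LATIN CAPITAL LETTER A\n' +
    '# comment lines are skipped\n';
// parseMappingTable(sample) -> map with 0x40 -> 0x20 and 0xC1 -> 0x41
```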

Also I think the NL concern by @devin122 is valid (see https://en.wikipedia.org/wiki/Newline#Representation). We might want to address it by 1) encoding/decoding without changes by default, this would keep 1:1 representation of all latin1 characters, but then 2) add a codec option like EBCDICNLConversion: '\n', which would enable conversion of NL char to corresponding char(s). This conversion can probably be a separate PR.

Finally, FYI, we do work on integrating iconv-lite into VS Code, but it hasn't happened yet.

Mithgol

comment created time in a month

issue commentashtuchkin/iconv-lite

Re-check internal codec's "trivial" encoder on surrogates

in streams-test.js

    it("Encoding using internal modules: utf8 with surrogates in separate chunks", checkEncodeStream({
        encoding: "utf8",
        input: ["\uD83D", "\uDE3B"],
        output: "f09f98bb",
    }));
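
The expected output can be sanity-checked with plain Buffer: the halves U+D83D and U+DE3B combine to U+1F63B, whose UTF-8 form is f0 9f 98 bb — which is why the encoder has to buffer a trailing high surrogate across write() calls instead of encoding it on its own.

```javascript
const buf = Buffer.from('\uD83D\uDE3B', 'utf8');
console.log(buf.toString('hex')); // the full pair encodes cleanly

// A lone surrogate has no valid UTF-8 encoding; Node substitutes U+FFFD.
const lone = Buffer.from('\uD83D', 'utf8');
console.log(lone.toString('hex'));
```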
ashtuchkin

comment created time in a month

issue closedashtuchkin/iconv-lite

Remove Buffer.concat() from codebase

They don't allow using Uint8Array directly.

Specifically look into dbcs codec.

closed time in a month

ashtuchkin

issue commentashtuchkin/iconv-lite

Remove Buffer.concat() from codebase

Fixed in 21004dd4c642b76575449d6bdd4de72a834b965b

ashtuchkin

comment created time in a month

push eventashtuchkin/iconv-lite

Alexander Shtuchkin

commit sha 21004dd4c642b76575449d6bdd4de72a834b965b

Ensure all decoders support Uint8Array-s directly

view details

push time in a month

issue closedashtuchkin/iconv-lite

Question about decoding and encoding in single step

Hi,

I use streaming support within library like this:

fs.createReadStream(file)
.pipe(iconv.decodeStream('win1255'))
.pipe(iconv.encodeStream('utf8'))
...

Is it possible to use single pipe? (I know I can use through module to create such transform stream but it is still 2 separate streams behind it) Something like this:

fs.createReadStream(file)
.pipe(iconv.decodeEncodeStream('win1255', 'utf8'))
...

Thanks

closed time in a month

igorthy

issue commentashtuchkin/iconv-lite

Stumped on error with requiring ../encodings

@octalmage there are several different problems discussed in this issue, could you provide more info about your setup to help me understand what's happening? Thanks!

leanderlee

comment created time in a month

issue openedashtuchkin/iconv-lite

Remove Buffer.concat() from codebase

They don't allow using Uint8Array directly.

Specifically look into dbcs codec.

created time in a month

issue commentashtuchkin/iconv-lite

Cannot find module 'iconv-lite' or its corresponding type declarations

Can you try importing it like this:

import * as iconv from 'iconv-lite'
dogtopus

comment created time in a month

issue commentashtuchkin/iconv-lite

Not spec conform dependency versioning in package.json

just published v0.6.1 with this fix.

priv-kweihmann

comment created time in a month

pull request commentashtuchkin/iconv-lite

ensure support for Uint8Array

ok, it's published as 0.6.1

gyzerok

comment created time in a month

push eventashtuchkin/iconv-lite

Alexander Shtuchkin

commit sha 724829e8fc39525fbeded0f837da53c13de179ae

Release 0.6.1: Support Uint8Array when decoding.

view details

push time in a month

created tagashtuchkin/iconv-lite

tagv0.6.1

Convert character encodings in pure javascript.

created time in a month

pull request commentashtuchkin/iconv-lite

ensure support for Uint8Array

Yep, will publish in a couple mins

gyzerok

comment created time in a month

push eventashtuchkin/iconv-lite

Fedor Nezhivoi

commit sha dd72d9d5238f84c104d5ee4f93748365d308e60b

Support Uint8Array-s instead of Buffers when decoding (#246)

view details

push time in a month

PR merged ashtuchkin/iconv-lite

ensure support for Uint8Array

I went ahead and created this PR for https://github.com/microsoft/vscode/issues/79275#issuecomment-647224220 so you don't need to spend your time if you agree with the idea.

No pressure though, feel free to close it if you disagree 😄

+53 -6

2 comments

6 changed files

gyzerok

pr closed time in a month

push eventashtuchkin/iconv-lite

Alexander Shtuchkin

commit sha 3331bbc3ba02e15d11935326dc320d15ee5add43

Fix minor issue in UTF-32 decoder. In streaming mode, if the first chunk is < 32 bytes, but there are more chunks written with total size of the stream > 32 bytes, then we were losing initial chunk. This looks like an unlikely scenario, so not a major issue. Note, this is happening only in UTF-32 decoder, not UTF-32LE or BE, as the problem was in the code that detects encoding.

view details

push time in a month

Pull request review commentashtuchkin/iconv-lite

ensure support for Uint8Array

 if (!StringDecoder.prototype.end) // Node v0.8 doesn't have this method.   function InternalDecoder(options, codec) {-    StringDecoder.call(this, codec.enc);+    this.decoder = new StringDecoder(codec.enc); } -InternalDecoder.prototype = StringDecoder.prototype;+InternalDecoder.prototype.write = function(buf) {+    if (Buffer.isBuffer(buf)) {+        return this.decoder.write(buf);+    }++    return this.decoder.write(Buffer.from(buf));

Yeah it seems internal encoder requires conversion to Buffer-s. A bit sad, but understandable, as otherwise they'd have to replicate all encoding/decoding logic from Buffer.toString.

nit: maybe structure it like this to highlight that we're just converting it into a buffer?

    if (!Buffer.isBuffer(buf)) {
        buf = Buffer.from(buf);
    }

    return this.decoder.write(buf);
gyzerok

comment created time in a month

Pull request review commentashtuchkin/iconv-lite

ensure support for Uint8Array

 function Utf16Decoder(options, codec) {  Utf16Decoder.prototype.write = function(buf) {     if (!this.decoder) {+        // Support Uint8Array+        if (!Buffer.isBuffer(buf)) {+            buf = Buffer.from(buf)+        }

Ultimately I'd want to get rid of Buffer.concat so that we can avoid conversion to Buffers at all. Meanwhile, I think this stop gap is fine. (Same with utf32).

gyzerok

comment created time in a month

Pull request review commentashtuchkin/iconv-lite

ensure support for Uint8Array

 module.exports = function(config) {      // start these browsers     // available browser launchers: https://npmjs.org/browse/keyword/karma-launcher-    browsers: ['PhantomJS'],+    browsers: ['ChromeHeadless'],

I agree, Chrome Headless might be better here. Thank you!

gyzerok

comment created time in a month

Pull request review commentashtuchkin/iconv-lite

ensure support for Uint8Array

 describe("iconv-lite", function() {         var str = iconv.decode(buf, "utf8");         assert.equal(str, "💩");     });++    it("supports passing Uint8Array to decode for all encodings", function() {+        iconv.encode('', 'utf8'); // Load all encodings.++        var encodings = Object.keys(iconv.encodings)+        encodings+            .filter(encoding => !encoding.startsWith('_') && encoding !== '0')+            .forEach(function(encoding) {+                // remove base64 and hex temporarily, because https://github.com/ashtuchkin/iconv-lite/issues/247+                if (['base64', 'hex'].indexOf(encoding) >= 0) {+                    return;+                }++                var expected = 'Lorem ipsum';++                var encoded = iconv.encode(expected, encoding);+                var byteArray = [];+                for (var i = 0; i < encoded.length; i++) {+                    byteArray[i] = encoded[i];+                }+                var uint8Array = Uint8Array.from(byteArray);

nit: Uint8Array can be created directly from a buffer (I think)

                var uint8Array = Uint8Array.from(encoded);
gyzerok

comment created time in a month

Pull request review commentashtuchkin/iconv-lite

ensure support for Uint8Array

 describe("iconv-lite", function() {         var str = iconv.decode(buf, "utf8");         assert.equal(str, "💩");     });++    it("supports passing Uint8Array to decode for all encodings", function() {+        iconv.encode('', 'utf8'); // Load all encodings.++        var encodings = Object.keys(iconv.encodings)+        encodings+            .filter(encoding => !encoding.startsWith('_') && encoding !== '0')

nit: hmm I'm not sure why you're filtering out "0"? I don't see this key in encodings object..

            .filter(encoding => !encoding.startsWith('_'))
gyzerok

comment created time in a month

issue openedashtuchkin/iconv-lite

Re-check internal codec's "trivial" encoder on surrogates

I have a suspicion that it might not be so trivial. Test with 2 string chunks, where the first one ends on a high surrogate and the second begins with a low surrogate.

created time in a month

issue commentashtuchkin/iconv-lite

Possible bug with base64 and hex encodings

This is due to base64 and hex being binary encodings, not string encodings :) See #231

gyzerok

comment created time in a month

pull request commentashtuchkin/iconv-lite

ensure support for Uint8Array

That's great work Fedor! I'm a bit busy currently, so expect maybe 1-2 days delays before I can look deeper. At the high level I like this approach.

Alexander Shtuchkin

On Mon, Jun 22, 2020 at 12:56 AM Fedor Nezhivoi notifications@github.com wrote:

@gyzerok commented on this pull request.

In test/webpack/basic-test.js https://github.com/ashtuchkin/iconv-lite/pull/246#discussion_r443316876:

@@ -36,6 +36,29 @@ describe("iconv-lite", function() {
     var str = iconv.decode(buf, "utf8");
     assert.equal(str, "💩");
 });
+    it("supports passing Uint8Array to decode for all encodings", function() {
+        iconv.encode('', 'utf8'); // Load all encodings.
+
+        const encodings = Object.keys(iconv.encodings)
+
+        encodings.forEach(function(encoding) {
+            if (['base64', 'hex', '_internal', '0'].indexOf(encoding) >= 0) {

Temporarily excluded base64 and hex since they don't seem to work even with buffers: #247 https://github.com/ashtuchkin/iconv-lite/issues/247

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/ashtuchkin/iconv-lite/pull/246#pullrequestreview-434602945, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAEZKHKONMGN6DPPZDM7CZDRX3QA3ANCNFSM4OEF3A4Q .

gyzerok

comment created time in a month

issue commentashtuchkin/iconv-lite

Question about decoding and encoding in single step

I see. You're right: the input is decoded to an intermediate representation, which is JS strings (internally UTF-16), and encode then converts that to utf8. It might be more efficient to convert to utf8 directly, but in that case I'd need to implement N^2 codecs (an every-input × every-output matrix), which is not feasible. In practice, all conversion libraries use some kind of intermediate representation internally, even if they don't expose it to the client. In iconv-lite the internal representation is exposed because it's convenient to use directly in a lot of cases.

With pipelines, the conversion is still happening in a single pass (i.e. it's not reading everything into memory at any stage), even if there are 2 transformations (encode and decode), so memory-wise it should be ok.

That being said, if you don't like seeing two transformations here, skipping encoding part should work (node.js is converting strings to utf8 by default):

fs.createReadStream(...).pipe(iconv.decodeStream("win1255")).pipe(fs.createWriteStream(outfile))

-- Alexander Shtuchkin

On Sat, Jun 20, 2020 at 3:44 AM igorthy notifications@github.com wrote:

I'm writing it to a file (converting a file with a custom format to a CSV-formatted text file). It is probably something I don't know about decoding/encoding, but the reason I asked this question is that I assumed that when decoding text it is being decoded to some intermediate representation (Unicode?). Then the encode step converts this intermediate representation to utf8 (in my case). So I was thinking that it could be more efficient to decode the stream to utf8 right away, so the entire operation is done in a single pass.


igorthy

comment created time in 2 months

issue commentashtuchkin/iconv-lite

Question about decoding and encoding in single step

Currently there's no interface to do that. Moreover, internally it'll still be 2 operations (decoding, then encoding) due to how the library is structured.

What are you piping this into? If it's a file or network socket, you can likely skip the encodeStream("utf8") as it'll be done automatically by Node.js.

On Fri, Jun 19, 2020 at 3:49 AM igorthy notifications@github.com wrote:

Hi,

I use streaming support within library like this:

fs.createReadStream(file)
    .pipe(iconv.decodeStream('win1255'))
    .pipe(iconv.encodeStream('utf8'))
    ...

Is it possible to use single pipe? (I know I can use through module to create such transform stream but it is still 2 separate streams behind it) Something like this:

fs.createReadStream(file)
    .pipe(iconv.decodeEncodeStream('win1255', 'utf8'))
    ...

Thanks


igorthy

comment created time in 2 months

issue commentashtuchkin/iconv-lite

Support using ArrayBuffer as the "bytes" data type (instead of Buffer)

Node versions: 0.10+. Browsers: I don't have a good sense yet; I'll probably target the ones that have native Uint8Arrays. TextEncoder/TextDecoder can probably be polyfilled as needed.

ashtuchkin

comment created time in 2 months

issue commentashtuchkin/iconv-lite

Support a non-string based EncoderStream/DecoderStream

This is in line with StreamDecoder in Node.js and TextDecoder/TextEncoder in WHATWG Encodings standard. Conceptually encoding/decoding is a conversion between bytes and strings and in vast majority of cases it makes sense to expose strings to clients. In rare cases where you convert from bytes in encoding A to bytes in encoding B, we recommend just calling encode(decode()) or piping the streams.

Also, btw, Node.js streams can work with strings just fine, using "object" mode.
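For the bytes-in-encoding-A to bytes-in-encoding-B case, the encode(decode()) pattern can be sketched as follows. Node's built-in latin1/utf8 codecs stand in for iconv-lite here purely so the example is self-contained; with iconv-lite it would be iconv.encode(iconv.decode(buf, from), to):

```javascript
// Two-step transcoding: decode bytes to a JS string, then encode the string.
function transcode(bytes, fromEnc, toEnc) {
    const str = Buffer.from(bytes).toString(fromEnc); // bytes -> string
    return Buffer.from(str, toEnc);                   // string -> bytes
}

// 0xE9 is 'é' in latin1; in UTF-8 it becomes the two bytes C3 A9.
const utf8Bytes = transcode(Uint8Array.from([0xe9]), "latin1", "utf8");
console.log(utf8Bytes); // <Buffer c3 a9>
```

The whole conversion is still two operations internally, but from the caller's perspective it is a single function call.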

bpasero

comment created time in 2 months

issue commentmicrosoft/vscode

Web: support other encodings than UTF-8

+1. Also for decoding, I'm 99% sure you can pass Uint8Array instead of Buffer and it'll work fine.

bpasero

comment created time in 2 months

pull request commentashtuchkin/iconv-lite

Use iconv.Buffer instead of global Buffer

Thanks @gyzerok! Please see my comment https://github.com/ashtuchkin/iconv-lite/issues/235#issuecomment-645917932 - unfortunately I think just making Buffer class would not help. We need something more involved.

gyzerok

comment created time in 2 months

issue commentashtuchkin/iconv-lite

Support using ArrayBuffer as the "bytes" data type (instead of Buffer)

After another look at the codebase, I think we'd need something more involved than what I described above. Current usages of Buffer class/instances are the following:

Migratable:

  • Buffer.from([bytes]) -> Uint8Array.from([bytes])
  • Buffer.alloc(n) -> new Uint8Array(n)
  • Buffer.alloc(n, default) -> new Uint8Array(n); buf.fill(default)
  • Buffer.concat([buf, buf]) -> a helper function that allocates a new array and copies the chunks into it.
  • buf[i] -> arr[i]
  • buf.slice(a, b) -> arr.slice(a, b)

Tricky:

  • Buffer.from(str, encoding), where encoding is 'utf8', 'cesu', 'ucs2', 'hex', 'base64'.
  • buf.toString(encoding) // same encodings as above
  • The StringDecoder class, which is Node.js-specific.

I'll need to think more about it, but for now it looks like just replacing Buffer with Uint8Array would not help (specifically buf.toString(encoding) would be tricky). We probably need to create a different class that would provide hooks for all the functionality above and have a Buffer-based implementation and a Uint8Array-based one. We'll also likely need to involve TextDecoder/TextEncoder classes to support "native" encodings.
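As one data point for the "migratable" bucket, the Buffer.concat replacement could look like this (the helper name is illustrative, not an existing iconv-lite API):

```javascript
// Uint8Array equivalent of Buffer.concat([chunks]):
// compute total size, allocate once, then copy each chunk in.
function concatUint8Arrays(chunks) {
    let total = 0;
    for (const chunk of chunks) total += chunk.length;

    const result = new Uint8Array(total);
    let offset = 0;
    for (const chunk of chunks) {
        result.set(chunk, offset);
        offset += chunk.length;
    }
    return result;
}
```

This behaves like Buffer.concat without the optional totalLength argument, and works in both Node.js and browsers.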

ashtuchkin

comment created time in 2 months

issue commentashtuchkin/iconv-lite

Not spec conform dependency versioning in package.json

Fixed in 148b6bc82ce69a1c89643db55110e83513a262ce, thanks for reporting!

priv-kweihmann

comment created time in 2 months

push eventashtuchkin/iconv-lite

Alexander Shtuchkin

commit sha 148b6bc82ce69a1c89643db55110e83513a262ce

Unify package.json dependency version formats. Fixes #241

view details

push time in 2 months

issue closedashtuchkin/iconv-lite

Not spec conform dependency versioning in package.json

https://github.com/ashtuchkin/iconv-lite/blob/0e5377a9ca84923e41a81f94fefef8b36b75843d/package.json#L42

does use

 "safer-buffer": ">= 2.1.2 < 3"

after reading https://docs.npmjs.com/files/package.json#dependencies, I think it should be

 "safer-buffer": ">= 2.1.2 <3.0.0"

At least the following tool, https://pypi.org/project/semantic-version/, says the currently used one is not spec-conformant

closed time in 2 months

priv-kweihmann

issue commentcrazy-max/swarm-cronjob

Concatenate jobs

You could probably start both containers on schedule and then the uploader could wait for the exporter to finish by e.g. watching for a special file in the shared volume.

For more advanced logic, a separate scheduler like Airflow should work; I'm not sure swarm-cronjob is the right place for it.

colthreepv

comment created time in 2 months
