Deyan Ginev (dginev), @KWARC, New York, http://prodg.org. PhD student at @KWARC, building NLP tools for science docs. LaTeXML dev. Currently hyped about @rust-lang.

brucemiller/LaTeXML 337

LaTeXML: a TeX and LaTeX to XML/HTML/ePub/MathML translator.

arXiv/arxiv-readability 23

Pilot project to render HTML5 from arXiv LaTeX sources

dginev/CorTeX 23

A general purpose processing framework for corpora of scientific documents

dginev/arxiv-statement-classification 2

Paragraph Classifiers for AMS-tagged statements, over 1.2m arXiv.org documents

dginev/deep-daze 2

Deep Daze - Simple command line tool for text to image generation using CLIP and Siren

dginev/deprecated-CorTeX 2

Now deprecated, see main CorTeX repository

dginev/LaTeXML-Plugin-ltxmojo 2

A Mojolicious web service and showcase for LaTeXML

dginev/LaTeXML-Plugin-MathSyntax 2

An alternative grammar for parsing mathematical expressions for LaTeXML, implemented in Marpa::R2

dginev/deprecated-CorTeX-Peripheral 1

Peripheral management for the CorTeX system - Worker spawner and Peripheral template classes

dginev/LaTeXML 1

LaTeXML is a TeX and LaTeX to XML translator.

pull request comment brucemiller/LaTeXML

[WiP] Manual update

As I understood it, #1216 was about documenting how to install dependencies when you want to run the development version of LaTeXML. (But then I didn't understand where mktexlsr came in, so perhaps I don't understand exactly which situation the linked "recipe" solves)

Well, that issue is rather annoying for lay users, to the point where it is unclear whether documenting how to modify a file under /etc really "solves" the usability problem. I'd rather keep it open until we find something a bit more satisfying...

brucemiller

comment created time in 3 days

pull request comment brucemiller/LaTeXML

[WiP] Manual update

Why/how does this PR fix #1216? I thought the only way out of that was some system-level change from the OS, or us using something different altogether.

brucemiller

comment created time in 4 days

issue comment brucemiller/LaTeXML

Modernize and streamline documentation flows

Just to neutralize the perception that I'm arguing best practices from my personal taste:

"Best Practices and Tools for Documenting APIs", 2017 https://www.programmableweb.com/news/best-practices-and-tools-documenting-apis/analysis/2017/09/27

Slate is an open source, responsive API documentation generator that was originally built by TripIt and later adopted by many leading API providers. Over 9,700 GitHub users have forked the Slate repository, indicating many API providers are reusing this template (not all of them are production use cases, but it is used by notable providers for their APIs such as Travis-CI, Best Buy, and Clearbit). Content for Slate is written in Markdown and documentation is delivered in the industry-standard format where the menu is on the left pane, documentation in the middle pane, and an interactive sandbox in the right pane. While API providers must handcraft the documentation, they can link the sandbox to multiple bindings in an API so that tabbed windows are provided to interact with the API in over 100 programming languages, with programming syntax throughout the documentation automatically highlighted, just like in code editors. Slate automatically creates a GitHub repo with the documentation, which can then be published as GitHub pages, or downloaded as HTML, CSS, and JavaScript and hosted on any server.

There are now over 21,000 forks of the repository, more than double the number since that article was written in 2017.

As I mentioned at the top, I'm definitely open to counter-offers of similar quality. Reinventing everything in latexml departs from the cool "dogfooding" label and enters solipsistic territory. And we haven't really been trying to do that: we already have (difficult to find) GitHub wiki pages and, much more successfully, the GitHub issue pages, which are written in Markdown and serve a simple, useful purpose.

dginev

comment created time in 9 days

push event dginev/LaTeXML

push time in 9 days

push event dginev/LaTeXML

bruce miller

commit sha 0b073bad2a93f44ded01520631a8174b800a2a79

Digested refinement (#1647)

* Revise/revert changes to Digested Parameter type; do NOT use Digested for \overbrace,\underbrace
* cleanup \GenericError (and friends) output
* Don't make Error for unknown keys (since we're sometimes cavalier about defining them)
* Enhance accent tests to distinguish Digested and {}
* Use {} instead of Digested on various resize/rotate macros; refine and extend the covered set of macros; update test results
* Avoid incorrect use of Digested; correct Unicodepoints for under/over bracket
* Correct several wrong uses of Digested instead of {} arg type
* adapt \noexpand to LaTeXML's digestion


Bruce Miller

commit sha 5b51eb058fc256e31beeb07f6ce99cc2900a8c04

fix listings with empty-looking strings


bruce miller

commit sha f6c4bf4b859fb1a36479814807d5300a37b5ce6e

Robust prelims (#1650)

* Add a robust option (and allow protected more broadly) as a preliminary extension of robust and protected commands
* Define several commands as robust
* Extended math accents test to test robust behaviour
* Pass all options to defRobustCS, so it can be locked as well


Vincenzo Mantova

commit sha f7ea14f29069856f08135adb4b4a406bffc78626

add missing space in HTML title


Vincenzo Mantova

commit sha d27a8e102d460831be51b9d94cc462cb775325d7

epub: add document titles to navigation document


push time in 9 days

push event dginev/LaTeXML-Plugin-Cortex

Deyan Ginev

commit sha 46afd8938a3a75012433216160bba24a5129c7ff

latest master with robust defmaths


push time in 10 days

pull request comment brucemiller/LaTeXML

Robust prelims

on second thought, locking only the munged cs is probably pretty pointless.

well, one way or another, PR is good to merge in my book :> Thanks!

brucemiller

comment created time in 10 days

Pull request review comment brucemiller/LaTeXML

Robust prelims

 sub DefMacroI {
     $STATE->assignMathcode($cs => 0x8000, $options{scope}); }
   $cs        = coerceCS($cs);
   $paramlist = parseParameters($paramlist, $cs) if defined $paramlist && !ref $paramlist;
-  $STATE->installDefinition(LaTeXML::Core::Definition::Expandable->new($cs, $paramlist, $expansion, %options),
+  my $defcs = ($options{robust} ? defRobustCS($cs, $options{scope}) : $cs);
+  $STATE->installDefinition(LaTeXML::Core::Definition::Expandable->new($defcs, $paramlist, $expansion, %options),
     $options{scope});
   AssignValue(ToString($cs) . ":locked" => 1, 'global') if $options{locked};

should we also lock $defcs here - when available - or is that too much safety for no benefit?
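
To make the question concrete, here is a toy sketch in Python (not LaTeXML code, and not Perl) of a definition table with locking. Everything in it is hypothetical: the `ToyState` class, the `def_macro` helper, the `lock_internal` option, and the convention that the internal name is the public name plus a trailing space. It only illustrates that locking the public name alone leaves the internal definition open to redefinition.

```python
class ToyState:
    """Toy stand-in for a definition table; not LaTeXML's $STATE."""

    def __init__(self):
        self.definitions = {}
        self.locked = set()

    def install(self, cs, body):
        """Install a definition unless the name is locked; report success."""
        if cs in self.locked:
            return False
        self.definitions[cs] = body
        return True


def def_macro(state, cs, body, robust=False, locked=False, lock_internal=False):
    """Define cs; for robust macros the body lives under an internal name."""
    defcs = cs
    if robust:
        # Assumed convention: internal name = public name + trailing space.
        defcs = cs + " "
        state.install(cs, ["\\protect", defcs])
    state.install(defcs, body)
    if locked:
        state.locked.add(cs)
        if lock_internal and defcs != cs:
            state.locked.add(defcs)
    return defcs


state = ToyState()
inner = def_macro(state, "\\foo", "A", robust=True, locked=True)
print(state.install("\\foo", "X"))  # public name is locked
print(state.install(inner, "X"))    # internal name is still open
```

Whether the extra lock is worth having depends on how likely it is that user code addresses the internal name directly, which is the trade-off the question is about.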

brucemiller

comment created time in 11 days


pull request comment brucemiller/LaTeXML

create proper EPUB3 navigation document

What I am hoping for, is that whatever mechanism you adopt for customising postprocessing, it is just as easy to modify as it currently is for bindings and stylesheets.

We're completely on the same page here 👍🏻

xworld21

comment created time in 11 days

pull request comment brucemiller/LaTeXML

create proper EPUB3 navigation document

As a user, I am in favour of XSLT-based solutions

I wish it was that simple. The other considerations that go in here are: what other formats are likely to get supported? Are there other post-processors that are similar to Manifest, and are there other manifests? Is this something to keep encouraging :> Do we want to enable those as plug-in modules, which allows some extra freedom without the constraints of the velocity and direction of the main LaTeXML project, etc.

And there is a very subjective toolchain "taste" that goes in when deciding between two Turing-complete languages. Why invest in Perl post-processors based around CrossRef's current implementation, rather than redo as much as possible in XMP and XSLT? Well, that depends on who's working on the codebase, and the specific problems getting solved.

Then we get into a branch of discussions referred to as "plugins and profiles", where we have some unfinished plans about refactoring post-processing so that it is possible to easily and predictably add/swap in custom post-processors (as Perl class files), including from external repositories in the LaTeXML::Plugin:: namespace.

So the more we pull here, the more strands we'll get in our hands, and the more carefully we should be making choices...

xworld21

comment created time in 12 days

pull request comment brucemiller/LaTeXML

create proper EPUB3 navigation document

More for @dginev: is there a reason not to create, at an earlier stage, a nav.xml with an appropriately attributed ltx:TOC, and let CrossRef and XSLT do their thing?

That would make sense. Also for the reason that we need a special sanitization step. I have seen some metadata sneaking in that erroneously contains footnotes/sidenotes, also in titles. There's also the open question of what exactly metadata fields should keep for math, as MathML has a poor chance of ever being catalog-friendly... All of which fit nicely in CrossRef's purview.
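
As a rough illustration of the sanitization step mentioned above, the following Python sketch flattens a title to plain text while dropping note-like children. It is only a sketch under assumptions: the element name `note`, the helper name `plain_title`, and the absence of XML namespaces are all simplifications chosen for the example, not LaTeXML's actual schema handling.

```python
import xml.etree.ElementTree as ET


def plain_title(title_xml, drop=("note",)):
    """Return the text of a title element, skipping children listed in `drop`.

    The text that follows a dropped child (its tail) is kept, so the
    surrounding words of the title survive.
    """
    root = ET.fromstring(title_xml)

    def text_of(element):
        parts = [element.text or ""]
        for child in element:
            if child.tag not in drop:
                parts.append(text_of(child))
            parts.append(child.tail or "")
        return "".join(parts)

    # Collapse runs of whitespace left behind by removed children.
    return " ".join(text_of(root).split())


sample = (
    '<title>On <emph>fast</emph> sums'
    '<note role="footnote">Funded by X.</note> and products</title>'
)
print(plain_title(sample))
```

The same traversal could take a different `drop` tuple per metadata field, which is where a policy for math content would also have to be decided.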

Now, as to assembling a nav.xml, that depends on whether we believe navigation entries are easier to create via XSLT or via Perl. I'm somewhat impartial on that end. But having the right API to grab any metadata of interest from any post-processor following CrossRef -- that would be great.

xworld21

comment created time in 13 days

issue opened brucemiller/LaTeXML

Some math macros need to be robust

A follow-up to our recently discovered discrepancy for the argument reads of \underbrace in the #1647 comments

It appears that in my texlive 2020 fontmath.ltx there are 46 definitions using \DeclareRobustCommand, among them \overbrace and \underbrace. In the protected cases, some low-level programmatic uses become a bit subtle, as in the example:

\[ \expandafter\underbrace\underbrace foo \]

where the outer underbrace receives no arguments, while the inner one receives the letter f. Without the \protect machinery, latexml interprets the example as two nested underbraces over a missing argument, which may end up problematic.

It may be worthwhile adding a protected => 1 flag to the DefMath API, and maybe even making the definitions protected by default? Irrespective of the default, going through the .ltx definitions and marking all protected macros as such in the latexml definitions would be a good upgrade to stay closer to pdflatex parity.
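
The difference between the two readings of the \expandafter example can be traced with a toy token model, sketched here in Python. It is not how TeX or LaTeXML is implemented. It assumes LaTeX's classic robust-command scheme, in which the public name expands to \protect followed by an internal macro that takes the argument; the internal name (public name plus a trailing space) and the helper names `digest` and `describe` are assumptions of the sketch.

```python
# Toy model only: tokens are plain strings, and just enough of the
# expansion rules are imitated to follow the example from the issue.
WRAPPER = "\\underbrace"   # the user-facing control sequence
INNER = "\\underbrace "    # assumed internal name (note the trailing space)


def describe(arg, robust):
    """Say what an underbrace received as its single argument."""
    if arg == "\\protect":
        return ""                      # acts like \relax: nothing is typeset
    if arg == WRAPPER and not robust:
        return ("underbrace", None)    # nested underbrace, argument missing
    return arg


def digest(tokens, robust):
    """Process the token list left to right and return what was built."""
    toks, out = list(tokens), []
    while toks:
        tok = toks.pop(0)
        if tok == "\\expandafter":
            held = toks.pop(0)
            # Expand the token after `held` once, if it is expandable at all.
            if robust and toks and toks[0] == WRAPPER:
                toks[0:1] = ["\\protect", INNER]
            toks.insert(0, held)
        elif tok == WRAPPER and robust:
            toks[0:0] = ["\\protect", INNER]
        elif tok == "\\protect":
            continue
        elif tok in (WRAPPER, INNER):
            arg = toks.pop(0) if toks else None
            out.append(("underbrace", describe(arg, robust)))
        else:
            out.append(tok)
    return out


example = ["\\expandafter", WRAPPER, WRAPPER, "f", "o", "o"]
print(digest(example, robust=True))
print(digest(example, robust=False))
```

With `robust=True` the outer underbrace ends up with an empty argument and the inner one with `f`; with `robust=False` the model yields one underbrace nested inside the other with its argument missing, which mirrors the two behaviours described in the issue.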

created time in 14 days

Pull request review comment brucemiller/LaTeXML

Digested refinement

 $\widetilde{aaa}$   & $\mathaccent"0365{aaa}$ \\
 $\widehat{aaa}$     & $\mathaccent"0362{aaa}$ \\
 \end{tabular}
+
+\def\abc{ABC}
+\def\nothing{}
+Test what arguments hat accepts
+(should be same with check, breve, acute, grave, tilde, bar, vec,
+dot, ddot, overline, widehat, widetilde)
+\def\testmacro{\hat}
+\[ [\testmacro A] \quad
+ [\testmacro{}] \quad
+ [\testmacro{A}] \quad
+ [\testmacro{ABC}] \quad
+ [\testmacro\nothing A] \quad
+ [\testmacro\relax A] \quad
+ [\testmacro{\verb+$+}] \quad
+ [\testmacro\bgroup A \egroup] \quad
+ [\testmacro\bgroup \verb+$+ \egroup] \]
+Note that this only accents the A
+\[ [\testmacro\abc] \]
+
+Other accent-like macros (underbrace, overbrace) only accept regular style arguments

Also, feel free to merge the PR as-is, I can open an issue for this newly discovered discrepancy.

brucemiller

comment created time in 14 days


Pull request review comment brucemiller/LaTeXML

Digested refinement


Thanks for the \noexpand bit. For the \expandafter bit I went to check the source, and indeed there is a difference - macros such as \int, \eq, \neq, \mapsto, \cong, \longrightarrow, \cdots, \overbrace, \underbrace ... in latex/base/fontmath.ltx are declared as protected, while our DefMath declarations do not have the protection added (for now?)

I think every argument-accepting math macro in fontmath.ltx ends up declared as protected.

Indeed, just quickly changing the definition to:

DefMath('\lx@math@underbrace {}', "\x{23DF}", operator_role => 'UNDERACCENT',     # BOTTOM CURLY BRACKET
  scriptpos => 'mid');
DefMacroI('\underbrace', undef, Tokens(T_CS('\protect'), T_CS('\lx@math@underbrace')));

leads to my last example correctly finding the letter f as the argument of underbrace, as well as the rest of the details that come up through pdflatex. So that may be the only fix needed to get all my examples working - though what the "Best" way to implement the protection is, I am unsure. Maybe we need a protected => 1 flag for DefMath, which is turned on by default?

brucemiller

comment created time in 14 days


issue comment brucemiller/LaTeXML

Generated XHTML fails (a few) validity checks

At the very least I genuinely thank you for the detailed examination of the question in the comments here @nxg , truly appreciated.

I think both Bruce and I can take some time to carefully consider which kinds of upgrades are worth investing time and maintenance into, and which directions reap the most benefit for the effort invested. There are indeed some existing solutions in latexml that can be made to evolve in various directions, and the ePub support has plenty of room to grow in sophistication... Ideally, however, we can get a lot on the generation side with as little technical investment as possible. In my experience metadata-related bits can be kept quite compact most of the time, but as usual the devil is in the details.

nxg

comment created time in 15 days

Pull request review comment brucemiller/LaTeXML

Digested refinement


An extension to my previous note, if you do:

\[ [\expandafter\underbrace\underbrace\noexpand\relax] \]

pdflatex will produce two adjacent empty underbraces fenced in by the square brackets, without any errors. latexml instead emits a warning for a missing argument, and produces HTML with a missing word and two vertically stacked underbraces under it. So this may be more directly related to the {} argument, even if my test is admittedly tortured.

brucemiller

comment created time in 15 days


Pull request review comment brucemiller/LaTeXML

Digested refinement


Alright, maybe its point is indeed too different for this PR. I tried a few other tests, one of which had a discrepancy with pdflatex, but again in an unrelated way:

\testmacro\noexpand\relax

which produces:

Error:expected:Token Missing argument Token for Core::Definition::Expandable[\noexpand Token] at ubrace.tex; line 4 col 24

This ran just fine in pdflatex and produced an underbrace over some empty space - for which I wonder how the width was determined. Leaving it to you what to do with this partial information :>

brucemiller

comment created time in 15 days


Pull request review comment brucemiller/LaTeXML

Digested refinement

 DefParameterType('GraphixDimension', sub {
 
 # Should probably be more clever about whether to create an ltx:inline-block vs just ltx:text
 # perhaps somethinb based on modes?
-DefConstructor('\Gscale@box{}[] Digested',
-  "<ltx:inline-block angle='#angle' width='#width' height='#height' depth='#depth'"
-    . " innerwidth='#innerwidth' innerheight='#innerheight' innerdepth='#innerdepth'"
-    . " xscale='#xscale' yscale='#yscale'"
-    . " xtranslate='#xtranslate' ytranslate='#ytranslate'>"
-    . "#3"
-    . "</ltx:inline-block>",
+# NOTE: Need to arrange for \width,\height,\depth,\totalheight to be bound!!!
+# Record the box in \@tempboxa ???
+sub graphics_scaledbox_props {
+  my ($box, $xscale, $yscale) = @_;
+  #  my ($w,   $h,      $d)      = $box->getSize;
+  my ($rw, $rh, $rd, $w, $h, $d) = $box->getSize;
+  my ($sw, $sh, $sd) =
+    ($w && $w->multiply($xscale), $h && $h->multiply($yscale), $d && $d->multiply($yscale));
+  return (
+    box         => $box,
+    xscale      => $xscale, yscale => $yscale,
+    innerwidth  => $w,
+    innerheight => $h,
+    innerdepth  => $d,
+    width       => $sw,
+    height      => $sh,
+    depth       => $sd,
+    totalheight => $h  && ($d ? $h->add($d) : $h)->multiply($yscale),
+    xtranslate  => $sw && $w && $sw->subtract($w)->multiply(+0.5),
+    ytranslate  => $sh && $h && $sh->subtract($h)->multiply(-0.5),
+  ); }
+
+sub graphics_scaledbox_insert {
+  my ($document, %props) = @_;
+  $document->openElement('ltx:inline-block',
+    xscale     => $props{xscale},     yscale     => $props{yscale},
+    width      => $props{width},      height     => $props{height}, depth => $props{depth},
+    xtranslate => $props{xtranslate}, ytranslate => $props{ytranslate});
+  $document->absorb($props{box});
+  $document->closeElement('ltx:inline-block');
+  return; }
+
+DefConstructor('\Gscale@box {Float} [Float] {}', sub {
+    my ($document, $scale, $yscale, $box, %props) = @_;
+    graphics_scaledbox_insert($document, %props); },
   properties => sub {
     my ($stomach, $xscale, $yscale, $box) = @_;
-    $xscale = ToString($xscale);
-    $yscale = ($yscale ? ToString($yscale) : $xscale);
-    if    ($xscale eq '!') { $xscale = $yscale; }
-    elsif ($yscale eq '!') { $yscale = $xscale; }
-    my ($w, $h, $d) = $box->getSize;
-    return () unless $w;
-    # Some issue with double scaling if BOTH size & scales are given!?!?
-    (width => $w->multiply($xscale),
-      height     => $h->multiply($yscale),
-      depth      => $d->multiply($yscale),
-      xscale     => $xscale,
-      yscale     => $yscale,
-      xtranslate => $w->multiply(($xscale - 1) / 2),
-      ytranslate => $h->add($d)->multiply(($yscale - 1) / 2)); },
+    graphics_scaledbox_props($box, $xscale, $yscale || $xscale); },
   mode => 'text');
 Let('\scalebox', '\Gscale@box');
 
-DefConstructor('\Gscale@@box OptionalMatch:* {GraphixDimension}{GraphixDimension} Digested', sub {

great, good to merge then!

brucemiller

comment created time in 15 days


Pull request review comment brucemiller/LaTeXML

detect presence of toc nav in EPUB

 sub convert {
     $$opts{sitedirectory} = $sandbox_directory;
 
     if ($$opts{format} eq 'epub') {
+      # give main document a predictable name, avoid clash with nav.xhtml
+      $sandbox_destination = 'index.xhtml';

Great piece of information to add here, thanks!

xworld21

comment created time in 15 days

Pull request review comment brucemiller/LaTeXML

detect presence of toc nav in EPUB

 sub process {
       my $itemref = $spine->addNewChild(undef, 'itemref');
       $itemref->setAttribute('idref', $item_id);
 
-      # Add to navigation
-      my $nav_map = $$self{nav_map};
-      my $nav_li  = $nav_map->addNewChild(undef, 'li');
-      my $nav_a   = $nav_li->addNewChild(undef, 'a');
-      $nav_a->setAttribute('href', URI::file->new($relative_destination));
-      $nav_a->appendText($file); } }
+      # Add to default navigation document
+      if (!$nav_xhtml && !$tocfound) {
+        my $nav_li = $nav_map->addNewChild(undef, 'li');
+        my $nav_a  = $nav_li->addNewChild(undef, 'a');
+        $nav_a->setAttribute('href', URI::file->new($relative_destination));
+        my ($title) = split(/ \x{2023} /, $doc->findnode('/xhtml:html/xhtml:head/xhtml:title')->textContent, 2);

I suspect this may be a bit more reliable to build from the CrossRef-provided metadata, but I'll let @brucemiller brainstorm that. If the \x{2023} delimiter, or its spacing, changes down the road, the regex will fail, which is a bit tricky.
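
One way to soften the dependence on exact spacing, sketched here in Python rather than the PR's Perl, is to tolerate any whitespace around the delimiter and fall back to another label when nothing usable is left. The helper name `short_title` and the fallback argument are hypothetical; only the U+2023 delimiter comes from the diff above.

```python
import re

# U+2023 (TRIANGULAR BULLET) is the separator used in the diff above.
# Accepting any amount of surrounding whitespace is an assumption of this sketch.
DELIMITER = re.compile(r"\s*\u2023\s*")


def short_title(html_title, fallback):
    """Return the part of html_title before the first delimiter.

    If that part is empty, return the fallback instead
    (hypothetically the file name of the document).
    """
    first = DELIMITER.split(html_title.strip(), maxsplit=1)[0]
    return first or fallback


print(short_title("Chapter 1 \u2023 My Book", "ch1.xhtml"))
print(short_title("", "ch1.xhtml"))
```

This still breaks if the delimiter character itself changes, which is why reading the title from structured metadata would be the sturdier option.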

xworld21

comment created time in 15 days

Pull request review comment brucemiller/LaTeXML

detect presence of toc nav in EPUB

 sub convert {
     $$opts{sitedirectory} = $sandbox_directory;
 
     if ($$opts{format} eq 'epub') {
+      # give main document a predictable name, avoid clash with nav.xhtml
+      $sandbox_destination = 'index.xhtml';

That's ... a bit too standard? If the desired effect is to avoid a clash, why not check for a clash and avoid it?

I'm unsure what to think about every single ePub generated by latexml having the same name for its main xhtml file.
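
The "check for a clash and avoid it" alternative could look roughly like the Python sketch below. The function name `pick_main_name`, the numeric suffix scheme, and the case-insensitive comparison are all assumptions made for illustration; the PR itself is Perl and may resolve this differently.

```python
def pick_main_name(source_stem, reserved, extension=".xhtml"):
    """Keep the source-derived name unless it clashes with a reserved file.

    Names are compared case-insensitively, on the assumption that the
    archive may be unpacked on a case-insensitive file system.
    """
    taken = {name.lower() for name in reserved}
    candidate = source_stem + extension
    counter = 1
    while candidate.lower() in taken:
        # Append a numeric suffix until the name is free.
        candidate = f"{source_stem}-{counter}{extension}"
        counter += 1
    return candidate


print(pick_main_name("paper", {"nav.xhtml"}))
print(pick_main_name("nav", {"nav.xhtml"}))
```

With this approach most documents keep a name derived from their source file, and only the rare document that would collide with the navigation file gets renamed.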

xworld21

comment created time in 15 days