Alexander Gude agude Bay Area https://alexgude.com Data scientist, cyclist, and occasional photographer. Formerly a high energy particle physicist at CERN (@UMN-CMS).

agude/vim-eldar 41

A dark color scheme for vim based on Elflord.

agude/UMN-PhD-Thesis-Template 14

The LaTeX thesis template provided by the University of Minnesota, with various improvements.

agude/wayback-machine-archiver 14

A Python script to submit web pages to the Wayback Machine for archiving.

agude/Dungeon-World-Markdown 5

A tabletop roleplaying game, now translated to Markdown via aaronsw/html2text.

agude/Alex-Hadd 3

Python wrapper around CERN Root's hadd. Useful for adding together large amounts of histograms.

agude/agude.github.io 2

My personal website

agude/Jupyter-Notebook-Template-Library 2

A library of Jupyter Notebook Templates for Data Science

agude/hermod-ansible 1

Ansible to set up Prosody on Lightsail

agude/raspberry-pi-twitter-bot 1

A Twitter bot for reporting Raspberry Pi system status

push event agude/dotfiles

Alexander Gude

commit sha b7472af10c2d0d6bb93206900f40ea4836df836e

Add git attributes file

push time in 13 hours

issue comment agude/wayback-machine-archiver

TooManyRedirects: Exceeded 30 redirects

Can you make sure you're running 1.9.0? I added more logging that prints the version number, and I don't see it in the log above. 1.9.0 should fix the 520 issue, at least as long as a 520 doesn't show up more than 5 times in a row.
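One way to confirm which version is installed, independent of the debug log, is to ask Python's package metadata directly. This is only a sketch: the helper name `installed_version` is hypothetical, and it assumes Python 3.8 or newer (for `importlib.metadata`) and that the package was installed under the name used with pip above.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version("wayback-machine-archiver"))
```

It has to be run with the same Python interpreter that pip installed into, otherwise it may report a different copy or nothing at all.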

Melonadev

comment created time in 2 days

issue closed agude/wayback-machine-archiver

Issues in updating

Is there a parameter for updating archiver? Like a --update or something.

Originally posted by @Melonadev in https://github.com/agude/wayback-machine-archiver/issue_comments/711156395

I'm using the latest version of pip (20.2.4), and I'm still having troubles with reinstalling archiver:

C:\Users\yewhe>pip install wayback-machine-archiver
Collecting wayback-machine-archiver
  Using cached wayback_machine_archiver-1.9.0-py3-none-any.whl (7.1 kB)
Collecting requests
  Using cached requests-2.24.0-py2.py3-none-any.whl (61 kB)
Collecting idna<3,>=2.5
  Using cached idna-2.10-py2.py3-none-any.whl (58 kB)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in c:\python38\lib\site-packages (from requests->wayback-machine-archiver) (1.25.10)
Collecting certifi>=2017.4.17
  Using cached certifi-2020.6.20-py2.py3-none-any.whl (156 kB)
Requirement already satisfied: chardet<4,>=3.0.2 in c:\python38\lib\site-packages (from requests->wayback-machine-archiver) (3.0.4)
Installing collected packages: idna, certifi, requests, wayback-machine-archiver
  WARNING: Failed to write executable - trying to use .deleteme logic
ERROR: Could not install packages due to an EnvironmentError: [WinError 2] The system cannot find the file specified: 'c:\\python38\\Scripts\\archiver.exe' -> 'c:\\python38\\Scripts\\archiver.exe.deleteme'

closed time in 2 days

Melonadev

issue comment agude/wayback-machine-archiver

Issues in updating

Awesome!

Melonadev

comment created time in 2 days

issue comment agude/wayback-machine-archiver

Issues in updating

Hi @Melonadev!

This isn't a bug with Archiver, but I'm happy to try to help.

Permissions Issue

The problem you're having is a filesystem permissions issue. pip is being run by your user account but is trying to write to a location that only an administrator can write to.

You have two options:

  1. Run pip install --user wayback-machine-archiver. This will install the packages to your user directory instead of the system directory, so only you will be able to use them, but you don't need administrator privileges.
  2. Run pip install wayback-machine-archiver but from an administrator shell/terminal. This will install it for everyone, and since you're the admin you can install to the system directory instead of your user directory. The DOWNSIDE of this method is that you might have some issues running the script as your normal user. I suggest going with option 1 above.
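If it is unclear whether the current account can write to the directory where pip places executables such as archiver.exe, a quick probe can help decide between the two options. The sketch below is an assumption-laden illustration, not part of Archiver: the helper `can_write_to` is hypothetical, and it simply tries to create a temporary file in the interpreter's scripts directory.

```python
import sysconfig
import tempfile


def can_write_to(directory: str) -> bool:
    """Return True if a temporary file can be created in `directory`."""
    try:
        # The file is removed automatically when the block exits.
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        # Covers permission errors as well as a missing directory.
        return False


# Directory where this interpreter installs console scripts.
scripts_dir = sysconfig.get_path("scripts")
print(scripts_dir, can_write_to(scripts_dir))
```

If this prints False for the system scripts directory, installing with --user (option 1) avoids writing there.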

Other Notes

Since you have already installed Archiver, you should be able to run pip with --upgrade, which upgrades an already installed package. Something like pip install --upgrade --user wayback-machine-archiver.

Let me know how it goes!

Melonadev

comment created time in 2 days

create branch agude/wayback-machine-archiver

branch : feature-redirect_max

created branch time in 3 days

issue comment agude/wayback-machine-archiver

520 Server Error for some URLS

@Melonadev I've released 1.9.0, which should fix the 520 errors you are seeing. It got rid of them in a test run I did. Let me know if you see 520s again!

agude

comment created time in 3 days

issue comment agude/wayback-machine-archiver

TooManyRedirects: Exceeded 30 redirects

I haven't been able to reproduce this. Can you use pip to upgrade to the newest version (1.9.0) and run:

archiver --file ./fest.txt --rate-limit-wait 30 --log DEBUG > out.log 2>&1

That's what it would be on Linux, not sure on Windows. The > out.log 2>&1 is just saving the debug and error log to a file, but you can leave those out and copy/paste the output here as well.
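For a shell-independent way to get the same effect as > out.log 2>&1, the command can be launched from Python with both output streams sent to one file. This is a sketch only; `run_and_log` is a hypothetical helper, and the archiver arguments in the comment are the ones from the command above.

```python
import subprocess
from typing import List


def run_and_log(command: List[str], log_path: str) -> int:
    """Run `command`, writing stdout and stderr to `log_path`; return the exit code."""
    with open(log_path, "w", encoding="utf-8") as log_file:
        completed = subprocess.run(
            command,
            stdout=log_file,           # same as "> out.log"
            stderr=subprocess.STDOUT,  # same as "2>&1"
        )
    return completed.returncode


# Example call (requires archiver to be installed and on PATH):
# run_and_log(
#     ["archiver", "--file", "./fest.txt", "--rate-limit-wait", "30", "--log", "DEBUG"],
#     "out.log",
# )
```

Passing the command as a list avoids relying on shell redirection syntax, so the same call should behave the same way on Linux and Windows.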

Melonadev

comment created time in 3 days

created tag agude/wayback-machine-archiver

tag 1.9.0

A Python script to submit web pages to the Wayback Machine for archiving.

created time in 3 days

push event agude/wayback-machine-archiver

Alexander Gude

commit sha 221074b1b4e07b087b060f74ebba9c4c08fb4e10

Update --help

Alexander Gude

commit sha 0770b7937746fce08b3f39cccac9bb9f3b2460ee

Add version output to logging at Debug level

Alexander Gude

commit sha 1f7f2ee5ccdd4eea8f338d31e938353d8478fc0a

Add 520 as a retry status

Alexander Gude

commit sha 13789b26847db85137607646811d0414e518512a

Merge branch 'feature-error_handling' This fixes #19.

Alexander Gude

commit sha 1dfe3295fa1962c3e1daf44cf2ea3a663eba9ab0

Update Python versions on Travis

push time in 3 days

issue closed agude/wayback-machine-archiver

520 Server Error for some URLS

I tried 60 for this file: jass.txt

...then 520 Server Error: UNKNOWN for url

Microsoft Windows [Version 10.0.18363.1082]
(c) 2019 Microsoft Corporation. All rights reserved.

C:\Users\yewhe\Downloads>archiver --file jass.txt --rate-limit-wait 60
ERROR:root:520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.jazzfmawards.com/awards/2017-ppl-lifetime-achievement-award/
Traceback (most recent call last):
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\wayback_machine_archiver\archiver.py", line 38, in call_archiver
    r.raise_for_status()
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\models.py", line 928, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.jazzfmawards.com/awards/2017-ppl-lifetime-achievement-award/
ERROR:root:520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.jazzfmawards.com/awards/2013-uk-vocalist-of-the-year/
Traceback (most recent call last):
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\wayback_machine_archiver\archiver.py", line 38, in call_archiver
    r.raise_for_status()
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\models.py", line 928, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.jazzfmawards.com/awards/2013-uk-vocalist-of-the-year/
ERROR:root:520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.jazzfmawards.com/and-the-judges-are/
Traceback (most recent call last):
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\wayback_machine_archiver\archiver.py", line 38, in call_archiver
    r.raise_for_status()
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\models.py", line 928, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.jazzfmawards.com/and-the-judges-are/
ERROR:root:520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.jazzfmawards.com/awards/2020-prs-for-music-uk-jazz-act-of-the-year/
Traceback (most recent call last):
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\wayback_machine_archiver\archiver.py", line 38, in call_archiver
    r.raise_for_status()
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\models.py", line 928, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.jazzfmawards.com/awards/2020-prs-for-music-uk-jazz-act-of-the-year/
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 48, in mapstar
    return list(map(*args))
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\wayback_machine_archiver\archiver.py", line 38, in call_archiver
    r.raise_for_status()
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\models.py", line 928, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.jazzfmawards.com/awards/2017-ppl-lifetime-achievement-award/
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\yewhe\AppData\Local\Programs\Python\Python38-32\Scripts\archiver.exe\__main__.py", line 7, in <module>
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\wayback_machine_archiver\archiver.py", line 243, in main
    pool.map(partial_call, archive_urls)
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 771, in get
    raise self._value
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.jazzfmawards.com/awards/2017-ppl-lifetime-achievement-award/

Originally posted by @Melonadev in https://github.com/agude/wayback-machine-archiver/issue_comments/710761066

closed time in 3 days

agude

push event agude/wayback-machine-archiver

Alexander Gude

commit sha 1f7f2ee5ccdd4eea8f338d31e938353d8478fc0a

Add 520 as a retry status

push time in 3 days

issue comment agude/wayback-machine-archiver

520 Server Error for some URLS

Looks like retry is not triggering for these! I'm actually surprised any passed. Adding 520 to the list of codes to retry for.
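The idea of retrying only on a listed set of status codes can be sketched as follows. This is not Archiver's actual code: `fetch_with_retries`, `RETRY_STATUSES`, the specific codes other than 520, and the limit of five attempts are all assumptions made for illustration, and `fetch` stands in for whatever function performs the HTTP request and returns its status code.

```python
import time
from typing import Callable, Collection

# Hypothetical retry list; 520 is the code discussed in this issue.
RETRY_STATUSES = frozenset({500, 502, 503, 504, 520})


def fetch_with_retries(
    fetch: Callable[[str], int],
    url: str,
    retry_statuses: Collection[int] = RETRY_STATUSES,
    max_attempts: int = 5,
    wait_seconds: float = 0.0,
) -> int:
    """Call `fetch(url)` until the status is not retryable or attempts run out."""
    status = fetch(url)
    for _ in range(max_attempts - 1):
        if status not in retry_statuses:
            break
        time.sleep(wait_seconds)  # pause before trying again
        status = fetch(url)
    return status
```

A status that is missing from the list, as 520 was here, is returned after the first attempt, which would match the behavior of the retry not triggering.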

agude

comment created time in 3 days

create branch agude/wayback-machine-archiver

branch : feature-error_handling

created branch time in 3 days

issue comment agude/wayback-machine-archiver

520 Server Error for some URLS

Logging in text:

DEBUG:root:Arguments: Namespace(archive_sitemap=False, file='./fest.txt', jobs=1, log_file=None, log_level='DEBUG', rate_limit_in_sec=30, sitemaps=[], urls=[])
INFO:root:Parsing sitemaps
INFO:root:Reading urls from file: ./fest.txt
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/talks-2020
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/wellness
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/hub
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/talks-2019-2
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/2020-lineup
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/personnel
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/videos-2020
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/photos-2020
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/past-shows
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/2020-friday-jan-10th
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/2020-saturday-jan-11th
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/2020-friday-jan-17th
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/marathon-map
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/23/thank-you-for-joining-us-at-2020-winter-jazzfest
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/16/wjf-closing-night-show-with-mark-guiliana-beat-music-improvisations-at-nublu-is-sold-out
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475-bj3fd
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/archive
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/about
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/sponsorship
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/contact
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/talks-2018
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/piedmont-blues-a-search-for-salvation
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/amendola-vs-blades-w/-skerik-mark-guiliana-space-heroes
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/jazz-for-kids
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/12/13/opening-night-dj-set-just-added-on-jan-8th-gilles-peterson-lefto-and-kassa-overall-at-nublu
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/12/13/artemis-just-added-to-eubanks-evans-experience-allison-miller-boom-tic-boom-at-lpr-on-jan-13th
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/11/29/revive-yo-feelings-a-musicians-wellness-benefit-with-robert-glasper-terrace-martin-more-just-added-to-jan-11th-manhattan-marathon
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/11/29/marathon-artist-lineups-by-day-just-announced
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/11/7/just-announced-seu-jorge-at-the-town-hall
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/31/new-shows-just-announced-tickets-on-sale-friday-nov-1-at-12-pm-et
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/30/tickets-on-sale-and-new-shows-announced
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/24/tickets-on-sale-friday-1025-at-12-noon-et
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/16/2020-nyc-winter-jazzfest-initial-lineup-announced
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/6/were-proud-to-be-part-of-prs-foundations-international-keychange-program
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/6/2020-wjf-dates-announced-january-9-18-more-details-to-come
DEBUG:root:Creating archive URL for https://www.winterjazzfest.com/news/2019/10/6/thx-for-attending-2019-wjf-see-you-next-year
DEBUG:root:Archive URLs: {'https://web.archive.org/save/https://www.winterjazzfest.com/past-shows', 'https://web.archive.org/save/https://www.winterjazzfest.com/2020-friday-jan-10th', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/11/7/just-announced-seu-jorge-at-the-town-hall', 'https://web.archive.org/save/https://www.winterjazzfest.com/talks-2020', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/12/13/artemis-just-added-to-eubanks-evans-experience-allison-miller-boom-tic-boom-at-lpr-on-jan-13th', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/11/29/revive-yo-feelings-a-musicians-wellness-benefit-with-robert-glasper-terrace-martin-more-just-added-to-jan-11th-manhattan-marathon', 'https://web.archive.org/save/https://www.winterjazzfest.com/piedmont-blues-a-search-for-salvation', 'https://web.archive.org/save/https://www.winterjazzfest.com/', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/11/29/marathon-artist-lineups-by-day-just-announced', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/6/were-proud-to-be-part-of-prs-foundations-international-keychange-program', 'https://web.archive.org/save/https://www.winterjazzfest.com/videos-2020', 'https://web.archive.org/save/https://www.winterjazzfest.com/hub', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/30/tickets-on-sale-and-new-shows-announced', 'https://web.archive.org/save/https://www.winterjazzfest.com/2020-friday-jan-17th', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/16/wjf-closing-night-show-with-mark-guiliana-beat-music-improvisations-at-nublu-is-sold-out', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/6/2020-wjf-dates-announced-january-9-18-more-details-to-come', 'https://web.archive.org/save/https://www.winterjazzfest.com/photos-2020', 'https://web.archive.org/save/https://www.winterjazzfest.com/contact', 
'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/24/tickets-on-sale-friday-1025-at-12-noon-et', 'https://web.archive.org/save/https://www.winterjazzfest.com/talks-2018', 'https://web.archive.org/save/https://www.winterjazzfest.com/amendola-vs-blades-w/-skerik-mark-guiliana-space-heroes', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added', 'https://web.archive.org/save/https://www.winterjazzfest.com/archive', 'https://web.archive.org/save/https://www.winterjazzfest.com/2020-lineup', 'https://web.archive.org/save/https://www.winterjazzfest.com/about', 'https://web.archive.org/save/https://www.winterjazzfest.com/jazz-for-kids', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/31/new-shows-just-announced-tickets-on-sale-friday-nov-1-at-12-pm-et', 'https://web.archive.org/save/https://www.winterjazzfest.com/wellness', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/12/13/opening-night-dj-set-just-added-on-jan-8th-gilles-peterson-lefto-and-kassa-overall-at-nublu', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/6/thx-for-attending-2019-wjf-see-you-next-year', 'https://web.archive.org/save/https://www.winterjazzfest.com/sponsorship', 'https://web.archive.org/save/https://www.winterjazzfest.com/2020-saturday-jan-11th', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/23/thank-you-for-joining-us-at-2020-winter-jazzfest', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/16/2020-nyc-winter-jazzfest-initial-lineup-announced', 'https://web.archive.org/save/https://www.winterjazzfest.com/talks-2019-2', 
'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475-bj3fd', 'https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax', 'https://web.archive.org/save/https://www.winterjazzfest.com/personnel', 'https://web.archive.org/save/https://www.winterjazzfest.com/marathon-map'}
DEBUG:root:Sleeping for 30
INFO:root:Calling archive url https://web.archive.org/save/https://www.winterjazzfest.com/past-shows
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): web.archive.org:443
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /save/https://www.winterjazzfest.com/past-shows HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201017175158/https://www.winterjazzfest.com/past-shows HTTP/1.1" 200 0
DEBUG:root:Sleeping for 30
INFO:root:Calling archive url https://web.archive.org/save/https://www.winterjazzfest.com/2020-friday-jan-10th
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /save/https://www.winterjazzfest.com/2020-friday-jan-10th HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201017175246/https://www.winterjazzfest.com/2020-friday-jan-10th HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201015062242/https://www.winterjazzfest.com/2020-friday-jan-10th HTTP/1.1" 200 0
DEBUG:root:Sleeping for 30
INFO:root:Calling archive url https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/11/7/just-announced-seu-jorge-at-the-town-hall
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /save/https://www.winterjazzfest.com/news/2019/11/7/just-announced-seu-jorge-at-the-town-hall HTTP/1.1" 520 0
ERROR:root:520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/11/7/just-announced-seu-jorge-at-the-town-hall
Traceback (most recent call last):
  File "/home/user/bin/anaconda3/lib/python3.8/site-packages/wayback_machine_archiver/archiver.py", line 38, in call_archiver
    r.raise_for_status()
  File "/home/user/bin/anaconda3/lib/python3.8/site-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/11/7/just-announced-seu-jorge-at-the-town-hall
DEBUG:root:Sleeping for 30
INFO:root:Calling archive url https://web.archive.org/save/https://www.winterjazzfest.com/hub
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): web.archive.org:443
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /save/https://www.winterjazzfest.com/hub HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201017175409/https://www.winterjazzfest.com/hub HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201015062614/https://www.winterjazzfest.com/hub HTTP/1.1" 200 0
DEBUG:root:Sleeping for 30
INFO:root:Calling archive url https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/30/tickets-on-sale-and-new-shows-announced
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /save/https://www.winterjazzfest.com/news/2019/10/30/tickets-on-sale-and-new-shows-announced HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201017175457/https://www.winterjazzfest.com/news/2019/10/30/tickets-on-sale-and-new-shows-announced HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201015063403/https://www.winterjazzfest.com/news/2019/10/30/tickets-on-sale-and-new-shows-announced HTTP/1.1" 200 0
DEBUG:root:Sleeping for 30
INFO:root:Calling archive url https://web.archive.org/save/https://www.winterjazzfest.com/2020-friday-jan-17th
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /save/https://www.winterjazzfest.com/2020-friday-jan-17th HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201017175542/https://www.winterjazzfest.com/2020-friday-jan-17th HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201015063821/https://www.winterjazzfest.com/2020-friday-jan-17th HTTP/1.1" 200 0
DEBUG:root:Sleeping for 30
INFO:root:Calling archive url https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/16/wjf-closing-night-show-with-mark-guiliana-beat-music-improvisations-at-nublu-is-sold-out
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /save/https://www.winterjazzfest.com/news/2020/1/16/wjf-closing-night-show-with-mark-guiliana-beat-music-improvisations-at-nublu-is-sold-out HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201017175628/https://www.winterjazzfest.com/news/2020/1/16/wjf-closing-night-show-with-mark-guiliana-beat-music-improvisations-at-nublu-is-sold-out HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201015064736/https://www.winterjazzfest.com/news/2020/1/16/wjf-closing-night-show-with-mark-guiliana-beat-music-improvisations-at-nublu-is-sold-out HTTP/1.1" 200 0
DEBUG:root:Sleeping for 30
INFO:root:Calling archive url https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/6/2020-wjf-dates-announced-january-9-18-more-details-to-come
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /save/https://www.winterjazzfest.com/news/2019/10/6/2020-wjf-dates-announced-january-9-18-more-details-to-come HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201017175714/https://www.winterjazzfest.com/news/2019/10/6/2020-wjf-dates-announced-january-9-18-more-details-to-come HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201015064342/https://www.winterjazzfest.com/news/2019/10/6/2020-wjf-dates-announced-january-9-18-more-details-to-come HTTP/1.1" 200 0
DEBUG:root:Sleeping for 30
INFO:root:Calling archive url https://web.archive.org/save/https://www.winterjazzfest.com/photos-2020
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /save/https://www.winterjazzfest.com/photos-2020 HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201017175758/https://www.winterjazzfest.com/photos-2020 HTTP/1.1" 200 0
DEBUG:root:Sleeping for 30
INFO:root:Calling archive url https://web.archive.org/save/https://www.winterjazzfest.com/contact
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /save/https://www.winterjazzfest.com/contact HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201017175848/https://www.winterjazzfest.com/contact HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201015062327/https://www.winterjazzfest.com/contact HTTP/1.1" 200 0
DEBUG:root:Sleeping for 30
INFO:root:Calling archive url https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/24/tickets-on-sale-friday-1025-at-12-noon-et
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /save/https://www.winterjazzfest.com/news/2019/10/24/tickets-on-sale-friday-1025-at-12-noon-et HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201017175931/https://www.winterjazzfest.com/news/2019/10/24/tickets-on-sale-friday-1025-at-12-noon-et HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201015062507/https://www.winterjazzfest.com/news/2019/10/24/tickets-on-sale-friday-1025-at-12-noon-et HTTP/1.1" 200 0
DEBUG:root:Sleeping for 30
INFO:root:Calling archive url https://web.archive.org/save/https://www.winterjazzfest.com/talks-2018
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /save/https://www.winterjazzfest.com/talks-2018 HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201017180014/https://www.winterjazzfest.com/talks-2018 HTTP/1.1" 200 0
DEBUG:root:Sleeping for 30
INFO:root:Calling archive url https://web.archive.org/save/https://www.winterjazzfest.com/amendola-vs-blades-w/-skerik-mark-guiliana-space-heroes
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /save/https://www.winterjazzfest.com/amendola-vs-blades-w/-skerik-mark-guiliana-space-heroes HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201017180057/https://www.winterjazzfest.com/amendola-vs-blades-w/-skerik-mark-guiliana-space-heroes HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201015063904/https://www.winterjazzfest.com/amendola-vs-blades-w/-skerik-mark-guiliana-space-heroes HTTP/1.1" 200 0
DEBUG:root:Sleeping for 30
INFO:root:Calling archive url https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201017180141/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201015064037/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj HTTP/1.1" 200 0
DEBUG:root:Sleeping for 30
INFO:root:Calling archive url https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): web.archive.org:443
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201017180230/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added HTTP/1.1" 200 0
DEBUG:root:Sleeping for 30
INFO:root:Calling archive url https://web.archive.org/save/https://www.winterjazzfest.com/archive
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /save/https://www.winterjazzfest.com/archive HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201017180313/https://www.winterjazzfest.com/archive HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201015064904/https://www.winterjazzfest.com/archive HTTP/1.1" 200 0
DEBUG:root:Sleeping for 30
INFO:root:Calling archive url https://web.archive.org/save/https://www.winterjazzfest.com/2020-lineup
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /save/https://www.winterjazzfest.com/2020-lineup HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201017180415/https://www.winterjazzfest.com/2020-lineup HTTP/1.1" 200 0
DEBUG:root:Sleeping for 30
INFO:root:Calling archive url https://web.archive.org/save/https://www.winterjazzfest.com/about
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /save/https://www.winterjazzfest.com/about HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201017180518/https://www.winterjazzfest.com/about HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201015063007/https://www.winterjazzfest.com/about HTTP/1.1" 200 0
DEBUG:root:Sleeping for 30
INFO:root:Calling archive url https://web.archive.org/save/https://www.winterjazzfest.com/jazz-for-kids
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /save/https://www.winterjazzfest.com/jazz-for-kids HTTP/1.1" 520 0
ERROR:root:520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.winterjazzfest.com/jazz-for-kids
Traceback (most recent call last):
  File "/home/user/bin/anaconda3/lib/python3.8/site-packages/wayback_machine_archiver/archiver.py", line 38, in call_archiver
    r.raise_for_status()
  File "/home/user/bin/anaconda3/lib/python3.8/site-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.winterjazzfest.com/jazz-for-kids
DEBUG:root:Sleeping for 30
INFO:root:Calling archive url https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): web.archive.org:443
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475 HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201017180657/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475 HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201015063738/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475 HTTP/1.1" 200 0
DEBUG:root:Sleeping for 30
INFO:root:Calling archive url https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/23/thank-you-for-joining-us-at-2020-winter-jazzfest
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /save/https://www.winterjazzfest.com/news/2020/1/23/thank-you-for-joining-us-at-2020-winter-jazzfest HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201017180746/https://www.winterjazzfest.com/news/2020/1/23/thank-you-for-joining-us-at-2020-winter-jazzfest HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201015064122/https://www.winterjazzfest.com/news/2020/1/23/thank-you-for-joining-us-at-2020-winter-jazzfest HTTP/1.1" 200 0
DEBUG:root:Sleeping for 30
INFO:root:Calling archive url https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/10/16/2020-nyc-winter-jazzfest-initial-lineup-announced
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /save/https://www.winterjazzfest.com/news/2019/10/16/2020-nyc-winter-jazzfest-initial-lineup-announced HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201017180831/https://www.winterjazzfest.com/news/2019/10/16/2020-nyc-winter-jazzfest-initial-lineup-announced HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201015064948/https://www.winterjazzfest.com/news/2019/10/16/2020-nyc-winter-jazzfest-initial-lineup-announced HTTP/1.1" 200 0
DEBUG:root:Sleeping for 30
INFO:root:Calling archive url https://web.archive.org/save/https://www.winterjazzfest.com/talks-2019-2
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /save/https://www.winterjazzfest.com/talks-2019-2 HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201017180915/https://www.winterjazzfest.com/talks-2019-2 HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201015062105/https://www.winterjazzfest.com/talks-2019-2 HTTP/1.1" 200 0
DEBUG:root:Sleeping for 30
INFO:root:Calling archive url https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475-bj3fd
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475-bj3fd HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201017181011/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax-mygwj-jm475-bj3fd HTTP/1.1" 200 0
DEBUG:root:Sleeping for 30
INFO:root:Calling archive url https://web.archive.org/save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /save/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201017181055/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201015064515/https://www.winterjazzfest.com/news/2020/1/2/new-wjf-talks-just-added-gcpax HTTP/1.1" 200 0
DEBUG:root:Sleeping for 30
INFO:root:Calling archive url https://web.archive.org/save/https://www.winterjazzfest.com/personnel
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /save/https://www.winterjazzfest.com/personnel HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201017181139/https://www.winterjazzfest.com/personnel HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201015064820/https://www.winterjazzfest.com/personnel HTTP/1.1" 200 0
DEBUG:root:Sleeping for 30
INFO:root:Calling archive url https://web.archive.org/save/https://www.winterjazzfest.com/marathon-map
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /save/https://www.winterjazzfest.com/marathon-map HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201017181229/https://www.winterjazzfest.com/marathon-map HTTP/1.1" 302 0
DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /web/20201015062702/https://www.winterjazzfest.com/marathon-map HTTP/1.1" 200 0
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/user/bin/anaconda3/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/user/bin/anaconda3/lib/python3.8/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "/home/user/bin/anaconda3/lib/python3.8/site-packages/wayback_machine_archiver/archiver.py", line 38, in call_archiver
    r.raise_for_status()
  File "/home/user/bin/anaconda3/lib/python3.8/site-packages/requests/models.py", line 941, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/11/7/just-announced-seu-jorge-at-the-town-hall
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/user/bin/anaconda3/bin/archiver", line 8, in <module>
    sys.exit(main())
  File "/home/user/bin/anaconda3/lib/python3.8/site-packages/wayback_machine_archiver/archiver.py", line 243, in main
    pool.map(partial_call, archive_urls)
  File "/home/user/bin/anaconda3/lib/python3.8/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/user/bin/anaconda3/lib/python3.8/multiprocessing/pool.py", line 771, in get
    raise self._value
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.winterjazzfest.com/news/2019/11/7/just-announced-seu-jorge-at-the-town-hall
agude

comment created time in 3 days

issue commentagude/wayback-machine-archiver

520 Server Error for some URLs

While testing for #18, I got a few 520s as well. The first few resolved successfully, but the last one blew up the program. Logging it here.

I think I'm not handling error statuses in the best way possible, probably here: https://github.com/agude/wayback-machine-archiver/blob/master/wayback_machine_archiver/archiver.py#L38

One thing to do is handle statuses differently. 520 generally means a transient error, but 500 generally means rate limiting (which is unrecoverable).
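
A rough sketch of what handling statuses differently could look like, assuming the heuristic above (520 as transient, 500 as rate limiting). The function and constant names are hypothetical and are not part of Archiver:

```python
# Hypothetical helper; the groupings follow the heuristic in the comment above.
RETRYABLE_STATUSES = {520}  # assumed transient: worth retrying
FATAL_STATUSES = {500}      # assumed rate limiting: stop the run


def classify_status(status_code):
    """Return 'ok', 'retry', 'abort', or 'skip' for an HTTP status code."""
    if status_code < 400:
        return "ok"
    if status_code in RETRYABLE_STATUSES:
        return "retry"
    if status_code in FATAL_STATUSES:
        return "abort"
    # Any other error: log it and move on to the next URL.
    return "skip"
```

Keeping the two sets separate would make it easy to move a status from one group to the other if the guess about its meaning turns out to be wrong.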

agude

comment created time in 3 days

issue closedagude/wayback-machine-archiver

Server Error: Internal Server Error for url

Around 10 minutes into running archiver for a txt file:

Here's the text file: bluemaxima.txt

This happens: 500 Server Error: Internal Server Error for url

C:\Users\yewhe\Downloads>archiver --file bluemaxima.txt
ERROR:root:500 Server Error: Internal Server Error for url: https://web.archive.org/save/https://bluemaxima.org/flashpoint/datahub/Game_Master_List
Traceback (most recent call last):
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\wayback_machine_archiver\archiver.py", line 38, in call_archiver
    r.raise_for_status()
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\models.py", line 928, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://web.archive.org/save/https://bluemaxima.org/flashpoint/datahub/Game_Master_List
ERROR:root:500 Server Error: Internal Server Error for url: https://web.archive.org/save/https://bluemaxima.org/flashpoint/datahub/Special:RecentChangesLinked/Unity_Curation
Traceback (most recent call last):
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\wayback_machine_archiver\archiver.py", line 38, in call_archiver
    r.raise_for_status()
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\models.py", line 928, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://web.archive.org/save/https://bluemaxima.org/flashpoint/datahub/Special:RecentChangesLinked/Unity_Curation
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 48, in mapstar
    return list(map(*args))
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\wayback_machine_archiver\archiver.py", line 38, in call_archiver
    r.raise_for_status()
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\models.py", line 928, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://web.archive.org/save/https://bluemaxima.org/flashpoint/datahub/Game_Master_List
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\yewhe\AppData\Local\Programs\Python\Python38-32\Scripts\archiver.exe\__main__.py", line 7, in <module>
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\wayback_machine_archiver\archiver.py", line 243, in main
    pool.map(partial_call, archive_urls)
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 771, in get
    raise self._value
requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: https://web.archive.org/save/https://bluemaxima.org/flashpoint/datahub/Game_Master_List

Does this mean that after the error link, the archiver doesn't run anymore? In other words, does archiver interrupt the process and ignore the remaining urls in the text file? Does archiver retry the link after a period of time?

closed time in 3 days

Melonadev

issue commentagude/wayback-machine-archiver

Server Error: Internal Server Error for url

That looks like a separate issue; I've opened a new bug: #19

Melonadev

comment created time in 3 days

issue openedagude/wayback-machine-archiver

520 Server Error for some URLs

I tried --rate-limit-wait 60 for this file: jass.txt

...then 520 Server Error: UNKNOWN for url

Microsoft Windows [Version 10.0.18363.1082]
(c) 2019 Microsoft Corporation. All rights reserved.

C:\Users\yewhe\Downloads>archiver --file jass.txt --rate-limit-wait 60
ERROR:root:520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.jazzfmawards.com/awards/2017-ppl-lifetime-achievement-award/
Traceback (most recent call last):
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\wayback_machine_archiver\archiver.py", line 38, in call_archiver
    r.raise_for_status()
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\models.py", line 928, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.jazzfmawards.com/awards/2017-ppl-lifetime-achievement-award/
ERROR:root:520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.jazzfmawards.com/awards/2013-uk-vocalist-of-the-year/
Traceback (most recent call last):
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\wayback_machine_archiver\archiver.py", line 38, in call_archiver
    r.raise_for_status()
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\models.py", line 928, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.jazzfmawards.com/awards/2013-uk-vocalist-of-the-year/
ERROR:root:520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.jazzfmawards.com/and-the-judges-are/
Traceback (most recent call last):
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\wayback_machine_archiver\archiver.py", line 38, in call_archiver
    r.raise_for_status()
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\models.py", line 928, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.jazzfmawards.com/and-the-judges-are/
ERROR:root:520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.jazzfmawards.com/awards/2020-prs-for-music-uk-jazz-act-of-the-year/
Traceback (most recent call last):
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\wayback_machine_archiver\archiver.py", line 38, in call_archiver
    r.raise_for_status()
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\models.py", line 928, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.jazzfmawards.com/awards/2020-prs-for-music-uk-jazz-act-of-the-year/
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 48, in mapstar
    return list(map(*args))
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\wayback_machine_archiver\archiver.py", line 38, in call_archiver
    r.raise_for_status()
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\requests\models.py", line 928, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.jazzfmawards.com/awards/2017-ppl-lifetime-achievement-award/
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\yewhe\AppData\Local\Programs\Python\Python38-32\Scripts\archiver.exe\__main__.py", line 7, in <module>
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\site-packages\wayback_machine_archiver\archiver.py", line 243, in main
    pool.map(partial_call, archive_urls)
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "c:\users\yewhe\appdata\local\programs\python\python38-32\lib\multiprocessing\pool.py", line 771, in get
    raise self._value
requests.exceptions.HTTPError: 520 Server Error: UNKNOWN for url: https://web.archive.org/save/https://www.jazzfmawards.com/awards/2017-ppl-lifetime-achievement-award/

Originally posted by @Melonadev in https://github.com/agude/wayback-machine-archiver/issue_comments/710761066

created time in 3 days

issue commentagude/wayback-machine-archiver

TooManyRedirects: Exceeded 30 redirects

Interesting error! I've never seen this one before.

I would guess it's a single URL from your list that has some weird script redirection, or something else like that. I'll rerun with logging and see what happens.

Melonadev

comment created time in 3 days

delete branch agude/agude.github.io

delete branch : posts-gab41

delete time in 4 days

push eventagude/agude.github.io

Alexander Gude

commit sha e32ae4efe39fc265169538cd0b4c5af840fd34df

Starting to write 2020 TDF article using 2019 base

view details

Alexander Gude

commit sha 26438c46950ada6e9a6c952f444cb5ce9d3c9385

More work on tour post

view details

Alexander Gude

commit sha 65613151265f088a7709ad56d5697465db5771cc

Update TdF post

view details

Alexander Gude

commit sha 58311bc54919ee3782d75615b44897736c55a892

Some more writing

view details

Alexander Gude

commit sha b207043271fac741333c664b18fbef2e93d27dbd

Add plots and plotting notebook

view details

Alexander Gude

commit sha adb135076c4c4ddbf5590e63a05174227b62d56a

Add data

view details

Alexander Gude

commit sha 910497c8404f3308cbec671e14d7a8e101a692e4

Add header image

view details

Alexander Gude

commit sha 640e6632cbcb77670b9f54a6be40468b283e8d31

Writing updates

view details

Alexander Gude

commit sha 46cbb2855e5dc1bc6739cd267ce6e4039c1d5b7d

Finish rough out

view details

Alexander Gude

commit sha 9733ede7f67d02e6e92e45813d355da05ca4b512

Some final edits

view details

Alexander Gude

commit sha bb65bf8e353798d70da5355e0b7fabe38cbae268

Apply suggestions from code review Co-authored-by: Charles Fyfe <charles.a.mceachern@gmail.com> Co-authored-by: Veldrina <wualank+github@gmail.com>

view details

Alexander Gude

commit sha abf94ad4cc2db2ab7c8a1548d268af03dfa3c2d7

Update post after review

view details

Alexander Gude

commit sha 939dba53bb7a0496a4456663c2096645dca5e52b

Final edits

view details

Alexander Gude

commit sha db4fcea9475ee3ea3b03eb0b77c0e94a0c3a3f58

Move post to _posts

view details

Alexander Gude

commit sha 39dbc923f8680ba35c63f6ab759e810b824ef09e

Publish 2020-10-16: TDF Plots

view details

push time in 4 days

delete branch agude/agude.github.io

delete branch : post-2020_tdf

delete time in 4 days

PR merged agude/agude.github.io

Tour de France 2020 Post edits needed
+58271 -0

0 comment

7 changed files

agude

pr closed time in 4 days

push eventagude/agude.github.io

Alexander Gude

commit sha 939dba53bb7a0496a4456663c2096645dca5e52b

Final edits

view details

Alexander Gude

commit sha db4fcea9475ee3ea3b03eb0b77c0e94a0c3a3f58

Move post to _posts

view details

push time in 4 days

push eventagude/agude.github.io

Alexander Gude

commit sha abf94ad4cc2db2ab7c8a1548d268af03dfa3c2d7

Update post after review

view details

push time in 4 days

issue commentagude/wayback-machine-archiver

Server Error: Internal Server Error for url

The problem with that is that it's not the specific URL; the Internet Archive is rate limiting you.

Archiver used to just chug through the full list even when one failed, but as soon as one URL failed all subsequent URLs would also fail. The only solution was to stop, wait for the rate limit to end, and try again.
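
The tracebacks in these reports show the halt directly: an exception raised inside a worker is re-raised by `pool.map` in the main process, which ends the run. Here is a minimal sketch of that behavior. It uses `multiprocessing.pool.ThreadPool` (same `map` interface) with a made-up `fake_archive` worker and example URLs, so it runs without network access; none of these names come from Archiver.

```python
from multiprocessing.pool import ThreadPool


class FakeHTTPError(Exception):
    """Stand-in for requests.exceptions.HTTPError in this sketch."""


def fake_archive(url):
    # Hypothetical worker: pretend one URL is rejected by the server.
    if url.endswith("/bad"):
        raise FakeHTTPError("500 Server Error for url: " + url)
    return url


def run(urls):
    """Return (finished, error); error is the exception that stopped the run."""
    pool = ThreadPool(processes=1)
    try:
        return pool.map(fake_archive, urls), None
    except FakeHTTPError as err:
        # map() re-raises the worker's exception, so no results come back.
        return None, err
    finally:
        pool.close()
        pool.join()
```

Calling `run` with a list that contains one failing URL returns no results at all, which matches the "stop and try again later" behavior described here.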

Melonadev

comment created time in 8 days

push eventagude/agude.github.io

Alexander Gude

commit sha bb65bf8e353798d70da5355e0b7fabe38cbae268

Apply suggestions from code review Co-authored-by: Charles Fyfe <charles.a.mceachern@gmail.com> Co-authored-by: Veldrina <wualank+github@gmail.com>

view details

push time in 8 days

issue commentagude/wayback-machine-archiver

Server Error: Internal Server Error for url

Hi @Melonadev!

When Archiver errors (as above), that means it halts and does not back up the remaining items on the list.

It should retry multiple times, but I wonder if I'm short-circuiting that here: https://github.com/agude/wayback-machine-archiver/blob/master/wayback_machine_archiver/archiver.py#L38

Anyway, the most common reason for a 500 error is the Internet Archive rate-limiting you. My suggestion is to turn the --rate-limit-wait parameter higher! It defaults to 5 seconds; I'd try 30 or even 60.
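
To make the retry-and-wait idea concrete, here is a small sketch of a loop that waits between attempts and gives up after a fixed number of failures. It is not Archiver's code: `call_with_retries`, its parameters, and the injected `fetch` and `sleep` callables are assumptions made for illustration.

```python
import time


def call_with_retries(fetch, url, max_attempts=3, wait_seconds=30, sleep=time.sleep):
    """Call fetch(url) until it succeeds or max_attempts is reached.

    fetch is any callable that raises an exception on failure.
    Returns the value from fetch; re-raises the last error if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_attempts:
                raise
            # Wait before trying again.
            sleep(wait_seconds)
```

Passing `sleep` as a parameter keeps the sketch easy to try out without actually waiting 30 or 60 seconds.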

Melonadev

comment created time in 8 days

PR opened agude/agude.github.io

Tour de France 2020 Post edits needed
+58265 -0

0 comment

7 changed files

pr created time in 13 days

push eventagude/agude.github.io

Alexander Gude

commit sha 9733ede7f67d02e6e92e45813d355da05ca4b512

Some final edits

view details

push time in 13 days

push eventagude/agude.github.io

Alexander Gude

commit sha 46cbb2855e5dc1bc6739cd267ce6e4039c1d5b7d

Finish rough out

view details

push time in 13 days

push eventagude/agude.github.io

Alexander Gude

commit sha adb135076c4c4ddbf5590e63a05174227b62d56a

Add data

view details

Alexander Gude

commit sha 910497c8404f3308cbec671e14d7a8e101a692e4

Add header image

view details

Alexander Gude

commit sha 640e6632cbcb77670b9f54a6be40468b283e8d31

Writing updates

view details

push time in 14 days

push eventagude/agude.github.io

Alexander Gude

commit sha 58311bc54919ee3782d75615b44897736c55a892

Some more writing

view details

Alexander Gude

commit sha b207043271fac741333c664b18fbef2e93d27dbd

Add plots and plotting notebook

view details

push time in 14 days

create branch agude/Jupyter-Notebook-Template-Library

branch : cicd-update_python

created branch time in 14 days

push eventagude/agude.github.io

Alexander Gude

commit sha 26438c46950ada6e9a6c952f444cb5ce9d3c9385

More work on tour post

view details

Alexander Gude

commit sha 65613151265f088a7709ad56d5697465db5771cc

Update TdF post

view details

push time in 15 days

create branch agude/agude.github.io

branch : post-2020_tdf

created branch time in 20 days

PR opened iNPUTmice/Conversations

Change handling of empty lines in Paste as Quote

I'm trying to fix the problem I reported in #3876, although we haven't agreed that it is a problem. :-) This modifies code originally committed as #2127.

This PR removes the first replaceAll() in insertAsQuote(), which replaces two or more "line feed" (\n) characters, optionally separated by spaces, with a single line feed. If this PR is merged, text will be pasted with the same white space it had before quoting.

We might want to think about how this function should work (that is, for a set of input texts, what should the output be) and then rework it to do that. For example, I'm not sure the final .replaceAll("\n$", "") is doing what it should be. I think it's supposed to strip remaining empty lines at the end of the text, but for my test strings it doesn't seem to. (It does do that if you move it before the "insert >" step.)

So that is to say, I'm happy to keep tweaking this PR until it meets your approval!
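
As a rough illustration of the behavior being removed, here is a Python analogue (the app itself is Java). The pattern is an assumption reconstructed from the description above, not the expression used in Conversations, and `collapse_blank_lines` is a hypothetical name.

```python
import re


def collapse_blank_lines(text):
    # Assumed pattern: a line feed followed by one or more
    # "optional spaces + line feed" groups becomes a single line feed.
    return re.sub(r"\n(?: *\n)+", "\n", text)


sample = "This is the first line.\n\nThis is the second line."
collapsed = collapse_blank_lines(sample)
```

With the replacement applied, the empty line between the two paragraphs is lost. Skipping it leaves `sample` untouched, which is the change this PR proposes.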

+1 -1

0 comment

1 changed file

pr created time in 21 days

create branch agude/Conversations

branch : feature-paste_as_quote_newlines

created branch time in 21 days

fork agude/Conversations

Conversations is an open source XMPP/Jabber client for Android

https://conversations.im

fork in 21 days

issue commentmate-desktop/mate-panel

Windows maximize 'under' panels

On 20.04 Mate I'm seeing this issue with dual horizontal monitors.

My monitors are next to each other, so I have a left and a right monitor. Using the Mutiny panel layout, if the left monitor is primary (so the dock is on the left side of the left monitor), the windows maximize and snap without overlapping the dock.

If I make the right monitor primary (so the dock is on the left side of the right monitor, hence at the "seam" between the two monitors), the windows maximize under the dock.

I have tried switching between nvidia-driver-450 and Nouveau and also between Marco Adaptive and Marco (No compositing) but the bug persists.

sanderboom

comment created time in 21 days

issue commentagude/wayback-machine-archiver

use within python script

Glad to hear it! :-)

ChocoTonic

comment created time in 23 days

issue commentagude/wayback-machine-archiver

use within python script

I get the following error message: "NameError: global name 'Retry' is not defined". Any idea how to fix it?

😬 Sorry, that code was a quick sketch, and it looks like I missed some imports.

You'll need:

from urllib3.util.retry import Retry

P.S. Plex will only run python 2.7

I feel your pain! My Raspberry Pis only support 2.7 right now, so I'm trying to keep Archiver working on it as well!
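
For context, a `Retry` object is usually attached to a `requests` session through an `HTTPAdapter`. The sketch below is an assumption about how the missing import is typically used, not the code from the earlier comment. The retry count, backoff factor, and status list are illustrative values, and `make_session` is a hypothetical name.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session():
    # Illustrative settings: retry up to 5 times on HTTP 520, with backoff.
    retry = Retry(total=5, backoff_factor=1, status_forcelist=[520])
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Use the retrying adapter for both schemes.
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

Building the session does not touch the network; retries only come into play once a request is sent through it.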

ChocoTonic

comment created time in 24 days

delete branch agude/agude.github.io

delete branch : post-2020_interviews

delete time in a month

push eventagude/agude.github.io

Alexander Gude

commit sha b568ee36f0d070af4c58cef8fb225d3b7767ad9d

Very rough draft of 2020 interviewing post

view details

Alexander Gude

commit sha fdc75829336b0aba792faa8faeffba3280ac120f

Clean up draft

view details

Alexander Gude

commit sha 3ed3f16fecfc9fb19c24d1cac8871841f075a353

Further updates and add lede image

view details

Alexander Gude

commit sha e761462dd55e4565ad74566afe8eafebfc419e13

More polish

view details

Alexander Gude

commit sha e42bfa663a4e56834e9fe0f9c6764303e331f317

Apply suggestions from code review Co-authored-by: Veldrina <wualank+github@gmail.com> Co-authored-by: Charles Fyfe <charles.a.mceachern@gmail.com>

view details

Alexander Gude

commit sha 42f0f2f5a2b7c09deee78052e7eeadf657b84974

Update from review

view details

Alexander Gude

commit sha a4cae6230ccdd91d80e7266db799b998f485033f

Further edits and tweaks

view details

Alexander Gude

commit sha b6bc403381509bcb192d1ecc18f2626c2d3782a1

Signed at Square!

view details

Alexander Gude

commit sha f0bf47abe55182a6d11149401ebd62866b3a21e2

Move post to posts folder

view details

Alexander Gude

commit sha ae9a9eb734714a4fd6e9c2f10eb40dfe410949cc

Some final tweaks

view details

Alexander Gude

commit sha 9d21f58739560de9ab6ee3049adae2ac23320e66

Publish 2020-09-21 Interviewing Post

view details

Alexander Gude

commit sha cb90c49ac5da1012aa45cc16f90ad033f26e3bb4

Fix categories

view details

push time in a month

PR merged agude/agude.github.io

Post: 2020 Interview Retro edits needed

Why write a data post when I can ramble for 2 pages?!

+181 -0

0 comment

2 changed files

agude

pr closed time in a month

push eventagude/agude.github.io

Alexander Gude

commit sha 42f0f2f5a2b7c09deee78052e7eeadf657b84974

Update from review

view details

Alexander Gude

commit sha a4cae6230ccdd91d80e7266db799b998f485033f

Further edits and tweaks

view details

push time in a month

push eventagude/agude.github.io

Alexander Gude

commit sha e42bfa663a4e56834e9fe0f9c6764303e331f317

Apply suggestions from code review Co-authored-by: Veldrina <wualank+github@gmail.com> Co-authored-by: Charles Fyfe <charles.a.mceachern@gmail.com>

view details

push time in a month

PR opened agude/agude.github.io

Post: 2020 Interview Retro edits needed

Why make a data post when I can ramble for 2 pages?!

+175 -0

0 comment

2 changed files

pr created time in a month

push eventagude/agude.github.io

Alexander Gude

commit sha e761462dd55e4565ad74566afe8eafebfc419e13

More polish

view details

push time in a month

push eventagude/agude.github.io

Alexander Gude

commit sha fdc75829336b0aba792faa8faeffba3280ac120f

Clean up draft

view details

Alexander Gude

commit sha 3ed3f16fecfc9fb19c24d1cac8871841f075a353

Further updates and add lede image

view details

push time in a month

create branch agude/agude.github.io

branch : post-2020_interviews

created branch time in a month

issue openedagude/agude.github.io

Back-link to the negotiating post.

There are several posts that should link to my negotiating post (PR #32).

Here is an incomplete list:

  • The 2020 interviewing post
  • The salary posts

created time in a month

issue commentiNPUTmice/Conversations

'Paste as Quote' strips empty lines

Looks like this is the line that does it: https://github.com/iNPUTmice/Conversations/blob/c9e6653e33676c3df8502eb7a18d5dc24b7e7750/src/main/java/eu/siacs/conversations/ui/widget/EditMessage.java#L145

agude

comment created time in a month

issue openediNPUTmice/Conversations

'Paste as Quote' strips empty lines

General information

  • Version: 2.8.10+pcr
  • Device: Pixel 3
  • Android Version: Android 11 (stock)
  • Server name: self hosted
  • Server software: prosody 0.11.5
  • Installed server modules: I'd have to look...
  • Conversations source: PlayStore

Steps to reproduce

  1. Copy text with empty lines
  2. Select 'Paste as Quote'
  3. All empty lines are stripped

Expected result

If I copy and paste the following text:

This is the first line.

This is the second line.

I would expect it to be pasted as:

> This is the first line.
>
> This is the second line.

Actual result

> This is the first line.
> This is the second line.

Preserving the empty lines would be nice during paste, especially for multi-paragraph quotes where the empty lines separate the paragraphs.
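
The expected result can be sketched in a few lines of Python. The app itself is written in Java, so this is only an illustration, and `paste_as_quote` is a hypothetical name. The idea is to prefix every line with `>`, keeping empty lines as a bare `>`.

```python
def paste_as_quote(text):
    """Quote each line of text, keeping empty lines as a bare '>'."""
    quoted = []
    for line in text.split("\n"):
        quoted.append("> " + line if line else ">")
    return "\n".join(quoted)


copied = "This is the first line.\n\nThis is the second line."
print(paste_as_quote(copied))
```

Working line by line, rather than collapsing runs of line feeds first, is what keeps the paragraph break visible in the quoted text.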

created time in a month

push eventagude/agude.github.io

Alexander Gude

commit sha 8f9e0babbd62caab3efc17f22db26d6be4c87b18

Edit Nook Desk article for grammar, clarity.

view details

Alexander Gude

commit sha 0a9524d7d1e71e5ed2f568ffb7c3340f8d738df5

Minor reword in TDF article

view details

Alexander Gude

commit sha 3e0b7fab9f47a45e9ee9f2dd264878f9603f8dd8

Dash to en-dash for number ranges

view details

Alexander Gude

commit sha ff41d7c545d01e7e9a3d967be603309d5700e3aa

Move from "Accident" to "crash" or "collision". People have agency, they crash cars, not "have accidents".

view details

Alexander Gude

commit sha fb6657f177c5c370c64262dafcb5164d910491d8

Minor edits

view details

Alexander Gude

commit sha 4dec6ec2bb1956ef62d0ae5ce5b89855ef93510a

Add link to my grey band drawing code!

view details

Alexander Gude

commit sha d066eb2bcbd5654ab17e43d339656976690a7181

Merge branch 'fixup-edits'

view details

Alexander Gude

commit sha a89327359deca35804d2cc576ad0b83f7f559c44

Fixed a bug, remove mention of it

view details

push time in a month

push eventagude/Jupyter-Notebook-Template-Library

Alexander Gude

commit sha 689f8cfa411cf5431c0681a286f35088eb0e99f1

Add a second way to get legend color for lines. This fixes #3.

view details

push time in a month

issue closedagude/Jupyter-Notebook-Template-Library

draw_colored_legend() is brittle

It breaks for some objects (like lines).
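One way such brittleness can be avoided is to try more than one color getter on each legend handle. The sketch below is an assumption about the general approach, not the code from the repository: the helper name `handle_color` is hypothetical, and it relies only on the fact that matplotlib patches expose `get_facecolor()` while lines expose `get_color()`. Stub classes stand in for real matplotlib artists so the example runs on its own.

```python
def handle_color(handle):
    """Return a color for a legend handle.

    Tries the patch-style getter first, then the line-style getter.
    Hypothetical helper; not the repository's draw_colored_legend().
    """
    for getter_name in ("get_facecolor", "get_color"):
        getter = getattr(handle, getter_name, None)
        if callable(getter):
            return getter()
    raise TypeError(f"No known color getter on {type(handle).__name__}")


# Stand-ins for matplotlib artists, used only to keep the sketch self-contained.
class FakePatch:
    def get_facecolor(self):
        return (1.0, 0.0, 0.0, 1.0)


class FakeLine:
    def get_color(self):
        return "tab:blue"


print(handle_color(FakePatch()))
print(handle_color(FakeLine()))
```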

closed time in a month

agude

delete branch agude/agude.github.io

delete branch : post-py_map

delete time in 2 months

push eventagude/agude.github.io

Alexander Gude

commit sha 8ab6a0be792560fbb8e80c57b29841744f6646d2

Initial draft

view details

Alexander Gude

commit sha 86154510209a322d5b3e589d458ef386388624ce

Add image and update Python Map post

view details

Alexander Gude

commit sha 72e8fb8d6ae60786dd2c8bac660fc0cc109434ae

Major rewrite of Map post

view details

Alexander Gude

commit sha 89c0cfebec1ab33a52305a52ec229aaa83057a82

Add comprehensions

view details

Alexander Gude

commit sha ea43d53f1f4e931ececa91e9929209bc6462c630

Add link to list comprehension

view details

Alexander Gude

commit sha 0c03468327546747d61e0167300729400ba60498

Apply suggestions from code review Co-authored-by: Veldrina <wualank+github@gmail.com>

view details

Alexander Gude

commit sha 1f0a4dee8505a13ccd57f137ac1e817f7699f886

Merge branch 'post-py_map'

view details

Alexander Gude

commit sha 0a51840d49a97b3db67f7dbd03834d0b85040557

Move Map and Filter to posts

view details

push time in 2 months

PR merged agude/agude.github.io

Map, Filter, ~Reduce~ Post edits needed

Most recent post at the last minute...

+177 -0

0 comments

2 changed files

agude

pr closed time in 2 months

push eventagude/agude.github.io

Alexander Gude

commit sha 0c03468327546747d61e0167300729400ba60498

Apply suggestions from code review Co-authored-by: Veldrina <wualank+github@gmail.com>

view details

push time in 2 months

PullRequestReviewEvent

Pull request review commentagude/agude.github.io

Map, Filter, ~Reduce~ Post

+---
+layout: post
+title: "Python Patterns: Map and Filter"
+description: >
+  For loops are great, but I am a big fan of replacing them with simple
+  functions. Python provides a couple of building blocks.
+image: /files/patterns/naturalists_misc_vol_1_painted_snake.jpg
+image_alt: >
+  A drawing of an orange and black snake from The Naturalist's Miscellany
+  Volume 1.
+categories: python_patterns
+---
+
+{% include lead_image.html %}
+
+Computers are great at doing a simple action over and over again. A common
+way to make them do such a task is to store data in a list and iterate over it
+with a for loop, calling a function for each item.
+
+But Python has some great functions to replace for loops, which I will cover
+below after a quick example.
+
+## Playing Cards
+
+Given a list of playing cards as tuples, like so:
+
+```python
+cards = [
+  ("Spades", 14),
+  ("Diamonds", 13),
+  ("Hearts", 2),
+  ("Spades", 8),
+  ("Clubs", 11),
+  ...  # etc.
+]
+```
+
+We want to convert them to `PlayingCard` objects as [defined in my previous
+post on `enums`][enums]. We need a function to convert our tuple into the
+class:
+
+[enums]: {% post_url 2019-01-22-python_patterns_enum %}#playing-cards-with-enums
+
+```python
+def tuple_to_card(card_tuple):
+  suit, rank = card_tuple
+
+  card = PlayingCard(
+    CardSuit(suit),
+    CardRank(rank)
    CardRank(rank),
agude

comment created time in 2 months
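For readers who want to run the pattern from the reviewed post, here is a self-contained sketch. The `cards` list and `tuple_to_card` follow the diff above; the definitions of `CardSuit`, `CardRank`, and `PlayingCard` are assumptions made for illustration, since the real ones live in the author's earlier enum post and are not shown here.

```python
from enum import Enum
from typing import NamedTuple


# Assumed definitions: the real classes are in the earlier enum post.
class CardSuit(Enum):
    SPADES = "Spades"
    DIAMONDS = "Diamonds"
    HEARTS = "Hearts"
    CLUBS = "Clubs"


# Ranks 2 through 14 (ace high), built with the functional Enum API.
CardRank = Enum("CardRank", {f"R{n}": n for n in range(2, 15)})


class PlayingCard(NamedTuple):
    suit: CardSuit
    rank: CardRank


cards = [
    ("Spades", 14),
    ("Diamonds", 13),
    ("Hearts", 2),
    ("Spades", 8),
    ("Clubs", 11),
]


def tuple_to_card(card_tuple):
    suit, rank = card_tuple
    return PlayingCard(CardSuit(suit), CardRank(rank))


# map() replaces the explicit for loop; a list comprehension is equivalent.
deck = list(map(tuple_to_card, cards))
deck_comprehension = [tuple_to_card(c) for c in cards]

# filter() keeps only the cards that match a predicate.
spades = list(filter(lambda card: card.suit is CardSuit.SPADES, deck))
print(spades)
```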

PR opened agude/agude.github.io

Map, Filter, ~Reduce~ Post edits needed

Most recent post at the last minute...

+177 -0

0 comments

2 changed files

pr created time in 2 months

push eventagude/agude.github.io

Alexander Gude

commit sha 89c0cfebec1ab33a52305a52ec229aaa83057a82

Add comprehensions

view details

Alexander Gude

commit sha ea43d53f1f4e931ececa91e9929209bc6462c630

Add link to list comprehension

view details

push time in 2 months

push eventagude/agude.github.io

Alexander Gude

commit sha 72e8fb8d6ae60786dd2c8bac660fc0cc109434ae

Major rewrite of Map post

view details

push time in 2 months

push eventagude/agude.github.io

Alexander Gude

commit sha 8ab6a0be792560fbb8e80c57b29841744f6646d2

Initial draft

view details

Alexander Gude

commit sha 86154510209a322d5b3e589d458ef386388624ce

Add image and update Python Map post

view details

push time in 2 months

create branch agude/agude.github.io

branch : post-py_map

created branch time in 2 months

CommitCommentEvent

created tagagude/wayback-machine-archiver

tag1.8.1

A Python script to submit web pages to the Wayback Machine for archiving.

created time in 2 months

issue commentmadewithml/utterances

https://madewithml.com/projects/2136/machine-learning-deployment-shadow-mode/

Yup @tataganesh! Looks like TF Serving will do it for you naturally!

utterances-bot

comment created time in 2 months

push eventagude/wayback-machine-archiver

Alexander Gude

commit sha 86e90caca37a13b41ab6f897dca27ed9fd2c19e8

Add tests for `load_local_sitemap`

view details

Alexander Gude

commit sha 2eaa0c7dcdf1a8bf8d8599d94256b17f8c4d6e17

Add remote test

view details

Alexander Gude

commit sha d8f48658b2966f7d3a8ed0fd7cd70b3741090056

Add requests-mock to requirements.txt for testing

view details

Alexander Gude

commit sha 94cb03552c74b24954381c32e2ca310073c3b04f

Add raise for status and update requirements

view details

Alexander Gude

commit sha 195d8a9f5036740db6e8adc799368313ebac8c8f

Add tests for `call_archiver`

view details

Alexander Gude

commit sha 271fce7a1f2b390aa01df7cbfb28b5743f2cbc8d

Bump version

view details

Alexander Gude

commit sha 8b5fd3c4455ff6bc552ed208af74ef354fedf979

Merge branch 'test_improvement'

view details

push time in 2 months

created tagagude/wayback-machine-archiver

tag1.8.0

A Python script to submit web pages to the Wayback Machine for archiving.

created time in 2 months

push eventagude/wayback-machine-archiver

Alexander Gude

commit sha 195d8a9f5036740db6e8adc799368313ebac8c8f

Add tests for `call_archiver`

view details

push time in 2 months

push eventagude/wayback-machine-archiver

Alexander Gude

commit sha 94cb03552c74b24954381c32e2ca310073c3b04f

Add raise for status and update requirements

view details

push time in 2 months

push eventagude/wayback-machine-archiver

Alexander Gude

commit sha d8f48658b2966f7d3a8ed0fd7cd70b3741090056

Add requests-mock to requirements.txt for testing

view details

push time in 2 months

push eventagude/wayback-machine-archiver

Alexander Gude

commit sha 2eaa0c7dcdf1a8bf8d8599d94256b17f8c4d6e17

Add remote test

view details

push time in 2 months

create branch agude/wayback-machine-archiver

branch : test_improvement

created branch time in 2 months

issue commentagude/wayback-machine-archiver

flag for setting no to "save error pages"

@lauhaide: The program will throw an error and terminate when it fails. Right here:

https://github.com/agude/wayback-machine-archiver/blob/master/wayback_machine_archiver/archiver.py#L36-L41

test2a

comment created time in 2 months

issue commentagude/wayback-machine-archiver

flag for setting no to "save error pages"

I run this script to back up my personal site every evening. It's about 100 pages, and I run with --rate-limit-wait=60. It completes most of the time, but every few weeks it'll error out due to rate limiting from the Internet Archive.

So I don't have an exact number for you, but I would say closer to 30-60 seconds than 1-2. :-)

test2a

comment created time in 2 months

issue commentagude/wayback-machine-archiver

flag for setting no to "save error pages"

Hi @lauhaide!

This script doesn't overwrite, as it were, because it asks The Wayback Machine to save a snapshot of the current page. As you can see here with the Yahoo.com archive, there are multiple snapshots stored each day.

As for recovering the timestamp, you could get that from the Wayback Machine itself (you'll see each snapshot is timestamped on the Yahoo page for example), but that's not something this tool supports.

If you are on Linux, you could do something like this:

archiver https://yahoo.com --log DEBUG 2>&1 | ts '[%Y-%m-%d %H:%M:%S]'

That would timestamp every line of the debug output like this:

[2020-08-11 13:17:12] DEBUG:root:Arguments: Namespace(archive_sitemap=False, file=None, jobs=1, log_file=None, log_level='DEBUG', rate_limit_in_sec=5, sitemaps=[], urls=['https://yahoo.com'])
[2020-08-11 13:17:12] INFO:root:Adding page URLs to archive
[2020-08-11 13:17:12] DEBUG:root:Page URLs to archive: ['https://yahoo.com']
[2020-08-11 13:17:12] DEBUG:root:Creating archive URL for https://yahoo.com
[2020-08-11 13:17:12] INFO:root:Parsing sitemaps
[2020-08-11 13:17:12] DEBUG:root:Archive URLs: {'https://web.archive.org/save/https://yahoo.com'}
[2020-08-11 13:17:13] DEBUG:root:Sleeping for 5
[2020-08-11 13:17:18] INFO:root:Calling archive url https://web.archive.org/save/https://yahoo.com
[2020-08-11 13:17:18] DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): web.archive.org:443
[2020-08-11 13:17:18] DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /save/https://yahoo.com HTTP/1.1" 301 0
[2020-08-11 13:17:30] DEBUG:urllib3.connectionpool:https://web.archive.org:443 "HEAD /save/https://www.yahoo.com/ HTTP/1.1" 200 0

You could use that output to get a rough timestamp.
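If `ts` (part of moreutils) is not available, a rough equivalent can be sketched in Python. This is an illustrative stand-in rather than part of the archiver; the names `stamp_lines` and `main` are hypothetical, and the timestamp format mirrors the one passed to `ts` above.

```python
import sys
from datetime import datetime


def stamp_lines(lines, clock=datetime.now, fmt="[%Y-%m-%d %H:%M:%S]"):
    """Yield each input line prefixed with the time at which it was read.

    `clock` is injectable so the function can be tested with a fixed time.
    """
    for line in lines:
        text = line.rstrip("\n")
        yield f"{clock().strftime(fmt)} {text}"


def main():
    # Read the archiver's combined output from stdin and stamp it line by line.
    for stamped in stamp_lines(sys.stdin):
        print(stamped, flush=True)
```

Saved as a script that calls `main()` at the bottom, it could take the place of `ts` at the end of the pipeline shown above.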

test2a

comment created time in 2 months

push eventagude/wayback-machine-archiver

Alexander Gude

commit sha 8d4662583b7c0daf46812b02d878baa4215ab7e2

Bump version

view details

push time in 2 months

created tagagude/wayback-machine-archiver

tag1.7.3

A Python script to submit web pages to the Wayback Machine for archiving.

created time in 2 months

push eventagude/wayback-machine-archiver

Alexander Gude

commit sha 96ba31aa32ded7565bc936adb8eae30df4764d3f

Fix Python2 incompatible type hinting

view details

push time in 2 months

CommitCommentEvent

push eventagude/wayback-machine-archiver

Alexander Gude

commit sha 89e343fad4a240ad70edfa03a8d72df494c1a48e

Update test slightly

view details

push time in 2 months

push eventagude/wayback-machine-archiver

Alexander Gude

commit sha 369940e157f18495fb4621001fbbdbe472c42532

Fix bug with local sitemaps and archiving them. This fixes #16.

view details

Alexander Gude

commit sha e6c980aed5f227094264023ad3bba2d9d3a86187

Bump version

view details

push time in 2 months

issue closedagude/wayback-machine-archiver

Failure of `--archive-sitemap-also` when using local sitemaps

Local sitemaps (prefixed with file://) are not handled correctly when also using --archive-sitemap-also.

Remote sitemaps should be separated out and only they should be backed up.

This bug is in 1.7.0 and 1.7.1.
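The separation described in this issue could look roughly like the sketch below. It is not the code from the fix: the function names are hypothetical, and the only details taken from this page are the `file://` prefix and the `https://web.archive.org/save/` URL form that appears in the debug log.

```python
LOCAL_PREFIX = "file://"
ARCHIVE_PREFIX = "https://web.archive.org/save/"


def split_sitemaps(sitemaps):
    """Partition sitemap URLs into (local, remote) lists, preserving order."""
    local = [s for s in sitemaps if s.startswith(LOCAL_PREFIX)]
    remote = [s for s in sitemaps if not s.startswith(LOCAL_PREFIX)]
    return local, remote


def sitemap_archive_urls(sitemaps):
    """Build archive URLs for remote sitemaps only.

    Local files cannot be fetched by the Wayback Machine, so they are skipped.
    """
    _, remote = split_sitemaps(sitemaps)
    return {ARCHIVE_PREFIX + s for s in remote}


sitemaps = ["file:///tmp/sitemap.xml", "https://example.com/sitemap.xml"]
print(sitemap_archive_urls(sitemaps))
```

Both kinds of sitemap would still be parsed for page URLs; only the step that archives the sitemap file itself is restricted to the remote ones.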

closed time in 2 months

agude

created tagagude/wayback-machine-archiver

tag1.7.2

A Python script to submit web pages to the Wayback Machine for archiving.

created time in 2 months

created tagagude/wayback-machine-archiver

tag1.7.1

A Python script to submit web pages to the Wayback Machine for archiving.

created time in 2 months

push eventagude/wayback-machine-archiver

Alexander Gude

commit sha cffa75d171afb9155f84e78463009304272c3a9b

Update docs on local sitemaps and file bug

view details

push time in 2 months
