  1. 09 Jul, 2020 2 commits
  2. 05 Jul, 2020 1 commit
  3. 12 Jun, 2020 2 commits
  4. 06 Jun, 2020 1 commit
  5. 25 May, 2020 3 commits
  6. 21 May, 2020 2 commits
  7. 07 May, 2020 2 commits
  8. 06 May, 2020 1 commit
  9. 30 Apr, 2020 4 commits
  10. 29 Apr, 2020 1 commit
  11. 28 Apr, 2020 2 commits
  12. 27 Apr, 2020 4 commits
  13. 23 Apr, 2020 1 commit
  14. 22 Apr, 2020 2 commits
  15. 21 Apr, 2020 3 commits
  16. 20 Apr, 2020 2 commits
    • Dan Staples authored · df36a7b7
      Refactored the spider to use the new AsyncSessionPool class. Removed obsolete config options and module imports. Added a workaround for a bug where a saved S3 object would occasionally fail to return its version_id property, which seemed to be a timing issue.
    • Dan Staples authored · 7c57b5e8
      I noticed that when running the spider, there were usually only 1-2 open connections to MJCS, with occasional brief bursts of up to 10 when a node spawned child tasks. To make better use of concurrency, I reworked the spider to run 10 root nodes simultaneously, each of which spawns its child tasks in its own trio nursery. Although a single root node now takes longer to complete, the spider is almost always at 10 open connections to MJCS, so the overall run is shorter.
      Other changes:
      - Removed the synchronous Session class, along with the requests dependency.
      - Moved some of the HTTP request error handling and state logic into the AsyncSession class.
      - Updated the asks and trio dependencies; replaced the session_pool implementation, which used the obsolete trio.Queue, with the AsyncSessionPool class, which uses trio’s new memory channels feature.
  17. 17 Apr, 2020 5 commits
  18. 14 Apr, 2020 2 commits