- Timestamp: Jan 17, 2017, 11:59:13 PM
- Author: xsuchom2
- Comment: SpiderLing crawler documentation and source
v3 | v4 | |
34 | 34 | - Better performance (more pages downloaded per second) and lower resource usage (approx. 25 % less memory consumed), achieved by spreading domains more evenly in the crawling queue, switching from Python to !PyPy (which JIT-compiles frequently executed code at run time instead of only interpreting it), rewriting the chunked HTTP response and URL handling methods, and generally improving the code. |
35 | 35 | |
| 36 | == !SpiderLing crawler documentation and source == |
| 37 | http://corpus.tools/wiki/SpiderLing |
| 38 | |
36 | 39 | == References == |
37 | 40 | - [1] Suchomel, Vít, and Jan Pomikálek. "Efficient web crawling for large text corpora." In Proceedings of the seventh Web as Corpus Workshop (WAC7), pp. 39-43. 2012. |