torequests package¶

torequests.main¶

class torequests.main.Pool(n=None, timeout=None, default_callback=None, catch_exception=True, *args, **kwargs)[source]¶

Make ThreadPoolExecutor use NewFuture instead of the original concurrent.futures.Future.

WARNING: NewFutures in Pool will not block the main thread unless NewFuture.x is accessed.

Basic Usage:

from torequests.main import Pool
import time

pool = Pool()


def use_submit(i):
    time.sleep(i)
    result = 'use_submit: %s' % i
    print(result)
    return result


@pool.async_func
def use_decorator(i):
    time.sleep(i)
    result = 'use_decorator: %s' % i
    print(result)
    return result


tasks = [pool.submit(use_submit, i) for i in (2, 1, 0)
        ] + [use_decorator(i) for i in (2, 1, 0)]
# pool.x can be omitted
pool.x
results = [i.x for i in tasks]
print(results)

# use_submit: 0
# use_decorator: 0
# use_submit: 1
# use_decorator: 1
# use_submit: 2
# use_decorator: 2
# ['use_submit: 2', 'use_submit: 1', 'use_submit: 0', 'use_decorator: 2', 'use_decorator: 1', 'use_decorator: 0']

all_tasks¶: Kept for API compatibility with dummy; it actually returns self._all_futures.

catch_exception = None¶: With catch_exception=True, exceptions are not raised; a FailureException(exception) object is returned instead.

default_callback = None¶: The default callback, used when a single task does not set its own callback.

submit(func, *args, **kwargs)[source]¶: Submit a function to the pool, self.submit(function,arg1,arg2,arg3=3)

class torequests.main.ProcessPool(n=None, timeout=None, default_callback=None, catch_exception=True, *args, **kwargs)[source]¶

Simple ProcessPool wrapping ProcessPoolExecutor.

from torequests.main import ProcessPool
import time

pool = ProcessPool()


def use_submit(i):
    time.sleep(i)
    result = 'use_submit: %s' % i
    print(result)
    return result


def main():
    tasks = [pool.submit(use_submit, i) for i in (2, 1, 0)]
    # pool.x can be omitted
    pool.x
    results = [i.x for i in tasks]
    print(results)


if __name__ == '__main__':
    main()

# ['use_submit: 2', 'use_submit: 1', 'use_submit: 0']
# use_submit: 0
# use_submit: 1
# use_submit: 2

async_func(*args)[source]¶: Decorator mode is not supported for ProcessPool because it raises _pickle.PicklingError.

submit(func, *args, **kwargs)[source]¶: Submit a function to the pool, self.submit(function,arg1,arg2,arg3=3)

class torequests.main.NewFuture(timeout=None, args=None, kwargs=None, callback=None, catch_exception=True)[source]¶

Add .x attribute and timeout args for original Future class

WARNING: The Future's thread will not stop running until the function finishes or the process is killed.

Attr cx:	blocks until the task finishes and returns the callback_result.
Attr x:	blocks until the task finishes and returns the value the task returned.
Attr task_start_time:	timestamp when the task started.
Attr task_end_time:	timestamp when the task ended.
Attr task_cost_time:	seconds the task took.
Parameters:	catch_exception – True will catch all exceptions and return as `FailureException`

callback_result¶: Block the main thread until the future finishes, then return future.callback_result.

cx¶: Block the main thread until the future finishes, then return future.callback_result.

x¶: Block the main thread until the future finishes, then return future.result().
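The blocking .x attribute and the catch_exception behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the NewFuture implementation: the Handle class and slow_square function are hypothetical names, and the caught exception object itself stands in for FailureException.

```python
from concurrent.futures import ThreadPoolExecutor
import time


class Handle:
    """Hypothetical wrapper that mimics a blocking .x attribute."""

    def __init__(self, future, catch_exception=True):
        self._future = future
        self._catch_exception = catch_exception

    @property
    def x(self):
        # Block the calling thread until the task finishes.
        try:
            return self._future.result()
        except Exception as error:
            if self._catch_exception:
                # Return the error instead of raising it
                # (a stand-in for FailureException).
                return error
            raise


def slow_square(i):
    time.sleep(0.01 * i)
    return i * i


with ThreadPoolExecutor(max_workers=3) as executor:
    handles = [Handle(executor.submit(slow_square, i)) for i in (2, 1, 0)]
    failed = Handle(executor.submit(lambda: 1 / 0))
    results = [h.x for h in handles]  # order of submission, not completion
    error = failed.x

print(results)
print(type(error).__name__)
```

Nothing blocks until .x is read, which is why the WARNING above says the main thread is not held up by the futures on their own.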

torequests.main.Async(f, n=None, timeout=None)[source]¶

Concise usage for pool.submit.

Basic Usage: Async & threads

from torequests.main import Async, threads
import time


def use_submit(i):
    time.sleep(i)
    result = 'use_submit: %s' % i
    print(result)
    return result


@threads()
def use_decorator(i):
    time.sleep(i)
    result = 'use_decorator: %s' % i
    print(result)
    return result


new_use_submit = Async(use_submit)
tasks = [new_use_submit(i) for i in (2, 1, 0)
        ] + [use_decorator(i) for i in (2, 1, 0)]
print([type(i) for i in tasks])
results = [i.x for i in tasks]
print(results)

# use_submit: 0
# use_decorator: 0
# [<class 'torequests.main.NewFuture'>, <class 'torequests.main.NewFuture'>, <class 'torequests.main.NewFuture'>, <class 'torequests.main.NewFuture'>, <class 'torequests.main.NewFuture'>, <class 'torequests.main.NewFuture'>]
# use_submit: 1
# use_decorator: 1
# use_submit: 2
# use_decorator: 2
# ['use_submit: 2', 'use_submit: 1', 'use_submit: 0', 'use_decorator: 2', 'use_decorator: 1', 'use_decorator: 0']

torequests.main.threads(n=None, timeout=None)[source]¶: Decorator usage like Async.

torequests.main.get_results_generator(future_list, timeout=None, sort_by_completed=False)[source]¶: Return a generator over the tasks in future_list, ordered by completion sequence when sort_by_completed is set.
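The difference between submission order and completion order can be sketched with concurrent.futures.as_completed. This is only an analogy built on the standard library, assuming that sort_by_completed selects a completion-ordered iteration; the work function and the delays are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time


def work(delay):
    time.sleep(delay)
    return delay


with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(work, d) for d in (0.2, 0.1, 0.0)]
    # Yields each future as soon as it finishes, so fast tasks come first.
    by_completion = [f.result() for f in as_completed(futures)]
    # Iterating the original list keeps the order of submission.
    by_submission = [f.result() for f in futures]

print(by_completion)
print(by_submission)
```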

torequests.main.run_after_async(seconds, func, *args, **kwargs)[source]¶: Run the function after seconds asynchronously.

class torequests.main.tPool(n=None, interval=0, timeout=None, session=None, catch_exception=True, default_callback=None)[source]¶

Async wrapper for requests.

Parameters:

n – thread pool size for concurrent limit.
interval – time.sleep(interval) after each task finished.
timeout – timeout for each task.result(timeout), but it will not shut down the underlying function.
session – an available requests.Session instance, supplied individually if necessary.
catch_exception – True will catch all exceptions and return as FailureException
default_callback – default callback for tasks that do not set the callback param.

Usage:

from torequests.main import tPool
from torequests.logs import print_info

trequests = tPool(2, 1)
test_url = 'http://p.3.cn'
ss = [
    trequests.get(
        test_url,
        retry=2,
        callback=lambda x: (len(x.content), print_info(len(x.content))))
    for i in range(3)
]
# or [i.x for i in ss]
trequests.x
ss = [i.cx for i in ss]
print_info(ss)

# [2020-02-11 11:36:33] temp_code.py(10): 612
# [2020-02-11 11:36:33] temp_code.py(10): 612
# [2020-02-11 11:36:34] temp_code.py(10): 612
# [2020-02-11 11:36:34] temp_code.py(16): [(612, None), (612, None), (612, None)]

all_tasks¶: Return self.pool._all_futures

close(wait=False)[source]¶: Close session, shutdown pool.

delete(url, callback=None, retry=0, **kwargs)[source]¶: Similar to requests.delete, but return as NewFuture.

get(url, params=None, callback=None, retry=0, **kwargs)[source]¶: Similar to requests.get, but return as NewFuture.

head(url, callback=None, retry=0, **kwargs)[source]¶: Similar to requests.head, but return as NewFuture.

options(url, callback=None, retry=0, **kwargs)[source]¶: Similar to requests.options, but return as NewFuture.

patch(url, callback=None, retry=0, **kwargs)[source]¶: Similar to requests.patch, but return as NewFuture.

post(url, data=None, json=None, callback=None, retry=0, **kwargs)[source]¶: Similar to requests.post, but return as NewFuture.

put(url, data=None, callback=None, retry=0, **kwargs)[source]¶: Similar to requests.put, but return as NewFuture.

request(method, url, callback=None, retry=0, **kwargs)[source]¶: Similar to requests.request, but return as NewFuture.

x¶: Return self.pool.x

torequests.main.get(url, params=None, callback=None, retry=0, **kwargs)[source]¶

torequests.main.post(url, data=None, json=None, callback=None, retry=0, **kwargs)[source]¶

torequests.main.options(url, callback=None, retry=0, **kwargs)[source]¶

torequests.main.delete(url, callback=None, retry=0, **kwargs)[source]¶

torequests.main.put(url, data=None, callback=None, retry=0, **kwargs)[source]¶

torequests.main.head(url, callback=None, retry=0, **kwargs)[source]¶

torequests.main.patch(url, callback=None, retry=0, **kwargs)[source]¶

torequests.main.request(method, url, callback=None, retry=0, **kwargs)[source]¶

torequests.main.disable_warnings(category=<class 'urllib3.exceptions.HTTPWarning'>)[source]¶: Helper for quickly disabling all urllib3 warnings.

torequests.dummy¶

torequests.utils¶

torequests.utils.parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None)[source]¶

Parse a query given as a string argument.

Arguments:

qs: percent-encoded query string to be parsed

keep_blank_values: flag indicating whether blank values in percent-encoded queries should be treated as blank strings. A true value indicates that blanks should be retained as blank strings. The default false value indicates that blank values are to be ignored and treated as if they were not included.
strict_parsing: flag indicating what to do with parsing errors. If false (the default), errors are silently ignored. If true, errors raise a ValueError exception.
encoding and errors: specify how to decode percent-encoded sequences into Unicode characters, as accepted by the bytes.decode() method.
max_num_fields: int. If set, then throws a ValueError if there are more than n fields read by parse_qsl().

Returns a dictionary.

torequests.utils.parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None)[source]¶

Parse a query given as a string argument.

Arguments:

qs: percent-encoded query string to be parsed

keep_blank_values: flag indicating whether blank values in percent-encoded queries should be treated as blank strings. A true value indicates that blanks should be retained as blank strings. The default false value indicates that blank values are to be ignored and treated as if they were not included.
strict_parsing: flag indicating what to do with parsing errors. If false (the default), errors are silently ignored. If true, errors raise a ValueError exception.
encoding and errors: specify how to decode percent-encoded sequences into Unicode characters, as accepted by the bytes.decode() method.
max_num_fields: int. If set, then throws a ValueError if there are more than n fields read by parse_qsl().

Returns a list, as G-d intended.
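The signatures of parse_qs and parse_qsl above match the standard library's urllib.parse functions, so the sketch below calls those directly; that torequests.utils simply re-exports them is an assumption. It shows the dict versus list return shapes and the effect of keep_blank_values.

```python
from urllib.parse import parse_qs, parse_qsl

query = "a=1&b=2&a=3&empty="

# parse_qs groups repeated keys into lists and drops blank values by default.
as_dict = parse_qs(query)
print(as_dict)  # {'a': ['1', '3'], 'b': ['2']}

# parse_qsl keeps the original order as (name, value) pairs.
as_pairs = parse_qsl(query)
print(as_pairs)  # [('a', '1'), ('b', '2'), ('a', '3')]

# keep_blank_values=True retains the key whose value is empty.
with_blanks = parse_qs(query, keep_blank_values=True)
print(with_blanks)  # {'a': ['1', '3'], 'b': ['2'], 'empty': ['']}
```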

torequests.utils.urlparse(url, scheme='', allow_fragments=True)[source]¶: Parse a URL into 6 components: <scheme>://<netloc>/<path>;<params>?<query>#<fragment> Return a 6-tuple: (scheme, netloc, path, params, query, fragment). Note that we don’t break the components up in smaller bits (e.g. netloc is a single string) and we don’t expand % escapes.

torequests.utils.quote('abc def') → 'abc%20def'[source]¶

Each part of a URL, e.g. the path info, the query, etc., has a different set of reserved characters that must be quoted.

RFC 3986 Uniform Resource Identifier (URI): Generic Syntax lists the following (un)reserved characters.

unreserved = ALPHA | DIGIT | “-” | “.” | “_” | “~”
reserved = gen-delims | sub-delims
gen-delims = “:” | “/” | “?” | “#” | “[” | “]” | “@”
sub-delims = “!” | “$” | “&” | “'” | “(” | “)” | “*” | “+” | “,” | “;” | “=”

Each of the reserved characters is reserved in some component of a URL, but not necessarily in all of them.

Python 3.7 updates from using RFC 2396 to RFC 3986 to quote URL strings. Now, “~” is included in the set of unreserved characters, so quote() leaves it unescaped.

By default, the quote function is intended for quoting the path section of a URL. Thus, it will not encode ‘/’. This character is reserved, but in typical usage the quote function is being called on a path where the existing slash characters are used as reserved characters.

string and safe may be either str or bytes objects. encoding and errors must not be specified if string is a bytes object.

The optional encoding and errors parameters specify how to deal with non-ASCII characters, as accepted by the str.encode method. By default, encoding=’utf-8’ (characters are encoded with UTF-8), and errors=’strict’ (unsupported characters raise a UnicodeEncodeError).

torequests.utils.quote_plus(string, safe='', encoding=None, errors=None)[source]¶: Like quote(), but also replace ‘ ‘ with ‘+’, as required for quoting HTML form values. Plus signs in the original string are escaped unless they are included in safe. It also does not have safe default to ‘/’.

torequests.utils.unquote(string, encoding='utf-8', errors='replace')[source]¶

Replace %xx escapes by their single-character equivalent. The optional encoding and errors parameters specify how to decode percent-encoded sequences into Unicode characters, as accepted by the bytes.decode() method. By default, percent-encoded sequences are decoded with UTF-8, and invalid sequences are replaced by a placeholder character.

unquote(‘abc%20def’) -> ‘abc def’.

torequests.utils.unquote_plus(string, encoding='utf-8', errors='replace')[source]¶

Like unquote(), but also replace plus signs by spaces, as required for unquoting HTML form values.

unquote_plus(‘%7e/abc+def’) -> ‘~/abc def’
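The quoting rules above can be checked with the standard library's urllib.parse, whose functions these signatures match (assuming torequests.utils re-exports them, and Python 3.7 or later for the “~” behaviour).

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# quote() targets URL paths: "/" is kept, spaces become %20, "~" is untouched.
quoted_path = quote("/a b/~c")
print(quoted_path)  # /a%20b/~c

# quote_plus() targets form values: spaces become "+", and "+" and "/" are escaped.
quoted_form = quote_plus("a b+c/d")
print(quoted_form)  # a+b%2Bc%2Fd

# The unquote functions reverse the process.
plain = unquote("abc%20def")
plain_form = unquote_plus("%7e/abc+def")
print(plain)       # abc def
print(plain_form)  # ~/abc def
```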

torequests.utils.urljoin(base, url, allow_fragments=True)[source]¶: Join a base URL and a possibly relative URL to form an absolute interpretation of the latter.

torequests.utils.urlsplit(url, scheme='', allow_fragments=True)[source]¶: Parse a URL into 5 components: <scheme>://<netloc>/<path>?<query>#<fragment> Return a 5-tuple: (scheme, netloc, path, query, fragment). Note that we don’t break the components up in smaller bits (e.g. netloc is a single string) and we don’t expand % escapes.

torequests.utils.urlunparse(components)[source]¶: Put a parsed URL back together again. This may result in a slightly different, but equivalent URL, if the URL that was parsed originally had redundant delimiters, e.g. a ? with an empty query (the draft states that these are equivalent).
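A short sketch of how the 6-tuple and 5-tuple forms differ, again using the standard library's urllib.parse directly (assumed to be what torequests.utils exposes). The example URL is made up for illustration.

```python
from urllib.parse import urljoin, urlparse, urlsplit, urlunparse

url = "http://example.com/path;type=a?x=1#top"

# urlparse() separates ";params" from the path (6 components).
parsed = urlparse(url)
print(parsed.path, parsed.params)  # /path type=a

# urlsplit() leaves the params inside the path (5 components).
split = urlsplit(url)
print(split.path)  # /path;type=a

# urlunparse() puts the components back together.
rebuilt = urlunparse(parsed)

# urljoin() resolves a relative reference against a base URL.
joined = urljoin("http://example.com/a/b", "../c")
print(joined)  # http://example.com/c
```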

torequests.utils.escape(s, quote=True)[source]¶: Replace special characters “&”, “<” and “>” to HTML-safe sequences. If the optional flag quote is true (the default), the quotation mark characters, both double quote (“) and single quote (‘) characters are also translated.

torequests.utils.unescape(s)[source]¶: Convert all named and numeric character references (e.g. &gt;, &#62;, &x3e;) in the string s to the corresponding unicode characters. This function uses the rules defined by the HTML 5 standard for both valid and invalid character references, and the list of HTML 5 named character references defined in html.entities.html5.
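The escape and unescape descriptions match the standard library's html module, so this sketch uses html.escape and html.unescape directly (assuming torequests.utils re-exports them). The sample strings are illustrative.

```python
from html import escape, unescape

# With quote=True (the default), both quote characters are translated too.
escaped = escape('<a href="x">Tom & Jerry</a>')
print(escaped)  # &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;

# With quote=False only &, < and > are replaced.
unquoted = escape("it's <b>", quote=False)
print(unquoted)  # it's &lt;b&gt;

# Named, decimal and hexadecimal references all decode to the same character.
decoded = unescape("&gt; &#62; &#x3e;")
print(decoded)  # > > >
```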

torequests.utils.simple_cmd()[source]¶: Deprecated: not better than fire; use fire instead (pip install fire).

torequests.utils.print_mem(unit=None, callback=<function print_info>, rounded=2)[source]¶

Show the memory cost of the current process with psutil; a convenience helper for the lazy.

Parameters:	unit – B, KB, MB, GB.

torequests.utils.curlparse(string, encoding='utf-8')[source]¶

Translate a curl string into a dict of request arguments. File uploads that contain @file_path are not supported.

param string:	standard curl-string, like r'''curl …'''.
param encoding:	encoding for post-data encoding.

Basic Usage:

>>> from torequests.utils import curlparse
>>> curl_string = '''curl 'https://p.3.cn?skuIds=1&nonsense=1&nonce=0' -H 'Pragma: no-cache' -H 'DNT: 1' -H 'Accept-Encoding: gzip, deflate' -H 'Accept-Language: zh-CN,zh;q=0.9' -H 'Upgrade-Insecure-Requests: 1' -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8' -H 'Cache-Control: no-cache' -H 'Referer: https://p.3.cn?skuIds=1&nonsense=1&nonce=0' -H 'Cookie: ASPSESSIONIDSQRRSADB=MLHDPOPCAMBDGPFGBEEJKLAF' -H 'Connection: keep-alive' --compressed'''
>>> request_args = curlparse(curl_string)
>>> request_args
{'url': 'https://p.3.cn?skuIds=1&nonsense=1&nonce=0', 'headers': {'Pragma': 'no-cache', 'Dnt': '1', 'Accept-Encoding': 'gzip, deflate', 'Accept-Language': 'zh-CN,zh;q=0.9', 'Upgrade-Insecure-Requests': '1', 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36', 'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8', 'Cache-Control': 'no-cache', 'Referer': 'https://p.3.cn?skuIds=1&nonsense=1&nonce=0', 'Cookie': 'ASPSESSIONIDSQRRSADB=MLHDPOPCAMBDGPFGBEEJKLAF', 'Connection': 'keep-alive'}, 'method': 'get'}
>>> import requests
>>> requests.request(**request_args)
<Response [200]>

class torequests.utils.Null(*args, **kwargs)[source]¶: A Null instance returns itself when called, and it is always False.

torequests.utils.null¶: A Null instance returns itself when called, and it is always False.

torequests.utils.itertools_chain(*iterables)[source]¶: Compensates for a shortcoming of Python 2; on Python 3, use from itertools import chain.

torequests.utils.slice_into_pieces(seq, n)[source]¶: Slice a sequence into n pieces, return a generator of n pieces.

torequests.utils.slice_by_size(seq, size)[source]¶: Slice a sequence into chunks, return a generator of chunks of the given size.
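The two slicing helpers can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the library code: the names chunks_by_size and chunks_into_pieces are hypothetical, chunks are returned as tuples, and the piece size is assumed to be the length divided by n, rounded up.

```python
from itertools import islice
import math


def chunks_by_size(seq, size):
    """Yield consecutive chunks holding at most `size` items each."""
    iterator = iter(seq)
    while True:
        chunk = tuple(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def chunks_into_pieces(seq, n):
    """Yield roughly n pieces by choosing size = ceil(len(seq) / n)."""
    size = math.ceil(len(seq) / n)
    yield from chunks_by_size(seq, size)


by_size = list(chunks_by_size(range(7), 3))
print(by_size)  # [(0, 1, 2), (3, 4, 5), (6,)]

pieces = list(chunks_into_pieces(list(range(10)), 3))
print(pieces)  # [(0, 1, 2, 3), (4, 5, 6, 7), (8, 9)]
```

With this rounding rule the last piece may be shorter, and some inputs yield fewer than n pieces (for example 5 items with n=4 gives 3 pieces of size 2, 2 and 1).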

torequests.utils.ttime(timestamp=None, tzone=None, fail='', fmt='%Y-%m-%d %H:%M:%S')[source]¶

Translate timestamp into human-readable: %Y-%m-%d %H:%M:%S.

Parameters:

timestamp – the timestamp float, or time.time() by default.
tzone – time compensation, int(-time.timezone / 3600) by default (can be set with Config.TIMEZONE).
fail – value to return if an exception is raised.
fmt – %Y-%m-%d %H:%M:%S; %z does not work.

Return type:	str

>>> ttime()
2018-03-15 01:24:35
>>> ttime(1486572818.421858323)
2017-02-09 00:53:38

torequests.utils.ptime(timestr=None, tzone=None, fail=0, fmt='%Y-%m-%d %H:%M:%S')[source]¶

Translate %Y-%m-%d %H:%M:%S into timestamp.

Parameters:

timestr – string like 2018-03-15 01:27:56, or time.time() if not set.
tzone – time compensation, int(-time.timezone / 3600) by default, (can be set with Config.TIMEZONE).
fail – value to return if an exception is raised.
fmt – %Y-%m-%d %H:%M:%S; %z does not work.

Return type:

int

>>> ptime('2018-03-15 01:27:56')
1521048476
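The tzone compensation used by ttime and ptime can be sketched with the time and calendar modules. The function names format_timestamp and parse_timestr are hypothetical, and tzone is assumed to be a whole-hour offset from UTC; the sample values above are consistent with tzone=8.

```python
import calendar
import time

FMT = "%Y-%m-%d %H:%M:%S"


def format_timestamp(timestamp, tzone=0, fmt=FMT):
    """Format a Unix timestamp as local time at UTC+tzone hours."""
    return time.strftime(fmt, time.gmtime(timestamp + tzone * 3600))


def parse_timestr(timestr, tzone=0, fmt=FMT):
    """Parse a local time string at UTC+tzone hours into a Unix timestamp."""
    return calendar.timegm(time.strptime(timestr, fmt)) - tzone * 3600


text = format_timestamp(1521048476, tzone=8)
print(text)  # 2018-03-15 01:27:56

stamp = parse_timestr("2018-03-15 01:27:56", tzone=8)
print(stamp)  # 1521048476
```

Working from UTC and shifting by a fixed number of hours keeps the result independent of the machine's local timezone setting.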

torequests.utils.split_seconds(seconds)[source]¶

Split seconds into [day, hour, minute, second, ms]

divisor: 1, 24, 60, 60, 1000

units: day, hour, minute, second, ms

>>> split_seconds(6666666)
[77, 3, 51, 6, 0]
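The divisor chain above (1000 ms per second, 60 seconds, 60 minutes, 24 hours) can be sketched with repeated divmod calls. split_duration is a hypothetical name, and the sketch assumes a non-negative input.

```python
def split_duration(seconds):
    """Split non-negative seconds into [day, hour, minute, second, ms]."""
    total_ms = int(round(seconds * 1000))
    rest, ms = divmod(total_ms, 1000)   # milliseconds left over
    rest, second = divmod(rest, 60)     # seconds within the minute
    rest, minute = divmod(rest, 60)     # minutes within the hour
    day, hour = divmod(rest, 24)        # hours within the day
    return [day, hour, minute, second, ms]


parts = split_duration(6666666)
print(parts)  # [77, 3, 51, 6, 0]

long_parts = split_duration(93245732.0032424)
print(long_parts)  # [1079, 5, 35, 32, 3]
```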

torequests.utils.timeago(seconds=0, accuracy=4, format=0, lang='en', short_name=False)[source]¶

Translate seconds into human-readable.

param seconds: seconds (float/int).

param accuracy: 4 by default (units[:accuracy]), determine the length of elements.

param format: index of [led, literal, dict].

param lang: en or cn.

param units: day, hour, minute, second, ms.

>>> timeago(93245732.0032424, 5)
'1079 days, 05:35:32,003'
>>> timeago(93245732.0032424, 4, 1)
'1079 days 5 hours 35 minutes 32 seconds'
>>> timeago(-389, 4, 1)
'-6 minutes 29 seconds 0 ms'

torequests.utils.timepass(seconds=0, accuracy=4, format=0, lang='en', short_name=False)¶

Translate seconds into human-readable.

param seconds: seconds (float/int).

param accuracy: 4 by default (units[:accuracy]), determine the length of elements.

param format: index of [led, literal, dict].

param lang: en or cn.

param units: day, hour, minute, second, ms.

>>> timeago(93245732.0032424, 5)
'1079 days, 05:35:32,003'
>>> timeago(93245732.0032424, 4, 1)
'1079 days 5 hours 35 minutes 32 seconds'
>>> timeago(-389, 4, 1)
'-6 minutes 29 seconds 0 ms'

torequests.utils.md5(string, n=32, encoding='utf-8', skip_encode=False)[source]¶

str(obj) -> md5_string

Parameters:	string – string to operate. n – md5_str length.

>>> from torequests.utils import md5
>>> md5(1, 10)
'923820dcc5'
>>> md5('test')
'098f6bcd4621d373cade4e832627b4f6'
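A sketch of the str(obj) -> md5_string idea with hashlib. The name short_md5 is hypothetical, and the rule for n < 32 (take the middle n characters of the 32-character digest) is an assumption inferred from the sample output above rather than taken from the library source.

```python
import hashlib


def short_md5(value, n=32, encoding="utf-8"):
    """Return n hex characters from the middle of md5(str(value))."""
    digest = hashlib.md5(str(value).encode(encoding)).hexdigest()
    start = (32 - n) // 2  # assumed: the slice is centered in the digest
    return digest[start:start + n]


full = short_md5("test")
print(full)  # 098f6bcd4621d373cade4e832627b4f6

short = short_md5(1, 10)
print(short)  # 923820dcc5
```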

class torequests.utils.Counts(start=0, step=1)[source]¶

Counter for counting the number of times it has been called.

>>> from torequests.utils import Counts
>>> cc = Counts()
>>> cc.x
1
>>> cc.x
2
>>> cc.now
2
>>> cc.current
2
>>> cc.sub()
1

add(num=None)[source]¶

c¶

clear()[source]¶

current¶

now¶

s¶

start¶

step¶

sub(num=None)[source]¶

total¶

x¶

torequests.utils.unique(seq, key=None, return_as=None)[source]¶

Unique the seq and keep the order.

Instead of the slow way: lambda seq: (x for index, x in enumerate(seq) if seq.index(x)==index)

Parameters:	seq – raw sequence. return_as – generator by default, or list / set / str…

>>> from torequests.utils import unique
>>> a = [1,2,3,4,2,3,4]
>>> unique(a)
<generator object unique.<locals>.<genexpr> at 0x05720EA0>
>>> unique(a, return_as=str)
'1234'
>>> unique(a, return_as=list)
[1, 2, 3, 4]
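Order-preserving de-duplication is usually done with a set of items already seen, which avoids the quadratic seq.index lookup mentioned above. The sketch below is a generic illustration; unique_in_order is a hypothetical name and the handling of key is an assumption.

```python
def unique_in_order(seq, key=None):
    """Yield items of seq in first-seen order, skipping repeats."""
    seen = set()
    for item in seq:
        marker = key(item) if key is not None else item
        if marker not in seen:
            seen.add(marker)
            yield item


numbers = list(unique_in_order([1, 2, 3, 4, 2, 3, 4]))
print(numbers)  # [1, 2, 3, 4]

# With a key, items that map to the same key count as duplicates.
words = list(unique_in_order(["a", "A", "b", "B"], key=str.lower))
print(words)  # ['a', 'b']
```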

torequests.utils.unparse_qs(qs, sort=False, reverse=False)[source]¶: Reverse conversion for parse_qs

torequests.utils.unparse_qsl(qsl, sort=False, reverse=False)[source]¶: Reverse conversion for parse_qsl

class torequests.utils.Regex(ensure_mapping=False)[source]¶

Register some objects(like functions) to the regular expression.

>>> from torequests.utils import Regex, re
>>> reg = Regex()
>>> @reg.register_function('http.*cctv.*')
... def mock():
...     pass
...
>>> reg.register('http.*HELLOWORLD', 'helloworld', instances='http://helloworld', flags=re.I)
>>> reg.register('http.*HELLOWORLD2', 'helloworld2', flags=re.I)
>>> reg.find('http://cctv.com')
[<function mock at 0x031FC5D0>]
>>> reg.match('http://helloworld')
['helloworld']
>>> reg.match('non-http://helloworld')
[]
>>> reg.search('non-http://helloworld')
['helloworld']
>>> len(reg.search('non-http://helloworld2'))
2
>>> print(reg.show_all())
('http.*cctv.*') =>  => <class 'function'> mock ""
('http.*HELLOWORLD', re.IGNORECASE) => http://helloworld => <class 'str'> helloworld
('http.*HELLOWORLD2', re.IGNORECASE) =>  => <class 'str'> helloworld2

find(string, default=None)[source]¶

Return match or search result.

Return type:	list

match(string, default=None)[source]¶

Use re.match to find the result

Return type:	list

register(patterns, obj=None, instances=None, **reg_kwargs)[source]¶

Register one object which can be matched/searched by regex.

Parameters:

patterns – a list/tuple/set of regex patterns.
obj – the object returned when search/match succeeds.
instances – instance list that will search/match the patterns.
reg_kwargs – kwargs for re.compile.

register_function(patterns, instances=None, **reg_kwargs)[source]¶: Decorator for register.

search(string, default=None)[source]¶

Use re.search to find the result

Return type:	list

show_all(as_string=True)[source]¶: Show all registered patterns with their instances and objects; python2 will not show flags.
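The idea of registering objects against regular expressions, and the difference between match and search, can be sketched with a small registry. PatternRegistry is a hypothetical class written for illustration; it is not the Regex implementation and omits instances, find and show_all.

```python
import re


class PatternRegistry:
    """Hypothetical minimal registry mapping compiled patterns to objects."""

    def __init__(self):
        self._entries = []

    def register(self, pattern, obj, flags=0):
        self._entries.append((re.compile(pattern, flags), obj))

    def match(self, string):
        # re.match only succeeds at the start of the string.
        return [obj for regex, obj in self._entries if regex.match(string)]

    def search(self, string):
        # re.search succeeds anywhere in the string.
        return [obj for regex, obj in self._entries if regex.search(string)]


registry = PatternRegistry()
registry.register("http.*HELLOWORLD", "helloworld", flags=re.I)

matched = registry.match("http://helloworld")
not_matched = registry.match("non-http://helloworld")
searched = registry.search("non-http://helloworld")
print(matched, not_matched, searched)  # ['helloworld'] [] ['helloworld']
```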

torequests.utils.kill_after(seconds, timeout=2)[source]¶: Kill the current process (self) after the given number of seconds.

class torequests.utils.UA[source]¶

Some common User-Agents for crawler.

Android, iPhone, iPad, Firefox, Chrome, IE6, IE9

Android = 'Mozilla/5.0 (Linux; Android 5.1.1; Nexus 6 Build/LYZ28E) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Mobile Safari/537.36'¶

Chrome = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Safari/537.36'¶

Firefox = 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0'¶

IE6 = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)'¶

IE9 = 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0;'¶

WECHAT_ANDROID = 'Mozilla/5.0 (Linux; Android 5.0; SM-N9100 Build/LRX21V) > AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 > Chrome/37.0.0.0 Mobile Safari/537.36 > MicroMessenger/6.0.2.56_r958800.520 NetType/WIFI'¶

WECHAT_IOS = 'Mozilla/5.0 (iPhone; CPU iPhone OS 5_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Mobile/9B176 MicroMessenger/4.3.2'¶

iPad = 'Mozilla/5.0 (iPad; CPU OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1'¶

iPhone = 'Mozilla/5.0 (iPhone; CPU iPhone OS 10_3 like Mac OS X) AppleWebKit/602.1.50 (KHTML, like Gecko) CriOS/56.0.2924.75 Mobile/14E5239e Safari/602.1'¶

torequests.utils.try_import(module_name, names=None, default=<class 'torequests.exceptions.ImportErrorModule'>, warn=True)[source]¶: Try to import module_name; on ImportError, return default. Sometimes used to catch ImportError and for lazy imports.

torequests.utils.ensure_request(request)[source]¶

Used for requests.request / Requests.request with ensure_request(request).

Parameters:	request (dict) – dict or curl-string or url
Returns:	dict of request
Return type:	dict

Basic Usage:

>>> from torequests.utils import ensure_request
>>> ensure_request('''curl http://test.com''')
{'url': 'http://test.com', 'method': 'get'}
>>> ensure_request('http://test.com')
{'method': 'get', 'url': 'http://test.com'}
>>> ensure_request({'method': 'get', 'url': 'http://test.com'})
{'method': 'get', 'url': 'http://test.com'}
>>> ensure_request({'url': 'http://test.com'})
{'url': 'http://test.com', 'method': 'get'}

class torequests.utils.Timer(name=None, log_func=None, default_timer=None, rounding=None, readable=None, log_after_del=True, stack_level=1)[source]¶

Usage:

Init a Timer anywhere, such as at the head of a function or the head of a module; it will show a log after it is deleted by the gc.

param name:	be used in log or None.
param log_func:	some function to show process.
param default_timer:	use timeit.default_timer by default.
param rounding:	None, or seconds will be round(xxx, rounding)
param readable:	None, or use timepass: readable(cost_seconds) -> 00:00:01,234

Basic Usage:

from torequests.utils import Timer
import time
Timer()

@Timer.watch()
def test(a=1):
    Timer()
    time.sleep(1)

    def test_inner():
        t = Timer('test_non_del')
        time.sleep(1)
        t.x

    test_inner()

test(3)
time.sleep(1)
# [2018-03-10 02:16:48]: Timer [00:00:01]: test_non_del, start at 2018-03-10 02:16:47.
# [2018-03-10 02:16:48]: Timer [00:00:02]: test(a=3), start at 2018-03-10 02:16:46.
# [2018-03-10 02:16:48]: Timer [00:00:02]: test(3), start at 2018-03-10 02:16:46.
# [2018-03-10 02:16:49]: Timer [00:00:03]: <module>: __main__ (temp_code.py), start at 2018-03-10 02:16:46.

passed¶: Return the cost_seconds after starting up.

string¶: Only return the expect_string quietly.

tick()[source]¶: Return the time cost string as expected.

static watch(*timer_args, **timer_kwargs)[source]¶: Decorator for Timer.

x¶: Call self.log_func(self) and return expect_string.

class torequests.utils.ClipboardWatcher(interval=0.2, callback=None)[source]¶

Watch clipboard with pyperclip, run callback while changed.

current¶: Return the current clipboard content.

default_callback(text)[source]¶: By default, clean the \n in text.

read()[source]¶: Return the current clipboard content.

watch(limit=None, timeout=None)[source]¶: Blocking method to watch the clipboard for changes.

watch_async(limit=None, timeout=None)[source]¶: Non-blocking method to watch the clipboard for changes.

write(text)[source]¶: Rewrite the current clipboard content.

x¶: Return self.watch()

class torequests.utils.Saver(path=None, save_mode='json', auto_backup=False, **saver_args)[source]¶

Simple object persistence toolkit using pickle/json, suitable only if you don’t care about performance and security. Do not set keys that start with “_”.

Parameters:

path – if not set, will be ~/_saver.db. print(self._path) to show it. Set pickle’s protocol < 3 for compatibility between python2/3, but use -1 for performance and some other optimizations.
save_mode – pickle / json.

>>> from torequests.utils import Saver
>>> ss = Saver()
>>> ss._path
'/home/work/_saver.json'
>>> ss.a = 1
>>> ss['b'] = 2
>>> ss._update({'c': 3, 'd': 4})
>>> str(ss)
"{'a': 1, 'b': 2, 'c': 3, 'd': 4}"
>>> del ss.b
>>> str(ss)
"{'a': 1, 'c': 3, 'd': 4}"
>>> ss
Saver(path="/home/work/_saver.json"){'a': 1, 'c': 3, 'd': 4}

torequests.utils.guess_interval(nums, accuracy=0)[source]¶

Given a sequence of numbers, return the median of the intervals between adjacent sorted values, counting only intervals >= accuracy.

Basic Usage:

from torequests.utils import guess_interval
import random

seq = [random.randint(1, 100) for i in range(20)]
print(guess_interval(seq, 5))
# sorted_seq: [2, 10, 12, 19, 19, 29, 30, 32, 38, 40, 41, 54, 62, 69, 75, 79, 82, 88, 97, 99]
# diffs: [8, 7, 10, 6, 13, 8, 7, 6, 6, 9]
# median: 8
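The computation shown in the comments above (sort, take adjacent differences, filter by accuracy, take the median) can be sketched as follows. median_gap is a hypothetical name, and the use of statistics.median_high is an assumption chosen because it reproduces the sample result of 8 for an even number of gaps.

```python
from statistics import median_high


def median_gap(nums, accuracy=0):
    """Median of gaps between adjacent sorted values, ignoring gaps < accuracy."""
    ordered = sorted(nums)
    gaps = [b - a for a, b in zip(ordered, ordered[1:]) if b - a >= accuracy]
    # Assumed tie rule: pick the higher of the two middle values.
    return median_high(gaps) if gaps else 0


seq = [2, 10, 12, 19, 19, 29, 30, 32, 38, 40,
       41, 54, 62, 69, 75, 79, 82, 88, 97, 99]
result = median_gap(seq, 5)
print(result)  # 8
```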

torequests.utils.split_n(string, seps, reg=False)[source]¶

Split strings into n-dimensional list.

Basic Usage:

from torequests.utils import split_n

ss = '''a b c  d e f  1 2 3  4 5 6
a b c  d e f  1 2 3  4 5 6
a b c  d e f  1 2 3  4 5 6'''

print(split_n(ss, ('\n', '  ', ' ')))
# [[['a', 'b', 'c'], ['d', 'e', 'f'], ['1', '2', '3'], ['4', '5', '6']], [['a', 'b', 'c'], ['d', 'e', 'f'], ['1', '2', '3'], ['4', '5', '6']], [['a', 'b', 'c'], ['d', 'e', 'f'], ['1', '2', '3'], ['4', '5', '6']]]
print(split_n(ss, [r'\s+'], reg=1))
# ['a', 'b', 'c', 'd', 'e', 'f', '1', '2', '3', '4', '5', '6', 'a', 'b', 'c', 'd', 'e', 'f', '1', '2', '3', '4', '5', '6', 'a', 'b', 'c', 'd', 'e', 'f', '1', '2', '3', '4', '5', '6']
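The nested splitting shown above amounts to applying one separator per nesting level. The sketch below is a hypothetical reimplementation for plain string separators only (split_nested is an illustrative name; the reg option is not covered).

```python
def split_nested(string, seps):
    """Split by the first separator, then split each part by the rest."""
    if not seps:
        return string
    first, *rest = seps
    return [split_nested(part, rest) for part in string.split(first)]


text = "a b  c d\ne f  g h"
nested = split_nested(text, ("\n", "  ", " "))
print(nested)
# [[['a', 'b'], ['c', 'd']], [['e', 'f'], ['g', 'h']]]
```

The separators must be listed from the outermost level to the innermost, since each level only sees the fragments produced by the one before it.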

torequests.utils.register_re_findone()[source]¶: import re; re.findone = find_one

class torequests.utils.Cooldown(init_items=None, interval=0, born_at_now=False)[source]¶

Thread-safe Cooldown toolkit.

Parameters:

init_items – iterables to add into the default queue at first.
interval – each item will cool down for interval seconds before it is returned.
born_at_now – if set to True, item.use_at is set to time.time() instead of 0 when the item is first added to the queue.

>>> from torequests.logs import print_info
>>> from torequests.utils import Cooldown
>>> cd = Cooldown(range(1, 3), interval=2)
>>> cd.add_items([3, 4])
>>> cd.add_item(5)
>>> for _ in range(7):
...     print_info(cd.get(1, 'timeout'))
[2019-01-17 01:50:59] pyld.py(152): 1
[2019-01-17 01:50:59] pyld.py(152): 3
[2019-01-17 01:50:59] pyld.py(152): 5
[2019-01-17 01:50:59] pyld.py(152): 2
[2019-01-17 01:50:59] pyld.py(152): 4
[2019-01-17 01:51:00] pyld.py(152): timeout
[2019-01-17 01:51:01] pyld.py(152): 1
>>> cd.size
5

add_item(item)[source]¶

add_items(items)[source]¶

all_items¶

get(timeout=None, default=None)[source]¶

get_now_timestamp()[source]¶

remove_item(item)[source]¶

remove_items(items)[source]¶

size¶

torequests.utils.curlrequests(curl_string, **kwargs)[source]¶

Use tPool to send the request described by a curl string. If kwargs contains req, an object that has a request method (like req=requests), that object is used to send the request.

Parameters:	curl_string (dict) – standard curl string. kwargs – valid kwargs for tPool.

Basic Usage:

from torequests.utils import curlrequests


r = curlrequests('''curl 'http://p.3.cn/' -H 'Connection: keep-alive' -H 'Cache-Control: max-age=0' -H 'Upgrade-Insecure-Requests: 1' -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36' -H 'DNT: 1' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8' -H 'Accept-Encoding: gzip, deflate' -H 'Accept-Language: zh-CN,zh;q=0.9,en;q=0.8' -H 'If-None-Match: "55dd9090-264"' -H 'If-Modified-Since: Wed, 26 Aug 2015 10:10:24 GMT' --compressed''', retry=1)
print(r.text)

torequests.utils.sort_url_query(url, reverse=False, _replace_kwargs=None)[source]¶: sort url query args. _replace_kwargs is a dict to update attributes before sorting (such as scheme / netloc…). http://www.google.com?b=2&z=26&a=1 => http://www.google.com?a=1&b=2&z=26
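Sorting the query arguments of a URL can be sketched with urllib.parse. sort_query is a hypothetical stand-in for illustration; it ignores the _replace_kwargs option and re-encodes the values with urlencode, which may differ from the library's behaviour for unusual characters.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def sort_query(url, reverse=False):
    """Return url with its query arguments sorted by name, then value."""
    parts = urlsplit(url)
    pairs = sorted(parse_qsl(parts.query, keep_blank_values=True),
                   reverse=reverse)
    return urlunsplit(parts._replace(query=urlencode(pairs)))


sorted_url = sort_query("http://www.google.com?b=2&z=26&a=1")
print(sorted_url)  # http://www.google.com?a=1&b=2&z=26
```

Normalizing the query order like this is useful when URLs are compared or used as cache keys.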

torequests.utils.retry(tries=1, exceptions: Tuple[Type[BaseException]] = (<class 'Exception'>, ), catch_exception=False)[source]¶

torequests.utils.get_readable_size(input_num, unit=None, rounded=<object object>, format='%s %s', units=None, carry=1024)[source]¶

Show the num readable with unit.

Parameters:

input_num (float, int) – raw number
unit (str, optional) – target unit, defaults to None for auto set.
rounded (None or int, optional) – defaults to NotSet, which returns the raw float without rounding.
format (str, optional) – output string format, defaults to “%s %s”
units (list, optional) – unit list, defaults to None for computer storage units
carry (int, optional) – carry a number as in adding, defaults to 1024

Returns:	string for input_num with unit.
Return type:	str
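How units and carry interact can be sketched as a loop that divides by carry until the value fits the current unit. readable_size is a hypothetical name, and the unit list and default rounding are assumptions, not the library's defaults.

```python
def readable_size(num, units=("B", "KB", "MB", "GB", "TB"),
                  carry=1024, rounded=2):
    """Return num scaled to the largest unit that keeps it below carry."""
    value = float(num)
    for unit in units[:-1]:
        if abs(value) < carry:
            return "%s %s" % (round(value, rounded), unit)
        value /= carry  # move up to the next unit
    return "%s %s" % (round(value, rounded), units[-1])


small = readable_size(512)
medium = readable_size(1536)
large = readable_size(1024 ** 3)
print(small, "|", medium, "|", large)  # 512.0 B | 1.5 KB | 1.0 GB
```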

torequests.configs¶

class torequests.configs.Config[source]¶

Some global default configs.

TIMEZONE = 0¶

wait_futures_before_exiting = True¶

torequests.crawlers¶

torequests.exceptions¶

exception torequests.exceptions.CommonException(name)[source]¶: This exception is mainly used because bool(self) is False and it is not callable.

exception torequests.exceptions.FailureException(error, name=None)[source]¶

Use self.error to review the original exception.

text¶

exception torequests.exceptions.ImportErrorModule(name)[source]¶

torequests.logs¶

torequests.logs.init_logger(name='', handler_path_levels=None, level=20, formatter=None, formatter_str=None, datefmt='%Y-%m-%d %H:%M:%S')[source]¶

Add a default handler for logger.

Args:

name = ‘’ or logger obj.

handler_path_levels = [[‘loggerfile.log’,13],[‘’,’DEBUG’],[‘’,’info’],[‘’,’notSet’]] # [[path,level]]

level = the least level for the logger.

formatter = logging.Formatter(

‘%(levelname)-7s %(asctime)s %(name)s (%(filename)s: %(lineno)s): %(message)s’,: “%Y-%m-%d %H:%M:%S”)

formatter_str = '%(levelname)-7s %(asctime)s %(name)s (%(funcName)s: %(lineno)s): %(message)s'

Fields available for a custom formatter: %(asctime)s %(created)f %(filename)s %(funcName)s %(levelname)s %(levelno)s %(lineno)s %(message)s %(module)s %(name)s %(pathname)s %(process)s %(relativeCreated)s %(thread)s %(threadName)s
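
To see what the format string and date format above produce, the sketch below wires them into a handler using only the standard logging module. It does not call init_logger; the logger name 'sketch' and the in-memory stream are assumptions chosen so the output can be inspected.

```python
import io
import logging

# Format string and date format copied from the Args above.
formatter = logging.Formatter(
    '%(levelname)-7s %(asctime)s %(name)s (%(funcName)s: %(lineno)s): %(message)s',
    '%Y-%m-%d %H:%M:%S')

stream = io.StringIO()  # in-memory target so the output can be inspected
handler = logging.StreamHandler(stream)
handler.setLevel(logging.DEBUG)
handler.setFormatter(formatter)

logger = logging.getLogger('sketch')
logger.setLevel(logging.INFO)  # level=20 in the signature equals logging.INFO
logger.propagate = False       # keep the record out of the root logger
logger.addHandler(handler)

logger.debug('dropped: below the logger level')
logger.info('kept')
output = stream.getvalue()
print(output, end='')
```

Only the info record reaches the stream: the logger level filters first, and the handler level is applied afterwards.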

torequests.logs.print_info(*messages, **kwargs)[source]¶

Simple print that uses a logger, printing each message with time / file / line number.

param sep:	separator between messages, " " by default.

Basic Usage:

print_info(1, 2, 3)
print_info(1, 2, 3)
print_info(1, 2, 3)

# [2018-10-24 19:12:16] temp_code.py(7): 1 2 3
# [2018-10-24 19:12:16] temp_code.py(8): 1 2 3
# [2018-10-24 19:12:16] temp_code.py(9): 1 2 3

torequests.frequency_controller.sync_tools¶

class torequests.frequency_controller.sync_tools.Frequency(n=None, interval=0)[source]¶

Frequency controller: runs at most n tasks concurrently every interval seconds.

Basic Usage:

from torequests.frequency_controller.sync_tools import Frequency
from concurrent.futures import ThreadPoolExecutor
from time import time

# limit to 2 concurrent tasks each 1 second
frequency = Frequency(2, 1)

def test():
    with frequency:
        return time()

now = time()
pool = ThreadPoolExecutor()
tasks = []
for _ in range(5):
    tasks.append(pool.submit(test))
result = [task.result() for task in tasks]
assert result[0] - now < 1
assert result[1] - now < 1
assert result[2] - now > 1
assert result[3] - now > 1
assert result[4] - now > 2
assert frequency.to_dict() == {'n': 2, 'interval': 1}
assert frequency.to_list() == [2, 1]
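
One way such a limiter could work is a sliding window over the start times of the last n tasks. The sketch below is an assumption about the general technique, not the library's implementation. FrequencySketch is a hypothetical name, it only limits how often tasks may start, and it expects n >= 1.

```python
import threading
import time
from collections import deque


class FrequencySketch:
    """Allow at most n entries into the `with` block per interval seconds."""

    def __init__(self, n, interval):
        self.n = n
        self.interval = interval
        self._lock = threading.Lock()
        # Start times of the last n admitted tasks (oldest first).
        self._starts = deque(maxlen=n)

    def __enter__(self):
        with self._lock:
            if len(self._starts) == self.n:
                # Wait until the oldest of the last n starts is a full interval old.
                wait = self._starts[0] + self.interval - time.monotonic()
                if wait > 0:
                    time.sleep(wait)
            self._starts.append(time.monotonic())
        return self

    def __exit__(self, exc_type, exc, tb):
        return False  # never swallow exceptions


limiter = FrequencySketch(2, 0.1)
starts = []
for _ in range(5):
    with limiter:
        starts.append(time.monotonic())
# The fifth task starts roughly two intervals after the first.
print(round(starts[4] - starts[0], 1))
```

Holding the lock while sleeping is deliberate: waiting tasks queue up behind the one that is currently being delayed, so they are admitted in order.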

TIMER()¶

time() -> floating point number

Return the current time in seconds since the Epoch. Fractions of a second may be present if the system clock provides them.

classmethod ensure_frequency(frequency)[source]¶

Ensure the given argument is a Frequency instance, creating one from it if necessary.

Parameters:	frequency (Frequency / dict / list / tuple) – args to create a Frequency instance.
Returns:	Frequency instance
Return type:	Frequency
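
The accepted argument types suggest a simple normalization step, sketched below with a hypothetical FrequencyArgs class. How the library treats other input types is not documented here, so the TypeError branch is an assumption.

```python
class FrequencyArgs:
    """Hypothetical holder for n and interval, used only for illustration."""

    def __init__(self, n=None, interval=0):
        self.n = n
        self.interval = interval

    @classmethod
    def ensure(cls, frequency):
        if isinstance(frequency, cls):
            return frequency  # already an instance: reuse it
        if isinstance(frequency, dict):
            return cls(**frequency)  # {'n': 2, 'interval': 1}
        if isinstance(frequency, (list, tuple)):
            return cls(*frequency)  # [2, 1] or (2, 1)
        # Assumed behavior: reject anything else.
        raise TypeError('unsupported frequency: %r' % (frequency,))

    def to_dict(self):
        return {'n': self.n, 'interval': self.interval}

    def to_list(self):
        return [self.n, self.interval]


print(FrequencyArgs.ensure({'n': 2, 'interval': 1}).to_list())  # [2, 1]
print(FrequencyArgs.ensure((2, 1)).to_dict())  # {'n': 2, 'interval': 1}
```

Since to_dict() and to_list() return exactly the shapes that ensure accepts, a limiter's settings can be stored in a config file and rebuilt later.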

gen¶

generator(n=2, interval=1)[source]¶

interval¶

lock¶

n¶

repr¶

to_dict()[source]¶: Return the dict {'n': self.n, 'interval': self.interval}

to_list()[source]¶: Return the list [self.n, self.interval]

torequests.frequency_controller.async_tools¶

class torequests.frequency_controller.async_tools.AsyncFrequency(n=None, interval=0)[source]¶

AsyncFrequency controller: runs at most n tasks concurrently every interval seconds.

Basic Usage:

from torequests.frequency_controller.async_tools import AsyncFrequency
from asyncio import ensure_future, get_event_loop
from time import time


async def test_async():
    frequency = AsyncFrequency(2, 1)

    async def task():
        async with frequency:
            return time()

    now = time()
    tasks = [ensure_future(task()) for _ in range(5)]
    result = [await task for task in tasks]
    assert result[0] - now < 1
    assert result[1] - now < 1
    assert result[2] - now > 1
    assert result[3] - now > 1
    assert result[4] - now > 2
    assert frequency.to_dict() == {'n': 2, 'interval': 1}
    assert frequency.to_list() == [2, 1]

get_event_loop().run_until_complete(test_async())

TIMER()¶

time() -> floating point number

Return the current time in seconds since the Epoch. Fractions of a second may be present if the system clock provides them.

classmethod ensure_frequency(frequency)[source]¶

Ensure the given argument is an AsyncFrequency instance, creating one from it if necessary.

Parameters:	frequency (AsyncFrequency / dict / list / tuple) – args to create an AsyncFrequency instance.
Returns:	AsyncFrequency instance
Return type:	AsyncFrequency

gen¶

generator(n, interval)[source]¶

interval¶

lock¶

n¶

repr¶

to_dict()[source]¶: Return the dict {'n': self.n, 'interval': self.interval}

to_list()[source]¶: Return the list [self.n, self.interval]

param seconds:	seconds (float/int).
param accuracy:	4 by default (units[:accuracy]), determine the length of elements.
param format:	index of [led, literal, dict].
param lang:	en or cn.
param units:	day, hour, minute, second, ms.