The util module
General library of small utility functions.
-
ferenda.util.ns
A mapping of well-known prefixes and their corresponding namespaces. Includes dc, dcterms, rdfs, rdf, skos, xsd, foaf, owl, xhv, prov and bibo.
-
ferenda.util.mkdir(newdir)
Like os.makedirs(), but doesn’t raise an exception if the directory already exists.
-
ferenda.util.ensure_dir(filename)
Given a filename (typically one that you wish to create), ensures that the directory the file is in actually exists.
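A minimal sketch of both directory helpers, assuming Python 3.2+ where os.makedirs() accepts exist_ok. This is an illustration, not ferenda's own code, which may differ (for instance in how it treats a filename with no directory part).

```python
import os


def mkdir(newdir):
    """Create newdir and any missing parents; do nothing if it already exists."""
    # exist_ok=True suppresses the FileExistsError that os.makedirs()
    # would otherwise raise for an existing directory.
    os.makedirs(newdir, exist_ok=True)


def ensure_dir(filename):
    """Make sure the directory that will contain filename exists."""
    dirname = os.path.dirname(filename)
    # A bare filename like "out.txt" has no directory part, so there is
    # nothing to create in that case.
    if dirname:
        mkdir(dirname)
```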
-
ferenda.util.robust_rename(old, new)
Rename old to new no matter what: if a file named new already exists, it’s removed first, and if the target directory doesn’t exist, it’s created.
-
ferenda.util.robust_remove(filename)
Removes filename no matter what (unlike os.unlink(), it does not raise an error if the file does not exist).
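A sketch of the rename and remove helpers. It replaces the explicit "remove the target first" step with os.replace(), which overwrites an existing destination in a single call on both POSIX and Windows; that substitution is an assumption, not necessarily what ferenda does.

```python
import os


def robust_remove(filename):
    """Delete filename, silently ignoring the case where it doesn't exist."""
    try:
        os.unlink(filename)
    except FileNotFoundError:
        pass


def robust_rename(old, new):
    """Rename old to new, replacing new and creating its directory if needed."""
    dirname = os.path.dirname(new)
    if dirname:
        os.makedirs(dirname, exist_ok=True)
    # os.replace() overwrites an existing destination file on both POSIX
    # and Windows, whereas os.rename() fails on Windows in that case.
    os.replace(old, new)
```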
-
ferenda.util.relurl(url, starturl)
Works like os.path.relpath(), but for URLs. If url is on a different host than starturl, it is returned unchanged.
>>> relurl("http://example.org/other/index.html", "http://example.org/main/index.html") == '../other/index.html'
True
>>> relurl("http://other.org/foo.html", "http://example.org/bar.html") == 'http://other.org/foo.html'
True
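One way relurl() could be implemented with the standard library, assuming that URLs with a different scheme or host should be returned unchanged (as the second example suggests). It applies posixpath.relpath() to the URL paths, using the directory part of starturl as the starting point.

```python
import posixpath
from urllib.parse import urlsplit


def relurl(url, starturl):
    """Return url relative to starturl, or url unchanged if on another host."""
    u = urlsplit(url)
    s = urlsplit(starturl)
    # Links across schemes or hosts cannot be expressed as relative paths.
    if (u.scheme, u.netloc) != (s.scheme, s.netloc):
        return url
    # Like os.path.relpath(), but always using "/" separators. The start
    # point is the "directory" part of starturl's path.
    startdir = posixpath.dirname(s.path) or "/"
    rel = posixpath.relpath(u.path or "/", startdir)
    # Keep any query string or fragment from the target URL.
    if u.query:
        rel += "?" + u.query
    if u.fragment:
        rel += "#" + u.fragment
    return rel
```

Because relpath() normalizes paths, a trailing slash on url (a directory-style URL) is dropped from the result; a fuller implementation would need to preserve it.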
-
ferenda.util.numcmp(x, y)
Works like cmp in Python 2, but compares two strings using a ‘natural sort’ order, i.e. “2” < “10”. Also handles strings that contain a mixture of numbers and letters, i.e. “2” < “2 a”. Returns a negative number if x<y, zero if x==y, and a positive number if x>y.
>>> numcmp("10", "2")
1
>>> numcmp("2", "2 a")
-1
>>> numcmp("3", "2 a")
1
-
ferenda.util.split_numalpha(s)
Converts a string into a list of alternating strings and integers. This makes it possible to sort a list of strings numerically even though they might not be fully convertible to integers.
>>> split_numalpha('10 a §') == ['', 10, ' a §']
True
>>> sorted(['2 §', '10 §', '1 §'], key=split_numalpha) == ['1 §', '2 §', '10 §']
True
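A sketch showing how split_numalpha() and numcmp() can fit together: digit runs become integers, so comparing the resulting lists compares numbers numerically and text lexically. Splitting with a regular expression is an assumption about the approach, not a description of ferenda's source.

```python
import re
from functools import cmp_to_key


def split_numalpha(s):
    """Split s into alternating str and int parts, always starting with a str."""
    # With a capturing group, re.split() keeps the matched digit runs,
    # and they always land at the odd indices of the result.
    parts = re.split(r"(\d+)", s)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def numcmp(x, y):
    """Three-way 'natural' comparison returning -1, 0 or 1."""
    a, b = split_numalpha(x), split_numalpha(y)
    return (a > b) - (a < b)


# Python 3's sort() has no cmp argument, so wrap numcmp with cmp_to_key().
print(sorted(["10 §", "2 §", "1 §"], key=cmp_to_key(numcmp)))
# prints ['1 §', '2 §', '10 §']
```

Since text always sits at even indices and numbers at odd ones, two such lists never compare a str against an int, which is what makes them safe to use as sort keys.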
-
ferenda.util.runcmd(cmdline, require_success=False, cwd=None, cmdline_encoding=None, output_encoding='utf-8')
Run a shell command, wait for it to finish and return the results.
Parameters:
- cmdline (str) – The full command line (will be passed through a shell)
- require_success (bool) – If the command fails (non-zero exit code), raise ExternalCommandError
- cwd – The working directory for the process to run
Returns: The returncode, all stdout output, all stderr output
Return type: tuple
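A simplified sketch of runcmd() on top of subprocess.run(). ExternalCommandError is defined locally as a hypothetical stand-in for ferenda's exception class, and the cmdline_encoding parameter is left out for brevity.

```python
import subprocess


class ExternalCommandError(Exception):
    """Hypothetical stand-in for ferenda's ExternalCommandError."""


def runcmd(cmdline, require_success=False, cwd=None, output_encoding="utf-8"):
    """Run cmdline through the shell; return (returncode, stdout, stderr)."""
    proc = subprocess.run(cmdline, shell=True, cwd=cwd,
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    # Decode the raw bytes here so the caller controls the output encoding.
    stdout = proc.stdout.decode(output_encoding, errors="replace")
    stderr = proc.stderr.decode(output_encoding, errors="replace")
    if require_success and proc.returncode != 0:
        raise ExternalCommandError(stderr)
    return proc.returncode, stdout, stderr
```

Because shell=True hands cmdline to the shell for interpretation, it should never be built from untrusted input.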
-
ferenda.util.normalize_space(string)
Normalize all whitespace in string so that words are separated by exactly one space and the string neither starts nor ends with whitespace.
>>> normalize_space(" This is a long \n string\n") == 'This is a long string'
True
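The whitespace normalization described above can be expressed with str.split() and str.join(); this one-liner is a sketch and not necessarily ferenda's implementation.

```python
def normalize_space(string):
    """Collapse whitespace runs to single spaces and trim both ends."""
    # str.split() with no argument splits on any run of whitespace
    # (spaces, tabs, newlines) and drops empty strings at the ends.
    return " ".join(string.split())
```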
-
ferenda.util.
list_dirs
(d, suffix=None, reverse=False)[source]¶ A generator that works much like
os.listdir()
, only recursively (and only returns files, not directories).Parameters: Returns: the full path (starting from d) of each matching file
Return type: generator
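A sketch of list_dirs() built on os.walk(). The meaning of suffix (a filename ending to filter on) and reverse (reverse the sort order within each directory) is assumed here, since the parameter descriptions are not given above.

```python
import os


def list_dirs(d, suffix=None, reverse=False):
    """Yield the paths of all files below d, optionally filtered by suffix."""
    for dirpath, dirnames, filenames in os.walk(d):
        # Sorting dirnames in place controls the order in which os.walk()
        # descends, giving a deterministic traversal order.
        dirnames.sort(reverse=reverse)
        for name in sorted(filenames, reverse=reverse):
            if suffix is None or name.endswith(suffix):
                # dirpath starts with d, so each path is "starting from d".
                yield os.path.join(dirpath, name)
```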
-
ferenda.util.replace_if_different(src, dst, archivefile=None)
Like shutil.move(), except the src file isn’t moved if the dst file already exists and is identical to src. Also doesn’t require that the directory of dst exists beforehand.
Note: regardless of whether it was moved or not, src is always deleted.
Returns: True if src was moved to dst, False otherwise
Return type: bool
-
ferenda.util.copy_if_different(src, dest)
Like shutil.copyfile(), except the src file isn’t copied if the dest file already exists and is identical to src. Also doesn’t require that the directory of dest exists beforehand.
Parameters:
- src (str) – The source file to copy
- dest (str) – The destination file
Returns: True if src was copied to dest, False otherwise
Return type: bool
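A sketch of copy_if_different() and replace_if_different() using filecmp.cmp(..., shallow=False), which compares file contents rather than only os.stat() signatures. The archivefile parameter of replace_if_different() is omitted, as its behavior is not described above.

```python
import filecmp
import os
import shutil


def copy_if_different(src, dest):
    """Copy src to dest unless dest already has identical contents."""
    if os.path.exists(dest) and filecmp.cmp(src, dest, shallow=False):
        return False
    dirname = os.path.dirname(dest)
    if dirname:
        os.makedirs(dirname, exist_ok=True)
    shutil.copyfile(src, dest)
    return True


def replace_if_different(src, dst):
    """Move src to dst unless identical; src is removed either way."""
    if os.path.exists(dst) and filecmp.cmp(src, dst, shallow=False):
        # dst already holds the same bytes: just discard src.
        os.unlink(src)
        return False
    dirname = os.path.dirname(dst)
    if dirname:
        os.makedirs(dirname, exist_ok=True)
    shutil.move(src, dst)
    return True
```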
-
ferenda.util.outfile_is_newer(infiles, outfile)
Check if a given outfile is newer (has a more recent modification time) than a list of infiles. Returns True if so, False otherwise (including if outfile doesn’t exist).
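A minimal sketch of the modification-time check using os.path.getmtime(). Treating an empty infiles list as "outfile is newer" is an assumption of this sketch.

```python
import os


def outfile_is_newer(infiles, outfile):
    """True if outfile exists and is newer than every file in infiles."""
    if not os.path.exists(outfile):
        return False
    outtime = os.path.getmtime(outfile)
    # all() over an empty infiles list is True: nothing to be older than.
    return all(os.path.getmtime(f) < outtime for f in infiles)
```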
-
ferenda.util.link_or_copy(src, dst)
Create a symlink at dst pointing back to src on systems that support it. On other systems (i.e. Windows), copy src to dst (using copy_if_different()).
-
ferenda.util.ucfirst(string)
Returns string with the first character uppercased but otherwise unchanged.
>>> ucfirst("iPhone") == 'IPhone'
True
-
ferenda.util.rfc_3339_timestamp(dt)
Converts a datetime object to an RFC 3339-style date.
>>> rfc_3339_timestamp(datetime.datetime(2013, 7, 2, 21, 20, 25)) == '2013-07-02T21:20:25-00:00'
True
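A sketch that reproduces the example above. It assumes a naive datetime represents UTC and appends "-00:00", which RFC 3339 (section 4.3) uses for a UTC time whose offset to local time is unknown; the branch for timezone-aware datetimes is a hypothetical extra.

```python
import datetime


def rfc_3339_timestamp(dt):
    """Format a datetime as an RFC 3339 timestamp."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes are UTC. "-00:00" signals that the
        # offset to local time is unknown.
        return dt.strftime("%Y-%m-%dT%H:%M:%S") + "-00:00"
    # For aware datetimes, isoformat() emits a "+HH:MM" style offset.
    return dt.replace(microsecond=0).isoformat()
```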
-
ferenda.util.parse_rfc822_date(httpdate)
Converts an RFC 822-type date string (more or less the same as an HTTP-date) to a naive datetime normalized to UTC.
>>> parse_rfc822_date("Mon, 4 Aug 1997 02:14:00 EST")
datetime.datetime(1997, 8, 4, 7, 14)
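A sketch using email.utils.parsedate_to_datetime() (Python 3.3+), which understands RFC 822 dates including North American zone names such as EST. Whether ferenda uses this function is not stated above; the conversion to a naive UTC value follows the description.

```python
import datetime
from email.utils import parsedate_to_datetime


def parse_rfc822_date(httpdate):
    """Parse an RFC 822 / HTTP date into a naive datetime in UTC."""
    dt = parsedate_to_datetime(httpdate)
    if dt.tzinfo is None:
        # parsedate_to_datetime() returns a naive datetime for "-0000",
        # meaning "UTC, local offset unknown"; keep it as-is.
        return dt
    # Convert to UTC, then drop tzinfo so the result is naive.
    return dt.astimezone(datetime.timezone.utc).replace(tzinfo=None)
```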
-
ferenda.util.strptime(datestr, format)
Like datetime.strptime, but guaranteed not to be affected by the current system locale: all datetime parsing is done using the C locale.
>>> strptime("Mon, 4 Aug 1997 02:14:05", "%a, %d %b %Y %H:%M:%S")
datetime.datetime(1997, 8, 4, 2, 14, 5)
-
ferenda.util.readfile(filename, mode='r', encoding='utf-8')
Opens filename, reads its contents and returns them as a string.
-
ferenda.util.writefile(filename, contents, encoding='utf-8')
Create filename and write contents to it.
-
ferenda.util.extract_text(html, start, end, decode_entities=True, strip_tags=True)
Given html, a string of HTML content, and two substrings (start and end) present in this string, return all text between the substrings, optionally decoding any HTML entities and removing HTML tags.
>>> extract_text("<body><div><b>Hello</b> <i>World</i>™</div></body>",
...              "<div>", "</div>") == 'Hello World™'
True
>>> extract_text("<body><div><b>Hello</b> <i>World</i>™</div></body>",
...              "<div>", "</div>", decode_entities=False) == 'Hello World™'
True
>>> extract_text("<body><div><b>Hello</b> <i>World</i>™</div></body>",
...              "<div>", "</div>", strip_tags=False) == '<b>Hello</b> <i>World</i>™'
True
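A sketch of extract_text() using str.index() to locate the delimiters, a regular expression to drop tags and html.unescape() to decode entities. The test input uses the named entity &trade; so that the effect of decode_entities=False is visible. Stripping tags before decoding, and the regex tag stripper itself, are assumptions of this sketch.

```python
import re
from html import unescape


def extract_text(html, start, end, decode_entities=True, strip_tags=True):
    """Return the text between the first start and the next end in html."""
    # str.index() raises ValueError if a delimiter is missing.
    startidx = html.index(start) + len(start)
    endidx = html.index(end, startidx)
    text = html[startidx:endidx]
    if strip_tags:
        # A naive tag stripper: fine for simple markup, not a real parser.
        text = re.sub(r"<[^>]*>", "", text)
    if decode_entities:
        # Decode after stripping so an escaped "&lt;b&gt;" survives as text.
        text = unescape(text)
    return text
```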
-
ferenda.util.merge_dict_recursive(base, other)
Merges the other dict into the base dict. If any value in other is itself a dict and the base also has a dict for the same key, merge these sub-dicts (and so on, recursively).
>>> base = {'a': 1, 'b': {'c': 3}}
>>> other = {'x': 4, 'b': {'y': 5}}
>>> want = {'a': 1, 'x': 4, 'b': {'c': 3, 'y': 5}}
>>> got = merge_dict_recursive(base, other)
>>> got == want
True
>>> base == want
True
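A recursive sketch that matches the example above: it merges in place and also returns base. Letting a non-dict value in other overwrite whatever base holds for that key is an assumption.

```python
def merge_dict_recursive(base, other):
    """Merge other into base in place, recursing into nested dicts; return base."""
    for key, value in other.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            # Both sides hold a dict for this key: merge them recursively.
            merge_dict_recursive(base[key], value)
        else:
            # Otherwise the value from other simply replaces base's value.
            base[key] = value
    return base
```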
-
ferenda.util.resource_extract(resource_name, outfile, params={})
Copy a file from the ferenda package resources to a specified path, optionally performing variable substitutions on the contents of the file.
Parameters:
- resource_name – The named resource (e.g. ‘res/sparql/annotations.rq’)
- outfile – Path to extract the resource to
- params – A dict of parameters, to be used with regular string substitutions in the resource file.
-
ferenda.util.uri_leaf(uri)
Get the “leaf” (fragment id or last segment) of a URI. Useful e.g. for getting a term from a “namespace-like” URI.
>>> uri_leaf("http://purl.org/dc/terms/title") == 'title'
True
>>> uri_leaf("http://www.w3.org/2004/02/skos/core#Concept") == 'Concept'
True
>>> uri_leaf("http://www.w3.org/2004/02/skos/core#") # returns None
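A sketch of uri_leaf() that prefers the fragment id when the URI contains "#" and otherwise takes the last "/"-separated segment; returning None for an empty leaf follows the third example.

```python
def uri_leaf(uri):
    """Return the fragment id or last path segment of uri, or None if empty."""
    if "#" in uri:
        leaf = uri.rsplit("#", 1)[1]
    else:
        leaf = uri.rsplit("/", 1)[-1]
    # An empty leaf (URI ending in "#" or "/") has no term to return.
    return leaf or None
```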
-
ferenda.util.logtime(method, format='The operation took %(elapsed).3f sec', values={})
A context manager that uses the supplied method and format string to log the elapsed time:
with util.logtime(log.debug,
                  "Basefile %(basefile)s took %(elapsed).3f s",
                  {'basefile': 'foo'}):
    do_stuff_that_takes_some_time()
This results in a call like log.debug("Basefile foo took 1.324 s").
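A sketch of logtime() built with contextlib.contextmanager and time.perf_counter(). It uses values=None instead of a mutable {} default and copies the dict before adding elapsed; it also logs in a finally clause, so the time is reported even if the block raises. Both choices are assumptions rather than documented behavior.

```python
import time
from contextlib import contextmanager


@contextmanager
def logtime(method, format="The operation took %(elapsed).3f sec", values=None):
    """Log the elapsed wall-clock time of the with-block via method(message)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Copy values so the caller's dict is not mutated by adding "elapsed".
        params = dict(values or {})
        params["elapsed"] = time.perf_counter() - start
        method(format % params)
```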
-
ferenda.util.c_locale(category=2)
Temporarily change the process locale to the C locale, for use when e.g. parsing English dates on a system that may have a non-English locale.
>>> with c_locale():
...     datetime.datetime.strptime("August 2013", "%B %Y")
datetime.datetime(2013, 8, 1, 0, 0)
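A sketch of c_locale() and the locale-independent strptime() built on top of it. It assumes the default category=2 in the signature corresponds to locale.LC_TIME (as it does on glibc-based systems), so the sketch uses that constant by name.

```python
import datetime
import locale
from contextlib import contextmanager


@contextmanager
def c_locale(category=locale.LC_TIME):
    """Temporarily switch the given locale category to "C"."""
    # Calling setlocale() without a locale argument only queries the setting.
    oldlocale = locale.setlocale(category)
    locale.setlocale(category, "C")
    try:
        yield
    finally:
        locale.setlocale(category, oldlocale)


def strptime(datestr, format):
    """datetime.datetime.strptime() evaluated under the C locale."""
    with c_locale():
        return datetime.datetime.strptime(datestr, format)
```

Since locale.setlocale() changes process-wide state, this approach is not thread-safe: other threads parsing or formatting dates during the with-block also see the C locale.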
-
ferenda.util.from_roman(s)
Convert a Roman numeral to an integer.
>>> from_roman("MCMLXXXIV")
1984
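A sketch of the usual left-to-right algorithm for Roman numerals: each symbol's value is added, unless it is smaller than the symbol after it, in which case it is subtracted. This is an illustration rather than ferenda's code, and it does not validate its input, so malformed numerals such as "IIII" are accepted.

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def from_roman(s):
    """Convert a Roman numeral such as "MCMLXXXIV" to an int."""
    values = [ROMAN_VALUES[ch] for ch in s.upper()]
    total = 0
    for i, value in enumerate(values):
        # Subtractive notation: a smaller value before a larger one
        # (the I in IV, the C in CM) is subtracted instead of added.
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total
```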