docdeid.str
docdeid.str.processor
- class docdeid.str.processor.StringProcessor
Bases:
ABC
Abstract class for string processing.
- abstract process_items(items: Iterable[str]) -> list[str]
Process an iterable of strings.
- Parameters:
items – The input items.
- Returns:
The processed items.
- class docdeid.str.processor.StringModifier
Bases:
StringProcessor, ABC
Modifies strings.
- abstract process(item: str) -> str
Processes a string by modifying it.
- Parameters:
item – The input string.
- Returns:
The output string.
- process_items(items: Iterable[str]) -> list[str]
Process an iterable of strings.
- Parameters:
items – The input items.
- Returns:
The processed items.
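The relationship between process and process_items can be sketched as below. This is a stand-alone illustration, not docdeid's source: the names ModifierSketch and ReverseString are hypothetical, and the sketch assumes that process_items simply applies process to each item in order.

```python
from abc import ABC, abstractmethod
from collections.abc import Iterable


class ModifierSketch(ABC):
    """Hypothetical stand-in for a StringModifier-style base class."""

    @abstractmethod
    def process(self, item: str) -> str:
        """Return the modified version of a single string."""

    def process_items(self, items: Iterable[str]) -> list[str]:
        # Assumption: each item is modified independently, order preserved.
        return [self.process(item) for item in items]


class ReverseString(ModifierSketch):
    """Hypothetical concrete modifier, used only for illustration."""

    def process(self, item: str) -> str:
        return item[::-1]


reversed_items = ReverseString().process_items(["abc", "de"])
print(reversed_items)
```

Under this assumption a subclass only has to implement process; the batch behaviour comes from the base class.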
- class docdeid.str.processor.StringFilter
Bases:
StringProcessor, ABC
Filters strings.
- abstract filter(item: str) -> bool
Filters strings.
- Parameters:
item – The input string.
- Returns:
True to keep the item, False to remove it (same as the filter builtin).
- process_items(items: Iterable[str]) -> list[str]
Process an iterable of strings.
- Parameters:
items – The input items.
- Returns:
The processed items.
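The keep/remove convention of filter can be sketched in the same way. The names FilterSketch and KeepAlphabetic are hypothetical, and the sketch assumes process_items keeps exactly those items for which filter returns True, as the builtin filter does.

```python
from abc import ABC, abstractmethod
from collections.abc import Iterable


class FilterSketch(ABC):
    """Hypothetical stand-in for a StringFilter-style base class."""

    @abstractmethod
    def filter(self, item: str) -> bool:
        """Return True to keep the item, False to remove it."""

    def process_items(self, items: Iterable[str]) -> list[str]:
        # Assumption: items failing the predicate are dropped, order preserved.
        return [item for item in items if self.filter(item)]


class KeepAlphabetic(FilterSketch):
    """Hypothetical concrete filter: keeps purely alphabetic strings."""

    def filter(self, item: str) -> bool:
        return item.isalpha()


kept = KeepAlphabetic().process_items(["abc", "123", "de"])
print(kept)
```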
- class docdeid.str.processor.LowercaseString
Bases:
StringModifier
Lowercases a string.
- process(item: str) -> str
Processes a string by modifying it.
- Parameters:
item – The input string.
- Returns:
The output string.
- class docdeid.str.processor.StripString
Bases:
StringModifier
Strips leading and trailing whitespace (spaces, tabs, newlines, etc.) from a string.
- process(item: str) -> str
Processes a string by modifying it.
- Parameters:
item – The input string.
- Returns:
The output string.
- class docdeid.str.processor.RemoveNonAsciiCharacters
Bases:
StringModifier
Removes non-ASCII characters from a string.
E.g.: Renée -> Rene.
- process(item: str) -> str
Processes a string by modifying it.
- Parameters:
item – The input string.
- Returns:
The output string.
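One way to get the Renée -> Rene behaviour with the standard library alone is sketched below. The helper remove_non_ascii is hypothetical and is not necessarily how docdeid implements this class.

```python
def remove_non_ascii(item: str) -> str:
    """Hypothetical helper: drop every character outside the ASCII range."""
    # Encoding with errors="ignore" silently discards unencodable characters.
    return item.encode("ascii", errors="ignore").decode("ascii")


# "\u00e9" is the precomposed e-acute, written as an escape to avoid
# any ambiguity about the source file encoding.
removed = remove_non_ascii("Ren\u00e9e")
print(removed)
```

Note that with this approach the result depends on the Unicode form of the input: if the accent is stored as a separate combining mark after a plain "e", only the mark is dropped and the base letter survives.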
- class docdeid.str.processor.ReplaceNonAsciiCharacters
Bases:
StringModifier
Maps non-ASCII characters to ASCII characters.
E.g.: Renée -> Renee. Test the mapping on representative data before relying on it, as some characters do not map cleanly in practice.
- process(item: str) -> str
Processes a string by modifying it.
- Parameters:
item – The input string.
- Returns:
The output string.
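The sketch below approximates such a mapping with Unicode decomposition from the standard library. It is an assumption made for illustration, not docdeid's actual mapping, and the helper replace_non_ascii is hypothetical. It also shows why testing matters: letters without a decomposition are dropped rather than mapped.

```python
import unicodedata


def replace_non_ascii(item: str) -> str:
    """Hypothetical helper: map accented letters to their ASCII base letter."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", item)
    # Dropping the non-ASCII remainder keeps the base letters only.
    return decomposed.encode("ascii", errors="ignore").decode("ascii")


mapped = replace_non_ascii("Ren\u00e9e")    # e-acute decomposes to "e" + accent
unmapped = replace_non_ascii("S\u00f8ren")  # o-with-stroke has no decomposition
print(mapped, unmapped)
```

Here the first call keeps the base letter while the second loses the character entirely, so a decomposition-based approach is not a complete transliteration.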
- class docdeid.str.processor.ReplaceValue(find_value: str, replace_value: str)
Bases:
StringModifier
Replaces a value in a string, literally.
- Parameters:
find_value – The value to be replaced.
replace_value – The value to replace with.
- process(item: str) -> str
Processes a string by modifying it.
- Parameters:
item – The input string.
- Returns:
The output string.
- class docdeid.str.processor.ReplaceValueRegexp(find_value: str, replace_value: str)
Bases:
StringModifier
Replaces a value in a string using a regular expression.
- Parameters:
find_value – The regular expression to search for.
replace_value – The value to replace each match with.
- process(item: str) -> str
Processes a string by modifying it.
- Parameters:
item – The input string.
- Returns:
The output string.
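The difference between literal and regexp replacement is easy to overlook when find_value contains regexp metacharacters. The sketch below illustrates it with the standard library only, assuming ReplaceValue behaves like str.replace and ReplaceValueRegexp like re.sub; neither assumption is confirmed by this page.

```python
import re

text = "v1.0 and v1x0"

# Literal replacement: "." matches only an actual dot.
literal = text.replace("1.0", "N")

# Regexp replacement: an unescaped "." matches any character.
regexp = re.sub(r"1.0", "N", text)

# Escaping the pattern makes the regexp match the literal text only.
escaped = re.sub(re.escape("1.0"), "N", text)

print(literal)
print(regexp)
print(escaped)
```

If the value to find is plain text, the literal variant avoids the need to escape it.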
- class docdeid.str.processor.FilterByLength(min_len: int)
Bases:
StringFilter
Filters strings by length.
- Parameters:
min_len – The minimum length. Strings shorter than this will be filtered out.
- filter(item: str) -> bool
Filters strings.
- Parameters:
item – The input string.
- Returns:
True to keep the item, False to remove it (same as the filter builtin).
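The length rule can be sketched as a plain function. The helper keep_by_length is hypothetical; it assumes that a string whose length equals min_len is kept, since only strings shorter than min_len are described as filtered out.

```python
from collections.abc import Iterable


def keep_by_length(items: Iterable[str], min_len: int) -> list[str]:
    """Hypothetical helper: keep strings with at least min_len characters."""
    # Assumption: the boundary is inclusive, so len(item) == min_len is kept.
    return [item for item in items if len(item) >= min_len]


kept_by_length = keep_by_length(["a", "ab", "abc"], min_len=2)
print(kept_by_length)
```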