docdeid.str

docdeid.str.processor

class docdeid.str.processor.StringProcessor

Bases: ABC

Abstract class for string processing.

abstract process_items(items: Iterable[str]) -> list[str]

Process an iterable of strings.

Parameters:

items – The input items.

Returns:

The processed items.

class docdeid.str.processor.StringModifier

Bases: StringProcessor, ABC

Modifies strings.

abstract process(item: str) -> str

Processes a string by modifying it.

Parameters:

item – The input string.

Returns:

The output string.

process_items(items: Iterable[str]) -> list[str]

Process an iterable of strings.

Parameters:

items – The input items.

Returns:

The processed items.
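As a rough sketch of how the two methods relate, the stand-in classes below mirror the documented interface without importing docdeid. That `process_items` applies `process` to each item in order is an assumption drawn from the signatures, and `ExclaimString` is a hypothetical subclass used only for illustration.

```python
from abc import ABC, abstractmethod
from typing import Iterable


class StringProcessor(ABC):
    """Stand-in for the documented abstract base (not docdeid's source)."""

    @abstractmethod
    def process_items(self, items: Iterable[str]) -> list[str]:
        """Process an iterable of strings."""


class StringModifier(StringProcessor, ABC):
    """Stand-in: modifies each string independently."""

    @abstractmethod
    def process(self, item: str) -> str:
        """Return the modified string."""

    def process_items(self, items: Iterable[str]) -> list[str]:
        # Assumption: each item is passed through process(), order preserved.
        return [self.process(item) for item in items]


class ExclaimString(StringModifier):
    """Hypothetical modifier that appends an exclamation mark."""

    def process(self, item: str) -> str:
        return item + "!"


result = ExclaimString().process_items(["a", "b"])
print(result)
```

Under this reading, a concrete modifier only needs to implement `process`; the batch behaviour comes from the base class.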

class docdeid.str.processor.StringFilter

Bases: StringProcessor, ABC

Filters strings.

abstract filter(item: str) -> bool

Decides whether a string is kept.

Parameters:

item – The input string.

Returns:

True to keep the item, False to remove it (same as filter builtin).

process_items(items: Iterable[str]) -> list[str]

Process an iterable of strings.

Parameters:

items – The input items.

Returns:

The processed items.
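The filter contract can be sketched in the same way. The stand-in below is not docdeid's source: it assumes `process_items` keeps exactly the items for which `filter` returns True, and `KeepAlphabetic` is a hypothetical subclass.

```python
from abc import ABC, abstractmethod
from typing import Iterable


class StringFilter(ABC):
    """Stand-in for the documented filter interface (not docdeid's source)."""

    @abstractmethod
    def filter(self, item: str) -> bool:
        """Return True to keep the item, False to remove it."""

    def process_items(self, items: Iterable[str]) -> list[str]:
        # Assumption: keep exactly the items for which filter() is True.
        return [item for item in items if self.filter(item)]


class KeepAlphabetic(StringFilter):
    """Hypothetical filter that keeps purely alphabetic strings."""

    def filter(self, item: str) -> bool:
        return item.isalpha()


kept = KeepAlphabetic().process_items(["anna", "x2", "", "bob"])
print(kept)
```

This matches the convention of the builtin `filter`: a True result means the item survives.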

class docdeid.str.processor.LowercaseString

Bases: StringModifier

Lowercases a string.

process(item: str) -> str

Processes a string by modifying it.

Parameters:

item – The input string.

Returns:

The output string.

class docdeid.str.processor.StripString

Bases: StringModifier

Strips whitespace (spaces, tabs, newlines, etc.) from the start and end of a string.

process(item: str) -> str

Processes a string by modifying it.

Parameters:

item – The input string.

Returns:

The output string.
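The behaviour described for `LowercaseString` and `StripString` corresponds to Python's `str.lower()` and `str.strip()`. Whether the classes call exactly these methods is an assumption; the snippet below only illustrates the expected effect on sample input.

```python
raw = ["  Jan \n", "\tDE VRIES", "Pieter"]

# Assumption: the modifiers behave like these builtin string methods.
stripped = [item.strip() for item in raw]      # drops leading/trailing whitespace
lowered = [item.lower() for item in stripped]  # lowercases every character

print(lowered)
```

Note that whitespace inside a string (as in "de vries") is left untouched; only the start and end are stripped.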

class docdeid.str.processor.RemoveNonAsciiCharacters

Bases: StringModifier

Removes non-ascii characters from a string.

E.g.: Renée -> Rene.

process(item: str) -> str

Processes a string by modifying it.

Parameters:

item – The input string.

Returns:

The output string.
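One way to get the "Renée -> Rene" behaviour with only the standard library is to encode to ASCII and ignore anything that does not fit. This is a sketch of the technique, not necessarily how the class is implemented; `remove_non_ascii` is a hypothetical helper.

```python
def remove_non_ascii(item: str) -> str:
    """Hypothetical helper: drop every character outside the ASCII range."""
    # errors="ignore" silently discards characters that cannot be encoded.
    return item.encode("ascii", "ignore").decode("ascii")


removed = remove_non_ascii("Ren\u00e9e")  # "\u00e9" is the precomposed é
print(removed)
```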

class docdeid.str.processor.ReplaceNonAsciiCharacters

Bases: StringModifier

Maps non-ascii characters to ascii characters.

E.g.: Renée -> Renee. Test the mapping on your own data before relying on it, as some characters can be tricky to map in practice.

process(item: str) -> str

Processes a string by modifying it.

Parameters:

item – The input string.

Returns:

The output string.
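To see why some characters are tricky, here is a standard-library approximation of such a mapping. It decomposes accented characters (NFKD) and then drops the combining marks. The library may use a different mapping, so treat `replace_non_ascii` as a hypothetical helper.

```python
import unicodedata


def replace_non_ascii(item: str) -> str:
    """Hypothetical helper: map accented characters to their ASCII base."""
    # NFKD splits "é" into "e" plus a combining accent ...
    decomposed = unicodedata.normalize("NFKD", item)
    # ... and the combining accent is then dropped as non-ASCII.
    return decomposed.encode("ascii", "ignore").decode("ascii")


mapped = replace_non_ascii("Ren\u00e9e")  # é decomposes cleanly
lossy = replace_non_ascii("S\u00f8ren")   # ø has no decomposition and is lost
print(mapped, lossy)
```

In this approximation, characters without a Unicode decomposition (such as ø) are removed rather than mapped, which is the kind of edge case worth checking before use.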

class docdeid.str.processor.ReplaceValue(find_value: str, replace_value: str)

Bases: StringModifier

Replaces a value in a string, literally.

Parameters:
  • find_value – The value to be replaced.

  • replace_value – The value to replace with.

process(item: str) -> str

Processes a string by modifying it.

Parameters:

item – The input string.

Returns:

The output string.

class docdeid.str.processor.ReplaceValueRegexp(find_value: str, replace_value: str)

Bases: StringModifier

Replaces a value in a string, matching it with a regular expression.

Parameters:
  • find_value – The input regexp.

  • replace_value – The value to replace it with.

process(item: str) -> str

Processes a string by modifying it.

Parameters:

item – The input string.

Returns:

The output string.
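The difference between literal and regexp replacement matters most for characters with a special meaning in patterns. The sketch below uses `str.replace` and `re.sub` as stand-ins; that the two classes delegate to these calls is an assumption, and both helper functions are hypothetical.

```python
import re


def replace_literal(item: str, find_value: str, replace_value: str) -> str:
    """Hypothetical helper: find_value is matched character for character."""
    return item.replace(find_value, replace_value)


def replace_regexp(item: str, find_value: str, replace_value: str) -> str:
    """Hypothetical helper: find_value is interpreted as a regular expression."""
    return re.sub(find_value, replace_value, item)


literal = replace_literal("dr. A. Jansen", ".", "")   # removes only the dots
pattern = replace_regexp("dr. A. Jansen", r"\.", "")  # escaped dot: same result
greedy = replace_regexp("a.b", ".", "-")              # bare dot matches any character
print(literal, pattern, greedy)
```

When a regexp should match a fixed string, `re.escape` can build the escaped pattern.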

class docdeid.str.processor.FilterByLength(min_len: int)

Bases: StringFilter

Filters strings by length.

Parameters:

min_len – The minimum length. Strings shorter than this will be filtered out.

filter(item: str) -> bool

Decides whether a string is kept.

Parameters:

item – The input string.

Returns:

True to keep the item, False to remove it (same as filter builtin).
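Read literally, "strings shorter than this will be filtered out" means a string of exactly `min_len` characters is kept. A minimal sketch of that rule, with `keep_by_length` as a hypothetical helper rather than the class itself:

```python
def keep_by_length(items: list[str], min_len: int) -> list[str]:
    """Hypothetical helper: keep strings with at least min_len characters."""
    # Assumption: the boundary is inclusive, so len(item) == min_len is kept.
    return [item for item in items if len(item) >= min_len]


kept = keep_by_length(["a", "ab", "abc", ""], min_len=2)
print(kept)
```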