lkj.strings

String Utilities Module

This module provides a comprehensive set of utility functions and classes for working with strings in Python. It includes tools for string manipulation, formatting, pretty-printing, and find/replace operations.

Core Components:

StringAppender: A helper class for collecting strings, useful for capturing output that would otherwise be printed.
indent_lines: Indents each line of a string by a specified prefix.
most_common_indent: Determines the most common indentation used in a multi-line string.
FindReplaceTool: A class for advanced find-and-replace operations on strings, supporting regular expressions, match history, and undo functionality.

Pretty-Printing Functions:

print_list: Prints lists in various human-friendly formats (wrapped, columns, numbered, bullet, table, compact), with options for width, separators, and custom print functions.
print_list.as_table: Formats and prints a list (or list of lists) as a table, with optional headers and alignment.
print_list.summary: Prints a summary of a list, showing first few and last few items if the list is long.
print_list.compact, print_list.wrapped, print_list.columns, print_list.numbered, print_list.bullets: Convenience methods using print_list’s partial functionality for common display styles.

These utilities are designed to make it easier to display, format, and manipulate strings and collections of strings in a readable and flexible way.

class lkj.strings.FindReplaceTool(text: str, *, line_mode: bool = False, flags: int = 0, show_line_numbers: bool = True, context_size: int = 2, highlight_char: str = '^')[source]

A general-purpose find-and-replace tool that can treat the input text as a continuous sequence of characters, even if operations such as viewing context are performed line by line. The tool can analyze matches based on a user-supplied regular expression, navigate through the matches with context, and perform replacements either interactively or in bulk. Replacements can be provided as either a static string or via a callback function that receives details of the match.

Instead of keeping a single modified text, this version maintains a history of text versions in self._text_versions, where self._text_versions[0] is the original text and self._text_versions[-1] is the current text. Each edit is performed on the current version and appended to the history. Additional methods allow reverting changes.
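The version-history idea described above can be sketched independently of the tool. The class below is a hypothetical stand-in, not the code of FindReplaceTool: only the attribute name _text_versions comes from the description, while VersionedText, apply, and current are names assumed for illustration.

```python
class VersionedText:
    """Hypothetical stand-in for the version-history idea (not lkj's code)."""

    def __init__(self, text: str):
        self._text_versions = [text]  # index 0 is always the original text

    @property
    def current(self) -> str:
        return self._text_versions[-1]

    def apply(self, edit) -> "VersionedText":
        # Run the edit on the current version and append the result.
        self._text_versions.append(edit(self.current))
        return self

    def revert(self, steps: int = 1) -> str:
        # Drop up to `steps` versions, but never the original.
        removable = len(self._text_versions) - 1
        for _ in range(min(steps, removable)):
            self._text_versions.pop()
        return self.current


doc = VersionedText("one two three")
doc.apply(lambda t: t.replace("one", "ONE")).apply(str.upper)
print(doc.current)    # ONE TWO THREE
print(doc.revert())   # ONE two three
print(doc.revert(5))  # one two three
```

Keeping every version costs memory proportional to the number of edits, but it makes reverting a matter of popping from a list.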

1: Basic usage

>>> FindReplaceTool("apple banana apple").find_and_print_matches(r'apple')
Match 0 (around line 1):
apple banana apple
^^^^^
----------------------------------------
Match 1 (around line 1):
apple banana apple
             ^^^^^
----------------------------------------
>>> FindReplaceTool("apple banana apple").find_and_replace(r'apple', "orange")
'orange banana orange'

2: Using line_mode=True with a static replacement.

>>> text1 = "apple\nbanana apple\ncherry"
>>> import re
>>> tool = FindReplaceTool(text1, line_mode=True, flags=re.MULTILINE)
>>> # Find all occurrences of "apple" (two in total).
>>> _ = tool.analyze(r'apple')
>>> len(tool._matches)
2
>>> # Replace the first occurrence ("apple" on the first line) with "orange".
>>> tool.replace_one(0, "orange").get_modified_text()
'orange\nbanana apple\ncherry'

3: Using line_mode=False with a callback replacement.

>>> text2 = "apple banana apple"
>>> tool2 = FindReplaceTool(text2, line_mode=False)
>>> # Find all occurrences of "apple" in the continuous text.
>>> len(tool2.analyze(r'apple')._matches)
2
>>> # Define a callback that converts each matched text to uppercase.
>>> def to_upper(match):
...     return match["matched_text"].upper()
>>> tool2.replace_all(to_upper).get_modified_text()
'APPLE banana APPLE'

4: Reverting changes.

>>> text3 = "one two three"
>>> tool3 = FindReplaceTool(text3)
>>> import re
>>> # Analyze to match the first word "one" (at the start of the text).
>>> tool3.analyze(r'^one').replace_one(0, "ONE").get_modified_text()
'ONE two three'
>>> # Revert the edit.
>>> tool3.revert()
'one two three'

analyze(pattern: str) → None[source]: Searches the current text (the last version) for occurrences matching the given regular expression. Any match data (including group captures) is stored internally.

find_and_print_matches(pattern: str) → None[source]: Searches the current text (the last version) for occurrences matching the given regular expression, stores the match data (including group captures) internally, and prints each match with its context, as in the basic usage example above.

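find_and_replace(pattern: str, replacement: str | Callable[[Dict[str, Any]], str]) → None[source]: Searches the current text (the last version) for occurrences matching the given regular expression, stores the match data (including group captures) internally, and replaces every match. The ‘replacement’ argument may be a static string or a callable (see replace_one for details). In the basic usage example above, the call yields the modified text.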

get_modified_text() → str[source]: Returns the current (latest) text version.

get_original_text() → str[source]: Returns the original text (first version).

replace_all(replacement: str | Callable[[Dict[str, Any]], str]) → None[source]: Replaces all stored matches in the current text version. The ‘replacement’ argument may be a static string or a callable (see replace_one for details). Replacements are performed from the last match to the first, so that earlier offsets are not affected.

replace_one(match_index: int, replacement: str | Callable[[Dict[str, Any]], str]) → None[source]: Replaces a single match, identified by match_index, with a new string. The ‘replacement’ argument may be either a static string or a callable. When it is a callable, it is called with a dictionary containing the match data (including any captured groups) and should return the replacement string. The replacement is performed on the current text version, and the new text is appended as a new version in the history.
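The two ideas described for replace_all and replace_one (replacing from the last match to the first, and accepting either a string or a callable) can be sketched with the standard re module. This is an illustrative sketch, not the library's implementation: the function name replace_matches and the dictionary keys "start", "end", and "groups" are assumptions; only "matched_text" appears in the examples above.

```python
import re


def replace_matches(text, pattern, replacement, flags=0):
    # Collect match data first, so every offset refers to the original text.
    matches = [
        {
            "start": m.start(),
            "end": m.end(),
            "matched_text": m.group(0),
            "groups": m.groups(),
        }
        for m in re.finditer(pattern, text, flags)
    ]
    # Work backwards: editing a later span never shifts an earlier offset.
    for match in reversed(matches):
        new = replacement(match) if callable(replacement) else replacement
        text = text[: match["start"]] + new + text[match["end"] :]
    return text


print(replace_matches("apple banana apple", r"apple", "orange"))
# orange banana orange
print(replace_matches("apple banana apple", r"apple", lambda m: m["matched_text"].upper()))
# APPLE banana APPLE
```

Going front to back would also work, but only if each remaining offset were adjusted by the length difference of every replacement made so far.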

revert(steps: int = 1)[source]

Reverts the current text version by removing the last ‘steps’ versions from the history. The original text (version 0) is never removed. Returns the new current text.

>>> text = "one two three"
>>> tool = FindReplaceTool(text)
>>> import re
>>> tool.analyze(r'^one').replace_one(0, "ONE").get_modified_text()
'ONE two three'
>>> tool.revert()
'one two three'

view_matches() → None[source]: Displays all stored matches along with surrounding context. When line_mode is enabled, the context is provided in full lines with (optionally) line numbers, and a line is added below the matched line to indicate the matched portion. In non-line mode, a snippet of characters around the match is shown.

class lkj.strings.StringAppender(separator='\n')[source]

Helper class to collect strings instead of printing them directly.

get_string()[source]: Alternative way to get the string.

lkj.strings.camel_to_snake(camel_string)[source]

Convert a CamelCase string to snake_case. Useful for converting class names to variable names.

Parameters: camel_string (str) – The CamelCase string to convert.
Returns: The converted snake_case string.
Return type: str

Examples

>>> camel_to_snake('BasicParseTest')
'basic_parse_test'
>>> camel_to_snake('HTMLParser')
'html_parser'
>>> camel_to_snake('CamelCaseExample')
'camel_case_example'

Note that acronyms are handled correctly:

>>> camel_to_snake('XMLHttpRequestTest')
'xml_http_request_test'
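One common way to get this behavior, including the acronym handling, is a two-pass regular-expression substitution. The sketch below is an assumption about how such a conversion can be written, not necessarily how camel_to_snake is implemented; the name camel_to_snake_sketch is hypothetical.

```python
import re


def camel_to_snake_sketch(camel_string: str) -> str:
    # Pass 1: put "_" before a capital that starts a capitalized word,
    # e.g. "HTMLParser" -> "HTML_Parser".
    partial = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", camel_string)
    # Pass 2: put "_" between a lowercase letter or digit and the capital after it.
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", partial).lower()


print(camel_to_snake_sketch("HTMLParser"))          # html_parser
print(camel_to_snake_sketch("XMLHttpRequestTest"))  # xml_http_request_test
```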

lkj.strings.fields_of_string_formats(templates, *, aggregator=<class 'set'>)[source]

Extract all unique field names from the given templates using string.Formatter.

Parameters: templates (list) – A list of format-string templates (the example below passes plain strings).
Returns: The unique field names found in the templates, collected with aggregator (a set by default, which is why the example sorts the result).
Return type: the type produced by aggregator (set by default)

Example

>>> templates = ['{this}/and/{that}', 'and/{that}/is/an/{other}']
>>> sorted(fields_of_string_formats(templates))
['other', 'that', 'this']
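The extraction itself can be done with string.Formatter.parse from the standard library, which splits a format string into literal text and field names. The following is a minimal sketch of that approach; the function name field_names is hypothetical and the body is not taken from the library.

```python
from string import Formatter


def field_names(templates, aggregator=set):
    formatter = Formatter()
    return aggregator(
        field
        for template in templates
        # parse() yields (literal_text, field_name, format_spec, conversion)
        for _, field, _, _ in formatter.parse(template)
        if field  # skip literal-only segments (None) and empty names
    )


templates = ["{this}/and/{that}", "and/{that}/is/an/{other}"]
print(sorted(field_names(templates)))  # ['other', 'that', 'this']
```

Passing aggregator=list instead keeps duplicates and the order of appearance.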

lkj.strings.indent_lines(string: str, indent: str, *, line_sep='\n') → str[source]

Indent each line of a string.

Parameters:

string – The string to indent.
indent – The string to use for indentation.

Returns:

The indented string.

>>> print(indent_lines('This is a test.\nAnother line.', ' ' * 8))
        This is a test.
        Another line.

lkj.strings.most_common_indent(string: str, ignore_first_line=False) → str[source]

Find the most common indentation in a string.

Parameters:

string – The string to analyze.
ignore_first_line – Whether to ignore the first line when determining the indentation. Default is False. One case where you want True is when using Python triple quotes (as in docstrings, for example), since the first line often has no indentation (from the point of view of the string, in this case).

Returns:

The most common indentation string.

Examples:

>>> most_common_indent('    This is a test.\n    Another line.')
'    '
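A counting approach is one plausible way to find the most common indentation. The sketch below is an assumption for illustration (the name most_common_indent_sketch is hypothetical, and skipping blank lines is a choice made here, not documented behavior).

```python
import re
from collections import Counter


def most_common_indent_sketch(string, ignore_first_line=False):
    lines = string.split("\n")
    if ignore_first_line:
        lines = lines[1:]
    # Leading spaces/tabs of every non-blank line.
    indents = [re.match(r"[ \t]*", line).group() for line in lines if line.strip()]
    if not indents:
        return ""
    return Counter(indents).most_common(1)[0][0]


code = "def f():\n    a = 1\n    b = 2"
print(repr(most_common_indent_sketch(code)))  # '    '

docstring = "Summary line.\n    Only detail."
print(repr(most_common_indent_sketch(docstring, ignore_first_line=True)))  # '    '
```

The second call shows why ignore_first_line matters for docstring-like text: without it, the unindented first line competes with the indented body.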

lkj.strings.print_list(items: ~typing.Iterable[~typing.Any] | None = None, *, style: ~typing.Literal['wrapped', 'columns', 'numbered', 'bullet', 'table', 'compact'] = 'wrapped', max_width: int = 80, sep: str = ', ', line_prefix: str = '', items_per_line=None, show_count: bool | ~typing.Callable[[int], str] = False, title=None, print_func=<built-in function print>)[source]

Print a list in a nice, readable format with multiple style options.

Parameters:

items – The list or iterable to print. If None, returns a partial function.
style – One of “wrapped”, “columns”, “numbered”, “bullet”, “table”, “compact”
max_width – Maximum width for wrapped style
sep – Separator for items
line_prefix – Prefix for each line
items_per_line – For columns style, how many items per line
show_count – Whether to prefix with the count of items
title – Optional title to display before the list
print_func – Function to use for printing. Defaults to print. If None, returns the string instead of printing.

Examples

>>> items = ["apple", "banana", "cherry", "date", "elderberry", "fig"]

# Wrapped style (default)

>>> print_list(items, max_width=30)
apple, banana, cherry, date,
elderberry, fig

# Columns style

>>> print_list(items, style="columns", items_per_line=3)
apple banana cherry
date elderberry fig

# Numbered style

>>> print_list(items, style="numbered")
1. apple
2. banana
3. cherry
4. date
5. elderberry
6. fig

# Bullet style

>>> print_list(items, style="bullet")
• apple
• banana
• cherry
• date
• elderberry
• fig

# Return string instead of printing

>>> result = print_list(items, style="numbered", print_func=None, show_count=True)
>>> print(result)
List (6 items):
1. apple
2. banana
3. cherry
4. date
5. elderberry
6. fig

Partial function functionality: If you don’t specify the items (or items=None), the function returns a partial function that can be called with the items later. That is, the print_list acts as a factory function for different printing styles.

>>> numbered_printer = print_list(style="numbered", show_count=False)
>>> numbered_printer(items)
1. apple
2. banana
3. cherry
4. date
5. elderberry
6. fig

>>> compact_printer = print_list(style="compact", max_width=60, show_count=False)
>>> compact_printer(items)
apple, banana, cherry, date, elderberry, fig

>>> bullet_printer = print_list(style="bullet", print_func=None, show_count=False)
>>> result = bullet_printer(items)
>>> print(result)
• apple
• banana
• cherry
• date
• elderberry
• fig
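The factory behavior described above (return a partial when no items are given) is a small pattern that can be shown on its own. The sketch below uses a hypothetical function show with only two options; it illustrates the pattern and is not the code of print_list.

```python
from functools import partial


def show(items=None, *, sep=", ", print_func=print):
    if items is None:
        # No items yet: remember the options and wait for the items.
        return partial(show, sep=sep, print_func=print_func)
    text = sep.join(map(str, items))
    if print_func is None:
        return text  # hand the string back instead of printing it
    print_func(text)


dash_joiner = show(sep=" - ", print_func=None)
print(dash_joiner(["a", "b", "c"]))  # a - b - c
```

One consequence of this design is that None cannot be used as an actual list value; an empty list should be passed to print nothing.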

lkj.strings.print_list_as_table(items, headers=None, *, max_width=80, align='left', print_func=<built-in function print>)[source]

Print a list as a nicely formatted table.

Parameters:

items – List of items (strings, numbers, or objects with __str__)
headers – Optional list of column headers
max_width – Maximum width of the table
align – Alignment for columns (“left”, “right”, “center”)
print_func – Function to use for printing. Defaults to print. If None, returns the string instead of printing.

Examples

>>> data = [["Name", "Age", "City"], ["Alice", 25, "NYC"], ["Bob", 30, "LA"]]
>>> print_list_as_table(data)
Name  | Age | City
-----|---|----
Alice | 25  | NYC
Bob   | 30  | LA

# Return string instead of printing

>>> result = print_list_as_table(data, print_func=None)
>>> print(result)
Name  | Age | City
-----|---|----
Alice | 25  | NYC
Bob   | 30  | LA

lkj.strings.print_list_summary(items, *, max_items=10, show_total=True, title=None, print_func=<built-in function print>)[source]

Print a summary of a list, showing first few and last few items if the list is long.

Parameters:

items – The list to summarize
max_items – Maximum number of items to show (first + last)
show_total – Whether to show the total count
title – Optional title
print_func – Function to use for printing. Defaults to print. If None, returns the string instead of printing.

Examples

>>> long_list = list(range(100))
>>> print_list_summary(long_list, max_items=6)
List (100 items):
[0, 1, 2, ..., 97, 98, 99]

>>> print_list_summary(long_list, max_items=10)
List (100 items):
[0, 1, 2, 3, 4, ..., 95, 96, 97, 98, 99]

# Return string instead of printing

>>> result = print_list_summary(long_list, max_items=6, print_func=None)
>>> print(result)
List (100 items):
[0, 1, 2, ..., 97, 98, 99]
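The head-and-tail selection behind such a summary can be sketched in a few lines. This is an illustrative sketch with a hypothetical name (summarize); how an odd max_items is split between head and tail is an assumption made here.

```python
def summarize(items, max_items=10):
    items = list(items)
    if len(items) <= max_items:
        return "[" + ", ".join(map(str, items)) + "]"
    head = max_items // 2
    tail = max_items - head  # assumption: the tail gets the extra item if odd
    # Slice from an explicit index: items[-0:] would return the whole list.
    shown = [*map(str, items[:head]), "...", *map(str, items[len(items) - tail:])]
    return "[" + ", ".join(shown) + "]"


print(summarize(range(100), max_items=6))  # [0, 1, 2, ..., 97, 98, 99]
```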

lkj.strings.regex_based_substitution(replacements: dict, regex=None, s: str | None = None)[source]

Construct a substitution function based on an iterable of replacement pairs.

Parameters: replacements (iterable[tuple[str, str]]) – An iterable of (replace_this, with_that) pairs.
Returns: A function that, when called with a string, will perform all substitutions.
Return type: Callable[[str], str]

The function is meant to be used with replacements as its single input, returning a substitute function that will carry out the substitutions on an input string.

>>> replacements = {'apple': 'orange', 'banana': 'grape'}
>>> substitute = regex_based_substitution(replacements)
>>> substitute("I like apple and bananas.")
'I like orange and grapes.'

You have access to the replacements and regex attributes of the substitute function. See how the replacements dict has been ordered by descending length of keys. This is to ensure that longer keys are replaced before shorter keys, avoiding partial replacements.

>>> substitute.replacements
{'banana': 'grape', 'apple': 'orange'}
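The ordering by descending key length matters because a regular-expression alternation tries its branches from left to right. A minimal sketch of the technique follows; the name make_substitute is hypothetical and the body is an assumption about how such a function can be built, not the library's code.

```python
import re


def make_substitute(replacements):
    # Longest keys first, so "ab" is tried before "a" in the alternation.
    ordered = dict(
        sorted(dict(replacements).items(), key=lambda kv: len(kv[0]), reverse=True)
    )
    regex = re.compile("|".join(re.escape(key) for key in ordered))

    def substitute(s):
        if not ordered:  # nothing to replace
            return s
        return regex.sub(lambda m: ordered[m.group(0)], s)

    # Expose the pieces, mirroring the attributes mentioned above.
    substitute.replacements = ordered
    substitute.regex = regex
    return substitute


substitute = make_substitute({"a": "1", "ab": "2"})
print(substitute("ab a"))  # 2 1
```

With the keys in the original order, the pattern would be a|ab and "ab" would become "1b", which is the partial replacement the ordering avoids.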

lkj.strings.snake_to_camel(snake_string)[source]

Convert a snake_case string to CamelCase. Useful for converting variable names to class names.

Parameters: snake_string (str) – The snake_case string to convert.
Returns: The converted CamelCase string.
Return type: str

Examples

>>> snake_to_camel('complex_tokenizer')
'ComplexTokenizer'
>>> snake_to_camel('simple_example_test')
'SimpleExampleTest'

Note that each underscore-separated word is simply capitalized, so acronyms are not restored to upper case:

>>> snake_to_camel('xml_http_request_test')
'XmlHttpRequestTest'
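The examples above are consistent with splitting on underscores and capitalizing each part. A minimal sketch under that assumption (the name snake_to_camel_sketch is hypothetical):

```python
def snake_to_camel_sketch(snake_string: str) -> str:
    # str.capitalize() upper-cases the first character and lower-cases the rest.
    return "".join(word.capitalize() for word in snake_string.split("_"))


print(snake_to_camel_sketch("xml_http_request_test"))  # XmlHttpRequestTest
```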

lkj.strings.truncate_lines(s: str, top_limit: int | None = None, bottom_limit: int | None = None, middle_marker: str = '...') → str[source]

Truncates a string by limiting the number of lines from the top and bottom. If the total number of lines is greater than top_limit + bottom_limit, it keeps the first top_limit lines, keeps the last bottom_limit lines, and replaces the omitted middle portion with a single line containing middle_marker.

If top_limit or bottom_limit is None, it is treated as 0.

Example

>>> text = '''Line1
... Line2
... Line3
... Line4
... Line5
... Line6'''

>>> print(truncate_lines(text, top_limit=2, bottom_limit=2))
Line1
Line2
...
Line5
Line6
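The rule described above translates directly into list slicing. The sketch below follows that description; the name truncate_lines_sketch is hypothetical and the code is not taken from the library.

```python
def truncate_lines_sketch(s, top_limit=None, bottom_limit=None, middle_marker="..."):
    top = top_limit or 0        # None is treated as 0
    bottom = bottom_limit or 0
    lines = s.split("\n")
    if len(lines) <= top + bottom:
        return s  # nothing to omit
    # Slice from an explicit index: lines[-0:] would return every line.
    kept_bottom = lines[len(lines) - bottom:]
    return "\n".join(lines[:top] + [middle_marker] + kept_bottom)


text = "\n".join(f"Line{i}" for i in range(1, 7))
print(truncate_lines_sketch(text, top_limit=2, bottom_limit=2))
```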

lkj.strings.truncate_string(s: str, *, left_limit=15, right_limit=15, middle_marker='...')[source]

Truncate a string to a maximum length, inserting a marker in the middle.

If the string is longer than the sum of the left_limit and right_limit, the string is truncated and the middle_marker is inserted in the middle.

If the string is shorter than the sum of the left_limit and right_limit, the string is returned as is.

>>> truncate_string('1234567890')
'1234567890'

But if the string is longer than the sum of the limits, it is truncated:

>>> truncate_string('1234567890', left_limit=3, right_limit=3)
'123...890'
>>> truncate_string('1234567890', left_limit=3, right_limit=0)
'123...'
>>> truncate_string('1234567890', left_limit=0, right_limit=3)
'...890'

If you’re using a specific parametrization of the function often, you can create a partial function with the desired parameters:

>>> from functools import partial
>>> truncate_string = partial(truncate_string, left_limit=2, right_limit=2, middle_marker='---')
>>> truncate_string('1234567890')
'12---90'
>>> truncate_string('supercalifragilisticexpialidocious')
'su---us'
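The same behavior can be sketched with plain slicing. The function below is a hypothetical illustration (truncate_middle is not part of the library); it reproduces the documented outputs, including the right_limit=0 case, which needs care because s[-0:] is the whole string.

```python
def truncate_middle(s, *, left_limit=15, right_limit=15, middle_marker="..."):
    if len(s) <= left_limit + right_limit:
        return s
    # Use an explicit start index so right_limit=0 keeps nothing on the right.
    return s[:left_limit] + middle_marker + s[len(s) - right_limit:]


print(truncate_middle("1234567890", left_limit=3, right_limit=3))  # 123...890
print(truncate_middle("1234567890", left_limit=3, right_limit=0))  # 123...
```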

lkj.strings.truncate_string_with_marker(s: str, *, left_limit=15, right_limit=15, middle_marker='...')

Same parameters and documentation as truncate_string above.

lkj.strings.unique_affixes(items: ~typing.Iterable[~typing.Sequence], suffix: bool = False, *, egress: ~typing.Callable | None = None, ingress: ~typing.Callable = <function identity>) → Iterable[Sequence][source]

Returns a list of unique prefixes (or suffixes) for the given iterable of sequences. Raises a ValueError if duplicates are found.

Parameters:

items – Iterable of sequences (e.g., list of strings).
suffix – If True, finds unique suffixes instead of prefixes.
ingress – Callable to preprocess each item. Default is identity function.
egress – Callable to postprocess each affix. Default is appropriate function based on item type.

Usually, ingress and egress are inverses of each other.

>>> unique_affixes(['apple', 'ape', 'apricot', 'banana', 'band', 'bandana'])
['app', 'ape', 'apr', 'bana', 'band', 'banda']

>>> unique_affixes(['test', 'testing', 'tester'])
['test', 'testi', 'teste']

>>> unique_affixes(['test', 'test'])
Traceback (most recent call last):
...
ValueError: Duplicate item detected: test

>>> unique_affixes(['abc', 'abcd', 'abcde'])
['abc', 'abcd', 'abcde']

>>> unique_affixes(['a', 'b', 'c'])
['a', 'b', 'c']

>>> unique_affixes(['x', 'xy', 'xyz'])
['x', 'xy', 'xyz']

>>> unique_affixes(['can', 'candy', 'candle'])
['can', 'candy', 'candl']

>>> unique_affixes(['flow', 'flower', 'flight'])
['flow', 'flowe', 'fli']

>>> unique_affixes(['ation', 'termination', 'examination'], suffix=True)
['ation', 'rmination', 'amination']

>>> import functools
>>> ingress = functools.partial(str.split, sep='.')
>>> egress = '.'.join
>>> items = ['here.and.there', 'here.or.there', 'here']
>>> unique_affixes(items, ingress=ingress, egress=egress)
['here.and', 'here.or', 'here']
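The outputs above are consistent with taking, for each item, its shortest prefix that no other item starts with, and falling back to the whole item when none exists. The following is a simple quadratic sketch of that idea for strings only; the name unique_prefixes is hypothetical, ingress and egress are left out, and the library may well use a different algorithm.

```python
def unique_prefixes(items, suffix=False):
    items = list(items)
    seen = set()
    for item in items:
        if item in seen:
            raise ValueError(f"Duplicate item detected: {item}")
        seen.add(item)
    if suffix:
        # A unique suffix is a unique prefix of the reversed strings.
        reversed_items = [item[::-1] for item in items]
        return [p[::-1] for p in unique_prefixes(reversed_items)]
    result = []
    for item in items:
        others = [other for other in items if other != item]
        prefix = item  # fallback: every shorter prefix is shared
        for n in range(1, len(item) + 1):
            if not any(other.startswith(item[:n]) for other in others):
                prefix = item[:n]
                break
        result.append(prefix)
    return result


print(unique_prefixes(["flow", "flower", "flight"]))
# ['flow', 'flowe', 'fli']
print(unique_prefixes(["ation", "termination", "examination"], suffix=True))
# ['ation', 'rmination', 'amination']
```

For large inputs a trie or a sorted-neighbor comparison would avoid the quadratic scan, at the cost of a longer implementation.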