shared.common

common.py

This module contains shared variables and utility functions used across the Flask application for processing uploaded files.

Usage:

Import this module in both app.py and processing.py to access and modify the shared progress_data.

shared.common.addFieldNode(sf, l, cat, shapes, fillcolors, colors, calc)

Creates graph node objects for an input source field.

Parameters:

sf (str) – Input source field replacement ID.
l (str) – Input source field label.
cat (str) – Input source field category.
shapes (dict) – Mapping of shapes per source field category.
fillcolors (dict) – Mapping of fill colors per source field category.
colors (dict) – Mapping of border colors per source field category.
calc (str) – Input source field calculation expression.

Returns:

List of 2 node objects with:

name equal to the replacement ID,
label equal to the source field label,
shape/color/tooltip based on field type.

Return type:

list

shared.common.appendFieldsToDicts(l, k, v)

Append a fixed set of (key, value) pairs to every dictionary in a list.

Parameters:

l (list) – Input list of dictionaries to update.
k (list) – List of fixed key names to append.
v (list) – List of fixed values to append corresponding to keys.

Returns:

Updated version of input list with new (key, value) pairs appended to each dictionary.

Return type:

list
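
The behaviour described for appendFieldsToDicts can be sketched in a few lines. This is a minimal illustration, not the module's actual implementation: the name append_fields_to_dicts_sketch is hypothetical, and it is an assumption that k and v are paired by position and that the input list is updated in place.

```python
def append_fields_to_dicts_sketch(l, k, v):
    """Add each (key, value) pair built from k and v to every dict in l."""
    for d in l:
        # zip pairs the i-th key with the i-th value (assumed pairing rule).
        d.update(zip(k, v))
    return l


rows = [{"field": "Sales"}, {"field": "Profit"}]
updated = append_fields_to_dicts_sketch(rows, ["source", "level"], ["Orders", "0"])
print(updated)
```

Under these assumptions the returned list is the same object as the input, so callers that need the original untouched would have to copy it first.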

shared.common.backwardDependencies(df, f, level=0, c=None, _cache=None)

Recursively get all backward dependencies of a field with memoization and canonical caching (cache stores subtree as if called with level=0).

Parameters:

df (pandas.DataFrame) – Input data frame.
f (str) – Source field replacement ID.
level (int) – Depth from the original root (0=root, 1=first dep, …).
c (str|None) – Field from the previous recursion step, i.e. the “child” of f in the returned items.
_cache (dict|None) – Internal memoization cache mapping node -> canonical list.

Returns:

A list of dicts with the keys “parent”, “child”, “level”, and “category”, where “level” is stored as a string (e.g. “-1”).

Return type:

list[dict]
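
The idea of canonical caching (store each subtree as if it had been requested with level=0, then shift the levels on reuse) can be sketched as follows. This is a hedged illustration rather than the module's code: it works on a plain dict named graph (hypothetical, mapping each field to its direct dependencies) instead of a data frame, omits the “category” key, and assumes the dependency graph has no cycles.

```python
def backward_dependencies_sketch(graph, f, level=0, _cache=None):
    """Collect all backward dependencies of f with string levels."""
    if _cache is None:
        _cache = {}
    if f not in _cache:
        canonical = []  # levels relative to f, as if f were the root
        for parent in graph.get(f, []):
            canonical.append({"parent": parent, "child": f, "level": -1})
            # The recursive call returns levels already shifted by -1.
            for item in backward_dependencies_sketch(graph, parent, -1, _cache):
                canonical.append({**item, "level": int(item["level"])})
        _cache[f] = canonical
    # Shift the canonical levels by the caller's depth and stringify them.
    return [{**item, "level": str(item["level"] + level)} for item in _cache[f]]


graph = {"C": ["B"], "B": ["A"]}  # C is calculated from B, B from A
deps = backward_dependencies_sketch(graph, "C")
print(deps)
```

Because the cache holds level-0 subtrees, a field reached through several paths is expanded once and only re-labelled on later visits.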

shared.common.deduplicate_graph(G: Dot) → Dot

Return a new graph with duplicate nodes and edges removed.

shared.common.fieldCalculationDependencies(l, x)

List direct dependencies in a calculation based on a list of possible values.

Parameters:

l (list) – A list of all possible values that can be matched.
x (str) – An input calculation string.

Returns:

A list of values from the list l that were matched in the string x.

Return type:

list
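
A minimal sketch of this lookup, assuming plain substring matching and results returned in the order of l (both are assumptions; the actual matching rule is not documented here). The function name and the sample IDs are hypothetical.

```python
def field_calculation_dependencies_sketch(l, x):
    """Return the values from l that occur in the calculation string x."""
    return [value for value in l if value in x]


ids = ["[f_abc]", "[f_def]", "[f_xyz]"]
calc = "SUM([f_abc]) / SUM([f_xyz])"
matched = field_calculation_dependencies_sketch(ids, calc)
print(matched)  # ['[f_abc]', '[f_xyz]']
```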

shared.common.fieldCalculationMapping(c, s, d, l)

Replace all external and internal field references with unique source/field IDs.

Parameters:

c (str) – The source field calculation string.
s (str) – The source field name.
d (dict) – A dictionary mapping source fields to their replacement IDs.
l (list) – A list of unique field names.

Returns:

The calculation string without comments and with all field ID references replaced by their corresponding unique replacement ID references.

Return type:

str

Notes

This function assumes that external fields are referenced as [source ID].[field ID] and internal fields as [field ID]. If this is not the case, the function may return incorrect results.

shared.common.fieldCategory(s, c)

Returns the category of a source field.

Parameters:

s (str) – Source field label.
c (str) – Source field cleaned calculation.

Returns:

Category of the source field, which can be:

“Parameter”
“Calculated Field (LOD)”
“Calculated Field”
“Field”

Return type:

str

shared.common.fieldIDMapping(x, s, d)

Replace IDs by labels for an input string or dict list.

Parameters:

x (str or list) – Input string or dict list.
s (str) – Source name.
d (dict) – Dictionary of (field/sheet) label -> ID mappings.

Returns:

String or dict list with all field IDs replaced by labels and references to internal source fields removed.

Return type:

str or list

shared.common.fieldMappingTable(df, colFrom, colTo)

Create a dictionary mapping the values in one dataframe column to the values in another column.

Parameters:

df (pandas.DataFrame) – The input dataframe containing the field references.
colFrom (str) – The name of the column containing the original values.
colTo (str) – The name of the column containing the mapped values.

Returns:

A dictionary containing mappings from original values to mapped values, where each key is an original value and each value is its corresponding mapped value.

Return type:

dict

shared.common.forwardDependencies(df, f, w, level=0, p=None, _cache=None)

Recursively get all forward dependencies of a field with memoization and canonical caching (cache stores subtree as if called with level=0).

Returns:

A list of dicts, each with the keys “parent”, “child”, “level”, “category”, and “sheets”, where “level” is always a string.

Return type:

list[dict]

shared.common.getRandomReplacementBaseID(df, c, suffix='')

Generate a random ID of 10 lowercase letters combined with the dataframe index.

The generated ID is used as a base for a field identifier. It ensures that the new ID does not match any values already present in the specified column.

Parameters:

df (pandas.DataFrame) – The input dataframe from which unique values are checked.
c (str) – The column name in the dataframe to ensure none of the generated IDs conflict with existing values.
suffix (str, optional) – An optional fixed suffix to append to the randomly generated ID. Defaults to an empty string.

Returns:

A random ID consisting of 10 lowercase letters combined with the optional suffix. The ID is guaranteed not to match any existing values in the specified column.

Return type:

str
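
The collision check described above amounts to a retry loop. The sketch below is an illustration under stated assumptions, not the module's implementation: it takes the existing values as a plain iterable named existing (hypothetical) instead of a dataframe column, and it does not model how the dataframe index is combined with the ID.

```python
import random
import string


def random_base_id_sketch(existing, suffix=""):
    """Draw 10 random lowercase letters plus suffix until the ID is unused."""
    taken = set(existing)
    while True:
        candidate = "".join(random.choices(string.ascii_lowercase, k=10)) + suffix
        if candidate not in taken:
            return candidate


new_id = random_base_id_sketch({"abcdefghij_f"}, suffix="_f")
print(new_id)
```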

shared.common.init_progress(user_id, file_name)

Initialize progress_data entry and return a live reference to it.

Creates or resets progress_data[user_id] (Dash) or progress_data (Flask) with default fields for a new run. The returned dict is the actual entry inside progress_data, not a copy.
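
The difference between a live reference and a copy is easiest to see in a small example. The sketch below covers only the per-user variant and uses hypothetical default fields ("file_name", "percent", "status"); the real defaults are not listed in this reference.

```python
progress_data = {}


def init_progress_sketch(user_id, file_name):
    """Create or reset the entry for user_id and return that same dict."""
    progress_data[user_id] = {
        "file_name": file_name,  # hypothetical field
        "percent": 0,            # hypothetical field
        "status": "started",     # hypothetical field
    }
    return progress_data[user_id]


entry = init_progress_sketch("user-1", "example_file")
entry["percent"] = 50
print(progress_data["user-1"]["percent"])  # 50
```

Because the returned dict is the stored entry itself, updates made by the processing code are visible to any reader of progress_data without a write-back step.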

shared.common.isParamDuplicate(p, s, x)

Checks if a field is a parameter duplicate.

Parameters:

p (list) – List of parameter fields.
s (str) – Source name.
x (str) – Field name.

Returns:

True if the field is a parameter duplicate and should be removed; False otherwise.

Return type:

bool

shared.common.processCaptions(i, c)

Process captions into a format suitable for calculations, removing invalid characters for JSON parsing.

Parameters:

i (str) – The source or field ID value.
c (str) – The source or field caption value.

Returns:

A tuple containing:

str: The original field name enclosed in brackets.
str: The processed caption enclosed in square brackets, with any additional right square brackets doubled. Single and double quotes are replaced by their HTML character codes, and each backslash (\) is replaced by two backslashes (\\).

Return type:

tuple
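
The escaping rules listed above can be sketched as a chain of string replacements. This is an illustration only: the exact HTML codes (&#39; and &quot; here) and the order of the replacements are assumptions, and the function name is hypothetical.

```python
def process_captions_sketch(i, c):
    """Return the bracketed ID and the escaped, bracketed caption."""
    escaped = (
        c.replace("\\", "\\\\")   # one backslash becomes two
        .replace("'", "&#39;")    # assumed code for a single quote
        .replace('"', "&quot;")   # assumed code for a double quote
        .replace("]", "]]")       # double every right square bracket
    )
    return f"[{i}]", f"[{escaped}]"


field_id, caption = process_captions_sketch("Calculation_1", 'Profit [net] "adj"')
print(field_id)  # [Calculation_1]
print(caption)   # [Profit [net]] &quot;adj&quot;]
```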

shared.common.processSheetNames(s)

Remove invalid characters from sheet names for JSON parsing.

Parameters:

s (list) – A list of input sheet names.

Returns:

A processed list of sheet names with single and double quotes replaced by their HTML character codes and each backslash (\) replaced by two backslashes (\\).

Return type:

list

shared.common.removeDuplicatesByRowLength(df, x)

Remove duplicates from a DataFrame by retaining the row with the largest concatenated string length per grouping.

Parameters:

df (pandas.DataFrame) – The input DataFrame.
x (str) – The name of the column to group by.

Returns:

A tuple containing:

pandas.DataFrame: A copy of the input DataFrame with, for each unique grouping value, the row with the largest concatenated string length.
int: The number of duplicates removed.

Return type:

tuple

shared.common.sheetMapping(s, d)

Replace all sheet names with sequential sheet IDs.

Parameters:

s (list) – A list of sheet names.
d (dict) – A dictionary mapping sheet names to their corresponding sheet IDs.

Returns:

A list of mapped sheet IDs corresponding to the input sheet names.

Return type:

list

shared.common.sheetMappingTable(df, colFrom)

Create a dictionary mapping sheet names to sheet IDs.

Parameters:

df (pandas.DataFrame) – The input dataframe containing the sheet lists.
colFrom (str) – The name of the column containing the sheet names.

Returns:

A dictionary mapping each unique sheet name to its corresponding sheet ID, where each key is a sheet name and each value is a generated sheet ID.

Return type:

dict
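
How sheetMappingTable and sheetMapping fit together can be sketched with plain lists. Everything specific here is an assumption: the column of sheet lists is represented as a list of lists, the generated ID format ("sheet_0", "sheet_1", …) is hypothetical, and both function names are illustrative.

```python
def sheet_mapping_table_sketch(sheet_lists, prefix="sheet_"):
    """Assign a sequential ID to each unique sheet name, in first-seen order."""
    table = {}
    for sheets in sheet_lists:
        for name in sheets:
            if name not in table:
                table[name] = f"{prefix}{len(table)}"
    return table


def sheet_mapping_sketch(s, d):
    """Translate a list of sheet names into their IDs using the table d."""
    return [d[name] for name in s]


table = sheet_mapping_table_sketch([["Overview", "Sales"], ["Sales", "Costs"]])
print(table)
print(sheet_mapping_sketch(["Costs", "Overview"], table))
```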

shared.common.show_exception_and_exit(exc_type, exc_value, tb)

Keeps the application alive when an unhandled exception occurs.

Source: https://stackoverflow.com/questions/779675/stop-python-from-closing-on-error

shared.common.uniqueDependencies(d, g, f)

Keep unique dependencies from a list of dependencies with their minimum dependency level.

Parameters:

d (list) – Input list of dependency dictionaries.
g (list) – Grouping list used to determine unique dependencies.
f (str) – Field name representing the dependency level.

Returns:

Dependency dictionaries that only contain unique dependencies with their minimum dependency level.

Return type:

list
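
One way to read "unique dependencies with their minimum dependency level" is a group-by that keeps the closest occurrence of each dependency. The sketch below assumes that levels are integer strings that may be negative and that "minimum" means the smallest distance from the root, i.e. the smallest absolute value; the module may define it differently. The function name is hypothetical.

```python
def unique_dependencies_sketch(d, g, f):
    """Keep one dict per combination of the keys in g, with the closest level."""
    best = {}
    for item in d:
        key = tuple(item[name] for name in g)
        # Assumed rule: compare levels by their distance from the root (0).
        if key not in best or abs(int(item[f])) < abs(int(best[key][f])):
            best[key] = item
    return list(best.values())


deps = [
    {"parent": "A", "child": "B", "level": "-2"},
    {"parent": "A", "child": "B", "level": "-1"},
    {"parent": "C", "child": "B", "level": "-1"},
]
unique = unique_dependencies_sketch(deps, ["parent", "child"], "level")
print(unique)
```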

shared.common.visualizeFieldDependencies(df, sf, l, g, dout_root, svg=False)

Creates output PNG/SVG files containing all dependencies for a given source field.

Parameters:

df (DataFrame) – Input data frame containing backward and forward dependencies.
sf (str) – Input source field replacement ID.
l (str) – Input source field label.
g (Graph) – Master graph containing all source field and field node objects.
dout_root (str) – Full path to root directory where graphs will be saved.
png (bool, optional) – Indicator (True/False) whether or not to generate PNG as well. Defaults to False.

Returns:

SVG file is saved in “<workbook path> Files\Graphs\<source field name>.png” and an additional PNG file (with extra attributes) if png is True.

Return type:

None

shared.common.visualizeSheetDependencies(df, sh, g, dout, png=False)

Creates output PNG/SVG files containing all dependencies for a given sheet.

Parameters:

df (pandas.DataFrame) – Input data frame containing backward and forward dependencies.
sh (str) – Input sheet ID for which dependencies are visualized.
g (Graph) – Master graph containing all source field and field node objects.
dout (str) – Full path to the root directory where graphs will be saved.
png (bool, optional) – Indicator (True/False) to generate PNG as well. Defaults to False.

Returns:

SVG file is saved in “<workbook path> Files\Graphs\Sheets\<sheet name>.svg” and an additional PNG file (with extra attributes) if png is True.

Return type:

None

shared.common.zip_folder(folder_path, output_zip_path, skip_exts=['parquet'])

Zip the contents of a folder, preserving its structure.

Skips:

The output zip file itself if it’s inside the folder.
Any files with extensions listed in skip_exts.

Parameters:

folder_path (str) – Folder to zip.
output_zip_path (str) – Path of the zip file to create.
skip_exts (list[str], optional) – File extensions to skip (without dots). Defaults to [“parquet”].
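
The two skip rules can be sketched with the standard library's zipfile and os.walk. This is a minimal illustration under assumptions, not the module's implementation: the function name is hypothetical, extension matching is assumed to be case-insensitive, and a tuple default is used here in place of the documented list default to avoid a shared mutable default argument.

```python
import os
import zipfile


def zip_folder_sketch(folder_path, output_zip_path, skip_exts=("parquet",)):
    """Zip folder_path recursively, keeping relative paths inside the archive."""
    skip = {ext.lower() for ext in skip_exts}
    output_abs = os.path.abspath(output_zip_path)
    with zipfile.ZipFile(output_zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(folder_path):
            for name in files:
                full_path = os.path.join(root, name)
                # Rule 1: never add the archive to itself.
                if os.path.abspath(full_path) == output_abs:
                    continue
                # Rule 2: skip unwanted extensions (compared without the dot).
                ext = os.path.splitext(name)[1].lstrip(".").lower()
                if ext in skip:
                    continue
                # Store paths relative to the folder to preserve its structure.
                zf.write(full_path, os.path.relpath(full_path, folder_path))
```

Calling zip_folder_sketch("out", "out/result.zip") under these assumptions would archive everything in out except result.zip itself and any .parquet files.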