shared.common

common.py

This module contains shared variables and utility functions used across the Flask application for processing uploaded files.

Usage:

Import this module in both app.py and processing.py to access and modify the shared progress_data.

shared.common.addFieldNode(sf, l, cat, shapes, fillcolors, colors, calc)

Creates graph node objects for an input source field.

Parameters:
  • sf (str) – Input source field replacement ID.

  • l (str) – Input source field label.

  • cat (str) – Input source field category.

  • shapes (dict) – Mapping of source field category to node shape.

  • fillcolors (dict) – Mapping of source field category to fill color.

  • colors (dict) – Mapping of source field category to border color.

  • calc (str) – Input source field calculation expression.

Returns:

List of 2 node objects with:
  • name equal to the replacement ID,

  • label equal to the source field label,

  • shape/color/tooltip based on field type.

Return type:

list

shared.common.appendFieldsToDicts(l, k, v)

Append a fixed set of (key, value) pairs to every dictionary in a list.

Parameters:
  • l (list) – Input list of dictionaries to update.

  • k (list) – List of fixed key names to append.

  • v (list) – List of fixed values to append corresponding to keys.

Returns:

Updated version of input list with new (key, value) pairs appended to each dictionary.

Return type:

list
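
A minimal sketch of the behaviour described above, assuming that k and v are paired by position and that the dictionaries are updated in place. The name append_fields_to_dicts and the sample keys are hypothetical and are not taken from the module.

```python
def append_fields_to_dicts(dicts, keys, values):
    """Add the same (key, value) pairs to every dictionary in dicts."""
    # Assumption: keys and values are paired by position.
    fixed = dict(zip(keys, values))
    for d in dicts:
        d.update(fixed)  # assumption: the input dictionaries are modified in place
    return dicts


rows = [{"field": "a"}, {"field": "b"}]
updated = append_fields_to_dicts(rows, ["source", "level"], ["Sales", "0"])
```

Each dictionary in updated now carries the extra "source" and "level" entries alongside its original content.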

shared.common.backwardDependencies(df, f, level=0, c=None, _cache=None)

Recursively get all backward dependencies of a field with memoization and canonical caching (cache stores subtree as if called with level=0).

Parameters:
  • df (pandas.DataFrame) – Input data frame.

  • f (str) – Source field replacement ID.

  • level (int) – Depth from the original root (0=root, 1=first dep, …).

  • c (str|None) – Parent at the previous step (referred to as the child in the returned items).

  • _cache (dict|None) – Internal memoization cache mapping node -> canonical list.

Returns:

Items like {“parent”, “child”, “level”, “category”} with string levels (e.g. “-1”).

Return type:

list[dict]
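
The idea of canonical caching can be illustrated with a simplified sketch. It makes several assumptions that are not confirmed by the module: a plain dict (graph) stands in for the data frame, the graph has no circular references, "parent" names the referenced field and "child" the field that references it, backward levels are negative, and the "category" key is left out. The name backward_sketch is hypothetical.

```python
def backward_sketch(graph, field, level=0, _cache=None):
    """Return all backward dependencies of field as a list of dicts.

    graph is a hypothetical stand-in for the data frame: it maps a field
    to the fields its calculation references directly.
    """
    if _cache is None:
        _cache = {}

    def canonical(node):
        # Canonical form: the subtree of node as if node were the root (level 0).
        if node in _cache:
            return _cache[node]
        items = []
        for dep in graph.get(node, []):
            items.append({"parent": dep, "child": node, "level": -1})
            for sub in canonical(dep):
                # Reuse the cached subtree, shifted one step further from the root.
                items.append({**sub, "level": sub["level"] - 1})
        _cache[node] = items
        return items

    # Shift the canonical levels by the caller's depth and return them as strings.
    return [{**item, "level": str(item["level"] - level)}
            for item in canonical(field)]


graph = {"profit": ["sales", "cost"], "sales": ["price", "qty"]}
result = backward_sketch(graph, "profit")
```

Because the cache always holds level-0 subtrees, the same cached entry can be reused wherever a field appears in the tree; only the level shift differs per call.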

shared.common.deduplicate_graph(G: Dot) → Dot

Return a new graph with duplicate nodes and edges removed.

shared.common.fieldCalculationDependencies(l, x)

List direct dependencies in a calculation based on a list of possible values.

Parameters:
  • l (list) – A list of all possible values that can be matched.

  • x (str) – An input calculation string.

Returns:

A list of values from the list l that were matched in the string x.

Return type:

list
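
A minimal sketch of this kind of lookup, assuming that a value counts as matched when it occurs as a plain substring of the calculation. The actual matching rule is not documented here, and the name calculation_dependencies and the sample IDs are hypothetical.

```python
def calculation_dependencies(candidates, calculation):
    """Return the candidates that occur in the calculation string."""
    # Assumption: a plain substring test decides whether a value is matched.
    return [c for c in candidates if c in calculation]


candidates = ["[id_sales]", "[id_cost]", "[id_tax]"]
matched = calculation_dependencies(candidates, "SUM([id_sales]) - SUM([id_cost])")
```

The result keeps the order of the candidate list, so matched contains "[id_sales]" followed by "[id_cost]".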

shared.common.fieldCalculationMapping(c, s, d, l)

Replace all external and internal field references by unique source/field IDs.

Parameters:
  • c (str) – The source field calculation string.

  • s (str) – The source field name.

  • d (dict) – A dictionary mapping source fields to their replacement IDs.

  • l (list) – A list of unique field names.

Returns:

The calculation string without comments and with all field ID references replaced by their corresponding unique replacement ID references.

Return type:

str

Notes

This function assumes that external fields are referenced as [source ID].[field ID] and internal fields as [field ID]. If this is not the case, the function may return incorrect results.

shared.common.fieldCategory(s, c)

Returns the category of a source field.

Parameters:
  • s (str) – Source field label.

  • c (str) – Source field cleaned calculation.

Returns:

Category of the source field, which can be:
  • “Parameter”

  • “Calculated Field (LOD)”

  • “Calculated Field”

  • “Field”

Return type:

str

shared.common.fieldIDMapping(x, s, d)

Replace IDs by labels for an input string or dict list.

Parameters:
  • x (str or list) – Input string or dict list.

  • s (str) – Source name.

  • d (dict) – Dictionary of (field/sheet) label -> ID mappings.

Returns:

String or dict list with all field IDs replaced by labels and references to internal source fields removed.

Return type:

str or list

shared.common.fieldMappingTable(df, colFrom, colTo)

Create a dictionary that maps the values in one column of a dataframe to the values in another column.

Parameters:
  • df (pandas.DataFrame) – The input dataframe containing the field references.

  • colFrom (str) – The name of the column containing the original values.

  • colTo (str) – The name of the column containing the mapped values.

Returns:

A dictionary containing mappings from original values to mapped values, where each key is an original value and each value is its corresponding mapped value.

Return type:

dict

shared.common.forwardDependencies(df, f, w, level=0, p=None, _cache=None)

Recursively get all forward dependencies of a field with memoization and canonical caching (cache stores subtree as if called with level=0).

Returns:

Each item has keys {“parent”, “child”, “level”, “category”, “sheets”}, where “level” is always a string.

Return type:

list[dict]

shared.common.getRandomReplacementBaseID(df, c, suffix='')

Generate a random ID of 10 lowercase letters combined with the dataframe index.

The generated ID is used as a base for a field identifier. It ensures that the new ID does not match any values already present in the specified column.

Parameters:
  • df (pandas.DataFrame) – The input dataframe from which unique values are checked.

  • c (str) – The column name in the dataframe to ensure none of the generated IDs conflict with existing values.

  • suffix (str, optional) – An optional fixed suffix to append to the randomly generated ID. Defaults to an empty string.

Returns:

A random ID consisting of 10 lowercase letters combined with the optional suffix. The ID is guaranteed not to match any existing values in the specified column.

Return type:

str
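
The collision check can be sketched as a retry loop. This is an illustration only: a plain set (existing) stands in for the values of column c, the dataframe index is not used, and the name random_base_id is hypothetical.

```python
import random
import string


def random_base_id(existing, suffix=""):
    """Return 10 random lowercase letters plus suffix, avoiding existing values."""
    while True:
        candidate = "".join(random.choices(string.ascii_lowercase, k=10)) + suffix
        # Retry until the candidate does not collide with a known value.
        if candidate not in existing:
            return candidate


existing = {"abcdefghij_x", "klmnopqrst_x"}
new_id = random_base_id(existing, suffix="_x")
```

With 26^10 possible letter combinations a collision is very unlikely, so the loop normally finishes on the first attempt.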

shared.common.init_progress(user_id, file_name)

Initialize progress_data entry and return a live reference to it.

Creates or resets progress_data[user_id] (Dash) or progress_data (Flask) with default fields for a new run. The returned dict is the actual entry inside progress_data, not a copy.
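
The difference between a live reference and a copy can be shown with a small sketch. The default fields ("file_name", "percent", "status") and the name init_progress_sketch are hypothetical; only the per-user variant is shown.

```python
progress_data = {}


def init_progress_sketch(user_id, file_name):
    """Create or reset the entry for user_id and return that same dict."""
    # Hypothetical default fields; the real defaults are not documented here.
    progress_data[user_id] = {"file_name": file_name, "percent": 0, "status": "started"}
    return progress_data[user_id]  # the stored dict itself, not a copy


entry = init_progress_sketch("user-1", "uploaded_file")
entry["percent"] = 50  # also visible through progress_data["user-1"]
```

Because entry is the stored dict, any code that reads progress_data sees updates made through entry without a write-back step.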

shared.common.isParamDuplicate(p, s, x)

Checks if a field is a parameter duplicate.

Parameters:
  • p (list) – List of parameter fields.

  • s (str) – Source name.

  • x (str) – Field name.

Returns:

True if the field is a parameter duplicate and should be removed; False otherwise.

Return type:

bool

shared.common.processCaptions(i, c)

Process captions into a format suitable for calculations, removing invalid characters for JSON parsing.

Parameters:
  • i (str) – The source or field ID value.

  • c (str) – The source or field caption value.

Returns:

A tuple containing:
  • str: The original field name enclosed in brackets.

  • str: The processed caption enclosed in square brackets, with any additional right square brackets doubled. Single quotes (') and double quotes (") are replaced by their HTML codes, while a backslash (\) is replaced by two backslashes (\\).

Return type:

tuple
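
A minimal sketch of the escaping steps described above. It assumes that the HTML codes are &apos; and &quot; and that backslashes are doubled before the other replacements; both are assumptions, as is the name process_caption_sketch.

```python
def process_caption_sketch(field_id, caption):
    """Return the bracketed ID and the escaped, bracketed caption."""
    escaped = (
        caption.replace("\\", "\\\\")   # one backslash becomes two
               .replace("]", "]]")      # double every right square bracket
               .replace("'", "&apos;")  # assumption: HTML code for a single quote
               .replace('"', "&quot;")  # assumption: HTML code for a double quote
    )
    return f"[{field_id}]", f"[{escaped}]"


bracketed_id, bracketed_caption = process_caption_sketch(
    "Calculation_1", 'Profit [net] "est"'
)
```

Here bracketed_caption is [Profit [net]] &quot;est&quot;], so the only unescaped right bracket is the closing one.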

shared.common.processSheetNames(s)

Remove invalid characters from sheet names for JSON parsing.

Parameters:

  • s (list) – A list of input sheet names.

Returns:

A processed list of sheet names with single quotes (') and double quotes (") replaced by their HTML codes, and backslashes (\) replaced by two backslashes (\\).

Return type:

list

shared.common.removeDuplicatesByRowLength(df, x)

Remove duplicates from a DataFrame by retaining the row with the largest concatenated string length per grouping.

Parameters:
  • df (pandas.DataFrame) – The input DataFrame.

  • x (str) – The name of the column to group by.

Returns:

A tuple containing:
  • pandas.DataFrame: A copy of the input DataFrame with, for each unique grouping value, the row with the largest concatenated string length.

  • int: The number of duplicates removed.

Return type:

tuple

shared.common.sheetMapping(s, d)

Replace all sheet names with sequential sheet IDs.

Parameters:
  • s (list) – A list of sheet names.

  • d (dict) – A dictionary mapping sheet names to their corresponding sheet IDs.

Returns:

A list of mapped sheet IDs corresponding to the input sheet names.

Return type:

list
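
A minimal sketch of the lookup, assuming that every sheet name is present in the mapping. The "sheet_0" ID format and the name sheet_mapping_sketch are hypothetical.

```python
def sheet_mapping_sketch(sheets, mapping):
    """Translate each sheet name to its ID, preserving the input order."""
    # Assumption: every name has an entry; a missing name raises KeyError.
    return [mapping[name] for name in sheets]


mapping = {"Overview": "sheet_0", "Details": "sheet_1"}
ids = sheet_mapping_sketch(["Details", "Overview"], mapping)
```

The output follows the order of the input list, not the order of the mapping.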

shared.common.sheetMappingTable(df, colFrom)

Create a dictionary mapping sheet names to sheet IDs.

Parameters:
  • df (pandas.DataFrame) – The input dataframe containing the sheet lists.

  • colFrom (str) – The name of the column containing the sheet names.

Returns:

A dictionary mapping each unique sheet name to its corresponding sheet ID, where each key is a sheet name and each value is a generated sheet ID.

Return type:

dict

shared.common.show_exception_and_exit(exc_type, exc_value, tb)

Keeps the application alive when an unhandled exception occurs.

Source: https://stackoverflow.com/questions/779675/stop-python-from-closing-on-error

shared.common.uniqueDependencies(d, g, f)

Keep unique dependencies from a list of dependencies with their minimum dependency level.

Parameters:
  • d (list) – Input list of dependency dictionaries.

  • g (list) – Grouping list used to determine unique dependencies.

  • f (str) – Field name representing the dependency level.

Returns:

Dependency dictionaries that only contain unique dependencies with their minimum dependency level.

Return type:

list
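
One way to sketch this reduction is to keep, per group, the entry whose level is closest to the root. The comparison by absolute value is an assumption made here so that negative (backward) and positive (forward) string levels are treated alike; the module may compare levels differently. The name unique_dependencies_sketch is hypothetical.

```python
def unique_dependencies_sketch(deps, group_keys, level_key):
    """Keep one dependency per group: the one with the smallest level."""
    best = {}
    for dep in deps:
        key = tuple(dep[k] for k in group_keys)
        # Assumption: levels are numeric strings compared by absolute value.
        if key not in best or abs(int(dep[level_key])) < abs(int(best[key][level_key])):
            best[key] = dep
    return list(best.values())


deps = [
    {"parent": "a", "child": "b", "level": "2"},
    {"parent": "a", "child": "b", "level": "1"},
    {"parent": "c", "child": "b", "level": "1"},
]
unique = unique_dependencies_sketch(deps, ["parent", "child"], "level")
```

The (a, b) pair appears twice in deps, and only its level "1" entry survives in unique.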

shared.common.visualizeFieldDependencies(df, sf, l, g, dout_root, svg=False)

Creates output PNG/SVG files containing all dependencies for a given source field.

Parameters:
  • df (DataFrame) – Input data frame containing backward and forward dependencies.

  • sf (str) – Input source field replacement ID.

  • l (str) – Input source field label.

  • g (Graph) – Master graph containing all source field and field node objects.

  • dout_root (str) – Full path to root directory where graphs will be saved.

  • png (bool, optional) – Indicator (True/False) whether or not to generate PNG as well. Defaults to False.

Returns:

SVG file is saved in “<workbook path> Files\Graphs\<source field name>.svg”, and an additional PNG file (with extra attributes) is saved if png is True.

Return type:

None

shared.common.visualizeSheetDependencies(df, sh, g, dout, png=False)

Create output PNG/SVG files containing all dependencies for a given sheet.

Parameters:
  • df (pandas.DataFrame) – Input data frame containing backward and forward dependencies.

  • sh (str) – Input sheet ID for which dependencies are visualized.

  • g (Graph) – Master graph containing all source field and field node objects.

  • dout (str) – Full path to the root directory where graphs will be saved.

  • png (bool, optional) – Indicator (True/False) to generate PNG as well. Defaults to False.

Returns:

SVG file is saved in “<workbook path> Files\Graphs\Sheets\<sheet name>.svg”, and an additional PNG file (with extra attributes) is saved if png is True.

Return type:

None

shared.common.zip_folder(folder_path, output_zip_path, skip_exts=['parquet'])

Zip the contents of a folder, preserving its structure.

Skips:
  • The output zip file itself if it’s inside the folder.

  • Any files with extensions listed in skip_exts.

Parameters:
  • folder_path (str) – Folder to zip.

  • output_zip_path (str) – Path of the zip file to create.

  • skip_exts (list[str], optional) – File extensions to skip (without dots). Defaults to [“parquet”].
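
The skip rules above can be sketched with the standard library as follows. This is an illustration under assumptions, not the module's implementation: the name zip_folder_sketch is hypothetical, extensions are compared case-insensitively, and a tuple is used as the default to avoid a mutable default argument.

```python
import os
import zipfile


def zip_folder_sketch(folder_path, output_zip_path, skip_exts=("parquet",)):
    """Zip folder_path into output_zip_path, keeping relative paths."""
    skip = {ext.lower() for ext in skip_exts}
    output_abs = os.path.abspath(output_zip_path)
    with zipfile.ZipFile(output_zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(folder_path):
            for name in files:
                full = os.path.join(root, name)
                # Skip the archive itself when it lives inside the folder.
                if os.path.abspath(full) == output_abs:
                    continue
                # Skip files whose extension (without the dot) is listed.
                if os.path.splitext(name)[1].lstrip(".").lower() in skip:
                    continue
                # Store the path relative to the folder to preserve its structure.
                zf.write(full, os.path.relpath(full, folder_path))
```

Comparing absolute paths is what keeps the archive from being added to itself when output_zip_path points inside folder_path.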