ParamDict data structures#
We give an overview of the ParamDict data structures provided in opti-extensions. They are subclasses of Python’s dict with some additional functionality.
Suppose we want to build a model to solve the facility location problem. We’ll define the parameters for this problem with these data structures.
# Let's import the classes defining ParamDicts
from opti_extensions import ParamDict1D, ParamDictND
# To show fail cases
import traceback
# We'll also work with dataframes and series
import pandas as pd
ParamDict1D#
Tip
Type annotations: Being a subclass of Python’s dict, ParamDict1D is also a generic container type and can be annotated accordingly. Additionally, opti-extensions provides a type-complete interface, enabling most type checkers and LSPs to infer the type automatically.
print(issubclass(ParamDict1D, dict))
True
Constructor#
The facility location problem is usually solved with customer-level demand data, i.e., values indexed on the set of customers. Let’s define it with the ParamDict1D data structure which, as the name suggests, should be used for parameters indexed on unidimensional sets. The ParamDict keys should be ‘scalar’ data types such as int, str, pd.Timestamp, etc., and the values should be int or float.
Demand of customer \(i\): \(dem_{i} \in \mathbb{R}^{+} \quad \forall \; i \in CUST\)
# Each key is customer id, each value is units of demand
DEM = ParamDict1D(
{0: 215, 1: 138, 2: 240, 3: 134, 4: 149},
)
print(DEM)
ParamDict1D:
{0: 215, 1: 138, 2: 240, 3: 134, 4: 149}
Type annotation of DEM is ParamDict1D[int, int], similar to dict[int, int].
Fail cases#
# Will raise an error if any key(s) are non-scalar element(s)
try:
ParamDict1D({0: 215, 1: 138, 2: 240, 3: 134, (4, 5): 149})
# Non-scalar -> ^^^^^^
except TypeError:
traceback.print_exc()
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/checkouts/latest/examples/data_structures_overview/example_paramdicts.py", line 75, in <module>
ParamDict1D({0: 215, 1: 138, 2: 240, 3: 134, (4, 5): 149})
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/envs/latest/lib/python3.10/site-packages/opti_extensions/_param_dicts.py", line 334, in __init__
self._reraise_exc_from_indexset(exc, caller='IndexSet1D')
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/envs/latest/lib/python3.10/site-packages/opti_extensions/_dict_mixins.py", line 60, in _reraise_exc_from_indexset
raise exception.__class__(msg) from None
TypeError: input introduced non-scalar key(s) (no iterables except string)
# Will raise an error if any value(s) are not int or float
try:
ParamDict1D({0: 215, 1: 138, 2: 240, 3: 134, 4: '149'})
# Not int or float -> ^^^^^
except TypeError:
traceback.print_exc()
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/checkouts/latest/examples/data_structures_overview/example_paramdicts.py", line 84, in <module>
ParamDict1D({0: 215, 1: 138, 2: 240, 3: 134, 4: '149'})
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/envs/latest/lib/python3.10/site-packages/opti_extensions/_param_dicts.py", line 336, in __init__
super().__init__(mapping, indexset=self._indexset)
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/envs/latest/lib/python3.10/site-packages/opti_extensions/_param_dicts.py", line 62, in __init__
raise TypeError("input mapping's values should be either int or float")
TypeError: input mapping's values should be either int or float
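The two checks above can be mimicked in plain Python. This is only a sketch and not the library's implementation: `check_1d_mapping` and `is_valid` are hypothetical helpers, and the sketch assumes that 'scalar' means 'a string, or anything that is not iterable' (as the error message suggests) and that only int and float values are accepted.

```python
from collections.abc import Iterable, Mapping


def check_1d_mapping(mapping: Mapping) -> None:
    """Hypothetical helper: raise TypeError for inputs the checks above reject."""
    for key, value in mapping.items():
        # Assumption: a key is 'scalar' if it is a string or not iterable at all
        if isinstance(key, Iterable) and not isinstance(key, str):
            raise TypeError(f'non-scalar key: {key!r}')
        # Assumption: only int and float values are accepted
        if not isinstance(value, (int, float)):
            raise TypeError(f'value for key {key!r} is not int or float: {value!r}')


def is_valid(mapping: Mapping) -> bool:
    """Return True if the mapping passes both checks, False otherwise."""
    try:
        check_1d_mapping(mapping)
    except TypeError:
        return False
    return True


print(is_valid({0: 215, 1: 138}))       # True
print(is_valid({0: 215, (4, 5): 149}))  # False, tuple key
print(is_valid({0: 215, 4: '149'}))     # False, string value
```

Running such a check on raw input data before constructing the ParamDict1D can help locate the offending entry, since the sketch's error messages name the key involved.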
key_name & value_name attributes#
ParamDict1D also accepts optional key_name & value_name arguments, which are stored as attributes that can be referred to for various downstream uses.
# Without specifying the name arguments
DEM = ParamDict1D(
{0: 215, 1: 138, 2: 240, 3: 134, 4: 149}
) # equivalent to `key_name=None, value_name=None`
print(DEM)
ParamDict1D:
{0: 215, 1: 138, 2: 240, 3: 134, 4: 149}
print(DEM.key_name, DEM.value_name)
None None
# Specifying the name arguments, should be strings
DEM = ParamDict1D(
{0: 215, 1: 138, 2: 240, 3: 134, 4: 149},
key_name='Customer',
value_name='Demand',
)
print(DEM) # the names will also be added in the header of the string representation
ParamDict1D: Customer -> Demand
{0: 215, 1: 138, 2: 240, 3: 134, 4: 149}
print(DEM.key_name, DEM.value_name)
Customer Demand
# Change the names with attribute assignment
DEM.key_name = 'CUST'
DEM.value_name = 'DEM'
print(DEM.key_name, DEM.value_name)
CUST DEM
# Simple use case 1
print(f'({DEM.key_name}: {DEM.value_name}) -> ' + ', '.join(f'({k}: {v})' for k, v in DEM.items()))
(CUST: DEM) -> (0: 215), (1: 138), (2: 240), (3: 134), (4: 149)
# Simple use case 2
s_dem = pd.Series(DEM, name=DEM.value_name).rename_axis(DEM.key_name)
print(s_dem)
CUST
0 215
1 138
2 240
3 134
4 149
Name: DEM, dtype: int64
Basic methods#
Being a subclass, it provides all methods of Python’s dict. Please refer to the Mapping operations and Views sections of API Reference for more details.
It provides a special method lookup that returns parameter values for keys present in the
ParamDict1D and zero for keys not found i.e., equivalent to ParamDict1D.get(key, 0). This
is helpful when working with parameters indexed on sparse sets.
print({i: DEM.lookup(i) for i in (0, 1, 99)})
{0: 215, 1: 138, 99: 0}
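To illustrate why a zero default is convenient for sparse data, here is a minimal sketch using a plain dict. The `lookup` function is a hypothetical stand-in that relies only on the `get(key, 0)` equivalence stated above; it is not the library's code.

```python
# Plain-dict stand-in for a sparse parameter (values copied from DEM above)
dem = {0: 215, 1: 138, 2: 240, 3: 134, 4: 149}


def lookup(param: dict, key) -> float:
    """Hypothetical stand-in: the stored value if the key exists, else 0."""
    return param.get(key, 0)


# Customer 99 has no demand entry, so it simply contributes 0 to the total
customers = [0, 1, 2, 99]
total = sum(lookup(dem, i) for i in customers)
print(total)  # 593

# Plain indexing, by contrast, fails for the missing key
try:
    dem[99]
except KeyError as exc:
    print('KeyError for key', exc)
```

With a zero default, expressions that iterate over a full index set do not need an explicit membership test for every key.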
Numerical operations#
It has special methods for numerical operations over parameter values.
print(f'{DEM.sum() = }')
print(f'{DEM.min() = }')
print(f'{DEM.max() = }')
print(f'{DEM.mean() = }')
print(f'{DEM.median() = }')
print(f'{DEM.median_low() = }')
print(f'{DEM.median_high() = }')
DEM.sum() = 876
DEM.min() = 134
DEM.max() = 240
DEM.mean() = 175.2
DEM.median() = 149
DEM.median_low() = 149
DEM.median_high() = 149
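All three medians coincide here because DEM has an odd number of values. The method names match the functions in Python's standard `statistics` module; assuming the semantics match as well (an assumption, though it agrees with the outputs shown on this page), the sketch below shows how they differ once the number of values is even.

```python
from statistics import median, median_high, median_low

odd = [215, 138, 240, 134, 149]  # the five DEM values
even = [215, 138, 240, 134]      # only the first four DEM values

# Odd count: the middle value is unique, so all three agree
print(median(odd), median_low(odd), median_high(odd))     # 149 149 149

# Even count: median averages the two middle values (138 and 215),
# median_low picks the smaller one, median_high the larger one
print(median(even), median_low(even), median_high(even))  # 176.5 138 215
```

The low and high variants always return a value that actually occurs in the data, which can matter when the result has to be a real data point.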
ParamDictND#
Tip
Type annotations: Being a subclass of Python’s dict, ParamDictND is also a generic container type and can be annotated accordingly. Additionally, opti-extensions provides a type-complete interface, enabling most type checkers and LSPs to infer the type automatically.
print(issubclass(ParamDictND, dict))
True
Constructor#
The facility location problem is usually solved with customer-facility cost data, i.e., values indexed on the set of facility-customer pairs. Let’s define it with the ParamDictND data structure which, as the name suggests, should be used for parameters indexed on N-dimensional sets. The ParamDict keys should be tuples of ‘scalar’ data types such as int, str, pd.Timestamp, etc. (with every tuple having the same length), and the values should be int or float.
Cost of supplying customer \(j\) from facility \(i\): \(cost_{i, j} \in \mathbb{R}^{+} \quad \forall \; (i, j) \in FAC\_CUST\)
# Each key is a tuple of facility code & customer id, each value is cost
COST = ParamDictND(
{('F1', 1): 197, ('F1', 2): 345, ('F2', 1): 99, ('F2', 2): 270, ('F3', 1): 205, ('F3', 2): 360},
)
print(COST)
ParamDictND:
{('F1', 1): 197, ('F1', 2): 345, ('F2', 1): 99, ('F2', 2): 270, ('F3', 1): 205, ('F3', 2): 360}
Type annotation of COST is ParamDictND[tuple[str, int], int], similar to dict[tuple[str, int], int].
Fail cases#
# Will raise an error if any key(s) are scalar element(s)
try:
ParamDictND({('F1', 1): 197, ('F1', 2): 345, ('F2', 1): 99, ('F3', 1): 205, 'F3': 360})
# Scalar -> ^^^^
except TypeError:
traceback.print_exc()
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/checkouts/latest/examples/data_structures_overview/example_paramdicts.py", line 222, in <module>
ParamDictND({('F1', 1): 197, ('F1', 2): 345, ('F2', 1): 99, ('F3', 1): 205, 'F3': 360})
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/envs/latest/lib/python3.10/site-packages/opti_extensions/_param_dicts.py", line 614, in __init__
self._reraise_exc_from_indexset(exc, caller='IndexSetND')
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/envs/latest/lib/python3.10/site-packages/opti_extensions/_dict_mixins.py", line 60, in _reraise_exc_from_indexset
raise exception.__class__(msg) from None
TypeError: input introduced non-tuple key(s)
# Will raise an error if any value(s) are not int or float
try:
ParamDictND({('F1', 1): 197, ('F1', 2): 345, ('F2', 1): 99, ('F3', 1): 205, ('F3', 2): '300'})
# Not int or float -> ^^^^^
except TypeError:
traceback.print_exc()
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/checkouts/latest/examples/data_structures_overview/example_paramdicts.py", line 231, in <module>
ParamDictND({('F1', 1): 197, ('F1', 2): 345, ('F2', 1): 99, ('F3', 1): 205, ('F3', 2): '300'})
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/envs/latest/lib/python3.10/site-packages/opti_extensions/_param_dicts.py", line 616, in __init__
super().__init__(mapping, indexset=self._indexset)
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/envs/latest/lib/python3.10/site-packages/opti_extensions/_param_dicts.py", line 62, in __init__
raise TypeError("input mapping's values should be either int or float")
TypeError: input mapping's values should be either int or float
key_names & value_name attributes#
ParamDictND also accepts optional key_names & value_name arguments, which are stored as attributes that can be referred to for various downstream uses.
# Without specifying the name arguments
COST = ParamDictND(
{('F1', 1): 197, ('F1', 2): 345, ('F2', 1): 99, ('F2', 2): 270, ('F3', 1): 205, ('F3', 2): 360}
) # equivalent to `key_names=None, value_name=None`
print(COST)
ParamDictND:
{('F1', 1): 197, ('F1', 2): 345, ('F2', 1): 99, ('F2', 2): 270, ('F3', 1): 205, ('F3', 2): 360}
print(COST.key_names, COST.value_name)
None None
# Specifying the name arguments: key_names takes one string per dimension, value_name a single string
COST = ParamDictND(
{('F1', 1): 197, ('F1', 2): 345, ('F2', 1): 99, ('F2', 2): 270, ('F3', 1): 205, ('F3', 2): 360},
key_names=('Facility', 'Customer'),
value_name='Cost',
)
print(COST) # the names will also be added in the header of the string representation
ParamDictND: (Facility, Customer) -> Cost
{('F1', 1): 197, ('F1', 2): 345, ('F2', 1): 99, ('F2', 2): 270, ('F3', 1): 205, ('F3', 2): 360}
print(COST.key_names, COST.value_name)
['Facility', 'Customer'] Cost
# Change the names with attribute assignment
COST.key_names = ('FAC', 'CUST')
COST.value_name = 'COST'
print(COST.key_names, COST.value_name)
['FAC', 'CUST'] COST
# Simple use case
s_cost = pd.Series(COST, name=COST.value_name).rename_axis(COST.key_names)
print(s_cost)
FAC CUST
F1 1 197
2 345
F2 1 99
2 270
F3 1 205
2 360
Name: COST, dtype: int64
Basic methods#
Being a subclass, it provides all methods of Python’s dict. Please refer to the Mapping operations and Views sections of API Reference for more details.
It provides a special method lookup that returns parameter values for keys present in the
ParamDictND and zero for keys not found i.e., equivalent to ParamDictND.get(key, 0). This
is helpful when working with parameters indexed on sparse sets.
print({(i, j): COST.lookup(i, j) for i in ('F1', 'F2') for j in (1, 99)})
{('F1', 1): 197, ('F1', 99): 0, ('F2', 1): 99, ('F2', 99): 0}
Efficient subset selection#
It has two special methods that allow us to efficiently get subsets: subset_keys and subset_values.
# If we only want keys/values of COST (indexed on a 2-dimensional set) that have the value
# 'F1' in the first dimension and any value in the second dimension, we can supply the wildcard
# pattern to the `subset_keys`/`subset_values` method as shown below.
#
# (the single-character string '*' is used as the wildcard to indicate all possible values for the
# given dimension).
#
print(COST.subset_keys('F1', '*'))
print(COST.subset_values('F1', '*'))
[('F1', 1), ('F1', 2)]
[197, 345]
# If we only want keys/values of COST (indexed on a 2-dimensional set) that have any value in the
# first dimension and the value 1 in the second dimension
print('Subset method:', COST.subset_keys('*', 1))
print('Subset method:', COST.subset_values('*', 1))
# As compared to an if check inside a loop/comprehension
print('With if check:', [elem for elem in COST if elem[1] == 1])
print('With if check:', [val for elem, val in COST.items() if elem[1] == 1])
Subset method: [('F1', 1), ('F2', 1), ('F3', 1)]
Subset method: [197, 99, 205]
With if check: [('F1', 1), ('F2', 1), ('F3', 1)]
With if check: [197, 99, 205]
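The wildcard semantics can be spelled out in plain Python. The functions below are hypothetical stand-ins written for illustration, not the library's implementation: each pattern element either equals the wildcard `'*'` or must match the key element in the same position.

```python
WILDCARD = '*'


def subset_keys(mapping: dict, *pattern) -> list:
    """Hypothetical sketch: keys whose elements match the pattern position by position."""
    return [
        key
        for key in mapping
        if all(p == WILDCARD or p == k for p, k in zip(pattern, key))
    ]


def subset_values(mapping: dict, *pattern) -> list:
    """Hypothetical sketch: values belonging to the keys selected by the pattern."""
    return [mapping[key] for key in subset_keys(mapping, *pattern)]


# Same data as COST above, held in a plain dict
cost = {
    ('F1', 1): 197, ('F1', 2): 345,
    ('F2', 1): 99, ('F2', 2): 270,
    ('F3', 1): 205, ('F3', 2): 360,
}

print(subset_keys(cost, 'F1', '*'))   # [('F1', 1), ('F1', 2)]
print(subset_values(cost, '*', 1))    # [197, 99, 205]
```

One limitation of this sketch is that a key element that is literally the string `'*'` cannot be selected on its own, because the pattern would treat it as the wildcard.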
Numerical operations#
It has special methods for numerical operations over all or a subset of parameter values.
# Numerical operations over all parameter values of COST
print(f'{COST.sum() = }')
print(f'{COST.min() = }')
print(f'{COST.max() = }')
print(f'{COST.mean() = }')
print(f'{COST.median() = }')
print(f'{COST.median_low() = }')
print(f'{COST.median_high() = }')
COST.sum() = 1476
COST.min() = 99
COST.max() = 360
COST.mean() = 246
COST.median() = 237.5
COST.median_low() = 205
COST.median_high() = 270
# Numerical operations over a subset of parameter values, that have the value 'F1' in the first
# dimension and any value in the second dimension of COST, based on wildcard pattern
print(f'{COST.sum("F1", "*") = }')
print(f'{COST.min("F1", "*") = }')
print(f'{COST.max("F1", "*") = }')
print(f'{COST.mean("F1", "*") = }')
print(f'{COST.median("F1", "*") = }')
print(f'{COST.median_low("F1", "*") = }')
print(f'{COST.median_high("F1", "*") = }')
COST.sum("F1", "*") = 542
COST.min("F1", "*") = 197
COST.max("F1", "*") = 345
COST.mean("F1", "*") = 271
COST.median("F1", "*") = 271.0
COST.median_low("F1", "*") = 197
COST.median_high("F1", "*") = 345
Not only does this provide a cleaner syntax, but it is also very performant because of an internal caching mechanism. Let’s look at an example:
from random import choice
from timeit import repeat, timeit
# We'll create a large ParamDictND where the first dimension for each element is unique but the
# second dimension is not (many elements share common values in the second dimension)
test_param = ParamDictND(
{(i, choice(range(99))): choice(range(10)) for i in range(1_000_000)},
)
# While the first call of `sum` (or any other numerical op, or `subset_keys`, or `subset_values`)
# takes some milliseconds, subsequent calls are extremely fast
code = "subset_res1 = test_param.sum('*', 42)"
time = timeit(code, number=1, globals=globals())
print(f'Execution time: {1000 * time:08.3f} ms')
Execution time: 0222.932 ms
code = "subset_res2 = test_param.sum('*', 42)"
times = repeat(code, number=10, repeat=5, globals=globals())
print(f'Execution time: {1000 * sum(times) / len(times):08.3f} ms')
Execution time: 0018.075 ms
code = "subset_res3 = test_param.sum('*', 27)"
times = repeat(code, number=10, repeat=5, globals=globals())
print(f'Execution time: {1000 * sum(times) / len(times):08.3f} ms')
Execution time: 0019.606 ms
code = 'ifcheck_res = sum(v for k, v in test_param.items() if k[1] == 42)'
times = repeat(code, number=10, repeat=5, globals=globals())
print(f'Execution time: {1000 * sum(times) / len(times):08.3f} ms')
Execution time: 0533.682 ms
code = 'ifcheck_res = sum(v for k, v in test_param.items() if k[1] == 27)'
times = repeat(code, number=10, repeat=5, globals=globals())
print(f'Execution time: {1000 * sum(times) / len(times):08.3f} ms')
Execution time: 0503.871 ms
Individually, these micro-speedups may seem trivial, but in aggregate, they end up making a notable difference when building large-scale models.
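This page does not describe how the internal cache works. One plausible approach, consistent with the timings above (a slow first call, then fast calls even for a different value in the same dimension), is to build a per-dimension index once and reuse it. The class below is a hypothetical sketch of that idea and not the library's code; `IndexedCosts`, `_keys_where` and `sum_where` are made-up names.

```python
from collections import defaultdict


class IndexedCosts:
    """Hypothetical sketch: cache a per-dimension index on first use."""

    def __init__(self, data: dict):
        self._data = data
        self._index = {}  # dimension -> {element value -> [matching keys]}

    def _keys_where(self, dim: int, value) -> list:
        if dim not in self._index:
            # One full pass over all keys, paid only on the first query
            # for this dimension
            groups = defaultdict(list)
            for key in self._data:
                groups[key[dim]].append(key)
            self._index[dim] = dict(groups)
        # Later queries on the same dimension are plain dict lookups
        return self._index[dim].get(value, [])

    def sum_where(self, dim: int, value) -> float:
        """Sum the values whose key has `value` at position `dim`."""
        return sum(self._data[key] for key in self._keys_where(dim, value))


costs = IndexedCosts({
    ('F1', 1): 197, ('F1', 2): 345,
    ('F2', 1): 99, ('F2', 2): 270,
    ('F3', 1): 205, ('F3', 2): 360,
})

print(costs.sum_where(0, 'F1'))  # 542
print(costs.sum_where(1, 1))     # 501
```

The trade-off in such a design is extra memory for the index and the need to invalidate it whenever the underlying data changes; the sketch ignores invalidation entirely.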
Integration with pandas#
opti-extensions provides optional functionality to directly cast pandas Series/DataFrame/Index
objects into ParamDict data structures. If pandas is present in the Python environment, this
functionality will be registered with a custom .opti accessor when opti-extensions is
imported.
Tip
Type annotations: Since this functionality is registered at runtime, Python type checkers and LSPs that use static type checking cannot automatically infer the types. The user will need to annotate the ParamDict data structures created through these methods for type checking.
# Say, we have demand data for the problem in form of a pandas dataframe
data1 = pd.DataFrame({'CUST': [0, 1, 2, 3, 4], 'DEM': [215, 138, 240, 134, 149]})
print(data1)
CUST DEM
0 0 215
1 1 138
2 2 240
3 3 134
4 4 149
# To directly get the demand parameter in form of ParamDict1D
# (pd.Series to ParamDict1D)
# index labels: ParamDict keys,
# series values: ParamDict values
s_dem = data1.set_index('CUST').DEM
print(s_dem.opti.to_paramdict())
ParamDict1D: CUST -> DEM
{0: 215, 1: 138, 2: 240, 3: 134, 4: 149}
# To directly get the demand parameter in form of ParamDict1D
# (single column pd.DataFrame to ParamDict1D)
# index labels: ParamDict keys,
# dataframe column values: ParamDict values
df_dem = data1.set_index('CUST')[['DEM']]
print(df_dem.opti.to_paramdict())
ParamDict1D: CUST -> DEM
{0: 215, 1: 138, 2: 240, 3: 134, 4: 149}
# Say, we have cost data for the problem in form of a pandas dataframe
data2 = pd.DataFrame(
{
'FAC': ['F0', 'F0', 'F0', 'F1', 'F2', 'F2', 'F2', 'F3', 'F3', 'F3', 'F4', 'F4'],
'CUST': [2, 3, 4, 1, 0, 3, 4, 2, 3, 4, 0, 1],
'COST': [119, 144, 185, 261, 230, 102, 192, 169, 116, 138, 126, 100],
},
)
print(data2)
FAC CUST COST
0 F0 2 119
1 F0 3 144
2 F0 4 185
3 F1 1 261
4 F2 0 230
5 F2 3 102
6 F2 4 192
7 F3 2 169
8 F3 3 116
9 F3 4 138
10 F4 0 126
11 F4 1 100
# To directly get the cost parameter in form of ParamDictND
# (pd.Series to ParamDictND)
# index labels: ParamDict keys,
# series values: ParamDict values
s_cost = data2.set_index(['FAC', 'CUST']).COST
print(s_cost.opti.to_paramdict())
ParamDictND: (FAC, CUST) -> COST
{('F0', 2): 119, ('F0', 3): 144, ('F0', 4): 185, ('F1', 1): 261, ('F2', 0): 230, ('F2', 3): 102, ('F2', 4): 192, ('F3', 2): 169, ('F3', 3): 116, ('F3', 4): 138, ('F4', 0): 126, ('F4', 1): 100}
# To directly get the cost parameter in form of ParamDictND
# (single column pd.DataFrame to ParamDictND)
# index labels: ParamDict keys,
# dataframe column values: ParamDict values
df_cost = data2.set_index(['FAC', 'CUST'])[['COST']]
print(df_cost.opti.to_paramdict())
ParamDictND: (FAC, CUST) -> COST
{('F0', 2): 119, ('F0', 3): 144, ('F0', 4): 185, ('F1', 1): 261, ('F2', 0): 230, ('F2', 3): 102, ('F2', 4): 192, ('F3', 2): 169, ('F3', 3): 116, ('F3', 4): 138, ('F4', 0): 126, ('F4', 1): 100}
Total running time of the script: (0 minutes 7.785 seconds)