Note
Go to the end to download the full example code.
IndexSet data structures#
We give an overview of the IndexSet data structures provided in opti-extensions. They operate exactly like Python’s list (but require all elements to be unique) and have some additional functionality.
Suppose we want build a model to solve the facility location problem. We’ll define the sets for this problem with these data structures.
# Let's import the classes defining IndexSets
from opti_extensions import IndexSet1D, IndexSetND
# To show subclass
import collections.abc
# To show fail cases
import traceback
# We'll also work with dataframes and series
import pandas as pd
IndexSet1D#
Tip
Type annotations: Like Python’s list, IndexSet1D is a generic container type and can be annotated accordingly. Additionally, opti-extensions provides a type-complete interface, enabling most type checkers and LSPs to infer the type automatically.
print(issubclass(IndexSet1D, collections.abc.MutableSequence))
True
Constructor#
The facility location problem is usually defined with two unidimensional sets: Facilities and Customers. Let’s define them with the IndexSet1D data structure which, as the name suggests, should be used for unidimensional sets. The elements can be ‘scalar’ data types such as int, str, pd.Timestamp, etc.
Set of customers to be served, described by id(s): \(i \in CUST\)
# If we want to define for a range on numbers
CUST = IndexSet1D(range(5))
print(CUST)
IndexSet1D:
[0, 1, 2, 3, 4]
Type annotation of CUST is IndexSet1D[int], similar to list[int].
# If we want to define for a sequence of numbers such as a list or tuple or set
CUST = IndexSet1D([1, 2, 3, 4, 5]) # IndexSet1D((1, 2, 3, 4, 5)) # IndexSet1D({1, 2, 3, 4, 5})
print(CUST)
IndexSet1D:
[1, 2, 3, 4, 5]
Set of facilities to consider, described by code(s): \(j \in FAC\)
# If we want to define for a sequence of strings
FAC = IndexSet1D(['F1', 'F2', 'F3'])
print(FAC)
IndexSet1D:
['F1', 'F2', 'F3']
Type annotation of FAC is IndexSet1D[str], similar to list[str].
Fail cases#
# Will raise an error if input contains non-scalar element(s)
try:
IndexSet1D(['F1', 'F2', ('F3', 'F4')])
# Non-scalar -> ^^^^^^^^^^^^
except TypeError:
traceback.print_exc()
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/checkouts/latest/examples/data_structures_overview/example_indexsets.py", line 94, in <module>
IndexSet1D(['F1', 'F2', ('F3', 'F4')])
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/envs/latest/lib/python3.10/site-packages/opti_extensions/_index_sets.py", line 562, in __init__
super().__init__(elems)
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/envs/latest/lib/python3.10/site-packages/opti_extensions/_index_sets.py", line 80, in __init__
if self._validate_elements(elems):
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/envs/latest/lib/python3.10/site-packages/opti_extensions/_index_sets.py", line 640, in _validate_elements
self._check_allscalars(elems)
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/envs/latest/lib/python3.10/site-packages/opti_extensions/_index_sets.py", line 618, in _check_allscalars
raise TypeError('input introduced non-scalar element(s) (no iterables except string)')
TypeError: input introduced non-scalar element(s) (no iterables except string)
# Will raise an error if input includes duplicates
try:
IndexSet1D(['F1', 'F2', 'F2'])
# Duplicates -> ^^^^ ^^^^
except ValueError:
traceback.print_exc()
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/checkouts/latest/examples/data_structures_overview/example_indexsets.py", line 103, in <module>
IndexSet1D(['F1', 'F2', 'F2'])
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/envs/latest/lib/python3.10/site-packages/opti_extensions/_index_sets.py", line 562, in __init__
super().__init__(elems)
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/envs/latest/lib/python3.10/site-packages/opti_extensions/_index_sets.py", line 81, in __init__
self._set: set[ElemT] = self._ensure_no_duplicates(elems)
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/envs/latest/lib/python3.10/site-packages/opti_extensions/_index_sets.py", line 112, in _ensure_no_duplicates
raise ValueError(f'input introduced duplicates in {self.__class__.__name__}')
ValueError: input introduced duplicates in IndexSet1D
name attribute#
It also has an optional name argument, which gets stored as an attribute that can be referred to for various downstream uses.
# Without specifying the name argument
FAC = IndexSet1D(['F1', 'F2', 'F3']) # equivalent to `name=None`
print(FAC)
IndexSet1D:
['F1', 'F2', 'F3']
print(FAC.name)
None
# Specifying the name argument, should be a string
FAC = IndexSet1D(['F1', 'F2', 'F3'], name='Facilities')
print(FAC) # the name will also be added in the header of the string representation
IndexSet1D: (Facilities)
['F1', 'F2', 'F3']
print(FAC.name)
Facilities
# Change the name with attribute assignment
FAC.name = 'FAC'
print(FAC.name)
FAC
# Simple use case 1
print(f'Elements of set {FAC.name} are -> ' + ', '.join(FAC))
Elements of set FAC are -> F1, F2, F3
# Simple use case 2
s = pd.Series(FAC, name=FAC.name)
print(s)
0 F1
1 F2
2 F3
Name: FAC, dtype: object
Basic methods#
(1) It supports all methods of list. Please refer to the Sequence operations and Dunder methods sections of API Reference for more details.
(2) It supports rich comparisons like set. Please refer to the Set comparison section of API Reference for more details.
(3) It supports set operations like set. Please refer to the Set operations section of API Reference for more details.
IndexSetND#
Tip
Type annotations: Like Python’s list, IndexSetND is a generic container type and can be annotated accordingly. Additionally, opti-extensions provides a type-complete interface, enabling most type checkers and LSPs to infer the type automatically.
print(issubclass(IndexSetND, collections.abc.MutableSequence))
True
Constructor#
We can define a two-dimensional set with the IndexSetND data structure which, as the name suggests, should be used for multidimensional sets. The elements should be tuples of ‘scalar’ data types such as int, str, pd.Timestamp, etc. (with each tuple element having the same length).
Set of allowable combinations of facilities and customers: \((i, j) \in FAC\_CUST\)
# If we want to define for a sequence of tuples such as a list or tuple or set
FAC_CUST = IndexSetND(
[('F1', 1), ('F1', 2), ('F2', 1), ('F2', 2), ('F3', 1), ('F3', 2)],
)
print(FAC_CUST)
IndexSetND:
[('F1', 1), ('F1', 2), ('F2', 1), ('F2', 2), ('F3', 1), ('F3', 2)]
Type annotation of FAC_CUST is IndexSetND[tuple[str, int]], similar to list[tuple[str, int]].
FAC = IndexSet1D(['F1', 'F2', 'F3'])
CUST = IndexSet1D([1, 2])
# If we want to define for all possible combinations of two IndexSet1Ds
FAC_CUST = IndexSetND(FAC, CUST)
print(FAC_CUST)
IndexSetND:
[('F1', 1), ('F1', 2), ('F2', 1), ('F2', 2), ('F3', 1), ('F3', 2)]
# .. or all possible combinations of two iterables (such as range or list or tuple or set)
FAC_CUST = IndexSetND(['F1', 'F2', 'F3'], range(2))
print(FAC_CUST)
IndexSetND:
[('F1', 0), ('F1', 1), ('F2', 0), ('F2', 1), ('F3', 0), ('F3', 1)]
Fail cases#
# Will raise an error if input contains scalar element(s)
try:
IndexSetND([('F1', 1), ('F1', 1), ('F2', 1), 'F2', 2])
# Scalars -> ^^^^ ^^^
except TypeError:
traceback.print_exc()
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/checkouts/latest/examples/data_structures_overview/example_indexsets.py", line 232, in <module>
IndexSetND([('F1', 1), ('F1', 1), ('F2', 1), 'F2', 2])
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/envs/latest/lib/python3.10/site-packages/opti_extensions/_index_sets.py", line 1029, in __init__
super().__init__(cast('list[ElemNDT]', elems))
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/envs/latest/lib/python3.10/site-packages/opti_extensions/_index_sets.py", line 80, in __init__
if self._validate_elements(elems):
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/envs/latest/lib/python3.10/site-packages/opti_extensions/_index_sets.py", line 1227, in _validate_elements
self._check_alltuples(elems)
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/envs/latest/lib/python3.10/site-packages/opti_extensions/_index_sets.py", line 1180, in _check_alltuples
raise TypeError('input introduced non-tuple element(s)')
TypeError: input introduced non-tuple element(s)
# Will raise an error if input contains tuple elements of different lengths
try:
IndexSetND([('F1', 1), ('F2', 1), ('F1', 1), ('F2', 2, 'A')])
# Tuple of different length -> ^^^^^^^^^^^^^^
except ValueError:
traceback.print_exc()
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/checkouts/latest/examples/data_structures_overview/example_indexsets.py", line 241, in <module>
IndexSetND([('F1', 1), ('F2', 1), ('F1', 1), ('F2', 2, 'A')])
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/envs/latest/lib/python3.10/site-packages/opti_extensions/_index_sets.py", line 1029, in __init__
super().__init__(cast('list[ElemNDT]', elems))
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/envs/latest/lib/python3.10/site-packages/opti_extensions/_index_sets.py", line 80, in __init__
if self._validate_elements(elems):
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/envs/latest/lib/python3.10/site-packages/opti_extensions/_index_sets.py", line 1236, in _validate_elements
self._tuplelen = self._check_tuplelen(elems)
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/envs/latest/lib/python3.10/site-packages/opti_extensions/_index_sets.py", line 1201, in _check_tuplelen
raise ValueError('all tuple elements must be of same length')
ValueError: all tuple elements must be of same length
# Will raise an error if input includes duplicates
try:
IndexSetND([('F1', 1), ('F2', 1), ('F2', 1), ('F2', 2)])
# Duplicates -> ^^^^^^^^^ ^^^^^^^^^
except ValueError:
traceback.print_exc()
Traceback (most recent call last):
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/checkouts/latest/examples/data_structures_overview/example_indexsets.py", line 250, in <module>
IndexSetND([('F1', 1), ('F2', 1), ('F2', 1), ('F2', 2)])
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/envs/latest/lib/python3.10/site-packages/opti_extensions/_index_sets.py", line 1029, in __init__
super().__init__(cast('list[ElemNDT]', elems))
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/envs/latest/lib/python3.10/site-packages/opti_extensions/_index_sets.py", line 81, in __init__
self._set: set[ElemT] = self._ensure_no_duplicates(elems)
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/envs/latest/lib/python3.10/site-packages/opti_extensions/_index_sets.py", line 1265, in _ensure_no_duplicates
unique = super()._ensure_no_duplicates(elems)
File "/home/docs/checkouts/readthedocs.org/user_builds/opti-extensions/envs/latest/lib/python3.10/site-packages/opti_extensions/_index_sets.py", line 112, in _ensure_no_duplicates
raise ValueError(f'input introduced duplicates in {self.__class__.__name__}')
ValueError: input introduced duplicates in IndexSetND
names attribute#
It also has an optional names argument, which gets stored as an attribute that can be referred to for various downstream uses.
# Without specifying the name argument
FAC_CUST = IndexSetND(['F1', 'F2', 'F3'], range(2)) # equivalent to `name=None`
print(FAC_CUST)
IndexSetND:
[('F1', 0), ('F1', 1), ('F2', 0), ('F2', 1), ('F3', 0), ('F3', 1)]
print(FAC_CUST.names)
None
# Specifying the names argument, should be a sequence of strings
FAC_CUST = IndexSetND(['F1', 'F2', 'F3'], range(2), names=['FAC', 'CUST'])
print(FAC_CUST) # the name will also be added in the header of the string representation
IndexSetND: (FAC, CUST)
[('F1', 0), ('F1', 1), ('F2', 0), ('F2', 1), ('F3', 0), ('F3', 1)]
print(FAC_CUST.names)
['FAC', 'CUST']
Change the name with attribute assignment
FAC_CUST.names = ['Facilities', 'Customers']
print(FAC_CUST.names)
['Facilities', 'Customers']
Simple use case
df = pd.DataFrame(FAC_CUST, columns=FAC_CUST.names)
print(df)
Facilities Customers
0 F1 0
1 F1 1
2 F2 0
3 F2 1
4 F3 0
5 F3 1
Basic methods#
(1) It supports all methods of list. Please refer to the Sequence operations and Dunder methods sections of API Reference for more details.
(2) It supports rich comparisons like set. Please refer to the Set comparison section of API Reference for more details.
(3) It supports set operations like set. Please refer to the Set operations section of API Reference for more details.
Efficient subset selection#
It has two special methods that allow us to efficiently get subsets: subset and squeeze.
subset method#
# If we only want elements of FAC_CUST (a 2-dimensional set) that have the value 'F1' in the first
# dimension and any value in the second dimension, we can supply the wildcard pattern to the
# `subset` method as shown below.
#
# (the single-character string '*' is used as the wildcard to indicate all possible values for the
# given dimension).
#
print(FAC_CUST.subset('F1', '*'))
[('F1', 0), ('F1', 1)]
# If we only want elements of FAC_CUST (a 2-dimensional set) that have any value in the first
# dimension and the value 1 in the second dimension
print('Subset method:', FAC_CUST.subset('*', 1))
# As compared to an if check inside a loop/comprehension
print('With if check:', [elem for elem in FAC_CUST if elem[1] == 1])
Subset method: [('F1', 1), ('F2', 1), ('F3', 1)]
With if check: [('F1', 1), ('F2', 1), ('F3', 1)]
Not only does it provide a cleaner syntax, but it is also very performant because of an internal caching mechanism. Let’s see for an example below:
from random import choice
from timeit import repeat, timeit
# We'll create a large IndexSetND where the first two dimensions are unique for each element but the
# third dimension is not (many elements share common values in the third dimension)
test_set = IndexSetND((i, str(i), choice(range(99))) for i in range(1_000_000))
# While the first call of `subset` takes a some millisecs, subsequent calls are extremely fast
code = "subset_res1 = test_set.subset('*', '*', 42)"
time = timeit(code, number=1, globals=globals())
print(f'Execution time: {1000 * time:08.3f} ms')
Execution time: 0209.270 ms
code = "subset_res2 = test_set.subset('*', '*', 42)"
times = repeat(code, number=10, repeat=5, globals=globals())
print(f'Execution time: {1000 * sum(times) / len(times):08.3f} ms')
Execution time: 0000.036 ms
code = "subset_res3 = test_set.subset('*', '*', 27)"
times = repeat(code, number=10, repeat=5, globals=globals())
print(f'Execution time: {1000 * sum(times) / len(times):08.3f} ms')
Execution time: 0000.036 ms
code = 'ifcheck_res = [elem for elem in test_set if elem[2] == 42]'
times = repeat(code, number=10, repeat=5, globals=globals())
print(f'Execution time: {1000 * sum(times) / len(times):08.3f} ms')
Execution time: 0351.592 ms
code = 'ifcheck_res = [elem for elem in test_set if elem[2] == 27]'
times = repeat(code, number=10, repeat=5, globals=globals())
print(f'Execution time: {1000 * sum(times) / len(times):08.3f} ms')
Execution time: 0361.731 ms
Individually, these micro-speedups may seem trivial, but in aggregate, they end up making a notable difference when building large-scale models.
squeeze method#
# From FAC_CUST (a 2-dimensional set), if we want to get an IndexSet of all unique values at the
# first dimension (i.e., at dimension 0 because Python uses zero-based indexing)
#
# equivalent to: IndexSet1D(set(i for i, j in FAC_CUST))
#
print(FAC_CUST.squeeze(0))
IndexSet1D:
['F1', 'F2', 'F3']
# From FAC_CUST (a 2-dimensional set), if we want to get an IndexSet of all unique values at the
# second dimension (i.e., at dimension 1 because Python uses zero-based indexing)
#
# equivalent to: IndexSet1D(set(j for i, j in FAC_CUST), name='CUST')
#
print(FAC_CUST.squeeze(1, names=['CUST']))
IndexSet1D: (CUST)
[0, 1]
Integration with pandas#
opti-extensions provides optional functionality to directly cast pandas Series/DataFrame/Index
objects into IndexSet data structures. If pandas is present in the python environment, this
functionality will be registered with a custom .opti accessor when opti-extensions is
imported.
Tip
Type annotations: Since this functionality is registered at runtime, Python type checkers and LSPs that use static type checking cannot automatically infer the types. The user will need to annotate the IndexSet data structures created through these methods for type checking.
# Say, we have data for the problem in form of a pandas dataframe
data = pd.DataFrame(
{
'FAC': ['F0', 'F0', 'F0', 'F1', 'F2', 'F2', 'F2', 'F3', 'F3', 'F3', 'F4', 'F4'],
'CUST': [2, 3, 4, 1, 0, 3, 4, 2, 3, 4, 0, 1],
'COST': [119, 144, 185, 261, 230, 102, 192, 169, 116, 138, 126, 100],
},
)
print(data)
FAC CUST COST
0 F0 2 119
1 F0 3 144
2 F0 4 185
3 F1 1 261
4 F2 0 230
5 F2 3 102
6 F2 4 192
7 F3 2 169
8 F3 3 116
9 F3 4 138
10 F4 0 126
11 F4 1 100
# To directly get the set of facilities in form of IndexSet1D
# (pd.Series to IndexSet1D)
s_FAC = data.FAC.drop_duplicates() # need to drop duplicates
print(s_FAC.opti.to_indexset())
IndexSet1D: (FAC)
['F0', 'F1', 'F2', 'F3', 'F4']
# To directly get the set of customers in form of IndexSet1D
# (single column pd.DataFrame to IndexSet1D)
df_CUST = data[['CUST']].drop_duplicates() # need to drop duplicates
print(df_CUST.opti.to_indexset())
IndexSet1D: (CUST)
[2, 3, 4, 1, 0]
# To directly get the set of allowed combinations of facilities & customers in form of IndexSetND
# (pd.DataFrame to IndexSetND)
df_FAC_CUST = data[['FAC', 'CUST']]
print(df_FAC_CUST.opti.to_indexset())
IndexSetND: (FAC, CUST)
[('F0', 2), ('F0', 3), ('F0', 4), ('F1', 1), ('F2', 0), ('F2', 3), ('F2', 4), ('F3', 2), ('F3', 3), ('F3', 4), ('F4', 0), ('F4', 1)]
# ... or
# (pd.Index to IndexSetND)
ix_FAC_CUST = data.set_index(['FAC', 'CUST']).index
print(ix_FAC_CUST.opti.to_indexset())
IndexSetND: (FAC, CUST)
[('F0', 2), ('F0', 3), ('F0', 4), ('F1', 1), ('F2', 0), ('F2', 3), ('F2', 4), ('F3', 2), ('F3', 3), ('F3', 4), ('F4', 0), ('F4', 1)]
Total running time of the script: (0 minutes 5.054 seconds)