Python Script API
This section lists the API of the module knime.scripting.io that functions as the main contact point between KNIME and Python in the KNIME Python Script node.
Please refer to the KNIME Python Integration Guide for more details on how to set up and use the node.
Note
Before KNIME AP 4.7, the module used to interact with KNIME from Python was called knime_io and provided a slightly different API. Since KNIME AP 4.7, the new Python Script node is no longer in Labs status and uses the knime.scripting.io module for interaction between KNIME and Python. It uses the same Table and Batch classes that are used in KNIME Python Extensions.
The previous API is described in Deprecated Python Script API.
Inputs and outputs
These properties can be used to retrieve data from or pass data back to KNIME Analytics Platform. The length of the input and output lists depends on the number of input and output ports of the node.
Example:
If you have a Python Script node configured with two input tables and one input object, you can access the two tables via knime.scripting.io.input_tables[0] and knime.scripting.io.input_tables[1], and the input object via knime.scripting.io.input_objects[0].
Input and output variables used to communicate with KNIME from within KNIME’s Python Scripting nodes
- knime.scripting.io.flow_variables: Dict[str, Any] = {}
A dictionary of flow variables provided by the KNIME workflow. New flow variables can be added to the output of the node by adding them to the dictionary. Supported flow variable types are numbers, strings, booleans and lists thereof.
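A quick way to see which values qualify as output flow variables is a type check against the kinds listed above. The following is a minimal sketch, assuming that "numbers" means Python int and float; the helper is_supported_flow_variable is hypothetical (not part of knime.scripting.io), and a plain dict stands in for knio.flow_variables so the snippet runs outside KNIME. It does not check whether the elements of a list share a single type.

```python
from typing import Any, Dict

# Scalar kinds listed as supported (assumption: "numbers" means int and float).
# bool is listed explicitly even though it is a subclass of int.
SCALAR_TYPES = (bool, int, float, str)


def is_supported_flow_variable(value: Any) -> bool:
    """Hypothetical helper: True for numbers, strings, booleans, or lists of them."""
    if isinstance(value, SCALAR_TYPES):
        return True
    if isinstance(value, list):
        return all(isinstance(item, SCALAR_TYPES) for item in value)
    return False


# Stand-in for knio.flow_variables so the sketch runs without KNIME.
flow_variables: Dict[str, Any] = {"threshold": 0.5}

candidates = {"row_count": 42, "labels": ["a", "b"], "config": {"k": 1}}
for name, value in candidates.items():
    if is_supported_flow_variable(value):
        flow_variables[name] = value  # in the node: knio.flow_variables[name] = value
```

The nested dict under "config" is skipped, while the integer and the list of strings are added.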
- knime.scripting.io.input_objects: List = <knime.scripting._io_containers._FixedSizeListView object>
A list of input objects of this script node using zero-based indices. This list has a fixed size, which is determined by the number of input object ports configured for this node. Input objects are Python objects that are passed in from an output object port of another Python Script node. This can, for instance, be used to pass trained models between Python nodes. If no input is given, the list exists but is empty.
- knime.scripting.io.input_tables: List[Table] = <knime.scripting._io_containers._FixedSizeListView object>
The input tables of this script node. This list has a fixed size, which is determined by the number of input table ports configured for this node. Tables are available in the same order as the port connectors are displayed alongside the node (from top to bottom), using zero-based indexing. If no input is given, the list exists but is empty.
- knime.scripting.io.output_images: List = <knime.scripting._io_containers._FixedSizeListView object>
The output images of this script node. This list has a fixed size, which is determined by the number of output image ports configured for this node. The value passed to the output port should be a bytes-like object encoding an SVG or PNG image.
Examples
>>> import io
>>> import matplotlib.pyplot as pyplot
>>> import knime.scripting.io as knio
>>> data = knio.input_tables[0].to_pandas()
>>> buffer = io.BytesIO()
>>> pyplot.figure()
>>> pyplot.plot('x', 'y', data=data)
>>> pyplot.savefig(buffer, format='svg')
>>> knio.output_images[0] = buffer.getvalue()
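The output image does not have to come from matplotlib: any bytes object holding SVG or PNG data fits the description above. As a minimal sketch, the hypothetical helper bar_chart_svg below writes SVG markup by hand and UTF-8 encodes it. It assumes non-negative values and only illustrates the shape of the data, not a recommended plotting approach.

```python
from typing import List


def bar_chart_svg(values: List[float], width: int = 200, height: int = 100) -> bytes:
    """Hypothetical helper: render non-negative values as a tiny SVG bar chart."""
    if not values:
        raise ValueError("at least one value is required")
    bar_width = width / len(values)
    peak = max(values) or 1  # avoid dividing by zero when every value is 0
    bars = []
    for i, value in enumerate(values):
        bar_height = height * value / peak
        bars.append(
            f'<rect x="{i * bar_width:.1f}" y="{height - bar_height:.1f}" '
            f'width="{bar_width * 0.8:.1f}" height="{bar_height:.1f}"/>'
        )
    svg = (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        + "".join(bars)
        + "</svg>"
    )
    return svg.encode("utf-8")  # output_images expects bytes, not str


image = bar_chart_svg([3, 1, 2])
# Inside the Python Script node: knio.output_images[0] = image
```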
- knime.scripting.io.output_objects: List = <knime.scripting._io_containers._FixedSizeListView object>
The output objects of this script node. This list has a fixed size, which is determined by the number of output object ports configured for this node. Each output object can be an arbitrary Python object as long as it can be pickled. Use this to, for example, pass a trained model to another Python script node.
Examples
>>> import torchvision
>>> import knime.scripting.io as knio
>>> model = torchvision.models.resnet18()
>>> # train/finetune model
>>> knio.output_objects[0] = model
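Because output objects must be picklable, it can help to check an object before assigning it. This is a minimal sketch, assuming the standard pickle module is representative of how the object will be serialized (KNIME may use a different protocol or library internally); the helper pickle_round_trip and the dictionary standing in for a model are hypothetical.

```python
import pickle
from typing import Any


def pickle_round_trip(obj: Any) -> Any:
    """Hypothetical helper: serialize and restore obj, surfacing pickling errors early.

    Unpicklable objects (open files, generators, lambdas, ...) raise TypeError,
    pickle.PicklingError or AttributeError here instead of later in the workflow.
    """
    return pickle.loads(pickle.dumps(obj))


# Stand-in for a trained model; any picklable object works the same way.
model = {"coefficients": [0.4, -1.2], "intercept": 0.1}
restored = pickle_round_trip(model)
# Inside the Python Script node: knio.output_objects[0] = model
```

The restored object is an equal but independent copy, which is roughly what a downstream node receives.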
- knime.scripting.io.output_tables: List[Table | BatchOutputTable] = <knime.scripting._io_containers._FixedSizeListView object>
The output tables of this script node. This list has a fixed size, which is determined by the number of output table ports configured for this node. You should assign a Table or BatchOutputTable to each output port of this node.
Examples
>>> import knime.scripting.io as knio
>>> knio.output_tables[0] = knio.Table.from_pandas(my_pandas_df)
- knime.scripting.io.output_view: NodeView | None = None
The output view of the script node. This variable must be populated with a NodeView when using the Python View node. Views can be created by calling the view(obj) function with a viewable object. See the documentation of view(obj) to understand how views are created from different kinds of objects.
Examples
>>> import knime.scripting.io as knio
>>> import plotly.express as px
>>> fig = px.scatter(x=data_x, y=data_y)
>>> knio.output_view = knio.view(fig)
Classes
- class knime.scripting.io.Table
This class serves as the public API to create KNIME tables from either pandas or pyarrow. These tables can then be sent back to KNIME. This class has to be instantiated by calling either from_pyarrow() or from_pandas().
- __getitem__(slicing: slice | List[int] | List[str] | Tuple[slice | List[int] | List[str], slice]) _TabularView
Creates a view of this Table by slicing rows and columns. The slicing syntax is similar to that of numpy arrays, but columns can also be addressed as index lists or via a list of column names.
Notes
The syntax is [column_slice, row_slice]. Note that this is the opposite of the order used by ReadTable in the deprecated scripting API.
- Parameters:
column_slice (int, str, slice, list) – A column index, a column name, a slice object, a list of column indices, or a list of column names.
row_slice (slice, optional) – A slice object describing which rows to use.
- Returns:
A _TabularView representing a slice of the original Table.
- Return type:
_TabularView
Examples
>>> row_sliced_table = table[:, :100]  # Get the first 100 rows
>>> column_sliced_table = table[["name", "age"]]  # Get all rows of the columns "name" and "age"
>>> row_and_column_sliced_table = table[1:5, :100]  # Get the first 100 rows of columns 1, 2, 3, 4
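The [column_slice, row_slice] order is easy to confuse with numpy's [rows, columns]. The sketch below models the documented selection rules on a plain dict of column lists; take is a hypothetical helper that only illustrates the argument order and the accepted column selectors, not how Table implements slicing.

```python
from typing import Dict, List, Sequence, Union

Columns = Dict[str, List]
ColumnSelector = Union[int, str, slice, Sequence[int], Sequence[str]]


def take(columns: Columns, column_selector: ColumnSelector,
         row_slice: slice = slice(None)) -> Columns:
    """Hypothetical helper mirroring table[column_selector, row_slice]."""
    names = list(columns)  # dicts keep insertion order, so this is the column order
    if isinstance(column_selector, (int, str)):
        column_selector = [column_selector]
    if isinstance(column_selector, slice):
        selected = names[column_selector]
    else:
        # A list holds either column positions or column names; positions become names.
        selected = [names[c] if isinstance(c, int) else c for c in column_selector]
    return {name: columns[name][row_slice] for name in selected}


table = {
    "name": ["Ann", "Bob", "Cid"],
    "age": [31, 42, 27],
    "city": ["Oslo", "Rome", "Lima"],
}
first_two_rows = take(table, slice(None), slice(0, 2))  # like table[:, :2]
name_and_age = take(table, ["name", "age"])             # like table[["name", "age"]]
middle_columns = take(table, slice(1, 3), slice(0, 2))  # like table[1:3, :2]
```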
- append(other: _Columnar | Sequence[_Columnar]) _ColumnarView
Append another _Columnar object (e.g. Table, Schema) or a sequence of _Columnar objects to the current _Columnar object.
- Parameters:
other (Union["_Columnar", Sequence["_Columnar"]]) – The _Columnar object or a sequence of _Columnar objects to be appended.
- Returns:
A _ColumnarView object representing the current _Columnar object after the append operation.
- Return type:
_ColumnarView
- batches() Iterator[Table]
Returns a generator over the batches in this table. A batch is a part of the table that contains all columns but only a subset of the rows. A batch should always fit into memory (the maximum size is currently 64 MB). The table being passed to execute() is already present in batches, so accessing the data this way is very efficient.
- Returns:
A generator object that yields batches of the table.
- Return type:
generator
Examples
>>> output_table = BatchOutputTable.create()
>>> for batch in my_table.batches():
...     input_batch = batch.to_pandas()
...     # process the batch
...     output_table.append(Table.from_pandas(input_batch))
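The benefit of batches() is that only one batch has to be held in memory at a time. The same idea in plain Python is a generator that cuts a stream of rows into fixed-size chunks, where only the last chunk may be shorter. iter_batches is a hypothetical helper for illustration; KNIME chooses the batch size itself.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Hypothetical helper: yield consecutive lists of at most batch_size rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Aggregate a large stream without materializing it: only one batch is held at a time.
total = 0
for batch in iter_batches(range(1_000_000), batch_size=65_536):
    total += sum(batch)
```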
- abstract property column_names: list
Get the names of the columns in a dataset.
- static from_pandas(data: pandas.DataFrame, sentinel: str | int | None = None, row_ids: str = 'auto')
Factory method to create a Table given a pandas.DataFrame. By default, the index of the data frame determines the RowIDs used by KNIME (see row_ids).
Examples
>>> Table.from_pandas(my_pandas_df, sentinel="min")
- Parameters:
data (pandas.DataFrame) – A pandas DataFrame.
sentinel (str or int, optional) –
Interpret the following values in integral columns as missing values:
"min": min int32 or min int64 depending on the type of the column
"max": max int32 or max int64 depending on the type of the column
a special integer value that should be interpreted as a missing value
row_ids ({'keep', 'generate', 'auto'}, optional) –
Defines what RowID should be used. Must be one of the following values:
"keep": Keep the DataFrame.index as the RowID. Convert the index to strings if necessary.
"generate": Generate new RowIDs of the format f"Row{i}" where i is the position of the row (from 0 to length-1).
"auto": If the DataFrame.index is of type int or unsigned int, use f"Row{n}" where n is the index of the row. Else, use "keep".
- Returns:
The created Table object.
- Return type:
Table
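The three row_ids modes of from_pandas can be summarized as a small function from index values to RowID strings. This is a sketch under simplifying assumptions: resolve_row_ids is hypothetical, it receives the index as a plain list, and it approximates the "int or unsigned int index" test by checking the Python type of every element instead of the pandas dtype.

```python
from typing import Any, List, Sequence


def resolve_row_ids(index: Sequence[Any], mode: str = "auto") -> List[str]:
    """Hypothetical model of the row_ids modes documented for Table.from_pandas."""
    if mode == "generate":
        # Position-based IDs: Row0, Row1, ...
        return [f"Row{i}" for i in range(len(index))]
    if mode == "auto":
        is_integer_index = all(
            isinstance(v, int) and not isinstance(v, bool) for v in index
        )
        if is_integer_index:
            # Value-based IDs: Row<n> where n is the index value.
            return [f"Row{n}" for n in index]
        mode = "keep"  # any other index type falls back to "keep"
    if mode == "keep":
        return [str(v) for v in index]  # convert index values to strings
    raise ValueError(f"unknown row_ids mode: {mode!r}")
```

For an integer index such as [5, 7], "generate" yields Row0 and Row1 (positions), whereas "auto" yields Row5 and Row7 (values).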
- static from_pyarrow(data: pyarrow.Table, sentinel: str | int | None = None, row_ids: str = 'auto')
Factory method to create a Table given a pyarrow.Table.
All batches of the table must have the same number of rows. Only the last batch can have fewer rows than the other batches.
Examples
>>> Table.from_pyarrow(my_pyarrow_table, sentinel="min")
- Parameters:
data (pyarrow.Table) – A pyarrow.Table
sentinel (str or int, optional) –
Interpret the following values in integral columns as missing values:
"min": min int32 or min int64 depending on the type of the column
"max": max int32 or max int64 depending on the type of the column
a special integer value that should be interpreted as a missing value
row_ids (str) –
Defines what RowID should be used. Must be one of the following values:
"keep": Use the first column of the table as RowID. The first column must be of type string.
"generate": Generate new RowIDs of the format f"Row{i}" where i is the position of the row (from 0 to length-1).
"auto": Use the first column of the table if it has the name "<RowID>" and is of type string or integer.
If the "<RowID>" column is of type string, use it directly.
If the "<RowID>" column is of an integer type, use f"Row{n}" where n is the value of the integer column.
Generate new RowIDs ("generate") if the first column has another type or name.
- Returns:
The created Table instance.
- Return type:
Table
- insert(other: _Columnar, at: int) _Columnar
Insert a column or another _Columnar object (e.g. Table, Schema) into the current _Columnar object at a specific position.
- Parameters:
other (_Columnar or Column) – The column or _Columnar object to be inserted.
at (int) – The index at which the insertion should occur.
- Returns:
The _Columnar object after the insertion.
- Return type:
_Columnar
- Raises:
TypeError – If other is not of type _Columnar or Column.
Notes
The insertion is done in-place, meaning the current _Columnar object is modified.
- abstract property num_columns: int
Get the number of columns in the dataset.
- remove(slicing: str | int | List[str])
Implements remove method for Columnar data structures. The input can be a column index, a column name or a list of column names.
If the input is a column index, the column with that index will be removed. If it is a column name, then the first column with matching name is removed. Passing a list of column names will filter out all (including duplicate) columns with matching names.
- Parameters:
slicing (int | list | str) – An integer is interpreted as the index in column_names of the column to remove. A list of strings removes every column whose name appears in the list. A string removes the first column with that name.
- Returns:
A view missing the removed columns.
- Return type:
_ColumnarView
- Raises:
ValueError – If no matching column is found for a list or str.
IndexError – If a column is accessed by an integer index that is out of bounds.
TypeError – If the key is neither an integer nor a string or list of strings.
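The three accepted key types of remove() treat duplicate column names differently. The sketch below models that behavior on a plain list of names. remaining_columns is hypothetical, and raising ValueError for a list only when none of its names matches is an assumption about the phrase "if no matching column is found".

```python
from typing import List, Union


def remaining_columns(column_names: List[str], key: Union[int, str, List[str]]) -> List[str]:
    """Hypothetical model of remove(): return the column names left after removal."""
    names = list(column_names)  # work on a copy; the input list stays unchanged
    if isinstance(key, bool):
        raise TypeError("key must be an int, a str or a list of str")
    if isinstance(key, int):
        del names[key]  # raises IndexError when out of bounds
        return names
    if isinstance(key, str):
        names.remove(key)  # removes the first match only; ValueError if absent
        return names
    if isinstance(key, list) and all(isinstance(k, str) for k in key):
        kept = [n for n in names if n not in key]  # drops every match, duplicates included
        if len(kept) == len(names):
            raise ValueError(f"none of {key} is a column name")
        return kept
    raise TypeError("key must be an int, a str or a list of str")
```

With columns ["a", "b", "a", "c"], the key "a" leaves ["b", "a", "c"], while ["a"] leaves ["b", "c"].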
- abstract property schema: Schema
The schema of this table, containing column names, types, and potentially metadata.
- to_batches() Iterator[Table]
Alias for Table.batches().
- to_pandas(sentinel: str | int | None = None) pandas.DataFrame
Access this table as a pandas.DataFrame.
- Parameters:
sentinel (str or int) –
Replace missing values in integral columns by the given value. It can be one of the following:
"min": min int32 or min int64 depending on the type of the column
"max": max int32 or max int64 depending on the type of the column
An integer value that should be inserted for each missing value
- to_pyarrow(sentinel: str | int | None = None) pyarrow.Table
Access this table as a pyarrow.Table.
- Parameters:
sentinel (str or int) –
Replace missing values in integral columns by the given value, which can be one of the following:
"min": minimum value of int32 or int64 depending on the type of the column
"max": maximum value of int32 or int64 depending on the type of the column
An integer value that should be inserted for each missing value
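The sentinel options translate between missing values and ordinary integers. The sketch below models both directions on plain lists, with None standing for a missing value: mark_missing mirrors the from_pandas / from_pyarrow direction and fill_missing the to_pandas / to_pyarrow direction. Both helpers, resolve_sentinel and the INT_BOUNDS table are hypothetical illustrations, not part of knime.scripting.io.

```python
from typing import Dict, List, Optional, Tuple, Union

# Value ranges of the two integral column types the sentinel options refer to.
INT_BOUNDS: Dict[str, Tuple[int, int]] = {
    "int32": (-(2**31), 2**31 - 1),
    "int64": (-(2**63), 2**63 - 1),
}


def resolve_sentinel(sentinel: Union[str, int], column_type: str) -> int:
    """Turn "min", "max" or an explicit integer into the concrete sentinel value."""
    low, high = INT_BOUNDS[column_type]
    if sentinel == "min":
        return low
    if sentinel == "max":
        return high
    if isinstance(sentinel, int) and not isinstance(sentinel, bool):
        return sentinel
    raise ValueError(f"unsupported sentinel: {sentinel!r}")


def mark_missing(values: List[int], sentinel: Union[str, int],
                 column_type: str) -> List[Optional[int]]:
    """Like from_pandas(..., sentinel=...): sentinel values become missing (None)."""
    marker = resolve_sentinel(sentinel, column_type)
    return [None if v == marker else v for v in values]


def fill_missing(values: List[Optional[int]], sentinel: Union[str, int],
                 column_type: str) -> List[int]:
    """Like to_pandas(sentinel=...): missing values (None) are replaced by the sentinel."""
    marker = resolve_sentinel(sentinel, column_type)
    return [marker if v is None else v for v in values]
```

Using the same sentinel in both directions round-trips the data, so a missing value survives a pandas detour as long as no genuine value equals the sentinel.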
- class knime.scripting.io.BatchOutputTable
An output table generated by combining smaller tables (also called batches).
Notes
All batches must have the same number, names and types of columns.
All batches except the last batch must have the same number of rows.
The last batch can have fewer rows than the other batches.
This object does not provide means to continue working with the data. It is meant to be assigned to an entry of output_tables or, in KNIME Python Extensions, to be returned from a node's execute() method.
- abstract append(batch: Table | pandas.DataFrame | pyarrow.Table | pyarrow.RecordBatch) None
Append a batch to this output table. The first batch defines the structure of the table, and all subsequent batches must have the same number of columns, column names and column types.
Notes
Keep in mind that the RowID will be handled according to the “row_ids” mode chosen in BatchOutputTable.create.
- static create(row_ids: str = 'keep')
Create an empty BatchOutputTable
- Parameters:
row_ids (str) –
Defines what RowID should be used. Must be one of the following values:
"keep":
For appending DataFrames: Keep the DataFrame.index as the RowID. Convert the index to strings if necessary.
For appending Arrow tables or record batches: Use the first column of the table as RowID. The first column must be of type string.
"generate": Generate new RowIDs of the format "Row{i}", where i is the position of the row.
- static from_batches(generator, row_ids: str = 'generate')
Create an output table in which each batch is provided by a generator.
- Parameters:
generator – A generator that yields the batches.
row_ids (str) – See BatchOutputTable.create.
- abstract property num_batches: int
The number of batches written to this output table.
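The structural rules for batches listed in the notes above can be checked before writing. In this sketch, each batch is described by a hypothetical (schema, row_count) pair, where the schema is a tuple of (column name, type name) pairs. validate_batches is not part of the API, BatchOutputTable performs its own checks, and rejecting a last batch that is larger than the others is an assumption the notes imply but do not state.

```python
from typing import Sequence, Tuple

# Hypothetical schema description: ((column name, type name), ...)
Schema = Tuple[Tuple[str, str], ...]


def validate_batches(batches: Sequence[Tuple[Schema, int]]) -> None:
    """Raise ValueError if the batches violate the documented BatchOutputTable rules."""
    if not batches:
        return
    first_schema, first_rows = batches[0]  # the first batch defines the structure
    last = len(batches) - 1
    for i, (schema, rows) in enumerate(batches):
        if schema != first_schema:
            raise ValueError(f"batch {i}: column names or types differ from batch 0")
        if i < last and rows != first_rows:
            raise ValueError(f"batch {i}: has {rows} rows, expected {first_rows}")
        if i == last and rows > first_rows:
            raise ValueError(f"batch {i}: the last batch may not be larger than the others")
```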
Views
- knime.scripting.io.view(obj) NodeView
Return a best-effort NodeView for obj.
The function first attempts to match obj to one of the dedicated helper functions listed below and otherwise falls back to the object’s IPython _repr_*_ methods (_repr_html_, _repr_svg_, _repr_png_, _repr_jpeg_).
Images (SVG, PNG, JPEG, matplotlib, seaborn) are exported with a static snapshot and therefore show up automatically in reports generated by the KNIME Reporting Extension. All other supported objects (HTML strings, Plotly figures, etc.) are not included in reports by default. To enable them in reports, use the view_html() function and set the can_be_used_in_report flag.
Special view implementations
The input must match one of the following patterns:
HTML: str starting with "<!DOCTYPE html>". Must be self-contained; external links open in a browser.
SVG: str containing valid <svg … xmlns="…"> markup.
PNG: bytes beginning with the PNG magic number.
JPEG: bytes beginning with 0xFFD8FF and ending with 0xFFD9.
Matplotlib: matplotlib.figure.Figure instance.
Plotly: plotly.graph_objects.Figure instance.
- Parameters:
obj (Any) – The object to visualise.
- Raises:
ValueError – If no suitable helper or _repr_*_ method is found.
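The string and byte patterns above can be recognized with a few prefix and suffix checks. This sketch classifies raw str or bytes inputs in the listed order. classify_view_input is hypothetical and is not how view() is implemented. It also does not cover matplotlib, Plotly or the _repr_*_ fallback, which depend on the object's type rather than its content.

```python
from typing import Optional

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # the 8-byte PNG signature
JPEG_START = b"\xff\xd8\xff"
JPEG_END = b"\xff\xd9"


def classify_view_input(obj: object) -> Optional[str]:
    """Hypothetical helper: name the content pattern obj matches, if any."""
    if isinstance(obj, str):
        if obj.startswith("<!DOCTYPE html>"):
            return "html"
        if "<svg" in obj and "xmlns=" in obj:
            return "svg"
    elif isinstance(obj, bytes):
        if obj.startswith(PNG_MAGIC):
            return "png"
        if obj.startswith(JPEG_START) and obj.endswith(JPEG_END):
            return "jpeg"
    return None  # no content pattern matched; other object types need other checks
```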
- knime.scripting.io.view_matplotlib(fig=None, format='png') NodeView
Create a NodeView that displays a matplotlib figure and is report-ready out of the box.
The figure is exported as a PNG or SVG (controlled by format). If fig is None, the current active figure is used. The figure is then closed, so it should not be modified afterwards.
Because a static image is always supplied, the helper sets can_be_used_in_report=True automatically, allowing the view to appear in reports generated by the KNIME Reporting Extension without additional JavaScript.
- Parameters:
fig (matplotlib.figure.Figure, optional) – Figure to render. Defaults to the current figure.
format ({"png", "svg"}, default "png") – Output format embedded in the HTML snippet.
- Raises:
ImportError – If matplotlib is not available.
TypeError – If the figure is not a matplotlib figure.
- knime.scripting.io.view_seaborn() NodeView
Create a NodeView that shows the current active seaborn figure and is report-ready out of the box.
The function simply forwards to view_matplotlib() because seaborn charts are regular matplotlib figures under the hood.
As a static image is always embedded, the helper sets can_be_used_in_report=True automatically, so the view can be included in reports generated by the KNIME Reporting Extension without any extra JavaScript.
- Raises:
ImportError – If matplotlib is not available.
- knime.scripting.io.view_plotly(fig) NodeView
Create a view showing the given plotly figure.
The figure is displayed by exporting it as an HTML document.
To synchronize the selection between the view and other KNIME views, the customdata of the figure traces must be set to the RowID.
- Parameters:
fig (plotly.graph_objects.Figure) – A plotly figure object which should be displayed.
- Raises:
ImportError – If plotly is not available.
TypeError – If the figure is not a plotly figure.
Examples
>>> import plotly.express as px
>>> from knime.scripting.io import view_plotly
>>> fig = px.scatter(df, x="my_x_col", y="my_y_col", color="my_label_col",
...                  custom_data=[df.index])
>>> node_view = view_plotly(fig)
- knime.scripting.io.view_html(html: str, svg_or_png: str | bytes | None = None, render_fn: Callable[[], str | bytes] | None = None, can_be_used_in_report=False) NodeView
Create a NodeView that displays the given HTML document.
The document must be self-contained and must not reference external resources. Links to external resources will be opened in an external browser.
- Parameters:
html (str) – A string containing the HTML document.
svg_or_png (str or bytes) – A rendered representation of the HTML page. Either a string containing an SVG or a bytes object containing a PNG image.
render_fn (callable) – A callable that returns an SVG or PNG representation of the page.
can_be_used_in_report (bool, default False) – Indicates whether this view will appear in a report generated by the KNIME Reporting Extension. Only set the flag to True when the view’s JavaScript provides a static representation to the knime-ui-extension-service ReportingService.
- knime.scripting.io.view_svg(svg: str) NodeView
Create a NodeView that shows an SVG and is report-ready out of the box.
- Parameters:
svg (str) – SVG markup (must include the <svg ...> root element with the XML namespace declaration).
- knime.scripting.io.view_png(png: bytes) NodeView
Create a NodeView that shows a PNG image and is report-ready out of the box.
- Parameters:
png (bytes) – Raw PNG data.
- knime.scripting.io.view_jpeg(jpeg: bytes) NodeView
Create a NodeView that shows a JPEG image and is report-ready out of the box.
- Parameters:
jpeg (bytes) – Raw JPEG data.
- knime.scripting.io.view_ipy_repr(obj) NodeView
Create a NodeView by using the IPython _repr_*_ function of the object.
Tries to use:
_repr_html_
_repr_svg_
_repr_png_
_repr_jpeg_
in this order.
- Parameters:
obj (object) – The object which should be displayed.
- Raises:
ValueError – If no view could be created for the given object.
- class knime.scripting.io.NodeView(html: str, svg_or_png: str | bytes | None = None, render_fn: Callable[[], str | bytes] | None = None, can_be_used_in_report: bool = False)
A wrapper that embeds a visualisation in KNIME Analytics Platform.
Notes
Do not instantiate NodeView directly; use one of the helper functions (view(), view_html(), view_svg(), view_png(), view_jpeg(), …).
- Parameters:
html (str) – Self-contained HTML snippet that renders the interactive view inside KNIME AP.
svg_or_png (str | bytes | None, optional) – Static SVG or PNG/JPEG bytes handed to the Reporting engine when a report is generated. If None, the value is produced lazily via render_fn.
render_fn (Callable[[], str | bytes] | None, optional) – Callback that returns svg_or_png on-demand.
can_be_used_in_report (bool, default False) – Set True to signal that this view can be embedded in a report produced by the KNIME Reporting Extension.
Image helpers (view_svg, view_png, view_jpeg, view_matplotlib, …) turn the flag on automatically because they always provide the image to the Reporting engine.
HTML helpers leave the flag False by default. Only enable it when the view’s JavaScript sends a static representation to the ReportingService using reportingService.setReportingContent(...). If you are not in control of the JavaScript or cannot ensure this, keep the flag False so that the view is omitted from the report instead of breaking it.
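The interplay of svg_or_png, render_fn and can_be_used_in_report can be pictured as a lazily computed snapshot. The dataclass below is a hypothetical model for illustration only, since NodeView itself should not be instantiated directly. In this model, the static image is computed at most once and only when a report asks for it, and views without the flag contribute nothing.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union

Snapshot = Union[str, bytes]  # SVG markup or PNG/JPEG bytes


@dataclass
class ViewModel:
    """Hypothetical stand-in mirroring NodeView's constructor parameters."""

    html: str
    svg_or_png: Optional[Snapshot] = None
    render_fn: Optional[Callable[[], Snapshot]] = None
    can_be_used_in_report: bool = False

    def report_snapshot(self) -> Optional[Snapshot]:
        """Return the static image for a report, or None if the view is omitted."""
        if not self.can_be_used_in_report:
            return None  # omitted from the report instead of breaking it
        if self.svg_or_png is None and self.render_fn is not None:
            self.svg_or_png = self.render_fn()  # computed on demand, then cached
        return self.svg_or_png
```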
Utility functions
- knime.scripting.io.get_workflow_temp_dir() str
Returns the local absolute path where temporary files for this workflow should be stored. Files created in this folder are not automatically deleted by KNIME.
By default, this folder is located in the operating system’s temporary folder. In that case, the contents will be cleaned by the OS.
- knime.scripting.io.get_workflow_data_area_dir() str
Returns the local absolute path to the current workflow’s data area folder. This folder is meant to be part of the workflow, so its contents are included whenever the workflow is shared.
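Because KNIME does not delete files in the workflow temp folder, a script that writes intermediate files there may want to clean up after itself. This is a minimal sketch using only the standard library. scratch_file is a hypothetical helper, and tempfile.gettempdir() stands in for knio.get_workflow_temp_dir() so the snippet runs outside KNIME.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def scratch_file(directory: str, suffix: str = ".tmp") -> Iterator[str]:
    """Hypothetical helper: create a uniquely named file in directory, delete it afterwards."""
    fd, path = tempfile.mkstemp(suffix=suffix, dir=directory)
    os.close(fd)  # only the path is needed; the file is reopened by name below
    try:
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)


# Outside KNIME, tempfile.gettempdir() stands in for knio.get_workflow_temp_dir().
temp_dir = tempfile.gettempdir()
with scratch_file(temp_dir, suffix=".csv") as path:
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("x,y\n1,2\n")
    with open(path, encoding="utf-8") as fh:
        content = fh.read()
```

Files that should travel with the workflow belong in get_workflow_data_area_dir() instead, and are not deleted this way.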