SAP HANA ML Python Client (hana-ml)
Package Version: 2.22.241011
Last Verified: 2025-11-27
Table of Contents
- Installation & Setup
- Quick Start
- Core Libraries
- Common Patterns
- Best Practices
- Bundled Resources
Installation & Setup
pip install hana-ml
Requirements: Python 3.8+, SAP HANA 2.0 SPS03+ or SAP HANA Cloud
Quick Start
Connection & DataFrame
from hana_ml import ConnectionContext

# Connect
conn = ConnectionContext(
    address='<hostname>',
    port=443,
    user='<username>',
    password='<password>',
    encrypt=True
)

# Create DataFrame
df = conn.table('MY_TABLE', schema='MY_SCHEMA')
print(f"Shape: {df.shape}")
df.head(10).collect()
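Hard-coded credentials in a connection call are easy to leak. The following is a minimal sketch, not part of hana-ml, that gathers the same keyword arguments from environment variables. The helper name and the HANA_* variable names are assumptions made for illustration; hana-ml does not define or read them.

```python
import os


def hana_connection_kwargs(env=None):
    """Build ConnectionContext keyword arguments from environment variables.

    The HANA_* names are illustrative assumptions, not something hana-ml reads.
    """
    env = os.environ if env is None else env
    required = ("HANA_ADDRESS", "HANA_USER", "HANA_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {
        "address": env["HANA_ADDRESS"],
        "port": int(env.get("HANA_PORT", "443")),  # default matches the port used in this guide
        "user": env["HANA_USER"],
        "password": env["HANA_PASSWORD"],
        "encrypt": True,
    }
```

With the variables set, the connection can then be opened as conn = ConnectionContext(**hana_connection_kwargs()).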
PAL Classification
from hana_ml.algorithms.pal.unified_classification import UnifiedClassification

# Train model
clf = UnifiedClassification(func='RandomDecisionTree')
clf.fit(train_df, features=['F1', 'F2', 'F3'], label='TARGET')

# Predict & evaluate
# predict() and score() also need key='<id_column>' unless the DataFrame has an index column set
predictions = clf.predict(test_df, features=['F1', 'F2', 'F3'])
score = clf.score(test_df, features=['F1', 'F2', 'F3'], label='TARGET')
APL AutoML
from hana_ml.algorithms.apl.classification import AutoClassifier

# Automated classification
auto_clf = AutoClassifier()
auto_clf.fit(train_df, label='TARGET')
predictions = auto_clf.predict(test_df)
Model Persistence
from hana_ml.model_storage import ModelStorage

ms = ModelStorage(conn)
clf.name = 'MY_CLASSIFIER'
ms.save_model(model=clf, if_exists='replace')
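A saved model is usually loaded back later by name. The sketch below wraps the save call together with ModelStorage.load_model in a hypothetical helper (save_and_reload is not a hana-ml function). It assumes the storage object behaves like hana_ml.model_storage.ModelStorage.

```python
def save_and_reload(storage, model, name):
    """Save `model` under `name`, then load it back from the same storage.

    `storage` is expected to behave like hana_ml.model_storage.ModelStorage;
    this helper itself is hypothetical and not part of hana-ml.
    """
    model.name = name  # the model is stored under its name attribute
    storage.save_model(model=model, if_exists="replace")
    return storage.load_model(name=name)  # no version given, so the latest is loaded
```

For example, restored = save_and_reload(ms, clf, 'MY_CLASSIFIER') would return a model object that can be used for prediction without refitting.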
Core Libraries
PAL (Predictive Analysis Library)
- 100+ algorithms executed in-database
- Categories: Classification, Regression, Clustering, Time Series, Preprocessing
- Key classes: UnifiedClassification, UnifiedRegression, KMeans, ARIMA
- See: references/PAL_ALGORITHMS.md for complete list
APL (Automated Predictive Library)
- AutoML capabilities with automatic feature engineering
- Key classes: AutoClassifier, AutoRegressor, GradientBoostingClassifier
- See: references/APL_ALGORITHMS.md for details
DataFrames
- Lazy evaluation: operations only build SQL; nothing runs until collect() is called
- In-database processing for optimal performance
- See: references/DATAFRAME_REFERENCE.md for complete API
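To make the lazy-evaluation idea concrete, here is a toy model of it in plain Python. LazyFrame and fake_executor are hypothetical stand-ins, not the hana-ml classes, and the SQL text they produce is only illustrative. Each operation wraps the previous statement in a new SELECT, and the statement is handed to the executor only when collect() is called.

```python
class LazyFrame:
    """Toy stand-in for a lazily evaluated DataFrame (not the hana-ml class)."""

    def __init__(self, select_statement, executor):
        self.select_statement = select_statement
        self._executor = executor  # callable that runs SQL and returns rows

    def filter(self, condition):
        # Wrap the current statement; nothing is sent to the database.
        sql = f"SELECT * FROM ({self.select_statement}) AS T WHERE {condition}"
        return LazyFrame(sql, self._executor)

    def head(self, n):
        sql = f"SELECT TOP {int(n)} * FROM ({self.select_statement}) AS T"
        return LazyFrame(sql, self._executor)

    def collect(self):
        # The only place where the statement is actually executed.
        return self._executor(self.select_statement)


executed = []  # records every statement that reaches the "database"


def fake_executor(sql):
    executed.append(sql)
    return []


df = LazyFrame('SELECT * FROM "MY_SCHEMA"."MY_TABLE"', fake_executor)
small = df.filter('"F1" > 0').head(10)  # builds nested SQL, runs nothing
rows = small.collect()                  # a single statement is executed here
```

The design point is that chaining filters and projections costs nothing until the result is needed, so only the final, reduced result set leaves the database.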
Visualizers
- EDA plots, model explanations, metrics
- SHAP integration for model interpretability
- See: references/VISUALIZERS.md for 14 visualization modules
Common Patterns
Train-Test Split
from hana_ml.algorithms.pal.partition import train_test_val_split

train, test, val = train_test_val_split(
    data=df,
    training_percentage=0.7,
    testing_percentage=0.2,
    validation_percentage=0.1
)
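When the three shares are computed rather than typed in, it can help to check them before calling the split. This is a small sketch under the assumption that the shares should cover the whole table, that is, sum to 1. The helper name split_shares is hypothetical. It compares with a tolerance because 0.7 + 0.2 + 0.1 is not exactly 1.0 in floating point.

```python
import math


def split_shares(training, testing, validation):
    """Validate three partition shares and return them as keyword arguments.

    Hypothetical helper: assumes each share lies in [0, 1] and that together
    they should sum to 1.
    """
    shares = (training, testing, validation)
    if any(s < 0 or s > 1 for s in shares):
        raise ValueError(f"each share must be between 0 and 1, got {shares}")
    if not math.isclose(sum(shares), 1.0, abs_tol=1e-9):
        raise ValueError(f"shares must sum to 1, got {sum(shares)}")
    return {
        "training_percentage": training,
        "testing_percentage": testing,
        "validation_percentage": validation,
    }
```

The result can be unpacked into the call, for example train_test_val_split(data=df, **split_shares(0.7, 0.2, 0.1)).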
Feature Importance
# APL models
importance = auto_clf.get_feature_importances()

# PAL models
from hana_ml.algorithms.pal.preprocessing import FeatureSelection
fs = FeatureSelection()
fs.fit(train_df, features=features, label='TARGET')
Pipeline
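from hana_ml.algorithms.pal.pipeline import Pipeline
from hana_ml.algorithms.pal.preprocessing import Imputer, FeatureNormalizer
from hana_ml.algorithms.pal.unified_classification import UnifiedClassification

pipeline = Pipeline([
    # Fill missing values: mode for categorical columns, mean for numeric ones
    ('imputer', Imputer(strategy='most_frequent-mean')),
    # Scale numeric features into [0, 1]
    ('normalizer', FeatureNormalizer(method='min-max', new_max=1.0, new_min=0.0)),
    ('classifier', UnifiedClassification(func='RandomDecisionTree'))
])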
Best Practices
- Use lazy evaluation: operations build SQL without executing it until collect() is called
- Leverage in-database processing: keep data in HANA for performance
- Use the Unified interfaces: they offer consistent APIs across algorithms
- Save models: use ModelStorage for persistence
- Explain predictions: use SHAP explainers for interpretability
- Monitor AutoML: use PipelineProgressStatusMonitor for long-running jobs
Bundled Resources
Reference Files
references/DATAFRAME_REFERENCE.md (479 lines)
- ConnectionContext API, DataFrame operations, SQL generation
references/PAL_ALGORITHMS.md (869 lines)
- Complete PAL algorithm reference (100+ algorithms)
- Classification, Regression, Clustering, Time Series, Preprocessing
references/APL_ALGORITHMS.md (534 lines)
- AutoML capabilities, automated feature engineering
- AutoClassifier, AutoRegressor, GradientBoosting classes
references/VISUALIZERS.md (704 lines)
- 14 visualization modules (EDA, SHAP, metrics, time series)
- Plot types, configuration, export options
references/SUPPORTING_MODULES.md (626 lines)
- Model storage, spatial analytics, graph algorithms
- Text mining, statistics, error handling
Error Handling
from hana_ml.ml_exceptions import Error

try:
    clf.fit(train_df, features=features, label='TARGET')
except Error as e:
    print(f"HANA ML Error: {e}")
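The same pattern can be wrapped so that a failed fit is reported without aborting a longer script. This is a minimal sketch, assuming only that the model exposes a fit method. The helper name try_fit and the HanaMLError alias are hypothetical. The import fallback exists only so the sketch also runs where hana-ml is not installed.

```python
try:
    from hana_ml.ml_exceptions import Error as HanaMLError
except ImportError:  # hana-ml not installed; fall back so the sketch still runs
    HanaMLError = Exception


def try_fit(model, *args, **kwargs):
    """Call model.fit and return (succeeded, error_message).

    Hypothetical helper: only hana-ml errors are caught; anything else propagates.
    """
    try:
        model.fit(*args, **kwargs)
    except HanaMLError as exc:
        return False, str(exc)
    return True, None
```

A call such as ok, message = try_fit(clf, train_df, features=features, label='TARGET') then lets the caller decide whether to retry, log, or stop.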
Documentation
- Official Docs: https://help.sap.com/doc/1d0ebfe5e8dd44d09606814d83308d4b/2.0.07/en-US/hana_ml.html
- PyPI Package: https://pypi.org/project/hana-ml/