I am trying to save the output from sklearn.svm.SVC training when verbose=True to a log file. However, since it uses LibSVM in the back end, I cannot figure out how to capture it. Copilot hasn't helped.

Here's a brief example. It isn't the exact problem I am trying to solve or the workflow, but gives the idea:

import numpy as np
import sklearn.datasets
import sklearn.model_selection
import sklearn.svm
import os

if __name__ == '__main__':
    breast_data = sklearn.datasets.load_breast_cancer()

    X = breast_data.data
    y = breast_data.target
   
    np_rand_state = np.random.RandomState(0)

    X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, test_size=0.33, random_state=np_rand_state)

    model = sklearn.svm.SVC(verbose=True)
    model.fit(X_train, y_train)

Here is the console output from model.fit():

*
optimization finished, #iter = 79
obj = -100.327399, rho = -0.702443
nSV = 114, nBSV = 109
Total nSV = 114

I want to save the console output to a log file, using Python's built-in logging functionality (logging). The output to the console is not produced by a simple print statement, but by the SVM backend of sklearn.svm.SVC. This means it is not as simple as redirecting print to a log file.

1 Answer

The verbose=True output from sklearn.svm.SVC comes from the underlying LibSVM C library, not from Python’s print() or logging.
That means the messages are written at the C level to stdout, so regular Python logging or contextlib.redirect_stdout won’t capture them.
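The difference can be seen without scikit-learn at all. In the small sketch below, os.write(1, ...) is only a stand-in for a C library writing to stdout (it is not what LibSVM literally calls); it bypasses sys.stdout in the same way, so redirect_stdout sees the print() call but not the descriptor-level write:

```python
import contextlib
import io
import os

buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    print("from Python print()")              # goes through sys.stdout, so it is captured
    os.write(1, b"from file descriptor 1\n")  # bypasses sys.stdout, so it is not captured

captured = buf.getvalue()
print(repr(captured))
```

Only the print() line ends up in captured; the other line still reaches the console.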

To log that output, you need to temporarily redirect the C-level stdout.
A convenient way to do that on Linux and macOS is the wurlitzer package, which captures C-level output as well as Python-level output.


Example using wurlitzer

import numpy as np
import sklearn.datasets
import sklearn.model_selection
import sklearn.svm
from wurlitzer import pipes

breast_data = sklearn.datasets.load_breast_cancer()
X = breast_data.data
y = breast_data.target

X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, test_size=0.33, random_state=0
)

model = sklearn.svm.SVC(verbose=True)

# Capture LibSVM output into a file
with open("svm_training.log", "w") as f, pipes(stdout=f, stderr=f):
    model.fit(X_train, y_train)

This will write all the training messages (the ones normally printed to the console, like optimization finished, #iter = ...) to svm_training.log.


If you can’t install wurlitzer

You can do a manual redirect of the underlying file descriptor:

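import os
import sys
import numpy as np
import sklearn.datasets
import sklearn.model_selection
import sklearn.svm

breast_data = sklearn.datasets.load_breast_cancer()
X = breast_data.data
y = breast_data.target

X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(
    X, y, test_size=0.33, random_state=0
)

model = sklearn.svm.SVC(verbose=True)

with open("svm_training.log", "w") as f:
    sys.stdout.flush()             # flush pending Python output before switching
    old_stdout = os.dup(1)         # keep a copy of the real stdout descriptor
    os.dup2(f.fileno(), 1)         # point file descriptor 1 at the log file
    try:
        model.fit(X_train, y_train)
    finally:
        sys.stdout.flush()
        os.dup2(old_stdout, 1)     # restore the real stdout
        os.close(old_stdout)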
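
Since the question asks for the logging module specifically, the same descriptor redirect can be pointed at a temporary file and the captured lines replayed through a logger afterwards. This is a minimal sketch, not a tested recipe for every platform: the helper name fd_stdout_to_logger and the logger name "svm" are hypothetical, and os.write(1, ...) stands in for model.fit(X_train, y_train) so that the example runs without scikit-learn.

```python
import contextlib
import logging
import os
import sys
import tempfile


@contextlib.contextmanager
def fd_stdout_to_logger(logger, level=logging.INFO):
    """Capture everything written to file descriptor 1 and log it line by line."""
    sys.stdout.flush()              # push pending Python output to the real stdout
    saved_fd = os.dup(1)            # keep a handle on the real stdout
    with tempfile.TemporaryFile() as tmp:
        os.dup2(tmp.fileno(), 1)    # fd 1 now points at the temporary file
        try:
            yield
        finally:
            sys.stdout.flush()      # Python prints made inside the block land in tmp too
            os.dup2(saved_fd, 1)    # restore the real stdout
            os.close(saved_fd)
            tmp.seek(0)
            text = tmp.read().decode("utf-8", errors="replace")
            for line in text.splitlines():
                if line.strip():    # skip blank lines
                    logger.log(level, line)


if __name__ == "__main__":
    logging.basicConfig(filename="svm_training.log", level=logging.INFO,
                        format="%(asctime)s %(name)s %(message)s")
    with fd_stdout_to_logger(logging.getLogger("svm")):
        # Stand-in for model.fit(X_train, y_train): writes straight to fd 1
        os.write(1, b"optimization finished, #iter = 79\n")
```

The lines are logged only after the with block exits, so they all carry the timestamp of the end of training rather than the moment each one was printed. If per-line timestamps matter, a pipe read from a background thread would be needed instead of a temporary file.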

1 Comment

I know about the LibSVM and the fact it is calling C++ stdout in the backend, I just could not figure out how to capture it. I've never heard of wurlitzer before, and even though it looks like what I need, it isn't available on Windows. And unfortunately, I cannot use a Linux setup (I wish I could, but that is beyond my control.... corporate work life). But, the other method of using os.dup and os.dup2 does work.
