ML Experiments

Track all your machine learning experiments in one place. Monitor metrics, compare runs, and manage model versions.

Creating an Experiment

Start Tracking

Click ML Experiments in the sidebar
Click New Experiment
Give your experiment a name (e.g., "Customer Churn Model")
Optionally add a description

Logging Metrics

From your notebook, log metrics as training progresses:

from credvault import experiment

# Start tracking
exp = experiment.start('Customer Churn Model')

# Log metrics during training
# (model, data, and test_data are assumed to be defined earlier in the notebook)
for epoch in range(10):
    loss = model.train_step(data)
    exp.log_metric('loss', loss, step=epoch)

    accuracy = model.evaluate(test_data)
    exp.log_metric('accuracy', accuracy, step=epoch)

# Log final results, using the values from the last epoch
exp.log_metric('final_accuracy', accuracy)
exp.log_metric('final_loss', loss)

Tracking Runs

What Gets Recorded

Each run captures:

Metrics - Accuracy, loss, precision, recall, etc.
Parameters - Learning rate, batch size, model architecture
Files - Model weights, artifacts, outputs
Metadata - Start time, duration, status

Logging Parameters

Track the configuration of each run:

exp.log_params({
    'learning_rate': 0.001,
    'batch_size': 32,
    'epochs': 100,
    'optimizer': 'adam',
    'loss_function': 'crossentropy'
})
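
One way to keep the logged parameters in sync with the values your training code actually uses is to define them once and derive the dictionary from that definition. The sketch below assumes a hypothetical TrainConfig dataclass (it is not part of credvault); the resulting dictionary has the same shape as the one passed to exp.log_params above.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical settings container; field names mirror the example above."""
    learning_rate: float = 0.001
    batch_size: int = 32
    epochs: int = 100
    optimizer: str = "adam"
    loss_function: str = "crossentropy"


# Override only the values that differ for this run.
config = TrainConfig(batch_size=64)

# asdict() turns the dataclass into a plain dict of field names to values.
params = asdict(config)

# With an active experiment in your notebook, the dict could then be logged:
# exp.log_params(params)
print(params)
```

Because the training code and the log call read from the same object, a changed value cannot be logged under a stale number.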

Saving Artifacts

Save model files and results:

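import matplotlib.pyplot as plt

# Save trained model (assumes the model object provides a save() method)
model.save('model.pkl')
exp.log_artifact('model.pkl')

# Save predictions (assumes predictions is a pandas DataFrame)
predictions.to_csv('predictions.csv')
exp.log_artifact('predictions.csv')

# Save visualizations (assumes the plot is already drawn on the current figure)
plt.savefig('confusion_matrix.png')
exp.log_artifact('confusion_matrix.png')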
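
The snippet above assumes your model object has its own save method. Where it does not, the standard library's pickle module is one way to write the file first. This is a minimal sketch: a plain dictionary stands in for a trained model, and the exp.log_artifact call is left as a comment because it needs an active experiment.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for a trained model; any picklable object is handled the same way.
model = {"weights": [0.4, -1.2, 0.7], "bias": 0.1}

# Write into a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.pkl"
    with open(model_path, "wb") as f:
        pickle.dump(model, f)

    # With an active experiment, the file could now be logged:
    # exp.log_artifact(str(model_path))

    # Read the file back to confirm the round trip works.
    with open(model_path, "rb") as f:
        restored = pickle.load(f)

print(restored == model)
```

Only load pickle files that come from a source you trust, since unpickling can execute arbitrary code.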

Comparing Experiments

View All Runs

All runs in an experiment appear in a table that shows:

Metrics columns
Parameters columns
Status and duration

Click any run to open its details.

Side-by-Side Comparison

Compare two runs:

Select Compare from the toolbar
Choose two runs to compare
See metric differences highlighted
View parameter differences
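
The same parameter comparison can be reproduced in a notebook once you have the parameters of two runs as dictionaries. The sketch below is plain Python and does not call credvault; diff_params is a hypothetical helper, and the two dictionaries are illustrative values.

```python
def diff_params(params_a, params_b):
    """Return {name: (value_in_a, value_in_b)} for parameters that differ.

    A parameter missing from one run is reported as None on that side.
    """
    changed = {}
    for name in sorted(params_a.keys() | params_b.keys()):
        a, b = params_a.get(name), params_b.get(name)
        if a != b:
            changed[name] = (a, b)
    return changed


run_a = {"learning_rate": 0.001, "batch_size": 32, "optimizer": "adam"}
run_b = {"learning_rate": 0.01, "batch_size": 32, "optimizer": "adam", "dropout": 0.2}

differences = diff_params(run_a, run_b)
for name, (a, b) in differences.items():
    print(f"{name}: {a} -> {b}")
```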

Finding Best Run

Filter runs by metric to find the best one:

Best Accuracy - Highest accuracy value
Lowest Loss - Minimum loss value
Fastest - Shortest training time
Custom - Define your own criteria
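
If you prefer to apply these criteria in code, each one reduces to a max or min over the runs. The sketch below uses a hand-written list of run records with illustrative values; the field names (run, accuracy, loss, duration_s) are assumptions, not the platform's export format.

```python
# Illustrative run records; the values are made up for the example.
runs = [
    {"run": "run-1", "accuracy": 0.88, "loss": 0.31, "duration_s": 420},
    {"run": "run-2", "accuracy": 0.92, "loss": 0.18, "duration_s": 610},
    {"run": "run-3", "accuracy": 0.90, "loss": 0.22, "duration_s": 380},
]

best_accuracy = max(runs, key=lambda r: r["accuracy"])
lowest_loss = min(runs, key=lambda r: r["loss"])
fastest = min(runs, key=lambda r: r["duration_s"])

# Custom criterion: the fastest run among those reaching at least 0.90 accuracy.
good_enough = [r for r in runs if r["accuracy"] >= 0.90]
custom = min(good_enough, key=lambda r: r["duration_s"])

print(best_accuracy["run"], lowest_loss["run"], fastest["run"], custom["run"])
```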

Managing Models

Version Your Models

Tag important runs as model versions:

Find a successful run
Click Save as Model
Give it a version name (e.g., "v1.0-production")
Add a description

Model Registry

Your saved models appear in the Model Registry:

View all versions
See metrics for each version
Download model files
Promote to production

Visualizations

Metric Graphs

Charts are generated automatically from your logged metrics:

Line graphs - Metric trends over time
Scatter plots - Relationship between metrics
Parallel coordinates - Compare multiple runs

Custom Analysis

Export data for custom analysis:

import matplotlib.pyplot as plt

# Download experiment data
# (assumes get_runs() returns a table with one row per run and
# one column per parameter and metric)
data = exp.get_runs()

# Plot custom comparison
plt.scatter(data['learning_rate'], data['accuracy'])
plt.xlabel('Learning Rate')
plt.ylabel('Accuracy')
plt.show()

Team Collaboration

Click Share on the experiment
Select team members
Choose permission level:
- View - See results only
- Comment - Add notes and insights
- Edit - Create new runs

Notebooks and Experiments

Link experiments to your notebooks:

Run the notebook to log results to the experiment automatically
Share the notebook with your team
Your team sees both code and results together

Best Practices

Naming Convention

Use clear, descriptive names:

Good:
- "Customer_Churn_v2_GridSearch"
- "NLP_Sentiment_Adam_LR0.001"

Avoid:
- "test1", "experiment2", "model_final"
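
When many people create experiments, a small helper can keep names consistent. The sketch below is one possible convention: experiment_name is a hypothetical function, not part of credvault, and it simply joins the descriptive parts with underscores.

```python
def experiment_name(*parts):
    """Join descriptive parts into one name, using underscores instead of spaces."""
    cleaned = [part.strip().replace(" ", "_") for part in parts if part.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty name part is required")
    return "_".join(cleaned)


print(experiment_name("Customer Churn", "v2", "GridSearch"))
print(experiment_name("NLP Sentiment", "Adam", "LR0.001"))
```

The returned string can then be used wherever an experiment name is expected, for example as the argument to experiment.start.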

Log Everything

Log parameters that might matter later:

exp.log_params({
    'data_version': '2024_Q1',
    'train_test_split': 0.8,
    'random_seed': 42,
    'data_augmentation': True
})
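
Details about the environment are easy to forget and hard to recover later. The sketch below gathers a few of them with the standard library and seeds Python's random module at the same time; reproducibility_params is a hypothetical helper, and the key names are only suggestions.

```python
import platform
import random
import sys


def reproducibility_params(seed):
    """Seed Python's random module and return details worth logging with the run."""
    random.seed(seed)
    return {
        "random_seed": seed,
        "python_version": platform.python_version(),
        "platform": sys.platform,
    }


params = reproducibility_params(42)

# With an active experiment, these could be logged alongside your other parameters:
# exp.log_params(params)
print(params)
```

Libraries such as NumPy or your training framework keep their own random state, so they need to be seeded separately.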

Organize Experiments

Group related experiments:

Use experiment folders
Add tags: #urgent, #production, #testing
Write clear descriptions

Common Workflows

Hyperparameter Tuning

Create experiment "HPO_Search"
For each parameter combination:
- Log parameters
- Train model
- Log metrics
Compare all runs
Select best parameters
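
The loop in the steps above can be sketched as a simple grid search. This example runs without credvault: train_and_score is a hypothetical stand-in that returns a made-up score, and the comment marks where log_params and log_metric calls would go. How a separate run is opened for each combination is not covered in this guide, so that part is left out.

```python
from itertools import product


def train_and_score(learning_rate, batch_size):
    """Hypothetical stand-in for real training; returns a made-up validation score."""
    return round(1.0 - abs(learning_rate - 0.01) * 10 - abs(batch_size - 32) / 1000, 4)


grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [16, 32],
}

results = []
# product() yields every combination of the listed values.
for values in product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    # In a real run: log the parameters, train the model, then log the metrics, e.g.
    # exp.log_params(params) ... exp.log_metric('accuracy', score)
    score = train_and_score(**params)
    results.append({"params": params, "score": score})

best = max(results, key=lambda r: r["score"])
print(best["params"], best["score"])
```

A full grid grows multiplicatively with each added parameter, so for larger searches you may prefer to sample combinations at random.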

Model Comparison

Create a separate experiment for each approach
Train multiple models
Compare them side by side
Choose the best one

Production Baseline

Create "Production_Baseline" experiment
Log current production model metrics
Compare new experiments against baseline
Promote a new model only if its improvement over the baseline is significant
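
What counts as significant is up to your team. One simple option is to require a minimum gain over the baseline before promoting. The sketch below assumes accuracy as the deciding metric and a threshold of 0.01; both are illustrative choices, and should_promote is a hypothetical helper rather than a platform feature.

```python
def should_promote(baseline_accuracy, candidate_accuracy, min_gain=0.01):
    """Return True when the candidate beats the baseline by at least min_gain."""
    return (candidate_accuracy - baseline_accuracy) >= min_gain


print(should_promote(0.90, 0.92))   # clear improvement
print(should_promote(0.90, 0.905))  # gain is below the threshold
print(should_promote(0.90, 0.89))   # worse than the baseline
```

A fixed threshold does not account for run-to-run noise. Repeating each configuration with several random seeds and comparing the averages gives a more reliable signal.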

Troubleshooting

Metrics not appearing

Ensure you're logging to an active experiment
Check that metric names are spelled consistently
Verify that metric values are numbers, not strings or None
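
To catch non-numeric values before they are logged, you can pass each value through a small check first. The sketch below is plain Python; to_metric_value is a hypothetical helper, and whether the platform itself rejects or silently drops such values is not something this guide states.

```python
import math


def to_metric_value(value):
    """Return value as a float, or raise if it is not a finite number."""
    if value is None or isinstance(value, (str, bytes, bool)):
        raise TypeError(f"metric value must be numeric, got {type(value).__name__}")
    number = float(value)  # also accepts scalar types that implement __float__
    if not math.isfinite(number):
        raise ValueError(f"metric value must be finite, got {number}")
    return number


print(to_metric_value(0.18))
print(to_metric_value(3))

try:
    to_metric_value("0.92")
except TypeError as err:
    print(err)
```

With an active experiment, the call would then look like exp.log_metric('loss', to_metric_value(loss), step=epoch).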

Can't find my run

Try filtering by date range
Search by parameter values
Check whether the run is in a different experiment

Export data

Download as CSV for external analysis
Export model files
Generate reports
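
Once you have a CSV export, the standard library is enough for a first look. In the sketch below an inline string stands in for the downloaded file, and the column names (run, learning_rate, accuracy) and values are assumptions for illustration; check the header row of your own export.

```python
import csv
import io

# Stand-in for the contents of a downloaded export; column names are assumed.
exported = """run,learning_rate,accuracy
run-1,0.001,0.88
run-2,0.01,0.92
"""

rows = list(csv.DictReader(io.StringIO(exported)))

# CSV values arrive as strings, so convert them before comparing.
accuracies = [float(row["accuracy"]) for row in rows]
best = max(rows, key=lambda row: float(row["accuracy"]))

print(best["run"], max(accuracies))
```

For a real file, replace io.StringIO(exported) with open('your_export.csv', newline='').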

Related Features

Notebook - Run experiments here
AI Assistant - Get model suggestions
API Keys - Integrate with external ML tools