ML Experiments

Track all your machine learning experiments in one place. Monitor metrics, compare runs, and manage model versions.

Creating an Experiment

Start Tracking

  1. Click ML Experiments in the sidebar
  2. Click New Experiment
  3. Give your experiment a name (e.g., "Customer Churn Model")
  4. Optionally add a description

Logging Metrics

Log metrics from your notebook as training progresses:

from credvault import experiment

# Assumes model, data, and test_data are already defined in the notebook

# Start tracking
exp = experiment.start('Customer Churn Model')

# Log metrics during training, one value per epoch
for epoch in range(10):
    loss = model.train_step(data)
    exp.log_metric('loss', loss, step=epoch)

    accuracy = model.evaluate(test_data)
    exp.log_metric('accuracy', accuracy, step=epoch)

# Log final results (example values)
exp.log_metric('final_accuracy', 0.92)
exp.log_metric('final_loss', 0.18)

Tracking Runs

What Gets Recorded

Each run captures:

  • Metrics - Accuracy, loss, precision, recall, etc.
  • Parameters - Learning rate, batch size, model architecture
  • Files - Model weights, artifacts, outputs
  • Metadata - Start time, duration, status
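As a rough mental model, a run can be pictured as one record with these four groups of fields. The sketch below is illustrative only: `RunRecord` and all of its field and method names are hypothetical and are not part of the credvault API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class RunRecord:
    """Hypothetical stand-in for what a single run captures."""
    # Metrics: name -> list of (step, value) pairs
    metrics: dict = field(default_factory=dict)
    # Parameters: configuration values fixed for the run
    params: dict = field(default_factory=dict)
    # Files: paths of saved artifacts
    files: list = field(default_factory=list)
    # Metadata: start time, end time, status
    started_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    finished_at: Optional[datetime] = None
    status: str = "running"

    def log_metric(self, name, value, step=None):
        """Append one (step, value) pair to the history of a metric."""
        self.metrics.setdefault(name, []).append((step, float(value)))

    def finish(self, status="completed"):
        """Record the end time and final status."""
        self.finished_at = datetime.now(timezone.utc)
        self.status = status

    @property
    def duration_seconds(self):
        """Elapsed time in seconds, or None while the run is still active."""
        if self.finished_at is None:
            return None
        return (self.finished_at - self.started_at).total_seconds()


run = RunRecord(params={"learning_rate": 0.001, "batch_size": 32})
run.log_metric("loss", 0.42, step=0)
run.log_metric("loss", 0.31, step=1)
run.files.append("model.pkl")
run.finish()
```

Keeping metrics as step-indexed histories and parameters as single values mirrors the split in the list above: metrics change during training, parameters do not.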

Logging Parameters

Track the configuration of each run:

exp.log_params({
    'learning_rate': 0.001,
    'batch_size': 32,
    'epochs': 100,
    'optimizer': 'adam',
    'loss_function': 'crossentropy'
})
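Training configurations are often nested, while the example above passes a flat dictionary. A minimal sketch of flattening a nested config first, assuming `log_params` expects flat key/value pairs as shown; `flatten_params` is a hypothetical helper, not a credvault function.

```python
def flatten_params(config, parent_key="", sep="."):
    """Flatten a nested dict into a single-level dict with dotted keys."""
    flat = {}
    for key, value in config.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into sub-dicts, carrying the dotted prefix along
            flat.update(flatten_params(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


config = {
    "optimizer": {"name": "adam", "learning_rate": 0.001},
    "batch_size": 32,
    "epochs": 100,
}
flat_params = flatten_params(config)
# flat_params is the dictionary that would be passed to exp.log_params(...)
```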

Saving Artifacts

Save model files and results:

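import matplotlib.pyplot as plt

# Assumes model and predictions (a DataFrame) come from your training code

# Save trained model
model.save('model.pkl')
exp.log_artifact('model.pkl')

# Save predictions
predictions.to_csv('predictions.csv')
exp.log_artifact('predictions.csv')

# Save visualizations (the current matplotlib figure)
plt.savefig('confusion_matrix.png')
exp.log_artifact('confusion_matrix.png')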
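Not every model object has a `save` method. For a plain Python object, one option is to write the file yourself before logging it. The sketch below uses the standard library only; the toy `model` dictionary is a stand-in, and whether `log_artifact` accepts a `Path` or needs a string is an assumption to check.

```python
import pickle
import tempfile
from pathlib import Path

# Toy stand-in for a trained model
model = {"weights": [0.1, 0.2, 0.3], "bias": 0.05}

# Write the model to a temporary directory
output_dir = Path(tempfile.mkdtemp())
model_path = output_dir / "model.pkl"
with open(model_path, "wb") as f:
    pickle.dump(model, f)

# str(model_path) is the file that would be handed to exp.log_artifact(...)

# Reload the file to confirm it round-trips before relying on it
with open(model_path, "rb") as f:
    restored = pickle.load(f)
```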

Comparing Experiments

View All Runs

The runs table lists every run in an experiment, showing:

  • Metric columns
  • Parameter columns
  • Status and duration

Click any run to see its details.

Side-by-Side Comparison

Compare two runs:

  1. Select Compare from the toolbar
  2. Choose two runs to compare
  3. See metric differences highlighted
  4. View parameter differences
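The same comparison can be reproduced offline once two runs are available as dictionaries. A minimal sketch, assuming each run is represented as a dict with `params` and `metrics` entries; that layout and the `diff_runs` helper are hypothetical, not the platform's export format.

```python
def diff_runs(run_a, run_b):
    """Return (changed parameters, metric deltas) between two runs.

    Changed parameters map name -> (value in A, value in B).
    Metric deltas are B minus A, for metrics present in both runs.
    """
    param_keys = set(run_a["params"]) | set(run_b["params"])
    param_diff = {
        key: (run_a["params"].get(key), run_b["params"].get(key))
        for key in sorted(param_keys)
        if run_a["params"].get(key) != run_b["params"].get(key)
    }
    shared_metrics = set(run_a["metrics"]) & set(run_b["metrics"])
    metric_delta = {
        key: run_b["metrics"][key] - run_a["metrics"][key]
        for key in sorted(shared_metrics)
    }
    return param_diff, metric_delta


run_a = {"params": {"learning_rate": 0.001, "batch_size": 32},
         "metrics": {"accuracy": 0.90, "loss": 0.25}}
run_b = {"params": {"learning_rate": 0.01, "batch_size": 32},
         "metrics": {"accuracy": 0.92, "loss": 0.18}}

param_diff, metric_delta = diff_runs(run_a, run_b)
```

Here only `learning_rate` differs between the two runs, so it is the only entry in `param_diff`; `batch_size` is omitted because it is identical.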

Finding Best Run

Sort or filter runs by metric to find the best one:

  • Best Accuracy - Highest accuracy value
  • Lowest Loss - Minimum loss value
  • Fastest - Shortest training time
  • Custom - Define your own criteria
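Each criterion above amounts to a `max` or `min` over the runs. A minimal sketch with made-up run data; the field names (`accuracy`, `loss`, `duration_s`) and the custom rule are assumptions chosen for illustration.

```python
# Made-up runs for illustration
runs = [
    {"name": "run-1", "accuracy": 0.89, "loss": 0.30, "duration_s": 120},
    {"name": "run-2", "accuracy": 0.92, "loss": 0.18, "duration_s": 300},
    {"name": "run-3", "accuracy": 0.91, "loss": 0.21, "duration_s": 95},
]

best_accuracy = max(runs, key=lambda r: r["accuracy"])
lowest_loss = min(runs, key=lambda r: r["loss"])
fastest = min(runs, key=lambda r: r["duration_s"])

# Custom criterion: the fastest run among those with accuracy of at least 0.90
eligible = [r for r in runs if r["accuracy"] >= 0.90]
custom_best = min(eligible, key=lambda r: r["duration_s"])
```

A custom criterion like this one is useful when the most accurate run is also the slowest and a small accuracy trade-off is acceptable.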

Managing Models

Version Your Models

Tag important runs as model versions:

  1. Find a successful run
  2. Click Save as Model
  3. Give it a version name (e.g., "v1.0-production")
  4. Add a description

Model Registry

Your saved models appear in the Model Registry:

  • View all versions
  • See metrics for each version
  • Download model files
  • Promote to production

Visualizations

Metric Graphs

Charts are generated automatically:

  • Line graphs - Metric trends over time
  • Scatter plots - Relationship between metrics
  • Parallel coordinates - Compare multiple runs

Custom Analysis

Export data for custom analysis:

import matplotlib.pyplot as plt

# Download experiment data
data = exp.get_runs()

# Plot a custom comparison of learning rate against accuracy
plt.scatter(data['learning_rate'], data['accuracy'])
plt.xlabel('Learning Rate')
plt.ylabel('Accuracy')
plt.show()

Team Collaboration

Sharing Experiments

  1. Click Share on the experiment
  2. Select team members
  3. Choose permission level:
    • View - See results only
    • Comment - Add notes and insights
    • Edit - Create new runs

Notebooks and Experiments

Link experiments to your notebooks:

  1. Run the notebook; results are logged to the experiment automatically
  2. Share the notebook with your team
  3. Your team sees the code and the results together

Best Practices

Naming Convention

Use clear, descriptive names:

Good:
- "Customer_Churn_v2_GridSearch"
- "NLP_Sentiment_Adam_LR0.001"

Avoid:
- "test1", "experiment2", "model_final"
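Building names from the same parts every time keeps them consistent across a team. A minimal sketch; `experiment_name` is a hypothetical helper, and the project, version, details pattern is one possible convention inferred from the "Good" examples above.

```python
def experiment_name(project, version, *details):
    """Join project, version, and optional details into one underscore-separated name."""
    parts = [project, f"v{version}", *details]
    # Replace spaces so the name stays a single token
    return "_".join(str(part).strip().replace(" ", "_") for part in parts)


name = experiment_name("Customer Churn", 2, "GridSearch")
```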

Log Everything

Log parameters that might matter later:

exp.log_params({
    'data_version': '2024_Q1',
    'train_test_split': 0.8,
    'random_seed': 42,
    'data_augmentation': True
})
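Logging the seed and split ratio pays off because the same values reproduce the same data split later. A minimal sketch using only the standard library; `split_indices` is a hypothetical helper, and it treats `train_test_split` as the training fraction, which is an assumption about what 0.8 means above.

```python
import random

params = {"train_test_split": 0.8, "random_seed": 42}


def split_indices(n_samples, train_fraction, seed):
    """Shuffle sample indices with a fixed seed and split them into train and test."""
    rng = random.Random(seed)  # local generator, leaves global random state alone
    indices = list(range(n_samples))
    rng.shuffle(indices)
    cut = round(n_samples * train_fraction)
    return indices[:cut], indices[cut:]


train_idx, test_idx = split_indices(
    100, params["train_test_split"], params["random_seed"]
)

# Re-running with the logged values gives the identical split
train_again, test_again = split_indices(
    100, params["train_test_split"], params["random_seed"]
)
```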

Organize Experiments

Group related experiments:

  • Use experiment folders
  • Add tags: #urgent, #production, #testing
  • Write clear descriptions

Common Workflows

Hyperparameter Tuning

  1. Create an experiment named "HPO_Search"
  2. For each parameter combination:
    • Log parameters
    • Train model
    • Log metrics
  3. Compare all runs
  4. Select best parameters
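The loop in step 2 can be sketched with `itertools.product`. This is a minimal sketch under stated assumptions: `train_and_evaluate` is a toy stand-in with a made-up loss surface, and the comments mark where the `log_params` and `log_metric` calls shown earlier would go. How the platform separates one run from the next is not covered here, so that part is left out.

```python
from itertools import product


def train_and_evaluate(learning_rate, batch_size):
    """Toy stand-in for training; returns a made-up validation loss."""
    return (learning_rate - 0.01) ** 2 + 0.001 * abs(batch_size - 64) / 64


grid = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [32, 64],
}

results = []
for values in product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    # In a notebook: exp.log_params(params)
    loss = train_and_evaluate(**params)
    # In a notebook: exp.log_metric('loss', loss)
    results.append({"params": params, "loss": loss})

# Steps 3 and 4: compare all combinations and keep the one with the lowest loss
best = min(results, key=lambda r: r["loss"])
```

A full grid grows multiplicatively with each added parameter, so for larger searches a random sample of combinations is a common alternative.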

Model Comparison

  1. Create a separate experiment for each approach
  2. Train the models
  3. Compare them side by side
  4. Choose the best one

Production Baseline

  1. Create a "Production_Baseline" experiment
  2. Log current production model metrics
  3. Compare new experiments against baseline
  4. Promote a new model only if the improvement is significant
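One simple way to make the final step concrete is a minimum margin over the baseline. A minimal sketch; `should_promote` and the 0.01 default margin are assumptions for illustration. A fixed margin is a simplification: judging statistical significance would need repeated runs, for example with different seeds.

```python
def should_promote(baseline_accuracy, candidate_accuracy, min_improvement=0.01):
    """Return True if the candidate beats the baseline by at least min_improvement."""
    return (candidate_accuracy - baseline_accuracy) >= min_improvement


baseline = 0.90
small_gain = should_promote(baseline, 0.905)  # gain of 0.005, below the margin
clear_gain = should_promote(baseline, 0.93)   # gain of 0.03, above the margin
```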

Troubleshooting

Metrics not appearing

  • Ensure you're inside an active experiment
  • Check metric names are spelled correctly
  • Verify metric values are numbers
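The last check can be automated before each logging call. A minimal sketch; `is_loggable_metric` is a hypothetical helper, and rejecting NaN and infinity is an extra precaution, since how the platform treats those values is not stated here.

```python
import math
from numbers import Real


def is_loggable_metric(value):
    """Return True for finite real numbers; reject bools, strings, None, NaN, and inf."""
    # bool is a subclass of int, so it is excluded explicitly
    if isinstance(value, bool) or not isinstance(value, Real):
        return False
    return math.isfinite(value)


checks = {
    "float": is_loggable_metric(0.92),
    "int": is_loggable_metric(3),
    "string": is_loggable_metric("0.92"),
    "none": is_loggable_metric(None),
    "nan": is_loggable_metric(float("nan")),
}
```

Values that come back from a training framework as tensor or array objects may need converting with `float(...)` before they pass a check like this.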

Can't find my run

  • Try filtering by date range
  • Search by parameter values
  • Check whether the run is in a different experiment

Export data

  • Download as CSV for external analysis
  • Export model files
  • Generate reports
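Once downloaded, a CSV export can be read with the standard library. A minimal sketch; the column names and the inline sample text are assumptions standing in for a real export file, whose layout may differ.

```python
import csv
import io

# Stand-in for the contents of a downloaded export file (columns are assumed)
exported_csv = """run,learning_rate,accuracy
run-1,0.001,0.89
run-2,0.01,0.92
"""

rows = list(csv.DictReader(io.StringIO(exported_csv)))

# The csv module returns strings, so numeric columns are converted explicitly
for row in rows:
    row["learning_rate"] = float(row["learning_rate"])
    row["accuracy"] = float(row["accuracy"])

best = max(rows, key=lambda r: r["accuracy"])
```

For a file on disk, replace the `io.StringIO(...)` wrapper with `open(path, newline="")`.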