Model Management

The Model Management API lets you version your trained AI models, edit their metadata, and prepare them for deployment. It serves as the central registry for all models produced by your training jobs.

List Models

Retrieve a list of trained models for your account.

curl -X GET "https://api.tensorone.ai/v2/training/models" \
  -H "Authorization: Bearer YOUR_API_KEY"

Query Parameters

type: Filter by model type (llm, vision, multimodal, custom)
status: Filter by status (training, ready, deployed, archived)
trainingJobId: Filter by originating training job
limit: Number of models to return (1-100, default: 50)
offset: Number of models to skip for pagination
sort: Field to sort by (created_at, updated_at, name, size)
order: Sort direction (asc, desc, default: desc)
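
The limit and offset parameters combine with the pagination.hasMore field in the response to walk the full model list. The following is a minimal Python sketch, assuming a caller-supplied fetch_page(limit, offset) function (a hypothetical helper, not part of any SDK) that performs the GET request and returns the decoded JSON body:

```python
from typing import Callable, Dict, Iterator


def iter_models(fetch_page: Callable[[int, int], Dict], limit: int = 50) -> Iterator[Dict]:
    """Yield every model by following limit/offset pagination.

    fetch_page is a hypothetical callable that issues
    GET /v2/training/models?limit=...&offset=... and returns the parsed JSON.
    """
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        models = page.get("models", [])
        yield from models
        # Stop when the server reports no further pages, or on an empty page
        # (guards against looping forever if hasMore is never cleared).
        if not page.get("pagination", {}).get("hasMore") or not models:
            break
        offset += len(models)
```

Injecting the fetch function keeps the paging logic independent of the HTTP client, so it can be exercised against canned responses.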

Response

{
  "models": [
    {
      "id": "model_1234567890abcdef",
      "name": "llama-7b-customer-support",
      "type": "llm",
      "status": "ready",
      "baseModel": "meta-llama/Llama-2-7b-hf",
      "version": "v1.0.0",
      "trainingJobId": "job_1234567890abcdef",
      "size": {
        "parameters": 7000000000,
        "bytes": 13421772800,
        "compressed": 6710886400
      },
      "metrics": {
        "finalLoss": 0.234,
        "accuracy": 0.892,
        "perplexity": 12.45
      },
      "deployments": {
        "active": 2,
        "endpoints": [
          "ep_prod_support_chat",
          "ep_staging_support_chat"
        ]
      },
      "createdAt": "2024-01-15T14:30:00Z",
      "updatedAt": "2024-01-15T16:45:00Z"
    }
  ],
  "pagination": {
    "total": 8,
    "limit": 50,
    "offset": 0,
    "hasMore": false
  }
}
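
The size fields are raw counts, which are hard to read at a glance. As a small illustration, the Python sketch below converts the byte counts from the sample response to GiB and derives the compression ratio. The helper names are illustrative only.

```python
def to_gib(num_bytes: int) -> float:
    """Convert a byte count to gibibytes (1 GiB = 2**30 bytes)."""
    return num_bytes / 2**30


def compression_ratio(size: dict) -> float:
    """Fraction of the original size that remains after compression."""
    return size["compressed"] / size["bytes"]


# Values copied from the sample response above.
size = {"parameters": 7000000000, "bytes": 13421772800, "compressed": 6710886400}

print(f"{to_gib(size['bytes']):.2f} GiB uncompressed")      # 12.50 GiB uncompressed
print(f"{to_gib(size['compressed']):.2f} GiB compressed")   # 6.25 GiB compressed
print(f"ratio: {compression_ratio(size):.2f}")              # ratio: 0.50
```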

Get Model Details

Retrieve detailed information about a specific model.

curl -X GET "https://api.tensorone.ai/v2/training/models/model_1234567890abcdef" \
  -H "Authorization: Bearer YOUR_API_KEY"

Response

{
  "id": "model_1234567890abcdef",
  "name": "llama-7b-customer-support",
  "type": "llm",
  "status": "ready",
  "description": "Fine-tuned LLaMA 7B model for customer support conversations",
  "baseModel": "meta-llama/Llama-2-7b-hf",
  "version": "v1.0.0",
  "trainingJobId": "job_1234567890abcdef",
  "datasetId": "ds_1234567890abcdef",
  "architecture": {
    "layers": 32,
    "attention_heads": 32,
    "hidden_size": 4096,
    "vocabulary_size": 32000,
    "context_length": 2048
  },
  "training": {
    "strategy": "lora",
    "parameters": {
      "rank": 16,
      "alpha": 32,
      "dropout": 0.1,
      "target_modules": ["q_proj", "v_proj", "k_proj", "o_proj"]
    },
    "epochs": 3,
    "final_learning_rate": 1.2e-5,
    "total_steps": 1875,
    "training_time": 10800
  },
  "size": {
    "parameters": 7000000000,
    "trainable_parameters": 4194304,
    "bytes": 13421772800,
    "compressed": 6710886400
  },
  "metrics": {
    "training": {
      "final_loss": 0.234,
      "best_loss": 0.198,
      "perplexity": 12.45
    },
    "validation": {
      "loss": 0.267,
      "accuracy": 0.892,
      "f1_score": 0.885,
      "bleu_score": 0.78
    },
    "custom": {
      "helpfulness_score": 8.7,
      "safety_score": 9.2,
      "coherence_score": 8.9
    }
  },
  "files": [
    {
      "name": "pytorch_model.bin",
      "size": 13421772800,
      "type": "model_weights",
      "checksum": "sha256:a1b2c3d4e5f6..."
    },
    {
      "name": "config.json",
      "size": 1024,
      "type": "configuration",
      "checksum": "sha256:b2c3d4e5f6a7..."
    },
    {
      "name": "tokenizer.json",
      "size": 2048,
      "type": "tokenizer",
      "checksum": "sha256:c3d4e5f6a7b8..."
    }
  ],
  "deployments": [
    {
      "id": "ep_prod_support_chat",
      "name": "Production Support Chat",
      "status": "active",
      "url": "https://api.tensorone.ai/v2/ep_prod_support_chat/runsync",
      "workers": 3,
      "createdAt": "2024-01-15T16:00:00Z"
    }
  ],
  "tags": ["customer-support", "production", "llama"],
  "metadata": {
    "domain": "customer_support",
    "language": "en",
    "use_case": "conversational_ai",
    "quality_gate": "passed"
  },
  "createdAt": "2024-01-15T14:30:00Z",
  "updatedAt": "2024-01-15T16:45:00Z"
}
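
A few derived figures make the detail response easier to interpret. The Python sketch below uses the values from the sample response and assumes that training_time is expressed in seconds (the unit is not stated in the response itself).

```python
# Values copied from the sample response above.
size = {"parameters": 7000000000, "trainable_parameters": 4194304}
training = {"epochs": 3, "total_steps": 1875, "training_time": 10800}


def trainable_fraction(size: dict) -> float:
    """Share of all parameters that were updated during fine-tuning."""
    return size["trainable_parameters"] / size["parameters"]


def steps_per_epoch(training: dict) -> float:
    """Average number of optimizer steps in one epoch."""
    return training["total_steps"] / training["epochs"]


print(f"trainable: {trainable_fraction(size) * 100:.3f}%")      # trainable: 0.060%
print(f"steps per epoch: {steps_per_epoch(training):.0f}")      # steps per epoch: 625
# Assumption: training_time is in seconds.
print(f"training time: {training['training_time'] / 3600:.1f} h")  # training time: 3.0 h
```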

Update Model Metadata

Update model information, tags, and metadata.

curl -X PATCH "https://api.tensorone.ai/v2/training/models/model_1234567890abcdef" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "llama-7b-customer-support-v2",
    "description": "Updated fine-tuned model with improved safety",
    "tags": ["customer-support", "production", "llama", "v2"],
    "metadata": {
      "domain": "customer_support",
      "language": "en",
      "use_case": "conversational_ai",
      "quality_gate": "passed",
      "safety_review": "approved",
      "performance_tier": "premium"
    }
  }'
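
The request above sends the complete metadata object, including keys that already existed. Whether the API merges or replaces metadata on PATCH is not stated here, so a cautious client can merge locally and send the full result. A minimal Python sketch, with build_metadata_patch as a hypothetical helper name:

```python
import json


def build_metadata_patch(current: dict, changes: dict) -> str:
    """Return a JSON PATCH body carrying the full merged metadata object."""
    merged = {**current, **changes}  # keys in `changes` override `current`
    return json.dumps({"metadata": merged}, indent=2)


# `current` would normally come from a prior Get Model Details call.
current = {
    "domain": "customer_support",
    "language": "en",
    "use_case": "conversational_ai",
    "quality_gate": "passed",
}
changes = {"safety_review": "approved", "performance_tier": "premium"}
print(build_metadata_patch(current, changes))
```

Sending the merged object behaves the same under either server semantics, at the cost of one extra read before the write.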

Create Model Version

Create a new version of an existing model from a training job.

curl -X POST "https://api.tensorone.ai/v2/training/models/model_1234567890abcdef/versions" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "version": "v1.1.0",
    "trainingJobId": "job_new_training_123",
    "description": "Improved version with additional safety training",
    "changelog": [
      "Enhanced safety guidelines training",
      "Improved response coherence",
      "Reduced hallucination rate by 15%"
    ]
  }'
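
The version strings in these examples follow a vMAJOR.MINOR.PATCH pattern. Assuming that convention holds for your models (the API may accept other formats), the next version string can be computed with a small helper like this hypothetical bump_version:

```python
import re

_VERSION_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def bump_version(version: str, part: str = "minor") -> str:
    """Return the next version string, e.g. v1.0.0 -> v1.1.0 for a minor bump."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    major, minor, patch = (int(group) for group in match.groups())
    if part == "major":
        return f"v{major + 1}.0.0"
    if part == "minor":
        return f"v{major}.{minor + 1}.0"
    if part == "patch":
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"part must be 'major', 'minor', or 'patch', got {part!r}")


print(bump_version("v1.0.0"))  # v1.1.0
```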

Deploy Model

Deploy a model to create a new inference endpoint.

curl -X POST "https://api.tensorone.ai/v2/training/models/model_1234567890abcdef/deploy" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "customer-support-chat-v2",
    "environment": "production",
    "gpuType": "a100",
    "workers": 3,
    "scaling": {
      "minWorkers": 1,
      "maxWorkers": 10,
      "targetUtilization": 0.7
    },
    "configuration": {
      "max_tokens": 512,
      "temperature": 0.7,
      "top_p": 0.9,
      "repetition_penalty": 1.1
    }
  }'

Response

{
  "deployment": {
    "id": "ep_customer_support_v2",
    "modelId": "model_1234567890abcdef",
    "name": "customer-support-chat-v2",
    "status": "deploying",
    "environment": "production",
    "url": "https://api.tensorone.ai/v2/ep_customer_support_v2/runsync",
    "configuration": {
      "gpuType": "NVIDIA A100",
      "workers": 3,
      "maxTokens": 512,
      "temperature": 0.7
    },
    "estimatedReadyTime": "2024-01-15T17:05:00Z",
    "createdAt": "2024-01-15T17:00:00Z"
  }
}
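
A new deployment starts in the deploying state, so clients usually poll before sending traffic. The Python sketch below assumes that the Get Model Details response lists the new deployment and that its status eventually becomes "active". Both status values appear in the examples on this page, but the transition itself is an assumption. fetch_model is a hypothetical caller-supplied function that returns the parsed model JSON.

```python
import time
from typing import Callable, Dict


def wait_until_active(
    fetch_model: Callable[[], Dict],
    deployment_id: str,
    timeout_s: float = 600.0,
    poll_interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll model details until the given deployment reports status "active"."""
    deadline = time.monotonic() + timeout_s
    while True:
        model = fetch_model()
        for deployment in model.get("deployments", []):
            if deployment.get("id") == deployment_id and deployment.get("status") == "active":
                return deployment
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{deployment_id} not active after {timeout_s} seconds")
        sleep(poll_interval_s)
```

The sketch does not handle a failed deployment; a production client would also stop on whatever failure status the API reports.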

Download Model

Request a time-limited download URL for model files, for local deployment or analysis.

curl -X POST "https://api.tensorone.ai/v2/training/models/model_1234567890abcdef/download" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "files": ["pytorch_model.bin", "config.json", "tokenizer.json"],
    "format": "pytorch",
    "compression": "zip"
  }'

Response

{
  "downloadUrl": "https://download.tensorone.ai/models/model_1234567890abcdef.zip",
  "downloadToken": "tok_download_1234567890abcdef",
  "expiresAt": "2024-01-15T19:00:00Z",
  "size": 6710886400,
  "files": [
    {
      "name": "pytorch_model.bin",
      "size": 13421772800,
      "checksum": "sha256:a1b2c3d4e5f6..."
    },
    {
      "name": "config.json",
      "size": 1024,
      "checksum": "sha256:b2c3d4e5f6a7..."
    },
    {
      "name": "tokenizer.json",
      "size": 2048,
      "checksum": "sha256:c3d4e5f6a7b8..."
    }
  ]
}
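
Each file entry carries a checksum in the form sha256:<hex digest>, which can be used to verify a file after download and extraction. A minimal Python sketch using only the standard library, with verify_checksum as an illustrative helper name:

```python
import hashlib
from pathlib import Path


def verify_checksum(path: Path, expected: str, chunk_size: int = 1 << 20) -> bool:
    """Compare a local file against a checksum string such as "sha256:<hex>"."""
    algorithm, _, expected_hex = expected.partition(":")
    if algorithm != "sha256" or not expected_hex:
        raise ValueError(f"unsupported checksum format: {expected!r}")
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so multi-gigabyte weight files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()
```

Pass the full checksum value from the response; the values shown on this page are truncated for display.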

Compare Models

Compare performance metrics between different models or versions.

curl -X POST "https://api.tensorone.ai/v2/training/models/compare" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "models": [
      "model_1234567890abcdef",
      "model_2345678901bcdefg",
      "model_3456789012cdefgh"
    ],
    "metrics": ["accuracy", "f1_score", "perplexity", "bleu_score"],
    "benchmarks": ["custom_eval_suite", "hellaswag", "truthfulqa"]
  }'

Response

{
  "comparison": {
    "models": [
      {
        "id": "model_1234567890abcdef",
        "name": "llama-7b-customer-support",
        "version": "v1.0.0",
        "metrics": {
          "accuracy": 0.892,
          "f1_score": 0.885,
          "perplexity": 12.45,
          "bleu_score": 0.78
        },
        "benchmarks": {
          "custom_eval_suite": 85.2,
          "hellaswag": 76.8,
          "truthfulqa": 42.3
        }
      }
    ],
    "best_performing": {
      "accuracy": "model_1234567890abcdef",
      "f1_score": "model_2345678901bcdefg",
      "perplexity": "model_3456789012cdefgh"
    },
    "recommendations": [
      "model_1234567890abcdef shows best overall performance",
      "Consider model_2345678901bcdefg for precision-critical tasks"
    ]
  }
}
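
The best_performing map can also be recomputed client-side, for example to apply your own tie-breaking or to restrict the metrics considered. The Python sketch below treats perplexity as lower-is-better and every other metric as higher-is-better, which is an assumption about how the metrics should be ranked. The two model entries are made-up values for illustration.

```python
from typing import Dict, List

# Assumption: only perplexity is a "lower is better" metric.
LOWER_IS_BETTER = {"perplexity"}


def best_per_metric(models: List[Dict]) -> Dict[str, str]:
    """Map each metric name to the id of the model that scores best on it."""
    metric_names = {name for model in models for name in model.get("metrics", {})}
    best = {}
    for name in sorted(metric_names):
        candidates = [m for m in models if name in m.get("metrics", {})]
        pick = min if name in LOWER_IS_BETTER else max
        best[name] = pick(candidates, key=lambda m: m["metrics"][name])["id"]
    return best


# Hypothetical comparison entries, not real results.
models = [
    {"id": "model_a", "metrics": {"accuracy": 0.89, "perplexity": 12.4}},
    {"id": "model_b", "metrics": {"accuracy": 0.87, "perplexity": 11.9}},
]
print(best_per_metric(models))  # {'accuracy': 'model_a', 'perplexity': 'model_b'}
```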

Archive Model

Archive a model to reduce storage costs while retaining its metadata.

curl -X POST "https://api.tensorone.ai/v2/training/models/model_1234567890abcdef/archive" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "reason": "Replaced by newer version",
    "retention_period": "90d"
  }'
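
To track when an archived model can no longer be restored, a client can turn retention_period into a date. The Python sketch below assumes the period is always a whole number of days written as "<N>d", as in the example, and that it is counted from the moment of archiving. Neither detail is confirmed here.

```python
import re
from datetime import datetime, timedelta, timezone


def restore_deadline(archived_at: datetime, retention_period: str) -> datetime:
    """Return the last moment at which the model could still be restored."""
    match = re.fullmatch(r"(\d+)d", retention_period)
    if match is None:
        raise ValueError(f"unsupported retention period: {retention_period!r}")
    return archived_at + timedelta(days=int(match.group(1)))


archived_at = datetime(2024, 1, 15, 17, 0, tzinfo=timezone.utc)
print(restore_deadline(archived_at, "90d").isoformat())  # 2024-04-14T17:00:00+00:00
```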

SDK Examples

Python SDK

from tensorone import TensorOneClient

client = TensorOneClient(api_key="YOUR_API_KEY")

# List models
models = client.training.models.list(
    type="llm",
    status="ready",
    limit=20
)

for model in models:
    print(f"{model.name} - {model.size.parameters} parameters")

# Get model details
model = client.training.models.get("model_1234567890abcdef")
print(f"Model: {model.name}")
print(f"Accuracy: {model.metrics.validation.accuracy}")
print(f"F1 Score: {model.metrics.validation.f1_score}")

# Deploy model
deployment = client.training.models.deploy(
    model_id="model_1234567890abcdef",
    name="customer-support-chat",
    environment="production",
    gpu_type="a100",
    workers=3,
    scaling={
        "min_workers": 1,
        "max_workers": 10,
        "target_utilization": 0.7
    },
    configuration={
        "max_tokens": 512,
        "temperature": 0.7,
        "top_p": 0.9
    }
)

print(f"Deployment created: {deployment.url}")

# Compare models
comparison = client.training.models.compare(
    models=[
        "model_1234567890abcdef",
        "model_2345678901bcdefg"
    ],
    metrics=["accuracy", "f1_score", "perplexity"]
)

for model_result in comparison.models:
    print(f"{model_result.name}: {model_result.metrics}")

JavaScript SDK

import { TensorOneClient } from '@tensorone/sdk';

const client = new TensorOneClient({ apiKey: 'YOUR_API_KEY' });

// List models
const models = await client.training.models.list({
  type: 'llm',
  status: 'ready',
  limit: 20
});

models.forEach(model => {
  console.log(`${model.name} - ${model.size.parameters} parameters`);
});

// Get model details
const model = await client.training.models.get('model_1234567890abcdef');
console.log(`Model: ${model.name}`);
console.log(`Accuracy: ${model.metrics.validation.accuracy}`);
console.log(`F1 Score: ${model.metrics.validation.f1Score}`);

// Deploy model
const deployment = await client.training.models.deploy('model_1234567890abcdef', {
  name: 'customer-support-chat',
  environment: 'production',
  gpuType: 'a100',
  workers: 3,
  scaling: {
    minWorkers: 1,
    maxWorkers: 10,
    targetUtilization: 0.7
  },
  configuration: {
    maxTokens: 512,
    temperature: 0.7,
    topP: 0.9
  }
});

console.log(`Deployment created: ${deployment.url}`);

// Download model
const downloadInfo = await client.training.models.download('model_1234567890abcdef', {
  files: ['pytorch_model.bin', 'config.json'],
  format: 'pytorch',
  compression: 'zip'
});

console.log(`Download URL: ${downloadInfo.downloadUrl}`);

Model Formats

PyTorch Models

pytorch_model.bin: Model weights in PyTorch format
config.json: Model architecture configuration
tokenizer.json: Tokenizer configuration and vocabulary

Hugging Face Compatible

model.safetensors: Model weights in safetensors format
pytorch_model.bin: PyTorch weights (legacy)
config.json: Transformers configuration
tokenizer_config.json: Tokenizer configuration

ONNX Export

model.onnx: ONNX format for cross-platform deployment
config.json: Model metadata
tokenizer.json: Tokenizer information

TensorRT Optimization

model.trt: TensorRT optimized engine
config.json: Optimization parameters
profiling_data.json: Performance profiling results

Error Handling

Common Errors

{
  "error": "MODEL_NOT_FOUND",
  "message": "Model with specified ID does not exist",
  "details": {
    "modelId": "model_invalid_id"
  }
}

{
  "error": "DEPLOYMENT_FAILED",
  "message": "Model deployment failed due to resource constraints",
  "details": {
    "reason": "Insufficient GPU capacity",
    "availableGpuTypes": ["v100", "rtx4090"],
    "requestedGpuType": "h100"
  }
}

{
  "error": "MODEL_TOO_LARGE",
  "message": "Model exceeds size limits for deployment",
  "details": {
    "modelSize": "50GB",
    "maxAllowedSize": "40GB",
    "suggestions": [
      "Use model compression",
      "Deploy on higher-tier GPU instances"
    ]
  }
}
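
Error bodies share the same three fields (error, message, details), so a client can map them to typed exceptions. The Python sketch below is one way to do that. The exception class names are hypothetical and are not taken from the TensorOne SDK.

```python
class TensorOneAPIError(Exception):
    """Base class for errors decoded from an API error body (hypothetical name)."""

    def __init__(self, code: str, message: str, details: dict):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.details = details


class ModelNotFoundError(TensorOneAPIError):
    pass


class DeploymentFailedError(TensorOneAPIError):
    pass


class ModelTooLargeError(TensorOneAPIError):
    pass


# Error codes taken from the examples above; unknown codes fall back to the base class.
_ERROR_CLASSES = {
    "MODEL_NOT_FOUND": ModelNotFoundError,
    "DEPLOYMENT_FAILED": DeploymentFailedError,
    "MODEL_TOO_LARGE": ModelTooLargeError,
}


def error_from_body(body: dict) -> TensorOneAPIError:
    """Build the most specific exception for a decoded error response."""
    code = body.get("error", "UNKNOWN")
    cls = _ERROR_CLASSES.get(code, TensorOneAPIError)
    return cls(code, body.get("message", ""), body.get("details", {}))
```

With this in place, a caller can catch DeploymentFailedError and retry using one of the GPU types listed in details["availableGpuTypes"].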

Best Practices

Model Organization

Use consistent naming conventions for models and versions
Tag models with relevant metadata (domain, use case, quality)
Maintain clear version histories with detailed changelogs
Archive outdated models to reduce storage costs

Performance Optimization

Choose appropriate deployment configurations based on latency requirements
Use auto-scaling to handle variable workloads efficiently
Monitor model performance metrics continuously
Implement A/B testing for model comparisons
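
To build intuition for how minWorkers, maxWorkers, and targetUtilization interact, the Python sketch below applies a proportional scaling rule of the kind used by the Kubernetes Horizontal Pod Autoscaler. The platform's actual scaling algorithm is not documented here, so treat this as an illustration rather than a description of its behavior.

```python
import math


def desired_workers(
    current_workers: int,
    observed_utilization: float,
    target_utilization: float,
    min_workers: int,
    max_workers: int,
) -> int:
    """Scale the worker count in proportion to observed/target utilization."""
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    raw = math.ceil(current_workers * observed_utilization / target_utilization)
    # Clamp to the configured bounds.
    return max(min_workers, min(max_workers, raw))


# 3 workers at 90% utilization with a 70% target: scale up to 4.
print(desired_workers(3, 0.9, 0.7, min_workers=1, max_workers=10))  # 4
```

Under this rule, a lower target leaves more headroom for traffic spikes but keeps more workers running on average.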

Security and Compliance

Implement proper access controls for sensitive models
Maintain audit trails for model deployments
Use encryption for model files and communications
Run regular security scans on deployed models

Model deployments typically take 3-5 minutes to become active. Larger models may require additional time for optimization and loading.

Archived models can be restored within the retention period. After that, they are permanently deleted and cannot be recovered.
