Cluster Templates

Overview

Cluster Templates provide pre-configured environments for consistent, repeatable cluster deployments. Templates include Docker images, environment variables, system configurations, and resource specifications that can be used to quickly spin up standardized development, training, or production environments.

Endpoints

List Templates

GET https://api.tensorone.ai/v1/clusters/templates

Get Template Details

GET https://api.tensorone.ai/v1/clusters/templates/{template_id}

Create Template

POST https://api.tensorone.ai/v1/clusters/templates

Update Template

PUT https://api.tensorone.ai/v1/clusters/templates/{template_id}

Delete Template

DELETE https://api.tensorone.ai/v1/clusters/templates/{template_id}
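
The five endpoints above share one base path and differ only in HTTP method and in whether a template ID is appended. As a rough sketch of that mapping, the helper below returns the method and URL for each operation. The names `ROUTES` and `template_route` are hypothetical and not part of any TensorOne SDK.

```python
BASE_URL = "https://api.tensorone.ai/v1/clusters/templates"

# Operation name -> (HTTP method, whether the path needs a template ID)
ROUTES = {
    "list": ("GET", False),
    "get": ("GET", True),
    "create": ("POST", False),
    "update": ("PUT", True),
    "delete": ("DELETE", True),
}

def template_route(operation, template_id=None):
    """Return (method, url) for one of the five template operations."""
    if operation not in ROUTES:
        raise ValueError(f"Unknown operation: {operation}")
    method, needs_id = ROUTES[operation]
    if needs_id:
        if not template_id:
            raise ValueError(f"'{operation}' requires a template_id")
        return method, f"{BASE_URL}/{template_id}"
    return method, BASE_URL

print(template_route("delete", "tmpl_abc123"))
```

The returned pair can be passed to any HTTP client together with the `Authorization: Bearer` header used throughout this page.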

List Templates

Query Parameters

Parameter	Type	Required	Description
`category`	string	No	Filter by category: `ml`, `dev`, `production`, `custom`
`framework`	string	No	Filter by ML framework: `pytorch`, `tensorflow`, `huggingface`, `sklearn`
`gpu_compatible`	boolean	No	Filter GPU-compatible templates
`official`	boolean	No	Filter official TensorOne templates
`project_id`	string	No	Filter by project (for custom templates)
`search`	string	No	Search templates by name or description
`sort_by`	string	No	Sort by: `name`, `created_at`, `usage_count`, `rating`
`include_deprecated`	boolean	No	Include deprecated templates (default: false)

Request Examples

# List all ML templates
curl -X GET "https://api.tensorone.ai/v1/clusters/templates?category=ml&framework=pytorch" \
  -H "Authorization: Bearer YOUR_API_KEY"

# Search for Jupyter templates
curl -X GET "https://api.tensorone.ai/v1/clusters/templates?search=jupyter&gpu_compatible=true" \
  -H "Authorization: Bearer YOUR_API_KEY"
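
For scripts that assemble these filters dynamically, the query string can be built with the standard library. This is a minimal sketch: `build_list_url` is a hypothetical helper, and sending booleans as lowercase `true`/`false` is an assumption based on the curl examples above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.tensorone.ai/v1/clusters/templates"

def build_list_url(**filters):
    """Build a List Templates URL from keyword filters, skipping unset ones."""
    params = {}
    for key, value in filters.items():
        if value is None:
            continue
        # Booleans are sent as lowercase true/false, matching the curl examples
        params[key] = str(value).lower() if isinstance(value, bool) else value
    query = urlencode(params)
    return f"{BASE_URL}?{query}" if query else BASE_URL

print(build_list_url(search="jupyter", gpu_compatible=True))
```

`urlencode` also percent-encodes values, so a `search` term containing spaces or special characters stays valid in the URL.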

Create Template

Request Body

Parameter	Type	Required	Description
`name`	string	Yes	Template name (unique within project)
`description`	string	Yes	Template description
`category`	string	Yes	Template category
`docker_image`	string	Yes	Base Docker image
`framework`	string	No	ML framework if applicable
`default_configuration`	object	Yes	Default hardware configuration
`environment_variables`	object	No	Default environment variables
`startup_script`	string	No	Script to run on cluster start
`port_mappings`	array	No	Default port configurations
`required_packages`	array	No	Additional packages to install
`gpu_compatible`	boolean	No	Whether template supports GPUs
`min_resources`	object	No	Minimum resource requirements
`max_resources`	object	No	Maximum resource limits
`tags`	array	No	Template tags for organization
`is_public`	boolean	No	Make template publicly available (default: false)

Request Examples

# Create PyTorch training template
curl -X POST "https://api.tensorone.ai/v1/clusters/templates" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "pytorch-distributed-training",
    "description": "PyTorch distributed training environment with NCCL support",
    "category": "ml",
    "framework": "pytorch", 
    "docker_image": "pytorch/pytorch:2.2-cuda12.1-devel",
    "gpu_compatible": true,
    "default_configuration": {
      "gpu_type": "A100",
      "gpu_count": 4,
      "cpu_cores": 32,
      "memory_gb": 256,
      "storage_gb": 1000
    },
    "environment_variables": {
      "NCCL_DEBUG": "INFO",
      "CUDA_VISIBLE_DEVICES": "0,1,2,3",
      "MASTER_ADDR": "localhost",
      "MASTER_PORT": "12355",
      "WORLD_SIZE": "4"
    },
    "startup_script": "#!/bin/bash\necho \"Starting distributed training environment\"\nnvidia-smi\npython -c \"import torch; print(f\\\"PyTorch version: {torch.__version__}\\\")\"",
    "port_mappings": [
      {
        "internal_port": 6006,
        "protocol": "tcp",
        "description": "TensorBoard"
      },
      {
        "internal_port": 8888,
        "protocol": "tcp", 
        "description": "Jupyter Lab"
      }
    ],
    "required_packages": [
      "tensorboard",
      "wandb",
      "transformers",
      "datasets"
    ],
    "min_resources": {
      "gpu_count": 1,
      "memory_gb": 32,
      "storage_gb": 100
    },
    "tags": ["pytorch", "distributed", "training", "gpu"],
    "is_public": false
  }'
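
Before sending a request like the one above, it can help to check locally that the fields marked as required in the Request Body table are present. The sketch below does only that presence check. `missing_required_fields` is a hypothetical helper, and any further validation the API performs is not described here.

```python
# Fields marked "Yes" in the Required column of the Request Body table
REQUIRED_FIELDS = (
    "name",
    "description",
    "category",
    "docker_image",
    "default_configuration",
)

def missing_required_fields(payload):
    """Return the required Create Template fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not payload.get(field)]

draft = {"name": "pytorch-distributed-training", "category": "ml"}
print(missing_required_fields(draft))
```

An empty list means the payload has every required field. It does not guarantee that the values themselves will be accepted.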

Get Template Details

curl -X GET "https://api.tensorone.ai/v1/clusters/templates/tmpl_abc123" \
  -H "Authorization: Bearer YOUR_API_KEY"

Template Response Schema

{
  "success": true,
  "data": {
    "id": "tmpl_abc123",
    "name": "pytorch-distributed-training",
    "description": "PyTorch distributed training environment with NCCL support", 
    "category": "ml",
    "framework": "pytorch",
    "version": "2.1.0",
    "docker_image": "pytorch/pytorch:2.2-cuda12.1-devel",
    "gpu_compatible": true,
    "official": false,
    "default_configuration": {
      "gpu_type": "A100",
      "gpu_count": 4,
      "cpu_cores": 32,
      "memory_gb": 256,
      "storage_gb": 1000,
      "estimated_hourly_cost": 10.00
    },
    "environment_variables": {
      "NCCL_DEBUG": "INFO",
      "CUDA_VISIBLE_DEVICES": "0,1,2,3",
      "MASTER_ADDR": "localhost",
      "MASTER_PORT": "12355",
      "WORLD_SIZE": "4"
    },
    "startup_script": "#!/bin/bash\necho \"Starting distributed training environment\"\nnvidia-smi\npython -c \"import torch; print(f\\\"PyTorch version: {torch.__version__}\\\")\"",
    "port_mappings": [
      {
        "internal_port": 6006,
        "protocol": "tcp",
        "description": "TensorBoard",
        "required": false
      },
      {
        "internal_port": 8888,
        "protocol": "tcp",
        "description": "Jupyter Lab",
        "required": true
      }
    ],
    "required_packages": [
      "tensorboard",
      "wandb", 
      "transformers",
      "datasets"
    ],
    "software_versions": {
      "python": "3.9",
      "pytorch": "2.2.0",
      "cuda": "12.1",
      "cudnn": "8.8"
    },
    "resource_limits": {
      "min_resources": {
        "gpu_count": 1,
        "cpu_cores": 8,
        "memory_gb": 32,
        "storage_gb": 100
      },
      "max_resources": {
        "gpu_count": 8,
        "cpu_cores": 128,
        "memory_gb": 1024,
        "storage_gb": 10000
      }
    },
    "compatibility": {
      "gpu_types": ["A100", "H100", "RTX4090", "V100"],
      "regions": ["us-east-1", "us-west-2", "eu-west-1"],
      "min_driver_version": "520.61.05"
    },
    "usage_statistics": {
      "total_deployments": 1247,
      "active_clusters": 34,
      "average_rating": 4.7,
      "success_rate": 98.3
    },
    "tags": ["pytorch", "distributed", "training", "gpu"],
    "created_by": {
      "user_id": "user_456",
      "username": "ml_engineer",
      "organization": "TensorOne"
    },
    "created_at": "2024-01-10T09:00:00Z",
    "updated_at": "2024-01-14T15:30:00Z",
    "is_public": false,
    "is_deprecated": false
  }
}
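
One way to use the `resource_limits` object from this response is to check a planned configuration against it before deploying. The following is a sketch under assumptions: `check_against_limits` is a hypothetical client-side helper, and this page does not state whether or how the API enforces the limits itself.

```python
def check_against_limits(requested, resource_limits):
    """Return human-readable violations of the template's min/max resources."""
    problems = []
    minimums = resource_limits.get("min_resources", {})
    maximums = resource_limits.get("max_resources", {})
    for key, value in requested.items():
        if key in minimums and value < minimums[key]:
            problems.append(f"{key}={value} is below minimum {minimums[key]}")
        if key in maximums and value > maximums[key]:
            problems.append(f"{key}={value} is above maximum {maximums[key]}")
    return problems

# Limits copied from the example response above
limits = {
    "min_resources": {"gpu_count": 1, "cpu_cores": 8, "memory_gb": 32, "storage_gb": 100},
    "max_resources": {"gpu_count": 8, "cpu_cores": 128, "memory_gb": 1024, "storage_gb": 10000},
}

print(check_against_limits({"gpu_count": 16, "memory_gb": 16}, limits))
```

Keys missing from either limit set are simply not checked, so the helper tolerates templates that define only `min_resources`.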

Use Cases

Standardized Development Environment

Create consistent development environments across teams.

import requests

API_KEY = "YOUR_API_KEY"  # replace with your TensorOne API key

def create_team_dev_template(team_name, requirements):
    """Create standardized development template for a team"""
    
    template_config = {
        "name": f"{team_name}-dev-environment",
        "description": f"Standardized development environment for {team_name} team",
        "category": "dev",
        "docker_image": requirements.get("base_image", "ubuntu:22.04"),
        "gpu_compatible": requirements.get("needs_gpu", False),
        "default_configuration": {
            "gpu_type": requirements.get("gpu_type", "RTX4090"),
            "gpu_count": 1 if requirements.get("needs_gpu") else 0,
            "cpu_cores": requirements.get("cpu_cores", 8),
            "memory_gb": requirements.get("memory_gb", 32),
            "storage_gb": requirements.get("storage_gb", 200)
        },
        "environment_variables": {
            "TEAM": team_name,
            "ENVIRONMENT": "development",
            **requirements.get("env_vars", {})
        },
        "startup_script": f"""#!/bin/bash
echo "Setting up {team_name} development environment"

# Install team-specific tools
{requirements.get("setup_script", "")}

# Set up workspace
mkdir -p /workspace/{team_name}
cd /workspace/{team_name}

echo "Environment ready!"
""",
        "port_mappings": requirements.get("ports", [
            {"internal_port": 8888, "protocol": "tcp", "description": "Jupyter Lab"},
            {"internal_port": 8080, "protocol": "tcp", "description": "Development Server"}
        ]),
        "required_packages": requirements.get("packages", []),
        "tags": [team_name, "development", "team-standard"],
        "is_public": False
    }
    
    response = requests.post(
        "https://api.tensorone.ai/v1/clusters/templates",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=template_config
    )
    
    return response.json()

# Create data science team template
ds_requirements = {
    "base_image": "jupyter/tensorflow-notebook:latest",
    "needs_gpu": True,
    "gpu_type": "RTX4090",
    "cpu_cores": 16,
    "memory_gb": 64,
    "storage_gb": 500,
    "env_vars": {
        "JUPYTER_ENABLE_LAB": "yes",
        "DATA_PATH": "/workspace/data"
    },
    "setup_script": """
pip install --upgrade pip
pip install pandas numpy scikit-learn matplotlib seaborn plotly
pip install wandb mlflow optuna
pip install tensorflow torch torchvision
""",
    "packages": ["pandas", "numpy", "scikit-learn", "wandb", "mlflow"]
}

ds_template = create_team_dev_template("data-science", ds_requirements)

Production Inference Template

Create optimized templates for production model serving.

async function createInferenceTemplate(modelInfo) {
  const template = {
    name: `${modelInfo.name}-inference`,
    description: `Production inference template for ${modelInfo.name} model`,
    category: 'production',
    framework: modelInfo.framework,
    docker_image: modelInfo.dockerImage,
    gpu_compatible: true,
    default_configuration: {
      gpu_type: modelInfo.gpuType || 'T4',
      gpu_count: modelInfo.gpuCount || 1,
      cpu_cores: modelInfo.cpuCores || 8,
      memory_gb: modelInfo.memoryGb || 32,
      storage_gb: modelInfo.storageGb || 100
    },
    environment_variables: {
      MODEL_NAME: modelInfo.name,
      MODEL_VERSION: modelInfo.version,
      BATCH_SIZE: modelInfo.batchSize?.toString() || '8',
      MAX_SEQUENCE_LENGTH: modelInfo.maxSeqLength?.toString() || '512',
      INFERENCE_MODE: 'production',
      ...modelInfo.environmentVars
    },
    startup_script: `#!/bin/bash
echo "Starting ${modelInfo.name} inference server"

# Download model if needed
if [ ! -d "/workspace/models/${modelInfo.name}" ]; then
    echo "Downloading model..."
    ${modelInfo.downloadScript || 'echo "No download script provided"'}
fi

# Start inference server
echo "Starting inference server..."
${modelInfo.startCommand || 'python app.py'}
`,
    port_mappings: [
      {
        internal_port: 8000,
        protocol: 'tcp',
        description: 'Inference API',
        required: true
      },
      {
        internal_port: 8080,
        protocol: 'tcp', 
        description: 'Health Check',
        required: true
      }
    ],
    required_packages: modelInfo.requiredPackages || [],
    min_resources: {
      cpu_cores: 4,
      memory_gb: 16,
      storage_gb: 50
    },
    max_resources: {
      gpu_count: 4,
      cpu_cores: 32,
      memory_gb: 128,
      storage_gb: 1000
    },
    tags: [modelInfo.name, 'inference', 'production', modelInfo.framework],
    is_public: modelInfo.isPublic || false
  };
  
  const response = await fetch('https://api.tensorone.ai/v1/clusters/templates', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(template)
  });
  
  return await response.json();
}

// Create BERT inference template
const bertInference = await createInferenceTemplate({
  name: 'bert-base-uncased',
  framework: 'huggingface',
  dockerImage: 'huggingface/transformers-pytorch-gpu:latest',
  gpuType: 'T4',
  gpuCount: 1,
  cpuCores: 8,
  memoryGb: 32,
  storageGb: 100,
  batchSize: 16,
  maxSeqLength: 512,
  environmentVars: {
    HF_MODEL_ID: 'bert-base-uncased',
    TOKENIZERS_PARALLELISM: 'false'
  },
  downloadScript: 'huggingface-cli download bert-base-uncased --local-dir /workspace/models/bert-base-uncased',
  startCommand: 'python inference_server.py --model-path /workspace/models/bert-base-uncased --port 8000',
  requiredPackages: ['transformers', 'torch', 'fastapi', 'uvicorn']
});

Template Versioning and Updates

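Fetch a template, bump its semantic version according to the kind of change (major for breaking changes, minor for new features, patch otherwise), and submit the update with PUT.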

import requests

API_KEY = "YOUR_API_KEY"  # replace with your TensorOne API key

def update_template_version(template_id, updates, version_notes):
    """Update template with version control"""
    
    # Get current template
    current_template = requests.get(
        f"https://api.tensorone.ai/v1/clusters/templates/{template_id}",
        headers={"Authorization": f"Bearer {API_KEY}"}
    ).json()
    
    if not current_template["success"]:
        return current_template
    
    # Increment version
    current_version = current_template["data"]["version"]
    major, minor, patch = map(int, current_version.split('.'))
    
    if updates.get("breaking_changes"):
        new_version = f"{major + 1}.0.0"
    elif updates.get("new_features"):
        new_version = f"{major}.{minor + 1}.0"
    else:
        new_version = f"{major}.{minor}.{patch + 1}"
    
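    # Prepare update payload; the breaking_changes/new_features flags only
    # steer the version bump above and are not sent as template fields
    control_flags = {"breaking_changes", "new_features"}
    update_payload = {
        "version": new_version,
        "version_notes": version_notes,
        **{k: v for k, v in updates.items() if k not in control_flags}
    }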
    
    # Remove None values
    update_payload = {k: v for k, v in update_payload.items() if v is not None}
    
    response = requests.put(
        f"https://api.tensorone.ai/v1/clusters/templates/{template_id}",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json=update_payload
    )
    
    result = response.json()
    
    if result["success"]:
        print(f"Template updated to version {new_version}")
        print(f"Previous version: {current_version}")
        print(f"Update notes: {version_notes}")
    
    return result

# Update PyTorch template with new packages
pytorch_updates = {
    "docker_image": "pytorch/pytorch:2.3-cuda12.1-devel",
    "required_packages": [
        "tensorboard", "wandb", "transformers", "datasets", 
        "accelerate", "deepspeed"  # New packages
    ],
    "software_versions": {
        "python": "3.10",
        "pytorch": "2.3.0", 
        "cuda": "12.1",
        "cudnn": "8.9"
    },
    "new_features": True
}

update_result = update_template_version(
    "tmpl_pytorch_distributed",
    pytorch_updates,
    "Updated to PyTorch 2.3 with DeepSpeed and Accelerate support"
)
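
The version-bump rule used in this section can be isolated as a pure function, which makes it easy to test without calling the API. This is a minimal sketch: `bump_version` is a hypothetical name, and it assumes versions always have the three-part `major.minor.patch` form shown in the response schema.

```python
def bump_version(current_version, breaking_changes=False, new_features=False):
    """Return the next semantic version for the given kind of change."""
    major, minor, patch = map(int, current_version.split("."))
    if breaking_changes:
        # Breaking changes reset minor and patch
        return f"{major + 1}.0.0"
    if new_features:
        # New features reset patch
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"

print(bump_version("2.1.0", new_features=True))
```

Breaking changes take precedence over new features when both flags are set, matching the order of the checks in the function above.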

Template Management Best Practices

Template Organization

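import requests

API_KEY = "YOUR_API_KEY"  # replace with your TensorOne API key

def get_template_details(template_id):
    """Fetch a single template via GET /v1/clusters/templates/{template_id}"""
    response = requests.get(
        f"https://api.tensorone.ai/v1/clusters/templates/{template_id}",
        headers={"Authorization": f"Bearer {API_KEY}"}
    )
    return response.json()

def organize_templates_by_use_case():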
    """Organize templates by use case and maintain consistency"""
    
    use_case_templates = {
        "ml_training": {
            "pytorch_distributed": "tmpl_pytorch_dist_123",
            "tensorflow_multi_gpu": "tmpl_tf_multi_456", 
            "huggingface_fine_tune": "tmpl_hf_tune_789"
        },
        "development": {
            "jupyter_data_science": "tmpl_jupyter_ds_abc",
            "vscode_remote": "tmpl_vscode_def",
            "rstudio_gpu": "tmpl_rstudio_ghi"
        },
        "production_inference": {
            "fastapi_model_server": "tmpl_fastapi_jkl",
            "triton_inference": "tmpl_triton_mno",
            "tensorrt_optimized": "tmpl_trt_pqr"
        }
    }
    
    # Validate all templates exist and are up to date
    for use_case, templates in use_case_templates.items():
        print(f"\n{use_case.upper()} Templates:")
        for name, template_id in templates.items():
            template = get_template_details(template_id)
            if template["success"]:
                data = template["data"]
                print(f"  ✅ {name}: v{data['version']} ({data['usage_statistics']['total_deployments']} deployments)")
            else:
                print(f"  ❌ {name}: Template not found or error")
    
    return use_case_templates

Error Handling

{
  "success": false,
  "error": {
    "code": "TEMPLATE_NOT_FOUND",
    "message": "Template with ID 'tmpl_invalid' not found",
    "details": {
      "template_id": "tmpl_invalid",
      "suggestion": "Check template ID or search available templates"
    }
  }
}
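
Client code can turn this error envelope into an exception so that callers do not have to check `success` after every request. The sketch below assumes every response follows the `success`/`data`/`error` shape shown on this page. `TemplateAPIError` and `unwrap` are hypothetical names.

```python
class TemplateAPIError(Exception):
    """Raised when a response body reports success == false."""

    def __init__(self, code, message, details=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.details = details or {}

def unwrap(body):
    """Return body['data'] on success, raise TemplateAPIError otherwise."""
    if body.get("success"):
        return body.get("data")
    error = body.get("error", {})
    raise TemplateAPIError(
        error.get("code", "UNKNOWN_ERROR"),
        error.get("message", "No error message provided"),
        error.get("details"),
    )

# Error body copied from the example above
error_body = {
    "success": False,
    "error": {
        "code": "TEMPLATE_NOT_FOUND",
        "message": "Template with ID 'tmpl_invalid' not found",
        "details": {"template_id": "tmpl_invalid"},
    },
}

try:
    unwrap(error_body)
except TemplateAPIError as exc:
    print(exc.code, exc.details.get("template_id"))
```

Callers can then branch on `exc.code`, for example treating `TEMPLATE_NOT_FOUND` differently from other failures.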

Security Considerations

Docker Image Security: Only use trusted Docker images from verified sources
Environment Variables: Never store secrets in templates; use the secrets management system
Public Templates: Carefully review public templates before use
Access Control: Restrict template modification permissions appropriately
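
As one way to act on the environment variable guidance above, a template's `environment_variables` can be scanned for names that look like credentials before the template is created. This is a heuristic sketch only: the name pattern and the `suspicious_env_vars` helper are assumptions, it can miss secrets stored under innocuous names, and it does not replace the secrets management system.

```python
import re

# Matches whole underscore-delimited words such as TOKEN or API_KEY,
# so names like TOKENIZERS_PARALLELISM are not flagged
SECRET_NAME_PATTERN = re.compile(
    r"(?:^|_)(?:SECRET|PASSWORD|PASSWD|TOKEN|API_?KEY|PRIVATE_?KEY)(?:_|$)",
    re.IGNORECASE,
)

def suspicious_env_vars(environment_variables):
    """Return the sorted variable names that look like they hold secrets."""
    return sorted(
        name for name in environment_variables
        if SECRET_NAME_PATTERN.search(name)
    )

env = {
    "NCCL_DEBUG": "INFO",
    "TOKENIZERS_PARALLELISM": "false",
    "WANDB_API_KEY": "example-value",
    "DB_PASSWORD": "example-value",
}
print(suspicious_env_vars(env))
```

A non-empty result is a prompt to move those values out of the template, not proof that a secret is present.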

Best Practices

Version Control: Always increment versions for template updates
Documentation: Include comprehensive descriptions and usage examples
Testing: Test templates thoroughly before making them available to teams
Resource Optimization: Set appropriate resource limits to prevent overprovisioning
Standardization: Use templates to enforce consistent environments across projects
Maintenance: Regularly update templates with security patches and dependency updates
