List Models
Retrieve a list of trained models for your account.
Query Parameters
- type: Filter by model type (llm, vision, multimodal, custom)
- status: Filter by status (training, ready, deployed, archived)
- trainingJobId: Filter by originating training job
- limit: Number of models to return (1-100, default: 50)
- offset: Number of models to skip for pagination
- sort: Field to sort by (created_at, updated_at, name, size)
- order: Sort direction (asc, desc, default: desc)
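As a rough illustration of how these parameters combine, the sketch below validates a set of filters, encodes them as a query string, and walks through pages with limit and offset. The `build_list_query` and `iter_models` helper names and the `fetch_page` callable are assumptions made for illustration, not part of the documented API, and the response shape (a plain list of models per page) is likewise assumed.

```python
from urllib.parse import urlencode

# Allowed values taken from the parameter list above.
ALLOWED = {
    "type": {"llm", "vision", "multimodal", "custom"},
    "status": {"training", "ready", "deployed", "archived"},
    "sort": {"created_at", "updated_at", "name", "size"},
    "order": {"asc", "desc"},
}


def build_list_query(limit=50, offset=0, order="desc", **filters):
    """Validate list parameters and return an encoded query string."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    if offset < 0:
        raise ValueError("offset must not be negative")
    params = {"limit": limit, "offset": offset, "order": order, **filters}
    for key, allowed in ALLOWED.items():
        if key in params and params[key] not in allowed:
            raise ValueError(f"invalid {key}: {params[key]!r}")
    return urlencode(params)


def iter_models(fetch_page, limit=50, **filters):
    """Yield models page by page until a short page signals the end.

    fetch_page is a hypothetical callable that takes a query string and
    returns a list of models; in practice it would wrap an HTTP GET.
    """
    offset = 0
    while True:
        page = fetch_page(build_list_query(limit=limit, offset=offset, **filters))
        yield from page
        if len(page) < limit:
            break
        offset += limit
```

Passing the fetch function in as an argument keeps the pagination logic independent of any particular HTTP client, which also makes it easy to exercise with a stub.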
Response
Get Model Details
Retrieve detailed information about a specific model.
Response
Update Model Metadata
Update model information, tags, and metadata.
Create Model Version
Create a new version of an existing model from a training job.
Deploy Model
Deploy a model to create a new inference endpoint.
Response
Download Model
Download model files for local deployment or analysis.
Response
Compare Models
Compare performance metrics between different models or versions.
Response
Archive Model
Archive a model to reduce storage costs while maintaining metadata.
SDK Examples
Python SDK
JavaScript SDK
Model Formats
PyTorch Models
- pytorch_model.bin: Model weights in PyTorch format
- config.json: Model architecture configuration
- tokenizer.json: Tokenizer configuration and vocabulary
Hugging Face Compatible
- model.safetensors: Safe tensor format for weights
- pytorch_model.bin: PyTorch weights (legacy)
- config.json: Transformers configuration
- tokenizer_config.json: Tokenizer configuration
ONNX Export
- model.onnx: ONNX format for cross-platform deployment
- config.json: Model metadata
- tokenizer.json: Tokenizer information
TensorRT Optimization
- model.trt: TensorRT optimized engine
- config.json: Optimization parameters
- profiling_data.json: Performance profiling results
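One way to use these file lists is to check a downloaded directory before trying to load it. The sketch below is a minimal example under stated assumptions: the format keys (`pytorch`, `huggingface`, `onnx`, `tensorrt`) and the `missing_files` helper are hypothetical names, and it treats every listed file as required except the legacy `pytorch_model.bin` in the Hugging Face layout.

```python
from pathlib import Path

# File names come from the format lists above; the dictionary keys are
# hypothetical labels chosen for this sketch.
EXPECTED_FILES = {
    "pytorch": {"pytorch_model.bin", "config.json", "tokenizer.json"},
    "huggingface": {"model.safetensors", "config.json", "tokenizer_config.json"},
    "onnx": {"model.onnx", "config.json", "tokenizer.json"},
    "tensorrt": {"model.trt", "config.json", "profiling_data.json"},
}


def missing_files(model_dir, model_format):
    """Return the sorted list of expected files absent from model_dir."""
    if model_format not in EXPECTED_FILES:
        raise ValueError(f"unknown format: {model_format!r}")
    present = {p.name for p in Path(model_dir).iterdir() if p.is_file()}
    return sorted(EXPECTED_FILES[model_format] - present)
```

An empty list means the directory contains everything this sketch expects for the chosen format.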
Error Handling
Common Errors
Best Practices
Model Organization
- Use consistent naming conventions for models and versions
- Tag models with relevant metadata (domain, use case, quality)
- Maintain clear version histories with detailed changelogs
- Archive outdated models to reduce storage costs
Performance Optimization
- Choose appropriate deployment configurations based on latency requirements
- Use auto-scaling to handle variable workloads efficiently
- Monitor model performance metrics continuously
- Implement A/B testing for model comparisons
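A/B testing between two model versions usually needs a stable way to split traffic. The sketch below shows one common approach, deterministic hash bucketing, so that the same request key always reaches the same variant. The `assign_variant` helper, the variant labels, the salt, and the 10% default share are illustrative assumptions, not something this API is known to provide.

```python
import hashlib


def assign_variant(key, candidate_share=0.1, salt="model-ab-test"):
    """Map a stable key (e.g. a user ID) to "control" or "candidate".

    The key is hashed with a salt and reduced to a bucket in [0, 10000);
    buckets below the candidate share go to the candidate model.
    """
    if not 0.0 <= candidate_share <= 1.0:
        raise ValueError("candidate_share must be between 0 and 1")
    digest = hashlib.sha256(f"{salt}:{key}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return "candidate" if bucket < candidate_share * 10_000 else "control"
```

Changing the salt reshuffles assignments, which can be useful when starting a new experiment with the same user population.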
Security and Compliance
- Implement proper access controls for sensitive models
- Maintain audit trails for model deployments
- Use encryption for model files and communications
- Run regular security scans on deployed models
Model deployments typically take 3-5 minutes to become active. Larger models may require additional time for optimization and loading.
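Because activation is not immediate, a client typically polls until the deployment is live. The following is a minimal sketch of such a loop: `get_status` is a hypothetical callable standing in for whatever status lookup you use, and the `deployed` and `failed` status strings, the 10-second interval, and the 15-minute timeout are all assumptions rather than documented behavior.

```python
import time


def wait_until_deployed(get_status, timeout=900.0, interval=10.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns "deployed" or the timeout passes.

    Returns True on success and False on timeout; raises RuntimeError if
    the (assumed) terminal status "failed" is reported.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == "deployed":
            return True
        if status == "failed":
            raise RuntimeError("deployment failed")
        if clock() >= deadline:
            return False
        sleep(interval)
```

The `sleep` and `clock` parameters exist only so the loop can be tested without real waiting; callers can normally leave them at their defaults.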

