Frequently Asked Questions

Everything you need to know about sovereign AI, on-premises deployment, and working with Tilkal.

What is sovereign AI?

Sovereign AI means running AI systems on infrastructure you own and control. Your models, your data, your servers — no external API calls, no data leaving your perimeter. It combines open-source models like Llama 3, Mistral, and Qwen with on-premises or private cloud deployment to deliver GPT-class capabilities without third-party data exposure.

How does on-premises AI compare to cloud APIs in cost?

According to Lenovo's 2026 Total Cost of Ownership analysis, self-hosted AI inference can be up to 18 times cheaper than cloud API equivalents over three years. The break-even point typically arrives between 3 and 6 months. After that, the marginal cost per inference approaches zero — you only pay for electricity and maintenance.
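The break-even arithmetic itself is simple. Here is a minimal sketch; every figure in it (hardware price, token volume, cloud rate, monthly opex) is an illustrative assumption, not a quote:

```python
def breakeven_months(hardware_cost: float,
                     monthly_tokens: float,
                     cloud_price_per_1k_tokens: float,
                     monthly_opex: float) -> float:
    """Months until cumulative cloud API spend exceeds hardware cost plus opex."""
    monthly_cloud_bill = monthly_tokens / 1000 * cloud_price_per_1k_tokens
    monthly_saving = monthly_cloud_bill - monthly_opex
    if monthly_saving <= 0:
        raise ValueError("self-hosting never breaks even at these rates")
    return hardware_cost / monthly_saving

# Assumed figures: $25k server, 500M tokens/month, $0.01 per 1k tokens, $800/month opex
months = breakeven_months(25_000, 500_000_000, 0.01, 800)
print(f"break-even after roughly {months:.1f} months")
```

With these assumed inputs the model lands at about six months, in line with the 3–6 month range above; your own numbers will shift the result in either direction.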

What open-source models do you deploy?

We are model-agnostic and deploy any open-source model that fits your use case. Common choices include Meta's Llama 3 series, Mistral and Mixtral, Qwen 2.5, DeepSeek, Falcon, and specialized models for code generation, computer vision, or domain-specific tasks. We evaluate and benchmark multiple models against your specific requirements before recommending a deployment.

How long does a typical deployment take?

A typical end-to-end deployment takes 8 to 15 weeks: discovery (1–2 weeks), design (2–3 weeks), build (4–8 weeks), and deployment (1–2 weeks). Simpler use cases like a single RAG system or chatbot can be production-ready in as little as 4–6 weeks.

What compliance frameworks do you support?

We design deployments to meet GDPR, HIPAA, SOC 2, ISO 27001, and EU AI Act requirements. Because sovereign AI keeps all data processing on your infrastructure, compliance is significantly simpler — you control the entire data lifecycle with no third-party data processing agreements required for AI inference.

What hardware do I need?

GPU hardware is typically required for AI inference at production scale. A single NVIDIA A100 or H100 server ($15,000–$40,000) handles most enterprise workloads, achieving over 12,500 tokens per second. For smaller deployments or proof-of-concept work, consumer GPUs or even CPU-only inference with quantized models can be sufficient. We assess your throughput requirements and recommend the most cost-effective configuration.
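Quantization is what makes CPU-only or small-GPU inference feasible: weights are stored in fewer bits at a small accuracy cost. A toy sketch of symmetric int8 quantization, for illustration only (real deployments use optimized formats such as GGUF or AWQ, and the example weights are invented):

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization to the int8 range [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    return [q * scale for q in quantized]

weights = [0.62, -1.27, 0.005, 0.98]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# int8 storage is 4x smaller than fp32; values round-trip with bounded error
print(quantized, [round(v, 3) for v in restored])
```

Shrinking each weight from 32 bits to 8 (or, in production formats, to 4) is exactly what lets a model that would need a data-center GPU fit on commodity hardware.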

How much does it cost?

Costs depend on deployment complexity. Hardware typically ranges from $15,000 to $80,000 for a production-ready GPU server. Professional deployment (model optimization, API layer, security hardening, integration) is a one-time investment. Ongoing operations run 10–20% of hardware cost annually. Most organizations break even versus cloud API costs within 3–6 months.

Can you fine-tune models on my data?

Yes. Fine-tuning adapts a base model to your domain's language, terminology, and reasoning patterns. A model fine-tuned on your data typically outperforms a generic model by 20–40% on domain-specific tasks. We use efficient techniques like LoRA and QLoRA that require minimal hardware and training time while preserving the base model's general capabilities.
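LoRA's efficiency comes from its parameter counts: instead of updating a full weight matrix, it trains two small low-rank factors. A back-of-the-envelope sketch with assumed sizes (real fine-tuning uses a library such as Hugging Face PEFT; the dimensions here are merely typical magnitudes):

```python
# Full fine-tuning updates every weight in a d x d matrix; LoRA trains two
# small factors B (d x r) and A (r x d) and applies W + B @ A at inference.
d = 1024   # hidden dimension of one weight matrix (assumed)
r = 8      # LoRA rank; small values like 4-64 are common

full_params = d * d          # weights touched by full fine-tuning
lora_params = d * r + r * d  # weights in the B and A factors

print(f"full: {full_params:,}  LoRA: {lora_params:,}  "
      f"-> {full_params // lora_params}x fewer trainable parameters")
```

That 64x reduction per matrix (at these assumed sizes) is why LoRA and QLoRA can run on a single modest GPU while full fine-tuning cannot.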

What is RAG and how does it work?

Retrieval-Augmented Generation (RAG) connects a language model to your organization's actual data — documents, databases, knowledge bases — so it answers questions based on real, current information rather than its training data. The system retrieves relevant documents, adds them to the model's context, and generates grounded, verifiable responses. RAG dramatically reduces hallucinations and keeps answers up-to-date without retraining.
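The retrieval step can be sketched in a few lines. This toy example scores documents by word overlap purely to show the mechanics; production RAG uses embedding vectors and a vector database, and the documents and query below are invented:

```python
import re

def tokens(text):
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query, docs):
    """Return the document sharing the most words with the query."""
    return max(docs, key=lambda doc: len(tokens(query) & tokens(doc)))

docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The cafeteria is open from 8am to 3pm on weekdays.",
]
query = "What is the refund policy for returns?"
context = retrieve(query, docs)

# The retrieved passage is prepended to the prompt so the model answers
# from current organizational data rather than its training data.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```

Because the answer is grounded in a retrieved passage, it can be traced back to a source document, which is what makes RAG output verifiable.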

Do you provide ongoing support after deployment?

Yes. Our Optimize phase includes performance monitoring dashboards, regular model evaluation, drift detection and retraining, and quarterly business reviews. AI systems improve over time when managed correctly — we ensure your deployment stays at peak performance as your data and requirements evolve.
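As one example of what drift detection can look like in practice, a monitoring loop might compare a recent window of a quality metric against a baseline window and flag a significant drop. The metric values and threshold below are illustrative, not from a real deployment:

```python
from statistics import mean, stdev

def drifted(baseline, recent, n_stdevs=2.0):
    """Flag drift when the recent mean drops well below the baseline mean."""
    return mean(recent) < mean(baseline) - n_stdevs * stdev(baseline)

# Answer-relevance scores from a recurring evaluation set (invented values)
baseline = [0.91, 0.89, 0.92, 0.90, 0.88, 0.91]
healthy  = [0.90, 0.89, 0.91]
degraded = [0.72, 0.70, 0.75]

print(drifted(baseline, healthy), drifted(baseline, degraded))  # prints: False True
```

When a check like this fires, the remediation is typically retraining or refreshing the retrieval index, which is exactly what the Optimize phase covers.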

Can you deploy in air-gapped environments?

Yes. We specialize in sovereign deployments including fully air-gapped environments with zero internet connectivity. This includes offline package management, local container registries, physical transfer of model weights, and offline model governance. Air-gapped deployments are common for defense, government, and critical infrastructure clients.

What is the EU AI Act and how does it affect my AI deployment?

The EU AI Act is the world's first comprehensive AI-specific regulation, with high-risk system enforcement beginning August 2, 2026. It classifies AI systems by risk level and imposes requirements for risk management, data governance, transparency, and human oversight. Penalties reach up to EUR 35 million or 7% of global revenue. Sovereign AI simplifies compliance because you control the entire data and model lifecycle.

How do you handle data security during the engagement?

We treat all client data as confidential. During engagements, we work within your infrastructure — data never leaves your environment. We follow security best practices including network isolation, encryption at rest and in transit, role-based access controls, and comprehensive audit logging. Our deployment process includes security hardening and penetration testing.

Have a question that is not answered here?

Get in Touch