
The 16 Commandments for Saving Tokens and Making AI More Effective
AI Summary
This discussion focuses on 16 commandments to optimize AI performance and save tokens, addressing concerns about resource scarcity, environmental impact, and cost in AI usage. The current landscape highlights limitations in cloud subscriptions, with providers like Anthropic restricting usage due to high demand and the heavy compute that inference requires. This underscores the importance of token efficiency for ecological reasons, financial savings, and overall operational effectiveness.
The first commandment is to convert files to .MD format instead of PDFs or Word documents. PDFs and Word documents contain significant hidden formatting and metadata that consume many unnecessary tokens, even if they appear to be plain text. Markdown (.MD) files are pure, flat text with minimal formatting, making them efficient and easily interpreted by Large Language Models (LLMs). Online converters or even AI itself can be used to perform these conversions. For documents used repeatedly, it’s more token-efficient to convert them once to MD format and save them, rather than re-uploading the original PDF/DOC each time.
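The convert-once-and-reuse pattern can be sketched in a few lines of Python. This is an illustrative sketch: `convert` is a hypothetical stand-in for whatever converter you use (an online tool, a library, or an AI call), and the caching logic is the point.

```python
from pathlib import Path
from typing import Callable

def markdown_cache(source: Path, convert: Callable[[bytes], str]) -> Path:
    """Convert `source` to markdown once and reuse the cached .md afterwards.

    `convert` is any callable turning the source file's raw bytes into
    markdown text -- a hypothetical stand-in for your converter of choice.
    """
    target = source.with_suffix(".md")
    # Re-convert only when the cache is missing or older than the source.
    if not target.exists() or target.stat().st_mtime < source.stat().st_mtime:
        target.write_text(convert(source.read_bytes()), encoding="utf-8")
    return target
```

Uploading the small cached .md in each session then costs far fewer tokens than re-uploading the original PDF or Word file every time.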
Secondly, avoid screenshots and scanned documents. Copy-pasting error messages or text directly is far more efficient than sending images, which transmit a large amount of pixel data, consuming excessive tokens and leading to potential misinterpretation. Scanned PDFs, which are essentially images of text, are particularly inefficient and should be avoided, especially in corporate settings where old processes might still rely on them.
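The gap between pasted text and a screenshot can be estimated with two rough formulas. The ~4-characters-per-token rule is a common heuristic for English text, and the per-image formula below follows Anthropic's documented approximation for vision inputs; other providers count image tokens differently, so treat both as estimates.

```python
def text_tokens(text: str) -> int:
    # Rule of thumb for English text: roughly 4 characters per token.
    return max(1, len(text) // 4)

def image_tokens(width_px: int, height_px: int) -> int:
    # Anthropic documents roughly (width * height) / 750 tokens per image,
    # capped near 1,600 tokens once large images are downscaled. Other
    # providers use different formulas, so this is only an estimate.
    return min(1600, (width_px * height_px) // 750)
```

A 1920x1080 screenshot sits at the image cap, while the error message it contains is usually a few dozen tokens when pasted as plain text.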
The third point is to avoid long sessions and excessive conversation sprawl. Each interaction with an AI transmits not only the new input but also the entire conversation history and system prompt. Longer conversations therefore make the cumulative token load grow roughly quadratically (the full history is resent on every turn), leading to higher costs and reduced AI effectiveness as the model struggles to maintain context over many turns. It's recommended to reset conversations frequently, perhaps every 10 exchanges, by summarizing the previous discussion into a new MD file and starting a fresh conversation with that summary as context.
The fourth commandment, closely related, is to reset conversations frequently. This practice, by starting a new conversation with a concise summary of previous points, prevents the runaway growth of token usage and improves AI accuracy by providing a fresh, relevant context.
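The effect of resending history on every turn, and of resetting with a summary, can be checked with simple arithmetic. The numbers in this sketch (tokens per turn, system prompt size, summary size) are illustrative, not measurements.

```python
def total_tokens(num_turns: int, tokens_per_turn: int, system: int,
                 reset_every: int = 0, summary: int = 300) -> int:
    """Rough input-token total when the full history is resent every turn.

    With `reset_every` > 0, the history is replaced by a `summary`-token
    recap at each reset, modeling the 'summarize and restart' practice.
    All sizes are illustrative assumptions.
    """
    total = 0
    history = 0
    for turn in range(1, num_turns + 1):
        if reset_every and turn > 1 and (turn - 1) % reset_every == 0:
            history = summary  # fresh session seeded with a short recap
        history += tokens_per_turn
        total += system + history  # whole context resent on each call
    return total
```

With 40 turns of 200 tokens each and a 500-token system prompt, resetting every 10 exchanges cuts the total input-token bill by more than half in this model.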
Fifth, separate exploration and execution. Instead of conducting all stages of a task (research, brainstorming, synthesis) within a single long conversation, break them into dedicated sessions. For example, have one session for information gathering, another for brainstorming, and a third for synthesizing the final output. This allows for more targeted interactions and avoids overloading a single conversation with diverse objectives.
Sixth, choose the appropriate AI model for the task. Not all tasks require the most powerful or "smartest" model like Claude Opus. For deep reflection or complex problem-solving, Opus might be ideal. However, for execution tasks, data retrieval, or simple formatting, a more cost-effective and faster model like Claude Sonnet or even Claude Haiku (which is cheaper and smaller) might be better. Structuring conversations based on task type (e.g., Opus for brainstorming, Sonnet for source searching, Haiku for final formatting) can significantly reduce token consumption and improve efficiency.
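A task-to-model routing table makes this concrete. The model names and per-million-token prices below are placeholders, not current Anthropic pricing; check your provider's documentation before relying on them.

```python
# Illustrative routing table: model identifiers and USD-per-million-token
# prices are placeholders, not real pricing.
MODEL_FOR_TASK = {
    "brainstorm": ("opus-class-model", 15.00),   # deep reasoning
    "search":     ("sonnet-class-model", 3.00),  # retrieval, drafting
    "format":     ("haiku-class-model", 0.80),   # mechanical cleanup
}

def pick_model(task_type: str) -> str:
    # Fall back to the mid-tier model for unclassified tasks.
    model, _price = MODEL_FOR_TASK.get(task_type, MODEL_FOR_TASK["search"])
    return model
```

Routing the mechanical steps of a workflow to the cheapest tier is where most of the savings come from, since those steps usually dominate call volume.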
Eighth, leverage multi-agent logic, especially with tools like OpenClow. If using multi-agent systems, configure them to use smaller, local, or open-source models (like Llama 3.5 or Gemma 4) for less complex tasks, while reserving powerful cloud-based models for the "brain" or more demanding parts. This can drastically reduce token costs and reliance on remote data centers, shifting processing to local machines and saving money.
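A minimal dispatcher for this local-versus-cloud split might look like the sketch below. The keyword heuristic and the cloud URL are illustrative assumptions; the local endpoint is Ollama's documented default API address.

```python
COMPLEX_KEYWORDS = ("plan", "architecture", "analyze")  # illustrative heuristic

def choose_endpoint(task: str) -> str:
    """Route a task to a local model unless it looks complex.

    The keyword check is a stand-in for a real complexity classifier.
    Ollama's local API listens on http://localhost:11434 by default;
    the cloud URL below is a placeholder for your hosted provider.
    """
    if any(word in task.lower() for word in COMPLEX_KEYWORDS):
        return "https://api.cloud-provider.example/v1/messages"  # placeholder
    return "http://localhost:11434/api/generate"  # local open-source model
```

Simple summarization and formatting tasks then never leave the local machine, which is where the cost and data-center savings come from.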
Ninth, manage plugins and connectors carefully. Many AI platforms allow connections to services like Gmail, Google Drive, or Canva. These connections, even when not actively used, add to the system prompt and are transmitted with every conversation, increasing token consumption. The recommendation is to adopt a "zero-connector" policy, activating specific connectors only when needed for a particular task and deactivating them afterward.
Tenth, lighten system prompts. Older AI models often required extremely detailed and repetitive prompts to ensure compliance. However, modern LLMs are more intelligent and capable of autonomous thinking. Overly verbose prompts are now counterproductive, consuming unnecessary tokens and potentially hindering performance. Prompts should be concise and direct. This is especially true for real-time vocal assistants, where recent AI versions perform better with simpler instructions.
Eleventh, avoid using LLMs for web searches. LLMs like ChatGPT, Gemini, or Claude are knowledge models, not search engines. Their knowledge base is limited to the data they were trained on, which has a cutoff date. For real-time web searches, dedicated tools like Perplexity are designed to index the web and then use an LLM to synthesize the results, which is far more token-efficient. Asking an LLM to browse instead means every scraped page is fed through the model as tokens, which is inefficient and costly.
Twelfth, externalize references using RAG (Retrieval Augmented Generation) or vectorized databases. Instead of embedding large documents directly into prompts, which consumes many tokens, store them in a reference system. The AI can then be instructed to query this database for specific information as needed, retrieving only relevant snippets (tokens) at a much lower cost. Project modes in platforms like ChatGPT offer a basic form of this, allowing documents to be stored as a reference base without being transmitted with every prompt.
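The retrieval economics can be illustrated with a toy retriever. Real RAG systems rank chunks by embedding similarity in a vector database; the keyword-overlap scoring below is a deliberately simplified stand-in, but the payoff is the same: only the retrieved snippets enter the prompt, not the whole corpus.

```python
def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks sharing the most words with the query.

    A toy stand-in for a vector database: production RAG uses embedding
    similarity, but either way only the top-k snippets are sent to the
    model instead of the full document set.
    """
    query_words = set(query.lower().split())
    return sorted(
        chunks,
        key=lambda c: len(query_words & set(c.lower().split())),
        reverse=True,
    )[:k]
```

Sending two matching paragraphs instead of a 100-page manual is the entire token-saving argument for externalized references.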
Thirteenth, pre-process context. For professional users, particularly those with workflows or custom agents, having pre-defined system prompts or contexts for different tasks is crucial. Instead of using a single, generic prompt, tailor the system prompt to each specific task, making it lighter, shorter, and more effective. This ensures the AI receives only the necessary context, reducing token usage.
Fourteenth, utilize caching for batch automations. When performing repetitive tasks in batches (e.g., summarizing many lines in a spreadsheet), a significant portion of the prompt (instructions, format) remains constant. AI platforms like Claude, Gemini, and OpenAI offer caching mechanisms that allow these constant parts of the prompt to be stored in memory after the first execution. Subsequent iterations then only incur token costs for the variable parts, leading to massive cost savings (often 10x cheaper) for batch processing.
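For Anthropic's API, marking the constant part of a batch prompt for caching looks roughly like the payload builder below. The `cache_control` field follows Anthropic's documented prompt-caching format; the model id is illustrative, and other providers (OpenAI, Gemini) expose caching differently, some automatically.

```python
def build_request(constant_instructions: str, row: str) -> dict:
    """Build an Anthropic-style Messages API payload for one batch item.

    The long, constant instructions carry `cache_control` so the provider
    caches them after the first call; later calls bill the cached part at
    a steep discount and only the variable `row` at full price. Field
    names follow Anthropic's documented prompt-caching format; the model
    id is a placeholder.
    """
    return {
        "model": "claude-sonnet-4",  # illustrative model id
        "max_tokens": 512,
        "system": [
            {
                "type": "text",
                "text": constant_instructions,
                "cache_control": {"type": "ephemeral"},  # cache this block
            }
        ],
        "messages": [{"role": "user", "content": row}],
    }
```

In a loop over spreadsheet rows, only the `row` argument changes, so the bulky instructions are paid for essentially once.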
Fifteenth, limit context to the minimum vital for multi-agent systems. When designing agents and sub-agents (e.g., in Cowork or OpenClow), specialize them by task type rather than by broad roles. For example, instead of a "marketing agent," create "report summarizer," "statistics analyzer," or "content creator" sub-agents. Each specialized sub-agent can then be assigned the most appropriate (and often cheaper) AI model and a minimal, task-specific system prompt, drastically reducing token consumption and improving efficiency.
Finally, the sixteenth commandment is to measure token consumption. Regularly monitor token usage and costs through platform dashboards (e.g., OpenAI or Anthropic platforms, not just ChatGPT or Claude interfaces). For custom-built tools or workflows, integrate token tracking directly. This allows users to identify high-consumption areas, understand the cost implications of different models and practices, and optimize their AI usage effectively.
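For custom tools, a minimal tracker that accumulates the usage counts returned by each API response is enough to start. The prices below are illustrative placeholders; the per-call counts would come from the response's usage field (for example, `input_tokens` and `output_tokens` in Anthropic responses).

```python
class TokenTracker:
    """Accumulate token usage and estimated cost across API calls.

    Prices are illustrative USD-per-million-token placeholders; feed in
    the real counts from each API response's usage field.
    """

    def __init__(self, price_in: float = 3.0, price_out: float = 15.0):
        self.price_in, self.price_out = price_in, price_out
        self.input_tokens = 0
        self.output_tokens = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        # Call once per API response with that response's usage counts.
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    @property
    def cost(self) -> float:
        # Estimated spend in USD, given per-million-token prices.
        return (self.input_tokens * self.price_in
                + self.output_tokens * self.price_out) / 1_000_000
```

Logging `tracker.cost` per workflow run quickly reveals which steps, models, or prompts dominate the bill.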
By adhering to these principles, users can significantly reduce token consumption, thereby saving money, minimizing environmental impact, and enhancing the overall performance and reliability of AI interactions. The speaker advocates for widespread adoption of these practices, particularly in professional settings, and encourages exploring open-source and local AI solutions as a sustainable European alternative to current dominant models.