onnxruntime-genai-bnlang
Bnlang binding for onnxruntime-genai, Microsoft's specialized fast runtime for LLM text generation (Qwen, Llama, Phi, Mistral, Gemma, …). It wraps the GenAI C/C++ API; the full generation loop (KV cache, sampling, beam search) runs natively in C++.
It sits alongside onnxruntime-bnlang (the general-purpose ORT binding), not on top of it. transformers-bnlang uses it when it receives engine: "onnxruntime-genai".
This is an unofficial third-party binding. It is not affiliated with Microsoft.
Read this in English: README.en.md
Quick start
import "onnxruntime-genai-bnlang" as ortg;
ধরি জেন = ortg.জেনারেটর.খুলুন("./models/qwen-0.5b-dml-int4");
ধরি ফল = জেন.তৈরি_করুন("Hello", {
  max_new_tokens: 100,
  temperature: 0.7,
  top_k: 50,
  do_sample: true
});
লিখুন(ফল["generated_text"]);
জেন.বন্ধ_করুন();
ধরি is the Bangla form of bnl's var, and লিখুন is the Bangla form of print; both are part of the bnl core.
Or through the transformers-bnlang pipeline (recommended: the public API stays the same when you switch engines):
import "transformers-bnlang" as bt;
ধরি জেন = bt.পাইপলাইন("text-generation", "./models/qwen-0.5b-dml-int4",
  { engine: "onnxruntime-genai" });
লিখুন(জেন.চালান("Hello", { max_new_tokens: 100 })["generated_text"]);
জেন.বন্ধ_করুন();
Exports
| Name | Kind |
|---|---|
| সংস্করণ | string |
| জেনারেটর.খুলুন(model_dir) | factory → generator object |
| জেন.তৈরি_করুন(text, opts) | function → result map |
| জেন.বন্ধ_করুন() | function |
Why a separate package
| Concern | onnxruntime-bnlang | onnxruntime-genai-bnlang |
|---|---|---|
| Designed for | any ONNX model | LLM decoder-only generation |
| Generation loop | you write it yourself | built-in, native |
| KV cache management | manual | built-in |
| Sampling | manual | built-in (greedy/top-k/top-p/temperature/beam) |
| Speed for LLMs | depends on your loop | best-in-class |
This mirrors the layering in the Python ecosystem: onnxruntime for general inference, onnxruntime-genai for the specialized LLM fast path.
Local build
bnl script/install.bnl   # downloads and merges the GenAI + ORT NuGet packages
.\build.ps1              # configure + build (for this host triple)
./build.sh
The install script needs tar (Windows 10+ ships with it; Linux/macOS always have it).
Model requirements
GenAI needs a model directory that contains a genai_config.json. HF exports that ship with one:
- microsoft/Phi-3-*-onnx, microsoft/Phi-3.5-*-onnx
- microsoft/Llama-3-*-onnx
- xiaoyao9184/Qwen2.5-*-onnx-genai (community)
- search HF Hub for tags:onnxruntime-genai
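The requirement above can be checked before a directory is handed to the binding. The following is a minimal Python sketch, not part of this package: the helper name is_genai_ready is hypothetical, and it only confirms that genai_config.json exists and parses as JSON. It does not validate the config schema or check that the ONNX files it references are present.

```python
import json
from pathlib import Path


def is_genai_ready(model_dir: str) -> bool:
    """Return True if model_dir contains a parseable genai_config.json.

    Hypothetical helper for illustration; the binding itself may report
    a missing or invalid config differently.
    """
    config_path = Path(model_dir) / "genai_config.json"
    if not config_path.is_file():
        return False
    try:
        # Only checks that the file is valid JSON, nothing more.
        json.loads(config_path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        # ValueError covers both JSON and text-decoding errors.
        return False
    return True
```

A call such as is_genai_ready("./models/qwen-0.5b-dml-int4") gives a quick yes/no answer before a load is attempted.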
For exports that are not GenAI-ready, run onnxruntime_genai.models.builder to generate the missing files (needs Python + PyTorch; a one-time step).
Status
- ✅ Load + generate from any GenAI-ready model dir
- ✅ Sampling options: do_sample, temperature, top_k, top_p, max_new_tokens
- ⏳ Streaming output (token-by-token callback): a straightforward addition
- ⏳ Adapters / LoRA: supported by GenAI; not yet exposed
- ⏳ Multi-modal (image/audio): only needed for multi-modal LLMs such as Phi-3-Vision
- ⏳ Linux/macOS deps in install.bnl (Windows-x64 DML is wired up today)
Layout
bnl.json                manifest (main + targets)
CMakeLists.txt          build config
CMakePresets.json
build.ps1  build.sh     convenience wrappers
lib/
  index.bnl             public API (English + Bangla re-exports)
src/                    C++ source (not included in the published package)
  bnl/plugin.h          C ABI contract
  main.cpp              bnl_load entry
  generator.{h,cpp}     GenAI C++ API wrap (model, tokenizer, generator loop)
deps/<triple>/          downloaded prebuilts (gitignored)
build/<triple>/         cmake output (gitignored)
script/
  install.bnl           downloads and merges the Microsoft.ML.OnnxRuntimeGenAI + .OnnxRuntime NuGet packages
  install-metadata.bnl
License
MIT. The Microsoft GenAI / ORT prebuilts are under Microsoft's MIT license. Third-party attribution is in NOTICES.md.
onnxruntime-genai-bnlang
Bnlang binding for onnxruntime-genai — Microsoft's specialized fast-path runtime for LLM text generation (Qwen, Llama, Phi, Mistral, Gemma, …). Wraps the GenAI C/C++ API; runs the full generation loop (KV cache, sampling, beam search) natively in C++.
Sits alongside onnxruntime-bnlang (the general-purpose ORT binding), not on top of it. Used by transformers-bnlang when you pass engine: "onnxruntime-genai".
Unofficial third-party binding. Not affiliated with Microsoft.
Bangla version of this README: README.md
Quick start
import "onnxruntime-genai-bnlang" as ortg;
var gen = ortg.Generator.open("./models/qwen-0.5b-dml-int4");
var out = gen.generate("Hello", {
  max_new_tokens: 100,
  temperature: 0.7,
  top_k: 50,
  do_sample: true
});
print(out["generated_text"]);
gen.close();
Or through the transformers-bnlang pipeline (recommended: same public API across engines):
import "transformers-bnlang" as bt;
var gen = bt.pipeline("text-generation", "./models/qwen-0.5b-dml-int4",
  { engine: "onnxruntime-genai" });
print(gen.run("Hello", { max_new_tokens: 100 })["generated_text"]);
gen.close();
Exports
| Name | Kind |
|---|---|
| version | string |
| Generator.open(model_dir) | factory → generator object |
| gen.generate(text, opts) | function → result map |
| gen.close() | function |
Why a separate package
| Concern | onnxruntime-bnlang | onnxruntime-genai-bnlang |
|---|---|---|
| Designed for | any ONNX model | LLM decoder-only generation |
| Generation loop | you (or your wrapper) write it | built-in, native |
| KV cache management | manual | built-in |
| Sampling | manual | built-in (greedy/top-k/top-p/temperature/beam) |
| Speed for LLMs | depends on your loop | best-in-class |
Same Microsoft layering as the Python ecosystem: onnxruntime for general inference, onnxruntime-genai for specialized LLM fast-paths.
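To make the "Sampling" row concrete, here is a toy Python sketch of what a hand-written sampling step involves when the runtime does not provide one. It is an illustration under simplifying assumptions, not GenAI's implementation: the function name sample_token is hypothetical, it works on a plain list of logits, and the order in which the filters are applied is one common choice among several.

```python
import math
import random


def sample_token(logits, temperature=1.0, top_k=0, top_p=1.0,
                 do_sample=True, rng=random):
    """Pick one token id from raw logits (toy version, not GenAI's code)."""
    if not do_sample:
        # Greedy decoding: the highest logit wins.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature (> 0) rescales the logits before the softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # numerically stable
    total = sum(weights)
    probs = sorted(((w / total, i) for i, w in enumerate(weights)),
                   reverse=True)

    # top_k keeps only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        probs = probs[:top_k]
        kept_mass = sum(p for p, _ in probs)
        probs = [(p / kept_mass, i) for p, i in probs]

    # top_p keeps the smallest prefix whose cumulative probability reaches p.
    kept, cumulative = [], 0.0
    for p, i in probs:
        kept.append((p, i))
        cumulative += p
        if cumulative >= top_p:
            break

    # Draw from the surviving tokens; choices() normalises the weights.
    ids = [i for _, i in kept]
    return rng.choices(ids, weights=[p for p, _ in kept], k=1)[0]
```

With onnxruntime-genai-bnlang the equivalent work happens inside the native loop, and the same knobs are passed as options to gen.generate.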
Build (local)
bnl script/install.bnl   # downloads + merges GenAI + ORT NuGet packages
.\build.ps1              # configure + build for the current triple
./build.sh
The install script needs tar (Windows 10+ ships it; Linux/macOS always).
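For readers curious what "downloads + merges" amounts to, the merge half can be sketched in Python. This is not the code in script/install.bnl (which uses tar and whose internals are not shown here). It assumes the usual NuGet convention that native binaries live under runtimes/<rid>/native/ inside the .nupkg, which is a zip archive; the function names and the flattening into a single directory are illustrative choices.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_native(nupkg_path, rid, dest_dir):
    """Copy files under runtimes/<rid>/native/ from a .nupkg into dest_dir."""
    prefix = f"runtimes/{rid}/native/"  # assumed NuGet native-asset layout
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    with zipfile.ZipFile(nupkg_path) as pkg:  # a .nupkg is a zip archive
        for name in pkg.namelist():
            if name.startswith(prefix) and not name.endswith("/"):
                # Flatten: keep only the file name inside dest_dir.
                target = dest / PurePosixPath(name).name
                target.write_bytes(pkg.read(name))
                copied.append(target.name)
    return sorted(copied)


def merge_packages(nupkg_paths, rid, dest_dir):
    """Extract native binaries from several packages into one directory."""
    merged = []
    for path in nupkg_paths:
        merged.extend(extract_native(path, rid, dest_dir))
    return sorted(set(merged))
```

Called with the two downloaded packages and a runtime identifier such as "win-x64", this would leave the GenAI and ORT binaries side by side in one directory, which is the shape deps/<triple>/ is described as having.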
Model requirements
GenAI requires a model directory with a genai_config.json. HF model exports that ship this:
- microsoft/Phi-3-*-onnx, microsoft/Phi-3.5-*-onnx
- microsoft/Llama-3-*-onnx
- xiaoyao9184/Qwen2.5-*-onnx-genai (community)
- search HF Hub for tags:onnxruntime-genai
For non-GenAI exports, run onnxruntime_genai.models.builder to generate the missing files (needs Python + PyTorch, a one-off step).
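As a rough guide to that one-off step, the sketch below assembles a builder command line in Python. The helper builder_command is hypothetical. The flags (-m model name, -o output folder, -p precision, -e execution provider, -c cache directory) follow the upstream builder's documented usage as far as known at the time of writing, so confirm them with the builder's --help output for the installed version. The model name in the usage note is only an example.

```python
import sys


def builder_command(model_name, output_dir, precision="int4",
                    execution_provider="dml", cache_dir=None):
    """Build the argv list for onnxruntime_genai.models.builder.

    Hypothetical helper; it only assembles the command and does not run it.
    """
    cmd = [
        sys.executable, "-m", "onnxruntime_genai.models.builder",
        "-m", model_name,          # Hugging Face model id
        "-o", output_dir,          # where the GenAI-ready files are written
        "-p", precision,           # e.g. int4, fp16, fp32
        "-e", execution_provider,  # e.g. cpu, cuda, dml
    ]
    if cache_dir is not None:
        cmd += ["-c", cache_dir]   # optional download cache
    return cmd
```

Passing the result to subprocess.run(..., check=True), for example with builder_command("Qwen/Qwen2.5-0.5B-Instruct", "./models/qwen-0.5b-dml-int4"), would run the export. The output directory can then be opened with Generator.open.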
Status
- ✅ Loading and generating from any GenAI-ready model dir
- ✅ Sampling options: do_sample, temperature, top_k, top_p, max_new_tokens
- ⏳ Streaming output (token-by-token callback): straightforward to add via an OgaGenerator::GenerateNextToken loop on the bnl side
- ⏳ Adapters / LoRA: supported by GenAI; not yet exposed
- ⏳ Multi-modal (images/audio): only relevant for multi-modal LLMs (Phi-3-Vision etc.)
- ⏳ Linux/macOS deps in install.bnl (Windows-x64 DML wired today)
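The streaming item is not implemented yet, but the shape of a token-by-token callback loop can be sketched in Python. Everything here is hypothetical: FakeGenerator is a stand-in that replays a fixed token list, and the method names only mimic the step-then-read pattern that a GenerateNextToken-style API suggests. Neither reflects the bnl API nor the exact GenAI interface.

```python
class FakeGenerator:
    """Stand-in for a native generator; replays a fixed token sequence."""

    def __init__(self, tokens):
        self._tokens = list(tokens)
        self._pos = 0

    def is_done(self):
        return self._pos >= len(self._tokens)

    def generate_next_token(self):
        # A real generator would run one decoding step here.
        self._pos += 1

    def last_token(self):
        return self._tokens[self._pos - 1]


def stream(generator, on_token, max_new_tokens=100):
    """Step the generator and hand each new token to the callback."""
    produced = 0
    while not generator.is_done() and produced < max_new_tokens:
        generator.generate_next_token()
        on_token(generator.last_token())
        produced += 1
    return produced
```

The design point is that the loop lives on the caller's side while each step stays native, so the callback can print or forward tokens as they arrive instead of waiting for the full result map.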
Layout
bnl.json                manifest (main + targets)
CMakeLists.txt          build config
CMakePresets.json
build.ps1  build.sh     convenience wrappers
lib/
  index.bnl             public API (English + Bangla re-exports)
src/                    C++ binding (excluded from published tarball)
  bnl/plugin.h          C ABI contract
  main.cpp              bnl_load entry
  generator.{h,cpp}     GenAI C++ API wrap (model, tokenizer, generator loop)
deps/<triple>/          downloaded prebuilts (gitignored)
build/<triple>/         cmake output (gitignored)
script/
  install.bnl           downloads + merges Microsoft.ML.OnnxRuntimeGenAI + .OnnxRuntime
  install-metadata.bnl
License
MIT. Microsoft GenAI / ORT prebuilts are MIT-licensed by Microsoft. See NOTICES.md for third-party attribution.
MIT License
Copyright (c) 2026 Bnlang | Mamun
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to furnish persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Third-party notices — onnxruntime-genai-bnlang
This package's binary plugin dynamically links to and consumes prebuilt
binaries from the projects below. They are downloaded at install time by
script/install.bnl from their official NuGet feeds and remain governed
by their original licenses.
ONNX Runtime GenAI
- Project: Microsoft onnxruntime-genai
- Upstream: https://github.com/microsoft/onnxruntime-genai
- License: MIT
- Copyright: Copyright (c) Microsoft Corporation
The specialized LLM generation runtime. We download Microsoft's prebuilt
onnxruntime-genai.dll (from the
Microsoft.ML.OnnxRuntimeGenAI.DirectML NuGet package) at install time
and dynamically link to it from our plugin. We do not redistribute the
GenAI binaries; users fetch them from Microsoft's official packages.
The full MIT license text is available at https://github.com/microsoft/onnxruntime-genai/blob/main/LICENSE
ONNX Runtime
- Project: Microsoft ONNX Runtime
- Upstream: https://github.com/microsoft/onnxruntime
- License: MIT
- Copyright: Copyright (c) Microsoft Corporation
The underlying inference engine that GenAI uses. At install time we fetch the
matching ORT DLL (from the Microsoft.ML.OnnxRuntime.DirectML NuGet package) and
place it alongside the GenAI DLL. Same MIT terms apply.
The full MIT license text is available at https://github.com/microsoft/onnxruntime/blob/main/LICENSE
DirectML
- Project: DirectML
- Upstream: part of Microsoft Windows (since Windows 10 1903)
- License: Microsoft Software License Terms (Windows component)
DirectML.dll is shipped with Windows itself and is not redistributed by
this package.