onnxruntime-genai-bnlang

Bnlang binding for onnxruntime-genai, Microsoft's specialized fast runtime for LLM text generation (Qwen, Llama, Phi, Mistral, Gemma, …). It wraps the GenAI C/C++ API; the entire generation loop (KV cache, sampling, beam search) runs natively in C++.

Sits alongside onnxruntime-bnlang (the general-purpose ORT binding), not on top of it. transformers-bnlang uses this package when it is given engine: "onnxruntime-genai".

This is an unofficial third-party binding. It is not affiliated with Microsoft.

Read this in English — README.en.md

Quick start

import "onnxruntime-genai-bnlang" as ortg;

ধরি জেন = ortg.জেনারেটর.খুলুন("./models/qwen-0.5b-dml-int4");
ধরি ফল  = জেন.তৈরি_করুন("Hello", {
    max_new_tokens: 100,
    temperature:    0.7,
    top_k:          50,
    do_sample:      true
});
লিখুন(ফল["generated_text"]);
জেন.বন্ধ_করুন();

ধরি is the Bangla form of bnl's var, and লিখুন is the Bangla form of print; both are part of the bnl core.

Or through the transformers-bnlang pipeline (recommended: the public API stays the same when the engine changes):

import "transformers-bnlang" as bt;

ধরি জেন = bt.পাইপলাইন("text-generation", "./models/qwen-0.5b-dml-int4",
                       { engine: "onnxruntime-genai" });
লিখুন(জেন.চালান("Hello", { max_new_tokens: 100 })["generated_text"]);
জেন.বন্ধ_করুন();

Exports

Name	Kind
`সংস্করণ`	string
`জেনারেটর.খুলুন(model_dir)`	factory → generator object
`জেন.তৈরি_করুন(text, opts)`	function → result map
`জেন.বন্ধ_করুন()`	function

Why a separate package

Concern	onnxruntime-bnlang	onnxruntime-genai-bnlang
Purpose	any ONNX model	LLM decoder-only generation
Generation loop	you write it yourself	built-in, native
KV cache management	manual	built-in
Sampling	manual	built-in (greedy/top-k/top-p/temperature/beam)
Speed for LLMs	depends on your loop	best-in-class

The same layering as the Python ecosystem: onnxruntime for general inference, onnxruntime-genai for the specialized LLM fast path.

Local build

bnl script/install.bnl       # downloads the GenAI + ORT NuGet packages and merges them
.\build.ps1                   # configure + build (for this host triple)

./build.sh

The install script needs tar (Windows 10+ ships with it; Linux/macOS always have it).

Model requirements

GenAI needs a model directory that contains a genai_config.json. HF exports that ship one:

microsoft/Phi-3-*-onnx, microsoft/Phi-3.5-*-onnx
microsoft/Llama-3-*-onnx
xiaoyao9184/Qwen2.5-*-onnx-genai (community)
Search the HF Hub for tags:onnxruntime-genai

For exports that are not GenAI-ready, run onnxruntime_genai.models.builder to generate the missing files (requires Python + PyTorch; a one-time step).

Status

✅ Load + generate from any GenAI-ready model dir
✅ Sampling options: do_sample, temperature, top_k, top_p, max_new_tokens
⏳ Streaming output (token-by-token callback): a straightforward addition
⏳ Adapter / LoRA: supported by GenAI; not yet exposed
⏳ Multi-modal (image/audio): only needed for multi-modal LLMs such as Phi-3-Vision
⏳ Linux/macOS deps in install.bnl (only Windows-x64 DML is wired up today)

Layout

bnl.json              manifest (main + targets)
CMakeLists.txt        build config
CMakePresets.json
build.ps1  build.sh   convenience wrapper

lib/
  index.bnl           public API (English + Bangla re-exports)

src/                  C++ source (not included when published)
  bnl/plugin.h        C ABI contract
  main.cpp            bnl_load entry
  generator.{h,cpp}   GenAI C++ API wrap (model, tokenizer, generator loop)

deps/<triple>/        downloaded prebuilt (gitignored)
build/<triple>/       cmake output (gitignored)

script/
  install.bnl         downloads + merges the Microsoft.ML.OnnxRuntimeGenAI + .OnnxRuntime NuGet packages
  install-metadata.bnl

License

MIT. The Microsoft GenAI / ORT prebuilts are under Microsoft's MIT license. Third-party attribution is in NOTICES.md.

onnxruntime-genai-bnlang

Bnlang binding for onnxruntime-genai — Microsoft's specialized fast-path runtime for LLM text generation (Qwen, Llama, Phi, Mistral, Gemma, …). Wraps the GenAI C/C++ API; runs the full generation loop (KV cache, sampling, beam search) natively in C++.

Sits alongside onnxruntime-bnlang (the general-purpose ORT binding), not on top of it. Used by transformers-bnlang when you pass engine: "onnxruntime-genai".

Unofficial third-party binding. Not affiliated with Microsoft.

Bangla version of this README: README.md

Quick start

import "onnxruntime-genai-bnlang" as ortg;

var gen = ortg.Generator.open("./models/qwen-0.5b-dml-int4");
var out = gen.generate("Hello", {
    max_new_tokens: 100,
    temperature:    0.7,
    top_k:          50,
    do_sample:      true
});
print(out["generated_text"]);
gen.close();
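
To make the quick-start options concrete, here is a minimal pure-Python sketch of what temperature and top_k conventionally mean when sampling from logits. It illustrates the general technique only, not GenAI's native implementation, and the function name sample_top_k is hypothetical.

```python
import math
import random


def sample_top_k(logits, temperature=0.7, top_k=50, rng=random):
    """Sample one token index from logits using temperature and top-k.

    Illustrative only: keeps the top_k highest logits, divides them by
    temperature, applies softmax, and draws from the result.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Indices of the top_k largest logits (all of them if top_k is larger).
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    kept = ranked[:max(1, top_k)]
    scaled = [logits[i] / temperature for i in kept]
    # Subtract the max before exponentiating for numerical stability.
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(kept, weights=weights, k=1)[0]
```

Lower temperatures sharpen the distribution toward the best-scoring token, and top_k=1 reduces to greedy decoding.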

Or through the transformers-bnlang pipeline (recommended — same public API across engines):

import "transformers-bnlang" as bt;

var gen = bt.pipeline("text-generation", "./models/qwen-0.5b-dml-int4",
                      { engine: "onnxruntime-genai" });
print(gen.run("Hello", { max_new_tokens: 100 })["generated_text"]);
gen.close();

Exports

Name	Kind
`version`	string
`Generator.open(model_dir)`	factory → generator object
`gen.generate(text, opts)`	function → result map
`gen.close()`	function

Why a separate package

Concern	onnxruntime-bnlang	onnxruntime-genai-bnlang
Designed for	any ONNX model	LLM decoder-only generation
Generation loop	you (or your wrapper) write it	built-in, native
KV cache management	manual	built-in
Sampling	manual	built-in (greedy/top-k/top-p/temperature/beam)
Speed for LLMs	depends on your loop	best-in-class

Same Microsoft layering as the Python ecosystem: onnxruntime for general inference, onnxruntime-genai for specialized LLM fast-paths.
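
For a sense of what "you write it" means in the onnxruntime-bnlang column, a hand-written greedy decoding loop looks roughly like the sketch below. It is pure Python with a toy stand-in for the model; greedy_generate, toy_model, and the token IDs are all hypothetical, and a real loop over ONNX Runtime would also have to carry KV-cache tensors between steps.

```python
def greedy_generate(next_logits, prompt_ids, max_new_tokens, eos_id):
    """Greedy decoding: repeatedly append the highest-scoring next token.

    next_logits(ids) is a stand-in for a model forward pass and must
    return one score per vocabulary entry.
    """
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = next_logits(ids)
        token = max(range(len(logits)), key=lambda i: logits[i])
        ids.append(token)
        if token == eos_id:
            break
    return ids


def toy_model(ids):
    """Toy 'model' over a 4-token vocabulary: predicts last token + 1."""
    target = min(ids[-1] + 1, 3)
    return [1.0 if i == target else 0.0 for i in range(4)]
```

With the GenAI binding this loop, the cache handling, and the sampling step all stay inside the native library.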

Build (local)

bnl script/install.bnl       # downloads + merges GenAI + ORT NuGet packages
.\build.ps1                   # configure + build for the current triple

./build.sh

The install script needs tar (Windows 10+ ships it; Linux and macOS always have it).

Model requirements

GenAI requires a model directory with a genai_config.json. HF model exports that ship this:

microsoft/Phi-3-*-onnx, microsoft/Phi-3.5-*-onnx
microsoft/Llama-3-*-onnx
xiaoyao9184/Qwen2.5-*-onnx-genai (community)
search HF Hub for tags:onnxruntime-genai
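
A quick pre-flight check can catch a missing genai_config.json before the native loader does. The sketch below is plain Python and not part of this package; the helper name is_genai_ready is hypothetical, and it tests only that the file exists and parses as JSON.

```python
import json
from pathlib import Path


def is_genai_ready(model_dir):
    """Return True if model_dir contains a parseable genai_config.json.

    Hypothetical helper: it checks only for the config file, not for the
    ONNX weights or tokenizer files that a real model also needs.
    """
    config_path = Path(model_dir) / "genai_config.json"
    if not config_path.is_file():
        return False
    try:
        with config_path.open(encoding="utf-8") as fh:
            json.load(fh)
    except (OSError, json.JSONDecodeError):
        return False
    return True
```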

For non-GenAI exports, run onnxruntime_genai.models.builder to generate the missing files (needs Python + PyTorch, a one-off step).
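
The builder is normally invoked as a Python module. As a sketch, the helper below only assembles the command line rather than running it. The flags -m, -o, -p, and -e reflect the builder's CLI as commonly documented, which is an assumption here; check them against the --help output of your installed version. The function name, the example model ID, and the output path are hypothetical.

```python
import shlex
import sys


def builder_command(model_name, output_dir, precision="int4", provider="cpu"):
    """Assemble (but do not run) a model-builder invocation.

    Assumed flags: -m model name, -o output folder, -p precision,
    -e execution provider. Check --help before relying on them.
    """
    return [
        sys.executable, "-m", "onnxruntime_genai.models.builder",
        "-m", model_name,
        "-o", output_dir,
        "-p", precision,
        "-e", provider,
    ]


if __name__ == "__main__":
    cmd = builder_command("Qwen/Qwen2.5-0.5B-Instruct",
                          "./models/qwen-0.5b-dml-int4", provider="dml")
    # Print a copy-pasteable command; pass cmd to subprocess.run to execute it.
    print(shlex.join(cmd))
```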

Status

✅ Loading and generating from any GenAI-ready model dir
✅ Sampling options: do_sample, temperature, top_k, top_p, max_new_tokens
⏳ Streaming output (token-by-token callback): a straightforward addition via an OgaGenerator::GenerateNextToken loop on the bnl side
⏳ Adapters / LoRA — supported by GenAI; not yet exposed
⏳ Multi-modal (images/audio) — only relevant for multi-modal LLMs (Phi-3-Vision etc.)
⏳ Linux/macOS deps in install.bnl (only Windows-x64 DML is wired up today)
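
The streaming item could take the shape of a callback invoked once per decoded token. The sketch below shows that pattern in plain Python against a stand-in object. FakeGenerator and stream_tokens are hypothetical, and the method names is_done, generate_next_token, and get_next_tokens are modelled on the onnxruntime-genai Python API as an assumption, not on this binding.

```python
class FakeGenerator:
    """Stand-in that yields a fixed token sequence, one per step."""

    def __init__(self, tokens):
        self._tokens = list(tokens)
        self._pos = -1

    def is_done(self):
        return self._pos >= len(self._tokens) - 1

    def generate_next_token(self):
        self._pos += 1

    def get_next_tokens(self):
        return [self._tokens[self._pos]]


def stream_tokens(generator, decode, on_token):
    """Drive the loop and hand each decoded piece to on_token."""
    while not generator.is_done():
        generator.generate_next_token()
        on_token(decode(generator.get_next_tokens()[0]))
```

A caller would pass something like a print function as on_token so text appears as it is produced instead of after the whole generation finishes.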

Layout

bnl.json              manifest (main + targets)
CMakeLists.txt        build config
CMakePresets.json
build.ps1  build.sh   convenience wrappers

lib/
  index.bnl           public API (English + Bangla re-exports)

src/                  C++ binding (excluded from published tarball)
  bnl/plugin.h        C ABI contract
  main.cpp            bnl_load entry
  generator.{h,cpp}   GenAI C++ API wrap (model, tokenizer, generator loop)

deps/<triple>/        downloaded prebuilts (gitignored)
build/<triple>/       cmake output (gitignored)

script/
  install.bnl         downloads + merges Microsoft.ML.OnnxRuntimeGenAI + .OnnxRuntime
  install-metadata.bnl

License

MIT. Microsoft GenAI / ORT prebuilts are MIT-licensed by Microsoft. See NOTICES.md for third-party attribution.

MIT License

Copyright (c) 2026 Bnlang | Mamun

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to furnish persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

