transformers-bnlang

onnxruntime-bnlang-এর উপরে Bnlang transformers — pipeline API। দ্রুত LLM generation-এর জন্য optionally onnxruntime-genai-bnlang।

v0.1.0-তে text generation সমর্থিত।

এটি একটি বেসরকারি তৃতীয় পক্ষীয় বাইন্ডিং। Hugging Face বা Microsoft-এর সাথে সম্পর্কিত নয়।

Read this in English — README.en.md

দ্রুত শুরু

import "transformers-bnlang" as bt;

ধরি জেন = bt.পাইপলাইন("text-generation", "./models/Qwen2.5-0.5B-Instruct");
ধরি ফল  = জেন.চালান("Hello", {
    max_new_tokens: 100,
    temperature:    0.7,
    top_k:          50,
    do_sample:      true
});
লিখুন(ফল["generated_text"]);
জেন.বন্ধ_করুন();

দুটি engine, একই public API

// Default: নিজেদের লেখা native loop। যেকোনো decoder-only ONNX model
// (Qwen, Llama, Mistral, Phi, …) চলবে। স্ট্যান্ডার্ড HF লেআউট পড়ে
// (config.json, tokenizer.json, onnx/model.onnx)।
bt.পাইপলাইন("text-generation", "./models/Qwen2.5-0.5B-Instruct");

// Fast path: onnxruntime-genai-এর মধ্য দিয়ে (Microsoft-এর specialized LLM
// runtime)। ~১০× দ্রুত, কিন্তু genai_config.json-সহ একটি model dir চাই।
// একই run/chat/close API।
bt.পাইপলাইন("text-generation", "./models/qwen-0.5b-dml-int4",
            { engine: "onnxruntime-genai" });

Export-গুলো

নাম	ধরন
`সংস্করণ`	string
`পাইপলাইন(task, model_dir, options)`	factory → pipeline object
`অটো_টোকেনাইজার.মডেল_থেকে(dir)`	factory → tokenizer
`অটো_টোকেনাইজার.ফাইল_থেকে(path)`	factory → tokenizer
`tok.এনকোড_করুন(text)`	function → list of ids
`tok.ডিকোড_করুন(ids)`	function → string
`tok.বিশেষ_আইডি(name)`	function → int
`tok.বন্ধ_করুন()`	function
`gen.চালান(text, opts)`	function → result map
`gen.আলাপ_করুন(messages, opts)`	function → result map
`gen.বন্ধ_করুন()`	function
`স্থাপত্য_নিবন্ধন(name, descriptor)`	function — নতুন architecture যোগ

Pipeline option

Option	ধরন	Default	অর্থ
`model`	string	`onnx/model.onnx`	ONNX file-এর path (relative অথবা absolute)
`config`	string	`config.json`	model config-এর path
`tokenizer`	string	`tokenizer.json`	tokenizer config-এর path
`architecture`	map	(config থেকে পড়ে)	per-arch descriptor: layer, KV head, EOS, chat template, …
`engine`	string	(our loop)	`"onnxruntime-genai"` হলে fast path
`execution_providers`	list	`["CPU"]`	ORT EP-গুলোর priority list, যেমন `["DML", "CPU"]`
`log_severity_level`	int	`3`	ORT log verbosity (0=verbose … 4=fatal)

Run-time option (`gen.চালান(text, opts)`)

Option	Default	অর্থ
`max_new_tokens`	32	prompt-এর পরে কতগুলো token generate হবে
`do_sample`	`false`	`true` হলে temperature / top-k / top-p sampling চালু
`temperature`	`1.0`	softmax temperature
`top_k`	`0` (off)	শীর্ষ-K candidate-এ সীমাবদ্ধ
`top_p`	`1.0` (off)	nucleus filter
`seed`	`0`	non-zero দিলে sampler-কে reseed

চ্যাট

ধরি ফল = জেন.আলাপ_করুন([
    { role: "system", content: "তুমি একজন সহায়ক assistant।" },
    { role: "user",   content: "বাংলাদেশের রাজধানী কী?" }
], { max_new_tokens: 64 });
লিখুন(ফল["new_text"]);

Chat template architecture descriptor থেকে আসে (Qwen-এর জন্য chatml, Llama/Mistral-এর জন্য llama2)। কাস্টম template 0.2-এ আসবে।

সমর্থিত architecture (`config.json` `model_type` → descriptor)

`model_type`	Layer	KV head	Head dim	EOS	Template
`qwen2`	24	2	64	151643, 151645	chatml
`llama`	32	8	128	2	llama2
`mistral`	32	8	128	2	llama2

নতুন model family যোগ করতে একটি entry — দেখুন lib/architectures.bnl এবং স্থাপত্য_নিবন্ধন(name, descriptor) escape hatch।

লোকাল বিল্ড

# Windows
.\build.ps1        # cmake configure + build  ->  build/windows-x64/transformers-bnlang.dll

# macOS / Linux
./build.sh

লেআউট

bnl.json                 manifest (main + targets ম্যাপ)
CMakeLists.txt           build config
CMakePresets.json        প্রতি platform-এ একটি preset

lib/
  index.bnl              public API (ইংরেজি + বাংলা re-export)
  pipeline.bnl           pipeline("text-generation", ...) dispatch + engine switch
  tokenizer.bnl          AutoTokenizer.from_pretrained / from_file
  architectures.bnl      Qwen2, Llama, Mistral-এর descriptor (extensible)
  chat_template.bnl      chatml + llama2 template
  generation.bnl         KV-cache loop + sampling ("our loop" engine)

src/                     C++ source (publish-এ আসে না)
  bnl/plugin.h           C ABI
  main.cpp               bnl_load + argmax_last + sample_last native
  bpe.{h,cpp}            byte-level BPE tokenizer (~700 LOC)
  external/json.hpp      vendored nlohmann/json single-header

build/<triple>/          cmake output (gitignored)
test/
  smoke.bnl              dtype + tokenizer round-trip

লাইসেন্স

MIT. ORT এবং GenAI prebuilt গুলো Microsoft-এর MIT লাইসেন্সে। তৃতীয় পক্ষীয় attribution NOTICES.md-এ — বিশেষত nlohmann/json statically embed করা আছে।

transformers-bnlang

Bnlang transformers — pipeline API on top of onnxruntime-bnlang (and optionally onnxruntime-genai-bnlang for fast LLM generation).

v0.1.0 supports text generation.

Unofficial third-party binding. Not affiliated with Hugging Face or Microsoft.

এই README-এর বাংলা সংস্করণ — README.md

Quick start

import "transformers-bnlang" as bt;

var gen = bt.pipeline("text-generation", "./models/Qwen2.5-0.5B-Instruct");
var out = gen.run("Hello", {
    max_new_tokens: 100,
    temperature:    0.7,
    top_k:          50,
    do_sample:      true
});
print(out["generated_text"]);
gen.close();

Two engines, one public API

// Default: our hand-rolled native loop. Runs ANY decoder-only ONNX
// model (Qwen, Llama, Mistral, Phi, …). Reads standard HF layout
// (config.json, tokenizer.json, onnx/model.onnx).
bt.pipeline("text-generation", "./models/Qwen2.5-0.5B-Instruct");

// Fast path: routes through onnxruntime-genai (Microsoft's
// specialized LLM runtime). ~10× faster, requires a model dir
// with genai_config.json. Same run/chat/close API.
bt.pipeline("text-generation", "./models/qwen-0.5b-dml-int4",
            { engine: "onnxruntime-genai" });

Exports

Name	Kind
`version`	string
`pipeline(task, model_dir, options)`	factory → pipeline object
`AutoTokenizer.from_pretrained(dir)`	factory → tokenizer
`AutoTokenizer.from_file(path)`	factory → tokenizer
`tok.encode(text)`	function → list of ids
`tok.decode(ids)`	function → string
`tok.special_id(name)`	function → int
`tok.close()`	function
`gen.run(text, opts)`	function → result map
`gen.chat(messages, opts)`	function → result map
`gen.close()`	function
`register_architecture(name, descriptor)`	function — add a new architecture

Pipeline options

Option	Type	Default	Meaning
`model`	string	`onnx/model.onnx`	path to the ONNX file (relative or absolute)
`config`	string	`config.json`	path to the model config
`tokenizer`	string	`tokenizer.json`	path to the tokenizer config
`architecture`	map	(read from config)	per-arch descriptor: layers, KV heads, EOS, chat template, …
`engine`	string	(our loop)	`"onnxruntime-genai"` to use the fast path
`execution_providers`	list	`["CPU"]`	ORT EPs to try in order, e.g. `["DML", "CPU"]`
`log_severity_level`	int	`3`	ORT log verbosity (0=verbose … 4=fatal)

Run-time options (`gen.run(text, opts)`)

Option	Default	Meaning
`max_new_tokens`	32	tokens to generate after the prompt
`do_sample`	`false`	`true` enables temperature / top-k / top-p sampling
`temperature`	`1.0`	softmax temperature
`top_k`	`0` (off)	restrict to top-K candidates
`top_p`	`1.0` (off)	nucleus filter
`seed`	`0`	non-zero reseeds the sampler

Chat

var out = gen.chat([
    { role: "system", content: "You are a helpful assistant." },
    { role: "user",   content: "What is the capital of Bangladesh?" }
], { max_new_tokens: 64 });
print(out["new_text"]);

Chat templates are picked from the architecture descriptor (chatml for Qwen, llama2 for Llama / Mistral). Custom templates land in 0.2.

Supported architectures (`config.json` `model_type` → descriptor)

`model_type`	Layers	KV heads	Head dim	EOS	Template
`qwen2`	24	2	64	151643, 151645	chatml
`llama`	32	8	128	2	llama2
`mistral`	32	8	128	2	llama2

Add a new model family with one entry — see lib/architectures.bnl and the register_architecture(name, descriptor) escape hatch.

Build (local)

# Windows
.\build.ps1        # cmake configure + build  ->  build/windows-x64/transformers-bnlang.dll

# macOS / Linux
./build.sh

Layout

bnl.json                 manifest (main + targets map)
CMakeLists.txt           build config
CMakePresets.json        one preset per platform

lib/
  index.bnl              public API (English + Bangla re-exports)
  pipeline.bnl           pipeline("text-generation", ...) dispatch + engine switch
  tokenizer.bnl          AutoTokenizer.from_pretrained / from_file
  architectures.bnl      descriptors for Qwen2, Llama, Mistral (extensible)
  chat_template.bnl      chatml + llama2 templates
  generation.bnl         KV-cache loop + sampling (the "our loop" engine)

src/                     C++ — BPE tokenizer + argmax/sample (excluded from published tarball)
  bnl/plugin.h           C ABI contract
  main.cpp               bnl_load + argmax_last + sample_last natives
  bpe.{h,cpp}            byte-level BPE tokenizer (~700 LOC)
  external/json.hpp      vendored nlohmann/json single-header

build/<triple>/          cmake output (gitignored)
test/
  smoke.bnl              dtype + tokenizer round-trip

License

MIT. Underlying ORT and GenAI prebuilts are MIT-licensed by Microsoft. See NOTICES.md for third-party attribution — in particular nlohmann/json, which is statically embedded.

MIT License

Copyright (c) 2026 Bnlang | Mamun

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to furnish persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Third-party notices — transformers-bnlang

This package contains code from third parties. Two distinct relationships:

Statically embedded — code compiled into our binary .dll/.so/.dylib. The original license requires we ship the copyright + permission notice alongside the binary. That's the full text below for nlohmann/json.
Transitive runtime dependency — code loaded at runtime via the onnxruntime-bnlang plugin (which itself dynamically links to ONNX Runtime). Attribution + upstream pointer is below.

nlohmann/json — statically embedded in the binary

Project: JSON for Modern C++
Upstream: https://github.com/nlohmann/json
Version vendored: 3.11.3 (at src/external/json.hpp in our source tree)
License: MIT
SPDX: MIT

Used by the BPE tokenizer to parse tokenizer.json. Because the library is header-only, its compiled code ends up inside our shipped binary.

Full license text

MIT License

Copyright (c) 2013-2023 Niels Lohmann

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to furnish persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

ONNX Runtime — transitive (via onnxruntime-bnlang)

Project: Microsoft ONNX Runtime
Upstream: https://github.com/microsoft/onnxruntime
License: MIT
Copyright: Copyright (c) Microsoft Corporation

This package depends on onnxruntime-bnlang, which downloads and dynamically links to ORT's prebuilt binaries. We do not bundle ORT itself; see onnxruntime-bnlang/NOTICES.md for the full attribution.

The full MIT license text is available at https://github.com/microsoft/onnxruntime/blob/main/LICENSE

transformers-bnlang

transformers-bnlang

দ্রুত শুরু

দুটি engine, একই public API

Export-গুলো

Pipeline option

Run-time option (gen.চালান(text, opts))

চ্যাট

সমর্থিত architecture (config.json model_type → descriptor)

লোকাল বিল্ড

লেআউট

লাইসেন্স

transformers-bnlang

Quick start

Two engines, one public API

Exports

Pipeline options

Run-time options (gen.run(text, opts))

Chat

Supported architectures (config.json model_type → descriptor)

Build (local)

Layout

License

Third-party notices — transformers-bnlang

nlohmann/json — statically embedded in the binary

Full license text

ONNX Runtime — transitive (via onnxruntime-bnlang)

Run-time option (`gen.চালান(text, opts)`)

সমর্থিত architecture (`config.json` `model_type` → descriptor)

Run-time options (`gen.run(text, opts)`)

Supported architectures (`config.json` `model_type` → descriptor)