Name: bert-base-NER
Author: dslim

Technical Overview

dslim/bert-base-NER is a fine‑tuned bert-base‑cased model that performs Named Entity Recognition (NER) on English text. The model classifies each token into one of nine BIO‑style tags (O, B‑LOC, I‑LOC, B‑ORG, I‑ORG, B‑PER, I‑PER, B‑MISC, I‑MISC) and therefore can extract four entity types – locations, organizations, persons and miscellaneous entities – from raw sentences.
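To make the BIO scheme concrete, here is a minimal sketch of how per-token tags can be grouped into entity spans. The function name bio_to_spans and the sample sentence are illustrative and not taken from the model card; the tags are written with plain ASCII hyphens.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    spans = []
    current_type, current_tokens = None, []

    def flush():
        # Close the currently open span, if there is one.
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        prefix, entity_type = tag.split("-", 1)
        # A B- tag, or an I- tag that does not continue the open span,
        # starts a new entity.
        if prefix == "B" or entity_type != current_type:
            flush()
            current_type = entity_type
        current_tokens.append(token)
    flush()
    return spans


tokens = ["Angela", "Merkel", "visited", "the", "United", "Nations", "in", "New", "York"]
tags = ["B-PER", "I-PER", "O", "O", "B-ORG", "I-ORG", "O", "B-LOC", "I-LOC"]
spans = bio_to_spans(tokens, tags)
print(spans)
```

The B- prefix marks the first token of an entity and I- marks a continuation, which is what lets two adjacent entities of the same type be told apart.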

Key features & capabilities

Ready‑to‑use with the pipeline("ner") API from 🤗 Transformers.
F1 score of 0.926 on the CoNLL‑2003 test split, on par with other BERT‑base NER baselines.
Compact 110 M‑parameter footprint (bert‑base) while retaining high accuracy.
Case‑sensitive checkpoint built on bert-base-cased; an uncased variant is available separately for downstream flexibility.
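A minimal sketch of the pipeline usage mentioned above, assuming the transformers package is installed and the weights can be downloaded. The helper names load_ner_pipeline and summarize are hypothetical, and sample_output is a hand-written stand-in with made-up scores, not real model output.

```python
def load_ner_pipeline(model_id="dslim/bert-base-NER"):
    """Build a token-classification pipeline (downloads weights on first call)."""
    # Imported lazily so summarize() below works without transformers installed.
    from transformers import pipeline
    # aggregation_strategy="simple" merges word pieces into whole entity spans.
    return pipeline("ner", model=model_id, aggregation_strategy="simple")


def summarize(entities):
    """Turn pipeline output dicts into compact 'LABEL: text (score)' strings."""
    return [
        f"{e['entity_group']}: {e['word']} ({float(e['score']):.2f})"
        for e in entities
    ]


# Hand-written stand-in for pipeline output; the scores are invented for illustration.
sample_output = [
    {"entity_group": "PER", "word": "Wolfgang", "score": 0.99, "start": 11, "end": 19},
    {"entity_group": "LOC", "word": "Berlin", "score": 0.98, "start": 34, "end": 40},
]
print(summarize(sample_output))
```

With the library installed, the same formatting applies to live predictions: summarize(load_ner_pipeline()("My name is Wolfgang and I live in Berlin")).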

Architecture highlights

12‑layer Transformer encoder with 768 hidden units and 12 attention heads.
Pre‑trained on the original BERT‑cased corpus (BooksCorpus + English Wikipedia).
Fine‑tuned on the CoNLL‑2003 NER dataset using a token‑classification head (linear layer + softmax).
Supports PyTorch, TensorFlow, JAX, ONNX and safetensors formats.
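The classification head described above turns each token's nine logits into a probability distribution and picks the most likely tag. The sketch below reproduces that last step in plain Python. The label order in LABELS and the logit values are assumptions for illustration; the authoritative mapping is the id2label entry in the checkpoint's configuration.

```python
import math

# Illustrative label order (assumption); read model.config.id2label for the real one.
LABELS = ["O", "B-MISC", "I-MISC", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_tag(logits, labels=LABELS):
    """Return the most probable label and its probability for one token."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Made-up logits for a single token; index 3 is clearly the largest.
logits = [0.1, -1.2, -0.8, 4.0, 0.3, -0.5, -1.0, 0.2, -0.7]
tag, prob = predict_tag(logits)
print(tag, round(prob, 3))
```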

Intended use cases

Information extraction from news articles, reports, or any English prose.
Pre‑processing step for downstream tasks such as relation extraction, knowledge‑graph construction, or question answering.
Real‑time NER in chat‑bots, virtual assistants, and document automation pipelines.

Benchmark Performance

The most relevant benchmark for a token‑classification model is the CoNLL‑2003 NER test set. The README reports the following verified metrics:

Accuracy: 0.9118
Precision: 0.9212
Recall: 0.9306
F1‑score: 0.9259
Loss: 0.4833

These numbers place bert-base-NER on par with other BERT‑base NER baselines and ahead of smaller models such as DistilBERT‑NER (≈0.90 F1). The high recall indicates the model rarely misses entities, while the precision shows it keeps false positives low – a crucial balance for downstream pipelines that rely on clean entity spans.
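As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the two figures above. The function name f1_score is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as listed in the metrics above.
reported_f1 = f1_score(0.9212, 0.9306)
print(round(reported_f1, 4))
```

The result agrees with the listed F1 of 0.9259 to four decimal places.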

Hardware Requirements

VRAM for inference – The model’s 110 M parameters occupy roughly 420 MB of GPU memory when loaded in FP32. Loading the weights in half precision (torch.float16) reduces this to ~210 MB, allowing inference on a single 4 GB GPU.

Recommended GPU: NVIDIA V100, RTX 3080, or any GPU with ≥6 GB VRAM for comfortable batch‑size ≥ 8.
CPU: Modern x86_64 CPU with at least 8 GB RAM; inference speed scales with core count but is not a bottleneck for typical sentence‑level NER.
Storage: Model files (~420 MB for FP32, ~210 MB for FP16) plus tokenizer files (~30 MB). SSD storage is recommended for fast loading.
Performance: On a V100, latency is about 1 ms per token at batch size 1 (roughly 1,000 tokens/s); batching several sentences (e.g., 32) raises throughput further.
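The memory figures quoted above follow from parameter count times bytes per parameter. A back-of-the-envelope sketch, assuming a round 110 million parameters and counting weights only (activations, the CUDA context, and framework overhead come on top):

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_memory_mib(num_params, dtype="fp32"):
    """Approximate size of the raw weights in MiB (2**20 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**20


for dtype in ("fp32", "fp16"):
    print(dtype, round(weight_memory_mib(110_000_000, dtype)))
```

This gives about 420 for FP32 and 210 for FP16, matching the numbers in this section when MB is read as MiB.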

Use Cases

Primary applications include any scenario that needs to extract structured entities from unstructured English text.

News & media monitoring: Detect people, places, and organizations in real‑time streams.
Legal document analysis: Highlight parties, locations, and miscellaneous entities in contracts or case files.
Customer support automation: Identify product names, user names, and locations in tickets for routing.
Healthcare record anonymization: Flag personal identifiers before de‑identification.
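For the anonymization use case, entity spans with character offsets can be replaced by placeholders. The sketch below assumes entities in the shape the Transformers pipeline returns when aggregation is enabled (entity_group, start, end); the redact helper and the sample data are hypothetical. Because the model only tags persons, locations, organizations, and miscellaneous entities, identifiers such as dates or phone numbers would need a separate mechanism.

```python
def redact(text, entities, labels=("PER",)):
    """Replace selected entity spans with [LABEL] placeholders."""
    out = text
    # Work right to left so earlier character offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        if ent["entity_group"] in labels:
            out = out[:ent["start"]] + f"[{ent['entity_group']}]" + out[ent["end"]:]
    return out


text = "My name is Wolfgang and I live in Berlin"
# Hand-written spans standing in for model predictions.
entities = [
    {"entity_group": "PER", "start": 11, "end": 19},
    {"entity_group": "LOC", "start": 34, "end": 40},
]
print(redact(text, entities))
print(redact(text, entities, labels=("PER", "LOC")))
```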

The model can be integrated via the 🤗 Transformers pipeline API, exported to ONNX for edge deployment, or wrapped in a REST API using FastAPI or Flask.

Training Details

The model was fine‑tuned on a single NVIDIA V100 GPU using the hyper‑parameters suggested in the original BERT paper (learning rate ≈ 2e‑5, batch size ≈ 32, 3–4 epochs). The training pipeline:

Base model: bert-base-cased (12 layers, 110 M parameters).
Dataset: English CoNLL‑2003 (≈203 k training tokens, 51 k dev tokens, 46 k test tokens).
Loss: Cross‑entropy over the nine BIO tags.
Evaluation: Accuracy, Precision, Recall, F1, and loss on the test split.
Fine‑tuning capability: Users can continue training on domain‑specific NER corpora (e.g., biomedical or financial) by loading the checkpoint with AutoModelForTokenClassification and providing a new Trainer configuration.
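When continuing training on a new corpus, word-level labels have to be aligned with the subword tokens the BERT tokenizer produces. The sketch below shows one common convention, not necessarily the one used to train this checkpoint: only the first piece of each word keeps its label, and every other position gets -100, which PyTorch's cross-entropy loss ignores by default. The word_ids list mimics what a fast tokenizer's word_ids() method returns; the label ids are made up.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


def align_labels(word_ids, word_labels, label_all_pieces=False):
    """Expand word-level label ids to token level."""
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            # Special tokens such as [CLS] and [SEP] carry no label.
            aligned.append(IGNORE_INDEX)
        elif word_id != previous:
            # First piece of a word keeps the word's label.
            aligned.append(word_labels[word_id])
        else:
            # Later pieces are masked unless label_all_pieces is set.
            aligned.append(word_labels[word_id] if label_all_pieces else IGNORE_INDEX)
        previous = word_id
    return aligned


# Three words; the first one is split into two pieces by the tokenizer.
word_ids = [None, 0, 0, 1, 2, None]
word_labels = [3, 0, 7]  # hypothetical label ids
print(align_labels(word_ids, word_labels))
```

Masking continuation pieces keeps the loss defined per word rather than per subword, so long words split into many pieces do not dominate training.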

Related Papers

The README references two key publications:

BERT: Pre‑training of Deep Bidirectional Transformers for Language Understanding (Devlin et al., 2018) – the foundational architecture.
Introduction to the CoNLL‑2003 Shared Task: Language‑Independent Named Entity Recognition (Tjong Kim Sang & De Meulder, 2003) – the benchmark dataset used for fine‑tuning.

These works underpin the model’s design: BERT provides contextual embeddings, while the CoNLL‑2003 corpus supplies high‑quality BIO‑tagged NER annotations.

Licensing Information

The model card lists the MIT license for the underlying code and data, but the overall “license: unknown” field indicates that the exact distribution terms for the fine‑tuned weights are not explicitly stated on the hub.

In practice, the MIT license terms amount to:

Free commercial and non‑commercial use.
Modification, redistribution, and integration into proprietary software.
Requirement to retain the original copyright notice and license text.

Because the license is marked “unknown”, users should:

Check the model card for any updates.
Contact the author (dslim) if a commercial deployment raises legal concerns.
Provide attribution to “dslim/bert-base-NER” and the MIT‑licensed source code.
