AllenNLP started before transformers, and so it provided high level abstractions to experiment with model architectures, which is where much of NLP research was happening at the time. Transformers definitely changed the playing field, as it became the basis for most models!
I'll give you specific examples where AllenNLP overdid it, while HuggingFace was better just by keeping it simple.
Vocabulary class. HuggingFace just used a Python dictionary. I can't think of one person who said they needed a higher-level abstraction. Turns out a Python dictionary is picklable and saving it to a text file is one line of code, while the AbstractSingletonProxyVocabulary is neither, and nobody wants to care in the first place.
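To make the contrast concrete, here's what the plain-dict approach looks like (a toy sketch, not Hugging Face's actual code; the vocab contents and file path are made up):

```python
import os
import pickle
import tempfile

# A plain-dict vocabulary, roughly the "just use a dictionary" approach.
vocab = {"[PAD]": 0, "[UNK]": 1, "hello": 2, "world": 3}

# Saving to a text file really is one line of code:
path = os.path.join(tempfile.gettempdir(), "vocab.txt")
open(path, "w").write("\n".join(vocab))  # one token per line, in insertion order

# And it pickles with zero ceremony:
assert pickle.loads(pickle.dumps(vocab)) == vocab
```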
Tokenizer class. HuggingFace just used a Python dictionary to return strings and integers. I can't think of one person frustrated by it. It's printable, picklable, and everything in between that people can fiddle with. And boy, where do I even start on AllenNLP's overdoing of Tokenizers.
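A hypothetical toy tokenizer in that style, plain strings and ints in a plain dict (illustrative only, not the real Hugging Face implementation):

```python
# Made-up vocab for the sketch; unknown words map to [UNK].
VOCAB = {"[UNK]": 0, "the": 1, "cat": 2, "sat": 3}

def tokenize(text: str) -> dict:
    """Split on whitespace and look up ids; return everything as a plain dict."""
    tokens = text.lower().split()
    ids = [VOCAB.get(t, VOCAB["[UNK]"]) for t in tokens]
    # Printable, picklable, trivially easy to poke at in a debugger.
    return {"tokens": tokens, "input_ids": ids}

print(tokenize("The cat sat"))
# → {'tokens': ['the', 'cat', 'sat'], 'input_ids': [1, 2, 3]}
```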
Trainer class, vs. HuggingFace example scripts. The scripts are just much more readable, tweakable, debuggable, etc. HF didn't bother with AbstractBaseTrainer class BS.
It just shows they never understood the playing field.
- First, I don't think anyone thought AllenNLP was a good choice for high-performance production systems. Again, HuggingFace clearly understood the problem and built a fast tokenizer in Rust.
- A math, physics, linguistics, or even CS PhD student who knows the basics of coding would prefer bare-bones scripts. They just want to hack away and focus on research. Writing good code is not their objective.
AllenNLP was written for research, not for production. Many of the design choices reflect that.
As far as the vocabulary goes, a lot of AllenNLP components are about experimenting with ways to turn text into vectors. Constructing the vocabulary is part of that. When pre-trained transformers became a thing, this wasn't needed anymore. That's part of why we decided to deprecate the library: Very few people experiment with how to construct vocabularies anymore, so we don't want to live with the complexity anymore.
Hugging Face's APIs really aren't that great; I hear lots of people complain about them. All HF did was make transformers very accessible and shareable with a neat UI.
When we started AllenNLP, PyTorch was just starting to emerge as a competitor to TensorFlow, and we made the difficult decision to support PyTorch. In hindsight this was a great decision, as the majority of top research is done in PyTorch today.
Tango primarily supports PyTorch, but unlike AllenNLP, is flexible enough to support other deep learning libraries as well. For example, we're adding support for JAX so we can easily leverage TPUs.
From what I've seen, Tango is a general DAG/pipeline tool that happens to have some facilities for PyTorch. I don't see anything deep-learning specific. You could execute sklearn or whatever.
Maybe we need to rework the docs if the DAG aspects stick out to you so much. The main functionality is the cache. If you have a complex experiment, you can still write the code as if all the steps were fast, and let them be slow only the first time you run them. The DAG stuff is also nice, but less important.
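For readers curious what that looks like, here's a minimal sketch of result caching keyed on a step's name and inputs, so a step runs slowly only the first time (a toy illustration of the idea, not Tango's actual API or cache format):

```python
import functools
import hashlib
import json
import pathlib
import pickle
import tempfile

# Toy on-disk cache directory (a temp dir for this sketch).
CACHE = pathlib.Path(tempfile.mkdtemp())

def cached_step(fn):
    """Cache a step's result, keyed on the step name and its arguments."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        key = hashlib.sha256(
            json.dumps([fn.__name__, args, kwargs], sort_keys=True, default=str).encode()
        ).hexdigest()
        path = CACHE / key
        if path.exists():                       # slow only the first time
            return pickle.loads(path.read_bytes())
        result = fn(*args, **kwargs)
        path.write_bytes(pickle.dumps(result))  # persist for later runs
        return result
    return wrapper

@cached_step
def train(lr):
    # Stand-in for an expensive training step.
    return {"lr": lr, "loss": 0.1}

train(3e-4)   # computed and cached
train(3e-4)   # served from the cache
```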
That said, you could execute sklearn. If that's what your experiment needs, it's the right thing to do. This is why it gives us the flexibility to also support Jax: https://github.com/allenai/tango/pull/313
The DL-specific stuff is in the components we supply. Like the trainer, dataset handling stuff, file formats, and increasingly, https://github.com/allenai/catwalk.
AllenNLP has only ever supported Torch. At the moment, Tango only supports Torch as well, but Jax support is well underway.
And yeah, Tango is a lot like a build script. In fact, I used to manage my experiments with Makefiles. Tango is better though. Results don't have to be single files, and they don't have to live in one filesystem either, so I can run the GPU-heavy parts of my experiments on one machine and the CPU-heavy parts on another. The way Tango versions your code is better than what Makefiles can do: you get actual control beyond file modification time. And of course, there is the whole Python integration story.
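One way to get control beyond file modification time is to version a step by its compiled code, so results are invalidated when the code meaningfully changes, not when a file is merely re-saved. A toy illustration of that idea (not Tango's actual mechanism):

```python
import hashlib

def step_version(fn) -> str:
    """Derive a version string from a function's compiled code and constants,
    instead of trusting the source file's mtime the way Make does."""
    code = fn.__code__
    payload = code.co_code + repr(code.co_consts).encode()
    return hashlib.sha256(payload).hexdigest()[:12]

def train_a():
    return 0.001 * 100   # pretend the hyperparameters live in the code

def train_b():
    return 0.002 * 100   # editing a single constant changes the version

assert step_version(train_a) != step_version(train_b)
```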
HuggingFace Transformers is a huge, high-quality open-source repo of pre-trained models and associated code. People combine it with PyTorch Lightning or Fairseq most of the time, AFAIK.
It depends on what you use AllenNLP for. AllenNLP has a ton of functionality for vectorizing text. Most of the tokenizer/indexer/embedder stuff is about that. But these days we all use transformers for that, so there isn't much of a need to experiment with ways to vectorize.
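For context, the tokenizer → indexer → embedder pipeline being described looks roughly like this (toy code with a made-up vocab and random embeddings, not AllenNLP's API):

```python
import random

# Made-up vocabulary and embedding table for the sketch.
vocab = {"[UNK]": 0, "dogs": 1, "bark": 2}
dim = 4
random.seed(0)
embeddings = {i: [random.random() for _ in range(dim)] for i in vocab.values()}

def vectorize(text: str):
    tokens = text.lower().split()             # tokenizer
    ids = [vocab.get(t, 0) for t in tokens]   # token indexer ([UNK] fallback)
    return [embeddings[i] for i in ids]       # embedder lookup

vecs = vectorize("Dogs bark")
print(len(vecs), len(vecs[0]))  # → 2 4
```

With pre-trained transformers, a model's own tokenizer and learned embeddings replace all three stages, which is why there's little left to experiment with here.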
If you like the trainer, or the configuration language, or some of the other components you should check out Tango (https://github.com/allenai/tango). One of Tango's origins is the question "What if AllenNLP supported workflow steps other than read -> train -> evaluate?". We noticed that a lot of work in NLP no longer fit that simple pattern, so we needed a new tool that can support more complex experiments.
If you like the metrics, try torchmetrics. Torchmetrics has almost exactly the same API as AllenNLP metrics.
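The shared shape here is the accumulate-then-compute metric API: feed in predictions batch by batch, then ask for the final value. A plain-Python illustration of that pattern (not code from either library):

```python
class Accuracy:
    """Toy metric with the update()/compute()/reset() shape that
    torchmetrics (and AllenNLP's metrics) expose."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, preds, targets):
        # Accumulate per-batch statistics; nothing is computed yet.
        self.correct += sum(p == t for p, t in zip(preds, targets))
        self.total += len(targets)

    def compute(self):
        # Final value over everything seen since the last reset.
        return self.correct / self.total if self.total else 0.0

    def reset(self):
        self.correct = self.total = 0

acc = Accuracy()
acc.update([1, 0, 1], [1, 1, 1])   # batch 1: 2 of 3 correct
acc.update([0, 0], [0, 1])         # batch 2: 1 of 2 correct
print(acc.compute())               # → 0.6
```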
If you like any of the nn components, please get in touch with the Tango team (on GitHub). We recently had some discussion around rescuing a few of those, since there seems to be some excitement.