Instead of manually curating .bib files, regenbib (available on GitHub/PyPI) auto-generates tidy .bib files from high-quality metadata obtained from online providers such as dblp.
Motivation
The gist of regenbib is as follows.
Instead of manually maintaining a references.bib file with a bunch of entries like this …
@inproceedings{streamlet,
  author = "Chan, Benjamin Y. and Shi, Elaine",
  title = "Streamlet: Textbook Streamlined Blockchains",
  booktitle = "{AFT}",
  pages = "1--11",
  publisher = "{ACM}",
  year = "2020"
}
… you should maintain a references.yaml file with corresponding entries like this:
entries:
  - bibtexid: streamlet
    dblpid: conf/aft/ChanS20
The regenbib tool can then automatically (re-)generate references.bib from references.yaml in a consistent way by retrieving high-quality metadata from the corresponding online metadata provider (in the example above: dblp’s entry conf/aft/ChanS20).
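To make the regeneration step concrete, here is a minimal Python sketch of the idea, not regenbib's actual implementation. It assumes that dblp serves a BibTeX export of each record at https://dblp.org/rec/<key>.bib and that the exported entry carries a key of the form DBLP:<key>. The function names and the sample record text are made up for illustration.

```python
import re

# Assumption: dblp exposes a BibTeX export of each record under this URL pattern.
DBLP_BIB_URL = "https://dblp.org/rec/{dblpid}.bib"


def dblp_bib_url(dblpid: str) -> str:
    """Build the URL of dblp's BibTeX export for a record key (hypothetical helper)."""
    return DBLP_BIB_URL.format(dblpid=dblpid)


def rename_citation_key(bibtex: str, bibtexid: str) -> str:
    """Replace the citation key of the first BibTeX entry with our own bibtexid."""
    # Matches e.g. '@inproceedings{DBLP:conf/aft/ChanS20,' and keeps the entry type.
    return re.sub(
        r"(@\w+\{)[^,\s]+,",
        lambda m: m.group(1) + bibtexid + ",",
        bibtex,
        count=1,
    )


# One entry as it would appear in references.yaml, written as a plain dict here.
entry = {"bibtexid": "streamlet", "dblpid": "conf/aft/ChanS20"}

# Illustrative stand-in for the text a download from dblp_bib_url(...) might return.
fetched = (
    "@inproceedings{DBLP:conf/aft/ChanS20,\n"
    '  title = "Streamlet: Textbook Streamlined Blockchains"\n'
    "}"
)

print(dblp_bib_url(entry["dblpid"]))
print(rename_citation_key(fetched, entry["bibtexid"]))
```

The point of the sketch is that references.yaml only stores the mapping from your own citation key to the provider's identifier. Everything else is fetched again on each run, which is what keeps the generated .bib file consistent.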
The tool regenbib-import helps you maintain the references.yaml file. Using LaTeX’s .aux file, it determines entries that are cited but currently missing from the references.yaml file. It then helps you determine an appropriate online reference through an interactive lookup right from the command line. In the lookup process, you can use an old (possibly messy) references.bib file to obtain starting points for the search (e.g., the title/author in an old references.bib entry can be used to look up the paper on dblp).
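The detection of missing entries can be sketched in a few lines of Python. This is an illustration under assumptions, not regenbib-import's actual code: it assumes a classic BibTeX workflow in which the .aux file records citations as \citation{key1,key2} lines. The helper names and the key pbft are hypothetical.

```python
import re

# Assumption: citations appear in the .aux file as \citation{key1,key2,...} lines.
CITATION_RE = re.compile(r"\\citation\{([^}]*)\}")


def cited_keys(aux_text: str) -> list[str]:
    """Return the cited keys in order of first appearance, without duplicates."""
    keys: list[str] = []
    for match in CITATION_RE.finditer(aux_text):
        for key in match.group(1).split(","):
            key = key.strip()
            if key and key not in keys:
                keys.append(key)
    return keys


def missing_keys(aux_text: str, known_ids: set[str]) -> list[str]:
    """Return the cited keys that have no entry in references.yaml yet."""
    return [key for key in cited_keys(aux_text) if key not in known_ids]


# Made-up .aux content; 'pbft' stands for an entry already in references.yaml.
sample_aux = (
    "\\relax\n"
    "\\citation{streamlet}\n"
    "\\citation{pbft,streamlet}\n"
    "\\bibdata{references}\n"
)

print(missing_keys(sample_aux, {"pbft"}))  # prints ['streamlet']
```

Each key reported as missing would then go through the interactive lookup described above.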
Example Usage
Suppose we have an old references.bib file with this entry (and suppose it does not have a corresponding entry in our references.yaml file):
@misc{streamlet,
author = {Chan and Shi},
title = {Streamlet Textbook Streamlined Blockchains}
}
We can easily import a corresponding entry to our references.yaml file with regenbib-import:
$ regenbib-import --bib references.bib --aux _build/main.aux --yaml references.yaml
Importing entry: streamlet
-> Current entry: Entry('misc',
fields=[
('title', 'Streamlet Textbook Streamlined Blockchains')],
persons=OrderedCaseInsensitiveDict([('author', [Person('Chan'), Person('Shi')])]))
-> Import method? [0=skip, 1=dblp-free-search, 2=arxiv-manual-id, 3=eprint-manual-id, 4=current-entry, 5=dblp-search-title, 6=dblp-search-authorstitle]: 6
-----> The search returned 2 matches:
-----> (1) Benjamin Y. Chan, Elaine Shi:
Streamlet: Textbook Streamlined Blockchains. AFT 2020
https://doi.org/10.1145/3419614.3423256 https://dblp.org/rec/conf/aft/ChanS20
-----> (2) Benjamin Y. Chan, Elaine Shi:
Streamlet: Textbook Streamlined Blockchains. IACR Cryptol. ePrint Arch. (2020) 2020
https://eprint.iacr.org/2020/088 https://dblp.org/rec/journals/iacr/ChanS20
-----> Intended publication? [0=abort]: 1
As you can see, regenbib-import uses the messy/incomplete information from the old references.bib file to help us quickly determine the appropriate dblp entry. This adds the following entry to references.yaml:
entries:
  - bibtexid: streamlet
    dblpid: conf/aft/ChanS20
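For intuition, the author/title lookup from the session above could be approximated as follows. This is a sketch under assumptions: it uses dblp's public publication search endpoint with the q, format and h parameters, and it simply concatenates author names and title into one query string. How regenbib-import actually builds its query may differ, and the function name is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: dblp's public publication search endpoint.
DBLP_SEARCH_URL = "https://dblp.org/search/publ/api"


def dblp_search_url(title: str, authors: list[str], max_hits: int = 10) -> str:
    """Build a dblp search URL from the (possibly messy) fields of an old entry."""
    # Author surnames followed by the title form one free-text query.
    query = " ".join([*authors, title])
    params = {"q": query, "format": "json", "h": max_hits}
    return DBLP_SEARCH_URL + "?" + urlencode(params)


# Fields taken from the old @misc entry shown earlier.
print(dblp_search_url("Streamlet Textbook Streamlined Blockchains", ["Chan", "Shi"]))
```

The returned matches would then be listed for selection, as in the transcript, and the chosen record's key stored as dblpid.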
We can then re-generate a tidy references.bib file based on the references.yaml file:
$ regenbib --yaml references.yaml --bib references.bib
DblpEntry(bibtexid='streamlet', dblpid='conf/aft/ChanS20')
Entry('inproceedings',
fields=[
('title', 'Streamlet: Textbook Streamlined Blockchains'),
('booktitle', '{AFT}'),
('pages', '1--11'),
('publisher', '{ACM}'),
('year', '2020')],
persons=OrderedCaseInsensitiveDict([('author', [Person('Chan, Benjamin Y.'), Person('Shi, Elaine')])]))
$ cat references.bib
@inproceedings{streamlet,
  author = "Chan, Benjamin Y. and Shi, Elaine",
  title = "Streamlet: Textbook Streamlined Blockchains",
  booktitle = "{AFT}",
  pages = "1--11",
  publisher = "{ACM}",
  year = "2020"
}
Installation/Outlook
For the latest version and instructions on how to install regenbib, head over to GitHub or PyPI. Currently, regenbib supports dblp, arXiv, and IACR ePrint as metadata providers. As a fallback, entries can always be specified in raw .bib format. Adding a metadata provider takes just a few lines of code. Please let me know which metadata providers you commonly use. I hope to soon add support for IEEE Xplore, ACM Digital Library, and blogs/websites.




