Recent advances in the self-referencing embedded strings (SELFIES) library

Alston Lo*, Robert Pollice, Akshat Kumar Nigam, Andrew D. White, Mario Krenn, Alán Aspuru-Guzik

*Corresponding author for this work

    Research output: peer-reviewed

    1 Citation (Scopus)
    17 Downloads (Pure)

    Abstract

    String-based molecular representations play a crucial role in cheminformatics applications, and with the growing success of deep learning in chemistry, have been readily adopted into machine learning pipelines. However, traditional string-based representations such as SMILES are often prone to syntactic and semantic errors when produced by generative models. To address these problems, a novel representation, SELF-referencing embedded strings (SELFIES), was proposed that is inherently 100% robust, alongside an accompanying open-source implementation called selfies. Since then, we have generalized SELFIES to support a wider range of molecules and semantic constraints, and streamlined its underlying grammar. We have implemented this updated representation in subsequent versions of selfies, where we have also made major advances with respect to design, efficiency, and supported features. Hence, we present the current status of selfies (version 2.1.1) in this manuscript. Our library, selfies, is available at GitHub (https://github.com/aspuru-guzik-group/selfies).

    Original language: English
    Pages (from-to): 897-908
    Number of pages: 12
    Journal: Digital Discovery
    Volume: 2
    Issue number: 4
    Status: Published - 1 Aug 2023
