New workflow for QSAR model development from small data sets: Small Dataset Curator and Small Dataset Modeler. Integration of data curation, exhaustive double cross-validation, and a set of optimal model selection techniques

Pravin Ambure, Agnieszka Gajewicz-Skrętna, M. Natalia D. S. Cordeiro, Kunal Roy


Quantitative structure–activity relationship (QSAR) modeling is a well-known in silico technique with extensive applications in several major fields such as drug design, predictive toxicology, materials science, and food science. Handling small-sized datasets due to the lack of experimental data for specialized end points is a crucial task for the QSAR researcher. In the present study, we propose an integrated workflow/scheme capable of dealing with small dataset modeling that integrates dataset curation, “exhaustive” double cross-validation, and a set of optimal model selection techniques including consensus predictions. We have developed two software tools, namely, Small Dataset Curator, version 1.0.0, and Small Dataset Modeler, version 1.0.0, to effortlessly execute the proposed workflow. These tools are freely available for download. We have performed case studies employing seven diverse datasets to demonstrate the performance of the proposed scheme (including data curation) for small dataset QSAR modeling. The case studies also confirm the usability and stability of the developed software tools.
Authors: Pravin Ambure; Agnieszka Gajewicz-Skrętna (FCh / DEChR / LECh, Laboratory of Environmental Chemometrics); M. Natalia D. S. Cordeiro; Kunal Roy
Journal series: Journal of Chemical Information and Modeling, ISSN 1549-9596, e-ISSN 1549-960X (N/A 100 pts)
Issue year: 2019
Publication size in sheets: 0.5
ASJC Classification: 3309 Library and Information Sciences; 1500 General Chemical Engineering; 1600 General Chemistry; 1706 Computer Science Applications
Language: en (English)
Score (nominal): 100
Score source: journalList
Score: Ministerial score = 100.0, 12-03-2020, ArticleFromJournal
Publication indicators: WoS Citations = 0; Scopus SNIP (Source Normalised Impact per Paper): 2018 = 1.163; WoS Impact Factor: 2018 = 3.966 (2) - 2018 = 4.297 (5)
Citation count*: 3 (2020-05-19)

* The presented citation count is obtained through Internet information analysis and is close to the number calculated by the Publish or Perish system.