Matching AI methods to available data


Machine learning can only be as good as the data. Some algorithms require very dense data; others can tolerate less. Do you believe it is important to understand the data first before choosing a machine learning algorithm, or an alternative algorithm? For example, toxicity prediction requires a lot of data to enable a machine to learn. We have found that if 3D molecular structure is not preserved, data fidelity is lost, but if it is preserved, one may have to choose something other than deep learning due to data density. Agree or disagree?

Ed Addison
11 months ago

2 answers


Hi Ed, I think a flexible approach to machine learning is necessary. While the output of AI tools is strongly dependent on the quality of the input data, it is critical that data scientists have a flexible framework to work with, one that includes, for instance, pattern matching, automatic backtracking, and tree-based data structuring mechanisms.

Polina Pomerants
11 months ago

I agree with you that correct data is of the utmost importance before taking the logical step of feeding it to an algorithm to get AI-driven results.
• The right approach would be to have an in-house R&D team run all the experiments, feed the algorithm this "standard" data, and optimize the algorithm.
• Next, they would have to collaborate with a series of external academics who would perform exactly the same experiments; that data would then be fed to the algorithm to obtain results. Comparing these results gives an error percentage, and once a cut-off has been determined, the algorithm would be ready to use for the work in question.
• Another strategy would be to outsource the same experiments to various academics and feed their data to the algorithm to create a standard. The standard is then tested against the in-house R&D test results.
• The size and number of data sets can vary quite a bit depending on the stringency of the algorithm.
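
As a rough illustration of the error-percentage and cut-off idea above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the choice of mean absolute percentage difference as the "error percentage", and the example numbers are all hypothetical, and a real validation would need a metric suited to the assay.

```python
from statistics import mean


def error_percentage(reference, replicate):
    """Mean absolute percentage difference between paired results.

    `reference` holds the algorithm's results on in-house data and
    `replicate` the results on one external lab's repeat of the same
    experiments (hypothetical setup, one value per experiment).
    """
    if len(reference) != len(replicate):
        raise ValueError("reference and replicate must be the same length")
    if not reference:
        raise ValueError("at least one paired result is required")
    if any(r == 0 for r in reference):
        raise ValueError("reference values must be non-zero")
    return 100.0 * mean(abs(r - x) / abs(r) for r, x in zip(reference, replicate))


def check_against_cutoff(reference, results_by_lab, cutoff_pct):
    """Return each lab's error percentage and whether all are within the cut-off."""
    errors = {
        lab: error_percentage(reference, results)
        for lab, results in results_by_lab.items()
    }
    all_within = all(err <= cutoff_pct for err in errors.values())
    return errors, all_within


if __name__ == "__main__":
    # Made-up numbers, purely to show how the functions are called.
    in_house = [10.0, 20.0, 40.0]
    external = {"lab_a": [11.0, 19.0, 42.0], "lab_b": [15.0, 26.0, 30.0]}
    errors, ok = check_against_cutoff(in_house, external, cutoff_pct=10.0)
    for lab, err in errors.items():
        print(f"{lab}: {err:.1f}% error")
    print("ready for use" if ok else "cut-off exceeded, keep optimizing")
```

The same functions cover the reverse strategy too: pass the pooled external results as the reference and the in-house results as the single "lab" being tested.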

Krishnendu Chatterjee Ph.D.
10 months ago
