Use of machine learning algorithms to predict the occurrence of clinical mastitis in Holstein cows. (Persian)
In: Journal of Animal Production, Vol. 25 (2023-07-01), Issue 2, pp. 123-132
academicJournal
Introduction: Mastitis is one of the most frequent and costly diseases in the dairy cattle industry. It causes substantial economic losses and negatively affects milk yield and composition, fertility, longevity and cow welfare. The best way to reduce these economic and biological consequences is early and accurate prediction of mastitis from indicator factors. So far, various statistical methods have been used to predict mastitis, such as linear and multiple regression and threshold models. Machine learning is another approach that has recently been widely used to predict farm profitability, reproductive traits, longevity and abortion in dairy cows. Machine learning is defined as a set of methods for automatically finding patterns in data and then using those patterns to predict future data.
Material and Methods: In this research, four machine learning algorithms (random forest, decision tree, Naïve Bayes and logistic regression) and two sampling methods (under-sampling and over-sampling) were compared for predicting the risk of clinical mastitis, based on data collected in two Holstein dairy herds in Isfahan province. The final dataset included 393504 records on cows that calved from 2007 to 2017, of which 13653 cases (3.47%) were infected and 379851 cases (96.53%) were healthy. The factors related to mastitis that were considered included parity, daily milk production, calving season, lactation stage, history of mastitis and somatic cell score. After the data were edited with SQL Server, the models for predicting mastitis were built in WEKA 3.8. The sampling techniques were under-sampling (SpreadSubsample) and the synthetic minority over-sampling technique (SMOTE). The performance of the algorithms (accuracy, sensitivity, specificity and AUC) in predicting infected cases and distinguishing them from healthy cases was evaluated for each preprocessing method.
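The two resampling ideas named in the methods can be sketched in a few lines of plain Python. This is an illustrative sketch only, not WEKA's SpreadSubsample or SMOTE implementation: the function names `random_undersample` and `smote_like`, the neighbour count `k`, and the toy feature rows are all assumptions made for the example, and the real study used categorical as well as numeric factors.

```python
import random


def random_undersample(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return [rows[i] for i in kept], [labels[i] for i in kept]


def smote_like(minority_rows, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows by interpolating between a
    randomly chosen minority row and one of its k nearest minority neighbours."""
    if len(minority_rows) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority_rows)
        # Other minority rows, nearest first (squared Euclidean distance).
        others = [r for r in minority_rows if r is not base]
        others.sort(key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical toy rows: [parity, daily milk (kg), somatic cell score].
    healthy = [[1 + i % 4, 30.0 + i, 2.0 + 0.1 * i] for i in range(20)]
    infected = [[2, 24.0, 5.5], [3, 22.5, 6.0], [4, 21.0, 6.4], [5, 20.0, 7.1]]
    rows = healthy + infected
    labels = [0] * len(healthy) + [1] * len(infected)

    balanced_rows, balanced_labels = random_undersample(rows, labels)
    print(len(balanced_rows), sum(balanced_labels))

    extra = smote_like(infected, n_new=16)
    print(len(infected) + len(extra), "infected rows after over-sampling")
```

Under-sampling discards healthy records, while the SMOTE-style step adds interpolated infected records, so the two approaches leave training sets of very different sizes.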
Results and Discussion: With under-sampling, random forest performed best among the algorithms, with accuracy, sensitivity, specificity and AUC of 84.30%, 94.80%, 73.80% and 90.90%, respectively. Without sampling, the power to detect sick cases (sensitivity) of the random forest, decision tree, Naïve Bayes and logistic regression algorithms was 1.67%, 0%, 12.29% and 2.06%, respectively, which was considerably weaker than with sampling. This was due to the imbalance between the two classes, healthy and sick, and indicated the necessity of using sampling methods. With under-sampling, the decision tree came a close second to random forest, with accuracy, sensitivity, specificity and AUC of 84.00%, 94.20%, 73.90% and 90%, respectively. When the models from the four algorithms (decision tree, logistic regression, Naïve Bayes and random forest) were compared across three modes (no preprocessing, SpreadSubsample preprocessing and SMOTE preprocessing), preprocessing with SMOTE improved the performance of the algorithms considerably. Among the algorithms preprocessed with this method, random forest performed best, with an accuracy of 99.2% and an area under the ROC curve (AUC) of 0.99. The decision tree performed very close to random forest, with an accuracy of 98.9% and an AUC of 0.99. Naïve Bayes, with an accuracy of 82.9% and an AUC of 0.92, and logistic regression, with an accuracy of 83.7% and an AUC of 0.91, performed acceptably but ranked behind the other two algorithms.
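The effect of class imbalance reported above can be checked with simple arithmetic on the class counts given in the abstract. The sketch below assumes a hypothetical classifier that labels every record healthy, which is one way a sensitivity of 0% can arise; the helper name `classification_metrics` is illustrative and is not part of WEKA.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Class counts reported in the abstract.
INFECTED = 13653
HEALTHY = 379851

# Hypothetical baseline: every record is predicted healthy.
baseline = classification_metrics(tp=0, fn=INFECTED, tn=HEALTHY, fp=0)
for name, value in baseline.items():
    print(f"{name}: {value:.2%}")
```

Such a baseline reaches about 96.5% accuracy while detecting no infected cows, which is why sensitivity and AUC are more informative than accuracy on the unbalanced data.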
Conclusion: Given the high performance of the random forest algorithm with SMOTE preprocessing in predicting mastitis cases, this model can be suggested for predicting mastitis in dairy cattle herds, especially herds with high mastitis rates. Because random forest has a higher computational cost than a decision tree, the decision tree is probably the better choice for large datasets. [ABSTRACT FROM AUTHOR]
Copyright of Journal of Animal Production is the property of University of Tehran and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Title: | استفاده از الگوریتمهای یادگیری ماشین برای پیشبینی وقوع ورم پستان بالینی در گاوهای هلشتاین. (Persian) |
---|---|
Author / Contributor: | محمد تقی فیاضی کی ; محمد داد پسند ; کشاورزی, حمیده |
Journal: | Journal of Animal Production, Vol. 25 (2023-07-01), Issue 2, pp. 123-132 |
Publication: | 2023 |
Media type: | Academic journal |
ISSN: | 2008-6776 (print) |
DOI: | 10.22059/jap.2023.349388.623708 |