Clustering Large-Scale Text Data Using BERT Embeddings and Spectral Methods
Abstract
Text clustering is a fundamental task in natural language processing, but it is often hampered by high-dimensional representations and semantic ambiguity. This paper explores combining BERT embeddings with spectral clustering for large-scale text data. BERT’s contextualized representations capture semantic nuances in each document, and spectral clustering then partitions the documents using a similarity graph built from those embeddings. Experiments on news-article and customer-review datasets show improved clustering coherence and scalability, which makes the method suitable for applications such as sentiment analysis and topic discovery.
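To make the second stage of the pipeline concrete, the following is a minimal sketch of normalized spectral clustering applied to a matrix of document embeddings. It is an illustration under stated assumptions, not the implementation evaluated in the experiments: the function name `spectral_cluster` is hypothetical, the cosine-similarity affinity, the symmetric normalized Laplacian, and the farthest-point k-means initialization are choices made here for illustration, and the synthetic `fake_embeddings` stand in for BERT outputs so that the example runs without downloading a model. The dense affinity matrix also needs memory quadratic in the number of documents, so a large-scale setting would require a sparse or approximate variant.

```python
import numpy as np

def spectral_cluster(embeddings, n_clusters, n_iter=50):
    """Normalized spectral clustering of row vectors (hypothetical helper)."""
    X = np.asarray(embeddings, dtype=float)
    # Assumption: cosine similarity as the affinity, clipped to be non-negative.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    W = np.clip(X @ X.T, 0.0, None)
    np.fill_diagonal(W, 0.0)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order, so the first n_clusters
    # eigenvectors span the spectral embedding.
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :n_clusters]
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    # Deterministic farthest-point initialization for k-means.
    centers = [U[0]]
    for _ in range(1, n_clusters):
        d2 = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[d2.argmax()])
    centers = np.array(centers)
    # Plain Lloyd iterations on the rows of the spectral embedding.
    for _ in range(n_iter):
        dists = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centers[k] = U[labels == k].mean(axis=0)
    return labels

# Synthetic stand-in for BERT embeddings: two noisy "topics" in 16 dimensions.
rng = np.random.default_rng(0)
dim = 16
topic_a = np.eye(dim)[0] + 0.05 * rng.standard_normal((20, dim))
topic_b = np.eye(dim)[1] + 0.05 * rng.standard_normal((20, dim))
fake_embeddings = np.vstack([topic_a, topic_b])

labels = spectral_cluster(fake_embeddings, n_clusters=2)
print(labels)
```

In an actual pipeline, `fake_embeddings` would be replaced by one pooled BERT vector per document; how those vectors are pooled and which BERT checkpoint is used are not specified here.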
