Wheat yield is positively correlated with the number of wheat spikes in the field, an essential index for plant breeders. Computing this index efficiently requires precise, automatic image-based plant phenotyping methods. Machine vision is advancing rapidly as a tool for estimating crop yields. In most crop phenotyping applications, however, annotating thousands of small objects with bounding boxes or polygons is extremely time- and labor-intensive. This study develops a framework for localizing and counting wheat spikes from dot-annotated datasets, combined with Gaussian and constant density map generation algorithms. As the computational component, we developed hybrid UNet architectures: VGG16-UNet, ResNet34-UNet, ResNet50-UNet, and ResNeXt-UNet. We further improved counting performance with a hybrid algorithm that combines density map estimation with non-maximum suppression. The proposed models are evaluated on wheat images from the ACID and GWHD datasets. On the ACID dataset, they achieve an F1 score of 0.96 and a Mean Absolute Percentage Error (MAPE) of 1.68%, a significant improvement in spike localization and counting over previous studies. To assess the models' generalizability in real-world scenarios, the ACID-pretrained models were then applied directly to the more complex in-field GWHD dataset, where they localized and counted wheat spikes with an F1 score of 0.57 and a MAPE of 2.302%. Finally, the ACID-pretrained models were retrained for a few epochs on the small GWHD training sample, which significantly improved performance to an F1 score of 0.91 and a MAPE of 1.56%.
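The density-map pipeline summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the Gaussian width `sigma`, the suppression window `radius`, the peak `threshold`, and the matching tolerance `max_dist` are all hypothetical choices for illustration. Each dot annotation becomes a unit-mass Gaussian, so the map sums to the spike count; non-maximum suppression keeps local maxima of the map as spike locations; localization is scored with F1 after greedy point matching, and counting with MAPE. Only the Gaussian density variant is shown.

```python
import numpy as np


def gaussian_density_map(points, shape, sigma=4.0):
    """Build a density map from dot annotations given as (row, col).

    Each dot contributes a 2D Gaussian normalized to sum to 1 over the
    image, so the whole map sums to the number of annotated spikes.
    sigma is a hypothetical default, not a value from the study.
    """
    h, w = shape
    rows, cols = np.mgrid[0:h, 0:w]
    density = np.zeros(shape, dtype=np.float64)
    for r, c in points:
        kernel = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma ** 2))
        density += kernel / kernel.sum()  # unit mass per spike
    return density


def local_peaks(density, radius=3, threshold=1e-3):
    """Non-maximum suppression: keep pixels that equal the maximum of their
    (2*radius+1)^2 neighbourhood and exceed a (hypothetical) threshold."""
    h, w = density.shape
    padded = np.pad(density, radius, mode="constant", constant_values=-np.inf)
    window_max = np.full(density.shape, -np.inf)
    for dr in range(2 * radius + 1):
        for dc in range(2 * radius + 1):
            window_max = np.maximum(window_max, padded[dr:dr + h, dc:dc + w])
    mask = (density >= window_max) & (density > threshold)
    return [(int(r), int(c)) for r, c in np.argwhere(mask)]


def match_f1(pred, gt, max_dist=5.0):
    """Greedy one-to-one matching of predicted and true points within
    max_dist pixels (hypothetical tolerance); returns the F1 score."""
    unmatched = list(gt)
    tp = 0
    for p in pred:
        if not unmatched:
            break
        dists = [np.hypot(p[0] - g[0], p[1] - g[1]) for g in unmatched]
        j = int(np.argmin(dists))
        if dists[j] <= max_dist:
            unmatched.pop(j)  # each true spike can be matched only once
            tp += 1
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gt)
    return 2 * precision * recall / (precision + recall)


def mape(pred_counts, true_counts):
    """Mean Absolute Percentage Error between predicted and true counts."""
    pred = np.asarray(pred_counts, dtype=float)
    true = np.asarray(true_counts, dtype=float)
    return float(np.mean(np.abs(pred - true) / true) * 100.0)


if __name__ == "__main__":
    spikes = [(20, 20), (40, 45)]
    dm = gaussian_density_map(spikes, (64, 64))
    print(round(float(dm.sum()), 6))             # 2.0
    peaks = local_peaks(dm)
    print(peaks)                                 # [(20, 20), (40, 45)]
    print(match_f1(peaks, spikes))               # 1.0
    print(round(mape([98, 103], [100, 100]), 6))  # 2.5
```

In a real pipeline the density map passed to `local_peaks` would be a UNet prediction rather than a map built from the annotations; the ground-truth map simply stands in for it here. Summing the map gives a count estimate, while counting the suppressed peaks also yields point locations that can be scored with F1.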