In affective computing applications, access to labeled spontaneous affective data is essential for testing algorithms under naturalistic and challenging conditions. Most databases available today are acted or do not contain audio data. We present BAUM-1, a spontaneous audio-visual face database of affective and mental states. The video clips in the database are obtained by recording the subjects from the frontal view using a stereo camera and from the half-profile view using a mono camera. The subjects are first shown a sequence of images and short video clips, which are carefully designed and timed to evoke a set of emotions and mental states. They then express their ideas and feelings about the images and video clips they have watched, in Turkish, in an unscripted and unguided way. The target emotions include the six basic ones (happiness, anger, sadness, disgust, fear, surprise) as well as boredom and contempt. We also target several mental states: unsure (including confused and undecided), thinking, concentrating, and bothered. Baseline experimental results on the BAUM-1 database show that recognition of affective and mental states under naturalistic conditions is quite challenging. The database is expected to enable further research on audio-visual affect and mental state recognition under close-to-real scenarios.