Methodological Aspects of the Use of the Sleep-EVAL System

Comparisons between epidemiological surveys of sleep disorders have been hampered by the wide variety of methodologies used from one study to another.

Until now, it has been difficult to assess the epidemiology of sleep disorders because surveys are few in number and difficult to compare.
Consequently, from the dozen or so surveys published on the prevalence of sleep complaints and disorders (1-13), we cannot tell whether discrepancies in rates across populations are real or simply due to methodological variations.
Based upon these observations, we planned a series of epidemiological studies which aim to document the prevalence of sleep disorders in the general population according to DSM-IV (14) and ICSD-90 (15) criteria.
More specifically, the prevalence of sleep disorders in persons complaining of insomnia and/or excessive daytime sleepiness and the impact of these disorders on cognitive and daytime functioning are determined. In addition, the prevalence of psychotropic drug consumption relative to these sleep disorders is measured.



Most epidemiological surveys on sleep have been carried out by means of household interviews, postal questionnaires, or a mix of these methods.
The choice of the interview method for our studies rested on several considerations:
- The samples have to be representative of the target country (or city) population;
- the surveys have to be completed in a short time period;
- the interviews have to be computer-driven; and
- production costs have to be minimized while maximizing the data collected.

Although telephone interviews have seldom been used in the past, several features make them very attractive.
One is the possibility of covering an entire country, including small and remote localities, from a single site.
This is also possible with household interviews, but not without difficulties: several sites must be set up across the country to recruit local interviewers, and the study takes longer to complete because fewer interviews can be performed each day and interviewers must spend time travelling.

Another attractive aspect of the telephone interview method is that it allows close supervision of interviewers.
In our study, interviewers are monitored at different times by supervisors.
This provides an opportunity to give the interviewer feedback on his or her interview style, pointing out strengths and weaknesses.
This type of constant supervision is not possible with household interviews unless the interviewer records the interview with the permission of the interviewee.

Finally, telephone interviews give complete control over the application of the selection procedure.


Each sample is drawn using a two-stage design.
At the first stage, the population is divided into homogeneous groups (strata) from which the random sample is selected. Stratification improves the precision of epidemiological inquiries by reducing sampling variability.
One of the prerequisites of stratification is to know the value of the stratification criteria for all units before drawing the sample.
Official census data contain the necessary information to apply an efficient stratification based on socio-demographic information.
Census data are used to stratify the samples a priori according to geographical criteria (geographical area and settlement size).
Telephone numbers are then drawn according to this stratification.
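
The article does not state how the total sample is divided among strata. As a minimal sketch, assuming proportional allocation (each stratum receives a share of the sample equal to its share of the census population) with largest-remainder rounding, the first stage could be computed as follows. The function name, stratum labels and population counts are hypothetical.

```python
from math import floor


def proportional_allocation(census: dict[str, int], n: int) -> dict[str, int]:
    """Split a total sample size n across strata in proportion to census counts.

    Largest-remainder rounding keeps the stratum sizes summing exactly to n.
    """
    total = sum(census.values())
    # Exact (fractional) share of the sample owed to each stratum.
    quotas = {s: n * pop / total for s, pop in census.items()}
    alloc = {s: floor(q) for s, q in quotas.items()}
    # Give the units lost to flooring to the strata with the largest remainders.
    shortfall = n - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for s in by_remainder[:shortfall]:
        alloc[s] += 1
    return alloc


# Hypothetical strata (region / settlement size) with made-up census counts.
census = {
    "North/rural": 1_200_000,
    "North/urban": 3_400_000,
    "South/rural": 900_000,
    "South/urban": 2_500_000,
}
alloc = proportional_allocation(census, n=3500)
print(alloc)
```

Proportional allocation keeps the sample self-weighting across strata; a design that deliberately over-sampled small settlements would instead require design weights at the analysis stage.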

At the second stage, a controlled selection method is applied to limit noncoverage error within the sampling unit (the household).
Therefore, the household member to be interviewed is selected using the Kish procedure (18).
This selection method is considered to be the most rigorous for epidemiological inquiries and is a standard for within-household selection of individuals (19).
The Kish method is based on the utilization of eight selection tables that serve to select the household member for the interview, while maintaining a representative sample.
This technique is rarely used in telephone inquiries because of the logistics involved and its intrusive nature.
It requires the interviewer to collect the age and gender of all eligible household members, then list the men by decreasing age, followed by the women by decreasing age.
Next, the interviewer refers to the table previously assigned to the household to identify the person to interview.
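
The selection step can be sketched as follows. This is a minimal illustration, assuming the eight-table grid as it is commonly reproduced from Kish (18) (tables A, B1, B2, C, D, E1, E2 and F); the table values should be checked against the original before any real use, and the `Member`, `kish_order` and `kish_select` names are hypothetical.

```python
from typing import NamedTuple


class Member(NamedTuple):
    age: int
    gender: str  # "M" or "F"


# Kish selection grid as commonly reproduced (assumption; verify against Kish):
# for each table, the person number to interview when the household has
# 1, 2, 3, 4, 5 or 6+ eligible members.
KISH_TABLES = {
    "A": (1, 1, 1, 1, 1, 1),
    "B1": (1, 1, 1, 1, 2, 2),
    "B2": (1, 1, 1, 2, 2, 2),
    "C": (1, 1, 2, 2, 3, 3),
    "D": (1, 2, 2, 3, 4, 4),
    "E1": (1, 2, 3, 3, 3, 5),
    "E2": (1, 2, 3, 4, 5, 5),
    "F": (1, 2, 3, 4, 5, 6),
}


def kish_order(members: list[Member]) -> list[Member]:
    """List men by decreasing age, followed by women by decreasing age."""
    men = sorted((m for m in members if m.gender == "M"), key=lambda m: m.age, reverse=True)
    women = sorted((m for m in members if m.gender == "F"), key=lambda m: m.age, reverse=True)
    return men + women


def kish_select(members: list[Member], table: str) -> Member:
    """Return the household member designated by the household's Kish table."""
    if not members:
        raise ValueError("no eligible household members")
    ordered = kish_order(members)
    column = min(len(ordered), 6) - 1  # households of 6 or more share the last column
    person_number = KISH_TABLES[table][column]
    return ordered[person_number - 1]


household = [Member(34, "F"), Member(61, "M"), Member(29, "M"), Member(58, "F")]
print(kish_select(household, "D"))  # Member(age=58, gender='F')
```

With this grid, members ranked seventh or later in households with more than six eligible members can never be selected.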

The Kish method is used in our studies.
For each telephone number drawn, a Kish table is assigned according to the prescribed proportions.
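
Kish's standard grid prescribes that tables A, C, D and F each be used for one sixth of households, and tables B1, B2, E1 and E2 each for one twelfth. The article does not say how tables are attached to numbers; the sketch below assumes shuffled blocks of twelve labels, so that every complete block of twelve numbers matches the proportions exactly. `assign_tables` and the telephone numbers are hypothetical.

```python
import random
from collections import Counter
from typing import Optional

# Standard Kish proportions, in twelfths: A, C, D, F = 1/6 each; B1, B2, E1, E2 = 1/12 each.
TABLE_WEIGHTS = {"A": 2, "B1": 1, "B2": 1, "C": 2, "D": 2, "E1": 1, "E2": 1, "F": 2}


def assign_tables(phone_numbers: list[str], seed: Optional[int] = None) -> dict[str, str]:
    """Assign a Kish table to each number using shuffled blocks of 12 labels."""
    rng = random.Random(seed)
    block = [t for t, w in TABLE_WEIGHTS.items() for _ in range(w)]  # 12 labels
    assignment: dict[str, str] = {}
    for start in range(0, len(phone_numbers), len(block)):
        labels = block[:]
        rng.shuffle(labels)  # random order within each block of 12
        for number, label in zip(phone_numbers[start:start + len(block)], labels):
            assignment[number] = label
    return assignment


# Hypothetical numbers drawn for one stratum.
numbers = [f"5550{i:03d}" for i in range(120)]
tables = assign_tables(numbers, seed=1)
print(Counter(tables.values()))
```
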
There are, however, certain limitations to this procedure.
The method leads to a small under-representation of the youngest eligible household members.
Furthermore, it can be quite time-consuming when there are many eligible persons in the same household, and less experienced or less skilful interviewers may obtain high refusal rates.
To minimize this risk and to lighten the interviewers' workload, the Kish procedure is integrated into the Sleep-EVAL software.
After explaining that a technique of sample selection must be applied, the interviewer enters the age and gender of eligible household members in the computer as the respondent lists them.
The computer instantly designates the household member selected to be interviewed.

The computerization of the Kish method has several advantages: the age and gender of all eligible household members are recorded in the data file, which allows the parameters that governed the choice of respondent to be verified; and the interviewer does not intervene in the selection of the respondent, which is strictly controlled by the software.



Because the surveys are conducted by telephone, telephone directories provide the initial set of telephone numbers for the target populations. However, relying on directories alone raises several problems. In particular, a non-negligible bias can be introduced into the sample because directories do not include new or unlisted numbers. An alternative strategy is therefore needed to include these numbers. Several strategies exist: completely random generation of telephone numbers; semi-random generation, in which some digits are kept fixed (for example, the first two or three digits, which are usually specific to a region) and the remaining digits are generated randomly (20); and the added-digit technique, for example raising the last digit of a telephone number by 1 (21). The last method is used in our studies because of its simplicity. A pool of telephone numbers equal to the sample size is first drawn, and the added-digit technique is then applied to numbers no longer in service, refusals and rejected numbers.
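
A minimal sketch of the added-digit replacement step, assuming telephone numbers are stored as plain digit strings and that the increment carries over a final 9 (the article only says the last digit is raised by 1). The outcome labels, function names and numbers are hypothetical.

```python
def add_one(number: str) -> str:
    """Added-digit ("plus-one") replacement: increase a directory number by 1.

    Assumption: a trailing 9 carries into the preceding digit (plain integer
    increment), and the result keeps the original number of digits.
    """
    width = len(number)
    return str((int(number) + 1) % 10**width).zfill(width)


# Outcomes that trigger a replacement, per the text: numbers no longer in
# service, refusals and rejected numbers. The labels are hypothetical.
REPLACE_OUTCOMES = {"not in service", "refusal", "rejected"}


def replacement_numbers(outcomes: dict[str, str]) -> list[str]:
    """Generate an added-digit replacement for every number whose outcome requires one."""
    return [add_one(num) for num, outcome in outcomes.items() if outcome in REPLACE_OUTCOMES]


# Hypothetical directory numbers and call outcomes.
outcomes = {
    "4185550137": "completed",
    "4185550142": "not in service",
    "4185550199": "refusal",
}
print(replacement_numbers(outcomes))  # ['4185550143', '4185550200']
```

Because the replacement is an unlisted neighbour of a listed number, it can reach new and unlisted lines that the directory itself omits.
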
Interviewers explain the goals of the study to potential participants before requesting verbal consent. Respondents are given the opportunity to call a member of the research team for further details before deciding to participate. Subjects who are not sufficiently fluent in the national language, or who have a hearing or speech impairment or a physical illness that precludes an interview, are excluded. Subjects who refuse to participate, or who give up before completing at least half of the interview, are classified as refusals. A telephone number is dropped and replaced only after at least 10 unsuccessful dial attempts made at different times and on different days, including weekdays and weekends.
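
The replacement rule for unanswered numbers can be expressed as a check on the log of dial attempts. This sketch assumes that "different times" means at least two of three hypothetical time-of-day slots and that "different days" means at least two calendar days; those thresholds, the slot boundaries and the function names are assumptions, not part of the published protocol.

```python
from datetime import datetime


def time_slot(t: datetime) -> str:
    """Hypothetical split of the 10 a.m. to 10 p.m. calling day into three slots."""
    if t.hour < 13:
        return "morning"
    if t.hour < 17:
        return "afternoon"
    return "evening"


def may_drop(attempts: list[datetime], min_attempts: int = 10) -> bool:
    """True if an unanswered number may be dropped and replaced.

    Requires at least `min_attempts` unsuccessful attempts spread over more
    than one day and more than one time slot, including weekdays and weekends.
    """
    if len(attempts) < min_attempts:
        return False
    days = {t.date() for t in attempts}
    slots = {time_slot(t) for t in attempts}
    has_weekday = any(t.weekday() < 5 for t in attempts)   # Monday=0 ... Friday=4
    has_weekend = any(t.weekday() >= 5 for t in attempts)  # Saturday, Sunday
    return len(days) > 1 and len(slots) > 1 and has_weekday and has_weekend


# Ten unanswered calls from Monday 1 January 2024 to the following Monday.
attempts = [datetime(2024, 1, 1 + d, h) for d, h in
            [(0, 10), (0, 15), (1, 19), (2, 11), (3, 16), (4, 20), (5, 11), (5, 18), (6, 14), (7, 10)]]
print(may_drop(attempts))      # True
print(may_drop(attempts[:9]))  # False: fewer than 10 attempts
```
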
The participation rate is calculated by dividing the number of completed interviews by the number of eligible subjects. Eligible subjects include: (1) subjects who agree to be interviewed; (2) subjects who refuse to be interviewed, either with a categorical refusal at the first call or with a refusal at a second call; and (3) subjects for whom the interviewer is unable to determine whether an exclusion criterion is met. The following are excluded from the count of eligible subjects: (1) business and fax numbers; (2) numbers not in service; (3) subjects meeting an exclusion criterion; and (4) numbers dropped after 10 unsuccessful dial attempts. This way of calculating the participation rate has been referred to as the "most reasonable completion rate" (22).
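
The rate defined above can be computed from final call dispositions. This is a minimal sketch, assuming that every subject who agreed to be interviewed produced a completed interview; the disposition labels, function name and counts are hypothetical.

```python
# Final dispositions counted as eligible, per the text: respondents who agree,
# refusals, and cases whose eligibility could not be determined.
ELIGIBLE = {"completed", "refusal", "eligibility unknown"}
# Dispositions removed from the denominator.
EXCLUDED = {"business or fax", "not in service", "exclusion criterion met", "dropped after 10 attempts"}


def participation_rate(counts: dict[str, int]) -> float:
    """Completed interviews divided by eligible cases."""
    unknown = set(counts) - ELIGIBLE - EXCLUDED
    if unknown:
        raise ValueError(f"unclassified dispositions: {sorted(unknown)}")
    eligible = sum(counts.get(d, 0) for d in ELIGIBLE)
    return counts.get("completed", 0) / eligible


# Hypothetical disposition counts for one country.
counts = {
    "completed": 3500,
    "refusal": 700,
    "eligibility unknown": 100,
    "business or fax": 450,
    "not in service": 900,
    "exclusion criterion met": 120,
    "dropped after 10 attempts": 300,
}
print(f"{participation_rate(counts):.1%}")  # 81.4%
```

Rejecting unclassified dispositions forces every call outcome to be explicitly placed in or out of the denominator.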


Interviewers are selected according to several criteria. On the one hand, personal style is considered: voice intonation, courtesy over the telephone, fluency in reading questions, quality of diction and ability to clarify points for respondents. On the other hand, an understanding of the methodology and a willingness to strictly respect the research protocol also determine the selection of interviewers. Interviewers have at least 14 years of schooling, but experience in psychiatric assessment is not required. They receive about three hours of special training in the use of the Sleep-EVAL knowledge-based system (23,24), and two to four days of training in interviewing techniques through role playing.
The computers are located at a single site where all interviews are performed. Interviews take place from 10 a.m. to 10 p.m. in two work shifts. Interviewers work a maximum of 32 hours per week. No telephone contact is initiated after 8:30 p.m. unless requested by the interviewee. An interview can be completed over two or more telephone sessions. The team of interviewers is monitored daily by two supervisors who listen in on calls in progress to ensure that questions are correctly asked and data properly entered.


To meet the surveys' objectives, the assessment tool requires the following features: (1) it can be administered by lay interviewers, in order to limit costs; (2) it broadly covers sleep habits, including sleep/wake schedule and sleep hygiene; (3) it includes medication consumption and its relationship to sleep disorders; (4) it can identify psychiatric disorders frequently associated with sleep disorders; and (5) it can formulate sleep diagnoses according to DSM-IV and ICSD-90 criteria. Several valuable assessment tools exist, but none meets all of these requirements. Given these prerequisites, it was evident that a paper-and-pencil questionnaire would be impossible to manage; a computerized tool was therefore necessary.
Since 1983, the project manager (M.O.) has been developing the Adinfer expert system for the assessment of psychiatric disorders (25). Two other expert systems, written in C++, were developed: Expertal for forensic psychiatry and Sleep-EVAL (© M. Ohayon, 1992) for the assessment of sleep disorders. These expert systems work in the same way and share the same assessment of psychiatric disorders; that is, the production rules leading to psychiatric diagnoses are identical. The Sleep-EVAL system is composed of a compiled knowledge base, an inference engine, a mathematical preprocessor, a neural network driving the fuzzy-logic rule set, and a user interface.


The knowledge base contains the knowledge representation of DSM-IV and ICSD-90 classifications.

This representation, transformed into a symbolic system, allows the description, understanding, handling and rationalization of the diagnostic classifications.

The symbolic system is expressed in the knowledge base in the form of compiled production rules readily interpretable by the inference engine.
A production rule is the expression of nosological entities (e.g., symptoms or diagnoses) in a logical form.
In its simplest representation, a production rule is written in the form of "IF" and "THEN" statements.
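
To illustrate the IF-THEN form, the sketch below applies a handful of production rules by simple forward chaining until no new fact can be derived. The rules are hypothetical simplifications, not DSM-IV or ICSD-90 criteria, and plain Boolean facts stand in for Sleep-EVAL's compiled rules and certainty handling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    conditions: frozenset[str]  # IF every one of these facts holds ...
    conclusion: str             # ... THEN this fact is asserted


def forward_chain(facts: set[str], rules: list[Rule]) -> set[str]:
    """Fire rules repeatedly until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.conditions <= derived and rule.conclusion not in derived:
                derived.add(rule.conclusion)
                changed = True
    return derived


# Hypothetical, deliberately simplified rules (not actual DSM-IV/ICSD-90 criteria).
RULES = [
    Rule(frozenset({"difficulty_initiating_sleep"}), "insomnia_symptom"),
    Rule(frozenset({"difficulty_maintaining_sleep"}), "insomnia_symptom"),
    Rule(frozenset({"insomnia_symptom", "lasts_one_month_or_more", "daytime_impairment"}),
         "insomnia_diagnosis_candidate"),
]

answers = {"difficulty_maintaining_sleep", "lasts_one_month_or_more", "daytime_impairment"}
print("insomnia_diagnosis_candidate" in forward_chain(answers, RULES))  # True
```

Chaining matters because a conclusion of one rule ("insomnia_symptom") becomes a condition of another.
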
Table 1 summarizes the main topics included in the knowledge base of the Sleep-EVAL system.

Overall, the system contains 1543 questions for the subject.
Obviously, no subject answers all of these questions.


The inference engine, or knowledge processor, reads and interprets the knowledge base.

The causal reasoning mode allows the system to pose a series of diagnostic hypotheses that are later confirmed or rejected through further questioning or deductions (non-monotonic, level-2 feature).

The system first calls up a series of questions that are put to the entire sample.
Once the answers are analyzed, the system runs through all diagnostic trees and looks for possible diagnostic hypotheses.
Once a hypothesis is found, the system begins to explore it.

This exploratory process may require the subject to answer other questions.
The system pursues the diagnostic exploration until a final decision is reached.
Once the value of this hypothesis is known, it looks for another possible diagnosis.
This process is repeated until all diagnostic trees have been explored.
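
The hypothesis-driven cycle described above can be sketched as a loop over diagnostic trees: screening questions are asked of everyone, and follow-up questions are asked only once a tree's hypothesis has been raised. The trees, question identifiers and function names are hypothetical, and the Boolean decisions ignore the certainty degrees and non-monotonic revision that Sleep-EVAL applies.

```python
from typing import Callable

# Questions put to every subject at the start of the interview (hypothetical).
SCREENING = ["trouble_sleeping", "frequent_nightmares"]

# Hypothetical, heavily simplified diagnostic trees: a screening answer that
# raises the hypothesis, and follow-up questions that must all be "yes" to confirm it.
TREES = {
    "insomnia disorder": {"trigger": "trouble_sleeping",
                          "follow_up": ["at_least_one_month", "daytime_consequences"]},
    "nightmare disorder": {"trigger": "frequent_nightmares",
                           "follow_up": ["wakes_from_nightmares", "marked_distress"]},
}


def interview(ask: Callable[[str], bool]) -> tuple[dict[str, bool], list[str]]:
    """Raise, explore and decide each diagnostic hypothesis in turn.

    Returns the decision for every hypothesis that was raised, plus the
    questions actually asked, in order.
    """
    answers = {q: ask(q) for q in SCREENING}
    decisions: dict[str, bool] = {}
    for diagnosis, tree in TREES.items():       # run through every tree
        if not answers[tree["trigger"]]:
            continue                            # no hypothesis raised here
        confirmed = True
        for question in tree["follow_up"]:      # explore the hypothesis
            if question not in answers:
                answers[question] = ask(question)
            if not answers[question]:
                confirmed = False               # hypothesis rejected
                break
        decisions[diagnosis] = confirmed
    return decisions, list(answers)


# A simulated subject: unlisted questions are answered "no".
subject = {"trouble_sleeping": True, "at_least_one_month": True, "daytime_consequences": True}
decisions, asked = interview(lambda q: subject.get(q, False))
print(decisions)  # {'insomnia disorder': True}
print(asked)      # the nightmare follow-up questions were never asked
```

This selective questioning is why interview length varies so much between subjects with and without sleep problems.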

The neural network manages any uncertainty in the subject's answers as well as in diagnoses.
This means, in the end, that each explored object (including diagnoses) will have a degree of certainty ranging from 0.4 (completely present) to -0.4 (completely absent).

The mathematical preprocessor is called on by the inference engine when it meets special instructions in the knowledge base.
It permits diverse temporal information to be compared and allows adequate diagnostic decisions which incorporate this type of information and reasoning.
It is also used for several mathematical operations, such as:
- converting an age into months or weeks, and hours into minutes or seconds;
- comparing the durations of symptoms, or discrepancies between clock times; and
- setting the range of a numerical keyboard answer.
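
The operations listed above are simple enough to sketch directly. This example assumes a 30-day month for duration comparisons and an arbitrary 0-720 range for a keyboard answer given in minutes; all function names are hypothetical and do not reflect Sleep-EVAL's internal interfaces.

```python
from datetime import time

# Assumed conversion factors; a 30-day month is a simplification.
DAYS_PER_UNIT = {"days": 1, "weeks": 7, "months": 30, "years": 365}


def age_in_months(years: int, extra_months: int = 0) -> int:
    return years * 12 + extra_months


def hours_to_minutes(hours: float) -> float:
    return hours * 60


def to_days(value: float, unit: str) -> float:
    """Express a symptom duration in days so durations given in different units can be compared."""
    return value * DAYS_PER_UNIT[unit]


def minutes_between(start: time, end: time) -> int:
    """Clock-time discrepancy, allowing the interval to cross midnight (e.g. bedtime to rise time)."""
    s = start.hour * 60 + start.minute
    e = end.hour * 60 + end.minute
    return (e - s) % (24 * 60)


def in_range(value: float, low: float, high: float) -> bool:
    """Range check applied to a numerical keyboard answer."""
    return low <= value <= high


print(age_in_months(34, 5))                         # 413
print(minutes_between(time(23, 30), time(6, 45)))   # 435 (7 h 15 min in bed)
print(to_days(6, "weeks") >= to_days(1, "months"))  # True: 42 days vs 30 days
print(in_range(hours_to_minutes(1.5), 0, 720))      # True: 90 minutes
```
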

For a subject with no sleep problems and no psychiatric disorders, the interview is brief (20 to 30 minutes).
A subject with sleep problems, however, is submitted to a more exhaustive interview (60 to 120 minutes in some cases). The computer program selects all of the questions.

The role of the human interviewer is to read the question displayed on the computer screen to the subject over the telephone.
Examples and instructions on how to answer each question are also provided.

Questions are answered according to several formats.
Some questions are answered by keyboard entry, for example to specify an illness or a duration.
For durations, keyboard input is limited to numbers in order to avoid unusable answers.

Other questions are rated on a five-point scale, usually to assess the severity of a symptom, from "no effect" to "severe impairment", and/or its frequency.

Finally, some questions are answered on a "Yes-No" or "Present-Absent-Unknown" basis.
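
The answer formats can be enforced by a small validator that returns either a coded answer or nothing, in which case the interviewer keys the entry again. The 1-5 coding of the five-point scale, the accepted keywords, the `Format` names and the `accept` function are assumptions for illustration.

```python
from enum import Enum
from typing import Optional, Union

Answer = Union[int, bool, str]


class Format(Enum):
    NUMERIC = "numeric keyboard entry"
    TEXT = "free-text keyboard entry"
    SCALE_5 = "five-point scale"
    YES_NO = "yes/no"
    PRESENT_ABSENT_UNKNOWN = "present/absent/unknown"


def accept(fmt: Format, raw: str, low: Optional[int] = None,
           high: Optional[int] = None) -> Optional[Answer]:
    """Return the coded answer, or None if the entry must be keyed again."""
    text = raw.strip()
    key = text.lower()
    if fmt is Format.NUMERIC:
        # Digits only, so entries such as "about 20" or "20-30" are refused.
        if not text.isdecimal():
            return None
        value = int(text)
        if (low is not None and value < low) or (high is not None and value > high):
            return None
        return value
    if fmt is Format.TEXT:
        return text or None
    if fmt is Format.SCALE_5:
        # Hypothetical coding: 1 = "no effect" ... 5 = "severe impairment".
        return int(key) if key in {"1", "2", "3", "4", "5"} else None
    if fmt is Format.YES_NO:
        return {"yes": True, "no": False}.get(key)
    if fmt is Format.PRESENT_ABSENT_UNKNOWN:
        return key if key in {"present", "absent", "unknown"} else None
    raise ValueError(f"unsupported format: {fmt}")


print(accept(Format.NUMERIC, "45", low=0, high=720))  # 45
print(accept(Format.NUMERIC, "about 45"))             # None: interviewer re-keys
print(accept(Format.YES_NO, "Yes"))                   # True
```
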


To date (1993), three validation studies have been conducted:
- The first study involved 114 subjects from general practice (26,27). The diagnosis agreed on by at least 3 of 4 psychiatrists was used as the gold standard (consensus was achieved for 88 subjects), and this diagnosis was compared with that of the expert system administered by a psychologist. The rationale for this methodology was that the reliability of the expert system's decision trees is difficult to test when psychiatrists do not agree on the diagnosis. Obviously, when the decision trees of the expert system are correctly expressed, this methodology is likely to yield high agreement; accordingly, the overall agreement (kappa coefficient) was .97. When the expert system's diagnoses were instead compared with those of each psychiatrist individually (regardless of agreement among psychiatrists), as most validation studies do, kappas were lower: .78 with psychiatrist A, .75 with psychiatrist B, .73 with psychiatrist C and .44 with psychiatrist D. Kappas between pairs of psychiatrists ranged from .39 to .78.
- The second study compared the diagnoses of the expert system, administered by a psychologist, with those of 10 psychiatrists in a forensic hospital (Philippe Pinel Institute) and involved 91 patients (28). Close to 60% were diagnosed with a psychotic disorder. The kappa between the expert system and the psychiatrists was .48 for specific diagnoses of psychotic disorders, mostly schizophrenia.
- The third study was conducted in the general population with 150 subjects. The diagnoses obtained by two lay interviewers using Sleep-EVAL were compared with those obtained by two clinical psychologists. A kappa of .85 was obtained for the recognition of sleep problems and of .70 for insomnia disorders.
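
The agreement figures above are kappa coefficients. For two raters, Cohen's kappa is κ = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e the agreement expected by chance from each rater's marginal frequencies. The sketch below computes it; the ratings are hypothetical and are not data from the validation studies.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa: chance-corrected agreement between two raters on the same subjects."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty lists of ratings")
    n = len(rater_a)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance from each rater's marginal frequencies.
    p_expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n**2
    if p_expected == 1:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical ratings of 10 subjects (I = insomnia disorder, N = none).
expert_system = ["I", "I", "I", "I", "N", "N", "N", "N", "N", "N"]
clinician = ["I", "I", "I", "N", "N", "N", "N", "N", "N", "I"]
print(round(cohens_kappa(expert_system, clinician), 3))  # 0.583
```

Here the raters agree on 80% of subjects, but chance alone would produce 52% agreement, so kappa is well below the raw agreement rate.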

Sleep-EVAL was also used by 127 general practitioners (a mean of 93 interviews per practitioner), of whom 113 completed a questionnaire regarding its use. Only one third of these physicians (32.7%) were computer-literate, yet the vast majority reported that Sleep-EVAL was easy to use (89.4%) and found its reasoning both easy to trace (92.9%) and coherent (95.6%). The main criticisms concerned the length of the interview and the lack of access to the system's diagnoses. In all studies performed with Sleep-EVAL, access to diagnoses and decision trees was disabled for confidentiality reasons.


Expert systems should be differentiated from computerized assessment tools such as Sleep Expert (29), a computer medical decision support system, and the Computerized Diagnostic Interview Schedule or C-DIS (30).

Both share many characteristics but possess different architectures.
Some of the advantages of using computerized tools include:
- absence of omitted answers,
- uniformity of administration,
- suppression of data coding and transcription errors, and
- coverage of diagnoses that are less often considered because of their rarity.

Computerized tools possess limited reasoning capacities and operate only according to diagnostic algorithms or predetermined decision trees implemented in the software.

Expert systems, on the other hand, have all the features of computerized tools and, in addition, carry out a reasoning process throughout the interview, building decision trees during the consultation.
Yet, the expert system is not without limitations.
Both computerized assessment tools and expert systems are unable to interpret nonverbal signs.
They are often faulted for their rigidity in wording.
They are also generally unable to organize temporal relationships between symptoms, which are often important in differential diagnosis, unless, like Sleep-EVAL, they incorporate a mathematical preprocessor that permits comparison of diverse temporal information. This feature allows adequate diagnostic decisions that incorporate this type of information and reasoning.


The size of each sample is determined according to the following criteria: an expected prevalence of insomnia disorders of 10%, as estimated by Ford and Kamerow (8), and a 95% confidence interval with a precision of 1% (that is, ±1 percentage point).
These criteria require a sample of at least 3457 individuals. Because the population size N is very large compared with the sample size n, the following formula is used:
$$n = \frac{z^2}{k^2} \cdot \frac{q}{p}$$
where p = expected proportion (10%), q = 1 - p, z = the standard normal deviate for the chosen confidence level (1.96 for an alpha of 5%), and k = the required precision expressed relative to p (10%, i.e., 10% of 10%, or 1 percentage point).
Consequently, each target country should include at least 3500 subjects. Pooling all the samples should allow rare disorders to be documented reliably and with good precision.
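
The sample-size arithmetic can be checked numerically with n = (z²/k²)(q/p), where k is the precision relative to p. A minimal sketch; the function name is hypothetical.

```python
def sample_size(p: float, k: float, z: float = 1.96) -> float:
    """n = (z^2 / k^2) * (q / p) for a proportion p in a very large population.

    k is the precision relative to p (k = 0.10 with p = 0.10 means +/- 1
    percentage point); z = 1.96 corresponds to a 95% confidence level.
    """
    q = 1 - p
    return (z**2 / k**2) * (q / p)


print(f"{sample_size(p=0.10, k=0.10):.1f}")  # 3457.4: insomnia disorders at 10%
print(f"{sample_size(p=0.01, k=0.10):.1f}")  # 38031.8: a disorder affecting 1%
```

The second call reproduces the figure of about 38,000 subjects cited for a phenomenon affecting 1% of the population at the same relative precision.
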
A posteriori stratification is undertaken to correct for the discrepancies between the samples and the official census data. This weighting procedure adjusts for sample design according to geographical distribution, age and gender.
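
A posteriori (post-stratification) weights can be sketched as the ratio of each cell's census share to its share of the sample. This assumes a single cross-classification of cells, each with at least one respondent; the cell definitions, shares, counts and function name are hypothetical, and a real design would cross geographical area, age and gender.

```python
def poststrat_weights(population_share: dict[str, float],
                      sample_counts: dict[str, int]) -> dict[str, float]:
    """Weight per cell = population share / sample share.

    Weighted cell totals then reproduce the census distribution and,
    if the shares sum to 1, still add up to the sample size n.
    """
    n = sum(sample_counts.values())
    weights = {}
    for cell, share in population_share.items():
        count = sample_counts.get(cell, 0)
        if count == 0:
            raise ValueError(f"no respondents in cell {cell!r}; collapse cells first")
        weights[cell] = share / (count / n)
    return weights


# Hypothetical census shares and respondent counts for age x gender cells.
census_share = {"men 15-44": 0.27, "men 45+": 0.21, "women 15-44": 0.27, "women 45+": 0.25}
respondents = {"men 15-44": 800, "men 45+": 900, "women 15-44": 900, "women 45+": 900}
weights = poststrat_weights(census_share, respondents)
print({cell: round(w, 3) for cell, w in weights.items()})
```

Under-represented cells (here, younger men) receive weights above 1 and over-represented cells weights below 1.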


The methodology shares common features with other epidemiological surveys but differs in how the data are collected: by telephone interviews, with the help of the Sleep-EVAL expert system.

These studies make it possible to explore a series of phenomena that, until now, have not been well documented in the general population (31,32).

Also, we examined how insomnia and excessive daytime sleepiness complaints are related to psychiatric and sleep diagnoses in the general population (33).
Such epidemiological surveys are useful because they allow verification of whether results observed in sleep disorders clinics can be extended to the general population, and of where the differences lie.
In addition, broad surveys such as ours permit inter- and intra-country comparisons, because the same criteria and definitions are applied.
Furthermore, such surveys permit rare disorders to be analyzed with greater precision. Indeed, a reliable estimate of a phenomenon affecting about 1% of the population requires a sample of about 38,000 subjects (relative precision of 10% and a 95% confidence level).

Future epidemiological surveys in sleep medicine should extend their analyses to the different subtypes of insomnia and daytime sleepiness; for example, little is known about transient and seasonal patterns of insomnia or daytime sleepiness in the general population.


  1. Bixler EO, Kales A, Soldatos CR, Kales JD, Healey S. Prevalence of sleep disorders in the Los Angeles metropolitan area. Am J Psychiatry, 1979; 136:1257-1262.
  2. Karacan I, Thornby JI, William R. Sleep disturbance: a community survey. In: Guilleminault C, Lugaresi E, Eds. Sleep/Wake disorders: Natural History, Epidemiology, and Long-Term Evolution. New York, NY: Raven Press, 1983:37-60.
  3. Welstein L, Dement WC, Redington D, Guilleminault C. Insomnia in the San Francisco Bay area: a telephone survey. In: Guilleminault C, Lugaresi E, Eds. Sleep/Wake disorders: Natural History, Epidemiology, and Long-Term Evolution. New York, NY: Raven Press, 1983:29-35.
  4. Mellinger GD, Balter MB, Uhlenhuth EH. Insomnia and its treatment: Prevalence and correlates. Arch Gen Psychiatry 1985; 42: 225-232.
  5. Gislason T, Almqvist M. Somatic diseases and sleep complaints: an epidemiological study of 3201 Swedish men. Acta Med Scand 1987; 221:475-481.
  6. Klink M, Quan SF. Prevalence of reported sleep disturbances in a general adult population and their relationship to obstructive airways diseases. Chest, 1987; 91:540-546.
  7. Liljenberg B, Almqvist M, Hetta J, Roos BE, Agren H. The prevalence of insomnia: the importance of operationally defined criteria. Ann Clin Res 1988; 20:393-398.
  8. Ford DE, Kamerow DB. Epidemiologic study of sleep disturbances and psychiatric disorders. An opportunity for prevention? JAMA, 1989; 262:1479-1484.
  9. Husby R, Lingjaerde O. Prevalence of reported sleeplessness in northern Norway in relation to sex, age and season. Acta Psychiatr Scand 1990; 542-547
  10. Quera-Salva MA, Orluc A, Goldenberg F, Guilleminault C. Insomnia and use of hypnotics: Study of a French population. Sleep. 1991; 14: 386-391.
  11. Henderson S, Jorm AF, Scott LR, Mackinnon AJ, Christensen H, Korten AE. Insomnia in the elderly: its prevalence and correlates in the general population. Med J Aust 1995; 162:22-24.
  12. Foley DJ, Monjan AA, Brown SL, Simonsick EM, Wallace RB, Blazer DG. Sleep complaints among elderly persons: an epidemiologic study of three communities. Sleep 1995; 18:425-432.
  13. Janson C, Gislason T, De Backer W, et al. Prevalence of sleep disturbances among young adults in three European countries. Sleep 1995; 18:589-597.
  14. APA (American Psychiatric Association). Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV). Washington: The American Psychiatric Association, 1994.
  15. Diagnostic Classification Steering Committee, Thorpy MJ, Chairman. International Classification of Sleep Disorders: Diagnostic and Coding Manual (ICSD). Rochester, Minnesota: American Sleep Disorders Association, 1990.
  16. Groves RM, Kahn RL. Surveys by telephone: A national comparison with personal interviews. New York: Academic Press, 1979.
  17. Aquilino WS. Telephone versus face-to-face interviewing for household drug use surveys. Int J Addict. 1992; 27: 71-91.
  18. Kish L. Survey sampling. New York: John Wiley & Sons, 1965.
  19. Maklan D, Waksberg J. Within household coverage in RDD surveys. In: Groves RM, et al., Eds. Telephone survey methodology. New York: John Wiley, 1989:51-72.
  20. Sudman S. The uses of telephone directories for survey sampling. J Market Res 1973; 10: 204-207.
  21. Landon EL, Banks SK. Relative efficiency and bias of plus-one telephone sampling. J Market Res 1977; 14:294-299.
  22. Lavrakas PJ. Telephone survey methods: sampling, selection and supervision. Newbury Park: Sage Publications, 1993.
  23. Ohayon M. Système à base de connaissances EVAL: arbres décisionnels et questionnaire. Quebec: Bibliothèque Nationale du Québec, 1995.
  24. Ohayon M. Use of an expert system (EVAL) in mental health epidemiological surveys. In Barahona P, Veloso M, Bryant J. (Eds) Proceedings of the Twelfth International Congress on Medical Informatics. Lisbon: MIE, 1994: 174-179.
  25. Ohayon M. Saisie logique de données en Psychiatrie. Ann Med Psychol 1985; 143:577-585.
  26. Ohayon M. Validation of a knowledge based system (ADINFER) versus human experts. In Barahona P, Veloso M, Bryant J. (Eds) Proceedings of the Twelfth International Congress on Medical Informatics (pp 90-95). Lisbon: MIE, 1994.
  27. Ohayon M. Validation of expert systems: Examples and considerations. Medinfo 1995; 8: 1071-1075.
  28. St-Onge B, Ohayon M. L'utilisation du système Expertal dans un milieu de psychiatrie légale. Abrégés du Congrès de Psychiatrie et de Neurologie de Langue Française 1994, 112.
  29. Korpinen L, Frey H. Sleep Expert: an intelligent medical decision support system for sleep disorders. Med Inform 1993; 18: 163-170.
  30. Blouin AG, Perez EL, Blouin JH. Computerized administration of the Diagnostic Interview Schedule. Psychiatr Res 1988; 23:335-344.
  31. Ohayon MM, Priest RG, Caulet M, Guilleminault C. Hypnagogic and hypnopompic hallucinations: pathological phenomena? Brit J Psychiatry 1996; 169: 459-467.
  32. Ohayon MM, Caulet M, Priest RG. Violent Behaviour During Sleep. J Clin Psychiatry 58: In press.
  33. Ohayon MM, Caulet M, Philip P, Guilleminault C, Priest RG. How Sleep and Mental Disorders are Related to Daytime Sleepiness Complaints. Arch Intern Med 1997; in press.
For figures, tables and the full text of the article, please consult:
Ohayon MM, Guilleminault C, Paiva T, Priest RG, Rapoport DM, Sagales T, Smirne S, Zulley J. An international study on sleep disorders in the general population: methodological aspects of the use of the Sleep-EVAL system. Sleep 1997;20:1086-92.