General and sample populations. The sampling method

Population (general population) - the set of all objects (units) about which a researcher intends to draw conclusions when studying a specific problem.

The population consists of all objects that are subject to study. Its composition depends on the objectives of the study. Sometimes the general population is the entire population of a certain region (for example, when studying the attitude of potential voters towards a candidate), but more often several criteria are specified that define the object of the study: for example, men aged 30-50 who use a certain brand of razor at least once a week and have an income of at least $100 per family member.

Sample, or sample population - a set of cases (subjects, objects, events, specimens) selected from the general population by a certain procedure to participate in the study.

Sample characteristics:

· Qualitative characteristics of the sample - whom exactly we select and what sampling methods we use to do so.

· Quantitative characteristics of the sample - how many cases we select, in other words, sample size.

Necessity of sampling

· The object of study is very extensive. For example, consumers of a global company’s products are represented by a huge number of geographically dispersed markets.

· There is a need to collect primary information.

Sample size

Sample size - the number of cases included in the sample population. For statistical reasons, it is recommended that the number of cases be at least 30 to 35.

Dependent and independent samples

When comparing two (or more) samples, an important property is whether they are dependent. If a one-to-one (homomorphic) pairing can be established for every case in the two samples (that is, one case from sample X corresponds to one and only one case from sample Y, and vice versa), and the basis of this pairing is relevant to the trait being measured, the samples are called dependent. Examples of dependent samples:

· pairs of twins,

· two measurements of any trait before and after experimental exposure,

· husbands and wives,

· etc.

If there is no such relationship between the samples, they are considered independent, for example:

· men and women,

· psychologists and mathematicians.

Accordingly, dependent samples always have the same size, while the sizes of independent samples may differ.

Samples are compared using various statistical tests:

· Student's t-test

· Wilcoxon test

· Mann-Whitney U test

· Sign test

· etc.
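Whether the samples are dependent determines which form of a test applies. As a minimal sketch, assuming Student's t-test is the chosen test: for dependent samples the statistic is computed from the pairwise differences, while for independent samples it is computed from the two group means (the unequal-variance Welch form is used here as an assumption). The function names and the data are hypothetical.

```python
import math
import statistics


def paired_t(x, y):
    """t statistic for dependent samples, based on pairwise differences."""
    if len(x) != len(y):
        raise ValueError("dependent samples must have the same size")
    diffs = [b - a for a, b in zip(x, y)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


def welch_t(x, y):
    """t statistic for independent samples; the sizes may differ."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se


# Hypothetical data: one trait measured before and after an exposure.
before = [10, 12, 14, 16]
after = [11, 14, 15, 18]
print(paired_t(before, after))

# Hypothetical data: two unrelated groups of different sizes.
print(welch_t([1, 2, 3], [2, 4, 6, 8]))
```

The paired version refuses samples of unequal size, which mirrors the remark above that dependent samples always have the same size.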

Representativeness

The sample may be considered representative or non-representative.

Example of a non-representative sample

In the United States, one of the most famous historical examples of a non-representative sample dates from the 1936 presidential election. The Literary Digest, which had successfully predicted the outcomes of several previous elections, got its forecast wrong after sending out ten million test ballots to its subscribers, as well as to people selected from telephone books across the country and from car registration lists. Among the 25% of ballots that were returned (almost 2.5 million), the votes were distributed as follows:

· 57% preferred Republican candidate Alf Landon

· 40% chose the incumbent Democratic President, Franklin Roosevelt

In the actual election, as is known, Roosevelt won with more than 60% of the vote. The Literary Digest's mistake was this: wanting to increase the representativeness of the sample, since they knew that most of their subscribers considered themselves Republicans, they expanded it to include people selected from telephone books and registration lists. However, they did not take into account the realities of their time and in fact recruited even more Republicans: during the Great Depression, it was mainly members of the middle and upper classes who could afford telephones and cars, and these were mostly Republicans rather than Democrats.

Types of designs for forming groups from samples

There are several main types of group formation designs:

1. A study with experimental and control groups, which are placed in different conditions.

2. A study with experimental and control groups using a pairwise selection strategy.

3. A study using only one group - an experimental one.

4. A study using a mixed (factorial) design - all groups are placed in different conditions.

Sampling types

Samples are divided into two types:

· probabilistic

· non-probabilistic

Probability samples

1. Simple probability sampling:

· Simple sampling with replacement (repeated sampling). The use of such a sample is based on the assumption that each respondent is equally likely to be included in the sample. Based on the list of the general population, cards with respondent numbers are prepared. They are placed in a deck and shuffled; a card is drawn at random, its number is written down, and the card is then returned to the deck. The procedure is repeated as many times as the required sample size. Disadvantage: the same unit can be selected more than once.

The procedure for constructing a simple random sample includes the following steps:

1. obtain a complete list of the members of the population and number it. Such a list, recall, is called a sampling frame;

2. determine the expected sample size, that is, the expected number of respondents;

3. extract as many numbers from a random number table as there are sample units needed. If there should be 100 people in the sample, 100 random numbers are taken from the table. These random numbers can also be generated by a computer program;

4. select from the base list those observations whose numbers correspond to the random numbers written down.
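The steps above can be sketched in a few lines, with a computer-generated random draw standing in for the random number table. This is a minimal illustration rather than a prescribed procedure; the function name, the `replace` flag and the numbered frame of 1,000 units are assumptions made for the example.

```python
import random


def simple_random_sample(frame, n, replace=False, seed=None):
    """Draw n units from a numbered sampling frame.

    replace=True  -> the "card" is returned to the deck (units may repeat)
    replace=False -> the "card" is not returned (no repeats)
    """
    rng = random.Random(seed)
    if replace:
        return [rng.choice(frame) for _ in range(n)]
    return rng.sample(frame, n)


# Hypothetical sampling frame: respondents numbered 1..1000.
frame = list(range(1, 1001))
sample = simple_random_sample(frame, 100, seed=1)
print(len(sample), len(set(sample)))  # without replacement, all 100 units are distinct
```

Passing a seed makes the draw reproducible, which is convenient when the selection has to be documented or audited.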

Simple random sampling has obvious advantages. The method is extremely easy to understand. The results of the study can be generalized to the population being studied. Most approaches to statistical inference assume that the data were collected by simple random sampling. However, the method has at least four significant limitations:

1. It is often difficult to create a sampling frame that would allow simple random sampling.

2. Simple random sampling may produce a sample that is very large or spread over a large geographic area, which significantly increases the time and cost of data collection.

3. The results of simple random sampling are often characterized by lower accuracy and a larger standard error than the results of other probabilistic methods.

4. Simple random sampling (SRS) may produce a non-representative sample. Although samples obtained by simple random sampling represent the population adequately on average, some of them badly misrepresent the population being studied. This is especially likely when the sample size is small.

· Simple sampling without replacement (non-repetitive sampling). The procedure is the same, except that the cards with respondent numbers are not returned to the deck.

2. Systematic probability sampling. This is a simplified version of simple probability sampling. Based on the list of the general population, respondents are selected at a fixed interval (K); the interval is usually taken as the population size divided by the sample size, and the starting point is chosen at random. The most reliable result is achieved with a homogeneous population; otherwise the step size may coincide with some internal cyclic pattern in the list and bias the sample. Disadvantages: the same as for simple probability sampling.

3. Serial (cluster) sampling. The selection units are statistical series (a family, a school, a team, etc.). The selected series are examined completely. The series themselves can be selected by random or systematic sampling. Disadvantage: units within a series may be more homogeneous than in the general population.

4. Regional (stratified) sampling. In the case of a heterogeneous population, before applying probability sampling with any selection technique, it is recommended to divide the population into homogeneous parts; such a sample is called a regional (zoned) sample. The zoning groups can be natural formations (for example, city districts) or can be defined by any characteristic that underlies the study. The characteristic on which the division is based is called the stratification (zoning) characteristic.

5. "Convenience" sample. The “convenience” sampling procedure consists of establishing contacts with “convenient” sampling units - a group of students, a sports team, friends and neighbors. Strictly speaking, this is a non-probability method. If you want to get information about people's reactions to a new concept, this type of sampling is quite reasonable. Convenience sampling is often used to pretest questionnaires.
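Systematic selection from a list, as described in the list above, can be sketched as follows. The sketch assumes the common convention that the step K is the population size divided by the sample size (integer division) and that only the starting position is random; the function name and the frame are hypothetical.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Select n units from an ordered frame at a fixed step K."""
    k = len(frame) // n  # assumed convention: K = N // n
    if k < 1:
        raise ValueError("sample size exceeds the size of the frame")
    start = random.Random(seed).randrange(k)  # random start within the first interval
    return [frame[start + i * k] for i in range(n)]


# Hypothetical frame of 100 numbered units; every 10th unit is taken.
frame = list(range(100))
print(systematic_sample(frame, 10, seed=0))
```

If the list has a periodic structure whose period matches K (for example, every tenth entry is a corner apartment), every selected unit shares that feature, which is the bias mentioned above.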

Non-probability samples

Selection in such a sample is carried out not according to the principles of randomness, but according to subjective criteria - availability, typicality, equal representation, etc.

1. Quota sampling - the sample is constructed as a model that reproduces the structure of the general population in the form of quotas (proportions) of the characteristics being studied. The number of sample elements with different combinations of studied characteristics is determined so that it corresponds to their share (proportion) in the general population. So, for example, if our general population consists of 5,000 people, of which 2,000 are women and 3,000 are men, then in the quota sample we will have 20 women and 30 men, or 200 women and 300 men. Quota samples are most often based on demographic criteria: gender, age, region, income, education, and others. Disadvantages: usually such samples are not representative, because it is impossible to take into account several social parameters at once. Pros: readily available material.

2. Snowball method. The sample is constructed as follows. Each respondent, starting with the first, is asked for the contacts of friends, colleagues and acquaintances who would fit the selection conditions and could take part in the study. Thus, with the exception of the first step, the sample is formed with the participation of the research subjects themselves. The method is often used when it is necessary to find and interview hard-to-reach groups of respondents (for example, respondents with high incomes, respondents belonging to the same professional group, respondents who share a hobby or interest, etc.).

3. Spontaneous sampling - sampling of the so-called “first person you come across”. Often used in television and radio polls. The size and composition of spontaneous samples are not known in advance and are determined by only one parameter - the activity of respondents. Disadvantages: it is impossible to establish which population the respondents represent, and as a result, it is impossible to determine representativeness.

4. Route survey - often used when the unit of study is the family. On the map of the locality in which the survey will be carried out, all streets are numbered. Using a table (generator) of random numbers, large numbers are selected. Each large number is treated as consisting of 3 components: street number (the first 2-3 digits), house number, apartment number. For example, the number 14832: 14 is the street number on the map, 8 is the house number, 32 is the apartment number.

5. Regional sampling with selection of typical objects. If, after zoning, a typical object is selected from each group, i.e. an object that is close to the average in terms of most of the characteristics studied in the study, such a sample is called regionalized with the selection of typical objects.
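The quota calculation from item 1 of this list (5,000 people, of whom 2,000 are women and 3,000 are men) can be reproduced with a short sketch of proportional allocation. The function name is hypothetical, and the largest-remainder rule used for rounding is an assumption added so that the quotas always sum to the requested sample size.

```python
def quota_allocation(group_sizes, n):
    """Split a sample of size n across groups in proportion to their population share."""
    total = sum(group_sizes.values())
    exact = {g: n * size / total for g, size in group_sizes.items()}
    quotas = {g: int(v) for g, v in exact.items()}  # round down first
    leftover = n - sum(quotas.values())
    # Hand the remaining places to the groups with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda g: exact[g] - quotas[g], reverse=True)
    for g in by_remainder[:leftover]:
        quotas[g] += 1
    return quotas


population = {"women": 2000, "men": 3000}
print(quota_allocation(population, 50))   # {'women': 20, 'men': 30}
print(quota_allocation(population, 500))  # {'women': 200, 'men': 300}
```

With several crossed characteristics (for example gender by age by region), the same function can be applied to the sizes of the combined cells, provided those sizes are known for the general population.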

Group Building Strategies

Groups are selected for participation in a psychological experiment by means of various strategies intended to ensure that internal and external validity are maintained to the greatest possible extent.

· Randomization (random selection)

· Pairwise selection

· Stratometric selection

· Approximate Modeling

· Attracting real groups

Randomization, or random selection, is used to create simple random samples. The use of such a sample is based on the assumption that each member of the population is equally likely to be included in the sample. For example, to make a random sample of 100 university students, you can put pieces of paper with the names of all university students in a hat, and then take 100 pieces of paper out of it - this will be a random selection (Goodwin J., p. 147).

Pairwise selection - a strategy for constructing sample groups in which the groups are made up of subjects who are equivalent in terms of secondary parameters that are significant for the experiment. This strategy is effective for experiments using experimental and control groups, the best option being the use of twin pairs (mono- and dizygotic), as it allows you to create...

Stratometric selection - randomization with the allocation of strata (or clusters). With this sampling method, the general population is divided into groups (strata) that have certain characteristics (gender, age, political preferences, education, income level, etc.), and subjects with the corresponding characteristics are selected.

Approximate Modeling - drawing limited samples and generalizing conclusions about this sample to the wider population. For example, with the participation of 2nd year university students in the study, the data of this study applies to “people aged 17 to 21 years”. The admissibility of such generalizations is extremely limited.

Approximate modeling is the formation of a model that, for a clearly defined class of systems (processes), describes its behavior (or desired phenomena) with acceptable accuracy.

The concept of representativeness. Conceptual object and population. Designed object. Projected and real population.

We know that sociological science deals not with the fluid immediacy of life, but with data organized according to certain rules in a space of attributes. By data we mean the values of variables assigned to units of study - objects. These objects - communities, institutions, people, texts, things - form diverse and often bizarre configurations in the space of attributes, giving the researcher the opportunity to make generalizing judgments about reality.

As soon as it comes to reality, it turns out that the data obtained relate, strictly speaking, only to the recording documents (questionnaires, interview forms, observation protocols, etc.). There are no guarantees that the reality outside the laboratory windows (say, on the other side of the scales) will not be different. We have not yet reached the sampling procedure, but the question of the representativeness of the data already arises: is it possible to extend the information obtained during the survey to objects located outside of our specific experience? The answer is clear: it is. Otherwise, our observations would not go beyond the here-and-now totality. They would apply not to Muscovites, but to those who had just been interviewed by telephone in Moscow; not to the readers of the Nedelya newspaper, but to those who mailed a completed tear-off coupon to the editor. After completing the survey, we are obliged to assume that both “Muscovites” and “readers” have remained the same. We believe in the stability of the world because scientific observations reveal amazing constancy.

Any single observation extends to a wider field of observation, and the problem of representation is to establish the degree of correspondence between the parameters of the surveyed population and the “real” characteristics of the object. The sampling procedure is intended precisely to reconstruct the real object of study and the general population from individual momentary observations.

The concept of sample representativeness is close to the concept of external validity; only in the first case is the same characteristic extrapolated to a wider set of units, while in the second there is a transition from one semantic context to another. Every person carries out a sampling procedure a thousand times a day, and no one really thinks about the representativeness of the observations. Experience replaces calculation. To find out whether the porridge is well salted, it is not at all necessary to eat the entire pan - non-destructive testing methods are more effective here, including spot checking: you need to try one spoonful. At the same time, you need to be sure that the porridge is well mixed. If the porridge is poorly mixed, it makes sense to take not one measurement but a series, that is, to taste it in different places in the pan - this is already a sample. It is more difficult to make sure that a student's answer on an exam represents their knowledge and is not a random success or failure. To do this, several questions are asked. It is assumed that if a student answered all possible questions on a subject, the result would be “true,” i.e., reflect real knowledge. But then no one would be able to pass the exam.



The basis of the sampling procedure is always “if” - the assumption that extrapolation of observations will not significantly change the result obtained. Therefore, the population can be defined as the “objective possibility” of the sample population.

The problem becomes somewhat more complicated once we consider what is meant by the object of study. Having studied a fairly large population of people, the sociologist comes to the conclusion that the variable “radicalism-conservatism” is correlated with age: in particular, older generations are more conservative than revolutionary. But the surveyed object - the sample population - does not exist in reality as such. It is constructed by the procedure for selecting respondents and conducting interviews, and then immediately disappears, dissolving in the data array. Indeed, the sample population from which data are directly “taken” is generated by the procedure, but at the same time it is dissolved in a larger population, which it represents with varying degrees of accuracy and reliability. Sociological conclusions apply not to the respondents surveyed last week, but to idealized objects: “older generations,” “youth,” those who exhibit “radicalism” or “conservatism.” We are talking about categorical generalizations that are not limited by spatiotemporal circumstances. In this respect, the sampling procedure helps to free oneself from individual observations and move into the world of ideas.

Thus, we have the opportunity to distinguish between the object of research and the general population: an object is not just a collection of units, but a concept in accordance with which the identification and selection of research units is carried out. In this regard, Hegel’s injunction to consider true only that being that corresponds to its concept is correct. Theoretically, the volume of the concept denoting the object of study should correspond to the volume of the general population. However, such correspondence is achieved extremely rarely.

We will need the concept of a conceptual object - an ideal construct denoting the framework of the topic. “Russians”, “audience of central newspapers”, “electorate”, “democratic public” - these are the typical objects of research interest of sociologists. Undoubtedly, a completely real general population must correspond to the conceptual object. To do this, it is necessary to introduce another object of study - the designed object. The designed object is the set of units available to the researcher. The challenge is to identify groups that are inaccessible or difficult to access for data collection.

It is obvious that it is almost impossible to examine an object designated as “Russians”. Among Russians, many people are in prisons, correctional labor institutions, pre-trial detention centers and other places that are difficult for the interviewer to reach. This group will have to be “subtracted” from the designed object. Many patients of psychiatric hospitals, children, and some of the elderly will also have to be “subtracted”. It is unlikely that a civilian sociologist will be able to give military personnel a normal chance of being included in the sample. Similar problems accompany surveys of readers, voters, residents of small towns, and theater visitors.

The listed difficulties are only a small part of those often insurmountable obstacles that a sociologist faces at the field stage of research. The specialist must anticipate these difficulties and not create illusions about the complete implementation of the designed object. Otherwise, he will be disappointed.

So, the object of study does not coincide with the general population in much the same way as a map of an area does not coincide with the area itself.

We thought and wondered for a long time,
The generals wrote everything on a large sheet of paper.
It was smooth on paper, but they forgot about the ravines,
And walking along them -

These words from an old soldier's song are quite applicable to sample design, given that you will have to walk from apartment to apartment.

It would seem that the general population is simply the population from which the units are sampled. However, it only seems so. The sample is drawn from the population from which respondents are actually selected. Let us call it the real population. The differences between the projected and real populations can be seen firsthand by comparing the lists of “projected” respondents and those actually interviewed.

The real object is the totality that took shape at the stage of field research, given the limitations on the availability of primary sociological information. In addition to prisoners, military personnel and the sick, those less likely to be included in the sample are residents of villages remote from transport links, especially if the survey is carried out in the fall, as well as those who are usually not at home, are not inclined to talk with strangers, etc. It happens that interviewers, taking advantage of the lack of control, neglect their duties and interview not those who are supposed to be interviewed according to the instructions, but those who are easier to “get.” For example, interviewers are instructed to visit respondents’ apartments in the evenings, when it is easier to find them at home. If the study is carried out, say, in November, then by five o’clock in the evening it is already completely dark outside in central Russia. In many cities, signs with street names and house numbers are rare. If the duties of interviewers are performed by students of a local pedagogical institute, one can imagine the degree of deviation of the real object from the designed one. Sometimes researchers do it even more simply: they fill out the questionnaires themselves. These difficulties are one source of so-called sampling bias.

There are quite effective ways to control how questionnaires are completed, as well as techniques for “repairing” the sample, in particular “weighting” the main typological groups of respondents: underrepresented groups are weighted up and overrepresented groups are weighted down. In this way the real array is adjusted to the designed one, and this is quite justified.
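One common way to carry out such an adjustment is to give each group a weight equal to its designed share divided by its share among those actually interviewed. The sketch below assumes this simple scheme; the function name, the group labels and the numbers are hypothetical and serve only as an illustration.

```python
def group_weights(designed_shares, realized_counts):
    """Weight per group = designed share / realized share in the collected array."""
    n = sum(realized_counts.values())
    weights = {}
    for group, count in realized_counts.items():
        if count == 0:
            raise ValueError(f"no respondents in group {group!r}; it cannot be weighted")
        weights[group] = designed_shares[group] / (count / n)
    return weights


# Hypothetical design: two age groups at 50% each; the field stage gave 25 and 75 interviews.
designed = {"young": 0.5, "old": 0.5}
realized = {"young": 25, "old": 75}
w = group_weights(designed, realized)
print(w["young"], w["old"])  # the missing group is weighted up, the excess group down
```

A group with no respondents at all cannot be repaired this way, so the sketch raises an error instead of silently dropping it.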

The distribution of a random variable contains all the information about its statistical properties. How many values of a random variable do you need to know to construct its distribution? To do this, you need to examine its general population.

The population is the set of all values that a given random variable can take.

The number of units in a population is called its size N. This value can be finite or infinite. For example, if the height of the inhabitants of a certain city is studied, the size of the population is equal to the number of inhabitants of the city. If a physical experiment is performed, the size of the general population is infinite, because the number of all possible values of any physical parameter is infinite.

Studying a general population is not always possible or advisable. It is impossible if the volume of the population is infinite. But even with finite volumes, a complete study is not always justified, since it requires a lot of time and labor, and absolute accuracy of the results is usually not required. Less accurate results, but with significantly less effort and money, can be obtained by studying only part of the general population. Such studies are called sampling.

Statistical studies conducted only on part of the population are called sampling, and the part of the population being studied is called a sample.

Figure 7.2 symbolically shows the population and sample as a set and its subset.

Figure 7.2 Population and sample

Working with a certain subset of a given population, often constituting an insignificant part of it, we obtain results that are quite satisfactory in accuracy for practical purposes. Studying a larger portion of the population only increases accuracy, but does not change the essence of the results if the sample is taken correctly from a statistical point of view.

In order for the sample to reflect the properties of the population and for the results to be reliable, it must be representative.

For some general populations, any part of them is representative due to their nature. However, in most cases special measures must be taken to ensure representative samples.

One of the main achievements of modern mathematical statistics is the development of the theory and practice of the random sampling method, which ensures the representativeness of data selection.

Sample studies are always inferior in accuracy to studies of the entire population. However, this is acceptable if the magnitude of the error is known. Obviously, the closer the sample size is to the population size, the smaller the error will be. It follows that problems of statistical inference become especially relevant when working with small samples (roughly 10-50 observations).
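The dependence of the error on the sample size can be made explicit. As a sketch, assuming simple random sampling without replacement and the standard finite population correction, the standard error of the sample mean is given below; the symbols $\sigma$ (population standard deviation), $\bar{x}$ (sample mean) and $n$ (sample size) are introduced here for illustration, and $N$ is the population size.

```latex
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}
```

As $n$ approaches $N$ the second factor tends to zero, so the error vanishes for a complete survey; when $n$ is much smaller than $N$ the factor is close to 1 and the error is governed by $\sigma / \sqrt{n}$, which is why small samples give noticeably less precise estimates.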

Population - a set of elements that satisfy certain specified conditions; also called the study population. General population (Universe) - the entire set of objects (subjects) of research from which objects (subjects) are, or can be, selected for a survey.

Sample, or sample population (Sample) - a set of objects (subjects) selected in a special way for a survey. Any data obtained from a sample survey are probabilistic in nature. In practice, this means that the study determines not a specific value but the interval within which the value lies.


Representative sample - a sample population whose main characteristics coincide with those of the general population. Only for this type of sample can the results of a survey of some units (objects) be extended to the entire population. A prerequisite for building a representative sample is the availability of information about the general population, i.e. either a complete list of its units (subjects) or information about its structure in terms of the characteristics that significantly influence attitudes to the subject of research.

17. Discrete variation series, ranking, frequency, relative frequency.

Variation series (statistical series) - a sequence of variants written in ascending order together with their corresponding weights (frequencies).

A variation series can be discrete (a sample of values of a discrete random variable) or continuous (interval) (a sample of values of a continuous random variable).

The discrete variation series has the form:

x_i: x1, x2, ..., xk
n_i: n1, n2, ..., nk

The observed values of the random variable x1, x2, ..., xk are called variants, and the change in these values is called variation.

Sample - a set of observations selected randomly from the population.

The number of observations in a population is called its size.

N - size of the general population.

n - sample size (the sum of all frequencies of the series).

The frequency of variant xi is the number ni (i=1,...,k) showing how many times this variant occurs in the sample.

The relative frequency (share) of variant xi (i=1,…,k) is the ratio of its frequency ni to the sample size n:
$w_i = \frac{n_i}{n}$

Ranking of experimental data - an operation in which the observed values of a random variable are arranged in non-decreasing order.

A discrete variation series of a distribution is a ranked set of variants xi with their corresponding frequencies or relative frequencies.
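The definitions above translate directly into a short computation. The sketch below ranks the observations, counts the frequency of each variant and divides by the sample size to obtain the relative frequency; the function name and the sample data are hypothetical.

```python
from collections import Counter


def variation_series(observations):
    """Return a list of (variant, frequency, relative frequency), ranked by variant."""
    n = len(observations)          # sample size = sum of all frequencies
    counts = Counter(observations)
    return [(x, counts[x], counts[x] / n) for x in sorted(counts)]


# Hypothetical sample of a discrete random variable.
sample = [3, 1, 2, 3, 1, 3]
for variant, freq, rel_freq in variation_series(sample):
    print(variant, freq, round(rel_freq, 3))
```

The frequencies add up to the sample size and the relative frequencies add up to 1, which gives a quick check on any series built by hand.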

A population of homogeneous objects is often examined with respect to some characteristic of those objects, measured quantitatively or qualitatively.

For example, for a batch of parts, the quantitative characteristic may be the size of the part according to GOST, and the qualitative characteristic may be whether the part meets the standard.

If the objects need to be checked for compliance with standards, a complete examination is sometimes carried out, but in practice this is extremely rare. For example, if the general population contains a huge number of objects, a complete survey is practically impossible. In this case, a certain number of objects (elements) are selected from the entire population and examined. Thus, there is a general population and a sample population.

The general population is the totality of all objects that are subject to inspection or study. It contains, as a rule, a finite number of elements, but if that number is very large, then to simplify the mathematical calculations the population is assumed to consist of an infinite number of objects.

A sample, or sample population, is the part of the elements selected from the entire population. Sampling can be repeated (with replacement) or non-repetitive (without replacement). In the first case, each selected element is returned to the general population before the next selection; in the second, it is not. In practice, non-repetitive random selection is used more often.

The population and the sample must be related to each other by representativeness. In other words, in order to confidently determine the characteristics of the entire population from the characteristics of the sample, the sample elements must reflect them as accurately as possible; that is, the sample must be representative.

A sample will be more or less representative if its elements are drawn at random, and in large number, from the whole population. This can be stated on the basis of the so-called law of large numbers. In this case, all elements have an equal probability of being included in the sample.

Various methods of selection are available. They can basically be divided into two options:

  • Option 1. Elements are selected when the population is not divided into parts. This option includes simple random repeated and non-repetitive selections.
  • Option 2. The general population is divided into parts and elements are selected. These include typical, mechanical and serial sampling.

Simple random - selection in which elements are selected one at a time from the entire population at random.

Typical is a selection in which elements are selected not from the entire population, but from all its “typical” parts.

Mechanical selection is when the entire population is divided into as many groups as there are elements to be included in the sample, and one element is selected from each group. For example, if 25% of the parts produced by a machine must be selected, every fourth part is taken; if 4% of the parts must be selected, every twenty-fifth part is taken, and so on. It must be said that sometimes mechanical selection may not provide sufficient

Serial selection is a selection in which elements are taken from the entire population not one at a time but in “series”, each of which is examined completely. For example, when parts are manufactured by a large number of automatic machines, a complete examination is carried out only for the products of several machines. Serial selection is used if the trait under study varies only slightly between series.

To reduce the error, the parameters of the general population are estimated from the sample. Moreover, sample inspection can be either single-stage or multi-stage, which increases the reliability of the survey.



 