# Frequently Asked Questions

#### ETE modeling team (Laboratoire MIVEGEC, CNRS, IRD, Université de Montpellier)

#### 28-07-2020

**What do the graphs represent?**

The first two graphs show the temporal reproduction number (denoted \(\mathcal{R}(t)\)), i.e., at a given time \(t\), the average number of people an infected person infects over the course of his or her infection. It is therefore an estimate of the **potential spread** of the epidemic. The shaded areas indicate the confidence interval and the line shows the median.

The third graph represents the incidence data, i.e., the number of new events recorded each day. Depending on the choice in the menu, this incidence can refer to detected cases, hospitalizations, ICU admissions, or deaths.

The fourth graph allows you to zoom in on a particular time period by dragging the small sliders (move the left one to the right to see the recent values better).

**If \(\mathcal{R}(t)\) is smaller than one, then everything’s okay?**

Not necessarily: \(\mathcal{R}(t)\) represents the trend of the epidemic, but it does not reflect its current state. For instance, an epidemic can be declining while there are still hundreds of thousands of infected people and overwhelmed ICU services.
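To make this concrete, here is a minimal Python sketch. It assumes a constant reproduction number and discrete generations of infection, which is a deliberate simplification. The function name and the numbers (200,000 infected people, a reproduction number of 0.8) are purely illustrative and do not come from our data.

```python
def infected_by_generation(initial_infected: float, r: float, generations: int) -> list:
    """Infected people in successive generations, assuming a constant reproduction number r."""
    counts = [initial_infected]
    for _ in range(generations):
        # Each generation infects, on average, r times as many people as it contains.
        counts.append(counts[-1] * r)
    return counts


# Illustrative numbers only: the epidemic is declining (r < 1) ...
counts = infected_by_generation(200_000, 0.8, 3)
# ... yet every generation still contains more than 100,000 infected people.
print([round(c) for c in counts])
```

Even with a reproduction number below one, the number of infected people in this toy example stays above 100,000 for three generations.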

**Why two different charts for \(\mathcal{R}(t)\)?**

These two graphs are based on two slightly different methods, which are described in the Model tab.

The *EpiEstim* method is less sensitive to time variations (it calculates a 3-day average) and uses a more detailed model to forecast recent values.

The *R0* method is more sensitive to changes in the incidence curve and the most recent value can have a significant effect on the curve.

**Is one of the two methods to be preferred?**

Yes, but it depends on the piece of information you are interested in.

The *R0* method is preferable as you move away from the present, that is, if you are looking for an estimate for a past date.

Conversely, for a recent estimate of \(\mathcal{R}(t)\), the *EpiEstim* method is the better choice.

**Which serial interval to choose?**

The serial interval is the number of days between the onset of symptoms in an “infector” and the onset of symptoms in an “infectee”. Its distribution is needed to estimate the reproduction number. Unfortunately, **the serial interval is still largely unknown** for epidemics in France and Europe.

We therefore let the user choose between several serial interval distributions, either measured on COVID-19 epidemics in Asia or estimated in the light of previous epidemics.

**Why do the incidence data differ slightly from those found on other websites?**

To correct for the weekend effect in the reporting of new cases, the data were smoothed using a 7-day moving average.

Each point is therefore the mean of the last 7 days.

For purists, we used the function \(\texttt{zoo::rollmean(x, k = 7, align = "right", fill = NA)}\).

Note that this smoothing produces decimal values in the incidence series.
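For readers who do not use R, the smoothing can be sketched in plain Python. This is only an analogue of the `zoo::rollmean` call above, written under the assumption that a right-aligned window ends at the current day and that the first six days, for which no full window exists, are left empty (`None` here, `NA` in R). The function name and the daily counts are made up for illustration.

```python
from typing import List, Optional


def rolling_mean_right(values: List[float], k: int = 7) -> List[Optional[float]]:
    """Right-aligned moving average: each point is the mean of the last k values."""
    smoothed: List[Optional[float]] = []
    for i in range(len(values)):
        if i < k - 1:
            # Not enough past values to fill a window (R's fill = NA).
            smoothed.append(None)
        else:
            window = values[i - k + 1 : i + 1]
            smoothed.append(sum(window) / k)
    return smoothed


# Made-up daily counts with a weekend dip on days 6 and 7.
daily_cases = [10, 12, 9, 14, 11, 3, 4, 13, 15]
smoothed = rolling_mean_right(daily_cases)
print(smoothed)
```

The output has the same length as the input, and the smoothed values are generally not integers even though the raw counts are.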

**Why aren’t there \(\mathcal{R}(t)\) estimates for more recent dates?**

There is always a **lag** between the state of the epidemic and what can be estimated from the data. This lag arises because the observed events (screening, hospitalization, ICU admission, death) only occur a certain number of days after *individuals have been infected*. The moment of infection is the event that would best characterize the epidemic in real time, but in practice it is never known.

These time lags vary from one individual to another, but for the sake of simplicity we have fixed them at values that we consider representative, based on a model from our team (Sofonea *et al.*, *medRxiv*):

- 10-day lag for newly detected cases;
- 12-day lag for hospitalizations;
- 14-day lag for ICU admissions;
- 28-day lag for deaths.
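As a rough illustration, the sketch below computes the most recent date that an estimate can describe. It assumes that the lag is simply subtracted from the date of the last observation, which may not match exactly how the application handles it. The dictionary keys and the function name are hypothetical. Only the lag values come from the list above.

```python
from datetime import date, timedelta

# Lags in days, taken from the list above; the key names are hypothetical.
LAG_DAYS = {
    "cases": 10,
    "hospitalizations": 12,
    "icu_admissions": 14,
    "deaths": 28,
}


def latest_estimable_date(last_observation: date, data_type: str) -> date:
    """Most recent infection date described by the data (assumption: a simple fixed shift)."""
    return last_observation - timedelta(days=LAG_DAYS[data_type])


last_day = date(2020, 7, 28)
for data_type in LAG_DAYS:
    print(data_type, latest_estimable_date(last_day, data_type))
```

Under this assumption, death data observed up to a given day describe infections that occurred about four weeks earlier, whereas detected cases describe infections from about ten days earlier.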

**What is the most reliable type of data?**

Incidence data based on PCR detection are generally the least reliable because they are very sensitive to changes in testing policy. For example, in France testing was limited at the beginning of the epidemic, whereas many tests are now being performed. This increase in screening mechanically translates into an increase in the number of cases detected, and therefore in the estimated reproduction number, even though the epidemic could actually be declining sharply.
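The following toy calculation in Python illustrates this bias. All numbers are invented: it assumes that true infections fall by 10% per period while the fraction of infections detected doubles each period.

```python
# Invented numbers: the epidemic shrinks by 10% per period ...
true_infections = [1000, 900, 810]
# ... while the fraction of infections that are detected doubles each period.
detection_rate = [0.10, 0.20, 0.40]

# Detected cases are what the incidence curve actually shows.
detected_cases = [round(n * p) for n, p in zip(true_infections, detection_rate)]
print(detected_cases)
```

In this example the detected cases increase from one period to the next although the true number of infections decreases, so a reproduction number estimated from the detected cases alone would be overestimated.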

The most reliable data for estimating \(\mathcal{R}(t)\) are therefore those for which the screening policy has varied the least over time. In France, this corresponds to data on new ICU admissions. However, these data are not always available.

**What is the difference between departments and countries?**

National data aggregate departmental data. This aggregation makes them less sensitive to stochastic fluctuations, so the estimates are more robust, with narrower confidence intervals.

On the other hand, if only a few departments account for the majority of cases, then the national data may poorly reflect the situation of the epidemic in the least affected departments.

**Why aren’t all countries/regions/departments in the proposed lists?**

In order to be processed by the packages used, the data need to satisfy several conditions:

- the first data point is non-zero;
- there are no missing values;
- the number of consecutive observations is greater than eight.
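These conditions can be sketched as a simple check in Python. This is not the code used by the application: the function name is hypothetical, missing values are assumed to be encoded as `None`, and "consecutive observations" is interpreted as the length of a series without gaps.

```python
from typing import Optional, Sequence

# The series must contain strictly more than this many observations.
MIN_OBSERVATIONS = 8


def is_usable(series: Sequence[Optional[float]]) -> bool:
    """Return True if an incidence series meets the three conditions listed above."""
    if len(series) <= MIN_OBSERVATIONS:
        return False  # too few consecutive observations
    if any(value is None for value in series):
        return False  # at least one missing value
    return series[0] != 0  # the first data point must be non-zero


print(is_usable([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # nine points, no gaps, non-zero start
print(is_usable([0, 2, 3, 4, 5, 6, 7, 8, 9]))  # starts with zero
```

A location whose series fails any one of these checks would not appear in the lists.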