Media capture and constraints

The media part of WebRTC covers how to access hardware capable of capturing video and audio, such as cameras and microphones, as well as how media streams work. It also covers display media, which is how an application can do screen capturing.

Media devices

All cameras and microphones that are supported by the browser are accessed and managed through the navigator.mediaDevices object. Applications can retrieve the current list of connected devices and also listen for changes, since many cameras and microhpones connect through USB and can be connected and disconnected during the lifecycle of the application. Since the state of a media device can change at any time, it is recommended that applications register for device changes in order to properly handle changes.

Constraints

When accessing media devices, it is a good practice to provide as detailed constraints as possible. While it is possible to open the default camera and microphone with a simple constraint, it might deliver a media stream that is far from the most optimal for the application.

The specific constraints are defined in a MediaTrackConstraint object, one for audio and one for video. The attributes in this object are of type ConstraintLong, ConstraintBoolean, ConstraintDouble or ConstraintDOMString. These can either be a specific value (e.g., a number, boolean or string), a range (LongRange or DoubleRange with a minimum and maximum value) or an object with either an ideal or exact definition. For a specific value, the browser will attempt to pick something as close as possible. For a range, the best value in that range will be used. When exact is specified, only media streams that exactly match that constraint will be returned.

Near

// Camera with a resolution as close to 640x480 as possible
{
    "video": {
        "width": 640,
        "height": 480
    }
}

Range

// Camera with a resolution in the range 640x480 to 1024x768
{
    "video": {
        "width": {
            "min": 640,
            "max": 1024
        },
        "height": {
            "min": 480,
            "max": 768
        }
    }
}

Exact

// Camera with the exact resolution of 1024x768
{
    "video": {
        "width": {
            "exact": 1024
        },
        "height": {
            "exact": 768
        }
    }
}

To determine the actual configuration a certain track of a media stream has, we can call MediaStreamTrack.getSettings() which returns the MediaTrackSettings currently applied.

It is also possible to update the constraints of a track from a media device we have opened, by calling applyConstraints() on the track. This lets an application re-configure a media device without first having to close the existing stream.

Display media

An application that wants to be able to perform screen capturing and recording must use the Display Media API. The function getDisplayMedia() (which is part of navigator.mediaDevices is similar to getUserMedia() and is used for the purpose of opening the content of the display (or a portion of it, such as a window). The returned MediaStream works the same as when using getUserMedia().

The constraints for getDisplayMedia() differ from the ones used for regular video or audio input.

{
    video: {
        cursor: 'always' | 'motion' | 'never',
        displaySurface: 'application' | 'browser' | 'monitor' | 'window'
    }
}

The code snipet above shows how the special constraints for screen recording works. Note that these might not be supported by all browsers that have display media support.

Streams and tracks

A MediaStream represents a stream of media content, which consists of tracks (MediaStreamTrack) of audio and video. You can retrieve all the tracks from MediaStream by calling MediaStream.getTracks(), which returns an array of MediaStreamTrack objects.

MediaStreamTrack

A MediaStreamTrack has a kind property that is either audio or video, indicating the kind of media it represents. Each track can be muted by toggling its enabled property. A track has a Boolean property remote that indicates if it is sourced by a RTCPeerConnection and coming from a remote peer.