Recursive Neural Networks are a type of artificial neural network designed to process data with a hierarchical or tree-like structure. They work by applying the same set of weights recursively over structured inputs, such as sentences broken into phrases or sub-phrases. This allows the network to capture relationships and meanings within complex data structures, making…
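As a minimal sketch of this recursion, the toy network below applies one shared weight matrix at every node of a small parse tree; the weights, word vectors, and tree shape are all made-up illustrative values, not from any library:

```python
import math

DIM = 2
# Shared weights: map two concatenated child vectors (2*DIM) to one parent (DIM).
# The same W is reused at every tree node -- that reuse is the "recursive" part.
W = [[0.5, -0.2, 0.1, 0.3],
     [0.0, 0.4, -0.3, 0.2]]

def compose(left, right):
    """Combine two child vectors with the shared weights and a tanh non-linearity."""
    x = left + right  # concatenate the children
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]

def encode(tree):
    """Leaves are plain vectors; internal nodes are (left, right) pairs."""
    if isinstance(tree, tuple):
        return compose(encode(tree[0]), encode(tree[1]))
    return tree

# Toy parse tree for "(the (cat sat))": the same weights fire at both nodes.
the, cat, sat = [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]
vec = encode((the, (cat, sat)))
print(vec)  # one DIM-sized vector for the whole phrase
```

Because `compose` is the only learned operation, the network handles trees of any depth with a fixed number of parameters.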
Category: Deep Learning
Perceiver Architecture
Perceiver Architecture is a type of neural network model designed to handle many different types of data, such as images, audio, and text, without needing specialised components for each type. It uses attention mechanisms to process and combine information from various sources. This flexible design allows it to work on tasks that involve multiple data…
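The core trick can be sketched as cross-attention from a small, fixed-size latent array onto an input of any length; the dimensions and values below are illustrative assumptions, not the real architecture's sizes:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attend(latents, inputs):
    """Each latent vector queries every input vector (dot-product attention)."""
    out = []
    for q in latents:
        scores = softmax([sum(qi * ki for qi, ki in zip(q, k)) for k in inputs])
        out.append([sum(a * v[d] for a, v in zip(scores, inputs))
                    for d in range(len(q))])
    return out

latents = [[0.1, 0.2], [0.3, -0.1]]                        # fixed-size bottleneck
inputs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8]]  # any length, any modality
out = cross_attend(latents, inputs)
print(out)  # still exactly 2 latent vectors, however long the input is
```

Because the latent array's size never changes, the same downstream layers work whether `inputs` came from pixels, audio samples, or text embeddings.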
Masked Modelling
Masked modelling is a technique used in machine learning where parts of the input data are hidden or covered, and the model is trained to predict these missing parts. This approach helps the model to understand the relationships and patterns within the data by forcing it to learn from the context. It is commonly used…
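A minimal sketch of the data preparation this involves (BERT-style): hide a fraction of tokens and keep the originals as prediction targets. The `[MASK]` token and 15% rate are conventional choices, not fixed rules:

```python
import random

MASK, RATE = "[MASK]", 0.15

def mask_tokens(tokens, rng):
    """Hide ~RATE of the tokens; the hidden originals become the targets."""
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < RATE:
            inputs.append(MASK)
            targets.append(tok)      # the model must recover this token
        else:
            inputs.append(tok)
            targets.append(None)     # no loss on visible tokens
    return inputs, targets

rng = random.Random(1)
inp, tgt = mask_tokens("the cat sat on the mat".split(), rng)
print(inp)
print(tgt)
```

The model only receives `inp`; the loss is computed against the non-`None` entries of `tgt`, which is what forces it to use surrounding context.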
Activation Functions
Activation functions are mathematical formulas used in neural networks to decide whether a neuron should be activated or not. They help the network learn complex patterns by introducing non-linearity, allowing it to solve more complicated problems than a simple linear system could handle. Without activation functions, neural networks would not be able to model tasks…
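Three of the most common activation functions, written out directly; each introduces the non-linearity described above, and without them stacked layers would collapse into a single linear map:

```python
import math

def relu(x):
    """Pass positives through, zero out negatives."""
    return max(0.0, x)

def sigmoid(x):
    """Squash any input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Squash any input into the range (-1, 1)."""
    return math.tanh(x)

for f in (relu, sigmoid, tanh):
    print(f.__name__, [round(f(x), 3) for x in (-2.0, 0.0, 2.0)])
```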
Language Modelling Heads
Language modelling heads are the final layers in neural network models designed for language tasks, such as text generation or prediction. They take the processed information from the main part of the model and turn it into a set of probabilities for each word in the vocabulary. This allows the model to choose the most…
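A toy sketch of such a head: a linear map from the model's hidden state to one score per vocabulary word, followed by a softmax to turn scores into probabilities. The vocabulary and weights are made-up values for illustration:

```python
import math

VOCAB = ["the", "cat", "sat"]
W = [[0.5, -0.1], [0.2, 0.8], [-0.3, 0.4]]   # one weight row per vocab word
b = [0.0, 0.1, -0.1]

def lm_head(hidden):
    """Project a hidden state to a probability for every word in VOCAB."""
    logits = [sum(w * h for w, h in zip(row, hidden)) + bi
              for row, bi in zip(W, b)]
    m = max(logits)                            # subtract max for stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return {word: e / z for word, e in zip(VOCAB, exps)}

probs = lm_head([0.4, 0.9])
print(probs)                        # probabilities over the vocabulary
print(max(probs, key=probs.get))    # the most likely next word
```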
Diffusion Models
Diffusion models are a machine learning technique used to create new data, such as images or sounds, by starting with random noise and gradually transforming it into a meaningful result. They work by simulating a process in which data is slowly corrupted with noise, then learning to reverse this process to generate realistic…
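The corruption half of that process can be sketched in a few lines. The linear noise schedule and step count below are illustrative assumptions; a trained model would learn to undo these steps one at a time, which is the part this sketch deliberately omits:

```python
import math
import random

T = 10
# A simple increasing noise schedule: later steps add more noise.
betas = [0.1 * (t + 1) / T for t in range(T)]

def noise_step(x, beta, rng):
    """One forward diffusion step: shrink the signal, add scaled Gaussian noise."""
    return [math.sqrt(1 - beta) * xi + math.sqrt(beta) * rng.gauss(0, 1)
            for xi in x]

rng = random.Random(0)
x = [1.0, -1.0]            # a clean toy "data point"
for beta in betas:
    x = noise_step(x, beta, rng)
print(x)                   # after T steps: close to pure noise
```

Generation runs this in reverse: start from noise and apply the learned denoising step T times.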
Positional Encoding
Positional encoding is a technique used in machine learning models, especially transformers, to give information about the order of data, like words in a sentence. Since transformers process all words at once, they need a way to know which word comes first, second, and so on. Positional encoding adds special values to each input so…
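The sinusoidal scheme from the original Transformer is one concrete way to build these special values: each position gets a unique pattern of sines and cosines that can be added to its token's embedding:

```python
import math

def positional_encoding(pos, dim):
    """Sinusoidal encoding: alternating sin/cos at geometrically spaced frequencies."""
    pe = []
    for i in range(0, dim, 2):
        angle = pos / (10000 ** (i / dim))
        pe.append(math.sin(angle))
        pe.append(math.cos(angle))
    return pe[:dim]

for pos in range(3):
    print(pos, [round(v, 3) for v in positional_encoding(pos, 4)])
```

Because the pattern is deterministic, the model can read off both absolute position and, via trigonometric identities, relative offsets between positions.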
Capsule Networks
Capsule Networks are a type of artificial neural network designed to better capture spatial relationships and hierarchies in data, such as images. Unlike traditional neural networks, they group neurons into capsules that represent different properties of an object, like its position and orientation. This structure helps the network understand the whole object and its parts, making…
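One characteristic piece of this design is the "squash" non-linearity, sketched below: a capsule outputs a vector whose direction encodes an object's properties (its pose) while its length, squashed to below 1, acts as the probability that the object is present. The input vectors here are made-up values:

```python
import math

def squash(v):
    """Scale a vector's length into [0, 1) while preserving its direction."""
    norm_sq = sum(x * x for x in v)
    norm = math.sqrt(norm_sq)
    if norm == 0.0:
        return list(v)
    scale = norm_sq / (1 + norm_sq)
    return [scale * x / norm for x in v]

weak = squash([0.1, 0.0])    # weak evidence  -> length near 0
strong = squash([5.0, 0.0])  # strong evidence -> length near 1
print(weak, strong)
```

Keeping the direction intact is what lets a capsule say both *that* it sees something and *how* that thing is positioned.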
Residual Connections
Residual connections are a technique used in deep neural networks where the input to a layer is added to its output. This helps the network learn more effectively, especially as it becomes deeper. By allowing information to skip layers, residual connections make it easier for the network to avoid problems like vanishing gradients, which can…
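The mechanism is a one-line addition, sketched here with a stand-in layer (the `0.1` scaling is just a placeholder for real learned weights):

```python
def layer(x):
    """Stand-in for any learned transformation (weights omitted for brevity)."""
    return [0.1 * xi for xi in x]

def residual_block(x):
    """Add the block's input back onto its output: out = x + layer(x)."""
    return [xi + fi for xi, fi in zip(x, layer(x))]

print(residual_block([1.0, 2.0]))  # → [1.1, 2.2]
```

Even if `layer` learns almost nothing (outputs near zero), the input still passes through unchanged, which is why gradients flow easily through very deep stacks of such blocks.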
Mixture of Experts
A Mixture of Experts is a machine learning model that combines several specialised smaller models, called experts, to solve complex problems. Each expert focuses on a specific part of the problem, and a gating system decides which experts to use for each input. This approach helps the overall system make better decisions by using the…
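A toy sketch of the idea: a gating function scores each expert for the given input, and the output is the gate-weighted combination of the experts' answers. The two experts and the gating rule below are made-up illustrations, not a real routing scheme:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

experts = [lambda x: 2 * x,      # expert 0: good at "doubling" problems
           lambda x: x + 10]     # expert 1: good at "shifting" problems

def gate(x):
    """Score each expert for this input; large x favours expert 0."""
    return softmax([x, 5.0 - x])

def mixture(x):
    """Blend the experts' outputs using the gate's weights."""
    weights = gate(x)
    return sum(w * e(x) for w, e in zip(weights, experts))

print(mixture(8.0))   # gate favours expert 0 -> close to 2 * 8 = 16
print(mixture(1.0))   # gate favours expert 1 -> close to 1 + 10 = 11
```

In large-scale versions, the gate typically activates only the top few experts per input, so most parameters stay idle on any single example.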